<font size="+3" color='#053c96'><h2><center> Classically Punk</center></h2></font>
<figure>
<center><img src="https://images.unsplash.com/photo-1487180144351-b8472da7d491?ixlib=rb-4.0.3&ixid=MnwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&auto=format&fit=crop&w=872&q=80" width="750" height="600" alt="Classical Punk"/></center>
<font size="0" color='#053c96'><h4><center> Photo Credit: Unsplash</center></h4></font>
</figure>

<font size="+2" color='#053c96'><b> Contributor</b></font>  
<font size="+0" ><b> Umar Kabir</b></font>  

<a id='table-of-contents'></a>
[Table of Contents](#table-of-contents)

- [Introduction](#introduction)
  * [Overview](#overview)
  * [Problem Statement](#problem-statement)
  * [Objectives](#goals)
- [Importing Libraries](#importing-dependencies)
- [Data](#data)
- [Exploratory Data Analysis](#exploratory-data-analysis)
  * [Data Exploration](#data-exploration)
  * [Data Visualization](#data-visualization)
  * [Summary Statistics](#summary-statistics)
  * [Feature Correlation](#feature-correlation)
- [Data Preparation](#data-preparation)
  * [Data Cleaning](#data-cleaning)
  * [Feature Engineering](#feature-engineering)
  * [Data Transformation](#data-transformation)
- [Modeling](#modeling)
  * [Model Selection](#model-selection)
  * [Model Training](#model-training)
  * [Model Evaluation](#model-evaluation)
  * [Hyperparameter Tuning](#hyperparameter-tuning)
- [Results](#results)
  * [Analysis Results](#analysis-results)
  * [Model Performance](#model-performance)
  * [Feature Importance](#feature-importance)
  * [Implications](#implications)
- [Conclusion](#conclusion)
  * [Summary](#summary)
  * [Limitations](#limitations)
  * [Recommendations](#recommendations)
- [References](#references)

<a id='introduction'></a>
<font size="+2" color='#053c96'><b> 1. Introduction</b></font>  
[back to top](#table-of-contents)  

Classically Punk is a music genre classification project: it extracts audio features from music files and uses them to train a machine learning model that predicts a track's genre.  

In this project, we use machine learning techniques to automatically classify audio snippets by musical genre. To do this, we need a library that can read music files and extract features from them, such as tempo, pitch, and melody. We then use these features to train a model that distinguishes between genres.  

The deliverables are a slide presentation describing how we classified the music (including assumptions, implications, and other important information) and code that the DevOps team can push to production.

<a id='overview'></a>
<font size="+1" color='#780404'><b> 1.1 Overview</b></font>  
[back to top](#table-of-contents)  

The project aims to develop a machine learning application that automatically classifies audio snippets by musical genre. The main steps are finding a library that can read music files and extract features from them, identifying which features are relevant for classification, and training a model on those features. The project also involves handling large datasets and analyzing media files to generate data and identify patterns.

<a id='problem-statement'></a>
<font size="+1" color='#780404'><b> 1.2 Problem Statement</b></font>  
[back to top](#table-of-contents)  

The problem this project addresses is the difficulty of manually classifying large music collections by genre. Manual classification is time-consuming and error-prone, because it requires a deep understanding of the characteristics of each genre. As the amount of music available online continues to grow, categorizing it by hand becomes increasingly impractical. The goal of this project is to develop a machine learning application that automates genre classification, making it faster, more accurate, and scalable. The application uses features extracted from audio files to train a model that assigns each track to a genre without the need for human intervention.

<a id='goals'></a>
<font size="+1" color='#780404'><b> 1.3 Objectives</b></font>  
[back to top](#table-of-contents)  

1. To identify a suitable library for reading music files and extracting features from them.
2. To determine relevant features that can be used for music genre classification, such as tempo, pitch, and melody.
3. To preprocess the audio data, such as removing noise and converting it into a format suitable for machine learning algorithms.
4. To train a machine learning model on the audio data using a suitable algorithm, such as neural networks or decision trees.
5. To evaluate the performance of the machine learning model using appropriate metrics, such as accuracy, precision, and recall.
6. To optimize the model's performance by tuning hyperparameters and experimenting with different algorithms.
7. To develop a user-friendly interface for the application that allows users to upload audio files and receive genre classification results.
8. To present the results of the project, including the classification accuracy and the features that were most relevant for genre classification.
9. To deliver code that can be easily deployed by the DevOps team for use in a production environment.

Music genre classification is one of several related audio analysis tasks:

- Speech activity detection: identifying the segments of an audio signal that contain speech, for example by looking for changes in the signal's energy, zero crossing rate, or pitch.
- Speaker identification: identifying who is speaking in an audio signal, using features that are unique to each speaker, such as vocal tract shape or the way certain words are pronounced.
- Music genre classification: assigning an audio signal to a genre of music, using features commonly associated with different genres, such as tempo, pitch, or rhythm.
- Sound event detection: identifying the types of sounds present in an audio signal, for example by looking for changes in the signal's energy, spectral content, or temporal structure.
- Automatic music transcription: converting an audio recording of music into a musical score, using features associated with individual notes, such as pitch, duration, and timbre.
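
Objective 5 above names accuracy, precision, and recall as evaluation metrics. The following is a minimal sketch of how these can be computed from true and predicted genre labels in plain Python. The function name `classification_report` and the toy labels are illustrative assumptions, not part of the project code.

```python
from collections import Counter

def classification_report(y_true, y_pred):
    """Return overall accuracy and per-class (precision, recall)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    accuracy = correct / len(y_true)

    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    predicted = Counter(y_pred)  # how often each class was predicted
    actual = Counter(y_true)     # how often each class really occurs

    per_class = {}
    for label in sorted(set(y_true) | set(y_pred)):
        precision = true_pos[label] / predicted[label] if predicted[label] else 0.0
        recall = true_pos[label] / actual[label] if actual[label] else 0.0
        per_class[label] = (precision, recall)
    return accuracy, per_class

# Toy labels, made up for illustration only
y_true = ["rock", "rock", "jazz", "jazz", "blues"]
y_pred = ["rock", "jazz", "jazz", "jazz", "rock"]
accuracy, per_class = classification_report(y_true, y_pred)
```

Per-class values matter here because a single accuracy figure can hide a genre that the model never predicts correctly.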

<a id='importing-dependencies'></a>
<font size="+2" color='#053c96'><b> 2. Importing Libraries</b></font>  
[back to top](#table-of-contents)

In [None]:
import sys
# Insert the parent path relative to this notebook so we can import from the src folder.
sys.path.insert(0, "..")

from src.dependencies import *

<a id='data'></a>
<font size="+2" color='#053c96'><b> 3. Data</b></font>  
[back to top](#table-of-contents)  

The dataset, commonly known as GTZAN, was used in the well-known genre classification paper "Musical genre classification of audio signals" by G. Tzanetakis and P. Cook, IEEE Transactions on Speech and Audio Processing, 2002.

In [None]:
genres = ['blues', 'classical', 'country', 'disco', 'hiphop', 'jazz', 'metal', 'pop', 'reggae', 'rock']
data = []

for genre in genres:
    genre_dir = f'../data/genres/{genre}'
    for filename in os.listdir(genre_dir):
        if filename.endswith('.wav'):
            audio_path = os.path.join(genre_dir, filename)
            y, sr = librosa.load(audio_path)

            # Extracting the Spectral features for each file
            # MFCCs
            mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

            # Spectral centroid
            spectral_centroid = librosa.feature.spectral_centroid(y=y, sr=sr)

            # Spectral bandwidth
            spectral_bandwidth = librosa.feature.spectral_bandwidth(y=y, sr=sr)

            # Spectral rolloff
            spectral_rolloff = librosa.feature.spectral_rolloff(y=y, sr=sr)

            # Spectral contrast
            spectral_contrast = librosa.feature.spectral_contrast(y=y, sr=sr)

            # Spectral flatness
            spectral_flatness = librosa.feature.spectral_flatness(y=y)

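            # Chroma CENS (energy-normalized chroma); stored under the name "harmonicity"
            harmonicity = librosa.feature.chroma_cens(y=y, sr=sr)

            # Constant-Q chroma; stored under the name "percussiveness"
            percussiveness = librosa.feature.chroma_cqt(y=y, sr=sr)

            # STFT chroma; stored under the name "spectral_flux"
            spectral_flux = librosa.feature.chroma_stft(y=y, sr=sr)

            # Variable-Q chroma; stored under the name "spectral_slope"
            spectral_slope = librosa.feature.chroma_vqt(y=y, sr=sr, intervals='equal')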

            
            # Extracting the Temporal features for each file
            # Zero crossing rate
            zcr = librosa.feature.zero_crossing_rate(y=y)

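            # Delta (smoothed local derivative) of the raw waveform; stored as "energy"
            energy = librosa.feature.delta(data=y)

            # Global tempo estimate in beats per minute (scalar taken from the length-1 array)
            tempo = librosa.feature.tempo(y=y, sr=sr)[0]

            # Polynomial coefficients fitted to each spectrogram frame; stored as "pitch"
            pitch = librosa.feature.poly_features(y=y, sr=sr)

            # Tonal centroid (tonnetz) features; stored as "loudness"
            loudness = librosa.feature.tonnetz(y=y, sr=sr)

            # Duration in seconds
            duration = librosa.get_duration(y=y, sr=sr)

            # Tempogram ratio features; stored as "complexity"
            complexity = librosa.feature.tempogram_ratio(y=y, sr=sr)

            # Tempogram (local autocorrelation of the onset strength envelope); stored as "timbre"
            timbre = librosa.feature.tempogram(y=y, sr=sr)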
            
            # Extracting the Energy features for each file

            

            # Spectral energy
            stft = librosa.stft(y=y)

            spectral_energy = np.abs(stft) ** 2

            # Temporal energy
            # Set parameters for windowing
            window_size = 1024  # Adjust as needed
            hop_size = 512     # Adjust as needed

            # Calculate the number of windows
            num_windows = (len(y) - window_size) // hop_size + 1

            # Initialize an array to store temporal energy values
            temporal_energy = np.zeros(num_windows)

            # Calculate temporal energy for each window
            for i in range(num_windows):
                window_start = i * hop_size
                window_end = window_start + window_size
                audio_window = y[window_start:window_end]
                
                # Calculate squared amplitude as temporal energy
                temporal_energy[i] = np.sum(audio_window ** 2)


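            # Energy flux
            # Slide the window by hop_size (not by one sample) so the loop stays tractable,
            # and record half the summed squared sample-to-sample difference in each window.
            energy_flux = []
            for start in range(0, len(y) - window_size + 1, hop_size):
                window = y[start : start + window_size]
                sample_diff = np.diff(window)  # Difference between consecutive samples
                energy_flux.append(0.5 * np.sum(sample_diff ** 2))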

            # Energy distribution
            energy_distribution = np.abs(stft) ** 2

            # Sum energy across time frames to get frequency-based energy distribution
            frequency_energy_distribution = np.sum(energy_distribution, axis=1)

            # Normalize energy distribution to get a probability distribution
            normalized_energy_distribution = frequency_energy_distribution / np.sum(frequency_energy_distribution)

            # Calculate energy entropy
            energy_entropy = -np.sum(normalized_energy_distribution * np.log2(normalized_energy_distribution + 1e-12))  # Avoid log(0)



            # Extracting the Chroma features for each file
            chroma = librosa.feature.chroma_stft(y=y, sr=sr)
            features = [genre, filename, mfccs, spectral_centroid, spectral_bandwidth, spectral_rolloff, spectral_contrast, spectral_flatness,
            harmonicity, percussiveness, spectral_flux, spectral_slope, zcr, energy, tempo, pitch, loudness, duration, complexity, timbre,
            spectral_energy, temporal_energy, energy_flux, energy_distribution, energy_entropy]

            # Adding all the features to a dataframe
            #features = [genre, filename, beats, sr, central_moments, zero_crossing_rate[0], rmse[0], tempo, spectral_contrast, spectral_rolloff, mfccs, chroma, spectral_centroid, spectral_bandwidth]
            data.append(features)
df = pd.DataFrame(data, columns=[
    "genre",
    "filename",
    "mfccs",
    "spectral_centroid",
    "spectral_bandwidth",
    "spectral_rolloff",
    "spectral_contrast",
    "spectral_flatness",
    "harmonicity",
    "percussiveness",
    "spectral_flux",
    "spectral_slope",
    "zcr",
    "energy",
    "tempo",
    "pitch",
    "loudness",
    "duration",
    "complexity",
    "timbre",
    "spectral_energy",
    "temporal_energy",
    "energy_flux",
    "energy_distribution",
    "energy_entropy",
])

#df = pd.DataFrame(data, columns=['Genre', 'Filename', 'Beats', 'SR', 'Central Moments', 'Zero Crossing Rate', 'RMSE', 'Tempo', 'Spectral Contrast', 'Spectral Roll-off', 'MFCC', 'Chroma', 'Spectral Centroid', 'Spectral Bandwidth'])


#### Spectral features extracted from the audio files:

- Mel-frequency cepstral coefficients (MFCCs): A compact description of the spectral envelope, widely used in speech recognition and music classification. MFCCs are derived from the Mel-scale spectrogram, whose frequency axis follows perceived pitch and is therefore more perceptually relevant than a linear frequency axis.
- Spectral centroid: The magnitude-weighted mean frequency of the spectrum (its "center of mass"). Higher values correspond to brighter sounds, which helps distinguish instruments and timbres.
- Spectral bandwidth: The magnitude-weighted spread of the spectrum around its centroid. It separates narrow-band sounds, such as sustained tones, from broadband ones, such as noise or consonants.
- Spectral rolloff: The frequency below which a fixed percentage of the spectral energy lies (85% by default in librosa). It indicates how much high-frequency content a sound has.
- Spectral contrast: The difference between spectral peaks and valleys within each frequency sub-band. High contrast indicates clear, narrow-band components; low contrast indicates noise-like content.
- Spectral flatness: The ratio of the geometric mean to the arithmetic mean of the power spectrum. Values near 1 indicate noise-like sound; values near 0 indicate tonal sound.
- Harmonicity: The ratio of harmonic energy to total energy in the audio signal. In the extraction cell, the `harmonicity` column stores the output of `librosa.feature.chroma_cens`, a chroma representation rather than an energy ratio.
- Percussiveness: How much of the signal's energy comes from transient, broadband events such as drum hits, as opposed to sustained tones. In the extraction cell, the `percussiveness` column stores the output of `librosa.feature.chroma_cqt` (constant-Q chroma).
- Spectral flux: The frame-to-frame change in the spectrum. Peaks in spectral flux mark events such as the start of a word or a note onset. In the extraction cell, the `spectral_flux` column stores the output of `librosa.feature.chroma_stft`.
- Spectral slope: The slope of a straight line fitted to the spectrum as a function of frequency, which describes how quickly energy falls off toward high frequencies. In the extraction cell, the `spectral_slope` column stores the output of `librosa.feature.chroma_vqt`.
- Spectral kurtosis: A measure of the peakedness of the spectral energy distribution. It is described here for reference; the extraction cell does not compute it.
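
The centroid, bandwidth, and rolloff definitions above can be sketched for a single frame with NumPy alone. This is an illustrative approximation of the ideas, not librosa's implementation; the function name `spectral_descriptors` and the two-tone test signal are assumptions made for the example.

```python
import numpy as np

def spectral_descriptors(y, sr, roll_percent=0.85):
    """Centroid, bandwidth, and rolloff (all in Hz) of a single frame."""
    magnitude = np.abs(np.fft.rfft(y))
    freqs = np.fft.rfftfreq(len(y), d=1.0 / sr)
    weights = magnitude / magnitude.sum()

    # Magnitude-weighted mean frequency
    centroid = np.sum(freqs * weights)
    # Magnitude-weighted standard deviation around the centroid
    bandwidth = np.sqrt(np.sum(weights * (freqs - centroid) ** 2))

    # First frequency bin at which the cumulative magnitude reaches the threshold
    cumulative = np.cumsum(magnitude)
    rolloff_bin = np.searchsorted(cumulative, roll_percent * cumulative[-1])
    return centroid, bandwidth, freqs[rolloff_bin]

# Synthetic signal: equal-strength tones at 440 Hz and 880 Hz
sr = 8000
t = np.arange(1000) / sr
y = np.sin(2 * np.pi * 440 * t) + np.sin(2 * np.pi * 880 * t)
centroid, bandwidth, rolloff = spectral_descriptors(y, sr)
```

For this signal the centroid falls midway between the two tones, the bandwidth equals their distance from that midpoint, and the rolloff lands on the upper tone.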

#### Temporal features extracted from the audio files:

- Zero crossing rate: The rate at which the audio signal changes sign. librosa reports it per frame as the fraction of samples at which a sign change occurs. Noisy and percussive sounds have a high rate; low-pitched tonal sounds have a low one.
- Energy: The sum of squared sample values, which reflects the overall level of the signal. In the extraction cell, the `energy` column stores `librosa.feature.delta` applied to the raw waveform, that is, a smoothed local derivative of the samples.
- Tempo: The speed of the music in beats per minute (BPM). Tempo helps separate, for example, slow classical pieces from dance music.
- Pitch: The perceived fundamental frequency of a sound. In the extraction cell, the `pitch` column stores the output of `librosa.feature.poly_features`: the coefficients of a polynomial fitted to each spectrogram frame.
- Loudness: The perceived intensity of the audio signal. In the extraction cell, the `loudness` column stores the output of `librosa.feature.tonnetz` (tonal centroid features).
- Duration: The length of the clip in seconds.
- Complexity: A measure of how many different frequencies are present in the sound. In the extraction cell, the `complexity` column stores the output of `librosa.feature.tempogram_ratio`.
- Rhythm: The pattern of the sound's loudness and pitch over time, which helps distinguish, for example, rock from jazz. No separate rhythm column is stored; the tempogram-based columns carry rhythmic information.
- Timbre: The quality of a sound that distinguishes it from other sounds of the same pitch and loudness, such as a violin from a piano. In the extraction cell, the `timbre` column stores the output of `librosa.feature.tempogram`.
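
To make the zero crossing rate concrete, here is a minimal plain-Python sketch that computes it as the fraction of adjacent sample pairs whose signs differ. The helper names and the synthetic tones are assumptions for illustration; a pure tone at `f` Hz should cross zero about `2 * f` times per second.

```python
import math

def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        (a >= 0) != (b >= 0) for a, b in zip(samples, samples[1:])
    )
    return crossings / (len(samples) - 1)

sr = 8000

def tone(freq):
    # One second of a sine wave; the phase offset keeps samples off exact zero
    return [math.sin(2 * math.pi * freq * n / sr + 0.1) for n in range(sr)]

low = zero_crossing_rate(tone(100))    # low-pitched tone
high = zero_crossing_rate(tone(1000))  # high-pitched tone
```

The higher tone yields roughly ten times the rate of the lower one, which is why the feature separates bright or noisy material from low, tonal material.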

#### Energy features extracted from the audio files:

- Root mean square energy (rmse): The square root of the mean of the squared signal values. The rmse is a measure of the overall level of the signal.
- Peak signal-to-noise ratio (PSNR): The ratio between the peak signal power and the noise power, usually expressed in decibels. The PSNR is a measure of signal quality.
- Spectral energy: The squared magnitude of the short-time Fourier transform, that is, the energy at each frequency in each frame.
- Temporal energy: The sum of squared sample values in each time window (1024 samples with a hop of 512 in the extraction cell).
- Energy flux: The rate of change of the signal's energy over time. In the extraction cell, the `energy_flux` column holds half the summed squared difference between consecutive samples in each window.
- Energy distribution: The distribution of the signal's energy across frequency and time. In the extraction cell, the `energy_distribution` column stores the same power spectrogram as `spectral_energy`.
- Energy entropy: The Shannon entropy, in bits, of the normalized energy distribution across frequency bins. Low entropy means the energy is concentrated in a few frequencies; high entropy means it is spread evenly.
- Energy autocorrelation: A measure of the similarity of the signal's energy at different time points.
- Energy kurtosis: A measure of the peakedness of the energy distribution.
- Energy skewness: A measure of the asymmetry of the energy distribution.

Root mean square energy, PSNR, energy autocorrelation, energy kurtosis, and energy skewness are described for reference; the extraction cell does not compute them.
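
The temporal energy and energy entropy described above can also be computed without an explicit Python loop. The sketch below assumes NumPy 1.20 or newer (for `sliding_window_view`); the function names and the constant test signal are illustrative assumptions.

```python
import numpy as np

def frame_energy(y, window_size=1024, hop_size=512):
    """Sum of squared samples in each window, computed without a Python loop."""
    windows = np.lib.stride_tricks.sliding_window_view(y, window_size)[::hop_size]
    return np.sum(windows ** 2, axis=1)

def shannon_entropy(energy):
    """Entropy in bits of an energy vector normalized to sum to 1."""
    p = energy / energy.sum()
    return float(-np.sum(p * np.log2(p + 1e-12)))  # small offset avoids log(0)

# Constant signal: every window has the same energy
y = np.ones(4096)
energy = frame_energy(y)
rms = np.sqrt(energy / 1024)       # root mean square level per window
entropy = shannon_entropy(energy)  # maximal when energy is spread evenly
```

With equal energy in every window, the entropy reaches its maximum of `log2(number of windows)`; a signal with one loud burst would score much lower.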

In [None]:
df.to_csv('classical_punk.csv')

In [None]:
df = pd.read_csv('classical_punk.csv')
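
Array-valued cells are written to CSV as text, and large NumPy arrays are abbreviated with `...` in their text form, so they do not survive the round trip through `to_csv` and `read_csv`. One way around this is to reduce each feature matrix to scalar summaries before saving. The sketch below assumes a hypothetical helper `summarize` and uses a made-up 2-band matrix and placeholder filename in place of real librosa output.

```python
import numpy as np

def summarize(name, values):
    """Collapse a (n_bands, n_frames) feature matrix into per-band mean and std."""
    values = np.atleast_2d(np.asarray(values, dtype=float))
    row = {}
    for band, series in enumerate(values):
        row[f"{name}_mean_{band}"] = float(series.mean())
        row[f"{name}_std_{band}"] = float(series.std())
    return row

# Stand-in for a feature matrix with 2 bands and 4 frames
fake_mfccs = np.array([[1.0, 2.0, 3.0, 4.0],
                       [10.0, 10.0, 10.0, 10.0]])

# One flat row per clip: only strings and floats, so it is safe to write to CSV
row = {"genre": "blues", "filename": "blues.00000.wav"}
row.update(summarize("mfcc", fake_mfccs))
```

A list of such rows can be passed directly to `pd.DataFrame`, giving one numeric column per summary statistic.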

<a id='exploratory-data-analysis'></a>
<font size="+2" color='#053c96'><b> 4. Exploratory Data Analysis</b></font>  
[back to top](#table-of-contents)

<a id='data-exploration'></a>
<font size="+1" color='#780404'><b> 4.1 Data Exploration</b></font>  
[back to top](#table-of-contents)

In [None]:
df.shape

In [None]:
df.info()

In [None]:
df.head()

In [None]:
df['genre'].unique()

The `play_audio` function loads and plays one audio file using the librosa library. It takes two arguments, `genre` and `num`: the genre of the clip and the zero-padded number of the file within that genre.
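
As a rough illustration of how a genre and file number map to a file on disk, the sketch below builds the path for one clip. It assumes the `../data/genres/<genre>/<genre>.<num>.wav` layout used elsewhere in this notebook; the helper name `clip_path` is hypothetical and is not the `play_audio` implementation.

```python
from pathlib import PurePosixPath

GENRES = ("blues", "classical", "country", "disco", "hiphop",
          "jazz", "metal", "pop", "reggae", "rock")

def clip_path(genre, num, root="../data/genres"):
    """Build the path of one clip, zero-padding the file number to five digits."""
    if genre not in GENRES:
        raise ValueError(f"unknown genre: {genre!r}")
    return PurePosixPath(root) / genre / f"{genre}.{int(num):05d}.wav"

path = clip_path("blues", "00024")
short = clip_path("rock", 7)  # integers are padded as well
```

Validating the genre up front gives a clear error message instead of a missing-file error further down.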

In [None]:
play_audio('blues', '00024')

In [None]:
play_audio('classical', '00024')

In [None]:
play_audio('country', '00024')

In [None]:
play_audio('disco', '00024')

In [None]:
play_audio('hiphop', '00024')

In [None]:
play_audio('jazz', '00024')

In [None]:
play_audio('metal', '00024')

In [None]:
play_audio('pop', '00024')

In [None]:
play_audio('reggae', '00024')

In [None]:
play_audio('rock', '00024')

<a id='data-visualization'></a>
<font size="+1" color='#780404'><b> 4.2 Data Visualization</b></font>  
[back to top](#table-of-contents)

This code plots the 100 most frequent values in the `tempo` column as a horizontal bar chart.

In [None]:
# Count how often each tempo value occurs
tempo_counts = df['tempo'].value_counts().reset_index()
tempo_counts.columns = ['Tempo', 'Frequency']
# Most frequent first; ties broken by ascending tempo
tempo_counts = tempo_counts.sort_values(['Frequency', 'Tempo'], ascending=[False, True])
tempo_counts = tempo_counts[:100]
sns.set(style="whitegrid")
plt.figure(figsize=(10, 8))
# orient='h' keeps the bars horizontal even when both columns are numeric
ax = sns.barplot(x='Frequency', y='Tempo', data=tempo_counts, orient='h')
ax.set(xlabel='Frequency', ylabel='Tempo', title='Tempo Value Counts')
plt.show()


In [None]:
def show_waveform(x, num):
    # Load WAV file (same directory layout as the feature-extraction cell)
    wav_file = f'../data/genres/{x}/{x}.{num}.wav'
    y, sr = librosa.load(wav_file)

    # Time of each sample in seconds
    time = np.arange(len(y)) / sr

    sns.set(style="whitegrid")
    plt.figure(figsize=(10, 8))
    sns.lineplot(x=time, y=y)
    plt.xlabel('Time (s)')
    plt.ylabel('Amplitude')
    plt.title(f'Sample waveform for {x}')
    plt.show()


This function loads the WAV file for a given genre and file number and plots its waveform using librosa, seaborn, and matplotlib. It takes two arguments, `x` and `num`: the genre of the clip and the number of the file within that genre.
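
A waveform and a frame-based feature need different time axes: a waveform has one value per sample, while features such as the zero crossing rate have one value per analysis frame. The sketch below contrasts the two, assuming librosa's default sample rate of 22050 Hz and hop length of 512 samples; the function names are illustrative.

```python
def sample_times(n_samples, sr):
    """Time in seconds of each raw audio sample."""
    return [n / sr for n in range(n_samples)]

def frame_times(n_frames, sr, hop_length=512):
    """Time in seconds at which each analysis frame starts."""
    return [i * hop_length / sr for i in range(n_frames)]

sr = 22050
waveform_axis = sample_times(4, sr)  # consecutive samples are 1/sr apart
feature_axis = frame_times(4, sr)    # consecutive frames are hop_length/sr apart
```

Using the frame spacing for a raw waveform would stretch the time axis by a factor equal to the hop length.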

In [None]:
show_waveform('blues', '00090')

In [None]:
show_waveform('classical', '00090')

In [None]:
show_waveform('country', '00090')

In [None]:
show_waveform('disco', '00090')

In [None]:
show_waveform('hiphop', '00090')

In [None]:
show_waveform('jazz', '00090')

In [None]:
show_waveform('metal', '00090')

In [None]:
show_waveform('pop', '00090')

In [None]:
show_waveform('reggae', '00090')

In [None]:
show_waveform('rock', '00090')

In [None]:
def show_spectogram(x, num):
    # Load audio file (same directory layout as the feature-extraction cell)
    audio_path = f'../data/genres/{x}/{x}.{num}.wav'
    y, sr = librosa.load(audio_path)

    # Calculate Mel spectrogram (power)
    S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128, fmax=8000)

    # Convert to decibels relative to the maximum
    S_dB = librosa.power_to_db(S, ref=np.max)

    # Create figure
    plt.figure(figsize=(10, 4))
    ax = sns.heatmap(S_dB, cmap='viridis')

    # Put low frequencies at the bottom (heatmap draws row 0 at the top)
    ax.invert_yaxis()

    # Axes are frame and Mel-band indices, not seconds and Hz
    ax.set_xlabel('Frame')
    ax.set_ylabel('Mel band')

    # Set figure title
    ax.set_title(f'Sample spectrogram for {x}')

    # Show figure
    plt.show()

This function loads the WAV file for a given genre and file number and plots its Mel spectrogram in decibels using librosa, seaborn, and matplotlib. It takes two arguments, `x` and `num`: the genre of the clip and the number of the file within that genre.
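
The conversion from power to decibels used for the spectrogram can be sketched in a few lines. This mirrors the idea behind `librosa.power_to_db` (a logarithmic scale relative to a reference, with a limited dynamic range) but is not its exact implementation; the default values for `amin` and `top_db` are assumptions chosen for the example.

```python
import math

def power_to_db(power, ref, amin=1e-10, top_db=80.0):
    """Convert power values to decibels relative to `ref`."""
    # amin guards against taking the logarithm of zero
    db = [10.0 * math.log10(max(p, amin) / max(ref, amin)) for p in power]
    floor = max(db) - top_db  # limit the dynamic range below the peak
    return [max(value, floor) for value in db]

power = [1.0, 0.1, 0.01, 1e-12]
db = power_to_db(power, ref=max(power))
```

Each tenfold drop in power lowers the value by 10 dB, and anything more than `top_db` below the peak is clipped so that near-silent bins do not dominate the color scale.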

In [None]:
show_spectogram('blues', '00090')

In [None]:
show_spectogram('classical', '00090')

In [None]:
show_spectogram('country', '00090')

In [None]:
show_spectogram('disco', '00090')

In [None]:
show_spectogram('hiphop', '00090')

In [None]:
show_spectogram('jazz', '00090')

In [None]:
show_spectogram('metal', '00090')

In [None]:
show_spectogram('pop', '00090')

In [None]:
show_spectogram('reggae', '00090')

In [None]:
show_spectogram('rock', '00090')

In [None]:
import seaborn as sns

def show_sr(x, num):
    # Load audio file
    audio_path = f'../data/genres/{x}/{x}.{num}.wav'
    y, sr = librosa.load(audio_path)

    # Compute spectral rolloff
    spectral_rolloff = librosa.feature.spectral_rolloff(y=y, sr=sr)[0]

    # Create plot
    sns.set(style="whitegrid")
    fig, ax = plt.subplots(figsize=(10, 6))
    ax.plot(spectral_rolloff, color='blue')
    ax.set(title=f'Sample spectral rolloff for {x}', xlabel='Frame', ylabel='Frequency (Hz)')
    plt.show()


The `show_sr(x, num)` function loads an audio file and computes its spectral rolloff. It then draws a matplotlib line plot of the rolloff values, with the frame index on the x-axis and frequency in Hz on the y-axis. The plot title is "Sample spectral rolloff for x", where x is the genre.

In [None]:
show_sr('blues', '00090')

In [None]:
show_sr('classical', '00090')

In [None]:
show_sr('country', '00090')

In [None]:
show_sr('disco', '00090')

In [None]:
show_sr('hiphop', '00090')

In [None]:
show_sr('jazz', '00090')

In [None]:
show_sr('metal', '00090')

In [None]:
show_sr('pop', '00090')

In [None]:
show_sr('reggae', '00090')

In [None]:
show_sr('rock', '00090')

In [None]:
def show_chroma(x, num):
    # Load audio file
    audio_path = f'../data/genres/{x}/{x}.{num}.wav'
    y, sr = librosa.load(audio_path)

    # Compute chroma feature
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)

    # Create time axis in seconds
    time = librosa.frames_to_time(np.arange(chroma.shape[1]), sr=sr)

    # Create chroma note names
    chroma_note_names = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']

    # Create dataframe
    df = pd.DataFrame(chroma, index=chroma_note_names, columns=time)

    # Plot heatmap using seaborn
    plt.figure(figsize=(10, 8))
    sns.heatmap(df, cmap='viridis', xticklabels=50, yticklabels=1)
    plt.title(f'Sample chroma feature for {x}')
    plt.xlabel('Time (s)')
    plt.ylabel('Chroma Note')
    plt.show()

This function takes two arguments, `x` and `num`, representing the music genre and the song number. It loads the corresponding audio file and computes the chroma feature with librosa's `chroma_stft` function. It then builds a time axis in seconds with `frames_to_time` and a list of the twelve chroma note names, and places the chroma values in a DataFrame indexed by note name with one column per frame time. Finally, it draws the DataFrame as a seaborn heatmap, sets the title and axis labels, and displays the figure with `plt.show()`.
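
Each row of a chroma feature corresponds to one of the twelve pitch classes. As a minimal sketch of that mapping, the function below finds the pitch class nearest to a frequency, assuming equal temperament with A4 tuned to 440 Hz; the function name `pitch_class` is illustrative.

```python
import math

NOTE_NAMES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']

def pitch_class(freq_hz, a4=440.0):
    """Name of the equal-tempered pitch class nearest to a frequency."""
    # MIDI note number: 69 is A4, and each semitone adds 1
    midi = 69 + 12 * math.log2(freq_hz / a4)
    return NOTE_NAMES[round(midi) % 12]

# Middle C, A4, A5 (one octave above A4), and E4
notes = [pitch_class(f) for f in (261.63, 440.0, 880.0, 329.63)]
```

Frequencies one octave apart map to the same pitch class, which is what makes chroma features insensitive to the octave in which a melody is played.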

In [None]:
show_chroma('blues', '00090')

In [None]:
show_chroma('classical', '00090')

In [None]:
show_chroma('country', '00090')

In [None]:
show_chroma('disco', '00090')

In [None]:
show_chroma('hiphop', '00090')

In [None]:
show_chroma('jazz', '00090')

In [None]:
show_chroma('metal', '00090')

In [None]:
show_chroma('pop', '00090')

In [None]:
show_chroma('reggae', '00090')

In [None]:
show_chroma('rock', '00090')

In [None]:
def show_zcr(x, num):
    # Load audio file (same directory layout as the feature-extraction cell)
    audio_path = f'../data/genres/{x}/{x}.{num}.wav'
    y, sr = librosa.load(audio_path)

    # Compute zero crossing rate (one value per frame)
    zcr = librosa.feature.zero_crossing_rate(y)

    # Time in seconds at the start of each frame
    times = librosa.times_like(zcr, sr=sr)

    # Plot with Seaborn
    fig, ax = plt.subplots(figsize=(10,6))
    sns.lineplot(x=times, y=zcr[0], ax=ax)
    ax.set(title=f'Zero Crossing Rate for {x} Genre', xlabel='Time (s)', ylabel='ZCR')
    plt.show()


The `show_zcr` function takes two arguments: `x`, the genre of the music file, and `num`, the number of the music file. It calculates the zero crossing rate of the audio file and plots it using seaborn.

In [None]:
show_zcr('blues', '00090')

In [None]:
show_zcr('classical', '00090')

In [None]:
show_zcr('country', '00090')

In [None]:
show_zcr('disco', '00090')

In [None]:
show_zcr('hiphop', '00090')

In [None]:
show_zcr('jazz', '00090')

In [None]:
show_zcr('metal', '00090')

In [None]:
show_zcr('pop', '00090')

In [None]:
show_zcr('reggae', '00090')

In [None]:
show_zcr('rock', '00090')

<a id='summary-statistics'></a>
<font size="+1" color='#780404'><b> 4.3 Summary Statistics</b></font>  
[back to top](#table-of-contents)

In [None]:
df.describe(include='all')

In [None]:
def skew_kurt(data, col):
    # Calculate skewness and kurtosis of the selected column
    _skewness = skew(data[col])
    _kurtosis = kurtosis(data[col])

    # Plot a histogram of the column with its mean, median, and mode marked
    sns.histplot(data=data, x=col, kde=True)
    plt.axvline(data[col].mean(), color='r', linestyle='--', label='Mean')
    plt.axvline(data[col].median(), color='g', linestyle='--', label='Median')
    plt.axvline(data[col].mode()[0], color='b', linestyle='--', label='Mode')
    plt.legend()

    # Add text annotation for skewness and kurtosis values
    plt.annotate('Skewness: {:.2f}'.format(_skewness), xy=(0.5, 0.9), xycoords='axes fraction')
    plt.annotate('Kurtosis: {:.2f}'.format(_kurtosis), xy=(0.5, 0.85), xycoords='axes fraction')

    plt.show()
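
The skewness and kurtosis annotated on these plots can be written out from their definitions. The sketch below uses the biased (population) central moments and reports excess kurtosis, which is our understanding of the default behavior of `scipy.stats.skew` and `scipy.stats.kurtosis`; whether `skew` and `kurtosis` in this notebook come from SciPy is an assumption, and the sample values are made up.

```python
def central_moment(values, order):
    """Mean of the deviations from the mean raised to the given power."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)

def skewness(values):
    """Biased sample skewness: m3 / m2^1.5 (0 for symmetric data)."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Biased excess kurtosis: m4 / m2^2 - 3 (0 for a normal distribution)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0

symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]
right_tailed = [1.0, 1.0, 1.0, 2.0, 10.0]
```

A positive skewness means a long tail toward high values, so for tempo it would indicate a few unusually fast tracks in a genre.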

In [None]:
skew_kurt(df[df['genre'] == 'blues'], 'tempo')

In [None]:
skew_kurt(df[df['genre'] == 'classical'], 'tempo')

In [None]:
skew_kurt(df[df['genre'] == 'country'], 'tempo')

In [None]:
skew_kurt(df[df['genre'] == 'disco'], 'tempo')

In [None]:
skew_kurt(df[df['genre'] == 'hiphop'], 'tempo')

In [None]:
skew_kurt(df[df['genre'] == 'jazz'], 'tempo')

In [None]:
skew_kurt(df[df['genre'] == 'metal'], 'tempo')

In [None]:
skew_kurt(df[df['genre'] == 'pop'], 'tempo')

In [None]:
skew_kurt(df[df['genre'] == 'reggae'], 'tempo')

In [None]:
skew_kurt(df[df['genre'] == 'rock'], 'tempo')

<a id='feature-correlation'></a>
<font size="+1" color='#780404'><b> 4.4 Feature Correlation</b></font>  
[back to top](#table-of-contents)

In [None]:
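# Compute the correlation matrix over numeric columns only
# (text columns such as genre and filename cannot be correlated)
df_corr = df.corr(numeric_only=True)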

In [None]:
# Plot the correlation matrix computed in the previous cell

# Set font scale, then create the figure
sns.set(font_scale=1.9)
fig, ax = plt.subplots(figsize=(50, 50))

# Plot heatmap with a diverging color map centered at zero
sns.heatmap(df_corr, cmap='coolwarm', annot=True, center=0, square=True, ax=ax)

# Adjust font size of features
ax.set_xticklabels(ax.get_xticklabels(), fontsize=35)
ax.set_yticklabels(ax.get_yticklabels(), fontsize=35)

# Add title and axis labels
plt.title('Correlation Matrix', fontsize=30)
plt.xlabel('Features', fontsize=20)
plt.ylabel('Features', fontsize=20)

# Show plot
plt.show()
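
Each cell of the correlation matrix is a Pearson correlation coefficient between two feature columns. A minimal plain-Python sketch of that coefficient follows; the function name and the three small columns are made up for illustration and are not taken from the dataset.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)

# Made-up columns: one rises with tempo, the other falls
tempo = [90.0, 100.0, 110.0, 120.0]
centroid = [1500.0, 1700.0, 1900.0, 2100.0]
flatness = [0.4, 0.3, 0.2, 0.1]

rising = pearson(tempo, centroid)
falling = pearson(tempo, flatness)
```

Values near +1 or -1 flag pairs of features that carry nearly the same information, which is useful when deciding which columns to keep.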

<a id='data-preparation'></a>
<font size="+2" color='#053c96'><b> 5. Data Preparation</b></font>  
[back to top](#table-of-contents)

<a id='data-cleaning'></a>
<font size="+1" color='#780404'><b> 5.1 Data Cleaning</b></font>  
[back to top](#table-of-contents)

<a id='feature-engineering'></a>
<font size="+1" color='#780404'><b> 5.2 Feature Engineering</b></font>  
[back to top](#table-of-contents)

In [None]:
# Array-valued feature columns to summarize
array_cols = ['Beats', 'Central Moments', 'Zero Crossing Rate', 'RMSE',
              'Spectral Contrast', 'Spectral Roll-off', 'MFCC',
              'Chroma', 'Spectral Centroid', 'Spectral Bandwidth']

# Summary statistics computed per row for each array-valued column
summary_stats = {'Mean': np.mean, 'Var': np.var, 'Std': np.std}

# Create '<feature> <stat>' columns: all means, then variances, then standard deviations
for stat_name, stat_func in summary_stats.items():
    for col in array_cols:
        df[f'{col} {stat_name}'] = df[col].apply(stat_func)

<a id='data-transformation'></a>
<font size="+1" color='#780404'><b> 5.3 Data Transformation</b></font>  
[back to top](#table-of-contents)

In [None]:
genre_col = df['Genre']
le = LabelEncoder()
y = le.fit_transform(genre_col)
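
`LabelEncoder` replaces each genre name with an integer code, so predictions come back as integers too. The sketch below shows how the codes relate to the original names and how to decode them again. The genre list is a hypothetical stand-in for `df['Genre']`, and `le_demo` is used so the example does not overwrite the notebook's `le`.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical genre labels standing in for df['Genre']
genres = ['rock', 'jazz', 'classical', 'jazz', 'rock']

le_demo = LabelEncoder()
encoded = le_demo.fit_transform(genres)

# classes_ holds the sorted unique labels; the index of a label is its code
print(list(le_demo.classes_))  # ['classical', 'jazz', 'rock']
print(list(encoded))           # [2, 1, 0, 1, 2]

# Map integer codes (for example, model predictions) back to genre names
decoded = le_demo.inverse_transform(encoded)
print(list(decoded))
```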

In [None]:
# create target dataset
#y = df['Genre']

In [None]:
x = df.drop(['Genre', 'Filename', 'Unnamed: 0', 'Beats', 'SR',
             'Central Moments', 'Zero Crossing Rate', 'RMSE',
             'Spectral Contrast', 'Spectral Roll-off', 'MFCC',
             'Chroma', 'Spectral Centroid', 'Spectral Bandwidth'], axis=1)

In [None]:
x.shape

In [None]:
# Standardize the features to zero mean and unit variance
scaler = StandardScaler()
X = scaler.fit_transform(x)

In [None]:
# Hold out 30% of the samples as a test set (70/30 split)
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size= 0.3,
                                                    random_state= 42)
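
Because the scaler above is fitted on the full dataset before the split, statistics from the test rows influence the scaling of the training rows. A minimal sketch of the alternative order is shown below: split first, then fit the scaler on the training rows only. It also passes `stratify` so that class proportions stay similar in both subsets. The data is synthetic (`make_classification`), and the `*_demo` names are assumptions for illustration, not part of the notebook.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the feature matrix and encoded genres (not the project data)
X_demo, y_demo = make_classification(n_samples=300, n_features=10, n_informative=6,
                                     n_classes=3, random_state=42)

# Split first; stratify keeps the class proportions similar in both subsets
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.3,
                                          stratify=y_demo, random_state=42)

# Fit the scaler on the training rows only, then reuse it for the test rows
demo_scaler = StandardScaler().fit(X_tr)
X_tr_scaled = demo_scaler.transform(X_tr)
X_te_scaled = demo_scaler.transform(X_te)

print(X_tr_scaled.shape, X_te_scaled.shape)
```

Placing the scaler inside a `Pipeline`, as the model definitions in section 6.1 do, achieves the same effect automatically, since the pipeline fits the scaler only on the data passed to `fit`.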

<a id='modeling'></a>

<font size="+2" color='#053c96'><b> 6. Modeling</b></font>  
[back to top](#table-of-contents)

<a id='model-selection'></a>

<font size="+1" color='#780404'><b> 6.1 Model Selection</b></font>  
[back to top](#table-of-contents)

In [None]:
# Multiclass Support Vector Machines (SVM)
svm_pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('clf', OneVsRestClassifier(SVC(kernel='rbf', C=1, gamma='scale')))
])

# K-Means Clustering
kmeans_pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('kmeans', KMeans(n_clusters=10))
])

# K-Nearest Neighbors (KNN)
knn_pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('knn', KNeighborsClassifier(n_neighbors=5))
])

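# Convolutional Neural Network (CNN)
# Note: this network expects image-like input of shape (28, 28, 1), which does
# not match the tabular feature matrix X, and it is not fitted in section 6.2.
cnn_pipeline = Pipeline([
    ('clf', Sequential([
        Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
        MaxPooling2D((2, 2)),
        Flatten(),
        Dense(10, activation='softmax')
    ]))
])

# Compile the CNN model
# (categorical_crossentropy expects one-hot targets; integer labels such as y
# would need sparse_categorical_crossentropy instead)
cnn_pipeline.named_steps['clf'].compile(optimizer='adam',
                                        loss='categorical_crossentropy',
                                        metrics=['accuracy'])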


<a id='model-training'></a>

<font size="+1" color='#780404'><b> 6.2 Model Training</b></font>  
[back to top](#table-of-contents)

In [None]:
# Fit SVM pipeline
svm_pipeline.fit(X_train, y_train)

# Fit K-Means pipeline
kmeans_pipeline.fit(X_train)

# Fit KNN pipeline
knn_pipeline.fit(X_train, y_train)
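
A single train/test split gives only one estimate of performance. As a complement, stratified k-fold cross-validation fits the pipeline several times on different folds and reports the spread of scores. The sketch below uses synthetic data and a KNN pipeline with the same structure as `knn_pipeline`; the names `demo_knn`, `cv`, and `cv_scores` are assumptions introduced for this example.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the feature matrix and encoded genres (not the project data)
X_demo, y_demo = make_classification(n_samples=300, n_features=10, n_informative=6,
                                     n_classes=3, random_state=42)

# Same structure as the KNN pipeline: scaling followed by the classifier
demo_knn = Pipeline([
    ('scaler', StandardScaler()),
    ('knn', KNeighborsClassifier(n_neighbors=5)),
])

# Five stratified folds; the scaler is refitted inside each training fold
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
cv_scores = cross_val_score(demo_knn, X_demo, y_demo, cv=cv, scoring='f1_weighted')

print(f'Weighted F1: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}')
```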


<a id='model-evaluation'></a>

<font size="+1" color='#780404'><b> 6.3 Model Evaluation</b></font>  
[back to top](#table-of-contents)

In [None]:
# Initialize the lists for performance metrics and confusion matrices
f1_list = []
pr_list = []
rc_list = []
acc_list = []
cm_list = []

# Evaluate additional models
# evaluate the SVM pipeline

y_pred_svm = svm_pipeline.predict(X_test)
f1_list.append(f1_score(y_test, y_pred_svm, average='weighted'))
pr_list.append(precision_score(y_test, y_pred_svm, average='weighted'))
rc_list.append(recall_score(y_test, y_pred_svm, average='weighted'))
acc_list.append(accuracy_score(y_test, y_pred_svm))
cm_list.append(confusion_matrix(y_test, y_pred_svm))

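# evaluate the K-Means pipeline
# (K-Means is unsupervised: its cluster IDs are arbitrary and are compared
# here directly with the encoded genre labels, without any mapping)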
y_pred_kmeans = kmeans_pipeline.predict(X_test)
f1_list.append(f1_score(y_test, y_pred_kmeans, average='weighted'))
pr_list.append(precision_score(y_test, y_pred_kmeans, average='weighted'))
rc_list.append(recall_score(y_test, y_pred_kmeans, average='weighted'))
acc_list.append(accuracy_score(y_test, y_pred_kmeans))
cm_list.append(confusion_matrix(y_test, y_pred_kmeans))

# evaluate the KNN pipeline
y_pred_knn = knn_pipeline.predict(X_test)
f1_list.append(f1_score(y_test, y_pred_knn, average='weighted'))
pr_list.append(precision_score(y_test, y_pred_knn, average='weighted'))
rc_list.append(recall_score(y_test, y_pred_knn, average='weighted'))
acc_list.append(accuracy_score(y_test, y_pred_knn))
cm_list.append(confusion_matrix(y_test, y_pred_knn))

# The CNN is not evaluated here: it is never fitted in section 6.2 and its
# (28, 28, 1) input shape does not match X_test. Scoring it would also add a
# fourth entry to each metric list, which would not line up with model_list.
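
K-Means returns cluster indices, not genre codes, so cluster 3 has no inherent relation to label 3. One common way to score a clustering against known labels is to map each cluster to the label that occurs most often among its training members. The sketch below illustrates that majority-vote mapping on synthetic, well-separated blobs. `map_clusters_to_labels` is a hypothetical helper written for this example, not part of scikit-learn or the notebook.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import accuracy_score


def map_clusters_to_labels(cluster_ids, true_labels):
    """Map each cluster ID to the most frequent true label among its members."""
    mapping = {}
    for cluster in np.unique(cluster_ids):
        members = true_labels[cluster_ids == cluster]
        values, counts = np.unique(members, return_counts=True)
        mapping[cluster] = values[np.argmax(counts)]
    return mapping


# Synthetic blobs standing in for the genre features (not the project data)
X_demo, y_demo = make_blobs(n_samples=300, centers=3, cluster_std=0.5, random_state=42)
X_tr, X_te = X_demo[:200], X_demo[200:]
y_tr, y_te = y_demo[:200], y_demo[200:]

demo_kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X_tr)

# Learn the cluster -> label mapping on the training data only
mapping = map_clusters_to_labels(demo_kmeans.labels_, y_tr)

# Translate test-set cluster IDs into label space before scoring
y_pred_mapped = np.array([mapping[c] for c in demo_kmeans.predict(X_te)])
mapped_accuracy = accuracy_score(y_te, y_pred_mapped)
print(f'Accuracy after mapping: {mapped_accuracy:.3f}')
```

With a mapping like this, the K-Means scores become comparable with those of the supervised models, although several clusters may still map to the same genre.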

In [None]:
model_list = ['SVM', 'Kmeans', 'KNN']

<a id='results'></a>
<font size="+2" color='#053c96'><b> 7. Results</b></font>  
[back to top](#table-of-contents)

In [None]:
result = pd.DataFrame({'Model': model_list, 'F1': f1_list, 'Precision': pr_list, 'Recall': rc_list, 'Accuracy': acc_list})

In [None]:
result

<a id='analysis-results'></a>

<font size="+1" color='#780404'><b> 7.1 Analysis of Results</b></font>  
[back to top](#table-of-contents)  

The results suggest that the SVM and KNN models perform better than the Kmeans model, based on F1-score, precision, recall, and accuracy.  

Overall, the SVM and KNN models appear to be more suitable for this task than the Kmeans model. However, the performance of the models may depend on the specific characteristics of the dataset, and further analysis may be necessary to confirm the generalizability of the models.

<a id='model-performance'></a>

<font size="+1" color='#780404'><b> 7.2 Model Performance</b></font>  
[back to top](#table-of-contents)  

From the results above, the SVM and KNN models perform better than the Kmeans model in terms of F1-score, precision, and recall. The SVM model has the highest F1-score and recall, while the KNN model has the highest precision.

The Kmeans model has the lowest F1-score, precision, and recall among the three models, indicating that it rarely assigns tracks to the correct genre. This is expected to some degree: Kmeans is an unsupervised clustering algorithm whose cluster indices are arbitrary, so comparing them directly with the encoded genre labels penalizes it even when the clusters themselves are coherent. Although the accuracy of the Kmeans model is the same as that of the SVM model, accuracy alone is not a reliable metric for imbalanced datasets, where the majority class dominates the predictions.

Therefore, the SVM and KNN models appear to be more suitable for this task than the Kmeans model.
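
To make the point about accuracy concrete, the small worked example below scores a degenerate model that always predicts the majority class. The labels are hypothetical and chosen only to illustrate the effect; they are not taken from the project data.

```python
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical imbalanced labels: nine tracks of class 0, one track of class 1
y_true = [0] * 9 + [1]
# A degenerate model that always predicts the majority class
y_pred = [0] * 10

acc = accuracy_score(y_true, y_pred)
# zero_division=0 scores the never-predicted class as 0 without a warning
macro_f1 = f1_score(y_true, y_pred, average='macro', zero_division=0)
weighted_f1 = f1_score(y_true, y_pred, average='weighted', zero_division=0)

print(f'Accuracy:    {acc:.3f}')          # 0.900
print(f'Macro F1:    {macro_f1:.3f}')     # 0.474
print(f'Weighted F1: {weighted_f1:.3f}')  # 0.853
```

Accuracy and weighted F1 both look strong because they are dominated by the majority class, while macro F1 exposes that the minority class is never predicted. Reporting a macro-averaged score alongside the weighted ones used in section 6.3 may therefore be worthwhile.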

<a id='implications'></a>

<font size="+1" color='#780404'><b> 7.3 Implications</b></font>  
[back to top](#table-of-contents)  

The results have several implications:

1. Model Selection: The SVM and KNN models appear more suitable for this task than the Kmeans model, so either can be selected for further analysis or implementation.

2. Hyperparameter Tuning: Model performance depends on the chosen hyperparameters, such as `C` and `gamma` for the SVM or `n_neighbors` for KNN, so tuning them may improve the results.

3. Dataset Characteristics: Model performance may depend on the specific characteristics of the dataset, such as the distribution of the classes or the presence of outliers. Further analysis is needed to confirm that the models generalize to other datasets.

4. Performance Metrics: The choice of metric affects how the models are judged. For example, accuracy may not be reliable for imbalanced datasets, so multiple metrics should be used to evaluate the models comprehensively.

Overall, the results provide insight into the performance of the models and can guide their further analysis or implementation.
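
The hyperparameter tuning point can be sketched as a cross-validated grid search. The example below is a minimal illustration on synthetic data, not a tuned configuration for this project: the grid values are assumptions, and the pipeline uses a plain `SVC` for brevity. With the `OneVsRestClassifier` wrapper from section 6.1, the parameter names would become `clf__estimator__C` and `clf__estimator__gamma`.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the feature matrix and encoded genres (not the project data)
X_demo, y_demo = make_classification(n_samples=200, n_features=10, n_informative=6,
                                     n_classes=3, random_state=42)

svm_demo = Pipeline([
    ('scaler', StandardScaler()),
    ('clf', SVC(kernel='rbf')),
])

# Illustrative grid; the values are assumptions, not tuned for the project data
param_grid = {
    'clf__C': [0.1, 1, 10],
    'clf__gamma': ['scale', 0.01, 0.1],
}

# Each parameter combination is scored with 5-fold cross-validation
search = GridSearchCV(svm_demo, param_grid, cv=5, scoring='f1_weighted')
search.fit(X_demo, y_demo)

print(search.best_params_, round(search.best_score_, 3))
```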

<a id='conclusion'></a>

<font size="+2" color='#053c96'><b> 8. Conclusion</b></font>  
[back to top](#table-of-contents)

<a id='summary'></a>

<font size="+1" color='#780404'><b> 8.1 Summary</b></font>  
[back to top](#table-of-contents)  

In conclusion, the machine learning models developed for music genre classification in the Classically Punk project suggest that the SVM and KNN models perform better than the Kmeans model. Both have higher F1-scores, precision, and recall than the Kmeans model, indicating better performance in predicting the different music genres.

This implies that the developed application can automate music genre classification, which could make the process faster and more scalable than manual labeling. The application can extract features from audio files and train an SVM or KNN model to classify music into different genres without the need for human intervention.

However, further analysis is necessary to confirm that the models generalize to other datasets, and hyperparameter tuning may be required to optimize their performance. Overall, the application has the potential to streamline music genre classification for the music industry.

<a id='recommendations'></a>

<font size="+1" color='#780404'><b> 8.2 Recommendations</b></font>  
[back to top](#table-of-contents)  

Based on the problem statement and the results obtained from the machine learning models, here are some recommendations that could help to further improve the music genre classification application:

1. Data Augmentation: To improve the performance of the machine learning models, it may be helpful to augment the training data by adding variations of the existing data. This can be done by applying audio effects such as pitch shifting, time stretching, or adding noise to create a more diverse training set.

2. Feature Engineering: Feature engineering involves selecting and extracting meaningful features from audio files to train the machine learning model. By experimenting with different audio features and selecting the most informative ones, the performance of the machine learning model can be improved.

3. Ensemble Learning: Ensemble learning involves combining the predictions of multiple machine learning models to improve the overall performance. By training multiple models using different algorithms or hyperparameters and combining their predictions, the overall accuracy of the application can be improved.

4. User Feedback: Collecting feedback from users of the application can help to improve the accuracy of the machine learning model over time. By allowing users to provide feedback on the accuracy of the classification results, the application can learn from its mistakes and continuously improve its performance.

5. Continuous Monitoring: It is important to continuously monitor the performance of the machine learning model to ensure that it is accurately classifying the different music genres. Regularly updating the model with new data and retraining it can help to improve its accuracy and keep up with changes in the music industry.
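
As an illustration of the ensemble learning recommendation, the sketch below combines an SVM and a KNN classifier with soft voting, which averages their predicted class probabilities. It is a minimal example on synthetic data under assumed settings, not a result from this project; `ensemble` and the other variable names are introduced here for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the feature matrix and encoded genres (not the project data)
X_demo, y_demo = make_classification(n_samples=400, n_features=10, n_informative=6,
                                     n_classes=3, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.3,
                                          stratify=y_demo, random_state=42)

# Soft voting averages class probabilities, so SVC needs probability=True
ensemble = Pipeline([
    ('scaler', StandardScaler()),
    ('vote', VotingClassifier(
        estimators=[
            ('svm', SVC(kernel='rbf', probability=True, random_state=42)),
            ('knn', KNeighborsClassifier(n_neighbors=5)),
        ],
        voting='soft',
    )),
])

ensemble.fit(X_tr, y_tr)
ensemble_accuracy = ensemble.score(X_te, y_te)
print(f'Ensemble accuracy: {ensemble_accuracy:.3f}')
```

Whether an ensemble actually improves on the individual models depends on how different their errors are, so it should be compared against each base model on the same held-out data.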

<a id='references'></a>

<font size="+2" color='#053c96'><b> 9. References</b></font>  
[back to top](#table-of-contents)  

1. Arsh Chowdhry. Music Genre Classification Using CNN. https://blog.clairvoyantsoft.com/music-genre-classification-using-cnn-ef9461553726
2. Faisal Ahmed, Padma Polash Paul, Marina Gavrilova. Music Genre Classification Using a Gradient-Based Local Texture Descriptor. https://www.researchgate.net/publication/303860385_Music_Genre_Classification_Using_a_Gradient-Based_Local_Texture_Descriptor
3. Albert Jiménez. Music Genre Classification with Deep Learning. https://github.com/jsalbert/Music-Genre-Classification-with-Deep-Learning
4. Insiyah Hajoori. Music Genre Classification. https://github.com/Insiyaa/Music-Genre-Classification
5. A. Elbir, N. Aydin. Music genre classification and music recommendation by using deep learning. https://ietresearch.onlinelibrary.wiley.com/doi/full/10.1049/el.2019.4202