# Music classification and generation with spectrograms

**By Neuromatch Academy**

__Content creators:__ Beatrix Benko, Lina Teichmann

**Our 2021 Sponsors, including Presenting Sponsor Facebook Reality Labs**

<p align='center'><img src='https://github.com/NeuromatchAcademy/widgets/blob/master/sponsors.png?raw=True'/></p>

## This notebook
This notebook loads the GTZAN dataset, which includes audio files and spectrograms. You can use this dataset or find your own. The first part of the notebook is all about data visualization and shows how to make spectrograms from audio files. The second part of the notebook includes a CNN that is trained on the spectrograms to predict music genre. Below we also provide links to tutorials and other resources if you want to try some of the harder project ideas.

Have fun :) 


## Acknowledgements
This notebook was written by Beatrix Benkő and Lina Teichmann.

**Useful code examples:** 

https://towardsdatascience.com/music-genre-classification-with-python-c714d032f0d8

[https://pytorch.org/vision/stable/models.html](https://pytorch.org/vision/stable/models.html)

[https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/vision_transformer.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/vision_transformer.py)

https://github.com/kamalesh0406/Audio-Classification 

https://github.com/zcaceres/spec_augment

https://musicinformationretrieval.com/ipython_audio.html 

---
# Setup

In [1]:
# @title Install dependencies
!sudo apt-get install -y ffmpeg --quiet
!pip install librosa --quiet
!pip install imageio --quiet
!pip install imageio-ffmpeg --quiet

Reading package lists...
Building dependency tree...
Reading state information...
ffmpeg is already the newest version (7:3.4.8-0ubuntu0.2).
0 upgraded, 0 newly installed, 0 to remove and 40 not upgraded.

In [2]:
# Import necessary libraries.
import os
import glob
import imageio
import random, shutil
import torch
import torch.nn as nn
from tqdm.notebook import tqdm
import torch.nn.functional as F
import torchvision.datasets as datasets
import torchvision.transforms as transforms
import numpy as np
import matplotlib.pyplot as plt
import IPython.display as display
import librosa
import librosa.display

In [3]:
import requests

fname = "music.zip"
url = "https://osf.io/drjhb/download"

if not os.path.isfile(fname):
  try:
    r = requests.get(url)
  except requests.ConnectionError:
    print("!!! Failed to download data !!!")
  else:
    if r.status_code != requests.codes.ok:
      print("!!! Failed to download data !!!")
    else:
      with open(fname, "wb") as fid:
        fid.write(r.content)

## Loading GTZAN dataset (includes spectrograms)

The GTZAN dataset for music genre classification can be downloaded from Kaggle: https://www.kaggle.com/andradaolteanu/gtzan-dataset-music-genre-classification.

To download from Kaggle with code, you first need to copy over your API token. In Kaggle, go to the upper right side -> Account -> API -> Create API token. This downloads a JSON file. Copy its content into `api_token`. It should look like this:

api_token = {"username":"johnsmith","key":"123a123a123"}
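
One way to make the token available to the Kaggle client is to write it to a `kaggle.json` file. The sketch below assumes the client's usual location, `~/.kaggle/kaggle.json`; the helper name `write_kaggle_token` and its `config_dir` parameter are hypothetical, and the demo writes to a temporary directory so that nothing in your home directory is touched.

```python
import json
import stat
import tempfile
from pathlib import Path


def write_kaggle_token(api_token, config_dir=None):
    """Write the token dict to <config_dir>/kaggle.json (default: ~/.kaggle)."""
    config_dir = Path(config_dir) if config_dir else Path.home() / ".kaggle"
    config_dir.mkdir(parents=True, exist_ok=True)
    token_path = config_dir / "kaggle.json"
    token_path.write_text(json.dumps(api_token))
    # Restrict the file to the current user, since it contains a secret key
    token_path.chmod(stat.S_IRUSR | stat.S_IWUSR)
    return token_path


# Placeholder credentials taken from the text above, not a real account
api_token = {"username": "johnsmith", "key": "123a123a123"}

with tempfile.TemporaryDirectory() as demo_dir:
    path = write_kaggle_token(api_token, config_dir=demo_dir)
    saved = json.loads(path.read_text())
```

Calling `write_kaggle_token(api_token)` without `config_dir` would write to the home directory instead.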


In [4]:
from zipfile import ZipFile

with ZipFile(fname, 'r') as zipObj:
  # Extract all the contents of zip file in different directory
  zipObj.extractall()

# **Have a look at the data**

In this section we are looking at an example of an audio waveform. Then we'll transform the sound wave to a spectrogram and compare it with the spectrogram that was included with the downloaded dataset.
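
To get a feeling for what a spectrogram is before using librosa, here is a minimal NumPy-only sketch: the waveform is cut into overlapping windows, each window is Fourier-transformed, and the magnitudes are converted to decibels. The function name `db_spectrogram` and the synthetic 440 Hz tone are assumptions for illustration; librosa's `melspectrogram` additionally pads the signal and maps the frequencies onto mel bands, so its output will not match this one exactly.

```python
import numpy as np


def db_spectrogram(y, n_fft=2048, hop_length=512, top_db=80.0):
    """Return a dB-scaled magnitude spectrogram of shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop_length
    # One column per frame: a windowed slice of the waveform
    frames = np.stack(
        [y[i * hop_length: i * hop_length + n_fft] * window for i in range(n_frames)],
        axis=1,
    )
    magnitude = np.abs(np.fft.rfft(frames, axis=0))
    # Decibels relative to the loudest bin, with the floor clipped at -top_db
    db = 20.0 * np.log10(np.maximum(magnitude, 1e-10) / magnitude.max())
    return np.maximum(db, -top_db)


sr = 22050
t = np.arange(sr) / sr                 # one second of audio
y = np.sin(2 * np.pi * 440.0 * t)      # a 440 Hz tone stands in for a music clip
S_db = db_spectrogram(y)

# The loudest frequency bin should sit close to 440 Hz
peak_bin = S_db.mean(axis=1).argmax()
peak_hz = peak_bin * sr / 2048
```

Each row of `S_db` is a frequency bin and each column is a time frame, which is the same layout as the `(128, 647)` mel spectrograms used later in this notebook.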

In [11]:
import pandas as pd
import shutil

In [64]:
genres_dir = 'Data/genres_original/' # the directory with all the genres
new_dir = 'new_music' # new directory for us

# Start from a clean output directory; ignore_errors=True avoids a crash on the first run
shutil.rmtree(new_dir, ignore_errors=True)
os.mkdir(new_dir)
genre_dict = {}
for genre in os.listdir(genres_dir):
    genre_dir = os.path.join(genres_dir, genre)

    os.mkdir(f'{new_dir}/{genre}')

    genre_dict[genre] = []
    print(f'we are in {genre} directory')
    for music_file in os.listdir(genre_dir): # this is the music file name
        music_loc = os.path.join(genre_dir, music_file) # the path to the music file

        try:
            y, sr = librosa.load(music_loc) # librosa resamples to 22050 Hz by default
            y_resampled = librosa.resample(y, orig_sr=sr, target_sr=11025) # downsample by 2

            S = librosa.feature.melspectrogram(y=y_resampled, sr=11025)
            # melspectrogram returns power, so librosa.power_to_db is the usual choice;
            # amplitude_to_db is kept here so the values match the outputs shown below
            S_DB = librosa.amplitude_to_db(S, ref=np.max)

            genre_dict[genre].append(S_DB) # add the spectrogram to its genre

        except Exception: # e.g. a corrupt audio file
            print('error')

we are in classical directory
we are in reggae directory
we are in rock directory
we are in country directory
we are in jazz directory




error
we are in blues directory
we are in disco directory
we are in pop directory
we are in metal directory
we are in hiphop directory


In [156]:
type( S_DB[0][0] )

numpy.float32

In [158]:
type( genre_dict['jazz'][0][0] )

numpy.ndarray

In [50]:
np.array( genre_dict['classical'][0] ).shape

(128, 647)

In [54]:
len( genre_dict['classical'] )

100

In [57]:
len( genre_dict['classical'][:-1] )

99

In [65]:
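# copy the lists as well, so deleting items from genre_dict leaves the backup intact
backup_dict = {genre: list(specs) for genre, specs in genre_dict.items()}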

In [66]:
# keep 99 signals per genre instead of 100: jazz already has one fewer because of a faulty file
for genre in genre_dict:
    if genre != 'jazz':
        del genre_dict[genre][-1]

In [67]:
for genre in genre_dict:
    print(len(genre_dict[genre])) # number of spectrograms per genre

99
99
99
99
99
99
99
99
99
99


In [161]:
df= pd.DataFrame( genre_dict)# , dtype= np.float32 )

In [162]:
df.describe()

Unnamed: 0,classical,reggae,rock,country,jazz,blues,disco,pop,metal,hiphop
count,99,99,99,99,99,99,99,99,99,99
unique,99,99,99,99,99,99,99,99,99,99
top,"[[-80.0, -80.0, -80.0, -80.0, -80.0, -80.0, -8...","[[-80.0, -80.0, -80.0, -80.0, -76.52898, -66.3...","[[-53.572636, -63.776825, -70.671234, -73.9513...","[[-50.659946, -62.22157, -68.24938, -62.283287...","[[-77.21803, -80.0, -80.0, -80.0, -72.27919, -...","[[-45.802464, -52.50522, -48.651497, -50.63896...","[[-53.371067, -63.823544, -69.51855, -70.93449...","[[-56.789703, -67.73527, -80.0, -80.0, -80.0, ...","[[-42.893692, -48.554314, -69.68238, -68.8456,...","[[-80.0, -80.0, -80.0, -63.684887, -38.709923,..."
freq,1,1,1,1,1,1,1,1,1,1


In [168]:
type( df.iloc[0,0][0] )

numpy.ndarray

In [79]:
df.shape

(99, 10)

In [80]:
df.head()

Unnamed: 0,classical,reggae,rock,country,jazz,blues,disco,pop,metal,hiphop
0,"[[-56.169285, -51.565807, -53.041862, -57.9066...","[[-22.948582, -34.874153, -59.623585, -80.0, -...","[[-80.0, -71.236534, -64.92233, -70.93353, -77...","[[-55.022278, -67.0602, -80.0, -80.0, -80.0, -...","[[-49.520855, -61.457485, -80.0, -80.0, -80.0,...","[[-54.446106, -65.96031, -80.0, -80.0, -80.0, ...","[[-80.0, -80.0, -80.0, -80.0, -80.0, -80.0, -5...","[[-49.887108, -50.27855, -53.37818, -58.324833...","[[-37.505733, -48.246883, -58.930107, -52.3785...","[[-65.22782, -75.527054, -80.0, -80.0, -80.0, ..."
1,"[[-65.06656, -64.6403, -66.99222, -64.071014, ...","[[-28.089111, -39.634796, -56.31231, -47.22966...","[[-80.0, -80.0, -80.0, -80.0, -73.34844, -63.9...","[[-45.519897, -54.378494, -68.088135, -76.5962...","[[-75.08989, -72.537346, -61.03471, -66.81137,...","[[-56.534782, -62.605957, -68.94722, -74.14601...","[[-59.511555, -58.501495, -67.2402, -80.0, -80...","[[-77.8993, -59.018692, -56.494858, -76.02396,...","[[-64.871185, -70.61197, -73.46569, -75.072845...","[[-80.0, -68.52204, -57.228115, -64.56891, -63..."
2,"[[-65.852036, -72.035774, -80.0, -80.0, -80.0,...","[[-80.0, -79.159966, -77.46018, -80.0, -80.0, ...","[[-49.03448, -61.152744, -80.0, -80.0, -79.504...","[[-80.0, -80.0, -63.975315, -45.75179, -46.260...","[[-60.805717, -72.986404, -80.0, -80.0, -80.0,...","[[-45.1009, -55.695137, -68.64857, -42.568092,...","[[-73.52011, -72.99132, -73.915634, -73.27798,...","[[-54.31614, -53.601448, -60.40483, -64.333466...","[[-58.680965, -45.77376, -49.96103, -59.26811,...","[[-73.01895, -70.211136, -66.077484, -67.89532..."
3,"[[-52.188923, -62.116604, -69.53473, -68.55293...","[[-30.585752, -36.768913, -57.213676, -80.0, -...","[[-64.26881, -57.198017, -57.116177, -71.72972...","[[-62.06629, -74.05279, -80.0, -80.0, -80.0, -...","[[-70.6375, -77.56033, -80.0, -80.0, -80.0, -7...","[[-80.0, -71.65339, -59.64942, -61.843056, -73...","[[-73.13697, -80.0, -80.0, -80.0, -80.0, -80.0...","[[-22.615208, -26.546936, -30.665283, -33.8948...","[[-51.67262, -53.97884, -63.815582, -78.71507,...","[[-68.76893, -77.34634, -80.0, -80.0, -80.0, -..."
4,"[[-80.0, -80.0, -78.219925, -80.0, -80.0, -80....","[[-74.92324, -64.09145, -68.459274, -80.0, -74...","[[-80.0, -80.0, -80.0, -80.0, -80.0, -80.0, -8...","[[-80.0, -80.0, -55.73512, -50.668514, -63.956...","[[-50.657246, -62.851837, -80.0, -80.0, -79.75...","[[-57.860146, -55.983177, -49.027184, -41.9949...","[[-59.896927, -69.328415, -80.0, -71.97351, -5...","[[-48.575806, -60.09677, -80.0, -80.0, -74.953...","[[-79.39782, -80.0, -69.799545, -61.424614, -6...","[[-80.0, -80.0, -80.0, -80.0, -80.0, -80.0, -8..."


In [169]:
df_as_array= df.to_numpy()

# Save the data as an array

In [170]:
np.save('data_array.npy' , df_as_array  )

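Don't save the data as CSV: each spectrogram is written as a truncated string (note the `...` in the output further down), so the arrays cannot be recovered on reload.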
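
A small sketch of why the CSV route loses data, assuming NumPy's default print options: converting a large array to text summarises it with `...`, whereas `np.save` and `np.load` preserve shape and dtype. The constant-valued array is a stand-in for one spectrogram, and an in-memory buffer stands in for a file on disk.

```python
import io

import numpy as np

spec = np.full((128, 647), -80.0, dtype=np.float32)  # stand-in for one spectrogram

# Text form of a large array is summarised, so most values are simply dropped
as_text = str(spec)
lost_values = "..." in as_text

# Binary .npy format: the array comes back unchanged
buffer = io.BytesIO()
np.save(buffer, spec)
buffer.seek(0)
restored = np.load(buffer)
```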

In [81]:
# saving the data as CSV (shown only to illustrate the problem)
df.to_csv('signals_df.csv')

In [84]:
df['classical'][0].shape

(128, 647)

In [87]:
for i in df : # this is the labels
    print(i)

classical
reggae
rock
country
jazz
blues
disco
pop
metal
hiphop


In [89]:
# creating X for the signals
# y for the labels
y=[]
x=[]
for label in df:
    for sig in df[label]:
        y.append(label)
        x.append(sig)


In [90]:
y[0]

'classical'
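
The labels at this point are genre names. PyTorch losses such as `nn.CrossEntropyLoss` expect integer class indices, so the names need to be mapped to numbers at some stage. A minimal sketch with toy labels (the variable names `classes`, `class_to_idx` and `y_idx` are assumptions, not part of the original notebook):

```python
import numpy as np

y_names = ["classical", "classical", "reggae", "rock", "reggae"]  # toy labels

# Sort the class names so the mapping is reproducible between runs
classes = sorted(set(y_names))
class_to_idx = {name: idx for idx, name in enumerate(classes)}

# int64 matches the "long" dtype that PyTorch uses for class targets
y_idx = np.array([class_to_idx[name] for name in y_names], dtype=np.int64)
```

Keeping `classes` around makes it possible to translate predictions back to genre names with `classes[predicted_index]`.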

In [149]:
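# Stale output: judging by the execution count, this cell was likely re-run after
# the CSV reload further down, where every entry is a string
type( x[0] )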

str

In [92]:
len(y)==len(x)

True

In [100]:
x[0].shape

(128, 647)

In [103]:
type(x[0])

numpy.ndarray

In [105]:
from  sklearn.model_selection import train_test_split 

In [106]:
x_train , x_test , y_train , y_test = train_test_split( x , y , train_size=0.8)

In [108]:
len(x_train) == len(y_train)

True

In [109]:
len(x_train)

792
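
The split above is random, so the genres are not guaranteed to be equally represented in the training and test sets, and the result changes on every run. `train_test_split` accepts `stratify=` and `random_state=` to address both. The NumPy-only sketch below illustrates the idea; the helper `stratified_split` is hypothetical and rounds per class, so its totals (790/200) differ slightly from the 792/198 shown above.

```python
import numpy as np


def stratified_split(labels, train_size=0.8, seed=0):
    """Return train and test indices with the same class proportions."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for cls in np.unique(labels):
        # Shuffle the indices of this class, then cut at the train fraction
        idx = rng.permutation(np.flatnonzero(labels == cls))
        n_train = int(round(train_size * len(idx)))
        train_idx.extend(idx[:n_train])
        test_idx.extend(idx[n_train:])
    return np.array(train_idx), np.array(test_idx)


labels = np.repeat(np.arange(10), 99)  # 10 genres x 99 clips, as in this notebook
train_idx, test_idx = stratified_split(labels)
per_class_train = np.bincount(labels[train_idx])
```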

# Loading the data back from CSV

In [None]:
# Reading the CSV back returns every spectrogram as a string, not as an array
df= pd.read_csv( '/content/signals_df.csv', index_col=0 )

In [122]:
# creating X for the signals
# y for the labels
y=[]
x=[]
for label in df:
    for sig in df[label]:
        y.append(label)
        x.append(sig)


In [123]:
from  sklearn.model_selection import train_test_split 
x_train , x_test , y_train , y_test = train_test_split( x , y , train_size=0.8)

In [124]:
train_df= pd.DataFrame(x_train, index=y_train) # labels become the index

In [125]:
test_df= pd.DataFrame(x_test,y_test)
test_df.head()

Unnamed: 0,0
pop,[[-50.162785 -60.512604 -71.810326 ... -60.938...
country,[[-45.721294 -57.762127 -80. ... -80. ...
jazz,[[-80. -76.86156 -57.081234 ... -53.789...
disco,[[-59.511555 -58.501495 -67.2402 ... -68.254...
hiphop,[[-61.437412 -73.42264 -80. ... -80. ...


In [126]:
train_df.index.unique()

Index(['reggae', 'rock', 'classical', 'pop', 'disco', 'blues', 'jazz',
       'hiphop', 'country', 'metal'],
      dtype='object')

In [128]:
train_df.shape

(792, 1)

In [145]:
train_df.iloc[0].values

array(['[[-21.685291 -32.59768  -68.05658  ... -80.       -76.73924  -66.61024 ]\n [-24.346855 -36.347206 -70.20903  ... -69.52498  -46.145756 -34.493423]\n [-10.908199 -10.183983 -10.519562 ... -36.794903 -29.28776  -20.9455  ]\n ...\n [-80.       -80.       -80.       ... -80.       -80.       -80.      ]\n [-80.       -80.       -80.       ... -80.       -80.       -80.      ]\n [-80.       -80.       -80.       ... -80.       -80.       -80.      ]]'],
      dtype=object)

In [146]:
# Fails: after the CSV round trip each element of x_train is a string, not an array
np.array( x_train, dtype=np.float32) 

ValueError: ignored

In [129]:
# This cell fails: torch.from_numpy needs NumPy arrays, but x_train and y_train
# are Python lists of strings after the CSV round trip
x_train = torch.from_numpy(x_train)
x_test = torch.from_numpy(x_test)

y_train =torch.from_numpy(y_train).type(torch.IntTensor) 
y_test = torch.from_numpy(y_test).type(torch.IntTensor)

train = torch.utils.data.TensorDataset(x_train,y_train)
test = torch.utils.data.TensorDataset(x_test,y_test)

TypeError: ignored

## Loading the data from the saved `.npy` file

In [173]:
data_tmp = np.load('/content/data_array.npy' , allow_pickle=True)

In [174]:
data= pd.DataFrame(data_tmp)

In [176]:
data.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9
0,"[[-56.169285, -51.565807, -53.041862, -57.9066...","[[-22.948582, -34.874153, -59.623585, -80.0, -...","[[-80.0, -71.236534, -64.92233, -70.93353, -77...","[[-55.022278, -67.0602, -80.0, -80.0, -80.0, -...","[[-49.520855, -61.457485, -80.0, -80.0, -80.0,...","[[-54.446106, -65.96031, -80.0, -80.0, -80.0, ...","[[-80.0, -80.0, -80.0, -80.0, -80.0, -80.0, -5...","[[-49.887108, -50.27855, -53.37818, -58.324833...","[[-37.505733, -48.246883, -58.930107, -52.3785...","[[-65.22782, -75.527054, -80.0, -80.0, -80.0, ..."
1,"[[-65.06656, -64.6403, -66.99222, -64.071014, ...","[[-28.089111, -39.634796, -56.31231, -47.22966...","[[-80.0, -80.0, -80.0, -80.0, -73.34844, -63.9...","[[-45.519897, -54.378494, -68.088135, -76.5962...","[[-75.08989, -72.537346, -61.03471, -66.81137,...","[[-56.534782, -62.605957, -68.94722, -74.14601...","[[-59.511555, -58.501495, -67.2402, -80.0, -80...","[[-77.8993, -59.018692, -56.494858, -76.02396,...","[[-64.871185, -70.61197, -73.46569, -75.072845...","[[-80.0, -68.52204, -57.228115, -64.56891, -63..."
2,"[[-65.852036, -72.035774, -80.0, -80.0, -80.0,...","[[-80.0, -79.159966, -77.46018, -80.0, -80.0, ...","[[-49.03448, -61.152744, -80.0, -80.0, -79.504...","[[-80.0, -80.0, -63.975315, -45.75179, -46.260...","[[-60.805717, -72.986404, -80.0, -80.0, -80.0,...","[[-45.1009, -55.695137, -68.64857, -42.568092,...","[[-73.52011, -72.99132, -73.915634, -73.27798,...","[[-54.31614, -53.601448, -60.40483, -64.333466...","[[-58.680965, -45.77376, -49.96103, -59.26811,...","[[-73.01895, -70.211136, -66.077484, -67.89532..."
3,"[[-52.188923, -62.116604, -69.53473, -68.55293...","[[-30.585752, -36.768913, -57.213676, -80.0, -...","[[-64.26881, -57.198017, -57.116177, -71.72972...","[[-62.06629, -74.05279, -80.0, -80.0, -80.0, -...","[[-70.6375, -77.56033, -80.0, -80.0, -80.0, -7...","[[-80.0, -71.65339, -59.64942, -61.843056, -73...","[[-73.13697, -80.0, -80.0, -80.0, -80.0, -80.0...","[[-22.615208, -26.546936, -30.665283, -33.8948...","[[-51.67262, -53.97884, -63.815582, -78.71507,...","[[-68.76893, -77.34634, -80.0, -80.0, -80.0, -..."
4,"[[-80.0, -80.0, -78.219925, -80.0, -80.0, -80....","[[-74.92324, -64.09145, -68.459274, -80.0, -74...","[[-80.0, -80.0, -80.0, -80.0, -80.0, -80.0, -8...","[[-80.0, -80.0, -55.73512, -50.668514, -63.956...","[[-50.657246, -62.851837, -80.0, -80.0, -79.75...","[[-57.860146, -55.983177, -49.027184, -41.9949...","[[-59.896927, -69.328415, -80.0, -71.97351, -5...","[[-48.575806, -60.09677, -80.0, -80.0, -74.953...","[[-79.39782, -80.0, -69.799545, -61.424614, -6...","[[-80.0, -80.0, -80.0, -80.0, -80.0, -80.0, -8..."


In [178]:
type( data.iloc[0,0][0] )

numpy.ndarray

In [179]:
# creating X for the signals
# y for the labels
y=[]
x=[]
for label in data:
    for sig in data[label]:
        y.append(label)
        x.append(sig)


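## Minimum number of frames to clip to

In [251]:
# number of time frames per spectrogram (this definition is reconstructed; it is an assumption)
tmp_list_for_len = [sig.shape[1] for sig in x]
np.array( tmp_list_for_len ).min()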

645
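
The spectrograms do not all have the same number of time frames, so they cannot be stacked into one array until they are clipped to the shortest length. A minimal sketch with random stand-in data (the three frame counts are hypothetical; only the minimum of 645 comes from the output above):

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-ins for spectrograms whose frame counts differ slightly
specs = [rng.normal(size=(128, n)).astype(np.float32) for n in (647, 645, 646)]

# Shortest clip decides how many frames every spectrogram keeps
min_frames = min(s.shape[1] for s in specs)

# After clipping, all arrays share one shape and can be stacked
clipped = np.stack([s[:, :min_frames] for s in specs])
```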

Next, put the signals and labels into a DataFrame before clipping every spectrogram to 645 frames.

In [293]:
tmp_df= pd.DataFrame( x, index=y ) # one spectrogram per row, labels as the index



In [294]:
tmp_df.shape

(990, 1)

In [296]:
tmp_df.index # labels

Int64Index([0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
            ...
            9, 9, 9, 9, 9, 9, 9, 9, 9, 9],
           dtype='int64', length=990)

# Creating the clipped data

In [328]:
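# clip every spectrogram to the shortest length (645 frames) and collect them in one array
tmp_arr=np.zeros((990,1,128,645), dtype=np.float32)
for i in range(tmp_df.shape[0]):
    tmp_arr[i]=tmp_df.iloc[i, 0][:,:645]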

In [306]:
np.shape( tmp_df.iloc[5][0][:,:645])

(128, 645)

In [331]:
tmp_arr.shape

(990, 1, 128, 645)

In [342]:
X= np.squeeze( tmp_arr.copy())

In [343]:
type(X)

numpy.ndarray

In [337]:
y=tmp_df.index.to_numpy()

In [338]:
type(y)

numpy.ndarray

In [344]:
X.shape

(990, 128, 645)

In [340]:
y.shape

(990,)

In [348]:
np.save( 'features.npy',X)
np.save('labels.npy' ,y)

In [345]:
from  sklearn.model_selection import train_test_split 
x_train , x_test , y_train , y_test = train_test_split( X , y , train_size=0.8)

In [347]:
# float32 features and int64 (long) labels are what torch layers and nn.CrossEntropyLoss expect
x_train = torch.tensor(x_train, dtype=torch.float32)
x_test = torch.tensor(x_test, dtype=torch.float32)

y_train = torch.tensor(y_train, dtype=torch.long)
y_test = torch.tensor(y_test, dtype=torch.long)

train = torch.utils.data.TensorDataset(x_train, y_train)
test = torch.utils.data.TensorDataset(x_test, y_test)


# Saving the files onto the drive

In [365]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [366]:
!cp /content/features.npy /content/drive/MyDrive

In [367]:
!cp /content/labels.npy /content/drive/MyDrive

# Loading the files
## You may want to use the new path from your own drive
[Data here](https://drive.google.com/drive/folders/1T1ox5SBkEf5QDn1w07f2w2teXbpPOjQX?usp=sharing)

In [349]:
x_load= np.load('features.npy')
y_load= np.load('labels.npy')

In [350]:
from  sklearn.model_selection import train_test_split 
x_train , x_test , y_train , y_test = train_test_split( x_load , y_load , train_size=0.8)

In [351]:
# float32 features and int64 (long) labels are what torch layers and nn.CrossEntropyLoss expect
x_train = torch.tensor(x_train, dtype=torch.float32)
x_test = torch.tensor(x_test, dtype=torch.float32)

y_train = torch.tensor(y_train, dtype=torch.long)
y_test = torch.tensor(y_test, dtype=torch.long)

train = torch.utils.data.TensorDataset(x_train, y_train)
test = torch.utils.data.TensorDataset(x_test, y_test)