In [1]:
from src.audio_analysis import AudioData, load_audio, read_audio_data
import pandas as pd
import librosa.display
from IPython.display import display

First, we need the list of file names to analyze, so we load the metadata of our samples.

In [2]:

df = pd.read_csv('./sampled/sampled.csv', index_col='Unnamed: 0')
df = df[['id', 'name', 'artists', 'valence', 'filename']]
df.head()

Unnamed: 0,id,name,artists,valence,filename
0,7lR4CRvHJ6IgX3oHbGeqlQ,Hilux,João Boiadeiro,0.923,./sampled/7lR4CRvHJ6IgX3oHbGeqlQ.mp3
1,35qOGx1CWvCkQFOv4BkVIV,Esquema Preferido - Ao Vivo,Os Barões Da Pisadinha,0.968,./sampled/35qOGx1CWvCkQFOv4BkVIV.mp3
2,2vPRrOaM3zvPwrOHoIPjcF,"Ele É Ele, Eu Sou Eu",Wesley Safadão \ Os Barões Da Pisadinha,0.862,./sampled/2vPRrOaM3zvPwrOHoIPjcF.mp3
3,7as7OL7cmgFZDADgVjQZjz,Meia Noite (Ce Tem Meu Whatsapp) - Ao Vivo,Os Barões Da Pisadinha,0.96,./sampled/7as7OL7cmgFZDADgVjQZjz.mp3
4,4KyqnztwWZB3Kw1bJAEqPS,Sala de Aula,João Boiadeiro,0.85,./sampled/4KyqnztwWZB3Kw1bJAEqPS.mp3


Audio analysis will be done and the following signals will be extracted:

- 'chroma'
- 'rmse'
- 'spec_cent'
- 'spec_bw'
- 'rolloff'
- 'zcr'
- 'mfcc'
- 'melspectrogram'
- 'y' (raw audio data if further analysis is needed)
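As a rough illustration of what two of these signals measure, the sketch below computes a frame-wise RMS energy ('rmse') and zero-crossing rate ('zcr') with plain NumPy on a synthetic tone. It is a simplified stand-in, not the code that load_audio runs: the helper names are hypothetical, the frame length of 2048 and hop of 512 samples are assumed values, and no padding is applied at the edges.

```python
import numpy as np

def frame_signal(y, frame_length=2048, hop_length=512):
    """Slice a 1-D signal into overlapping frames (no edge padding)."""
    n_frames = 1 + (len(y) - frame_length) // hop_length
    idx = (np.arange(frame_length)[None, :]
           + hop_length * np.arange(n_frames)[:, None])
    return y[idx]  # shape: (n_frames, frame_length)

def rms_per_frame(frames):
    """Root-mean-square energy of each frame."""
    return np.sqrt(np.mean(frames ** 2, axis=1))

def zcr_per_frame(frames):
    """Fraction of adjacent sample pairs whose signs differ."""
    signs = np.signbit(frames)
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)

# Synthetic input (assumption): one second of a 440 Hz sine, amplitude 0.5.
sr = 22050
t = np.arange(sr) / sr
y = 0.5 * np.sin(2 * np.pi * 440 * t)

frames = frame_signal(y)
rms = rms_per_frame(frames)
zcr = zcr_per_frame(frames)
print(frames.shape, rms.mean(), zcr.mean())
```

For a pure sine the RMS is close to the amplitude divided by the square root of two, and the zero-crossing rate is close to twice the frequency divided by the sample rate, which makes the two numbers easy to sanity-check.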

The files are assumed to be 30-second previews, but the duration and the offset (to skip the beginning of a song) can be adjusted with additional arguments to load_audio.

See the function definition in src.audio_analysis for the available arguments.
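To make the meaning of offset and duration concrete, here is a minimal sketch of the windowing they describe, applied to an array that is already in memory. The function name select_window is hypothetical and is not part of src.audio_analysis. librosa.load itself accepts offset and duration in seconds; whether load_audio forwards its arguments to it unchanged is an assumption to verify in the source.

```python
import numpy as np

def select_window(y, sr, offset=0.0, duration=None):
    """Return the samples from `offset` seconds on, for `duration` seconds.

    Hypothetical helper: mimics on an in-memory array what offset/duration
    arguments do when a file is loaded.
    """
    start = int(round(offset * sr))
    if duration is None:
        return y[start:]
    stop = start + int(round(duration * sr))
    return y[start:stop]  # slicing clips automatically at the end of the signal

sr = 22050                                 # assumed sample rate
y = np.zeros(30 * sr, dtype=np.float32)    # stand-in for a 30-second preview
clip = select_window(y, sr, offset=5.0, duration=10.0)
print(len(clip) / sr)                      # length of the selected window in seconds
```

If the requested window runs past the end of the preview, the result is simply shorter than requested, which is one reason the extracted arrays can end up with different lengths.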

In [3]:
audio_analysis = load_audio(df['filename'])

100%|██████████| 54/54 [01:37<00:00,  1.81s/it]

Audio data succesfully processed!





After dropping the raw audio column, the audio analysis becomes the X data.

The Y data is the valence column of the reference df loaded at the beginning.

The raw audio is kept in librosa_raw in case further analysis is needed.

In [4]:
X = pd.DataFrame(audio_analysis)
Y = df['valence']
librosa_raw = X['y'].copy()
X = X.drop('y', axis=1)

In [5]:
X.head()

Unnamed: 0,chroma,rmse,spec_cent,spec_bw,rolloff,zcr,mfcc,melspectrogram
0,"[[0.12775837, 0.06570292, 0.06905485, 0.369096...","[0.1615236, 0.20659907, 0.27349138, 0.31197056...","[[2253.9517900311957, 2863.5931170588915, 3274...","[[2712.5691600486643, 3028.931489955659, 3198....","[[6061.5966796875, 6836.7919921875, 7967.28515...","[[0.044921875, 0.0830078125, 0.1357421875, 0.1...","[[-80.60605, -47.257175, -24.116472, 51.960297...","[[2.222499, 0.52993447, 0.25253808, 2.4793954,..."
1,"[[0.37827095, 0.14475623, 0.6449782, 1.0, 1.0,...","[0.0690847, 0.14975865, 0.284021, 0.4023466, 0...","[[4218.713414699616, 5071.574053055507, 4634.2...","[[3303.640143255119, 3248.3543594811204, 3251....","[[8247.216796875, 8656.34765625, 8537.91503906...","[[0.18603515625, 0.28125, 0.31396484375, 0.243...","[[-96.8412, -63.54257, 15.9676075, 52.225468, ...","[[0.16965969, 0.08529728, 37.817314, 912.44116..."
2,"[[0.01418765, 0.021669295, 0.076516874, 0.2691...","[0.06827622, 0.07664133, 0.121490404, 0.183671...","[[2425.1127880004624, 2526.586162028486, 3282....","[[2130.3626052534214, 2157.8840339572275, 2590...","[[4575.8056640625, 4855.7373046875, 6169.26269...","[[0.0517578125, 0.0830078125, 0.125, 0.1157226...","[[-143.25423, -141.37218, -107.94575, -23.2684...","[[0.032627393, 0.009511277, 0.09821674, 5.0485..."
3,"[[0.27429864, 0.14574945, 0.19059972, 0.598107...","[0.15967563, 0.22078463, 0.35893396, 0.4414945...","[[2387.029319917244, 2571.326107438398, 3133.7...","[[2526.122578858845, 2618.180269375585, 2761.5...","[[4995.703125, 6115.4296875, 6352.294921875, 5...","[[0.0595703125, 0.10595703125, 0.1162109375, 0...","[[-23.37339, -7.9882774, 22.877676, 55.631496,...","[[1.518844, 1.3209215, 27.384771, 730.37836, 2..."
4,"[[0.26253396, 0.1730196, 0.09896742, 0.2867989...","[0.16210034, 0.2073524, 0.3078415, 0.42302924,...","[[5499.96745762831, 5426.714831030211, 3952.56...","[[3506.9254109602275, 3445.2027330478722, 3242...","[[8893.212890625, 8839.3798828125, 8139.550781...","[[0.166015625, 0.19482421875, 0.25048828125, 0...","[[-44.362534, -30.872753, 16.966043, 87.66868,...","[[0.23404446, 0.23236667, 1.3214282, 174.60545..."


In [6]:
Y.head()

0    0.923
1    0.968
2    0.862
3    0.960
4    0.850
Name: valence, dtype: float64

In [7]:
df.head()

Unnamed: 0,id,name,artists,valence,filename
0,7lR4CRvHJ6IgX3oHbGeqlQ,Hilux,João Boiadeiro,0.923,./sampled/7lR4CRvHJ6IgX3oHbGeqlQ.mp3
1,35qOGx1CWvCkQFOv4BkVIV,Esquema Preferido - Ao Vivo,Os Barões Da Pisadinha,0.968,./sampled/35qOGx1CWvCkQFOv4BkVIV.mp3
2,2vPRrOaM3zvPwrOHoIPjcF,"Ele É Ele, Eu Sou Eu",Wesley Safadão \ Os Barões Da Pisadinha,0.862,./sampled/2vPRrOaM3zvPwrOHoIPjcF.mp3
3,7as7OL7cmgFZDADgVjQZjz,Meia Noite (Ce Tem Meu Whatsapp) - Ao Vivo,Os Barões Da Pisadinha,0.96,./sampled/7as7OL7cmgFZDADgVjQZjz.mp3
4,4KyqnztwWZB3Kw1bJAEqPS,Sala de Aula,João Boiadeiro,0.85,./sampled/4KyqnztwWZB3Kw1bJAEqPS.mp3


Now we can create an AudioData object. It stores everything we need and also creates the training, validation and test sets.

The split can be adjusted in the object's split_df method; the default ratio is 4:1:1.
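A ratio-based split of this kind can be sketched as follows. This is not the split_df implementation; the function name, the seed argument and the rounding rule (integer division, with the remainder going to the test set) are assumptions made for illustration.

```python
import random

def split_indices(n, ratios=(4, 1, 1), seed=0):
    """Shuffle range(n) and cut it into train/valid/test by integer ratios."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    total = sum(ratios)
    n_train = n * ratios[0] // total
    n_valid = n * ratios[1] // total
    train = indices[:n_train]
    valid = indices[n_train:n_train + n_valid]
    test = indices[n_train + n_valid:]    # whatever is left over
    return train, valid, test

train_idx, valid_idx, test_idx = split_indices(12)
print(len(train_idx), len(valid_idx), len(test_idx))  # 8 2 2
```

The resulting index lists can be passed to DataFrame.iloc to select the matching rows of X and Y, so features and targets stay aligned.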

In [8]:
audio = AudioData(X, Y, df, librosa_raw)

chroma is shape (54,)
rmse is shape (54,)
spec_cent is shape (54,)
spec_bw is shape (54,)
rolloff is shape (54,)
zcr is shape (54,)
mfcc is shape (54,)
melspectrogram is shape (54,)
rmse is shape (54,)


When the AudioData object is created, it also fixes the shapes so that the data is ready for training.

This is necessary because the raw audio arrays can differ in length even though all files are 30-second previews.
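The shapes printed above are all (54,) because each cell holds an array of its own length, so the column cannot yet form one regular array. One common way to fix this is to crop every array to the shortest length along the time axis and then stack them. The sketch below shows that approach; whether AudioData crops, pads, or does something else is not stated here, so treat both the strategy and the function name as assumptions.

```python
import numpy as np

def crop_to_common_length(arrays):
    """Crop each array along its last (time) axis to the shortest length, then stack."""
    min_len = min(a.shape[-1] for a in arrays)
    return np.stack([a[..., :min_len] for a in arrays])

# Three made-up mel spectrograms with 90 bands and slightly different frame counts.
mels = [np.ones((90, 1292)), np.ones((90, 1281)), np.ones((90, 1290))]
stacked = crop_to_common_length(mels)
print(stacked.shape)  # (3, 90, 1281)
```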

It also computes the mean and variance of each feature, as well as the mean and variance of its first-order differences.
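The four statistics per feature can be sketched as below. The column suffixes follow the names visible in the output (_mean, _var, _meandif, _vardif), but the function itself is hypothetical, and the use of the population variance (NumPy's default) and of differences along the time axis are assumptions about how AudioData computes them.

```python
import numpy as np

def summarize(name, values):
    """Mean/variance of a feature and of its first-order differences over time."""
    values = np.asarray(values, dtype=float)
    diffs = np.diff(values, axis=-1)  # frame-to-frame change along the time axis
    return {
        f"{name}_mean": values.mean(),
        f"{name}_var": values.var(),
        f"{name}_meandif": diffs.mean(),
        f"{name}_vardif": diffs.var(),
    }

row = summarize("rmse", [0.1, 0.3, 0.2, 0.4])
print(row)
```

Four statistics for each of the six frame-level features (chroma, rmse, spec_cent, spec_bw, rolloff, zcr) plus four for each of 20 MFCC coefficients would give 26 × 4 = 104 columns, which is consistent with the 104 feature columns in the shapes shown for X_train, X_valid and X_test.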

In [15]:
print('Shapes')
display(audio.X_train.shape)
display(audio.X_valid.shape)
display(audio.X_test.shape)

audio.X_test.head()


Shapes


(36, 104)

(8, 104)

(8, 104)

Unnamed: 0,chroma_mean,chroma_var,chroma_meandif,chroma_vardif,rmse_mean,rmse_var,rmse_meandif,rmse_vardif,spec_cent_mean,spec_cent_var,...,mfcc17_meandif,mfcc17_vardif,mfcc18_mean,mfcc18_var,mfcc18_meandif,mfcc18_vardif,mfcc19_mean,mfcc19_var,mfcc19_meandif,mfcc19_vardif
53,0.335798,0.073539,-0.00014,0.042796,0.309283,0.011491,5.2e-05,0.003613,2859.132767,451819.038458,...,0.002307,21.747654,-6.085621,50.355755,0.008724,25.351391,8.70872,47.82275,-9.8e-05,22.238066
20,0.343088,0.084259,0.000161,0.050041,0.219217,0.00317,7.5e-05,0.000853,2772.924194,475362.841809,...,0.007036,20.649643,-10.787173,49.831371,0.011565,16.83968,2.399181,71.978027,0.006535,21.948231
7,0.408218,0.124302,5.9e-05,0.041728,0.254981,0.008761,4.4e-05,0.002581,2225.483645,425800.035114,...,-0.000754,24.788101,-3.522094,83.469788,-0.004439,22.101126,0.202282,112.795372,0.004171,24.98498
42,0.265665,0.053756,0.000439,0.034396,0.33209,0.006442,4e-06,0.001834,2780.412396,542693.325558,...,0.008788,25.702549,-3.680893,109.029617,0.005904,27.898499,-0.649366,112.115471,0.010379,30.592411
14,0.311158,0.070788,0.000134,0.041889,0.237023,0.017518,0.000179,0.003464,2542.114877,531116.503902,...,0.002272,20.152248,-3.600922,63.182987,-0.004914,21.264416,2.516004,80.436584,0.001275,25.889658


In [16]:
print('Shapes')
display(audio.Y_train.shape)
display(audio.Y_valid.shape)
display(audio.Y_test.shape)

audio.Y_test.head()

Shapes


(36,)

(8,)

(8,)

53    0.939
20    0.937
7     0.887
42    0.800
14    0.778
Name: valence, dtype: float64

In [11]:
audio.X_valid_mel.shape
audio.X_test_mel.shape
audio.X_train_mel.shape

(36, 90, 1281, 1)

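The shape shown is that of X_train_mel, the last expression in the cell. These arrays are the X that can be used for CNN models. They hold the full mel spectrogram of each song, before librosa's power_to_db is applied.

The 1st dimension is the number of songs, the 2nd is the number of mel bands, and the 3rd is the number of time frames, which depends on the length of the raw audio.

The 4th dimension is a single channel axis, as expected by CNN layers.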
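The layout described above can be reproduced with a few lines of NumPy. The sketch adds the trailing axis to a stack of spectrograms and also shows a simplified decibel conversion. The to_db helper is a hypothetical stand-in for librosa.power_to_db with a reference of 1.0 and without its top_db clipping; the array sizes are made up to match the dimensions shown.

```python
import numpy as np

# Made-up stack of 4 mel spectrograms: (songs, mel bands, time frames).
mels = np.zeros((4, 90, 1281), dtype=np.float32)

# Add a trailing axis of size 1 so each spectrogram looks like a one-channel image.
cnn_input = mels[..., np.newaxis]
print(cnn_input.shape)  # (4, 90, 1281, 1)

def to_db(power, amin=1e-10):
    """Simplified power-to-decibel conversion (reference 1.0, no dynamic-range clipping)."""
    return 10.0 * np.log10(np.maximum(power, amin))

db = to_db(np.array([1.0, 100.0]))
print(db)
```

Because the stored spectrograms are on a power scale, a conversion like this can be applied before training if a logarithmic scale suits the model better.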

Below is the original metadata that we loaded.

In [12]:
audio.df.head()

Unnamed: 0,id,name,artists,valence,filename
0,7lR4CRvHJ6IgX3oHbGeqlQ,Hilux,João Boiadeiro,0.923,./sampled/7lR4CRvHJ6IgX3oHbGeqlQ.mp3
1,35qOGx1CWvCkQFOv4BkVIV,Esquema Preferido - Ao Vivo,Os Barões Da Pisadinha,0.968,./sampled/35qOGx1CWvCkQFOv4BkVIV.mp3
2,2vPRrOaM3zvPwrOHoIPjcF,"Ele É Ele, Eu Sou Eu",Wesley Safadão \ Os Barões Da Pisadinha,0.862,./sampled/2vPRrOaM3zvPwrOHoIPjcF.mp3
3,7as7OL7cmgFZDADgVjQZjz,Meia Noite (Ce Tem Meu Whatsapp) - Ao Vivo,Os Barões Da Pisadinha,0.96,./sampled/7as7OL7cmgFZDADgVjQZjz.mp3
4,4KyqnztwWZB3Kw1bJAEqPS,Sala de Aula,João Boiadeiro,0.85,./sampled/4KyqnztwWZB3Kw1bJAEqPS.mp3


And here is the raw audio data for each song.

In [13]:
audio.raw.head()

0    [-0.0020327817, 0.027678424, 0.03545276, -0.01...
1    [-0.040720813, -0.037129246, -0.021754216, 0.0...
2    [0.13886842, 0.2718351, 0.21805392, 0.16034429...
3    [0.24515587, 0.3735583, 0.17928192, 0.07805224...
4    [0.18225138, 0.43271464, 0.19823687, 0.3303496...
Name: y, dtype: object

The preprocessed data can be gathered and used for various models.

For access to the model training results, please contact lucasmouragomes@outlook.com

Disclaimer: Audio files were acquired from previews provided by the Spotify API.