# GOAL 

* Sample about 200 songs from the original dataset, which is very large, and build a smaller dataset from their spectrograms.  
* Bundle the image and label data in a format compatible with fastai (PyTorch under the hood).

<br/>
<br/>
<br/><br/>

In [1]:
import pandas as pd
import re
import numpy as np
import random

In [100]:
df = pd.read_csv("tagger_tutorial_dataset/index.csv", header=0)

In [101]:
df.head()

Unnamed: 0,start,end,name,spectrogram,Angry,Busy & Frantic,Casino,Changing Tempo,Chasing,Countryside,...,Sentimental,Sexy,Smooth,Sneaking,Snowy Holiday,Sports Arena,Sunny Holiday,Suspense,Water,Weird
0,61.453417,91.453417,100 Years,1/100 Years.png,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,123.559667,153.559667,1901,1/1901.png,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,29.93775,59.93775,3 AM,3/3 AM.png,0,0,0,0,0,0,...,1,0,0,0,0,0,0,0,0,0
3,48.711,78.711,3 Corners,3/3 Corners.png,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,69.05925,99.05925,300 Years Old,3/300 Years Old.png,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [102]:
# Look at the song names.
# We will sample only certain songs, so we need to pick up all the spectrograms belonging to each sampled song.
df.loc[:, ["name", "spectrogram"]].head()

Unnamed: 0,name,spectrogram
0,100 Years,1/100 Years.png
1,1901,1/1901.png
2,3 AM,3/3 AM.png
3,3 Corners,3/3 Corners.png
4,300 Years Old,3/300 Years Old.png


In [103]:
df = df.sort_values(by='name', axis = 0)
df.loc[:, ["name", "spectrogram"]].head()

Unnamed: 0,name,spectrogram
20581,100 Days Of Sunshine 1,1/100 Days Of Sunshine 1.png
20582,100 Days Of Sunshine 2,1/100 Days Of Sunshine 2.png
20717,100 Days Of Sunshine 3,1/100 Days Of Sunshine 3.png
0,100 Years,1/100 Years.png
25559,15s In A Trunk,1/15s In A Trunk.png


In [104]:
# n must be passed by keyword in recent pandas versions
df.head().spectrogram.str.rsplit('/', n=1).apply(lambda x: x[-1])

20581    100 Days Of Sunshine 1.png
20582    100 Days Of Sunshine 2.png
20717    100 Days Of Sunshine 3.png
0                     100 Years.png
25559            15s In A Trunk.png
Name: spectrogram, dtype: object

In [106]:
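# Title = the name up to the first digit that follows its leading alphanumeric run,
# e.g. "100 Days Of Sunshine 1" -> "100 Days Of Sunshine" (drops the trailing clip number).
df["title"] = df["name"].str.extract(r'(^[0-9a-z]+[^0-9]*)', flags=re.IGNORECASE, expand=False)
df["title"] = df["title"].str.strip()

# File name only (no folder), with spaces replaced by underscores.
df["fast_ai_path"] = df.spectrogram.str.rsplit('/', n=1).apply(lambda x: x[-1].replace(' ', '_'))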

df.loc[:, ["name", "spectrogram", "fast_ai_path", "title"]].head()

Unnamed: 0,name,spectrogram,fast_ai_path,title
20581,100 Days Of Sunshine 1,1/100 Days Of Sunshine 1.png,100_Days_Of_Sunshine_1.png,100 Days Of Sunshine
20582,100 Days Of Sunshine 2,1/100 Days Of Sunshine 2.png,100_Days_Of_Sunshine_2.png,100 Days Of Sunshine
20717,100 Days Of Sunshine 3,1/100 Days Of Sunshine 3.png,100_Days_Of_Sunshine_3.png,100 Days Of Sunshine
0,100 Years,1/100 Years.png,100_Years.png,100 Years
25559,15s In A Trunk,1/15s In A Trunk.png,15s_In_A_Trunk.png,15s In A Trunk
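
To see what the title pattern keeps and drops, here is a minimal sketch that applies the same regular expression with the standard `re` module. The helper name `extract_title` and the names "Track 2 Remix" and "(Intro)" are hypothetical, added only for illustration; they are not taken from the dataset.

```python
import re

# Same pattern as in the cell above.
TITLE_PATTERN = re.compile(r'(^[0-9a-z]+[^0-9]*)', re.IGNORECASE)

def extract_title(name):
    """Return the text before the first digit that follows the leading alphanumeric run."""
    match = TITLE_PATTERN.search(name)
    return match.group(1).strip() if match else None

# The first four names appear in the index; the last two are made up.
examples = ["100 Days Of Sunshine 1", "1901", "3 AM", "15s In A Trunk",
            "Track 2 Remix", "(Intro)"]
titles = {name: extract_title(name) for name in examples}
for name, title in titles.items():
    print(f"{name!r} -> {title!r}")
```

Two edge cases are worth keeping in mind: a digit in the middle of a name cuts the title short (the made-up "Track 2 Remix" becomes "Track"), and a name that starts with a non-alphanumeric character yields no title at all.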


In [107]:
song_titles = df.title.unique().tolist()
print(f'Number of unique songs is : {len(song_titles)}')

Number of unique songs is : 12746


<br/>

Let's sample ~200 songs and then fetch their spectrograms. Some songs have several spectrograms. 

In [108]:
n_songs = 200 # number of songs to sample

In [109]:
#random.seed(42)
sample_songs = random.sample(song_titles, n_songs)

In [110]:
sample_songs[0:10]

['I Wanna Be With You',
 'Twentyfour Knobs',
 'String Quartet In Es Major No',
 'Urban Transitions',
 'A Little Goes A Long Way',
 'Happy Children',
 'Spring Is Coming',
 'Walz From Vienna',
 'Time To Act',
 'Once Only']

In [111]:
#locations of the spectrograms of the sampled songs 
sample_spectrograms = df[df['title'].isin(sample_songs)]
sample_spectrograms = sample_spectrograms.sort_values(by= 'title', axis=0)
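
The sampling and filtering steps above can be sketched on a toy table. This is an illustration under assumptions: the song names are made up, and a seeded `random.Random` instance is used so the sample is repeatable, whereas the notebook leaves the seed commented out.

```python
import random
import pandas as pd

# Toy stand-in for the real index: one row per spectrogram (values are made up).
toy = pd.DataFrame({
    "title": ["Song A", "Song A", "Song B", "Song C", "Song C", "Song C"],
    "spectrogram": ["S/Song A 1.png", "S/Song A 2.png", "S/Song B.png",
                    "S/Song C 1.png", "S/Song C 2.png", "S/Song C 3.png"],
})

# A dedicated, seeded generator makes the sample repeatable across runs.
rng = random.Random(42)
titles = sorted(toy["title"].unique())
picked = rng.sample(titles, 2)

# Keep every spectrogram that belongs to a sampled song.
subset = toy[toy["title"].isin(picked)].sort_values(by="title")

# Count spectrograms per sampled song.
per_song = subset.groupby("title").size()
print(per_song)
```

Because songs contribute different numbers of spectrograms, the number of images in the sample depends on which songs are drawn, which is why the copy step reports a count well above 200.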

<br/>

Let's fetch their spectrogram images and put them in a separate dataset called sample_dataset_200.

In [112]:
sample_spectrograms.loc[:, ["title", "spectrogram"]].head(10)

Unnamed: 0,title,spectrogram
24,A Good Day,A/A Good Day.png
13985,A Horse Called Toto,A/A Horse Called Toto 1.png
13986,A Horse Called Toto,A/A Horse Called Toto 2.png
13987,A Horse Called Toto,A/A Horse Called Toto 3.png
13988,A Horse Called Toto,A/A Horse Called Toto 4.png
13989,A Horse Called Toto,A/A Horse Called Toto 5.png
24263,A Little Goes A Long Way,A/A Little Goes A Long Way.png
12190,A Serial Mind,A/A Serial Mind.png
63,Above The Surface,A/Above The Surface.png
5597,Acoustic Lullaby,A/Acoustic Lullaby 2.png


In [113]:
# we need the entire path A/Song1.png to create the folder
sample_specs = sample_spectrograms.spectrogram.tolist() 
print(f'Going to copy over {len(sample_specs)} spectrograms to the sample dataset')

Going to copy over 491 spectrograms to the sample dataset


In [114]:
with open('sample_spectrograms.txt', 'w') as filehandle:
    for spec in sample_specs:
        filehandle.write('%s\n' % spec)

<br/>
<br/>
<br/>
<br/>

### Creating the subset folder
Only the sample subset is modified; the original dataset is left untouched.

In [115]:
%%bash

if [ -d sample_dataset_200 ] ; then
    rm -r sample_dataset_200
    echo "Deleting older dataset"
fi

mkdir -p sample_dataset_200
echo "Creating a new dataset"

while IFS= read -r file; 
    do cp tagger_tutorial_dataset/"$file" sample_dataset_200; 
done < sample_spectrograms.txt

Deleting older dataset
Creating a new dataset


Replace spaces in the file names with underscores to make later processing easier.

In [116]:
%%bash
cd sample_dataset_200/
for f in *; do
    new="${f// /_}"
    if [ "$f" != "$new" ]; then mv -- "$f" "$new"; fi
done

In [117]:
%%bash
ls sample_dataset_200/ | wc -l

     491
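
For readers who prefer to stay in Python, the copy and rename steps above can be combined into one pass. This is a sketch, not the notebook's method: the helper `copy_with_underscores` is hypothetical, and the demo runs on throwaway directories rather than on the real dataset.

```python
import shutil
import tempfile
from pathlib import Path

def copy_with_underscores(relative_paths, src_root, dst_root):
    """Copy each listed file into dst_root, replacing spaces in the file name."""
    dst_root = Path(dst_root)
    dst_root.mkdir(parents=True, exist_ok=True)
    copied = []
    for rel in relative_paths:
        src = Path(src_root) / rel
        dst = dst_root / src.name.replace(" ", "_")
        shutil.copy(src, dst)
        copied.append(dst.name)
    return copied

# Demo on temporary directories so no real data is touched.
with tempfile.TemporaryDirectory() as tmp:
    src_root = Path(tmp) / "source"
    (src_root / "A").mkdir(parents=True)
    (src_root / "A" / "A Good Day.png").write_bytes(b"")
    (src_root / "A" / "Yammerer.png").write_bytes(b"")  # name without spaces

    copied = copy_with_underscores(
        ["A/A Good Day.png", "A/Yammerer.png"], src_root, Path(tmp) / "sample"
    )
    print(sorted(copied))
```

As with the shell version, all files land in one flat folder, so two source files with the same name in different subfolders would overwrite each other.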


Now that we have the data, let's format the data labels.  
I'll be using fastai for the image classification, so I'll prepare my dataset according to the DataBunch class referenced [here](https://docs.fast.ai/basic_data.html#DataBunch).

<br/>
<br/>
<br/>
<br/>

### ONE ROW PER LABEL (Mood of song)

Going from a wide to a long format: each spectrogram gets one row per mood (tag), and these rows can then be joined into a comma-separated list of moods per spectrogram.

In [118]:
col = ["start", "end", "name"]
x = sample_spectrograms.drop(columns=col)
x["id"] = np.arange(len(x))

In [119]:
x.head()

Unnamed: 0,spectrogram,Angry,Busy & Frantic,Casino,Changing Tempo,Chasing,Countryside,Dark,Dreamy,Eccentric,...,Sneaking,Snowy Holiday,Sports Arena,Sunny Holiday,Suspense,Water,Weird,title,fast_ai_path,id
24,A/A Good Day.png,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,A Good Day,A_Good_Day.png,0
13985,A/A Horse Called Toto 1.png,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,A Horse Called Toto,A_Horse_Called_Toto_1.png,1
13986,A/A Horse Called Toto 2.png,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,A Horse Called Toto,A_Horse_Called_Toto_2.png,2
13987,A/A Horse Called Toto 3.png,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,A Horse Called Toto,A_Horse_Called_Toto_3.png,3
13988,A/A Horse Called Toto 4.png,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,A Horse Called Toto,A_Horse_Called_Toto_4.png,4


In [120]:
x1 = x.drop(["title", "spectrogram"], axis= 1)
x1.head()

Unnamed: 0,Angry,Busy & Frantic,Casino,Changing Tempo,Chasing,Countryside,Dark,Dreamy,Eccentric,Elegant,...,Smooth,Sneaking,Snowy Holiday,Sports Arena,Sunny Holiday,Suspense,Water,Weird,fast_ai_path,id
24,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,A_Good_Day.png,0
13985,0,0,0,0,0,0,0,0,0,1,...,0,0,0,0,0,0,0,0,A_Horse_Called_Toto_1.png,1
13986,0,0,0,0,0,0,0,0,0,0,...,1,0,0,0,0,0,0,0,A_Horse_Called_Toto_2.png,2
13987,0,0,0,0,0,0,0,0,0,0,...,1,0,0,0,0,0,0,0,A_Horse_Called_Toto_3.png,3
13988,0,0,0,0,0,0,0,0,0,1,...,0,0,0,0,0,0,0,0,A_Horse_Called_Toto_4.png,4


In [121]:
melted_spec = x.melt(id_vars=['id', 'title', 'spectrogram', "fast_ai_path"], var_name = 'tags', value_name='value')
melted_spec.head()

Unnamed: 0,id,title,spectrogram,fast_ai_path,tags,value
0,0,A Good Day,A/A Good Day.png,A_Good_Day.png,Angry,0
1,1,A Horse Called Toto,A/A Horse Called Toto 1.png,A_Horse_Called_Toto_1.png,Angry,0
2,2,A Horse Called Toto,A/A Horse Called Toto 2.png,A_Horse_Called_Toto_2.png,Angry,0
3,3,A Horse Called Toto,A/A Horse Called Toto 3.png,A_Horse_Called_Toto_3.png,Angry,0
4,4,A Horse Called Toto,A/A Horse Called Toto 4.png,A_Horse_Called_Toto_4.png,Angry,0


In [122]:
# Filter the 1s 
m2 = melted_spec[melted_spec.value == 1]
m2.head()

Unnamed: 0,id,title,spectrogram,fast_ai_path,tags,value
14,14,Agressive And Dark,A/Agressive And Dark 2.png,Agressive_And_Dark_2.png,Angry,1
15,15,Agressive And Dark,A/Agressive And Dark.png,Agressive_And_Dark.png,Angry,1
20,20,Back On The Horse,B/Back On The Horse.png,Back_On_The_Horse.png,Angry,1
88,88,Danger Street,D/Danger Street 4.png,Danger_Street_4.png,Angry,1
91,91,Danger Street,D/Danger Street 1.png,Danger_Street_1.png,Angry,1
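
The melt-and-filter steps above are easier to follow on a tiny table. This sketch uses made-up file names and only two mood columns; the column names `id`, `fast_ai_path`, `tags`, and `value` mirror the ones used in the notebook.

```python
import pandas as pd

# Toy wide table: one row per spectrogram, one 0/1 column per mood (values are made up).
wide = pd.DataFrame({
    "id": [0, 1, 2],
    "fast_ai_path": ["a.png", "b.png", "c.png"],
    "Angry": [0, 1, 0],
    "Dark": [1, 1, 0],
})

# Wide -> long: every (spectrogram, mood) pair becomes its own row.
long = wide.melt(id_vars=["id", "fast_ai_path"], var_name="tags", value_name="value")

# Keep only the moods that are actually set for a spectrogram.
active = long[long["value"] == 1].sort_values(["id", "tags"]).reset_index(drop=True)
print(active)
```

Note that a spectrogram with no mood set (id 2 here) disappears after the filter, so the long table can cover fewer images than the sample folder contains.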


In [123]:
# Each spectrogram gets one row for every tag:
# one clip with 2 moods -> 2 rows.
exploded_tags = pd.merge(x1, m2, on='id').loc[:, ["id", "title", "spectrogram", "tags", "fast_ai_path_y"]]
exploded_tags.columns = ["id", "title", "spectrogram", "tags", "fast_ai_path"]

In [125]:
exploded_tags[exploded_tags.title == 'Agressive And Dark']

Unnamed: 0,id,title,spectrogram,tags,fast_ai_path
25,13,Agressive And Dark,A/Agressive And Dark 3.png,Dark,Agressive_And_Dark_3.png
26,13,Agressive And Dark,A/Agressive And Dark 3.png,Fear,Agressive_And_Dark_3.png
27,14,Agressive And Dark,A/Agressive And Dark 2.png,Angry,Agressive_And_Dark_2.png
28,14,Agressive And Dark,A/Agressive And Dark 2.png,Dark,Agressive_And_Dark_2.png
29,15,Agressive And Dark,A/Agressive And Dark.png,Angry,Agressive_And_Dark.png
30,15,Agressive And Dark,A/Agressive And Dark.png,Chasing,Agressive_And_Dark.png


<br/>
<br/>
<br/>
<br/>

### FINAL DATA BUNDLING
This object will be used to create the ImageDataBunch object in fastai.
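
The notebook does not show the bundling code itself, so the following is only a guess at what a suitable labels table might look like: one row per image file, with its tags joined by a delimiter. The toy values are made up, and the assumption that fastai v1's `ImageDataBunch` factory methods accept such a table through a `label_delim` argument should be checked against the documentation linked above.

```python
import pandas as pd

# Toy stand-in for exploded_tags: one row per (spectrogram, tag); values are made up.
exploded = pd.DataFrame({
    "fast_ai_path": ["a.png", "a.png", "b.png"],
    "tags": ["Dark", "Fear", "Angry"],
})

# One row per image, with its tags joined by a comma.
labels = (
    exploded.groupby("fast_ai_path")["tags"]
    .apply(",".join)
    .reset_index()
)

# Render as CSV text; fields that contain the delimiter are quoted automatically.
csv_text = labels.to_csv(index=False)
print(csv_text)
```

Keying the table on `fast_ai_path` rather than on the original `spectrogram` path matches the renamed files in the sample folder.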

<br/>
<br/>
<br/>
<br/>

### APPENDIX (multiple labels per row)

In [79]:
tags_per_song = (
    exploded_tags.groupby(['id', 'title', 'spectrogram'])['tags']
    .apply(','.join)
    .reset_index()
)

In [89]:
tags_per_song = tags_per_song.sort_values(by='spectrogram')
tags_per_song

Unnamed: 0,id,title,spectrogram,tags
0,0,A Bitter Love (Instrumental Version),A/A Bitter Love (Instrumental Version).png,"Laid Back,Romantic"
15,15,Acoustic Delight,A/Acoustic Delight 1.png,"Peaceful,Romantic"
12,12,Acoustic Delight,A/Acoustic Delight 10.png,"Peaceful,Sentimental"
13,13,Acoustic Delight,A/Acoustic Delight 11.png,"Peaceful,Relaxing"
14,14,Acoustic Delight,A/Acoustic Delight 12.png,"Romantic,Sentimental"
...,...,...,...,...
409,409,Wreak Havoc,W/Wreak Havoc 03.png,"Lounge,Weird"
410,410,Wreak Havoc,W/Wreak Havoc 04.png,"Happy,Sunny Holiday"
411,411,Wreak Havoc,W/Wreak Havoc 05.png,"Busy & Frantic,Restless"
412,412,Yammerer,Y/Yammerer.png,Restless
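
As a sanity check on the comma-joined format, the strings can be expanded back into one 0/1 column per tag with pandas' `Series.str.get_dummies`. The sketch below uses a made-up three-row table in place of the real `tags_per_song`.

```python
import pandas as pd

# Toy stand-in for tags_per_song (values are made up).
tags_per_song = pd.DataFrame({
    "spectrogram": ["a.png", "b.png", "c.png"],
    "tags": ["Dark,Fear", "Angry", "Angry,Dark"],
})

# Expand the comma-joined strings back into one 0/1 column per tag.
multi_hot = tags_per_song["tags"].str.get_dummies(sep=",")
multi_hot.insert(0, "spectrogram", tags_per_song["spectrogram"])
print(multi_hot)
```

This only works because no tag name contains a comma; a tag such as "Busy & Frantic" is safe, but a different delimiter would be needed if any tag did.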
