# GOAL 

The original dataset is very large, so this notebook samples 200 songs and builds a smaller dataset from their spectrograms.

In [153]:
import pandas as pd
import re
import numpy as np
import random

In [137]:
df = pd.read_csv("tagger_tutorial_dataset/index.csv", header=0)

In [138]:
df.head()

Unnamed: 0,start,end,name,spectrogram,Angry,Busy & Frantic,Casino,Changing Tempo,Chasing,Countryside,...,Sentimental,Sexy,Smooth,Sneaking,Snowy Holiday,Sports Arena,Sunny Holiday,Suspense,Water,Weird
0,61.453417,91.453417,100 Years,1/100 Years.png,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,123.559667,153.559667,1901,1/1901.png,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,29.93775,59.93775,3 AM,3/3 AM.png,0,0,0,0,0,0,...,1,0,0,0,0,0,0,0,0,0
3,48.711,78.711,3 Corners,3/3 Corners.png,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,69.05925,99.05925,300 Years Old,3/300 Years Old.png,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [139]:
# Look at the song names.
# We will sample only certain songs, so we need to pick up all the spectrograms belonging to each sampled song.
df.loc[:, ["name", "spectrogram"]].head()

Unnamed: 0,name,spectrogram
0,100 Years,1/100 Years.png
1,1901,1/1901.png
2,3 AM,3/3 AM.png
3,3 Corners,3/3 Corners.png
4,300 Years Old,3/300 Years Old.png


In [140]:
df = df.sort_values(by='name', axis = 0)
df.loc[:, ["name", "spectrogram"]].head()

Unnamed: 0,name,spectrogram
20581,100 Days Of Sunshine 1,1/100 Days Of Sunshine 1.png
20582,100 Days Of Sunshine 2,1/100 Days Of Sunshine 2.png
20717,100 Days Of Sunshine 3,1/100 Days Of Sunshine 3.png
0,100 Years,1/100 Years.png
25559,15s In A Trunk,1/15s In A Trunk.png


In [146]:
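# Derive a song title from each spectrogram name by dropping the trailing segment number,
# e.g. "100 Days Of Sunshine 1" and "100 Days Of Sunshine 2" share one title.
# The pattern keeps the leading alphanumeric run plus everything up to the next digit.
df["title"] = df["name"].str.extract(r'(^[0-9a-z]+[^0-9]*)', flags=re.IGNORECASE, expand=False)
# Titles keep any trailing whitespace; uncomment to strip it.
#df.title = df.title.str.strip()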
df.loc[:, ["name", "spectrogram", "title"]].head()

Unnamed: 0,name,spectrogram,title
20581,100 Days Of Sunshine 1,1/100 Days Of Sunshine 1.png,100 Days Of Sunshine
20582,100 Days Of Sunshine 2,1/100 Days Of Sunshine 2.png,100 Days Of Sunshine
20717,100 Days Of Sunshine 3,1/100 Days Of Sunshine 3.png,100 Days Of Sunshine
0,100 Years,1/100 Years.png,100 Years
25559,15s In A Trunk,1/15s In A Trunk.png,15s In A Trunk


In [147]:
song_titles = df.title.unique().tolist()
print(f'Number of unique songs is : {len(song_titles)}')

Number of unique songs is : 13100
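
The title extraction above can be checked on a few names outside pandas. The sketch below applies the same pattern with the standard `re` module; `extract_title` is a hypothetical helper, and `"Route 66 Blues"` is a made-up name (not taken from the dataset) used only to show how the pattern treats a digit in the middle of a name.

```python
import re

# Same pattern as the notebook cell: a leading run of letters/digits,
# then everything up to (not including) the next digit.
TITLE_PATTERN = re.compile(r'(^[0-9a-z]+[^0-9]*)', re.IGNORECASE)


def extract_title(name):
    """Return the extracted title for a spectrogram name, or None if nothing matches."""
    match = TITLE_PATTERN.search(name)
    return match.group(1) if match else None


# Names taken from the df.head() output shown earlier.
names = ["100 Days Of Sunshine 1", "100 Days Of Sunshine 2", "100 Years", "15s In A Trunk"]
titles = [extract_title(n) for n in names]

# Hypothetical name with a digit mid-title: the match stops at the first
# digit after the leading run, so the title is cut short.
truncated = extract_title("Route 66 Blues")
```

Numbered segments of one song collapse to the same title, although that title keeps its trailing space unless it is stripped. Names with digits after the first word would be truncated, so the unique-song count is best read as an approximation.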


<br/>

Let's sample 200 songs and then fetch their spectrograms. Some songs have several spectrograms, so the number of images will be larger than the number of songs.

In [151]:
n_songs = 200 # number of songs to sample

In [172]:
#random.seed(42)  # uncomment for a reproducible sample
sample_songs = random.sample(song_titles, n_songs)

In [245]:
sample_songs[0:10]

['The Experiment (Indie Pop Version)',
 'Swap Meet',
 'Heavy Hearts',
 'Mystic Riff',
 'House Of Go',
 'Walk To Prison Cell',
 'Behind The Clouds ',
 'A New Day Begins',
 'Name Tag',
 'Stellar Finale']
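
Because the seed is commented out, each run of the sampling cell picks a different set of songs. A minimal sketch of a reproducible variant follows, assuming a fixed seed is acceptable; `sample_titles` and the toy title list are hypothetical and only illustrate the idea.

```python
import random


def sample_titles(titles, n, seed=None):
    """Sample up to n distinct titles; a fixed seed makes the draw repeatable."""
    rng = random.Random(seed)  # private generator, leaves the global state untouched
    return rng.sample(titles, min(n, len(titles)))


# Toy stand-in for song_titles (hypothetical values).
toy_titles = [f"Song {i}" for i in range(1000)]

first_draw = sample_titles(toy_titles, 200, seed=42)
second_draw = sample_titles(toy_titles, 200, seed=42)
```

Using a dedicated `random.Random` instance avoids reseeding the module-level generator that other cells may rely on.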

In [247]:
#locations of the spectrograms of the sampled songs 
sample_spectrograms = df[df['title'].isin(sample_songs)]
sample_spectrograms = sample_spectrograms.sort_values(by= 'title', axis=0)

<br/>

Let's fetch their spectrogram images and copy them into a separate dataset called sample_dataset_200.

In [249]:
sample_spectrograms.loc[:, ["title", "spectrogram"]].head(10)

Unnamed: 0,title,spectrogram
4576,60's Secretary,6/60's Secretary.png
19667,A Brighter Form Of Life,A/A Brighter Form Of Life 1.png
19779,A Brighter Form Of Life,A/A Brighter Form Of Life 2.png
24978,A Long Way From Home,A/A Long Way From Home.png
38,A New Day Begins,A/A New Day Begins.png
18401,A New Frontier,A/A New Frontier 1.png
18402,A New Frontier,A/A New Frontier 2.png
18403,A New Frontier,A/A New Frontier 3.png
107,Aftertouch,A/Aftertouch.png
11519,Agressive And Dark,A/Agressive And Dark 2.png
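
Since one title can map to several spectrogram files, it can help to tally files per title before copying. The sketch below does this with `collections.Counter` on the rows shown in the table above. In the notebook itself, `sample_spectrograms['title'].value_counts()` should give the same kind of tally.

```python
from collections import Counter

# (title, spectrogram) pairs copied from the table above.
rows = [
    ("60's Secretary", "6/60's Secretary.png"),
    ("A Brighter Form Of Life", "A/A Brighter Form Of Life 1.png"),
    ("A Brighter Form Of Life", "A/A Brighter Form Of Life 2.png"),
    ("A Long Way From Home", "A/A Long Way From Home.png"),
    ("A New Day Begins", "A/A New Day Begins.png"),
    ("A New Frontier", "A/A New Frontier 1.png"),
    ("A New Frontier", "A/A New Frontier 2.png"),
    ("A New Frontier", "A/A New Frontier 3.png"),
    ("Aftertouch", "A/Aftertouch.png"),
    ("Agressive And Dark", "A/Agressive And Dark 2.png"),
]

# Number of spectrogram files per song title.
specs_per_title = Counter(title for title, _ in rows)

# Titles that contribute more than one spectrogram.
multi_spec_titles = sorted(t for t, c in specs_per_title.items() if c > 1)
```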


In [253]:
sample_specs = sample_spectrograms.spectrogram.tolist()
print(f'Going to copy over {len(sample_specs)} spectrograms to the sample dataset')

Going to copy over 401 spectrograms to the sample dataset


In [254]:
with open('sample_spectrograms.txt', 'w') as filehandle:
    for spec in sample_specs:
        filehandle.write('%s\n' % spec)

In [255]:
%%bash

if [ -d sample_dataset_200 ] ; then
    rm -r sample_dataset_200
    echo "Deleting older dataset"
fi

mkdir -p sample_dataset_200
echo "Creating a new dataset"

# Copy each listed spectrogram into the flat sample directory.
while IFS= read -r file; do
    cp tagger_tutorial_dataset/"$file" sample_dataset_200
done < sample_spectrograms.txt

Deleting older dataset
Creating a new dataset
       1


In [256]:
%%bash
ls sample_dataset_200/ | wc -l

     401
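
The copy step can also be done without leaving Python. The following is a sketch of an equivalent using `shutil` and `pathlib`, not the method used above; `copy_spectrograms` is a hypothetical helper, and it assumes (as the bash cell does) that file names are unique once the leading subdirectory is dropped.

```python
import shutil
from pathlib import Path


def copy_spectrograms(list_file, src_root, dest_dir):
    """Copy every path listed in list_file from src_root into a fresh, flat dest_dir.

    Returns the number of files copied. Listed paths that do not exist are
    reported and skipped.
    """
    src_root, dest_dir = Path(src_root), Path(dest_dir)

    # Start from an empty destination, like the rm -r / mkdir pair in the bash cell.
    if dest_dir.exists():
        shutil.rmtree(dest_dir)
    dest_dir.mkdir(parents=True)

    copied = 0
    with open(list_file) as filehandle:
        for line in filehandle:
            relative_path = line.rstrip("\n")
            if not relative_path:
                continue  # ignore blank lines
            source = src_root / relative_path
            if source.is_file():
                shutil.copy(source, dest_dir / source.name)
                copied += 1
            else:
                print(f"Missing spectrogram: {source}")
    return copied


# Usage with the paths from this notebook:
# copy_spectrograms("sample_spectrograms.txt", "tagger_tutorial_dataset", "sample_dataset_200")
```

Returning the count makes it easy to compare against `len(sample_specs)`, the same check that `ls | wc -l` performs above.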
