<a href="https://www.bigdatauniversity.com"><img src = "https://ibm.box.com/shared/static/jvcqp2iy2jlx2b32rmzdt0tx8lvxgzkp.png" width = 300, align = "center"></a>

<h1 align=center><font size = 5> MUSIC GENRE CLASSIFICATION USING TENSORFLOW - PART 1</font></h1>

<div class="alert alert-block alert-info" style="margin-top: 20px">
<font size = 3><strong>In this notebook we provide a brief overview of the music project, explain the data set, and help you extract features from the raw data</strong></font>
<br>
- <p><a href="#ref2">Overview of the project</a></p>
- <p><a href="#ref3">Brief explanation of the data set</a></p>
- <p><a href="#ref4">Feature extraction</a></p>
<p></p>
</div>
<br>

----------------

<a id="ref2"></a>
# Overview of the project

In this project, our objective is to classify a sample of music into a specific genre using convolutional networks. Similar to the way images of handwritten numbers are classified into any one of 10 digit classes in the MNIST data set, here we take samples/excerpts of music and classify these excerpts into any one of 9 genres - Alternative, Blues, Electronic, Folk/Country, Funk/Soul/R&B, Jazz, Pop, Rap/Hiphop, Rock.

The project is broken into two modules. In this first of two modules, we will download the data set, understand how the data set is structured, and extract relevant features. These features will be used later in the second module to train convolutional networks for the purpose of genre classification. 

<a id="ref3"></a>
# Brief explanation of the data set

We will use the [Music Audio Benchmark Data Set](http://www-ai.cs.uni-dortmund.de/audio.html) for this project, created by [Homburg et al. (2005)](http://www-ai.cs.uni-dortmund.de/audio.html) and [Mierswa et al. (2005)](http://www-ai.cs.uni-dortmund.de/audio.html). The data set contains 1886 song excerpts from the 9 genres listed above, encoded in mp3 format. The files have a sampling frequency of 44,100 Hz and a bitrate of 128 kbit/s. Each excerpt is 10 seconds long. For ease of use, the data has already been imported and split into a training set, used to train the networks, and a test set, used to evaluate how they perform. The train:test split is approximately 80:20.
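
To get a feel for the size of the raw data, the figures quoted above can be turned into sample counts. This is only a back-of-the-envelope sketch: it assumes every excerpt is exactly 10 seconds long, and the 22,050 Hz rate is the default resampling rate that Librosa applies at load time (used later in this notebook). The variable names are illustrative.

```python
# Figures quoted in the data set description
n_excerpts = 1886      # number of song excerpts
duration_s = 10        # nominal length of each excerpt, in seconds
native_sr = 44100      # sampling frequency of the mp3 files, in Hz
librosa_sr = 22050     # Librosa's default resampling rate, in Hz

# Samples per excerpt (per channel) before and after resampling
samples_native = duration_s * native_sr
samples_resampled = duration_s * librosa_sr

# Total amount of audio in the data set
total_hours = n_excerpts * duration_s / 3600.0

print("Samples per excerpt at 44,100 Hz:", samples_native)
print("Samples per excerpt at 22,050 Hz:", samples_resampled)
print("Total audio: {0:.2f} hours".format(total_hours))
```

Each excerpt therefore holds several hundred thousand samples, which is one reason to work with compact features rather than the raw waveform.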

In the next section, we will extract the necessary features for genre classification.

<a id="ref4"></a>
# Feature extraction

In a computer vision problem such as handwritten digit recognition using MNIST, we can use raw pixel intensities (or RGB values for color images) as features. Audio signals, particularly music excerpts, are temporal and more complex. As such, they offer a rich variety of features, ranging from low-level acoustic features to high-level musical features. Low-level features include spectral and spectrotemporal features, while high-level features include tempo, rhythm, key, pitch, and harmonic information. Since deep networks are designed to learn higher-level representations from their input, it makes sense to feed them lower-level features in this project. We will therefore use mel-frequency cepstral coefficients (MFCCs) as our features.

MFCCs are designed to approximate the roughly logarithmic way in which the human auditory system perceives loudness and pitch. Although MFCCs have been used mostly in speech recognition, they have also become popular features for problems in music information retrieval. For a deeper understanding of MFCCs, we highly recommend the following sources: 1) [this excellent video](https://archive.org/details/SpectrogramCepstrumAndMel-frequency_636522) by Kishore Prahallad, and 2) this short [explanatory tutorial](http://www.practicalcryptography.com/miscellaneous/machine-learning/guide-mel-frequency-cepstral-coefficients-mfccs/).
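
The logarithmic treatment of pitch can be illustrated with the mel scale itself. The sketch below uses one common formulation of the Hz-to-mel conversion, often called the HTK formula. It is not necessarily the exact variant Librosa applies internally by default, and the function names `hz_to_mel` and `mel_to_hz` are defined here for illustration only.

```python
import math

def hz_to_mel(f_hz):
    """Convert a frequency in Hz to mels (HTK-style formula)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel: convert mels back to Hz."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

# Equal steps in Hz become smaller steps in mels as the frequency rises
for f in (100, 200, 1000, 2000, 8000):
    print("{0:>5} Hz -> {1:7.1f} mel".format(f, hz_to_mel(f)))
```

Under this formula 1000 Hz maps to roughly 1000 mel, while the step from 1000 Hz to 2000 Hz adds far fewer mels than the step from 0 Hz to 1000 Hz. High frequencies are compressed, much as they are in human pitch perception.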

We will use [Librosa](http://librosa.github.io/librosa/), a Python package for music and audio analysis, to extract MFCC features from the music excerpts.

In [None]:
# Put this before all imports for the librosa package    
!pip install librosa

In [None]:
# Download the music training data (part1) and the music testing data (part2)
!wget -q -c --progress=bar:force https://ibm.box.com/shared/static/zv4o44z5wzpyqgjvq07ed00dqlpb7mho.gz -O part1.tar.gz
!wget -q -c --progress=bar:force https://ibm.box.com/shared/static/mfa7hosvxe179713jgnpt2m7z1nwrxc1.gz -O part2.tar.gz

In [None]:
# List the downloaded gzipped tar archives
!ls -lah part*.tar.gz

In [None]:
# Import all necessary packages

import warnings
warnings.filterwarnings('ignore')

import os
import tarfile
import csv
import numpy as np
import pandas as pd
import librosa as lr
import librosa.display  # needed for lr.display.specshow in some Librosa versions

import matplotlib.pyplot as plt
import IPython

In [None]:
# Open tar files for the training and testing data

part1Tar = tarfile.open(name="part1.tar.gz", mode="r:*")
part2Tar = tarfile.open(name="part2.tar.gz", mode="r:*")

allTrainFiles = part1Tar.getnames()
allTestFiles = part2Tar.getnames()

csvTrainFiles = [f for f in allTrainFiles if ".csv" in f and os.path.basename(f)[0]!="."]
mp3TrainFiles = [f for f in allTrainFiles if ".mp3" in f and os.path.basename(f)[0]!="."]

csvTestFiles = [f for f in allTestFiles if ".csv" in f and os.path.basename(f)[0]!="."]
mp3TestFiles = [f for f in allTestFiles if ".mp3" in f and os.path.basename(f)[0]!="."]

print("Total CSV Files in Training Set: {0}".format(len(csvTrainFiles)))
print("Total MP3 Files in Training Set: {0}".format(len(mp3TrainFiles)))
print("\n")
print("Total CSV Files in Test Set: {0}".format(len(csvTestFiles)))
print("Total MP3 Files in Test Set: {0}".format(len(mp3TestFiles)))


## Parse CSV files first to get the genre labels for each song
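
The cells in this section read each label file straight out of the tar archive with `extractfile` and hand the resulting file object to pandas. The same idea is sketched here with only the standard library, on a tiny archive built in memory. The member name `labels/demo.csv` and its two rows are made up for illustration. The column names `Song` and `Genre` follow the ones used later in this notebook.

```python
import csv
import io
import tarfile

# Build a tiny in-memory tar.gz archive holding one hypothetical label file
csv_text = "Song,Genre\nsong_a.mp3,Jazz\nsong_b.mp3,Rock\n"
payload = csv_text.encode("utf-8")

buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
    info = tarfile.TarInfo(name="labels/demo.csv")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

# Read the CSV member back without extracting it to disk
buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r:*") as tar:
    member = tar.extractfile("labels/demo.csv")
    text = io.TextIOWrapper(member, encoding="utf-8", newline="")
    rows = list(csv.DictReader(text))

# Map each song file name to its genre label
labels = {row["Song"]: row["Genre"] for row in rows}
print(labels)
```

Reading members in place like this avoids writing the small label files to disk. The mp3 files, by contrast, are extracted later because the audio loader is given a path.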

In [None]:
# Obtain genre labels for the training data
# Store data into a DataFrame
_framesTrain = []
for f in csvTrainFiles:    
    fObjTrain = part1Tar.extractfile(f)    
    _framesTrain.append(pd.read_csv(fObjTrain))
    
trainLabels = pd.concat(_framesTrain)

In [None]:
# Obtain genre labels for the test data
# Store data into a DataFrame
_framesTest = []
for f in csvTestFiles:    
    fObjTest = part2Tar.extractfile(f)    
    _framesTest.append(pd.read_csv(fObjTest))
    
testLabels = pd.concat(_framesTest)

### A few sample rows from these DataFrames:

In [None]:
# Training data - Print song names and genre labels of first 5 rows
trainLabels.head()

In [None]:
# Test data - Print song names and genre labels of first 5 rows
testLabels.head()

### Map music genre categories to numbers
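
Sorting the distinct genre names gives every genre a stable integer index, which is what the cell in this section does with a pandas Series. A plain-Python sketch of the same mapping follows. The genre spellings are taken from the project overview and are only assumed to match the label files, and the names `categories_demo`, `genre_to_index`, and `index_to_genre` are illustrative.

```python
# Genre names as listed in the project overview; the exact spellings in the
# label files are an assumption here
genres = ["Rock", "Jazz", "Pop", "Blues", "Alternative", "Electronic",
          "Folk/Country", "Funk/Soul/R&B", "Rap/Hiphop"]

# Sort the distinct names so the numbering is reproducible across runs
categories_demo = sorted(set(genres))

# Forward and reverse lookups between genre names and integer labels
genre_to_index = {g: i for i, g in enumerate(categories_demo)}
index_to_genre = {i: g for g, i in genre_to_index.items()}

print(genre_to_index["Jazz"])
print(index_to_genre[0])
```

Because the order comes from sorting rather than from the order in which songs appear, the training and test sets end up with the same numbering.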

In [None]:
categories = list(set(trainLabels.Genre))
categories.sort()
categories = pd.Series(categories)

categories

### Parse each mp3 file to compute MFCC features
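
Before running the extraction, it helps to know what shape to expect from `lr.feature.mfcc`. The sketch below estimates it under a few assumptions: the excerpt is exactly 10 seconds long, it is resampled to 22,050 Hz, and frames are centered and spaced 512 samples apart, which is our understanding of Librosa's default hop length. The helper `expected_mfcc_shape` is hypothetical and not part of Librosa.

```python
def expected_mfcc_shape(n_samples, n_mfcc=40, hop_length=512):
    """Estimate (n_mfcc, n_frames) for centered frames spaced hop_length samples apart."""
    n_frames = 1 + n_samples // hop_length
    return (n_mfcc, n_frames)

sr_demo = 22050      # assumed sampling rate after loading, in Hz
duration_s = 10      # nominal excerpt length, in seconds

shape = expected_mfcc_shape(sr_demo * duration_s)
print(shape)
```

With `n_mfcc=40`, each excerpt should therefore yield a matrix of 40 coefficients by a few hundred time frames. Excerpts that are slightly shorter or longer than 10 seconds will produce a different number of frames.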

In [None]:
# Here, we get the feature data for the training set
mfcc = []
y = []
sr = []
labels = []
labelNums = []

# Librosa loads and decodes the audio as a time series 'y', 
# represented as a one-dimensional NumPy floating point array. 
# The variable 'sr' contains the sampling rate of 'y', i.e., the number of samples per second of audio. 
# By default, all audio is mixed to mono and resampled to 22050 Hz at load time. 

dataSetTrain = pd.DataFrame(columns=['Song','mfcc','y','sr','label','Genre'])

for f in mp3TrainFiles:
    # Extract the mp3 to the working directory so that librosa can load it from disk
    part1Tar.extract(f)
    
    _songTrain = os.path.basename(f)
    _yTrain, _srTrain = lr.load(f)
    _mfccTrain = lr.feature.mfcc(y=_yTrain, sr=_srTrain, n_mfcc=40)   
    _labelTrain = trainLabels.Genre[trainLabels.Song==_songTrain].values[0]
    _labelNumTrain = categories[categories==_labelTrain].index[0]

    mfcc.append(_mfccTrain)
    y.append(_yTrain)
    sr.append(_srTrain)
    labels.append(_labelTrain) 
    labelNums.append(_labelNumTrain)
    
    _dfTrain = pd.DataFrame([[_songTrain, _mfccTrain, _yTrain , _srTrain, _labelNumTrain, 
                              _labelTrain]], columns=['Song','mfcc','y','sr','label','Genre'])
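    # DataFrame.append was removed in recent pandas versions; pd.concat does the same job
    dataSetTrain = pd.concat([dataSetTrain, _dfTrain], ignore_index=True)
    # Print the file name as a progress indicator, all on one line
    print(os.path.basename(_dfTrain.Song.values[-1]), end=" ")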

In [None]:
# Look at the last 5 rows of the Training data
dataSetTrain.tail()

In [None]:
# Play back a randomly chosen excerpt from the training set
sample = np.random.randint(len(dataSetTrain.index))

print("Playing:", dataSetTrain.Song[sample])
print("Genre: ", dataSetTrain.Genre[sample])
print("mfcc: ", dataSetTrain.mfcc.values[sample])

# Take the sampling rate from the DataFrame row so it always matches the audio
IPython.display.Audio(data=dataSetTrain.y[sample], rate=dataSetTrain.sr[sample])

In [None]:
# Visualize the MFCCs for the randomly selected excerpt
%matplotlib inline
lr.display.specshow(dataSetTrain.mfcc.values[sample], x_axis='time')
plt.colorbar()
plt.title('MFCC')
plt.tight_layout()
plt.show()

In [None]:
# Get the feature data for the test set
mfcc = []
y = []
sr = []
labels = []
labelNums = []

# Librosa loads and decodes the audio as a time series 'y', 
# represented as a one-dimensional NumPy floating point array. 
# The variable 'sr' contains the sampling rate of 'y', i.e., the number of samples per second of audio. 
# By default, all audio is mixed to mono and resampled to 22050 Hz at load time. 
dataSetTest = pd.DataFrame(columns=['Song','mfcc','y','sr','label','Genre'])

for f in mp3TestFiles:
    # Extract the mp3 to the working directory so that librosa can load it from disk
    part2Tar.extract(f)
    
    _songTest = os.path.basename(f)
    _yTest, _srTest = lr.load(f)
    _mfccTest = lr.feature.mfcc(y=_yTest, sr=_srTest, n_mfcc=40)   
    _labelTest = testLabels.Genre[testLabels.Song==_songTest].values[0]
    _labelNumTest = categories[categories==_labelTest].index[0]

    mfcc.append(_mfccTest)
    y.append(_yTest)
    sr.append(_srTest)
    labels.append(_labelTest) 
    labelNums.append(_labelNumTest)
    
    _dfTest = pd.DataFrame([[_songTest, _mfccTest, _yTest, _srTest, _labelNumTest, 
                              _labelTest]], columns=['Song','mfcc','y','sr','label','Genre'])
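    # DataFrame.append was removed in recent pandas versions; pd.concat does the same job
    dataSetTest = pd.concat([dataSetTest, _dfTest], ignore_index=True)
    # Print the file name as a progress indicator, all on one line
    print(os.path.basename(_dfTest.Song.values[-1]), end=" ")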

In [None]:
# Look at the last 5 rows of the Test data
dataSetTest.tail()

In [None]:
# Save the training feature DataFrame as a CSV file
dataSetTrain.to_csv('trainMFCC.csv')

# Save the test feature DataFrame as a CSV file
dataSetTest.to_csv('testMFCC.csv')

In [None]:
# A quick test to see if the CSV files can be re-imported as pandas DataFrames
reimp = pd.read_csv('trainMFCC.csv')
reimp.head()
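
When checking the re-imported file, note that the `mfcc` and `y` columns hold NumPy arrays. As far as we know, writing such a column to CSV stores each array's string form, which NumPy abbreviates with `...` for large arrays, so the values may not survive the round trip. The sketch below shows the abbreviation on stand-in data and one possible binary alternative, an in-memory `.npz` archive. The names `y_demo` and `mfcc_demo` are illustrative, and this is not the storage format used by the second module.

```python
import io
import numpy as np

# Stand-ins for one decoded excerpt and its MFCC matrix
y_demo = np.zeros(220500, dtype=np.float32)
mfcc_demo = np.random.rand(40, 431).astype(np.float32)

# The string form of a large array is abbreviated, so most values are lost
as_text = str(y_demo)
print(len(as_text), "..." in as_text)

# A binary round trip through an in-memory .npz archive keeps every value
buffer = io.BytesIO()
np.savez_compressed(buffer, y=y_demo, mfcc=mfcc_demo)
buffer.seek(0)
restored = np.load(buffer)
print(restored["mfcc"].shape, np.array_equal(restored["mfcc"], mfcc_demo))
```

If the re-imported `mfcc` column turns out to contain abbreviated strings, saving the arrays in a binary format along these lines is one way to keep them intact.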

## Want to learn more?

Running deep learning programs usually requires a high-performance platform. PowerAI speeds up deep learning and AI. Built on IBM's Power Systems, PowerAI is a scalable software platform that accelerates deep learning and AI for individual users and enterprises. The PowerAI platform supports popular machine learning libraries and dependencies including TensorFlow, Caffe, Torch, and Theano. You can download a [free version of PowerAI](https://cocl.us/ML0120EN_PAI).

Also, you can use Data Science Experience (DSX) to run these notebooks faster with bigger datasets. Data Science Experience is IBM's leading cloud solution for data scientists, built by data scientists. With Jupyter notebooks, RStudio, Apache Spark and popular libraries pre-packaged in the cloud, DSX enables data scientists to collaborate on their projects without having to install anything. Join the fast-growing community of DSX users today with a free account at [Data Science Experience](https://cocl.us/ML0120EN_DSX).

This is the end of this lesson. Hopefully, you now have a clearer and more intuitive understanding of the data set and of MFCC feature extraction. Thank you for reading this notebook, and good luck with your studies.