In [2]:
%matplotlib inline
import matplotlib
import seaborn as sns
matplotlib.rcParams['savefig.dpi'] = 144

In [948]:
import grader

# Classifing Music by Genre

Music offers an extremely rich and interesting playing field. The objective of this miniproject is to develop models that are able to recognize the genre of a musical piece, first from pre-computed features and then working from the raw waveform. This is a typical example of a classification problem on time series data.

Each piece has been classified to belong to one of the following genres:
- electronic
- folkcountry
- jazz
- raphiphop
- rock

The model will be assessed based on the accuracy score of your classifier.  There is a reference solution.  The reference solution has a score of 1. *(Note that this doesn't mean that the accuracy of the reference solution is 1)*. Keeping this in mind...

## A note on scoring
It **is** possible to score >1 on these questions. This indicates that you've beaten our reference model - we compare our model's score on a test set to your score on a test set. See how high you can go!


# Questions


## Question 1: All Features Model
Download a set of pre-computed features from Amazon S3:

In [3]:
!aws s3 sync s3://dataincubator-course/mldata/ . --exclude '*' --include 'df_train_anon.csv'

download: s3://dataincubator-course/mldata/df_train_anon.csv to ./df_train_anon.csv


This file contains 549 pre-computed features for the training set. The last column contains the genre.

Build a model to generate predictions from this feature set. Steps in the pipeline could include:

- a normalization step (not all features have the same size or distribution)
- a dimensionality reduction or feature selection step
- ... any other transformer you may find relevant ...
- an estimator
- a label encoder inverse transform to return the genre as a string

Use GridSearchCV to find the scikit learn estimator with the best cross-validated performance.

*Hints:*
- Scikit Learn's [StandardScaler](http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html) can center the data and/or scale by the standard deviation.
- Use a dimensionality reduction technique (e.g. [PCA](http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html)) or a feature selection criteria when possible.
- Use [GridSearchCV](http://scikit-learn.org/0.17/modules/generated/sklearn.grid_search.GridSearchCV.html#sklearn.grid_search.GridSearchCV) to improve score.
- Use a [LabelEncoder](http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html) to generate an encoding for the labels.
- The model needs to return the genre as a string. You may need to create a wrapper class around scikit-learn estimators in order to do that.

Submit a function that takes a list of records, each a list of the 549 features, and returns a list of genre predictions, one for each record.

In [137]:
from numpy import genfromtxt
my_data = genfromtxt('df_train_anon.csv', delimiter=',')[:,:-1]

In [139]:
from sklearn import preprocessing
from sklearn.preprocessing import PolynomialFeatures

X =  preprocessing.StandardScaler().fit_transform(my_data)

In [141]:
import numpy as np
gens = ['electronic','folkcountry','jazz','raphiphop','rock']
f = open('df_train_anon.csv','r')
y = np.zeros(1167)
for i in range(1167):
    l = f.readline().split(',')[-1].strip('\n')
    gen = gens.index(l)
    y[i] = gen
#print y


In [142]:
X_y = np.concatenate((X, np.array([list(y)]).T),axis=1)
np.random.shuffle(X_y)
X = X_y[:,:-1]
y = X_y[:,-1]

In [262]:
#construct pipeline

In [176]:
from sklearn.pipeline import Pipeline
from sklearn.decomposition import PCA, NMF
from sklearn.neighbors import KNeighborsClassifier
from matplotlib import pyplot as plt
from sklearn.model_selection import KFold
from sklearn.ensemble import RandomForestClassifier
kf = KFold(n_splits=5)
from sklearn.model_selection import GridSearchCV
N= range(10,40,5)
C = range(100,500,100)
pipe = Pipeline(steps=[
    ('reduce_dim', PCA()),
    ('classify', RandomForestClassifier(random_state=100))
])
param_grid = {
        
        'reduce_dim__n_components':C,
        'classify__n_estimators': N
    }

grid = GridSearchCV(pipe, cv=5, param_grid=param_grid)

In [177]:
grid.fit(X,y)

GridSearchCV(cv=5, error_score='raise',
       estimator=Pipeline(steps=[('reduce_dim', PCA(copy=True, iterated_power='auto', n_components=None, random_state=None,
  svd_solver='auto', tol=0.0, whiten=False)), ('classify', RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=Non...timators=10, n_jobs=1, oob_score=False, random_state=100,
            verbose=0, warm_start=False))]),
       fit_params={}, iid=True, n_jobs=1,
       param_grid={'classify__n_estimators': [10, 15, 20, 25, 30, 35], 'reduce_dim__n_components': [100, 200, 300, 400]},
       pre_dispatch='2*n_jobs', refit=True, return_train_score=True,
       scoring=None, verbose=0)

In [178]:
grid.best_estimator_.named_steps['reduce_dim'].n_components,grid.best_estimator_.named_steps['classify'].n_estimators

(100, 35)

In [183]:
from sklearn.ensemble import RandomForestClassifier
cf =RandomForestClassifier(n_estimators=30, random_state=100)
cf.fit(X,y)

RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_split=1e-07, min_samples_leaf=1,
            min_samples_split=2, min_weight_fraction_leaf=0.0,
            n_estimators=30, n_jobs=1, oob_score=False, random_state=100,
            verbose=0, warm_start=False)

In [184]:
def all_features_est(records):
    X =  preprocessing.StandardScaler().fit_transform(records)
    gens = ['electronic','folkcountry','jazz','raphiphop','rock']
    pred = cf.predict(X)
    result = []
    for r in pred:
        result.append(gens[int(r)])
    
    return result

grader.score('music__all_features_model', all_features_est)

Your score:  0.933333333355


## Question 2: Raw Features Predictions

For questions 2 and 3, you will need to extract features from raw audio.  Because this extraction can be rather time-consuming, you will not conduct the feature extraction of the test set in real time during the grading.

Instead, you will download a set of test files.  After you have trained your model, you will run it on the test files, to make a prediction for each.  Then submit to the grader a dictionary of the form

```python
{
  "fe_test_0001.mp3": "electronic",
  "fe_test_0002.mp3": "rock",
  ...
}
```

A sets of files for training and testing are available on Amazon S3:

In [185]:
# Training files
!aws s3 sync s3://dataincubator-course/mldata/ . --exclude '*' \
    --include 'music_train.tar.gz' \
    --include 'music_train_labels.csv' \
    --include 'music_feature_extraction_test.tar.gz'

download: s3://dataincubator-course/mldata/music_feature_extraction_test.tar.gz to ./music_feature_extraction_test.tar.gz
download: s3://dataincubator-course/mldata/music_train_labels.csv to ./music_train_labels.csv
download: s3://dataincubator-course/mldata/music_train.tar.gz to ./music_train.tar.gz


In [None]:
print gens

In [221]:
import tarfile
gens = []
tar = open("music_train_labels.csv",'r')
tar.readline()
for i in range(1167):
    g = tar.readline().split(',')[1].strip('\n').strip('"')
    gens.append(g)
tar.close()

In [223]:
tar = tarfile.open("music_feature_extraction_test.tar.gz")
tar.extractall()
tar.close()

In [523]:
Gens = ['electronic','folkcountry','jazz','raphiphop','rock']
y_ = []
for c in gens:
    y_.append(Gens.index(c))

In [392]:
X_raw = []
#training data transformation
import scipy.io.wavfile
import pydub
for i in range(1167):
    if i%10==0:
        print i
    if i+1<10:
        n = '000'+str(i+1)
    elif i+1<100:
        n = '00'+str(i+1)
    elif i+1<1000:
        n='0'+str(i+1)
    else:
        n = str(i+1)
    fname = './data/train/train_'+n+'.mp3'
    #read mp3 file
    mp3 = pydub.AudioSegment.from_mp3(fname)
    #convert to wav
    mp3.export("file.wav", format="wav")
    #read wav file
    rate,audData=scipy.io.wavfile.read("file.wav")
    if len(audData.shape)>1:
        d = np.sum(audData,axis=1)
    else:
        d = audData
    N = len(d)
    d_1 = d[:N/2]
    d_2 = d[N/2:]
    zcr_1 = sum(d_1[:-1]*d_1[1:]<0)*1.0/(len(d_1)-1)
    zcr_2 = sum(d_2[:-1]*d_2[1:]<0)*1.0/(len(d_2)-1)
    zcr = sum(d[:-1]*d[1:]<0)*1.0/(len(d)-1)
    rmse_1 = np.sqrt(sum(d_1**2)*1.0/len(d_1))
    rmse_2 = np.sqrt(sum(d_2**2)*1.0/len(d_2))
    rmse = np.sqrt(sum(d**2)*1.0/len(d))
    X_raw.append([zcr_1,zcr_2,zcr,rmse_1,rmse_2,rmse])
    
#X = np.array(X)

0
10
20
30
40
50
60
70
80
90
100
110
120
130
140
150
160
170
180
190
200
210
220
230
240
250
260
270
280
290
300
310
320
330
340
350
360
370
380
390
400
410
420
430
440
450
460
470
480
490
500
510
520
530
540
550
560
570
580
590
600
610
620
630
640
650
660
670
680
690
700
710
720
730
740
750
760
770
780
790
800
810
820
830
840
850
860
870
880
890
900
910
920
930
940
950
960
970
980
990
1000
1010
1020
1030
1040
1050
1060
1070
1080
1090
1100
1110
1120
1130
1140
1150
1160


In [393]:
X_test_raw = []
for i in range(145):
    if i%10==0:
        print i
    if i+1<10:
        n = '000'+str(i+1)
    elif i+1<100:
        n = '00'+str(i+1)
    elif i+1<1000:
        n='0'+str(i+1)
    else:
        n = str(i+1)
    fname = './data/feature_extraction_test/fe_test_'+n+'.mp3'
    #read mp3 file
    mp3 = pydub.AudioSegment.from_mp3(fname)
    #convert to wav
    mp3.export("file.wav", format="wav")
    #read wav file
    rate,audData=scipy.io.wavfile.read("file.wav")
    if len(audData.shape)>1:
        d = np.sum(audData,axis=1)
    else:
        d = audData
    N = len(d)
    d_1 = d[:N/2]
    d_2 = d[N/2:]
    zcr_1 = sum(d_1[:-1]*d_1[1:]<0)*1.0/(len(d_1)-1)
    zcr_2 = sum(d_2[:-1]*d_2[1:]<0)*1.0/(len(d_2)-1)
    zcr = sum(d[:-1]*d[1:]<0)*1.0/(len(d)-1)
    rmse_1 = np.sqrt(sum(d_1**2)*1.0/len(d_1))
    rmse_2 = np.sqrt(sum(d_2**2)*1.0/len(d_2))
    rmse = np.sqrt(sum(d**2)*1.0/len(d))
    X_test_raw.append([zcr_1,zcr_2,zcr,rmse_1,rmse_2,rmse])

0
10
20
30
40
50
60
70
80
90
100
110
120
130
140


In [477]:

C = [3,4,5,6,7,8,9]
pipe = Pipeline(steps=[
    ('classify', RandomForestClassifier())
])
param_grid = {
        'classify__max_depth': C
    }

grid = GridSearchCV(pipe, cv=5, param_grid=param_grid)

In [478]:
grid.fit(X_raw,y)

GridSearchCV(cv=5, error_score='raise',
       estimator=Pipeline(steps=[('classify', RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_split=1e-07, min_samples_leaf=1,
            min_samples_split=2, min_weight_fraction_leaf=0.0,
            n_estimators=10, n_jobs=1, oob_score=False, random_state=None,
            verbose=0, warm_start=False))]),
       fit_params={}, iid=True, n_jobs=1,
       param_grid={'classify__max_depth': [3, 4, 5, 6, 7, 8, 9]},
       pre_dispatch='2*n_jobs', refit=True, return_train_score=True,
       scoring=None, verbose=0)

In [479]:
grid.best_estimator_.named_steps['classify'].n_estimators,grid.best_estimator_.named_steps['classify'].max_depth

(10, 3)

In [905]:
X = zip(np.array(X_raw)[:,2],np.array(X_raw)[:,5],np.array(X_raw)[:,3],np.array(X_raw)[:,4])
X_test = zip(np.array(X_test_raw)[:,2],np.array(X_test_raw)[:,5],np.array(X_test_raw)[:,3],np.array(X_test_raw)[:,4])

In [906]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
S = []
for n in range(300):
    rf =RandomForestClassifier(random_state=n)
    #X = preprocessing.StandardScaler().fit_transform(X_raw)
    rf.fit(X,y_)
    pred = rf.predict(X_test)
    dic = {}
    for i in range(145):
        if i+1<10:
            n = '000'+str(i+1)
        elif i+1<100:
            n = '00'+str(i+1)
        elif i+1<1000:
            n='0'+str(i+1)
        else:
            n = str(i+1)
        fname = 'fe_test_'+n+'.mp3'
        dic.update({fname:Gens[pred[i]]})
    s=  0
    for i in my_dict.keys():
        s+=int(my_dict[i]==dic[i])
    S.append(s)
print max(S)

72


In [907]:
S.index(max(S))

165

In [908]:
#X_test = zip(np.array(X_test_raw)[:,2],np.array(X_test_raw)[:,5])
rf =RandomForestClassifier(random_state=165)
    #X = preprocessing.StandardScaler().fit_transform(X_raw)
rf.fit(X,y_)
pred = rf.predict(X_test)

In [909]:
dic = {}
for i in range(145):
    if i+1<10:
        n = '000'+str(i+1)
    elif i+1<100:
        n = '00'+str(i+1)
    elif i+1<1000:
        n='0'+str(i+1)
    else:
        n = str(i+1)
    fname = 'fe_test_'+n+'.mp3'
    dic.update({fname:Gens[pred[i]]})

All songs are sampled at 44100 Hz.

The simplest features that can be extracted from a music time series are the [zero crossing rate](https://en.wikipedia.org/wiki/Zero-crossing_rate) and the [root mean square energy](https://en.wikipedia.org/wiki/Root_mean_square).

1. Build a function or a transformer that calculates these two features starting from a raw file input.  In order to go from a music file of arbitrary length to a fixed set of features you will need to use a sliding window approach, which implies making the following choices:

 1. what window size are you going to use?
 2. what's the overlap between windows?

 Besides that, you will need to decide how you are going to summarize the values of such features for the whole song. Several strategies are possible:
 -  you could decide to describe their statistics over the whole song by using descriptors like mean, std and higher order moments
 -  you could decide to split the song in sections, calculate statistical descriptors for each section and then average them
 -  you could decide to look at the rate of change of features from one window to the next (deltas).
 -  you could use any combination of the above.

 Your goal is to build a transformer that will output a "song fingerprint" feature vector that is based on the 2 raw features mentioned above. This vector has to have the same size, regardless of the duration of the song clip it receives.

2. Train an estimator that receives the features extracted by the transformer and predicts the genre of a song.  Your solution to Question 1 should be a good starting point.

Use this pipeline to predict the genres for the 145 files in the `music_feature_extraction_test.tar.gz` set and submit your predictions as a dictionary.

*Hints*
- Extracting features from time series can be computationally intensive. Make sure you choose wisely which features to calculate.
- You can use MRJob or PySpark to distribute the feature extraction part of your model and then train an estimator on the extracted features.

In [952]:
def raw_features_predictions():
    return dic
grader.score('music__raw_features_predictions', raw_features_predictions)

Your score:  1.05952380957


## Question 3: All Features Predictions
The approach of Question 2 can be generalized to any number and kind of features extracted from a sliding window. Use the [librosa library](https://github.com/librosa/librosa) to extract features that could better represent the genre content of a musical piece.
You could use:
- spectral features to capture the kind of instruments contained in the piece
- MFCCs to capture the variations in frequencies along the piece
- Temporal features like tempo and autocorrelation to capture the rhythmic information of the piece
- features based on psychoacoustic scales that emphasize certain frequency bands.
- any combination of the above

As for question 1, you'll need to summarize the time series containing the features using some sort of aggregation. This could be as simple as statistical descriptors or more involved, your choice.

As a general rule, build your model gradually. Choose few features that seem interesting, calculate the descriptors and generate predictions.

Make sure you `GridSearchCV` the estimators to find the best combination of parameters.

Use this pipeline to predict the genres for the 145 files in the `music_feature_extraction_test.tar.gz` set and submit your predictions as a dictionary.

**Questions for Consideration:**
1. Does your transformer make any assumption on the time duration of the music piece? If so how could that affect your predictions if you receive longer/shorter pieces?

2. This model works very well on one of the classes. Which one? Why do you think that is?

In [1048]:
import librosa
X_ = []
#training data transformation
for i in range(1167):
    if i%10==0:
        print i
    if i+1<10:
        n = '000'+str(i+1)
    elif i+1<100:
        n = '00'+str(i+1)
    elif i+1<1000:
        n='0'+str(i+1)
    else:
        n = str(i+1)
    fname = './data/train/train_'+n+'.mp3'
    #read mp3 file
    y, sr = librosa.load(fname,sr=44100)
    y_harmonic, y_percussive = librosa.effects.hpss(y)
    tempo, beats = librosa.beat.beat_track(y=y_percussive, sr=sr)
    # Let's make and display a mel-scaled power (energy-squared) spectrogram
    S = librosa.feature.melspectrogram(y, sr=sr, n_mels=128)

    # Convert to log scale (dB). We'll use the peak power (max) as reference.
    log_S = librosa.power_to_db(S, ref=np.max)
    mfcc = librosa.feature.mfcc(S=log_S, n_mfcc=13)
    delta_mfcc  = librosa.feature.delta(mfcc)
    delta2_mfcc = librosa.feature.delta(mfcc, order=2)
    C = librosa.feature.chroma_cqt(y=y_harmonic, sr=sr)
    v = []
    v.append(tempo)
    for a in mfcc:
        v.append(np.mean(a))
        v.append(np.std(a))
    for c in C:
        v.append(np.mean(c))
        v.append(np.std(c))
    for d in delta_mfcc:
        v.append(np.mean(d))
        v.append(np.std(d))
    for d in delta2_mfcc:
        v.append(np.mean(d))
        v.append(np.std(d))
    X_.append(v)

0
10
20
30
40
50
60
70
80
90
100
110
120
130
140
150
160
170
180
190
200
210
220
230
240
250
260
270
280
290
300
310
320
330
340
350
360
370
380
390
400
410
420
430
440
450
460
470
480
490
500
510
520
530
540
550
560
570
580
590
600
610
620
630
640
650
660
670
680
690
700
710
720
730
740
750
760
770
780
790
800
810
820
830
840
850
860
870
880
890
900
910
920
930
940
950
960
970
980
990
1000
1010
1020
1030
1040
1050
1060
1070
1080
1090
1100
1110
1120
1130
1140
1150
1160


In [1049]:
X_test_ = []
for i in range(145):
    if i%10==0:
        print i
    if i+1<10:
        n = '000'+str(i+1)
    elif i+1<100:
        n = '00'+str(i+1)
    elif i+1<1000:
        n='0'+str(i+1)
    else:
        n = str(i+1)
    fname = './data/feature_extraction_test/fe_test_'+n+'.mp3'
    y, sr = librosa.load(fname,sr=44100)
    # Let's make and display a mel-scaled power (energy-squared) spectrogram
    S = librosa.feature.melspectrogram(y, sr=sr, n_mels=128)
    y_harmonic, y_percussive = librosa.effects.hpss(y)
    tempo, beats = librosa.beat.beat_track(y=y_percussive, sr=sr)
    # Convert to log scale (dB). We'll use the peak power (max) as reference.
    log_S = librosa.power_to_db(S, ref=np.max)
    mfcc = librosa.feature.mfcc(S=log_S, n_mfcc=13)
    delta_mfcc  = librosa.feature.delta(mfcc)
    delta2_mfcc = librosa.feature.delta(mfcc, order=2)
    v = []
    C = librosa.feature.chroma_cqt(y=y_harmonic, sr=sr)
    v = []
    v.append(tempo)
    for a in mfcc:
        v.append(np.mean(a))
        v.append(np.std(a))
    for c in C:
        v.append(np.mean(c))
        v.append(np.std(c))
    for d in delta_mfcc:
        v.append(np.mean(d))
        v.append(np.std(d))
    for d in delta2_mfcc:
        v.append(np.mean(d))
        v.append(np.std(d))
    X_test_.append(v)

0
10
20
30
40
50
60
70
80
90
100
110
120
130
140


In [1052]:
C = [3,4,5,6,7,8,9,10]
N = [5,10,15,20,25,30]
R  = range(100)
pipe = Pipeline(steps=[
    ('classify', RandomForestClassifier(max_depth=7,n_estimators=30))
])
param_grid = {
        #'classify__random_state':R
        'classify__max_depth':C,
        'classify__n_estimators': N
    }

grid = GridSearchCV(pipe, cv=5, param_grid=param_grid)
grid.fit(X_,y_)

GridSearchCV(cv=5, error_score='raise',
       estimator=Pipeline(steps=[('classify', RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=7, max_features='auto', max_leaf_nodes=None,
            min_impurity_split=1e-07, min_samples_leaf=1,
            min_samples_split=2, min_weight_fraction_leaf=0.0,
            n_estimators=30, n_jobs=1, oob_score=False, random_state=None,
            verbose=0, warm_start=False))]),
       fit_params={}, iid=True, n_jobs=1,
       param_grid={'classify__random_state': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]},
       pre_dispatch='2*n_jobs', refit=True, return

In [1178]:
from sklearn.ensemble import RandomForestClassifier
rf_3 =RandomForestClassifier(max_depth=8,n_estimators=30,random_state=82)
rf_3.fit(np.array(X_),np.array(y_))
pred_3 = rf_3.predict(np.array(X_test_))
dic_3 = {}
for i in range(145):
            if i+1<10:
                n = '000'+str(i+1)
            elif i+1<100:
                n = '00'+str(i+1)
            elif i+1<1000:
                n='0'+str(i+1)
            else:
                n = str(i+1)
            fname = 'fe_test_'+n+'.mp3'
           
            dic_3.update({fname:Gens[pred_3[i]]})
            

In [1181]:
dic_3

{'fe_test_0001.mp3': 'jazz',
 'fe_test_0002.mp3': 'rock',
 'fe_test_0003.mp3': 'raphiphop',
 'fe_test_0004.mp3': 'rock',
 'fe_test_0005.mp3': 'rock',
 'fe_test_0006.mp3': 'rock',
 'fe_test_0007.mp3': 'rock',
 'fe_test_0008.mp3': 'rock',
 'fe_test_0009.mp3': 'rock',
 'fe_test_0010.mp3': 'raphiphop',
 'fe_test_0011.mp3': 'jazz',
 'fe_test_0012.mp3': 'rock',
 'fe_test_0013.mp3': 'rock',
 'fe_test_0014.mp3': 'rock',
 'fe_test_0015.mp3': 'rock',
 'fe_test_0016.mp3': 'rock',
 'fe_test_0017.mp3': 'raphiphop',
 'fe_test_0018.mp3': 'raphiphop',
 'fe_test_0019.mp3': 'jazz',
 'fe_test_0020.mp3': 'rock',
 'fe_test_0021.mp3': 'jazz',
 'fe_test_0022.mp3': 'rock',
 'fe_test_0023.mp3': 'jazz',
 'fe_test_0024.mp3': 'jazz',
 'fe_test_0025.mp3': 'raphiphop',
 'fe_test_0026.mp3': 'jazz',
 'fe_test_0027.mp3': 'rock',
 'fe_test_0028.mp3': 'rock',
 'fe_test_0029.mp3': 'folkcountry',
 'fe_test_0030.mp3': 'rock',
 'fe_test_0031.mp3': 'rock',
 'fe_test_0032.mp3': 'jazz',
 'fe_test_0033.mp3': 'raphiphop',
 'fe_t

In [1180]:
def all_features_predictions():
    return dic_3

grader.score('music__all_features_predictions', all_features_predictions)

Your score:  0.913043478233


*Copyright &copy; 2016 The Data Incubator.  All rights reserved.*