# External Validation: An Episode of Futurama

As an external validation exercise, we will try to automatically detect gunshots in the Futurama episode <a href="https://www.imdb.com/title/tt1630889/">"Law and Oracle"</a>.

Due to copyright restrictions, we cannot provide the audio data itself. We provide only the features extracted from that data, in the file "extracted_features_futurama.csv". To obtain these features, we split the episode's audio into fragments of approximately 5 seconds each, then applied the feature extraction procedures described in a previous notebook.
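
The fragmentation step can be sketched as follows. This is a minimal illustration, not the code used to produce the CSV: the helper name `fragment_bounds`, the fixed 5-second step, and the 22050 Hz sample rate in the usage line are all assumptions, and the final fragment is simply allowed to be shorter.

```python
def fragment_bounds(n_samples, sample_rate, fragment_seconds=5.0):
    """Return (start, end) sample indices for consecutive fragments.

    Hypothetical helper: assumes fixed-length fragments, with the last
    fragment truncated at the end of the signal.
    """
    step = int(round(sample_rate * fragment_seconds))
    if step <= 0:
        raise ValueError("fragment length must be at least one sample")
    return [(start, min(start + step, n_samples))
            for start in range(0, n_samples, step)]


# Example: 12 seconds of audio at an assumed 22050 Hz sample rate
bounds = fragment_bounds(n_samples=12 * 22050, sample_rate=22050)
print(bounds)
```

Each pair can then be used to slice the sample array, for example `y[start:end]`, before feature extraction.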

In [1]:
# Import required packages

import numpy as np
import pandas as pd
import pickle
from datetime import timedelta

In [2]:
# Load the features
# Note: These include only the features described in a previous notebook as Feature Sets 1 and 2; Feature Set 3 was not used in training the final classifiers

features_df = pd.read_csv('large_data/extracted_features_futurama.csv', index_col=0)
features_df.info()
features_df.head()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 260 entries, 0 to 259
Columns: 390 entries, eq_0 to hits_ratio
dtypes: float64(389), int64(1)
memory usage: 794.2 KB


Unnamed: 0,eq_0,eq_10,eq_20,eq_30,eq_40,eq_60,eq_80,eq_120,eq_160,eq_230,...,roll_128.logbands_mw_20.12,roll_128.logbands_mw_20.13,roll_128.logbands_mw_20.14,roll_128.logbands_mw_20.15,roll_128.logbands_mw_20.16,roll_128.logbands_mw_20.17,roll_128.logbands_mw_20.18,roll_128.logbands_mw_20.19,power_ratio,hits_ratio
0,0.197786,0.475224,0.640533,0.877277,1.040508,1.037066,1.244018,1.316525,1.339426,1.265299,...,0.934274,0.900729,0.862285,0.81294,0.730737,0.598143,0.409162,0.0,1.331745,-1.518403
1,1.010537,1.511049,1.229768,2.022973,2.413715,2.187437,2.187408,2.046839,1.960684,1.721527,...,0.911348,0.875964,0.831477,0.78359,0.703153,0.586626,0.408304,0.0,-0.259233,-0.858878
2,1.038472,1.468064,1.274101,2.044264,2.430573,2.155454,2.176625,2.008776,1.843365,1.62875,...,0.917274,0.890996,0.837766,0.78754,0.686259,0.553205,0.376001,0.0,-0.411571,-1.072553
3,0.9479,1.200151,1.109876,1.496561,1.800022,2.082368,1.825926,1.802636,1.699902,1.569342,...,0.928391,0.891245,0.851558,0.805795,0.721267,0.58829,0.399301,0.0,-0.04572,-1.097423
4,1.296087,1.254715,1.371302,1.391663,1.602395,1.804,1.734784,1.708646,1.770531,1.559705,...,0.933297,0.898499,0.847763,0.790472,0.698572,0.551165,0.356246,0.0,-0.699165,-1.169


In [3]:
# Make a list of the keys of features used to train the final classifiers in the previous notebook

classify_keys = ['eq_0', 'eq_10', 'eq_20', 'eq_30', 'eq_40', 'eq_60', 'eq_80', 'eq_120', 'eq_160', 'eq_230', 'eq_300', 'eq_450', 'eq_600', 'eq_900', 'eq_1200', 'eq_1800', 'eq_2400', 'eq_3700', 'eq_5000', 'eq_7500', 'eq_10000', 'eq_15000', 'eq_20000', 'crestfactor', 'harmonic_power', 'percussive_power', 'harmonic_hits', 'percussive_hits', 'power_ratio', 'hits_ratio', 'roll_32.spec_flatness.median', 'roll_32.spec_flatness.iqr', 'roll_32.spec_centroid.median', 'roll_32.spec_bandwidth.median', 'roll_32.spec_bandwidth.std', 'roll_32.y_mw_zcr', 'roll_64.spec_flatness.median', 'roll_64.spec_flatness.iqr', 'roll_64.spec_centroid.median', 'roll_64.spec_bandwidth.median', 'roll_64.spec_bandwidth.std', 'roll_64.y_mw_zcr', 'roll_128.spec_flatness.median', 'roll_128.spec_flatness.iqr', 'roll_128.spec_centroid.median', 'roll_128.spec_bandwidth.median', 'roll_128.spec_bandwidth.std', 'roll_128.y_mw_zcr', 'roll_64.logbands_mw_5.1', 'roll_64.logbands_mw_5.2', 'roll_64.logbands_mw_5.3', 'roll_64.logbands_mw_10.1', 'roll_64.logbands_mw_10.2', 'roll_64.logbands_mw_10.3', 'roll_64.logbands_mw_10.4', 'roll_64.logbands_mw_10.5', 'roll_64.logbands_mw_10.6', 'roll_64.logbands_mw_10.7', 'roll_64.logbands_mw_10.8', 'roll_64.logbands_mw_20.0', 'roll_64.logbands_mw_20.2', 'roll_64.logbands_mw_20.3', 'roll_64.logbands_mw_20.4', 'roll_64.logbands_mw_20.5', 'roll_64.logbands_mw_20.6', 'roll_64.logbands_mw_20.7', 'roll_64.logbands_mw_20.8', 'roll_64.logbands_mw_20.9', 'roll_64.logbands_mw_20.10', 'roll_64.logbands_mw_20.11', 'roll_64.logbands_mw_20.12', 'roll_64.logbands_mw_20.13', 'roll_64.logbands_mw_20.14', 'roll_64.logbands_mw_20.15', 'roll_64.logbands_mw_20.16', 'roll_64.logbands_mw_20.17', 'roll_64.logbands_mw_20.18']

We then load the pickled (i.e. saved) logistic regression and random forest models. (These are not included on GitHub but can be generated at the end of the previous notebook.) From here, we pass the features extracted from the Futurama episode through each model's prediction function and determine which clips, by index, are identified as gunshots.
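
Before predicting, it can help to confirm that every key in `classify_keys` is present among the loaded columns, since a missing column would otherwise surface as a `KeyError` during selection. A minimal sketch, using a hypothetical helper `missing_keys` and toy column names rather than the real 390-column frame:

```python
def missing_keys(required, available):
    """Return the required keys absent from the available columns, in order."""
    available = set(available)
    return [key for key in required if key not in available]


# Toy stand-ins for features_df.columns and classify_keys
columns = ['eq_0', 'eq_10', 'power_ratio', 'hits_ratio']
wanted = ['eq_0', 'crestfactor', 'hits_ratio']
print(missing_keys(wanted, columns))
```

With the real data this would be called as `missing_keys(classify_keys, features_df.columns)`, and an empty list would mean the selection is safe.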

In [4]:
with open('fullfeaturesRobustScaledRandomForest.pkl', 'rb') as f:
    forest = pickle.load(f)
with open('fullfeaturesRobustScaledlogistic.pkl', 'rb') as f:
    logistic = pickle.load(f)

In [5]:
X_test = features_df[classify_keys].values

In [6]:
forest_pred = forest.predict(X_test)
logistic_pred = logistic.predict(X_test)

In [7]:
# Print the resulting predicted labels for each ~5 second interval
print(forest_pred)
print(logistic_pred)

[2 7 7 9 9 9 2 2 9 9 2 2 2 3 3 2 9 9 2 6 6 2 9 2 3 2 6 2 9 9 9 9 9 9 9 2 2
 9 9 9 9 9 9 9 9 2 2 9 9 9 9 9 9 6 9 9 2 3 2 9 9 2 3 3 2 9 9 2 9 9 2 2 3 2
 9 9 9 9 9 6 8 9 9 6 9 9 9 9 9 6 7 9 2 9 9 2 6 3 9 9 9 3 9 2 9 9 7 7 9 2 2
 2 6 9 9 2 3 9 2 9 9 9 2 2 2 2 2 2 2 9 2 2 6 9 4 2 9 3 2 2 2 9 2 9 9 9 9 2
 2 9 6 9 9 9 9 2 9 9 9 2 9 9 2 9 9 9 2 9 9 2 9 9 3 2 2 9 9 9 3 2 9 2 9 9 9
 9 2 6 2 9 2 2 3 2 2 9 9 2 9 9 3 2 9 2 9 9 9 6 2 7 4 7 3 9 2 2 2 2 9 2 9 9
 9 9 2 9 2 2 9 2 9 6 9 9 2 9 2 2 2 2 2 2 2 9 2 9 9 9 9 2 2 9 2 7 7 7 7 7 9
 9]
[4 9 5 4 6 0 2 2 9 4 2 9 2 3 2 3 8 8 2 3 2 2 9 2 2 8 3 3 2 6 9 0 2 9 9 2 2
 9 9 9 9 2 9 2 9 9 8 9 9 4 9 9 9 6 9 9 9 8 2 3 9 8 2 3 2 2 9 8 9 9 2 9 3 2
 9 9 9 9 9 9 9 5 9 8 9 9 9 9 9 9 1 9 2 8 9 9 6 8 8 8 9 3 9 3 9 9 7 4 3 3 2
 2 2 9 9 2 2 9 2 9 3 9 2 8 8 8 2 9 2 9 2 2 3 6 2 6 9 2 2 2 2 9 2 3 2 9 6 2
 2 9 6 9 9 9 9 8 9 9 2 2 9 9 9 3 9 9 3 9 9 8 9 9 2 2 2 2 2 6 2 3 3 8 9 9 9
 2 2 8 2 3 2 2 2 2 2 9 2 2 2 2 2 3 9 9 9 2 8 2 2 3 9 0 3 6 9 2 2 2 9 8 9 2
 3 9 8 8 2 2 9 8 2 3 

The models return a predicted class for each 5-second interval, as they were trained to do. Of course, most of these labels cannot be correct, since most sounds in the episode do not belong to any of the 10 sound classes the models were trained on. Nevertheless, we can estimate when in the episode gun_shot predictions were made. In our presentation slides, we compare these predictions with the episode's true content.
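
Because fragment `i` covers roughly the interval from `5*i` to `5*(i+1)` seconds, a predicted label can be mapped back to a time window in the episode. The sketch below assumes fragments are exactly 5 seconds long (the fragments are only approximately that length), so windows late in the episode may drift; `fragment_window` is a hypothetical helper.

```python
from datetime import timedelta

FRAGMENT_SECONDS = 5  # assumed nominal fragment length


def fragment_window(index, fragment_seconds=FRAGMENT_SECONDS):
    """Return the (start, end) times covered by the fragment at `index`."""
    start = timedelta(seconds=int(index) * fragment_seconds)
    return start, start + timedelta(seconds=fragment_seconds)


start, end = fragment_window(19)
print(f'{start} to {end}')
```

Reporting the whole window rather than only its start makes it clearer that a detection refers to a 5-second span, not an instant.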

In [8]:
desired_class = 6  # Recall from previous notebooks that the classID for gun_shot is 6

where_forest = np.where(forest_pred == desired_class)[0]
where_logistic = np.where(logistic_pred == desired_class)[0]

In [9]:
print('Using the Random Forest model, possible gun shots were detected at approximate times:')
for index in where_forest:
    print(f'{timedelta(seconds=int(index)*5)}s')

Using the Random Forest model, possible gun shots were detected at approximate times:
0:01:35s
0:01:40s
0:02:10s
0:04:25s
0:06:35s
0:06:55s
0:07:25s
0:08:00s
0:09:20s
0:11:00s
0:12:30s
0:15:35s
0:17:15s
0:19:15s


In [10]:
print('Using the Logistic Regression model, possible gun shots were detected at approximate times:')
for index in where_logistic:
    print(f'{timedelta(seconds=int(index)*5)}s')

Using the Logistic Regression model, possible gun shots were detected at approximate times:
0:00:20s
0:02:25s
0:04:25s
0:08:00s
0:11:05s
0:11:15s
0:12:10s
0:12:30s
0:14:45s
0:17:45s
0:19:25s
0:20:25s
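
One way to compare the two lists is to look at the fragments flagged by both models. The sketch below hard-codes the fragment indices implied by the times printed above (time in seconds divided by 5); with the arrays from the notebook, `np.intersect1d(where_forest, where_logistic)` would give the same set.

```python
from datetime import timedelta

# Fragment indices recovered from the printed detection times (seconds / 5)
forest_idx = [19, 20, 26, 53, 79, 83, 89, 96, 112, 132, 150, 187, 207, 231]
logistic_idx = [4, 29, 53, 96, 133, 135, 146, 150, 177, 213, 233, 245]

# Fragments that both models labeled as gun_shot
both = sorted(set(forest_idx) & set(logistic_idx))
for index in both:
    print(timedelta(seconds=index * 5))
```

Here the overlap is the fragments starting at 0:04:25, 0:08:00 and 0:12:30. Agreement between the two models does not by itself confirm a gunshot; it only narrows down which fragments to check against the episode.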
