<h1>Don't call me turkey!</h1>
<br>
This dataset is based on AudioSet’s data, available <a href="https://research.google.com/audioset/">here</a>. It contains video IDs and time bounds for YouTube clips, as well as 128-dimensional audio features extracted from these clips with VGGish.<br>
The task is to predict whether the sound clip behind each audio_embedding contains a turkey sound.<br>
AudioSet's dataset is under a Creative Commons Attribution 4.0 International (CC BY 4.0) license, while their ontology is under a Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license. Our data includes data from both, modified to fit the <a href="https://www.kaggle.com/c/dont-call-me-turkey">Kaggle</a> competition format.<br>
<br>
<h3>Objective</h3><br>
Learn to recognize turkey sounds from the pre-extracted audio features in order to classify each clip.<br><br>
<h3>Content</h3><br>
This notebook is divided into:
<ol>
    <li><a href="#basic">Basic information</a></li>
    <li><a href="#preprocessing">Data preprocessing</a></li>
    <li><a href="#training">Model training</a></li>
    <li><a href="#prediction">Prediction</a></li>
</ol><br>
<h3>File description</h3>
<ul>
    <li>train.json - training set</li>
    <li>test.json - test set</li>
    <li>sample_submission.csv - sample submission file in the correct format</li>
</ul><br>
<h3>Data description</h3>
<ul>
    <li>vid_id: YouTube video ID associated with this sample</li>
    <li>start_time_seconds_youtube_clip: Where in the YouTube video this audio feature starts</li>
    <li>end_time_seconds_youtube_clip: Where in the YouTube video this audio feature ends</li>
    <li>audio_embedding: Extracted frame-level audio feature, embedded down to 128 dimensions per frame using AudioSet’s VGGish tools available <a href="https://github.com/tensorflow/models/tree/master/research/audioset">here</a></li>
    <li>is_turkey: The target: whether or not the original audio clip contained a turkey. Label is a soft label, based on whether or not AudioSet’s ontology labeled this clip with “Turkey”, and may count turkey calls and other related content as being “turkey”. is_turkey is 1 if the clip contains a turkey sound, and 0 if it does not</li>
</ul>
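As a rough illustration of the fields described above, here is a minimal sketch of what one record might look like. The record is hypothetical: the video ID and embedding values are made up, and it assumes a 10-second clip with one 128-dimensional frame per second and 8-bit values, which matches the sample rows shown later but is not guaranteed for every clip.

```python
import random

random.seed(0)

# Hypothetical record mimicking one row of train.json (all values are made up)
record = {
    "vid_id": "xxxxxxxxxxx",
    "start_time_seconds_youtube_clip": 30,
    "end_time_seconds_youtube_clip": 40,
    # Assumed layout: a list of frames, each frame a list of 128 integers in 0..255
    "audio_embedding": [[random.randint(0, 255) for _ in range(128)] for _ in range(10)],
    "is_turkey": 1,
}

n_frames = len(record["audio_embedding"])
n_dims = len(record["audio_embedding"][0])
clip_seconds = (record["end_time_seconds_youtube_clip"]
                - record["start_time_seconds_youtube_clip"])
print(n_frames, n_dims, clip_seconds)
```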

<h1 style="font-size:18px">Import libraries</h1>

In [1]:
# Numpy for numerical computing
import numpy as np

# Pandas for Dataframes
import pandas as pd
pd.set_option('display.max_columns',100)

# Matplotlib for visualization
from matplotlib import pyplot as plt
# display plots in the notebook
%matplotlib inline

# Seaborn for easier visualization
import seaborn as sns

<h1 style="font-size:18px">Load files</h1>

In [2]:
train = pd.read_json('train.json')
test = pd.read_json('test.json')

<br id="basic">
# 1. Basic information
Let's first check some basic information about each loaded file:
* Dimension
* View the first 3 rows

In [3]:
# Dataframe dimensions
print('The dimension of the training set is:',train.shape,'\n')
print('The dimension of the test set is:',test.shape,'\n')
print('First 3 rows of the training set:\n')
train.head(3)

The dimension of the training set is: (1195, 5) 

The dimension of the test set is: (1196, 4) 

First 3 rows of the training set:



Unnamed: 0,audio_embedding,end_time_seconds_youtube_clip,is_turkey,start_time_seconds_youtube_clip,vid_id
0,"[[172, 34, 216, 110, 208, 46, 95, 66, 161, 125...",70,0,60,kDCk3hLIVXo
1,"[[169, 20, 165, 102, 205, 62, 110, 103, 211, 1...",40,1,30,DPcGzqHoo7Y
2,"[[148, 8, 138, 60, 237, 48, 121, 108, 145, 177...",240,1,230,7yM63MTHh5k


<br id="preprocessing">
# 2. Data preprocessing

First, we separate the features from the target variable. Then we hold out a validation set and pad the embeddings to a fixed length.

In [4]:
# Function for splitting training and test set
from sklearn.model_selection import train_test_split

# Use the public Keras API rather than the private tensorflow.python path
from tensorflow.keras.preprocessing.sequence import pad_sequences



In [5]:
# Create variables for the training set
X = train.audio_embedding
y = train.is_turkey

# Create variable for the test set
X_test = test.audio_embedding

In [6]:
# Split X and y into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=1234)

# Print number of observations in X_train, X_val, y_train, and y_val
print('Number of examples:\n', 'X_train = ', len(X_train), ', y_train = ', len(y_train),
      '\n X_val = ', len(X_val), ' , y_val = ', len(y_val))

Number of examples:
 X_train =  956 , y_train =  956 
 X_val =  239  , y_val =  239


In [7]:
# Pad (or truncate) every clip to exactly 10 frames so all inputs have shape (10, 128)
X_train = pad_sequences(X_train, maxlen=10)
X_val = pad_sequences(X_val, maxlen=10)
X_test = pad_sequences(X_test, maxlen=10)
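To make the padding step more concrete, here is a NumPy-only sketch of what `pad_sequences` does with its default settings (zeros added at the front, extra frames dropped from the front). The helper `pad_pre` is hypothetical and written only for illustration; it is not part of Keras or of this notebook.

```python
import numpy as np

def pad_pre(sequences, maxlen, n_dims=128):
    """Hypothetical helper: zero-pad at the front and keep the last `maxlen` frames."""
    out = np.zeros((len(sequences), maxlen, n_dims), dtype="int32")
    for i, seq in enumerate(sequences):
        seq = np.asarray(seq, dtype="int32")[-maxlen:]  # drop the earliest frames if too long
        if len(seq):
            out[i, maxlen - len(seq):] = seq            # real frames sit at the end
    return out

# Two toy clips: one with 3 frames, one with 12 frames
short_clip = np.ones((3, 128), dtype="int32")
long_clip = np.arange(12)[:, None] * np.ones((1, 128), dtype="int32")

padded = pad_pre([short_clip, long_clip], maxlen=10)
print(padded.shape)
```

The short clip ends up with seven all-zero frames followed by its three real frames, while the long clip loses its first two frames.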

In [8]:
y_train = np.asarray(y_train)
y_val = np.asarray(y_val)

<br id="training">
# 3. Model training
For the model we will use the Keras library from TensorFlow, following these steps:
1. Layers setup: define the number of layers, the number of nodes in each layer, and their respective activation functions
2. Compile: define optimizer, loss and metric
3. Training: train the model

<h1 style="font-size:18px">Import libraries</h1>

In [9]:
import tensorflow as tf
# Import from the public Keras API rather than the private tensorflow.python path
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Bidirectional, LSTM, BatchNormalization, Dropout

<h1 style="font-size:18px">Layers setup</h1><br>
The model stacks four layers:
* BatchNormalization normalizes the 128 embedding values of each of the 10 frames
* Dropout randomly zeroes 50% of the inputs during training to reduce overfitting
* A Bidirectional LSTM with 128 units per direction and ReLU activation reads the frame sequence both forwards and backwards
* A final Dense layer with a single sigmoid node outputs the probability that the clip contains a turkey sound

In [10]:
# Sequential layers
model = Sequential([
    BatchNormalization(input_shape=(10,128)),
    Dropout(0.5),
    Bidirectional(LSTM(128, activation='relu')),
    Dense(1, activation='sigmoid')])
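To get a feel for the size of this network, the parameter count can be worked out by hand. The sketch below assumes the standard Keras parameterization (four gates per LSTM, four vectors per BatchNormalization layer). The function `lstm_params` is a hypothetical helper, and the totals should agree with `model.summary()`, although that is not verified here.

```python
def lstm_params(input_dim, units):
    # 4 gates, each with input weights, recurrent weights and a bias vector
    return 4 * (units * (input_dim + units) + units)

n_dims, units = 128, 128

batchnorm_params = 4 * n_dims                   # gamma, beta, moving mean, moving variance
bilstm_params = 2 * lstm_params(n_dims, units)  # one LSTM per direction
dense_params = 2 * units * 1 + 1                # 256 concatenated inputs plus one bias
dropout_params = 0                              # dropout has no weights

total_params = batchnorm_params + dropout_params + bilstm_params + dense_params
non_trainable = 2 * n_dims                      # moving mean and variance are not trained
print(total_params, total_params - non_trainable)
```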

<h1 style="font-size:18px">Compile</h1><br>
Configure the model for training.
* optimizer: how the model's weights are updated based on the data and the loss function
* loss: the objective that training tries to minimize
* metrics: quantities monitored during training and validation
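As a small illustration of the loss and metric named in this notebook, here is a NumPy sketch of binary cross-entropy and accuracy on toy values. The functions and the clipping constant `eps` are assumptions made for this example; Keras computes these quantities internally and its exact numerical details may differ.

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean negative log-likelihood of the true labels under the predicted probabilities."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)  # avoid log(0)
    losses = -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    return float(np.mean(losses))

def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded prediction equals the label."""
    y_pred = (np.asarray(y_prob) > threshold).astype(int)
    return float(np.mean(y_pred == np.asarray(y_true)))

# Toy labels and predicted probabilities (made up)
y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.2, 0.6, 0.7]

loss = binary_crossentropy(y_true, y_prob)
acc = accuracy(y_true, y_prob)
print(loss, acc)
```

Confident wrong predictions (such as 0.7 for a negative clip) dominate the loss, which is why the loss can keep changing even when accuracy does not.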

In [11]:
# Set the training parameters
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

In [12]:
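# Recompile with RMSprop. This call replaces the Adam configuration from the
# previous cell, so RMSprop is the optimizer actually used for training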
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])

<h1 style="font-size:18px">Training</h1><br>
Train the model for a given number of iterations (epochs) on the dataset.

In [13]:
# Train the model
model.fit(X_train, y_train, batch_size=200, epochs=5, validation_data=(X_val, y_val))

Train on 956 samples, validate on 239 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<tensorflow.python.keras.callbacks.History at 0x1b327484160>
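With 956 training samples and a batch size of 200, each epoch consists of only a handful of weight updates. The arithmetic can be sketched as follows, using the values from the `model.fit` call and its log:

```python
import math

n_train = 956      # training samples reported in the log
batch_size = 200   # value passed to model.fit
epochs = 5

steps_per_epoch = math.ceil(n_train / batch_size)
last_batch_size = n_train - (steps_per_epoch - 1) * batch_size  # final, smaller batch
total_updates = steps_per_epoch * epochs
print(steps_per_epoch, last_batch_size, total_updates)
```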

In [14]:
# Get the accuracy of the model on the validation data. It's not AUC but it's something at least!
score, acc = model.evaluate(X_val, y_val, batch_size=300)
print('Validation accuracy:', acc)

Validation accuracy: 0.8828451633453369
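Since the comment above mentions AUC, here is a minimal pure-Python sketch of how ROC AUC can be computed from labels and predicted scores. It uses the pairwise definition: the fraction of (positive, negative) pairs in which the positive sample gets the higher score, with ties counting as half. The function `roc_auc` is a hypothetical helper and the data is made up. The pairwise loop is quadratic, so a library routine would be a better fit for large datasets.

```python
def roc_auc(y_true, y_score):
    """ROC AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy labels and predicted probabilities (made up)
y_true = [1, 0, 1, 0]
y_score = [0.9, 0.2, 0.6, 0.7]

auc = roc_auc(y_true, y_score)
print(auc)
```

Unlike accuracy, this measure depends only on how the scores are ranked, not on a fixed 0.5 threshold.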


<br id="prediction">
# 4. Prediction
The model is now ready to predict the test set classes.

In [15]:
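# Predict class labels for X_test by thresholding the sigmoid output at 0.5
# (Sequential.predict_classes is not available in recent TensorFlow releases)
predictions = (model.predict(X_test) > 0.5).astype('int32').ravel()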

In [16]:
# Save the result for submission
result = pd.DataFrame()
result['vid_id'] = test.vid_id
result['is_turkey'] = predictions

result.to_csv('submission.csv', index=None)

In the Kaggle competition, this model gave me a score of 0.95657.