# Neural Network

We'll now use a neural network to predict the player's identity.

In [1]:
%matplotlib notebook

import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns; sns.set()

import keras
from keras.models import Sequential, Model
from keras.layers import Dense
from keras.optimizers import Adam

from sklearn.decomposition import PCA

Using TensorFlow backend.


## We'll start with the features and the topology proposed by last year's group and train the NN with them. Afterwards we'll try our own features (generated per wave instead of per balloon).

In [20]:
data = np.genfromtxt('./features/kate_data_julien_sarah.csv', delimiter=',')
np.random.shuffle(data)

In [21]:
training_ratio = 0.85
l = len(data)
X = data[:,:-1]
y = data[:,-1]
X_train = X[:int(l*training_ratio)]
X_test = X[int(l*training_ratio):]
y_train = y[:int(l*training_ratio)]/2
y_test = y[int(l*training_ratio):]/2

In [22]:
y_train = keras.utils.np_utils.to_categorical(y_train.astype(int))
y_test = keras.utils.np_utils.to_categorical(y_test.astype(int))
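The split and one-hot encoding above are repeated for each experiment in this notebook, so they could be factored into a helper. The following is a minimal sketch in plain NumPy, not the code used to produce the results here. The name `split_and_encode`, the toy data, and the assumption that the raw labels are 0 and 2 (suggested by the division by 2) are all illustrative.

```python
import numpy as np

def split_and_encode(X, y, training_ratio=0.85, n_classes=2):
    """Split rows into train/test and one-hot encode integer labels."""
    n_train = int(len(X) * training_ratio)
    labels = y.astype(int)
    # Row i of the identity matrix is the one-hot vector for class i.
    one_hot = np.eye(n_classes)[labels]
    return X[:n_train], X[n_train:], one_hot[:n_train], one_hot[n_train:]

# Toy data standing in for the CSV contents (assumed raw labels: 0 and 2).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 4))
y_demo = rng.choice([0, 2], size=20) / 2
X_tr, X_te, y_tr, y_te = split_and_encode(X_demo, y_demo)
print(X_tr.shape, X_te.shape, y_tr.shape, y_te.shape)
```

Indexing `np.eye(n_classes)` with the label array gives the same kind of one-hot matrix as `to_categorical` for labels in `0..n_classes-1`.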

# Dimensionality reduction with PCA

In [23]:
mu = X_train.mean(axis=0)
U,s,V = np.linalg.svd(X_train - mu, full_matrices=False)
Zpca = np.dot(X_train - mu, V.transpose())

Rpca = np.dot(Zpca[:,:2], V[:2,:]) + mu    # reconstruction
err = np.sum((X_train-Rpca)**2)/Rpca.shape[0]/Rpca.shape[1]
print('PCA reconstruction error with 2 PCs: ' + str(round(err,3)))
print(max(Zpca[:,0]))
print(min(Zpca[:,0]))
print(max(Zpca[:,1]))
print(min(Zpca[:,1]))

print(np.argmax(Zpca[:,0]))
print(np.argmax(Zpca[:,1]))

PCA reconstruction error with 2 PCs: 7.471
2234.6621633472378
-22.639794956373947
128.40015756257122
-369.87814850662505
94
94
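A reconstruction error on its own is hard to interpret, so it can help to look at how much variance each principal component captures. The sketch below derives this from the singular values of the centred data. The function name `explained_variance_ratio` and the toy data are assumptions for illustration; it is not applied to the notebook's features here.

```python
import numpy as np

def explained_variance_ratio(X):
    """Fraction of total variance captured by each principal component."""
    Xc = X - X.mean(axis=0)
    # Only singular values are needed; squared, they are proportional
    # to the variance along each principal direction.
    s = np.linalg.svd(Xc, compute_uv=False)
    var = s ** 2
    return var / var.sum()

rng = np.random.default_rng(0)
# Toy data: three columns with very different scales.
X_demo = rng.normal(size=(200, 3)) * np.array([10.0, 1.0, 0.1])
ratio = explained_variance_ratio(X_demo)
print(np.cumsum(ratio))  # cumulative share of variance per number of PCs
```

The cumulative sum gives a simple way to pick the number of components, for instance the smallest number that reaches a chosen share of the variance.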


# Building and training a DNN

In [24]:
m = Sequential()
m.add(Dense(150,  activation='relu', input_shape=(105,)))
#m.add(Dense(150,  activation='relu')) 
m.add(Dense(150,  activation='relu'))
m.add(Dense(150,  activation='relu'))
m.add(Dense(50,  activation='relu'))
m.add(Dense(2,  activation='sigmoid'))
m.compile(loss='categorical_crossentropy', optimizer = Adam(), metrics=['accuracy'])

history = m.fit(X_train, y_train, batch_size=10, epochs=20, verbose=1, validation_data = (X_test, y_test))

Train on 1360 samples, validate on 240 samples
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20
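The network above ends in a 2-unit `sigmoid` layer trained with `categorical_crossentropy`. A common alternative for one-hot targets is a `softmax` output, because its outputs form a probability distribution over the classes. The sketch below only illustrates that difference on hypothetical values; whether swapping the activation would change the accuracy reported here has not been tested.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Subtract the row maximum for numerical stability.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical pre-activation outputs for two samples and two classes.
logits = np.array([[2.0, 1.0], [-1.0, 3.0]])
print(sigmoid(logits).sum(axis=1))  # independent per-unit scores, rows need not sum to 1
print(softmax(logits).sum(axis=1))  # each row sums to 1
```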


In [25]:
y_pred = m.predict(X_test)

In [26]:
accuracy = m.evaluate(X_test, y_test)[1]
print("Accuracy (old features): %.2f" % accuracy)

Accuracy (old features): 0.81


## Now let's try our own features with a different topology. Since we computed a 12-dimensional feature vector, it would make no sense to use layers with more than 100 neurons, as last year's group did.

In [17]:
X = np.genfromtxt('./features/features_wave_julian_sarah.csv', delimiter=',')
y = np.genfromtxt('./features/output_wave_julian_sarah.csv', delimiter=',')

p = np.random.permutation(len(X))
X, y = X[p], y[p]

training_ratio = 0.85
l = len(y)
X_train = X[:int(l*training_ratio)]
X_test = X[int(l*training_ratio):]
y_train = y[:int(l*training_ratio)]/2
y_test = y[int(l*training_ratio):]/2

y_train = keras.utils.np_utils.to_categorical(y_train.astype(int))
y_test = keras.utils.np_utils.to_categorical(y_test.astype(int))

In our case the feature vectors have 12 dimensions instead of 105. Let's first apply the NN directly, without PCA, so we can compare later.

In [18]:
m = Sequential()
m.add(Dense(15,  activation='relu', input_shape=(12,)))
m.add(Dense(15,  activation='relu'))
m.add(Dense(15,  activation='relu'))
m.add(Dense(4,  activation='relu'))
m.add(Dense(2,  activation='sigmoid'))
m.compile(loss='categorical_crossentropy', optimizer = Adam(), metrics=['accuracy'])

history = m.fit(X_train, y_train, batch_size=10, epochs=20, verbose=1, validation_data = (X_test, y_test))

Train on 1360 samples, validate on 240 samples
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


In [19]:
accuracy = m.evaluate(X_test, y_test)[1]
print("Accuracy (new features): %.2f" % accuracy)

Accuracy (new features): 0.88


# We got an accuracy of 88%, which is better than last year's 81%. However, we would need to average over several runs for a fair comparison.
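A single shuffle and a single training run give a noisy estimate, so one way to make the comparison fairer is to repeat the split several times and report the mean and spread. The sketch below assumes a hypothetical callback `fit_and_score(X_train, y_train, X_test, y_test)` that trains a model and returns its test accuracy. The majority-class baseline is only a stand-in so the example runs without Keras.

```python
import numpy as np

def mean_accuracy(fit_and_score, X, y, n_runs=10, training_ratio=0.85, seed=0):
    """Average test accuracy over several random train/test splits."""
    rng = np.random.default_rng(seed)
    n_train = int(len(X) * training_ratio)
    scores = []
    for _ in range(n_runs):
        p = rng.permutation(len(X))  # fresh shuffle for each run
        tr, te = p[:n_train], p[n_train:]
        scores.append(fit_and_score(X[tr], y[tr], X[te], y[te]))
    return float(np.mean(scores)), float(np.std(scores))

# Stand-in scorer: always predicts the majority training class.
# In practice this would build, fit, and evaluate the Keras model.
def majority_baseline(X_tr, y_tr, X_te, y_te):
    majority = np.bincount(y_tr).argmax()
    return float(np.mean(y_te == majority))

rng = np.random.default_rng(1)
X_demo = rng.normal(size=(100, 12))    # toy 12-dimensional features
y_demo = rng.integers(0, 2, size=100)  # toy binary labels
mean, std = mean_accuracy(majority_baseline, X_demo, y_demo)
print("accuracy: %.2f +/- %.2f" % (mean, std))
```

Running both feature sets through the same procedure would show whether a gap like 0.88 versus 0.81 is larger than the run-to-run variation.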

Let's now apply PCA to reduce the dimensionality to 3.

In [27]:
model_pca3 = PCA(n_components=3)

# Fit the model on the data
model_pca3.fit(X)

# Apply the fitted transform to our data:
X_reduced3 = model_pca3.transform(X)
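The cell above fits the PCA on all of `X`, including the rows that later become the test set. A common precaution is to fit the projection on the training rows only and then apply it to both splits. The sketch below shows this with a plain NumPy SVD in place of sklearn's `PCA`, purely to stay self-contained. The helper names `fit_pca` and `apply_pca` and the toy data are illustrative. With sklearn, the same idea would be `PCA(n_components=3).fit(X_train)` followed by `transform` on each split.

```python
import numpy as np

def fit_pca(X_train, n_components):
    """Return the training mean and the top principal directions."""
    mu = X_train.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(X_train - mu, full_matrices=False)
    return mu, Vt[:n_components]

def apply_pca(X, mu, components):
    """Project rows of X onto the fitted components."""
    return (X - mu) @ components.T

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 12))  # toy stand-in for the 12-dimensional features
n_train = int(len(X_demo) * 0.85)
mu, comps = fit_pca(X_demo[:n_train], n_components=3)
X_train_3 = apply_pca(X_demo[:n_train], mu, comps)
X_test_3 = apply_pca(X_demo[n_train:], mu, comps)  # test rows never influence the fit
print(X_train_3.shape, X_test_3.shape)
```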

In [28]:
training_ratio = 0.85
l = len(y)
X_train = X_reduced3[:int(l*training_ratio)]
X_test = X_reduced3[int(l*training_ratio):]
y_train = y[:int(l*training_ratio)]/2
y_test = y[int(l*training_ratio):]/2

y_train = keras.utils.np_utils.to_categorical(y_train.astype(int))
y_test = keras.utils.np_utils.to_categorical(y_test.astype(int))

In [29]:
m = Sequential()
m.add(Dense(20,  activation='relu', input_shape=(3,)))
#m.add(Dense(20,  activation='relu')) 
m.add(Dense(20,  activation='relu'))
m.add(Dense(20,  activation='relu'))
m.add(Dense(5,  activation='relu'))
m.add(Dense(2,  activation='sigmoid'))
m.compile(loss='categorical_crossentropy', optimizer = Adam(), metrics=['accuracy'])

history = m.fit(X_train, y_train, batch_size=10, epochs=20, verbose=1, validation_data = (X_test, y_test))

Train on 1360 samples, validate on 240 samples
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


Using PCA doesn't seem to improve the accuracy, but the accuracy seems to be more stable across epochs.
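The stability across epochs could be quantified rather than judged by eye, for example as the standard deviation of the validation accuracy once the first few epochs have passed. The sketch below assumes the per-epoch values come from `history.history`, where the key is `'val_acc'` or `'val_accuracy'` depending on the Keras version. The function name `epoch_stability` and the two curves are hypothetical and are not results from this notebook.

```python
import numpy as np

def epoch_stability(values, skip=5):
    """Standard deviation of a per-epoch metric after the first `skip` epochs."""
    tail = np.asarray(values, dtype=float)[skip:]
    return float(tail.std())

# Hypothetical validation-accuracy curves, NOT results from this notebook.
curve_a = [0.60, 0.70, 0.75, 0.80, 0.82, 0.85, 0.80, 0.86, 0.79, 0.85]
curve_b = [0.60, 0.70, 0.75, 0.80, 0.82, 0.83, 0.83, 0.84, 0.83, 0.84]
print(epoch_stability(curve_a), epoch_stability(curve_b))
```

A lower value means the metric fluctuates less from epoch to epoch, which is the behaviour described for the PCA-reduced input.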

## We conclude that the neural network gives better accuracy than our previous algorithms. Computing the features per wave instead of per balloon also seems to improve the neural network's accuracy. The only disadvantage is that we get less data this way. If little data is available, it might be better to use the features computed per balloon instead.