In [4]:
def add_first(n):
    return sum(x for x in range(n))

In [5]:
add_first(5)

10

In [8]:
total = 0  # named "total" so the built-in sum() used by add_first is not shadowed
n = 5
for x in range(n):
    total += x

In [9]:
total

10

### Bonus: A Shallow Dive into Deep Learning

To put what I've learned in the Deep Learning Unit into practice, I will attempt to improve upon my Gradient Boosting model from the Machine Learning notebook.

In [1]:
# Import pandas for reading in the data and matplotlib for plotting
import pandas as pd
import matplotlib.pyplot as plt

The initial setup is the same as before.

In [2]:
matches = pd.read_csv('Resources/tennis_clean/atp_top_100_matches.csv',
                      index_col=['player_id', 'tournament_id', 'match_id'],
                      parse_dates=['tournament_date'], low_memory=False)

In [3]:
matches['points_diff'] = matches.points_won - (matches.points - matches.points_won)
matches['ranking_points_diff'] = matches.ranking_points - matches.opponent_ranking_points
matches['rank_diff'] = matches.opponent_rank - matches.player_rank
matches['height_diff'] = matches.player_height - matches.opponent_height

In [4]:
recent_matches = matches[(matches['tournament_date'] > '2009') & (matches['tournament_date'] < '2019')]
test_matches = matches[matches['tournament_date'] > '2019']

In [5]:
COLUMNS = ['ranking_points_diff', 'rank_diff',
       'recent_first_serve_percentage', 'recent_break_points_save_percentage', 'recent_service_points_won_percentage', 
       'recent_return_points_won_percentage',
       'recent_first_serves_won_percentage', 'recent_second_serves_won_percentage',
       'recent_first_serve_return_points_won_percentage',
       'recent_second_serve_return_points_won_percentage', 
       'recent_break_points_won_percentage',
       'recent_points_won_percentage',
       'past_year_first_serve_percentage', 'past_year_break_points_save_percentage', 
       'past_year_service_points_won_percentage', 
       'past_year_return_points_won_percentage',
       'past_year_first_serves_won_percentage', 'past_year_second_serves_won_percentage',
       'past_year_first_serve_return_points_won_percentage',
       'past_year_second_serve_return_points_won_percentage', 
       'past_year_break_points_won_percentage',
       'past_year_points_won_percentage',
       'career_first_serve_percentage', 'career_break_points_save_percentage', 'career_service_points_won_percentage', 
       'career_return_points_won_percentage',
       'career_first_serves_won_percentage', 'career_second_serves_won_percentage',
       'career_first_serve_return_points_won_percentage',
       'career_second_serve_return_points_won_percentage', 
       'career_break_points_won_percentage',
       'career_points_won_percentage', 'h2h', 'winrate', 'age_diff', 'result_value']

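To create the neural network model, I will use the Keras library, a high-level API that runs on top of a backend such as TensorFlow or Theano (Theano in this run, as the cell output shows). The "categorical_crossentropy" loss used later expects the result column in a one-hot encoded format, and Keras provides the function "to_categorical" for this purpose.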

In [6]:
from keras.utils import to_categorical

# Drop incomplete rows once, then split the features from the target
train = recent_matches[COLUMNS].dropna()
X_train = train.drop('result_value', axis=1)
y_train = train.result_value

test = test_matches[COLUMNS].dropna()
X_test = test.drop('result_value', axis=1)
y_test = test.result_value

# One-hot encode the targets for the softmax output layer
y_train_cat = to_categorical(y_train)
y_test_cat = to_categorical(y_test)

Using Theano backend.
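As a rough illustration of what "to_categorical" produces, the sketch below one-hot encodes integer labels in plain Python. The helper name `one_hot` is hypothetical, and the example assumes that `result_value` holds 0 for a loss and 1 for a win, which this notebook does not state explicitly.

```python
def one_hot(labels, num_classes=None):
    """One-hot encode integer class labels (hypothetical helper, not the Keras API)."""
    labels = [int(label) for label in labels]
    if num_classes is None:
        # Infer the number of classes from the largest label
        num_classes = max(labels) + 1
    return [[1.0 if cls == label else 0.0 for cls in range(num_classes)]
            for label in labels]


# Assumed encoding: 0 = loss, 1 = win
encoded = one_hot([0, 1, 1, 0])
print(encoded)
```

Each row has exactly one 1.0, in the column of its class, which is the shape a two-node output layer is trained against.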


The data must first be scaled to improve accuracy. The test data is not scaled independently; it is transformed with the statistics fitted on the training data. I tested both the StandardScaler and the MinMaxScaler, and the StandardScaler performed better and converged in fewer epochs.

In [7]:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
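To make the fit-on-train, transform-on-test idea concrete, here is a minimal plain-Python sketch of standardization for a single column. The helper names and the sample values are hypothetical; the only claim carried over is that the mean and standard deviation come from the training data alone.

```python
from statistics import fmean, pstdev


def fit_standardizer(column):
    """Learn the mean and population standard deviation from a training column."""
    return fmean(column), pstdev(column)


def standardize(column, mean, std):
    """Scale values using statistics that were learned elsewhere."""
    return [(value - mean) / std for value in column]


train_rank_diff = [10.0, 20.0, 30.0]  # hypothetical training values
test_rank_diff = [20.0, 40.0]         # hypothetical test values

# The statistics come from the training column only
mean, std = fit_standardizer(train_rank_diff)
train_scaled = standardize(train_rank_diff, mean, std)
test_scaled = standardize(test_rank_diff, mean, std)
print(train_scaled, test_scaled)
```

The scaled training column has mean 0 and standard deviation 1. The scaled test column generally does not, and that is expected: no information from the test set leaks into the preprocessing.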

To build the neural network, I used a simple sequential model with just one hidden layer containing 100 nodes. I tested many different configurations, but adding layers and nodes did not improve the final test accuracy score. In my tests, the Rectified Linear Unit or "ReLU" activation function performed better than S-shaped (sigmoidal) activation functions like "tanh". Because the labels are one-hot encoded, the output layer has 2 nodes, representing a win or a loss. The "softmax" activation function on the output layer converts the outputs to probabilities and ensures that they add up to 1.

In [8]:
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()

model.add(Dense(100, activation='relu', input_dim=X_train_scaled.shape[1]))
model.add(Dense(2, activation='softmax'))
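The softmax step can be sketched in a few lines of plain Python. The raw scores below are hypothetical values standing in for the two output nodes; the sketch also checks that, with two classes, softmax gives the same win probability as a sigmoid applied to the difference of the two scores.

```python
import math


def softmax(scores):
    """Convert raw output scores into probabilities that sum to 1."""
    largest = max(scores)
    # Subtracting the largest score avoids overflow and does not change the result
    exps = [math.exp(score - largest) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def sigmoid(z):
    """Logistic function, shown here only for comparison."""
    return 1.0 / (1.0 + math.exp(-z))


loss_score, win_score = 0.3, 1.1  # hypothetical raw outputs of the two nodes
p_loss, p_win = softmax([loss_score, win_score])
print(p_loss, p_win, sigmoid(win_score - loss_score))
```

This equivalence is why a single sigmoid output node with a binary cross-entropy loss is a common alternative to the two-node softmax layer used here.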

The model was compiled with the "adam" optimizer. I also chose the "categorical_crossentropy" loss function, the standard choice for classification with one-hot encoded targets.

In [9]:
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
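For intuition about the loss, the following sketch computes categorical cross-entropy by hand for one-hot targets. The function is a simplified stand-in rather than the Keras implementation, and the predictions are hypothetical.

```python
import math


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean cross-entropy between one-hot targets and predicted probabilities."""
    losses = []
    for target_row, pred_row in zip(y_true, y_pred):
        # Clip each probability so that log(0) can never occur
        losses.append(-sum(t * math.log(min(max(p, eps), 1 - eps))
                           for t, p in zip(target_row, pred_row)))
    return sum(losses) / len(losses)


y_true = [[0.0, 1.0], [1.0, 0.0]]     # one-hot targets for two matches
confident = [[0.1, 0.9], [0.8, 0.2]]  # hypothetical predictions, mostly right
unsure = [[0.5, 0.5], [0.5, 0.5]]     # hypothetical coin-flip predictions

print(categorical_crossentropy(y_true, confident))
print(categorical_crossentropy(y_true, unsure))
```

Coin-flip predictions on two classes score ln 2 (about 0.693), which is a useful baseline when reading the loss values reported during training.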

When fitting the model, I used two callbacks provided by Keras. The EarlyStopping callback stops training once the validation loss has not improved for five consecutive epochs. The ModelCheckpoint callback saves the model with the best validation accuracy score. I experimented with splitting the training data to create an additional validation set, but setting the test data as the validation set optimizes for test set accuracy. The trade-off is that the checkpoint is then selected on the test data itself, so the reported test accuracy is likely somewhat optimistic.

In [17]:
from keras.callbacks import EarlyStopping, ModelCheckpoint

# Stop once the validation loss has not improved for five consecutive epochs
es = EarlyStopping(monitor='val_loss', patience=5)
# Save the model whenever the validation accuracy reaches a new best
mc = ModelCheckpoint('best_model.h5', monitor='val_accuracy', save_best_only=True)

history = model.fit(X_train_scaled, y_train_cat,
                    epochs=10,
                    callbacks=[es, mc],
                    batch_size=32,
                    validation_data=(X_test_scaled, y_test_cat),
                    shuffle=True)

Train on 53676 samples, validate on 4996 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
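The patience rule behind early stopping can be sketched as a small loop. This is a simplified stand-in for the callback's logic, not the Keras source, and the validation losses are hypothetical values chosen for illustration.

```python
def early_stopping_epoch(val_losses, patience=5):
    """Return the 1-based epoch after which training stops, or None if it never stops early."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            # New best validation loss: reset the patience counter
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Hypothetical validation losses for a 10-epoch run
hypothetical_val_losses = [0.610, 0.612, 0.615, 0.613, 0.618,
                           0.620, 0.625, 0.630, 0.640, 0.650]
print(early_stopping_epoch(hypothetical_val_losses, patience=5))
```

With these made-up values the best loss occurs in the first epoch, so five non-improving epochs follow and the loop stops after epoch 6 of 10.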


An evaluation of the best model shows a test accuracy of 66.5%. This outperforms any of my previous Machine Learning models, even after hyperparameter tuning. 

In [11]:
from keras.models import load_model

best_model = load_model('best_model.h5')
_, train_accuracy = best_model.evaluate(X_train_scaled, y_train_cat)
_, test_accuracy = best_model.evaluate(X_test_scaled, y_test_cat)
train_accuracy, test_accuracy



(0.7016729712486267, 0.6651321053504944)
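The accuracy reported by "evaluate" is the share of matches where the most probable class equals the true class. A minimal sketch with hypothetical predictions (the helper below is illustrative and not part of Keras):

```python
def categorical_accuracy(y_true, y_pred):
    """Share of rows where the most probable class matches the one-hot target."""
    def argmax(row):
        return max(range(len(row)), key=row.__getitem__)

    hits = sum(argmax(target) == argmax(pred)
               for target, pred in zip(y_true, y_pred))
    return hits / len(y_true)


y_true = [[0, 1], [1, 0], [0, 1], [1, 0]]  # one-hot targets
y_pred = [[0.30, 0.70], [0.60, 0.40],      # first two rows predicted correctly
          [0.55, 0.45], [0.20, 0.80]]      # last two rows predicted incorrectly
print(categorical_accuracy(y_true, y_pred))
```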

I do not believe that the model has much room for improvement. To check this, I trained the model for 100 epochs and plotted its history. As shown below, the test loss steadily increases after only a few epochs. Similarly, the test accuracy decreases after only a few epochs, then levels off. Even at a training accuracy of 75%, the model is overfitting.

The presence of upsets in the match data throws off any attempt at finding general patterns in the features. Increasing the training accuracy further can only reduce the model's ability to generalize to the test data. To improve the model at this point, I would need to add new features that generalize more effectively when the training data contains upsets.

Loss | Accuracy
- | -
![loss](epoch-loss.png) | ![accuracy](epoch-accuracy.png)
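The same reading of the curves can be done numerically on the dictionary that Keras stores in `history.history`. The sketch below assumes the metric keys are named "accuracy" and "val_accuracy" (older Keras versions use "acc" and "val_acc"), and every number in it is hypothetical rather than taken from the 100-epoch run.

```python
def summarize_history(history):
    """Find the epoch with the lowest validation loss and the final train/validation gap."""
    val_loss = history["val_loss"]
    best = min(range(len(val_loss)), key=val_loss.__getitem__)
    return {
        "best_epoch": best + 1,  # epochs are reported 1-based
        "val_accuracy_at_best": history["val_accuracy"][best],
        "final_gap": history["accuracy"][-1] - history["val_accuracy"][-1],
    }


# Hypothetical values shaped like an overfitting run
hypothetical_history = {
    "accuracy": [0.66, 0.69, 0.72, 0.75],
    "val_accuracy": [0.665, 0.660, 0.655, 0.650],
    "loss": [0.61, 0.58, 0.55, 0.52],
    "val_loss": [0.610, 0.615, 0.630, 0.650],
}
summary = summarize_history(hypothetical_history)
print(summary)
```

A widening gap between training and validation accuracy, combined with an early best epoch, is the overfitting pattern described above.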

### Conclusions

- Neural networks are not only useful for image and audio recognition. They can also be a powerful tool for classification on tabular data.
- Simple models work well for simple data. Complexity should be added only when needed.
- Beware of overfitting. Longer training time does not mean better accuracy on unseen data.
- Utilize callbacks to save the best model, and to stop training early.
- If you're stuck, it might be time to look at the dataset's features.