# Appendix - Comparing Christopher Zita's RNNs to a Naive Model

In this project, I found that simply predicting a three-game moving average of fantasy points for each player was only slightly less effective than using XGBoost or RNNs. Out of curiosity, I want to apply this naive prediction to the dataset used in the write-up that inspired this project.

In that article, Christopher Zita used multivariable RNNs to predict the 2018-2019 seasons of Tom Brady and Todd Gurley more accurately than most 'expert' fantasy predictions. As training data, he used up to seven previous seasons for each player (where they existed).

I will simply take the average fantasy points a player scored over his previous three games as the prediction for his next game. Let's see if this beats Zita.
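To make the baseline concrete, here is a minimal sketch of the `rolling(3).mean().shift()` pattern the prediction cell uses, run on toy numbers that are not from the dataset. The `shift()` matters: without it, each game's "prediction" would include that game's own score.

```python
import pandas as pd

# Hypothetical fantasy-point totals for six consecutive games (toy data, not from the dataset)
points = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0, 60.0])

# Mean of each game and the two games before it...
rolling_mean = points.rolling(3).mean()
# ...shifted down one row, so game t is predicted only from games t-3, t-2 and t-1
pred = rolling_mean.shift()

print(pd.DataFrame({'points': points, 'pred': pred}))
```

The first three predictions are `NaN` because there is not yet enough history, and the fourth game (40 points) is predicted as the mean of 10, 20 and 30, i.e. 20. This is why the loading step keeps three games from before the 2018-19 season.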

#### Import Packages

In [1]:
```python
import datetime as dt
import os

import numpy as np
import pandas as pd
from sklearn.metrics import mean_squared_error, mean_absolute_error
```

#### Read in Data

In [2]:
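```python
players = {}
split_date = dt.datetime(2018, 5, 1)  # games after this date belong to the 2018-19 season
path = '../data/appendix_zita_naive'

for filename in os.listdir(path):
    if filename.endswith('.csv'):
        tmp_df = pd.read_csv(os.path.join(path, filename))
        tmp_df['Date'] = pd.to_datetime(tmp_df['Date'])
        # Index label of the first 2018-19 game; read_csv's default RangeIndex
        # means this label equals the row position that iloc uses below
        season_idx = np.min(tmp_df.loc[tmp_df['Date'] > split_date].index)
        # Keep only the 2018-19 season plus the 3 games before it (for the moving average);
        # .copy() avoids a SettingWithCopyWarning when the 'pred' column is added later
        tmp_df = tmp_df.iloc[season_idx - 3:, :].copy()

        players[os.path.splitext(filename)[0]] = tmp_df

for key, value in players.items():
    print('{}:'.format(key))
    print(value.head())
```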

```
gurley:
    Rk  G#       Date   Tm Unnamed: 4  Opp   Result Pos   Att   Yds  ...  \
41  13  13 2017-12-10  LAR        NaN  PHI  L 35-43  HB   4.0  15.0  ...   
42  14  14 2017-12-17  LAR          @  SEA   W 42-7  HB   6.0  11.0  ...   
43  15  15 2017-12-24  LAR          @  TEN  W 27-23  HB   4.0  15.0  ...   
44   1   1 2018-09-10  LAR          @  OAK  W 33-13  RB   5.0  19.0  ...   
45   2   2 2018-09-16  LAR        NaN  ARI   W 34-0  HB  10.0  33.0  ...   

    Tgt.1  Rec.1  Yds.3  TD.3  Num     Pct  Num.1  FantPt  DKPt FDPt  
41    0.0      0    0.0     0   45  93.80%   25.5    28.5  27.0  NaN  
42    0.0      0    0.0     0   43  63.20%   42.0    48.0  43.5  NaN  
43    1.0      1    3.0     1   62  92.50%   39.6    55.6  44.6  NaN  
44    0.0      0    0.0     0   59  93.70%   20.7    26.7  22.2  NaN  
45    0.0      0    0.0     0   49  68.10%   29.3    32.3  30.8  NaN  

[5 rows x 28 columns]
brady:
     Rk  G#       Date   Tm Unnamed: 4  Opp   Result Pos  Cmp  Att  ...  \
105 
```
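The loading cell finds the first 2018-19 game by index label and then slices by position with `iloc`, which works because `read_csv` assigns a default `RangeIndex`. As an alternative, here is a minimal position-based sketch of the same step. The helper `season_with_warmup` and the toy schedule are hypothetical. It also clamps the start at zero, because a negative start would make `iloc` count from the end of the frame for a player with fewer than three earlier games.

```python
import datetime as dt

import pandas as pd


def season_with_warmup(df, split_date, warmup=3):
    """Return the rows dated after split_date plus `warmup` earlier rows.

    Works on row positions rather than index labels, so any index is fine.
    """
    after = (df['Date'] > split_date).to_numpy()
    if not after.any():
        return df.iloc[0:0].copy()  # no games after the split date
    first_pos = int(after.argmax())  # position of the first game after split_date
    start = max(first_pos - warmup, 0)  # never let the start go negative
    return df.iloc[start:].copy()


# Toy schedule with a non-default index (hypothetical dates and points)
toy = pd.DataFrame(
    {
        'Date': pd.to_datetime(['2017-12-10', '2017-12-17', '2017-12-24',
                                '2017-12-31', '2018-09-10', '2018-09-16']),
        'FantPt': [25.5, 42.0, 39.6, 10.0, 20.7, 29.3],
    },
    index=[10, 11, 12, 13, 14, 15],
)

window = season_with_warmup(toy, dt.datetime(2018, 5, 1))
print(window)
```

Here the window keeps the last three 2017 games as warm-up plus the two 2018 games, even though the index starts at 10.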

#### Make Predictions

In [3]:
```python
for key, value in players.items():
    # Predict each game as the mean fantasy points of the previous three games
    value['pred'] = value['FantPt'].rolling(3).mean().shift()
    # Score only 2018-19 games; the 3 warm-up games exist only to seed the average
    value = value.loc[value['Date'] > split_date]
    rmse = np.sqrt(mean_squared_error(value['FantPt'], value['pred']))
    mae = mean_absolute_error(value['FantPt'], value['pred'])
    print('{}:'.format(key))
    print("RMSE: {:.2f}\nMAE: {:.2f}".format(rmse, mae))
```

```
gurley:
RMSE: 11.58
MAE: 9.96
brady:
RMSE: 7.95
MAE: 6.74
```
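For reference, both metrics are simple functions of the per-game errors. The sketch below, using hypothetical toy values rather than the players' data, computes them by hand with NumPy and checks them against scikit-learn. RMSE is never smaller than MAE, because squaring weights large misses more heavily, which is consistent with both players' RMSE exceeding their MAE in the output above.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Toy actual vs predicted fantasy points (hypothetical values)
actual = np.array([20.0, 30.0, 10.0, 25.0])
pred = np.array([24.0, 27.0, 16.0, 25.0])

errors = actual - pred                 # [-4, 3, -6, 0]
mae = np.mean(np.abs(errors))          # (4 + 3 + 6 + 0) / 4 = 3.25
rmse = np.sqrt(np.mean(errors ** 2))   # sqrt((16 + 9 + 36 + 0) / 4) = sqrt(15.25)

# The hand-rolled values match scikit-learn's implementations
assert np.isclose(mae, mean_absolute_error(actual, pred))
assert np.isclose(rmse, np.sqrt(mean_squared_error(actual, pred)))
print(f"MAE: {mae:.2f}, RMSE: {rmse:.2f}")
```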


For Brady, Zita's RNN achieved an MAE of 5.3, while the best expert predictions achieved 4.9. For Gurley, Zita's RNN was the strongest predictor, with an MAE of 5.68.

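So Zita's RNN improves substantially on the naive moving-average baseline: its MAE is 1.44 points lower for Brady (5.3 vs. 6.74) and 4.28 points lower for Gurley (5.68 vs. 9.96).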
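Expressed as relative reductions, here is a small sketch using only the MAE figures quoted in this appendix. It assumes Zita's numbers were computed on the same 2018-19 games as the naive baseline.

```python
# MAE figures quoted in this appendix: naive baseline (computed above) vs Zita's RNN
mae = {
    'brady': {'naive': 6.74, 'rnn': 5.3},
    'gurley': {'naive': 9.96, 'rnn': 5.68},
}

reduction = {}
for player, scores in mae.items():
    # Fraction by which the RNN lowers MAE relative to the naive baseline
    reduction[player] = (scores['naive'] - scores['rnn']) / scores['naive']
    print(f"{player}: RNN MAE is {reduction[player]:.0%} lower than the naive baseline")
```

This prints roughly 21% for Brady and 43% for Gurley, so the RNN's edge over the naive baseline is about twice as large for Gurley.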