<a href="https://colab.research.google.com/github/r-wisniewski/xG-LinearModel/blob/main/AHL_xG_LinearModel.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Acknowledgements

This project is heavily based on the linear model found here: https://www.tensorflow.org/tutorials/estimator/linear

In [None]:
%tensorflow_version 2.x

!pip install -q scikit-learn

#from __future__ import absolute_import, division, print_function, unicode_literals

import tensorflow as tf
import matplotlib.pyplot as plt # Dataset visualization.
import numpy as np              # Low-level numerical Python library.
import pandas as pd             # Higher-level numerical Python library.
from IPython.display import clear_output

3.6.9 (default, Oct  8 2020, 12:12:24) 
[GCC 8.4.0]




---
# Import scraped data

This step imports event data, scraped with ahl_scraper.py, from CSV files. Once imported, the labels are separated from the feature data.


In [None]:
# import training and test data from CSV files
# use the raw-file URLs: the github.com/.../blob/... page URLs return an HTML page rather than the CSV data
# training data is from game IDs 1017122 to 1020000
xg_df_train = pd.read_csv('https://raw.githubusercontent.com/r-wisniewski/xG-LinearModel/main/training.csv')
# testing data is from game IDs 1020001 to 1020558; 1020558 is the latest game we have data for
xg_df_testing = pd.read_csv('https://raw.githubusercontent.com/r-wisniewski/xG-LinearModel/main/testing.csv')

# separate the labels and data
y_train = xg_df_train.pop('Goal')
y_testing = xg_df_testing.pop('Goal')
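
`pd.read_csv` needs a URL that serves the file's raw contents, whereas a GitHub `/blob/` URL serves the HTML page that displays the file. The helper below is a minimal sketch of converting one form to the other. The name `github_blob_to_raw` is hypothetical and not part of this project, and the sketch assumes the usual `raw.githubusercontent.com/<user>/<repo>/<branch>/<path>` layout.

```python
def github_blob_to_raw(url):
    """Convert a github.com '/blob/' page URL to its raw.githubusercontent.com form."""
    prefix = "https://github.com/"
    if not url.startswith(prefix) or "/blob/" not in url:
        return url  # not a GitHub blob URL, so leave it untouched
    # "user/repo/blob/branch/path" -> ("user/repo", "branch/path")
    user_repo, branch_and_path = url[len(prefix):].split("/blob/", 1)
    return f"https://raw.githubusercontent.com/{user_repo}/{branch_and_path}"


raw_url = github_blob_to_raw(
    "https://github.com/r-wisniewski/xG-LinearModel/blob/main/training.csv"
)
print(raw_url)
```

The returned string can then be passed straight to `pd.read_csv`.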

### Check the imported data

Let's have a quick look at the data that has been imported.

In [None]:
# check the first few rows of the training features and labels
xg_df_train.head()
y_train.head()

# check the first few rows of the testing features and labels
xg_df_testing.head()
y_testing.head()



---

# Let's train!
Now that both the training and test data have been collected, let's train the model!

Since all the data is numerical, there is no need to convert any categorical data to numerical. Newer AHL game summaries are beginning to track shot types; once enough of those summaries are available, shot type could be added as a feature.
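
If shot type were added later, it would be a categorical feature and would need a numerical encoding. Below is a minimal sketch of one-hot encoding in plain Python. The shot-type vocabulary is hypothetical (the labels used in AHL game summaries may differ), and `one_hot_shot_type` is an illustrative name only.

```python
# Hypothetical vocabulary; the real game summaries may use different labels.
SHOT_TYPES = ["wrist", "slap", "snap", "backhand", "tip"]


def one_hot_shot_type(shot_type, vocabulary=SHOT_TYPES):
    """Return one float per known shot type: 1.0 for a match, 0.0 otherwise.

    An unknown shot type yields a vector of all zeros.
    """
    return [1.0 if shot_type == known else 0.0 for known in vocabulary]


encoded = one_hot_shot_type("snap")
print(encoded)
```

With the feature-column API used in this notebook, `tf.feature_column.categorical_column_with_vocabulary_list` would likely play the same role, so the encoding would not need to be written by hand.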

### Create the input function

In [None]:
# data_df = all data in table form, label_df = all associated labels in table form
def make_input_fn(data_df, label_df, num_epochs=10, shuffle=True, batch_size=32):
  def input_function():  # inner function, this will be returned
    # Convert the pandas dataframe into a tf.data.Dataset object. We want to convert our pandas "table" to this new object type before processing.
    ds = tf.data.Dataset.from_tensor_slices((dict(data_df), label_df))  # create tf.data.Dataset object with data and its associated label
    if shuffle:
      ds = ds.shuffle(1000)  # randomize order of data
    ds = ds.batch(batch_size).repeat(num_epochs)  # split the dataset into batches of batch_size rows, then repeat it for num_epochs epochs
    return ds  # return the batched, repeated dataset
  return input_function  ####### return a function object for use ######

train_input_fn = make_input_fn(xg_df_train, y_train)  # returns the input function; the estimator calls it to get a dataset object to feed to the model
eval_input_fn = make_input_fn(xg_df_testing, y_testing, num_epochs=1, shuffle=False) # we aren't training it here anymore so 1 epoch and no shuffling
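
To make the shuffle, batch, repeat order concrete, here is a pure-Python analogue of the pipeline. It is a sketch only, not how `tf.data` is implemented; the `batches` generator and its `seed` parameter are hypothetical. One difference worth noting: this sketch shuffles the full epoch, while `ds.shuffle(1000)` shuffles through a 1000-element buffer, so the real pipeline is only a full shuffle for datasets of up to 1000 rows.

```python
import random


def batches(rows, num_epochs=10, shuffle=True, batch_size=32, seed=0):
    """Yield lists of rows, mimicking shuffle -> batch -> repeat."""
    rng = random.Random(seed)
    for _ in range(num_epochs):
        epoch_rows = list(rows)
        if shuffle:
            rng.shuffle(epoch_rows)  # a fresh order for every epoch
        # Batching happens before repeating, so each epoch ends with its
        # own (possibly smaller) final batch.
        for start in range(0, len(epoch_rows), batch_size):
            yield epoch_rows[start:start + batch_size]


all_batches = list(batches(range(100), num_epochs=2, batch_size=32))
batch_sizes = [len(batch) for batch in all_batches]
print(batch_sizes)
```

With 100 rows and a batch size of 32, each epoch produces three full batches and one batch of 4.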

In [None]:
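# 'Goal' is the label (popped from the dataframes earlier), so it is not listed as a feature
feature_names = ['XLocation', 'YLocation', 'Strength']
# the estimator expects feature column objects rather than bare column names
feature_columns = [tf.feature_column.numeric_column(name) for name in feature_names]
linear_est = tf.estimator.LinearClassifier(feature_columns=feature_columns)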

linear_est.train(train_input_fn)  # uses the passed function "train_input_fn" to grab data and train the model
result = linear_est.evaluate(eval_input_fn)  # get model metrics/stats by running the model on testing data

# let's see how accurate the model is
print(result['accuracy'])
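
When reading the accuracy figure, it may help to compare it with a trivial baseline that always predicts "no goal", since most shots are not goals. The sketch below uses toy labels invented for illustration; they are not taken from the AHL data, and `accuracy` is a hypothetical helper.

```python
def accuracy(labels, predictions):
    """Fraction of predictions that match the labels."""
    matches = sum(int(y == p) for y, p in zip(labels, predictions))
    return matches / len(labels)


# Toy labels for illustration only (1 = goal, 0 = no goal).
toy_labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
always_no_goal = [0] * len(toy_labels)

baseline = accuracy(toy_labels, always_no_goal)
print(baseline)
```

On the real data, the same comparison could be made against `1 - y_testing.mean()`, which is the accuracy of always predicting "no goal", assuming `Goal` is coded as 0 or 1.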

# Let's make some predictions

We can predict the expected goals of an event (x, y, strength) with the `.predict()` method, then use those predictions to generate a heat map for each strength.
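
For intuition, a linear classifier with numeric features turns an event into a goal probability by passing a weighted sum of the features through the logistic (sigmoid) function. The sketch below shows that calculation in plain Python. The weights and bias are hypothetical values chosen only so the example runs; they are not the trained model's coefficients.

```python
import math


def goal_probability(x, y, strength, weights, bias):
    """Logistic function applied to a weighted sum of the three features."""
    logit = (
        weights["XLocation"] * x
        + weights["YLocation"] * y
        + weights["Strength"] * strength
        + bias
    )
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical coefficients for illustration, not learned values.
toy_weights = {"XLocation": 0.0, "YLocation": 0.0, "Strength": 0.0}
p = goal_probability(50, 20, 1, toy_weights, bias=0.0)
print(p)
```

With all weights and the bias at zero the logit is 0, so the probability is exactly 0.5; a positive weight raises the probability as its feature grows.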

In [None]:
# import prediction data from csv (raw-file URL; the github.com/.../blob/... page URL returns HTML)
xg_df_predict = pd.read_csv('https://raw.githubusercontent.com/r-wisniewski/xG-LinearModel/main/prediction.csv')

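# generate the prediction input function
# predict() ignores labels, so pass placeholder zeros with one entry per prediction row
# (passing y_testing would fail if its length differs from xg_df_predict)
placeholder_labels = pd.Series(np.zeros(len(xg_df_predict)))
pred_input_fn = make_input_fn(xg_df_predict, placeholder_labels, num_epochs=1, shuffle=False)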

# predict() returns a generator; convert it to a list, or step through it with next() instead
pred_dicts = list(linear_est.predict(pred_input_fn))  # one prediction dict (goal or no goal) for EACH input row
probs = pd.Series([pred['probabilities'][1] for pred in pred_dicts])  # element [1] of each probabilities array is the chance that the event is a goal; collect these in a pd.Series
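
Before plotting a heat map per strength, the predicted probabilities need to be grouped by strength and keyed by location. The sketch below shows one way to do that in plain Python. The helper `probabilities_by_strength` is hypothetical, and the toy rows and probabilities are invented for illustration rather than taken from the model output.

```python
from collections import defaultdict


def probabilities_by_strength(rows, probs):
    """Group predicted probabilities into {strength: {(x, y): probability}}."""
    grids = defaultdict(dict)
    for (x, y, strength), p in zip(rows, probs):
        grids[strength][(x, y)] = p
    return dict(grids)


# Toy (XLocation, YLocation, Strength) rows and probabilities.
toy_rows = [(10, 5, 0), (20, 5, 0), (10, 5, 1)]
toy_probs = [0.05, 0.12, 0.09]

grids = probabilities_by_strength(toy_rows, toy_probs)
print(grids)
```

With the notebook's objects, the rows could come from `xg_df_predict[['XLocation', 'YLocation', 'Strength']].itertuples(index=False)` paired with `probs`, assuming those three columns are present in `prediction.csv`.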