# Basketball Predictions
In this workshop we will train a machine learning model to predict the outcomes of basketball games from the results of previous games.

First, paste in the name of the S3 bucket that you created earlier.

In [None]:
bucket = 'YOUR BUCKET NAME HERE'
prefix =  bucket + '/DEMO-linear-dm'

We need to import some packages for our script to work. These include pandas, which makes it easier to work with our data, and boto3, which is the Python SDK for AWS.

In [None]:
# Import the libraries used throughout the notebook
import boto3
import re
from sagemaker import get_execution_role
import numpy as np                                # For matrix operations and numerical processing
import pandas as pd                               # For munging tabular data
from IPython.display import display               # For displaying outputs in the notebook
from time import gmtime, strftime                 # For labeling SageMaker models, endpoints, etc.
import sys                                        # For writing outputs to notebook
import math                                       # For ceiling function
import json                                       # For parsing hosting outputs
import os                                         # For manipulating filepath names
import io
import sagemaker.amazon.common as smac
import sagemaker                                  # Amazon SageMaker's Python SDK provides many helper functions
from sagemaker.predictor import csv_serializer    # Converts strings for HTTP POST requests on inference

Next comes a little bit of setup. We fetch the current execution role, which we will need later when we train the model.

We also upload a local file to S3 and then read it back so that we can work on it.
In a production environment you wouldn't want to make this round trip, but the code is here to show how to read files from S3.

In [None]:
role = get_execution_role()
filename = 'basketball_predictions_112119.csv'
boto3.client('s3').upload_file(filename, bucket, filename)

s3 = boto3.resource('s3')
csv_object = s3.Object(bucket, filename)

Here we load the contents of the CSV file into a pandas DataFrame. Then we set some display options so that we can view the data easily.

In [None]:
data = pd.read_csv(csv_object.get()['Body'])
pd.set_option('display.max_columns', 5)     # Make sure we can see all of the columns
pd.set_option('display.max_rows', 15)       # Keep the output on one page
data

We need to ensure the data is in the format that our training algorithm expects, so we cast every column to float.

In [None]:
model_data = data.astype(float)
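
As a rough illustration of what the load-and-cast steps do, the sketch below parses a tiny in-memory CSV with the standard library and converts every field to float. Only the column name 'point_diff (N)' comes from this workshop. The other column names and all of the values are invented for the example, and the real file may look different.

```python
import csv
import io

# Hypothetical sample: only 'point_diff (N)' is a column name taken from the workshop.
SAMPLE_CSV = (
    b"team1_elo,team2_elo,team1_last10,team2_last10,point_diff (N)\n"
    b"1500,1450,0.6,0.4,7\n"
    b"1380,1520,0.3,0.7,-12\n"
)

def load_as_floats(stream):
    """Read CSV bytes from a file-like object and cast every field to float."""
    text = io.TextIOWrapper(stream, encoding="utf-8", newline="")
    reader = csv.DictReader(text)
    # float() raises ValueError on non-numeric text, much like astype(float) would.
    return [{name: float(value) for name, value in row.items()} for row in reader]

rows = load_as_floats(io.BytesIO(SAMPLE_CSV))
print(rows[0])
```

If any field cannot be parsed as a number, the conversion fails loudly instead of passing text through to the training step.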

Here we set up our training X and Y values.

The Y value is the value to be predicted. In this case it is the point differential between the two teams. A positive point differential indicates a victory for team 1, whereas a negative point differential indicates a victory for team 2.

The X values are the features the model looks at when attempting to predict a winner. In this case they are all of the data except for the point differential itself.

In [None]:
train_y = model_data['point_diff (N)']
train_X = model_data.drop('point_diff (N)', axis=1)
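
The split and the sign convention described above can be sketched without pandas as follows. The feature names and values are hypothetical, and the helper names `split_features_target` and `winner` are introduced here only for illustration.

```python
TARGET = "point_diff (N)"  # column name used in the workshop

def split_features_target(rows, target=TARGET):
    """Return (X, y): X holds every column except the target, y holds the target."""
    X = [[value for name, value in row.items() if name != target] for row in rows]
    y = [row[target] for row in rows]
    return X, y

def winner(point_diff):
    """Map a point differential to the winning side, following the sign convention above."""
    if point_diff > 0:
        return "team 1"
    if point_diff < 0:
        return "team 2"
    return "tie"  # a zero differential is not covered by the workshop text

# Hypothetical rows; feature names and values are invented for illustration.
rows = [
    {"team1_elo": 1500.0, "team2_elo": 1450.0, TARGET: 7.0},
    {"team1_elo": 1380.0, "team2_elo": 1520.0, TARGET: -12.0},
]
X, y = split_features_target(rows)
print(X, y, [winner(d) for d in y])
```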

Next we retrieve the prebuilt SageMaker container that corresponds to the desired training algorithm. In this case it is "linear-learner", SageMaker's Linear Learner algorithm, which we will use for linear regression.

In [None]:
from sagemaker.amazon.amazon_estimator import get_image_uri
container = get_image_uri(boto3.Session().region_name, 'linear-learner')

Here we convert the training set that we created earlier into the RecordIO-protobuf format that Linear Learner expects, and then upload it to S3.

In [None]:
buf = io.BytesIO()
smac.write_numpy_to_dense_tensor(buf, np.array(train_X).astype('float32'), np.array(train_y).astype('float32'))
buf.seek(0)
key = 'linear_train.data'
boto3.resource('s3').Bucket(bucket).Object(os.path.join(prefix, 'train', key)).upload_fileobj(buf)
s3_train_data = 's3://{}/{}/train/{}'.format(bucket, prefix, key)
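
Because prefix already contains the bucket name, the bucket name appears twice in the resulting S3 URI. The sketch below rebuilds the object key and the URI with plain string operations to show that the two still point at the same object. The bucket name 'my-example-bucket' is a placeholder, and `train_data_location` is a helper introduced only for this illustration.

```python
import posixpath

def train_data_location(bucket, key="linear_train.data"):
    """Rebuild the object key and S3 URI the same way the cells above do."""
    prefix = bucket + "/DEMO-linear-dm"                # as defined in the first cell
    object_key = posixpath.join(prefix, "train", key)  # what os.path.join yields on Linux
    uri = "s3://{}/{}/train/{}".format(bucket, prefix, key)
    return object_key, uri

# 'my-example-bucket' is a placeholder, not a real bucket.
object_key, uri = train_data_location("my-example-bucket")
print(object_key)
print(uri)
```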

## Train Your Model

This is the code that actually trains the model. It lets us specify the number and type of instances on which we will train. We also define our hyperparameters, which are settings that control how the model is trained. Training can take a couple of minutes.

In [None]:
sess = sagemaker.Session()

linear = sagemaker.estimator.Estimator(container,
                                       role, 
                                       train_instance_count=1, 
                                       train_instance_type='ml.m5.large',
                                       output_path='s3://{}/{}/output'.format(bucket, prefix),
                                       sagemaker_session=sess)

linear.set_hyperparameters(feature_dim=4,
                           mini_batch_size=1,
                           predictor_type='regressor',
                           epochs=5,
                           loss='squared_loss')

linear.fit({'train': s3_train_data})
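
To give a feel for what the hyperparameters mean, here is a minimal sketch of linear regression trained with squared loss, one example per update (a mini-batch size of 1), over a fixed number of epochs. It is not how Linear Learner is implemented internally. The toy data, the learning rate, and the epoch count are assumptions chosen so that this small example converges.

```python
def train_linear_sgd(X, y, epochs, learning_rate):
    """Fit y = w.x + b approximately, updating after every single example."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    for _ in range(epochs):                      # one epoch = one full pass over the data
        for features, target in zip(X, y):
            prediction = sum(w * x for w, x in zip(weights, features)) + bias
            error = prediction - target          # gradient of 0.5 * error**2 w.r.t. prediction
            weights = [w - learning_rate * error * x for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias

def mean_squared_error(X, y, weights, bias):
    """Average squared residual of the fitted line over the data set."""
    residuals = [sum(w * x for w, x in zip(weights, f)) + bias - t for f, t in zip(X, y)]
    return sum(r * r for r in residuals) / len(residuals)

# Toy data generated from y = 2*x1 - 3*x2 + 1 (invented, not basketball data).
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 3.0, -2.0, 0.0]

weights, bias = train_linear_sgd(X, y, epochs=200, learning_rate=0.1)
print(weights, bias, mean_squared_error(X, y, weights, bias))
```

With more epochs the fitted weights move closer to the values used to generate the toy data, which is the same trade-off the epochs hyperparameter controls in the training job.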

## Deploy Your Model

Next we want to deploy our trained model so that we can query it for predictions. The following code takes the model we just trained and deploys it to a single ml.t2.medium instance.

In [None]:
lin_predictor = linear.deploy(initial_instance_count=1,
                           instance_type='ml.t2.medium')

## Test Your Model

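Now the only thing left to do is test how your model performs. The following code sends the listed values to the model we have created and deployed, and returns a prediction. Before running it, replace 'YOUR ENDPOINT HERE' with the name of the endpoint created in the previous step.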

Feel free to alter these values and see how it affects the prediction.

In [None]:
away_team_elo = 1471
home_team_elo = 1328
away_team_last10 = 0.5
home_team_last10 = 0.2
matchup = [away_team_elo,home_team_elo,away_team_last10,home_team_last10]


endpoint_name = 'YOUR ENDPOINT HERE'
# Create a predictor that sends CSV-serialized data to the SageMaker endpoint
predictor = sagemaker.predictor.RealTimePredictor(endpoint=endpoint_name,
                                                  serializer=sagemaker.predictor.csv_serializer,
                                                  content_type='text/csv')

response = predictor.predict(matchup)
response
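
The sketch below shows roughly what travels over the wire: the feature list is joined into a comma-separated string, and the reply is read as JSON. It assumes the response body has the shape {"predictions": [{"score": ...}]}, which is the format the Linear Learner documentation describes for regression, so check it against the response you actually receive. The sample response and its score are invented, and both helper names are introduced only for this illustration.

```python
import json

def to_csv_payload(values):
    """Join feature values with commas to form a text/csv request body."""
    return ",".join(str(v) for v in values)

def parse_point_diff(body):
    """Extract the predicted point differential from a JSON response body (bytes or str)."""
    if isinstance(body, bytes):
        body = body.decode("utf-8")
    return json.loads(body)["predictions"][0]["score"]

payload = to_csv_payload([1471, 1328, 0.5, 0.2])

# Hypothetical response body; the number is invented, not a real model output.
sample_response = b'{"predictions": [{"score": 4.25}]}'
predicted_diff = parse_point_diff(sample_response)
favored = "team 1" if predicted_diff > 0 else "team 2"
print(payload, predicted_diff, favored)
```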

## Clean Up
Uncomment the following line (delete the # sign) to delete the endpoint you just created.

In [None]:
#sagemaker.Session().delete_endpoint(endpoint_name)