<a href="https://colab.research.google.com/github/sachinkun21/Data-Science-Projects/blob/main/BasketBall_Analysis_Using_Keras_2_0.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### DEEP LEARNING WITH KERAS

This notebook shows you how to solve a variety of problems using the versatile Keras functional API.

- We will start with simple, multi-layer dense networks (also known as multi-layer perceptrons), and continue on to more complicated architectures.
 
-  We will learn how to build models with multiple inputs and a single output, as well as how to share weights between layers in a model. 
 
- We will also cover advanced topics such as category embeddings and multiple-output networks. 

- If you've ever wanted to train a network that does both classification and regression, then this is for you!

We will use two datasets of American college basketball games in this notebook. Let's load them from Google Drive and explore them first.


In [1]:
from google.colab import drive
drive.mount('/gdrive')
%cd /gdrive

Mounted at /gdrive
/gdrive


In [2]:
%cd My Drive/DataScience/BasketBall_Project
%ls

/gdrive/My Drive/DataScience/BasketBall_Project
ch3_stacked_tourney_model.h5     games_season_enriched.csv
ch4_2_output_reg_class_model.h5  games_tourney.csv
ch4_2_output_reg_model.h5        simple_tourney_model.h5
games_season.csv


Basketball Datasets: College basketball data, 1989-2017
#### Dataset 1: Regular season
- Team ID 1
- Team ID 2
- Home vs Away
- Score Difference (Team 1 - Team 2)
- Team 1 Score
- Team 2 Score
- Won vs Lost


#### Dataset 2: Tournament games
- Same as Dataset 1
- Also includes the seed difference, i.e. the difference in tournament ranking between the two teams



Two fundamental parts of any prediction Model:

- Input layer
- Output layer 


In [3]:
import pandas as pd
import numpy as np

df1 = pd.read_csv('games_tourney.csv')
df1.head()

Unnamed: 0,season,team_1,team_2,home,seed_diff,score_diff,score_1,score_2,won
0,1985,288,73,0,-3,-9,41,50,0
1,1985,5929,73,0,4,6,61,55,1
2,1985,9884,73,0,5,-4,59,63,0
3,1985,73,288,0,3,9,50,41,1
4,1985,3920,410,0,1,-9,54,63,0


In [4]:
df3 = pd.read_csv('games_season_enriched.csv')
df3.head()

Unnamed: 0,season,team_1,team_2,home,seed_diff,score_diff,score_1,score_2,won,pred
0,1985,288,73,0,-3,-9,41,50,0,-3.601452
1,1985,5929,73,0,4,6,61,55,1,0.474164
2,1985,9884,73,0,5,-4,59,63,0,-0.414316
3,1985,73,288,0,3,9,50,41,1,3.601452
4,1985,3920,410,0,1,-9,54,63,0,8.176179


In [5]:
df2 = pd.read_csv('games_season.csv')
df2.head()

Unnamed: 0,season,team_1,team_2,home,score_diff,score_1,score_2,won
0,1985,3745,6664,0,17,81,64,1
1,1985,126,7493,1,7,77,70,1
2,1985,288,3593,1,7,63,56,1
3,1985,1846,9881,1,16,70,54,1
4,1985,2675,10298,1,12,86,74,1


**Input layers**

The first step in creating a neural network model is to define the Input layer. This layer takes in raw data, usually in the form of numpy arrays. The shape of the Input layer defines how many variables your neural network will use. 

For example, if the input data has 10 columns, you define an Input layer with a shape of (10,).
In this case, you are using only one input in your network, so the shape is (1,).
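As a small illustration of how the Input shape relates to the data, the sketch below uses plain NumPy arrays with made-up values (the array names are hypothetical and not part of the notebook). The per-sample shape is simply the array shape without the leading batch dimension.

```python
import numpy as np

# Hypothetical data: 5 games with 10 feature columns each
X_wide = np.zeros((5, 10))
# The per-sample shape drops the leading batch dimension
wide_shape = X_wide.shape[1:]       # (10,)

# A single feature column (like seed_diff) is a 1-D array of length 5
seed_diff = np.array([-3, 4, 5, 3, 1])
# Reshaping to one column per row makes the single feature explicit
X_single = seed_diff.reshape(-1, 1)
single_shape = X_single.shape[1:]   # (1,)

print(wide_shape, single_shape)
```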

In [6]:
# Load layers
from keras.layers import Input, Dense

# Create an input layer of shape 1
input_tensor = Input(shape=(1,))
print(type(input_tensor))

Using TensorFlow backend.




<class 'tensorflow.python.framework.ops.Tensor'>


**Dense layers**
Once you have an Input layer, the next step is to add a Dense layer.

Dense layers learn a weight matrix, where the first dimension of the matrix is the dimension of the input data, and the second dimension is the dimension of the output data. Recall that your Input layer has a shape of 1. In this case, your output layer will also have a shape of 1. This means that the Dense layer will learn a 1x1 weight matrix plus, by default, a single bias term.
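To make the arithmetic concrete, here is a minimal NumPy sketch of what a linear Dense(1) layer computes on a one-feature input. The weight and bias values are invented for illustration, and `dense_forward` is a hypothetical helper, not a Keras function. Because Keras Dense layers include a bias by default, such a layer holds two parameters, which is consistent with the parameter count reported by the model summary later in this notebook.

```python
import numpy as np

# Hypothetical learned parameters for a Dense(1) layer on a 1-feature input
W = np.array([[0.5]])   # weight matrix, shape (input_dim, output_dim) = (1, 1)
b = np.array([2.0])     # bias vector, shape (output_dim,) = (1,)

def dense_forward(x, W, b):
    """Linear Dense layer: matrix product plus bias, no activation."""
    return x @ W + b

x = np.array([[-3.0], [4.0], [5.0]])   # batch of 3 samples, 1 feature each
y = dense_forward(x, W, b)

# One weight plus one bias
n_params = W.size + b.size
print(y.ravel(), n_params)
```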

In [7]:
# Dense layer
output_layer = Dense(1)

print(type(output_layer))
# Connect the dense layer to the input_tensor
output_tensor = output_layer(input_tensor)
print(type(output_tensor))

<class 'keras.layers.core.Dense'>

<class 'tensorflow.python.framework.ops.Tensor'>


***This network will take the input, apply a linear coefficient to it, and return the result.***

In [0]:
# Input/dense/output layers
from keras.layers import Input, Dense
input_tensor = Input(shape=(1,))
output_tensor = Dense(1)(input_tensor)


### **Build a model**
Once you've defined an input layer and an output layer, you can build a Keras model. The model object is how you tell Keras where the model starts and stops: where data comes in and where predictions come out.

In [0]:
# Build the model
from keras.models import Model
model = Model(input_tensor, output_tensor)

### **Compile a model**
Now that you've created a model, you have to compile it before you can fit it to data. This finalizes your model, freezes all its settings, and prepares it to meet some data!

During compilation, you specify the optimizer to use for fitting the model to the data, and a loss function. 'adam' is a good default optimizer to use, and will generally work well. The loss function depends on the problem at hand. Mean squared error is a common loss function and will optimize for predicting the mean, as is done in least squares regression.

Mean absolute error optimizes for the median, which makes it the loss for median regression (quantile regression at the 0.5 quantile). For this dataset, 'mean_absolute_error' works pretty well, so use it as your loss function.
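The mean-versus-median claim can be checked numerically. The sketch below uses made-up score differences and a brute-force grid of constant predictions (none of this comes from the notebook's data) to find which constant minimizes each loss.

```python
import numpy as np

# Hypothetical score differences, including one blowout that acts as an outlier
y = np.array([-9.0, -4.0, 6.0, 9.0, 40.0])

# Score a grid of constant predictions with both losses
candidates = np.linspace(-10, 45, 5501)   # step of 0.01
mse = ((y[None, :] - candidates[:, None]) ** 2).mean(axis=1)
mae = np.abs(y[None, :] - candidates[:, None]).mean(axis=1)

best_mse = candidates[mse.argmin()]
best_mae = candidates[mae.argmin()]

print(best_mse, np.mean(y))     # both close to 8.4
print(best_mae, np.median(y))   # both close to 6.0
```

The outlier pulls the squared-error optimum toward it, while the absolute-error optimum stays at the middle value.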

In [10]:
# Compile the model
model.compile(optimizer='adam', loss='mean_absolute_error')




### Visualize a model (Optional)
Now that you've compiled the model, take a look at the result of your work! You can do this by looking at the model summary, as well as its plot.

The summary will tell you the names of the layers, as well as how many units they have and how many parameters are in the model.

The plot will show how the layers connect to each other.



In [11]:
# Import the plotting function
from keras.utils import plot_model
import matplotlib.pyplot as plt

# Summarize the model
model.summary()


Model: "model_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_2 (InputLayer)         (None, 1)                 0         
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 2         
Total params: 2
Trainable params: 2
Non-trainable params: 0
_________________________________________________________________


### Model fitting
Now that the model is compiled, you are ready to fit it to some data!

In this exercise, you'll use a dataset of scores from US College Basketball tournament games. Each row of the dataset has the team ids: team_1 and team_2, as integers. It also has the seed difference between the teams (seeds are assigned by the tournament committee and represent a ranking of how strong the teams are) and the score difference of the game (e.g. if team_1 wins by 5 points, the score difference is 5).

To fit the model, you provide a matrix of X variables (in this case one column: the seed difference) and a matrix of Y variables (in this case one column: the score difference).

In [12]:
# Now fit the model
model.fit(df1['seed_diff'], df1['score_diff'],
          epochs=1,
          batch_size=128,
          validation_split=0.1,
          verbose=True)




Train on 3810 samples, validate on 424 samples
Epoch 1/1







<keras.callbacks.History at 0x7f7442ad3278>
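The "Train on 3810 samples, validate on 424 samples" line follows from `validation_split=0.1`. The sketch below reproduces the counts in plain Python, assuming the split index is the truncated product of the sample count and the training fraction, as in Keras 2.x. Keras takes the validation samples from the end of the arrays, before any shuffling.

```python
# Total rows passed to fit: 3810 + 424, per the log above
n_samples = 4234
validation_split = 0.1

# Assumption: split index is truncated toward zero, as in Keras 2.x
split_at = int(n_samples * (1.0 - validation_split))

n_train = split_at              # first rows are used for training
n_val = n_samples - split_at    # last rows are held out for validation
print(n_train, n_val)
```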

### Evaluate the model on a test set
After fitting the model, you can evaluate it on new data. You will give the model a new X matrix (also called test data), allow it to make predictions, and then compare to the known y variable (also called target data).

In this case, you'll use data from the post-season tournament to evaluate your model. Note that the cell below evaluates on `df1`, the same tournament DataFrame the model was fitted on, so the resulting score is an in-sample error rather than a true out-of-sample estimate.

In [13]:
# Load the X variable from the test data
X_test = df1['seed_diff']

# Load the y variable from the test data
y_test = df1['score_diff']

# Evaluate the model on the test data
print(model.evaluate(X_test, y_test, verbose=False))

9.74903702859784


The value returned is the mean absolute error, so after a single epoch the model's predictions are off by about 9.7 points on average.
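Because the evaluation cell above scores the model on the same `df1` it was fitted on, a held-out split gives a fairer picture. The sketch below shows one possible way to hold out the most recent season, using tiny invented arrays and a stand-in baseline instead of a trained network. All names and values here are hypothetical.

```python
import numpy as np

# Hypothetical miniature table: season, seed_diff, score_diff
season = np.array([1985, 1985, 1986, 1986, 1987, 1987])
seed_diff = np.array([-3, 4, 5, 3, 1, -2])
score_diff = np.array([-9, 6, -4, 9, -9, 3])

# Hold out the most recent season; fit on everything before it
test_mask = season == season.max()
X_train, y_train = seed_diff[~test_mask], score_diff[~test_mask]
X_test, y_test = seed_diff[test_mask], score_diff[test_mask]

def mean_absolute_error(y_true, y_pred):
    """Same quantity as the 'mean_absolute_error' loss."""
    return np.abs(y_true - y_pred).mean()

# Stand-in predictor (not a trained model): always predict the training median
baseline = np.median(y_train)
test_mae = mean_absolute_error(y_test, baseline)
print(test_mae)
```

With the real data, the same mask idea would be applied to the DataFrame columns before calling `model.fit` and `model.evaluate`.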

In [14]:
df3.shape

(4234, 10)

In [15]:
df2.head()

Unnamed: 0,season,team_1,team_2,home,score_diff,score_1,score_2,won
0,1985,3745,6664,0,17,81,64,1
1,1985,126,7493,1,7,77,70,1
2,1985,288,3593,1,7,63,56,1
3,1985,1846,9881,1,16,70,54,1
4,1985,2675,10298,1,12,86,74,1


### Embedding Layer:

#### Defining the team lookup
Shared layers allow a model to use the same weight matrix for multiple steps. In this exercise, you will build a "team strength" layer that represents each team by a single number. You will use this number for both teams in the model. The model will learn a number for each team that works well both when the team is team_1 and when the team is team_2 in the input data.

The games_season DataFrame is loaded above as `df2`.

In [0]:
from keras.layers import Embedding

n_teams = df2.team_1.nunique()

embed_layer = Embedding( input_dim = n_teams,
                        output_dim = 1, 
                        input_length =1 ,
                        name = 'Team_Strength')

The embedding layer is a lot like a dictionary, but your model learns the values for each key.
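The dictionary analogy can be sketched in NumPy as a table with one row per team ID, indexed by integer. The table values below are random placeholders rather than learned weights, and the tiny league size is an assumption for illustration. Note that an embedding's `input_dim` must be at least the largest ID plus one, so using the number of unique teams only works when the IDs run from 0 to `n_teams - 1`.

```python
import numpy as np

n_teams = 4                               # hypothetical, tiny league
rng = np.random.default_rng(0)

# One row per team ID, one column because output_dim=1
strength_table = rng.normal(size=(n_teams, 1))

team_ids = np.array([2, 0, 2])            # a batch of integer team IDs
looked_up = strength_table[team_ids]      # row lookup, shape (3, 1)

# IDs must lie in [0, n_teams); otherwise the lookup fails
ids_in_range = bool(team_ids.max() < n_teams)
print(looked_up.shape, ids_in_range)
```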

### Define team model
The team strength lookup has three components: an input, an embedding layer, and a flatten layer that creates the output.

If you wrap these three layers in a model with an input and output, you can re-use that stack of three layers at multiple places.

Note that the embedding weights, the only trainable weights in this stack, will be shared everywhere you use the model.


- Create a 1D input layer for the team ID (which will be an integer). Be sure to set the correct input shape!
- Pass this input to the team strength lookup layer you created previously.
- Flatten the output of the team strength lookup.
- Create a model that uses the 1D input as input and flattened team strength as output.
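The steps above can be traced through their shapes with a NumPy sketch. The strength values are invented, and the fancy-indexing and reshape calls only mimic what the Embedding and Flatten layers do to the tensor shapes; they are not the Keras implementation.

```python
import numpy as np

# Hypothetical embedding weights: 4 teams, output_dim=1
strength_table = np.array([[0.3], [-1.2], [0.8], [0.1]])

# Input of shape (batch, 1): one team ID per sample
team_ids = np.array([[2], [0], [3]])

# The lookup keeps the input axes and appends the embedding axis: (batch, 1, 1)
embedded = strength_table[team_ids]

# Flatten collapses everything after the batch axis: (batch, 1)
flat = embedded.reshape(embedded.shape[0], -1)

print(embedded.shape, flat.shape)
```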

In [0]:
from keras.layers import Input, Embedding , Flatten
from keras.models import  Model

# Count the unique number of teams
n_teams = df2.team_1.nunique()

# Create an input layer for the team ID
input_layer =  Input(shape = (1,))

# Create an embedding layer
embed_layer = Embedding( input_dim = n_teams,
                        output_dim = 1,
                        input_length = 1 ,
                        name = 'Strength')

# Lookup the input in the team strength embedding layer
team_strength = embed_layer(input_layer)

# Flatten the output
flat_strength_layer = Flatten()(team_strength)

# Combine the operations into a single, re-usable model
model = Model(input_layer, flat_strength_layer, name = 'Team_Strength_Model')

The model will be reusable, so you can use it in two places in your final model.

### Shared Layer:
Now we will define two input layers for the two teams in your model. This allows you to specify later in the model how the data from each team will be used differently.

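```
# Shared layers: one Dense layer (one set of weights) applied to two inputs
from keras.layers import Input, Dense
input_tensor_1 = Input((1,))
input_tensor_2 = Input((1,))
shared_layer = Dense(1)
output_tensor_1 = shared_layer(input_tensor_1)
output_tensor_2 = shared_layer(input_tensor_2)
```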



In [0]:
# Load the input layer from keras.layers
from keras.layers import Input

# Input layer for team 1
team_in_1 = Input( (1,), name ="Team-1-In" )

# Separate input layer for team 2
team_in_2 = Input((1,), name = "Team-2-In")

These two inputs will be used for the shared layer.

### **Lookup both inputs in the same model**
Now that you have a team strength model and an input layer for each team, you can lookup the team inputs in the shared team strength model. The two inputs will share the same weights.

In this dataset, you have 10,888 unique teams. You want to learn a strength rating for each team, such that if any pair of teams plays each other, you can predict the score, even if those two teams have never played before. Furthermore, you want the strength rating to be the same, regardless of whether the team is the home team or the away team.

To achieve this, you use a shared layer, defined by the re-usable team strength model (`model`) you built above and the two input layers (team_in_1 and team_in_2) from the previous cell.



In [0]:
team1_strength = model(team_in_1)
team2_strength = model(team_in_2)

In [0]:
# Import the Subtract layer from keras
from keras.layers import Subtract

# Subtract team 2's strength from team 1's strength
score_diff = Subtract()([team1_strength, team2_strength])
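A NumPy sketch of the same idea, with an invented strength table and a hypothetical `team_strength` helper standing in for the shared model: both inputs are looked up in the same table, and the predicted score difference is team 1's strength minus team 2's.

```python
import numpy as np

# Hypothetical shared strength table: one learned number per team ID
strength_table = np.array([[0.3], [-1.2], [0.8], [0.1]])

def team_strength(team_ids):
    """Shared lookup: the same table serves both input slots."""
    return strength_table[team_ids].reshape(len(team_ids), -1)

team_1 = np.array([[2], [0]])
team_2 = np.array([[0], [2]])   # the same matchup with the slots swapped

# Subtract layer: element-wise difference of the two strengths
score_diff_pred = team_strength(team_1) - team_strength(team_2)
print(score_diff_pred.ravel())
```

Because the table is shared, swapping the two teams only flips the sign of the prediction.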