# Categorical Embeddings
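- Input: integer category IDs
- Output: learned floating-point vectors
- Note: the embedding adds a dimension (a 2D input of shape `(batch, 1)` becomes a 3D output of shape `(batch, 1, output_dim)`), so a `Flatten` layer is applied afterwards to get back to 2D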

![image](image.png)

- Categorical embeddings are a layer type found mainly in deep learning libraries. They are especially useful for high-cardinality categorical data, because each category is represented by a small learned vector instead of a wide one-hot column. In this dataset, the team ID variable has high cardinality.

In [1]:
# Import all necessary libraries
import pandas as pd
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense, Embedding, Flatten
from tensorflow.keras.models import Model


In [2]:
# Read csv file for game season
games_season = pd.read_csv('datasets/games_season.csv')
games_season.head()

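|   | season | team_1 | team_2 | home | score_diff | score_1 | score_2 | won |
|---|--------|--------|--------|------|------------|---------|---------|-----|
| 0 | 1985 | 3745 | 6664 | 0 | 17 | 81 | 64 | 1 |
| 1 | 1985 | 126 | 7493 | 1 | 7 | 77 | 70 | 1 |
| 2 | 1985 | 288 | 3593 | 1 | 7 | 63 | 56 | 1 |
| 3 | 1985 | 1846 | 9881 | 1 | 16 | 70 | 54 | 1 |
| 4 | 1985 | 2675 | 10298 | 1 | 12 | 86 | 74 | 1 |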


In [3]:
games_season.shape

(312178, 8)

## Create embedding layer

In [4]:
# Imports
from tensorflow.keras.layers import Embedding
from numpy import unique

# Count the number of unique teams.
# Note: Embedding expects every ID to lie in [0, input_dim), so this assumes
# the team IDs are already coded as 0 .. n_teams - 1.
n_teams = unique(games_season['team_1']).shape[0]

# Create an embedding layer that maps each team ID to a single number representing that team's strength
team_lookup = Embedding(input_dim=n_teams,  # vocabulary size: number of distinct team IDs
                        output_dim=1,       # one learned value (a rating) per team
                        input_length=1,     # each sample contains exactly one team ID
                        name='Team-Strength')
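
`Embedding` looks up row `i` of its weight table for input ID `i`, so every ID has to lie in `[0, input_dim)`. Counting unique values only gives a safe `input_dim` when the IDs are already coded as `0 .. n_teams - 1`. The sketch below uses a small made-up ID array (not the real data) to show one way of checking that assumption and, if it fails, remapping arbitrary IDs to contiguous indices with `np.unique(..., return_inverse=True)`.

```python
import numpy as np

# Hypothetical raw team IDs for illustration (not read from games_season.csv)
raw_ids = np.array([3745, 126, 288, 126, 9881, 3745])

n_unique = np.unique(raw_ids).shape[0]

# An Embedding with input_dim=n_unique is only safe if every ID is in [0, n_unique)
ids_in_range = bool(raw_ids.min() >= 0 and raw_ids.max() < n_unique)

# If not, remap each ID to its position in the sorted array of unique IDs
vocab, contiguous_ids = np.unique(raw_ids, return_inverse=True)

print(n_unique)        # 4
print(ids_in_range)    # False
print(contiguous_ids)  # [2 0 1 0 3 2]
```

If a remapping like this were used, the same `vocab` would have to be applied to every dataset fed to the model, including the tournament data.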

**Embedding Layer:**
- Turns non-negative integer indices (each in the range `0` to `input_dim - 1`) into dense vectors of fixed size `output_dim`.
- A one-hot encoding of 10888 unique teams needs a 10888-wide vector per team (a 10888 x 10888 array to cover every team). The _Embedding Layer_ only stores a weight table of shape 10888 x `output_dim`, here 10888 x 1, and looks up one row per ID.
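
To make the comparison concrete, here is a minimal NumPy sketch of what an embedding lookup computes. It is a stand-in for the Keras layer, not its implementation: `weights` is a randomly initialised table with toy sizes, and the variable names are chosen for illustration. It shows that indexing the table gives the same result as multiplying a one-hot matrix by it, and how the extra axis appears and is flattened away again.

```python
import numpy as np

n_teams, output_dim = 5, 1           # toy sizes; the notebook uses 10888 and 1
rng = np.random.default_rng(0)
weights = rng.normal(size=(n_teams, output_dim))  # stand-in for the embedding table

team_ids = np.array([[3], [0], [3]])  # shape (batch, 1), like Input(shape=(1,))

# Embedding lookup: one table row per ID -> shape (batch, 1, output_dim)
embedded = weights[team_ids]

# Equivalent one-hot route: (batch, n_teams) @ (n_teams, output_dim)
one_hot = np.eye(n_teams)[team_ids[:, 0]]
via_one_hot = one_hot @ weights

# Flattening drops the extra axis again -> shape (batch, output_dim)
flat = embedded.reshape(len(team_ids), -1)

print(embedded.shape, flat.shape)      # (3, 1, 1) (3, 1)
print(np.allclose(flat, via_one_hot))  # True
```

The lookup never materialises the one-hot matrix, which is why it scales to thousands of categories.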

## Define team model

In [5]:
# Imports
from tensorflow.keras.layers import Input, Embedding, Flatten
from tensorflow.keras.models import Model

# Create an input layer for the team ID
teamid_in = Input(shape=(1,))

# Lookup the input in the team strength embedding layer
strength_lookup = team_lookup(teamid_in)

# Flatten the output
strength_lookup_flat = Flatten()(strength_lookup)

# Combine the operations into a single, re-usable model
team_strength_model = Model(teamid_in, strength_lookup_flat, name='Team-Strength-Model')


In [6]:
# Model summary
team_strength_model.summary()

Model: "Team-Strength-Model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_1 (InputLayer)        [(None, 1)]               0         
                                                                 
 Team-Strength (Embedding)   (None, 1, 1)              10888     
                                                                 
 flatten (Flatten)           (None, 1)                 0         
                                                                 
Total params: 10,888
Trainable params: 10,888
Non-trainable params: 0
_________________________________________________________________


# Shared Layers
- Require the Keras functional API
- Very flexible: the same layer (with the same weights) can be applied to several inputs

![image-2](image-2.png)


In [7]:
# Load the input layer from tensorflow.keras.layers
from tensorflow.keras.layers import Input

# Input layer for team 1
team_in_1 = Input(shape=(1,), name="Team-1-In")

# Separate input layer for team 2
team_in_2 = Input(shape=(1,), name="Team-2-In")

In [8]:
# SHARED LAYER

# Lookup team 1 in the team strength model
team_1_strength = team_strength_model(team_in_1)

# Lookup team 2 in the team strength model
team_2_strength = team_strength_model(team_in_2)
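
The effect of sharing can be sketched in NumPy, assuming a hypothetical four-team strength table in place of the trained Keras weights. Both branches call the same function, so they read the same parameters, and a team gets the same rating whichever input slot it appears in.

```python
import numpy as np

rng = np.random.default_rng(1)
strength_table = rng.normal(size=(4, 1))  # one shared table for 4 hypothetical teams

def team_strength(team_ids):
    """NumPy stand-in for team_strength_model: lookup, then flatten to (batch, 1)."""
    return strength_table[team_ids].reshape(len(team_ids), -1)

team_1_ids = np.array([[0], [2]])
team_2_ids = np.array([[2], [3]])

# Both branches use the same function, hence the same weights
s1 = team_strength(team_1_ids)
s2 = team_strength(team_2_ids)

# Team 2 has the same rating as team_1 (second game) and as team_2 (first game)
print(np.array_equal(s1[1], s2[0]))  # True
print(strength_table.size)           # 4 parameters in total, not 8
```

This matches the parameter count in the model summary: the shared model contributes its 10,888 weights once, even though it is called on two inputs.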

# Merge Layers
- `Add`, `Subtract`, and `Multiply` layers perform simple element-wise arithmetic on their input layers and require them to have the same shape.
- `Concatenate` layers simply append the two layers together, similar to the `hstack()` function from numpy. Unlike the other merge layers, the `Concatenate` layer can operate on layers with different numbers of columns.
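
As a rough illustration, the same operations can be written with plain NumPy arrays standing in for layer outputs. The arrays `a`, `b`, and `c` are made-up examples, and NumPy arithmetic is used here only as an analogy for the Keras merge layers.

```python
import numpy as np

a = np.array([[1.0, 2.0], [3.0, 4.0]])      # shape (2, 2)
b = np.array([[10.0, 20.0], [30.0, 40.0]])  # shape (2, 2), same as a
c = np.array([[5.0], [6.0]])                # shape (2, 1), fewer columns

added = a + b        # analogous to Add()([a, b])
subtracted = a - b   # analogous to Subtract()([a, b])
multiplied = a * b   # analogous to Multiply()([a, b])

# Concatenation only needs matching row counts, not matching column counts
concatenated = np.hstack([a, c])

print(subtracted.tolist())  # [[-9.0, -18.0], [-27.0, -36.0]]
print(concatenated.shape)   # (2, 3)
```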

![image-3](image-3.png)


In [9]:
# Import the Subtract layer from tensorflow.keras
from tensorflow.keras.layers import Subtract

# Subtract team 2's strength from team 1's strength to predict the score difference
score_diff = Subtract()([team_1_strength, team_2_strength])

In [10]:
# Create the model
model = Model([team_in_1, team_in_2], score_diff)

# Compile the model
model.compile(optimizer='adam', loss='mean_absolute_error')

In [11]:
model.summary()

Model: "model"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 Team-1-In (InputLayer)         [(None, 1)]          0           []                               
                                                                                                  
 Team-2-In (InputLayer)         [(None, 1)]          0           []                               
                                                                                                  
 Team-Strength-Model (Functiona  (None, 1)           10888       ['Team-1-In[0][0]',              
 l)                                                               'Team-2-In[0][0]']              
                                                                                                  
 subtract (Subtract)            (None, 1)            0           ['Team-Strength-Model[0][0]',

# Fit and evaluate your model

In [12]:
# Get the team_1 column from the regular season data
input_1 = games_season['team_1']

# Get the team_2 column from the regular season data
input_2 = games_season['team_2']

# Fit the model to input 1 and 2, using score diff as a target
model.fit([input_1,input_2],
          games_season['score_diff'],
          epochs=1,
          batch_size=2048,
          validation_split=0.1,
          verbose=True)



<keras.callbacks.History at 0x7f0b1e9001c0>

In [13]:
# Evaluate the model on the tournament test data

games_tourney = pd.read_csv('datasets/games_tourney.csv')

# Get team_1 from the tournament data
input_1 = games_tourney['team_1']

# Get team_2 from the tournament data
input_2 = games_tourney['team_2']

# Evaluate the model using these inputs
print(model.evaluate([input_1, input_2], games_tourney['score_diff'], verbose=False))

11.681157112121582
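
Because the model was compiled with `loss='mean_absolute_error'` and no additional metrics, `model.evaluate` returns that loss. The number above is therefore the mean absolute error of the predicted score difference on the tournament games. The sketch below shows the calculation on made-up values, not on the tournament data.

```python
import numpy as np

# Hypothetical values for illustration only (not from games_tourney.csv)
y_true = np.array([17.0, 7.0, -3.0, 12.0])  # actual score differences
y_pred = np.array([10.0, 9.0, 1.0, 12.0])   # predicted score differences

# Mean absolute error: average size of the miss, ignoring its sign
mae = np.mean(np.abs(y_true - y_pred))

print(mae)  # 3.25
```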
