# Two Input Networks Using Categorical Embeddings, Shared Layers, and Merge Layers
>  In this chapter, you will build two-input networks that use categorical embeddings to represent high-cardinality data, shared layers to specify re-usable building blocks, and merge layers to join multiple inputs to a single output. By the end of this chapter, you will have the foundational building blocks for designing neural networks with complex data flows.

- toc: true 
- badges: true
- comments: true
- author: Lucas Nunes
- categories: [Datacamp]
- image: images/datacamp/___

> Note: This is a summary of the chapter 2 exercises from the course "Advanced Deep Learning with Keras" at DataCamp. <br>[Github repo](https://github.com/lnunesAI/Datacamp/) / [Course link](https://www.datacamp.com/tracks/machine-learning-scientist-with-python)

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
plt.rcParams['figure.figsize'] = (8, 8)

In [2]:
import tensorflow as tf
from keras.utils import to_categorical
from keras.models import Sequential, Model
from keras.layers import Dense, Input, Subtract
from keras.callbacks import EarlyStopping, ModelCheckpoint

## Category embeddings

### Define team lookup

<div class=""><p>Shared layers allow a model to use the same weight matrix for multiple steps. In this exercise, you will build a "team strength" layer that represents each team by a single number. You will use this number for both teams in the model. The model will learn a number for each team that works well both when the team is <code>team_1</code> and when the team is <code>team_2</code> in the input data.</p>
<p>The <code>games_season</code> DataFrame is available in your workspace.</p></div>

In [None]:
games_season = pd.read_csv('https://github.com/lnunesAI/Datacamp/raw/main/2-machine-learning-scientist-with-python/17-advanced-deep-learning-with-keras/datasets/games_season.csv')

Instructions
<ul>
<li>Count the number of unique teams.</li>
<li>Create an embedding layer that maps each team ID to a single number representing that team's strength.</li>
<li>The output shape should be 1 dimension (as we want to represent the teams by a single number).</li>
<li>The input length should be 1 dimension (as each team is represented by exactly one id).</li>
</ul>

In [None]:
# Imports
from keras.layers import Embedding
from numpy import unique

# Count the unique number of teams
n_teams = unique(games_season['team_1']).shape[0]

# Create an embedding layer
team_lookup = Embedding(input_dim=n_teams,
                        output_dim=1,
                        input_length=1,
                        name='Team-Strength')

**The embedding layer is a lot like a dictionary, but your model learns the values for each key.**
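
To make the dictionary analogy concrete, here is a minimal NumPy sketch of what an embedding lookup computes. It is an illustration only, not Keras code: `weights` stands in for the layer's learned matrix and is filled with random numbers, and `n_teams = 5` is an arbitrary small value rather than the real team count.

```python
import numpy as np

rng = np.random.default_rng(0)

n_teams = 5  # hypothetical small value, for illustration only
# One row per team ID, one column because output_dim=1
weights = rng.normal(size=(n_teams, 1))

# A batch of three team IDs, shape (batch, input_length=1)
team_ids = np.array([[3], [0], [3]])

# The lookup is plain row indexing: each ID selects its own row
strengths = weights[team_ids]

print(strengths.shape)  # (3, 1, 1)
```

The same ID always returns the same row, which is why one learned number per team can serve every game that team plays.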

### Define team model

<div class=""><p>The team strength lookup has three components: an input, an embedding layer, and a flatten layer that creates the output.</p>
<p>If you wrap these three layers in a model with an input and output, you can re-use that stack of three layers at multiple places.</p>
<p>Note again that the weights of this stack will be shared everywhere you use it. In practice these are the embedding weights, since the input and flatten layers have no trainable parameters.</p></div>

Instructions
<ul>
<li>Create a 1D input layer for the team ID (which will be an integer). Be sure to set the correct input shape!</li>
<li>Pass this input to the team strength lookup layer you created previously.</li>
<li>Flatten the output of the team strength lookup.</li>
<li>Create a model that uses the 1D input as input and flattened team strength as output.</li>
</ul>

In [None]:
# Imports
from keras.layers import Input, Embedding, Flatten
from keras.models import Model

# Create an input layer for the team ID
teamid_in = Input(shape=(1,))

# Lookup the input in the team strength embedding layer
strength_lookup = team_lookup(teamid_in)

# Flatten the output
strength_lookup_flat = Flatten()(strength_lookup)

# Combine the operations into a single, re-usable model
team_strength_model = Model(teamid_in, strength_lookup_flat, name='Team-Strength-Model')

**The model will be reusable, so you can use it in two places in your final model.**
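
The shape bookkeeping behind the input, embedding, and flatten stack can be sketched in NumPy. The function `team_strength` and the three made-up strength values are assumptions for illustration; they only mirror the shapes, not the actual Keras model.

```python
import numpy as np

def team_strength(team_ids, weights):
    """NumPy stand-in for Input -> Embedding -> Flatten."""
    looked_up = weights[team_ids]                # shape (batch, 1, 1)
    return looked_up.reshape(len(team_ids), -1)  # shape (batch, 1)

# Made-up strengths for three hypothetical teams
weights = np.array([[0.5], [-1.2], [2.0]])
ids = np.array([[2], [0]])

flat = team_strength(ids, weights)
print(flat.shape)  # (2, 1)
```

Flattening removes the extra length-1 axis that the embedding adds, so each game ends up with a single strength value per team.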

## Shared layers

### Defining two inputs

<p>In this exercise, you will define two input layers for the two teams in your model. This allows you to specify later in the model how the data from each team will be used differently.</p>

Instructions
<ul>
<li>Create an input layer to use for team 1. Recall that our input dimension is 1.</li>
<li>Name the input "Team-1-In" so you can later distinguish it from team 2.</li>
<li>Create an input layer to use for team 2, named "Team-2-In".</li>
</ul>

In [None]:
# Input layer for team 1
team_in_1 = Input((1,), name='Team-1-In')

# Separate input layer for team 2
team_in_2 = Input((1,), name='Team-2-In')

**These two inputs will be used later for the shared layer.**

### Lookup both inputs in the same model

<div class=""><p>Now that you have a team strength model and an input layer for each team, you can look up the team inputs in the shared team strength model. The two inputs will share the same weights.</p>
<p>In this dataset, you have 10,888 unique teams. You want to learn a strength rating for each team, such that if any pair of teams plays each other, you can predict the score, even if those two teams have never played before. Furthermore, you want the strength rating to be the same, regardless of whether the team is the home team or the away team.</p>
<p>To achieve this, you use a shared layer, defined by the re-usable model (<code>team_strength_model</code>) you built earlier and the two input layers (<code>team_in_1</code> and <code>team_in_2</code>) from the previous exercise, all of which are available in your workspace.</p></div>

Instructions
<ul>
<li>Look up the first team ID in the team strength model.</li>
<li>Look up the second team ID in the team strength model.</li>
</ul>

In [None]:
# Lookup team 1 in the team strength model
team_1_strength = team_strength_model(team_in_1)

# Lookup team 2 in the team strength model
team_2_strength = team_strength_model(team_in_2)

**Now your model knows how strong each team is.**
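
Weight sharing can be illustrated with a small NumPy sketch in which both lookups read from one table. The table values and team IDs are made up, and `team_strength` is a hypothetical helper, not part of Keras.

```python
import numpy as np

# Made-up shared table for three hypothetical teams
weights = np.array([[0.5], [-1.2], [2.0]])

def team_strength(team_ids):
    # Every call reads from the same `weights` array
    return weights[team_ids].reshape(len(team_ids), -1)

team_1 = np.array([[0], [2]])  # team in the first slot of each game
team_2 = np.array([[2], [0]])  # team in the second slot of each game

s1 = team_strength(team_1)
s2 = team_strength(team_2)

# Team 2 gets the same strength whether it is listed first or second
same = bool(s1[1, 0] == s2[0, 0])
print(same)  # True
```

Because there is only one table, any update made while training on one input slot is immediately visible through the other.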

## Merge layers

### Output layer using shared layer

<div class=""><p>Now that you've looked up how "strong" each team is, subtract the team strengths to determine which team is expected to win the game.</p>
<p>This is a bit like the seeds that the tournament committee uses, which are also a measure of team strength. But rather than using seed differences to predict score differences, you'll use the difference of your own team strength model to predict score differences.</p>
<p>The subtract layer will combine the outputs of the two team strength lookups by subtracting the second from the first.</p></div>

Instructions
<ul>
<li>Import the <code>Subtract</code> layer from <code>keras.layers</code>.</li>
<li>Combine the two team strength lookups you did earlier.</li>
</ul>

In [None]:
# Import the Subtract layer from keras
from keras.layers import Subtract

# Create a subtract layer using the inputs from the previous exercise
score_diff = Subtract()([team_1_strength, team_2_strength])

**This setup subtracts the team strength ratings to determine a winner.**
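
As a rough NumPy sketch, the merge step is an element-wise difference of the two strength tensors. The numbers are made up for illustration.

```python
import numpy as np

# Made-up strengths for two games, shape (batch, 1)
team_1_strength = np.array([[0.5], [2.0]])
team_2_strength = np.array([[2.0], [-1.2]])

# What a subtract merge computes: first input minus second input
score_diff = team_1_strength - team_2_strength

# Swapping the two teams flips the sign of the prediction
swapped = team_2_strength - team_1_strength

print(score_diff.ravel())
print(swapped.ravel())
```

One consequence of this design is that the predicted score difference is antisymmetric: listing the same two teams in the opposite order gives the negated prediction.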

### Model using two inputs and one output

<div class=""><p>Now that you have your two inputs (team id 1 and team id 2) and output (score difference), you can wrap them up in a model so you can use it later for fitting to data and evaluating on new data.</p>
<p>Your model will look like the following diagram:</p>
<p>
  <img src="https://s3.amazonaws.com/assets.datacamp.com/production/course_6554/datasets/basketball_model_2.png" width="300">
</p></div>

Instructions
<ul>
<li>Define a model with the two teams as inputs and use the score difference as the output.</li>
<li>Compile the model with the <code>'adam'</code> optimizer and <code>'mean_absolute_error'</code> loss.</li>
</ul>

In [None]:
# Imports
from keras.layers import Subtract
from keras.models import Model

# Subtraction layer from previous exercise
score_diff = Subtract()([team_1_strength, team_2_strength])

# Create the model
model = Model([team_in_1, team_in_2], score_diff)

# Compile the model
model.compile(optimizer='adam', loss='mean_absolute_error')

**Now your model is finalized and ready to fit to data.**
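
For reference, the mean absolute error named in the compile step is the average of the absolute differences between targets and predictions. The sketch below computes it in NumPy on made-up numbers; the function is a hand-written stand-in, not the Keras loss itself.

```python
import numpy as np

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return np.mean(np.abs(y_true - y_pred))

# Made-up score differences and predictions
y_true = np.array([5.0, -3.0, 10.0])
y_pred = np.array([2.0, -1.0, 4.0])

mae = mean_absolute_error(y_true, y_pred)
print(mae)  # (3 + 2 + 6) / 3
```

An MAE is expressed in the same units as the target, here points of score difference, which makes it easy to interpret.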

## Predict from your model

### Fit the model to the regular season training data

<p>Now that you've defined a complete team strength model, you can fit it to the basketball data! Since your model has two inputs now, you need to pass the input data as a list.</p>

Instructions
<ul>
<li>Assign the <code>'team_1'</code> and <code>'team_2'</code> columns from <code>games_season</code> to <code>input_1</code> and <code>input_2</code>, respectively.</li>
<li>Use the <code>'score_diff'</code> column from <code>games_season</code> as the target.</li>
<li>Fit the model using 1 epoch, a batch size of 2048, and a 10% validation split.</li>
</ul>

In [None]:
# Get the team_1 column from the regular season data
input_1 = games_season['team_1']

# Get the team_2 column from the regular season data
input_2 = games_season['team_2']

# Fit the model to input 1 and 2, using score diff as a target
model.fit([input_1, input_2],
          games_season['score_diff'],
          epochs=1,
          batch_size=2048,
          validation_split=0.1,
          verbose=True)



<tensorflow.python.keras.callbacks.History at 0x7fe535d335c0>

**Now our model has learned a strength rating for every team.**
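
A note on `validation_split`: Keras takes the validation rows from the end of the data, before any shuffling. The NumPy sketch below mimics that behaviour on a tiny stand-in array. The sample count is hypothetical and the exact rounding used here is an assumption.

```python
import numpy as np

n_samples = 20            # hypothetical; the real dataset is much larger
validation_split = 0.1

x = np.arange(n_samples)  # stand-in for rows in their original order

# Assumed rounding: hold out int(n * split) rows from the end
n_val = int(n_samples * validation_split)
x_train, x_val = x[:-n_val], x[-n_val:]

print(len(x_train), x_val)
```

If the rows are sorted, for example by season, the held-out 10% will all come from the tail of that ordering.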

### Evaluate the model on the tournament test data

<div class=""><p>The model you fit to the regular season data (<code>model</code>) in the previous exercise and the tournament dataset (<code>games_tourney</code>) are available in your workspace.</p>
<p>In this exercise, you will evaluate the model on this new dataset. This evaluation will tell you how well you can predict the tournament games with a model trained on the regular season data. This is interesting because many teams that meet in the tournament did not play each other in the regular season, so it is a good check that your model is not overfitting.</p></div>
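
One way to check how many tournament matchups never occurred in the regular season is to compare the sets of pairings. The sketch below uses made-up toy pairs. With the real data, the pairs would presumably be built from the `team_1` and `team_2` columns of the two DataFrames.

```python
# Toy (team_1, team_2) pairs, made up for illustration
season_pairs = [(1, 2), (2, 3), (3, 1), (4, 1)]
tourney_pairs = [(2, 1), (3, 4), (1, 4)]

# Treat a pairing as unordered so (1, 2) and (2, 1) are the same matchup
seen = {frozenset(p) for p in season_pairs}

# Tournament pairings with no regular season counterpart
unseen = [p for p in tourney_pairs if frozenset(p) not in seen]

print(unseen)  # [(3, 4)]
```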

In [None]:
games_tourney = pd.read_csv('https://github.com/lnunesAI/Datacamp/raw/main/2-machine-learning-scientist-with-python/17-advanced-deep-learning-with-keras/datasets/games_tourney.csv')

Instructions
<ul>
<li>Assign the <code>'team_1'</code> and <code>'team_2'</code> columns from <code>games_tourney</code> to <code>input_1</code> and <code>input_2</code>, respectively.</li>
<li>Evaluate the model.</li>
</ul>

In [None]:
# Get team_1 from the tournament data
input_1 = games_tourney['team_1']

# Get team_2 from the tournament data
input_2 = games_tourney['team_2']

# Evaluate the model using these inputs
print(model.evaluate([input_1, input_2], games_tourney['score_diff'], verbose=False))

11.681324005126953


**It's time to move on to models with more than two inputs.**