# Lab Assignment Five: Wide and Deep Network Architectures
 

#### Everett Cienkus, Blake Miller, Colin Weil

### 1. Preparation

#### 1.1 Define and Prepare Class Variables

Data from https://www.kaggle.com/datasets/arashnic/hr-ana

Define and prepare your class variables. Use proper variable representations (int, float, one-hot, etc.). Use pre-processing methods (as needed) for dimensionality reduction, scaling, etc. Remove variables that are not needed/useful for the analysis. Describe the final dataset that is used for classification/regression (include a description of any newly formed variables you created).

This dataset describes over 50,000 employees of large multinational corporations (MNCs). Each employee's characteristics are paired with whether or not they were recommended for promotion after being evaluated. The file has 13 columns: an employee ID, 11 characteristics and accomplishments of the employee, and the `is_promoted` label. Rows with missing values are dropped before splitting. We did not use all of the features: we chose not to use age or gender, because we do not believe either should be considered when evaluating someone for a promotion. The purpose of this work is to cut down the time spent evaluating large numbers of employees. Our model narrows down the set of employees who need to be evaluated, so that those deciding on promotions can spend their time more productively. Similar methods are already used by companies to sort through large numbers of resumes and produce a shortlist of the best-fit candidates for a job. Apart from the crossed features described in Section 1.2, no new variables were created.
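A minimal sketch of excluding the identifier, the two attributes we chose not to use, and the label before modeling. The toy rows and the `EXCLUDED` / `TARGET` names are hypothetical; only the column names come from the dataset.

```python
import pandas as pd

# Hypothetical toy rows that reuse a few of the dataset's column names
df = pd.DataFrame({
    'employee_id': [1, 2, 3],
    'department': ['HR', 'Technology', 'Legal'],
    'gender': ['m', 'f', 'm'],
    'age': [30, 45, 38],
    'avg_training_score': [52, 76, 49],
    'is_promoted': [0, 1, 0],
})

# Identifier plus the two attributes we chose not to use (hypothetical constant names)
EXCLUDED = ['employee_id', 'gender', 'age']
TARGET = 'is_promoted'

# Features are everything except the excluded columns and the label
X = df.drop(columns=EXCLUDED + [TARGET])
y = df[TARGET]
print(list(X.columns))  # ['department', 'avg_training_score']
```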

In [1]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
# Load the data into memory and save it to a pandas data frame.
df = pd.read_csv('promotion_dataset/train.csv')
df = df.dropna()

df_train, df_test = train_test_split(df,train_size=0.8)
df_test

| | employee_id | department | region | education | gender | recruitment_channel | no_of_trainings | age | previous_year_rating | length_of_service | awards_won? | avg_training_score | is_promoted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 15840 | 71539 | Technology | region_13 | Bachelor's | m | sourcing | 1 | 45 | 1.0 | 17 | 0 | 76 | 0 |
| 43612 | 43310 | Sales & Marketing | region_13 | Master's & above | m | other | 1 | 39 | 4.0 | 3 | 0 | 52 | 0 |
| 30332 | 13162 | HR | region_26 | Bachelor's | m | sourcing | 1 | 30 | 3.0 | 4 | 0 | 52 | 0 |
| 10776 | 57708 | Sales & Marketing | region_15 | Bachelor's | m | other | 1 | 33 | 1.0 | 7 | 0 | 53 | 0 |
| 5075 | 71268 | Sales & Marketing | region_28 | Master's & above | f | other | 1 | 37 | 5.0 | 4 | 0 | 49 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 6176 | 52563 | Sales & Marketing | region_29 | Bachelor's | m | other | 1 | 33 | 3.0 | 6 | 1 | 49 | 0 |
| 2297 | 68124 | Sales & Marketing | region_13 | Master's & above | m | sourcing | 1 | 36 | 3.0 | 4 | 0 | 47 | 0 |
| 48794 | 16746 | Sales & Marketing | region_22 | Master's & above | f | other | 2 | 32 | 4.0 | 7 | 1 | 48 | 0 |
| 30799 | 53107 | Sales & Marketing | region_25 | Master's & above | m | sourcing | 1 | 37 | 5.0 | 2 | 0 | 49 | 0 |


In [2]:
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import StandardScaler
# ========================================================
# define objects that can encode each variable as integer
encoders = dict() # save each encoder in dictionary
categorical_headers = ['department','region','education','gender','recruitment_channel']
# train all encoders
for col in categorical_headers:
    df_train[col] = df_train[col].str.strip()
    df_test[col] = df_test[col].str.strip()
    encoders[col] = LabelEncoder() # save the encoder
    df_train[col+'_int'] = encoders[col].fit_transform(df_train[col])
    df_test[col+'_int'] = encoders[col].transform(df_test[col])
# ========================================================
# scale the numeric, continuous variables
# 'age' is intentionally left out, matching our decision not to use age as a feature
numeric_headers = ['no_of_trainings', 'previous_year_rating', 'length_of_service', 'awards_won?', 'avg_training_score']
ss = StandardScaler()
df_train[numeric_headers] = ss.fit_transform(df_train[numeric_headers].values)
df_test[numeric_headers] = ss.transform(df_test[numeric_headers].values)


# 'gender' is label-encoded above for inspection only; it is intentionally not a model feature
categorical_headers_ints = [x+'_int' for x in categorical_headers if x != 'gender']

feature_columns = categorical_headers_ints+numeric_headers

import pprint
pp = pprint.PrettyPrinter(indent=4)
print(f"We will use the following {len(feature_columns)} features:")
pp.pprint(feature_columns)


We will use the following 9 features:
[   'department_int',
    'region_int',
    'education_int',
    'recruitment_channel_int',
    'no_of_trainings',
    'previous_year_rating',
    'length_of_service',
    'awards_won?',
    'avg_training_score']


#### 1.2 Combine into Cross-Product Features

One of the crosses we chose is department × education. Promotion rates may differ between departments, and education is likely to matter more in some departments than in others, so the combination can capture an effect that neither column captures on its own.

We also crossed recruitment channel with education. The way someone enters the company may say a lot about how they progress within it, and employees recruited through the same channel with the same education may be expected to follow similar career paths. The code also includes a third cross, department × region.
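A minimal sketch of how one cross is formed, using the same two steps as In [4] (join the string values, then integer-encode the combinations) on hypothetical toy rows. Given the unique-value counts printed in In [3], `department_education` has at most 9 × 3 = 27 categories, `recruitment_channel_education` at most 3 × 3 = 9, and `department_region` at most 9 × 34 = 306; only combinations that actually occur receive an id.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy rows; the notebook applies the same steps to df_train and df_test
df = pd.DataFrame({
    'department': ['HR', 'HR', 'Technology', 'Technology'],
    'education': ["Bachelor's", "Master's & above", "Bachelor's", "Bachelor's"],
})

# 1. Build the crossed label by joining the two categorical values per row
crossed = df[['department', 'education']].apply(lambda row: '_'.join(row), axis=1)

# 2. Map each distinct combination to an integer id (LabelEncoder sorts the labels)
enc = LabelEncoder()
df['department_education'] = enc.fit_transform(crossed)

print(list(enc.classes_))
print(df['department_education'].tolist())  # [0, 1, 2, 2]
```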

In [3]:
for col in categorical_headers:
    vals = df_train[col].unique()
    print(col,'has', len(vals), 'unique values:')
    print(vals)

department has 9 unique values:
['Sales & Marketing' 'Analytics' 'HR' 'Technology' 'Procurement'
 'Operations' 'R&D' 'Finance' 'Legal']
region has 34 unique values:
['region_27' 'region_2' 'region_28' 'region_15' 'region_19' 'region_7'
 'region_22' 'region_26' 'region_25' 'region_23' 'region_14' 'region_31'
 'region_5' 'region_30' 'region_3' 'region_17' 'region_32' 'region_13'
 'region_20' 'region_16' 'region_11' 'region_10' 'region_6' 'region_29'
 'region_9' 'region_12' 'region_4' 'region_24' 'region_8' 'region_1'
 'region_33' 'region_21' 'region_34' 'region_18']
education has 3 unique values:
["Bachelor's" "Master's & above" 'Below Secondary']
gender has 2 unique values:
['m' 'f']
recruitment_channel has 3 unique values:
['other' 'sourcing' 'referred']


In [4]:
cross_columns = [
    ['department','education'],
    ['recruitment_channel','education'],
    ['department', 'region']
]

# cross each set of columns in the list above
cross_col_df_names = []
for cols_list in cross_columns:
    # encode as ints for the embedding
    enc = LabelEncoder()

    # 1. create crossed labels by join operation
    X_crossed_train = df_train[cols_list].apply(lambda x: '_'.join(x), axis=1)
    X_crossed_test = df_test[cols_list].apply(lambda x: '_'.join(x), axis=1)

    # get a nice name for this new crossed column
    cross_col_name = '_'.join(cols_list)

    # 2. encode as integers, stacking all possibilities
    enc.fit(np.hstack((X_crossed_train.to_numpy(),  X_crossed_test.to_numpy())))

    # 3. Save into dataframe with new name
    df_train[cross_col_name] = enc.transform(X_crossed_train)
    df_test[cross_col_name] = enc.transform(X_crossed_test)

    # keep track of the new names of the crossed columns
    cross_col_df_names.append(cross_col_name)

cross_col_df_names

['department_education', 'recruitment_channel_education', 'department_region']

#### 1.3 Choose Metrics to Evaluate Performance

We use precision to evaluate our models. Precision is the fraction of predicted promotions that are correct, so optimizing for it reduces false positives; the worst outcome for our predictions would be recommending someone for promotion who does not qualify for or deserve it. Conversely, qualified employees are often passed over for promotion in real workplaces, so if our model occasionally fails to flag a qualified employee (a false negative), we consider that acceptable.
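A small worked example of the metric, assuming hypothetical labels and predictions (these are not results from this lab). Precision = TP / (TP + FP); Keras' `tf.keras.metrics.Precision` computes the same quantity after thresholding the sigmoid output at 0.5 by default. Returning 0.0 when there are no positive predictions is a convention assumed here.

```python
import numpy as np

def precision_from_counts(tp, fp):
    """Precision = TP / (TP + FP): of the employees flagged for promotion, the fraction actually promoted."""
    if tp + fp == 0:
        return 0.0  # assumed convention when the model predicts no promotions at all
    return tp / (tp + fp)

# Hypothetical labels and thresholded predictions
y_true = np.array([1, 0, 1, 0, 0, 1, 0, 0])
y_pred = np.array([1, 0, 0, 0, 1, 1, 0, 0])

tp = int(np.sum((y_pred == 1) & (y_true == 1)))  # correctly flagged
fp = int(np.sum((y_pred == 1) & (y_true == 0)))  # flagged but not promoted
print(tp, fp, precision_from_counts(tp, fp))  # 2 1 0.6666666666666666
```

The missed promotion at index 2 is a false negative and does not lower precision, which is why this metric tolerates the case described above.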

#### 1.4 Choose Method for Dividing Data

In [5]:
from sklearn.model_selection import train_test_split
X_train = df_train[feature_columns].to_numpy()
X_test = df_test[feature_columns].to_numpy()

y_train = df_train['is_promoted'].to_numpy()
y_test = df_test['is_promoted'].to_numpy()
# X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.8, random_state=42)

Since our dataset has over 50,000 rows, we use an 80/20 train/test split, which we call the Larson Rule (we felt like this was a good name). When splitting the data, it is important that the proportions of positive and negative outcomes are similar in the training and testing sets, so that the model is trained and evaluated on representative data. We rely on the size of the dataset for this: with this many rows, a random 80/20 split is very likely to contain a diverse mix of feature combinations in both sets.
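The split in In [1] is a plain random split. A sketch of the alternative, assuming scikit-learn's `stratify` argument to `train_test_split`, which fixes the class proportions in both sets instead of relying on sample size. The labels below are hypothetical; in the notebook the call would be `train_test_split(df, train_size=0.8, stratify=df['is_promoted'])`.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels: 10% positive
rng = np.random.default_rng(0)
y = np.array([1] * 100 + [0] * 900)
X = rng.normal(size=(len(y), 3))

# stratify=y keeps the positive rate identical (up to rounding) in train and test
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.8, stratify=y, random_state=42)

print(y_tr.mean(), y_te.mean())  # 0.1 0.1
```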

### 2. Modeling

#### 2.1 Create Three Combined Wide and Deep Networks using Keras

Create at least three combined wide and deep networks to classify your data using Keras. Visualize the performance of the network on the training data and validation data in the same plot versus the training iterations. Note: use the "history" return parameter that is part of Keras "fit" function to easily access this data.
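The plotting cells below read `history.history['precision']`, `['precision_1']` and `['precision_2']`, because Keras gives auto-named metric objects unique suffixes when several models are compiled in one session. A hypothetical helper (`find_metric_keys` is our name, not a Keras API) that finds the right keys by prefix, shown on a made-up history dict:

```python
def find_metric_keys(history_dict, metric='precision'):
    """Return (train_key, val_key) for a metric in a Keras History.history dict.

    Matches 'precision' as well as suffixed names such as 'precision_1'.
    """
    train_keys = [k for k in history_dict
                  if k == metric or k.startswith(metric + '_')]
    if len(train_keys) != 1:
        raise KeyError(f'expected exactly one {metric!r} key, found {train_keys}')
    train_key = train_keys[0]
    # Keras prefixes validation metrics with 'val_'
    return train_key, 'val_' + train_key

# Hypothetical history dict shaped like model2's (values are made up)
hist = {'loss': [0.10, 0.08], 'precision_1': [0.50, 0.60],
        'val_loss': [0.10, 0.09], 'val_precision_1': [0.40, 0.55]}
print(find_metric_keys(hist))  # ('precision_1', 'val_precision_1')
```

Usage: `tr, va = find_metric_keys(history.history)`, then `plt.plot(history.history[tr])` and `plt.plot(history.history[va])`. Alternatively, compiling with `tf.keras.metrics.Precision(name='precision')` keeps the key fixed.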

In [6]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Dense, Activation, Input
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Embedding
from tensorflow.keras.layers import concatenate
print(tf.__version__)
print(keras.__version__)

2.10.0
2.10.0


In [7]:
# get crossed columns
X_train_crossed = df_train[cross_col_df_names].to_numpy()
X_test_crossed = df_test[cross_col_df_names].to_numpy()
# save categorical features
X_train_cat = df_train[categorical_headers_ints].to_numpy()
X_test_cat = df_test[categorical_headers_ints].to_numpy()
# and save off the numeric features
X_train_num =  df_train[numeric_headers].to_numpy()
X_test_num =  df_test[numeric_headers].to_numpy()

# we need to create separate lists for each branch
crossed_outputs = []

# CROSSED DATA INPUT
input_crossed = Input(shape=(X_train_crossed.shape[1],), dtype='int64', name='wide_inputs')
for idx,col in enumerate(cross_col_df_names):

    # track what the maximum integer value will be for this variable
    # which is the same as the number of categories
    N = max(df_train[col].max(),df_test[col].max())+1


    # this line of code does this: input_branch[:,idx]
    x = tf.gather(input_crossed, idx, axis=1)

    # now use an embedding to deal with integers as if they were one hot encoded
    x = Embedding(input_dim=N,
                  output_dim=int(np.sqrt(N)),
                  input_length=1, name=col+'_embed')(x)

    # save these outputs to concatenate later
    crossed_outputs.append(x)


# concatenate the crossed-feature embeddings to form the wide branch
wide_branch = concatenate(crossed_outputs, name='wide_concat')

# reset this input branch
all_deep_branch_outputs = []

# CATEGORICAL DATA INPUT
input_cat = Input(shape=(X_train_cat.shape[1],), dtype='int64', name='categorical_input')
for idx,col in enumerate(categorical_headers_ints):

    # track what the maximum integer value will be for this variable
    # which is the same as the number of categories
    N = max(df_train[col].max(),df_test[col].max())+1

    # this line of code does this: input_branch[:,idx]
    x = tf.gather(input_cat, idx, axis=1)

    # now use an embedding to deal with integers as if they were one hot encoded
    x = Embedding(input_dim=N,
                  output_dim=int(np.sqrt(N)),
                  input_length=1, name=col+'_embed')(x)

    # save these outputs to concatenate later
    all_deep_branch_outputs.append(x)

# NUMERIC DATA INPUT
# create dense input branch for numeric
input_num = Input(shape=(X_train_num.shape[1],), name='numeric')
x_dense = Dense(units=22, activation='relu',name='num_1')(input_num)

all_deep_branch_outputs.append(x_dense)


# merge the deep branches together
deep_branch = concatenate(all_deep_branch_outputs,name='concat_embeds')
deep_branch = Dense(units=50,activation='relu', name='deep1')(deep_branch)
deep_branch = Dense(units=25,activation='relu', name='deep2')(deep_branch)
deep_branch = Dense(units=10,activation='relu', name='deep3')(deep_branch)

# merge the deep and wide branch
final_branch = concatenate([wide_branch, deep_branch],
                           name='concat_deep_wide')
final_branch = Dense(units=1,activation='sigmoid',
                     name='combined')(final_branch)

model1 = Model(inputs=[input_crossed,input_cat,input_num],
              outputs=final_branch)

model1.compile(optimizer='sgd',
              loss='mean_squared_error',
              metrics=[tf.keras.metrics.Precision()])
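
The comments above say the embedding treats integers "as if they were one hot encoded". A NumPy sketch of that equivalence, assuming a random matrix `W` as a stand-in for a trained `Embedding` layer's weights: looking up rows of `W` gives the same result as multiplying one-hot vectors by `W`, without building the one-hot matrix.

```python
import numpy as np

# Hypothetical sizes: N categories (e.g. 9 departments) embedded in int(sqrt(N)) dimensions, as above
N = 9
dim = int(np.sqrt(N))  # 3
rng = np.random.default_rng(0)
W = rng.normal(size=(N, dim))  # stand-in for an Embedding layer's weight matrix

ids = np.array([0, 4, 8])  # integer-encoded categories

# An embedding lookup is row indexing into W ...
looked_up = W[ids]

# ... which equals one-hot vectors multiplied by W
one_hot = np.eye(N)[ids]
via_matmul = one_hot @ W

print(np.allclose(looked_up, via_matmul))  # True
```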



In [8]:
history = model1.fit([X_train_crossed,X_train_cat,X_train_num],
                    y_train,
                    epochs=20,
                    batch_size=10,
                    verbose=1,
                    validation_data = ([X_test_crossed,X_test_cat,X_test_num],y_test))

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


In [9]:
from matplotlib import pyplot as plt
plt.plot(history.history['precision'])
plt.plot(history.history['val_precision'])
plt.title('model precision')
plt.ylabel('precision')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()

ModuleNotFoundError: No module named 'matplotlib'

In [None]:
from sklearn import metrics as mt
yhat = np.round(model1.predict([X_test_crossed,X_test_cat,X_test_num])).astype(int)
print(mt.confusion_matrix(y_test,yhat))
print(mt.classification_report(y_test,yhat))
unique_yhat, counts_yhat = np.unique(yhat, return_counts=True)
print("Y hat\n",np.asarray((unique_yhat, counts_yhat)).T)
unique_ytest, counts_ytest = np.unique(y_test, return_counts=True)
print("actual\n",np.asarray((unique_ytest, counts_ytest)).T)

#### 2.2 Investigate Performance by Altering the Number of Layers in the Deep Branch of the Network

Investigate generalization performance by altering the number of layers in the deep branch of the network. Try at least two different number of layers. Use the method of cross validation and evaluation metric that you argued for at the beginning of the lab to select the number of layers that performs superiorly.
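The cells below compare layer counts on a single train/test split. A sketch of stratified k-fold cross-validation scored with precision, assuming scikit-learn's `StratifiedKFold` and `precision_score`. `cv_precision` is a hypothetical helper, and `LogisticRegression` on synthetic data stands in for the Keras models; using it with model 1 or model 2 would need a hypothetical wrapper that rebuilds the network each fold, indexes each of the three input arrays with the fold indices, and thresholds `predict()` at 0.5.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score
from sklearn.model_selection import StratifiedKFold

def cv_precision(make_model, X, y, n_splits=5, seed=0):
    """Precision on each validation fold of a stratified k-fold split."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for train_idx, val_idx in skf.split(X, y):
        model = make_model()  # fresh, untrained model per fold
        model.fit(X[train_idx], y[train_idx])
        y_hat = model.predict(X[val_idx])
        scores.append(precision_score(y[val_idx], y_hat, zero_division=0))
    return np.array(scores)

# Hypothetical synthetic data (not the promotion dataset)
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = (X[:, 0] + 0.5 * rng.normal(size=500) > 1.0).astype(int)

scores = cv_precision(lambda: LogisticRegression(), X, y)
print(scores.mean(), scores.std())
```

Comparing the mean and spread of the fold scores for each architecture gives a less noisy basis for choosing the number of layers than one split.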

In [None]:
# we need to create separate lists for each branch
crossed_outputs = []

# CROSSED DATA INPUT
input_crossed = Input(shape=(X_train_crossed.shape[1],), dtype='int64', name='wide_inputs')
for idx,col in enumerate(cross_col_df_names):

    # track what the maximum integer value will be for this variable
    # which is the same as the number of categories
    N = max(df_train[col].max(),df_test[col].max())+1


    # this line of code does this: input_branch[:,idx]
    x = tf.gather(input_crossed, idx, axis=1)

    # now use an embedding to deal with integers as if they were one hot encoded
    x = Embedding(input_dim=N,
                  output_dim=int(np.sqrt(N)),
                  input_length=1, name=col+'_embed')(x)

    # save these outputs to concatenate later
    crossed_outputs.append(x)

# concatenate the crossed-feature embeddings to form the wide branch
wide_branch = concatenate(crossed_outputs, name='wide_concat')

# reset this input branch
all_deep_branch_outputs = []

# CATEGORICAL DATA INPUT
input_cat = Input(shape=(X_train_cat.shape[1],), dtype='int64', name='categorical_input')
for idx,col in enumerate(categorical_headers_ints):

    # track what the maximum integer value will be for this variable
    # which is the same as the number of categories
    N = max(df_train[col].max(),df_test[col].max())+1

    # this line of code does this: input_branch[:,idx]
    x = tf.gather(input_cat, idx, axis=1)

    # now use an embedding to deal with integers as if they were one hot encoded
    x = Embedding(input_dim=N,
                  output_dim=int(np.sqrt(N)),
                  input_length=1, name=col+'_embed')(x)

    # save these outputs to concatenate later
    all_deep_branch_outputs.append(x)

# NUMERIC DATA INPUT
# create dense input branch for numeric
input_num = Input(shape=(X_train_num.shape[1],), name='numeric')
x_dense = Dense(units=22, activation='relu',name='num_1')(input_num)

all_deep_branch_outputs.append(x_dense)


# merge the deep branches together
deep_branch = concatenate(all_deep_branch_outputs,name='concat_embeds')
# 5 layer network
deep_branch = Dense(units=50,activation='relu', name='deep1')(deep_branch)
deep_branch = Dense(units=40,activation='relu', name='deep2')(deep_branch)
deep_branch = Dense(units=30,activation='relu', name='deep3')(deep_branch)
deep_branch = Dense(units=20,activation='relu', name='deep4')(deep_branch)
deep_branch = Dense(units=10,activation='relu', name='deep5')(deep_branch)

# merge the deep and wide branch
final_branch = concatenate([wide_branch, deep_branch],
                           name='concat_deep_wide')
final_branch = Dense(units=1,activation='sigmoid',
                     name='combined')(final_branch)

model2 = Model(inputs=[input_crossed,input_cat,input_num],
              outputs=final_branch)

model2.compile(optimizer='sgd',
              loss='mean_squared_error',
              metrics=[tf.keras.metrics.Precision()])

history = model2.fit([X_train_crossed,X_train_cat,X_train_num],
                    y_train,
                    epochs=20,
                    batch_size=10,
                    verbose=1,
                    validation_data = ([X_test_crossed,X_test_cat,X_test_num],y_test))

In [None]:
yhat = np.round(model2.predict([X_test_crossed,X_test_cat,X_test_num])).astype(int)
print(mt.confusion_matrix(y_test,yhat))
print(mt.classification_report(y_test,yhat))
unique_yhat, counts_yhat = np.unique(yhat, return_counts=True)
print("Y hat\n",np.asarray((unique_yhat, counts_yhat)).T)
unique_ytest, counts_ytest = np.unique(y_test, return_counts=True)
print("actual\n",np.asarray((unique_ytest, counts_ytest)).T)

In [None]:
plt.plot(history.history['precision_1'])
plt.plot(history.history['val_precision_1'])
plt.title('model precision')
plt.ylabel('precision')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()

#### 2.3 Third Wide and Deep Network

In [None]:
# # get new crossed columns
# cross_columns = [
#     ['department','education'],
#     ['recruitment_channel','education'],
#     ['department', 'region'],
#     ['department', 'education', 'region']
# ]
# cross_col_df_names = []
# for cols_list in cross_columns:
#     # encode as ints for the embedding
#     enc = LabelEncoder()
#     # 1. create crossed labels by join operation
#     X_crossed_train = df_train[cols_list].apply(lambda x: '_'.join(x), axis=1)
#     X_crossed_test = df_test[cols_list].apply(lambda x: '_'.join(x), axis=1)
#
#     # get a nice name for this new crossed column
#     cross_col_name = '_'.join(cols_list)
#
#     # 2. encode as integers, stacking all possibilities
#     enc.fit(np.hstack((X_crossed_train.to_numpy(),  X_crossed_test.to_numpy())))
#
#     # 3. Save into dataframe with new name
#     df_train[cross_col_name] = enc.transform(X_crossed_train)
#     df_test[cross_col_name] = enc.transform(X_crossed_test)
#
#     # keep track of the new names of the crossed columns
#     cross_col_df_names.append(cross_col_name)
#
# cross_col_df_names

In [None]:
crossed_outputs = []
# CROSSED DATA INPUT
input_crossed = Input(shape=(X_train_crossed.shape[1],), dtype='int64', name='wide_inputs')
for idx,col in enumerate(cross_col_df_names):

    # track what the maximum integer value will be for this variable
    # which is the same as the number of categories
    N = max(df_train[col].max(),df_test[col].max())+1


    # this line of code does this: input_branch[:,idx]
    x = tf.gather(input_crossed, idx, axis=1)

    # now use an embedding to deal with integers as if they were one hot encoded
    x = Embedding(input_dim=N,
                  output_dim=int(np.sqrt(N)),
                  input_length=1, name=col+'_embed')(x)

    # save these outputs to concatenate later
    crossed_outputs.append(x)

# concatenate the crossed-feature embeddings to form the wide branch
wide_branch = concatenate(crossed_outputs, name='wide_concat')

# reset this input branch
all_deep_branch_outputs = []

# CATEGORICAL DATA INPUT
input_cat = Input(shape=(X_train_cat.shape[1],), dtype='int64', name='categorical_input')
for idx,col in enumerate(categorical_headers_ints):

    # track what the maximum integer value will be for this variable
    # which is the same as the number of categories
    N = max(df_train[col].max(),df_test[col].max())+1

    # this line of code does this: input_branch[:,idx]
    x = tf.gather(input_cat, idx, axis=1)

    # now use an embedding to deal with integers as if they were one hot encoded
    x = Embedding(input_dim=N,
                  output_dim=int(np.sqrt(N)),
                  input_length=1, name=col+'_embed')(x)

    # save these outputs to concatenate later
    all_deep_branch_outputs.append(x)

# NUMERIC DATA INPUT
# create dense input branch for numeric
input_num = Input(shape=(X_train_num.shape[1],), name='numeric')
x_dense = Dense(units=22, activation='relu',name='num_1')(input_num)

all_deep_branch_outputs.append(x_dense)


# merge the deep branches together
deep_branch = concatenate(all_deep_branch_outputs,name='concat_embeds')
# 4 layer deep network
deep_branch = Dense(units=30,activation='relu', name='deep1')(deep_branch)
deep_branch = Dense(units=20,activation='relu', name='deep2')(deep_branch)
deep_branch = Dense(units=10,activation='relu', name='deep3')(deep_branch)
deep_branch = Dense(units=5,activation='relu', name='deep4')(deep_branch)

# merge the deep and wide branch
final_branch = concatenate([wide_branch, deep_branch],
                           name='concat_deep_wide')
final_branch = Dense(units=1,activation='sigmoid',
                     name='combined')(final_branch)

model3 = Model(inputs=[input_crossed,input_cat,input_num],
              outputs=final_branch)

model3.compile(optimizer='sgd',
              loss='mean_squared_error',
              metrics=[tf.keras.metrics.Precision()])

history = model3.fit([X_train_crossed,X_train_cat,X_train_num],
                    y_train,
                    epochs=20,
                    batch_size=10,
                    verbose=1,
                    validation_data = ([X_test_crossed,X_test_cat,X_test_num],y_test))

In [None]:
yhat = np.round(model3.predict([X_test_crossed,X_test_cat,X_test_num])).astype(int)
print(mt.confusion_matrix(y_test,yhat))
print(mt.classification_report(y_test,yhat))
unique_yhat, counts_yhat = np.unique(yhat, return_counts=True)
print("Y hat\n",np.asarray((unique_yhat, counts_yhat)).T)
unique_ytest, counts_ytest = np.unique(y_test, return_counts=True)
print("actual\n",np.asarray((unique_ytest, counts_ytest)).T)

In [None]:
plt.plot(history.history['precision_2'])
plt.plot(history.history['val_precision_2'])
plt.title('model precision')
plt.ylabel('precision')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()

#### 2.4 Compare the Best Wide and Deep Network to a Multi-Layer Perceptron

Compare the performance of your best wide and deep network to a standard multi-layer perceptron (MLP). Alternatively, you can compare to a network without the wide branch (i.e., just the deep network). For classification tasks, compare using the receiver operating characteristic and area under the curve. For regression tasks, use Bland-Altman plots and residual variance calculations.  Use proper statistical methods to compare the performance of different models.  

In [None]:
# Now let's define the architecture for a multi-layer network

# reset this input branch
all_deep_branch_outputs = []

# CATEGORICAL DATA INPUT
input_cat = Input(shape=(X_train_cat.shape[1],), dtype='int64', name='categorical_input')
for idx,col in enumerate(categorical_headers_ints):

    # track what the maximum integer value will be for this variable
    # which is the same as the number of categories
    N = max(df_train[col].max(),df_test[col].max())+1

    # this line of code does this: input_branch[:,idx]
    x = tf.gather(input_cat, idx, axis=1)

    # now use an embedding to deal with integers as if they were one hot encoded
    x = Embedding(input_dim=N,
                  output_dim=int(np.sqrt(N)),
                  input_length=1, name=col+'_embed')(x)

    # save these outputs to concatenate later
    all_deep_branch_outputs.append(x)

# NUMERIC DATA INPUT
# create dense input branch for numeric
input_num = Input(shape=(X_train_num.shape[1],), name='numeric')
x_dense = Dense(units=22, activation='relu',name='num_1')(input_num)

all_deep_branch_outputs.append(x_dense)


# merge the deep branches together
deep_branch = concatenate(all_deep_branch_outputs,name='concat_embeds')
# 4 hidden layers, matching the deep branch of model 3

deep_branch = Dense(units=30,activation='relu', name='deep1')(deep_branch)
deep_branch = Dense(units=20,activation='relu', name='deep2')(deep_branch)
deep_branch = Dense(units=10,activation='relu', name='deep3')(deep_branch)
deep_branch = Dense(units=5,activation='relu', name='deep4')(deep_branch)

# output layer (the MLP has no wide branch to merge)
final_branch = Dense(units=1,activation='sigmoid',
                     name='combined')(deep_branch)

MLPModel = Model(inputs=[input_cat,input_num],
              outputs=final_branch)
MLPModel.compile(optimizer='sgd',
              loss='mean_squared_error',
              metrics=[tf.keras.metrics.Precision()])
history = MLPModel.fit([X_train_cat,X_train_num],
                    y_train,
                    epochs=20,
                    batch_size=10,
                    verbose=1,
                    validation_data = ([X_test_cat,X_test_num],y_test))



In [None]:
yhatMLP = np.round(MLPModel.predict([X_test_cat,X_test_num])).astype(int)
print(mt.confusion_matrix(y_test,yhatMLP))
print(mt.classification_report(y_test,yhatMLP))
unique_yhat, counts_yhat = np.unique(yhatMLP, return_counts=True)
print("Y hat\n",np.asarray((unique_yhat, counts_yhat)).T)
unique_ytest, counts_ytest = np.unique(y_test, return_counts=True)
print("actual\n",np.asarray((unique_ytest, counts_ytest)).T)

In [None]:
from sklearn.metrics import roc_curve
from sklearn.metrics import auc
yhatMLP = MLPModel.predict([X_test_cat,X_test_num]).ravel()
fpr_mlp, tpr_mlp, thresholds_keras = roc_curve(y_test, yhatMLP)
auc_mlp = auc(fpr_mlp, tpr_mlp)
yhat_model1 = model1.predict([X_test_crossed,X_test_cat,X_test_num]).ravel()
fpr_model1, tpr_model1, thresholds_keras = roc_curve(y_test, yhat_model1)
auc_model1= auc(fpr_model1, tpr_model1)
yhat_model2 = model2.predict([X_test_crossed,X_test_cat,X_test_num]).ravel()
fpr_model2, tpr_model2, thresholds_keras = roc_curve(y_test, yhat_model2)
auc_model2= auc(fpr_model2, tpr_model2)
yhat_model3 = model3.predict([X_test_crossed,X_test_cat,X_test_num]).ravel()
fpr_model3, tpr_model3, thresholds_keras = roc_curve(y_test, yhat_model3)
auc_model3= auc(fpr_model3, tpr_model3)
plt.figure(1)
plt.plot([0, 1], [0, 1], 'k--')
plt.plot(fpr_mlp, tpr_mlp, label='MLP (area = {:.3f})'.format(auc_mlp))
plt.plot(fpr_model1, tpr_model1, label='Model 1 (area = {:.3f})'.format(auc_model1))
plt.plot(fpr_model2, tpr_model2, label='Model 2 (area = {:.3f})'.format(auc_model2))
plt.plot(fpr_model3, tpr_model3, label='Model 3 (area = {:.3f})'.format(auc_model3))
plt.xlabel('False positive rate')
plt.ylabel('True positive rate')
plt.title('ROC curve')
plt.legend(loc='best')
plt.show()

All four models (the MLP and models 1 to 3) produce very similar ROC curves.
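Similar-looking curves do not by themselves show whether an AUC difference is meaningful. A sketch of a paired bootstrap on the shared test set, assuming scikit-learn's `roc_auc_score`; `bootstrap_auc_diff` is a hypothetical helper and the scores below are synthetic. DeLong's test is a common analytic alternative.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def bootstrap_auc_diff(y_true, scores_a, scores_b, n_boot=1000, seed=0):
    """Paired bootstrap of AUC(a) - AUC(b); returns (mean difference, 95% percentile interval)."""
    rng = np.random.default_rng(seed)
    y_true, scores_a, scores_b = map(np.asarray, (y_true, scores_a, scores_b))
    n = len(y_true)
    diffs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample test rows with replacement, same rows for both models
        if len(np.unique(y_true[idx])) < 2:
            continue  # AUC is undefined when a resample contains only one class
        diffs.append(roc_auc_score(y_true[idx], scores_a[idx])
                     - roc_auc_score(y_true[idx], scores_b[idx]))
    diffs = np.array(diffs)
    return diffs.mean(), np.percentile(diffs, [2.5, 97.5])

# Hypothetical scores: model a is more informative than model b
rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=400)
a = y + rng.normal(scale=0.8, size=400)
b = y + rng.normal(scale=1.2, size=400)
mean_diff, (lo, hi) = bootstrap_auc_diff(y, a, b, n_boot=300)
print(mean_diff, lo, hi)
```

In the notebook this could be called as `bootstrap_auc_diff(y_test, yhat_model3, yhatMLP)` after the ROC cell; an interval that excludes 0 suggests the difference is unlikely to be resampling noise alone.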

### 3. Capturing the Embedding Weights from the Deep Network

Capture the embedding weights from the deep network and (if needed) perform dimensionality reduction on the output of these embedding layers (only if needed). That is, pass the observations into the network, save the embedded weights (called embeddings), and then perform  dimensionality reduction in order to visualize results. Visualize and explain any clusters in the data.
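The department embeddings are 3-dimensional (`int(np.sqrt(9)) = 3`), so they can be plotted directly, but the region embeddings have `int(np.sqrt(34)) = 5` dimensions and would need reduction first. A PCA sketch via NumPy's SVD; `pca_2d` is a hypothetical helper, and the random matrix stands in for `model3.get_layer('region_int_embed').get_weights()[0]`. `sklearn.decomposition.PCA(n_components=2)` would produce equivalent coordinates up to sign.

```python
import numpy as np

def pca_2d(embeddings):
    """Project rows of an (n_categories, dim) embedding matrix onto their top-2 principal components."""
    centered = embeddings - embeddings.mean(axis=0)
    # rows of vt are the principal directions, ordered by singular value
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    coords = centered @ vt[:2].T
    explained = (s[:2] ** 2) / np.sum(s ** 2)  # fraction of variance kept by each component
    return coords, explained

# Hypothetical stand-in: 34 regions embedded in 5 dimensions
rng = np.random.default_rng(0)
region_embeddings = rng.normal(size=(34, 5))
coords, explained = pca_2d(region_embeddings)
print(coords.shape, explained.round(3))
```

Each row of `coords` can then be scattered and labelled with `encoders['region'].inverse_transform([i])[0]`, as done for departments below.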

In [None]:
model3.summary()

In [None]:
# embedding matrix for department: one row per integer-encoded department
dept_embeddings = model3.get_layer('department_int_embed').get_weights()[0]
print(dept_embeddings.shape)
for i in range(len(dept_embeddings)):
    # map the integer id back to the department name
    print(encoders['department'].inverse_transform([i])[0], dept_embeddings[i])

In [None]:
%matplotlib notebook

import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import axes3d  # registers the '3d' projection on older Matplotlib versions

fig = plt.figure()

# syntax for 3-D projection
ax = plt.axes(projection='3d')

# department embeddings are 3-dimensional (int(sqrt(9)) = 3), so no reduction is needed
for i in range(len(dept_embeddings)):
    dept_name = encoders['department'].inverse_transform([i])[0]
    # no fixed color, so each department gets its own color and legend entry
    ax.scatter(dept_embeddings[i][0], dept_embeddings[i][1], dept_embeddings[i][2], label=dept_name)

ax.set_title('Learned embeddings by department')
plt.legend(loc='best')
plt.show()

As we can see from the above plot, we do get some meaningful clusters in the embedding data. On the bottom right side of the plot, we can see that the R&D, Technology, and Analytics departments have clustered embeddings. We can also see that in the center of the plot, the Legal and HR departments   