# lab 6 - wide and deep

## Overview

This data set was created by IBM data scientists.  It describes 35 features for 1470 (fictional) employees including whether or not the employee has left the firm (labeled "attrition" in the dataset).  Employees leave companies for a variety of reasons: disatisfaction with their role, their manager or their pay.  Perhaps they aren't necessarily dissatified with their current job but feel like something better is out there.  Or maybe they just feel like they'd been there long enough, and want something different. Most likely its a combination of all of these things, plus a few others.  

Employers would like to have a sense of why and when an employee might leave.  If an employer believes that an employee that they really value might leave, they could respond and try to prevent them from leaving.  This is what we will attempt to predict using a wide and deep neural network.

In [1]:
import pandas as pd
import numpy as np
import os

In [2]:
data_path = '../data/'
df = pd.read_csv(os.path.join(data_path, 'WA_Fn-UseC_-HR-Employee-Attrition.csv'))
df.head()

Unnamed: 0,Age,Attrition,BusinessTravel,DailyRate,Department,DistanceFromHome,Education,EducationField,EmployeeCount,EmployeeNumber,...,RelationshipSatisfaction,StandardHours,StockOptionLevel,TotalWorkingYears,TrainingTimesLastYear,WorkLifeBalance,YearsAtCompany,YearsInCurrentRole,YearsSinceLastPromotion,YearsWithCurrManager
0,41,Yes,Travel_Rarely,1102,Sales,1,2,Life Sciences,1,1,...,1,80,0,8,0,1,6,4,0,5
1,49,No,Travel_Frequently,279,Research & Development,8,1,Life Sciences,1,2,...,4,80,1,10,3,3,10,7,1,7
2,37,Yes,Travel_Rarely,1373,Research & Development,2,2,Other,1,4,...,2,80,0,7,3,3,0,0,0,0
3,33,No,Travel_Frequently,1392,Research & Development,3,4,Life Sciences,1,5,...,3,80,0,8,3,3,8,7,3,0
4,27,No,Travel_Rarely,591,Research & Development,2,1,Medical,1,7,...,4,80,1,6,3,3,2,2,2,2


In [3]:
print('Breakdown of attrition.  (0) Stayed at company, (1) Left company')

df.Attrition.value_counts()

Breakdown of attrition.  (0) Stayed at company, (1) Left company


No     1233
Yes     237
Name: Attrition, dtype: int64

# Expansion

This is a relative small dataset with only 1470 data rows. We want more data to train and test. For data expension, we will try several ways to do so. And first we are going to copy and append some rows to origin dataset and expand it to 2000 rows.
For another attempt, we take numberical datas from the raw dataframe slice, and add some randomly generat noise to these numerical data. Then we insert categorical data rows back to the dataframe of numerical data with random noise.

In [4]:
df_slice = df[:530]

In [5]:
df_new = df.append(df_slice)
df_new = df_new.reset_index(drop=True)

In [6]:
df_slice2 = df_slice[['Age','DistanceFromHome','Education','EnvironmentSatisfaction',
                      'JobSatisfaction','MonthlyIncome','PerformanceRating','RelationshipSatisfaction',
                      'TotalWorkingYears','YearsAtCompany']]

We use df_slice2 to take numberical datas from the raw dataframe slice, and add some randomly generated noise to these numerical data.

In [7]:
df_slice2 = df_slice2 * (1 + np.random.uniform(-0.01,0.01,(df_slice2.shape)))

In [8]:
df_slice2.insert(1, 'Attrition', df_slice['Attrition'])
df_slice2.insert(2, 'Department', df_slice['Department'])
df_slice2.insert(6, 'Gender', df_slice['Gender'])
df_slice2.insert(8, 'MaritalStatus', df_slice['MaritalStatus'])
df_slice2.insert(10, 'OverTime', df_slice['OverTime'])

In [9]:
df_new2 = df.append(df_slice2)
df_new2 = df_new2.reset_index(drop=True)
df = df_new2

So now we have a dataframe with 2000 rows and 15 clomuns expanded dataset

There are 35 features in total in the dataset, but we don't want to use all of them.  
Let's focus on a few of them:
- Age
- Attrition 
- Department
- DistanceFromHome 
- Education 
- EduacationField
- EnvironmentSatisfaction
- Gender
- JobSatisfaction
- MaritalStatus
- MonthlyIncome
- OverTime
- PerformanceRating
- RelationshipSatisfaction
- TotalWorkingYears
- YearsAtCompany
- YearsSinceLastPromotion

These features are what we believe important to predict the attrition status. 
We will use attrition as our label.

So let's first drop the other features. 

In [10]:
to_keep = {'Age', 'Attrition', 'Department','DistanceFromHome', 'Education', 'EnvironmentSatisfaction', 'Gender', 'JobSatisfaction', 'MaritalStatus',
           'MonthlyIncome', 'OverTime', 'PerformanceRating', 'RelationshipSatisfaction','TotalWorkingYears','YearsAtCompany'}
to_drop = set(df.columns)-to_keep
df.drop(to_drop, axis=1, inplace=True)
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2000 entries, 0 to 1999
Data columns (total 15 columns):
Age                         2000 non-null float64
Attrition                   2000 non-null object
Department                  2000 non-null object
DistanceFromHome            2000 non-null float64
Education                   2000 non-null float64
EnvironmentSatisfaction     2000 non-null float64
Gender                      2000 non-null object
JobSatisfaction             2000 non-null float64
MaritalStatus               2000 non-null object
MonthlyIncome               2000 non-null float64
OverTime                    2000 non-null object
PerformanceRating           2000 non-null float64
RelationshipSatisfaction    2000 non-null float64
TotalWorkingYears           2000 non-null float64
YearsAtCompany              2000 non-null float64
dtypes: float64(10), object(5)
memory usage: 234.5+ KB


In [11]:
print('Breakdown of attrition.  (0) Stayed at company, (1) Left company')

df.Attrition.value_counts()

Breakdown of attrition.  (0) Stayed at company, (1) Left company


No     1680
Yes     320
Name: Attrition, dtype: int64

# Preprocessing

It's good that we don't have any null value. Let's encode the categorical data to ints. There are some categorical values those have been encoded once from the origin and transfered to type int. We want to use some of them for the cross features, so we want to transfer their type to string. 

In [12]:
to_convert = ['Education','EnvironmentSatisfaction','JobSatisfaction',
            'PerformanceRating','RelationshipSatisfaction']

for col in to_convert:
    df[col] = df[col].astype(np.str) 
   


Encode the categorical features:

In [13]:
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import StandardScaler

to_encode = { 'Department','Gender','MaritalStatus','OverTime','Education','EnvironmentSatisfaction','JobSatisfaction',
            'PerformanceRating','RelationshipSatisfaction'}
encoders = dict()

for col in list(to_encode) +['Attrition']:
    if col=='Attrition':
        tmp = LabelEncoder()
        df[col] = tmp.fit_transform(df[col])
        df[col+'_int'] = tmp.fit_transform(df[col])
    else:
        encoders[col] = LabelEncoder()
        df[col+'_int'] = encoders[col].fit_transform(df[col])
    

Then, let's scale the numeric features. 

In [14]:
categorical_features =list(to_encode)
categorical_features = [x+'_int' for x in categorical_features]
numerics = set(df.columns) - to_encode
numerics = list(numerics - set(categorical_features)-{'Attrition'})
ss = StandardScaler()
for atr in numerics:
    df[atr] = df[atr].astype(np.float)    
    df[atr] = ss.fit_transform(df[atr].values.reshape(-1, 1))
    
feature_columns = categorical_features + numerics

In [15]:
df.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2000 entries, 0 to 1999
Data columns (total 25 columns):
Age                             2000 non-null float64
Attrition                       2000 non-null int64
Department                      2000 non-null object
DistanceFromHome                2000 non-null float64
Education                       2000 non-null object
EnvironmentSatisfaction         2000 non-null object
Gender                          2000 non-null object
JobSatisfaction                 2000 non-null object
MaritalStatus                   2000 non-null object
MonthlyIncome                   2000 non-null float64
OverTime                        2000 non-null object
PerformanceRating               2000 non-null object
RelationshipSatisfaction        2000 non-null object
TotalWorkingYears               2000 non-null float64
YearsAtCompany                  2000 non-null float64
MaritalStatus_int               2000 non-null int64
JobSatisfaction_int             2000 non-n

### Evaluation Metric

Let's take a moment to break down what the three main metrics mean in our task, i.e. predicting whether or not an employee will leave a company, and what that mean for our businesses using the model:

* accuracy: Values getting true positives and negatives.  This probably isn't very useful for our dataset because there are significantly fewer people that left the company than stayed

* precision: Values a low false positive rate.  This probably isn't the best either, because while we wouldn't **want** to think that an employee is leaving when they aren't, it probably won't hurt the business, unless the employer grossly overreacts and scares them away

* recall: Values a low false negative rate.  This is the best metric for our case.  If our job is to see when employees leave, and if the fact is that they usually **don't** leave, and if its potentially pretty damaging to the firm when the employee **does** leave, we want to make sure that we miss as few cases as possible.


### Validation Method

Because of the imbalance in our prediction label we'll use a stratified split, this way we'll preserve the distribution in our model.  In an attempt to realistically generalize the overall performance of our model we'll use a nested cross-validation scheme.  We'll use k-fold as opposed to a different cv scheme like shuffle-split, because our dataset is not that large and we would like to train on as much data as possible.  Using a k-fold cv ensures that we train on all of our data.  The inner loop will tune the hyper-parameters of our model which will be discussed later.

In [16]:
from sklearn.model_selection import train_test_split

# stratified 90/10 train/test split`
df_train, df_test = train_test_split(df, test_size=0.1, stratify=df.Attrition)

X_train =  ss.fit_transform(df_train[feature_columns].values).astype(np.float32)
X_test =  ss.transform(df_test[feature_columns].values).astype(np.float32)

y_train = df_train['Attrition'].values.astype(np.int)
y_test = df_test['Attrition'].values.astype(np.int)

print('train', X_train.shape, 'test', X_test.shape)

train (1800, 15) test (200, 15)


## Model Building

Let's just see how we well we can do with a singl network on all of our data.  We'll just split all of our data up into one train and test set.

In [17]:
#import some keras stuff
from keras.models import Sequential
from keras.layers import Dense, Activation, Input
from keras.layers import Embedding, Flatten, Merge, concatenate
from keras.models import Model

Using TensorFlow backend.


In [18]:
# This returns a tensor
inputs = Input(shape=(X_train.shape[1],))

# a layer instance is callable on a tensor, and returns a tensor
x = Dense(units=10, activation='relu')(inputs)
predictions = Dense(1,activation='sigmoid')(x)

# This creates a model that includes
# the Input layer and three Dense layers
model = Model(inputs=inputs, outputs=predictions)

In [19]:
model.compile(optimizer='sgd',
              loss='mean_squared_error',
              metrics=['accuracy'])

model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         (None, 15)                0         
_________________________________________________________________
dense_1 (Dense)              (None, 10)                160       
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 11        
Total params: 171
Trainable params: 171
Non-trainable params: 0
_________________________________________________________________


In [20]:
%%time

model.fit(X_train, y_train, epochs=100, batch_size=50, verbose=0)

from sklearn import metrics as mt
yhat = np.round(model.predict(X_test))
print(mt.confusion_matrix(y_test,yhat),mt.recall_score(y_test,yhat))

[[168   0]
 [  0  32]] 1.0
Wall time: 2.23 s


Well it didn't do very well.  It missed the majority of the true positives.  But there are a few issues here that we can address.

1. We don't have that much data to begin with and our target class is a small percentage of it so we're going to have a hard time.

2. We were using a mean squared error for our loss function on a binary classification task.  While this isn't terrible, it would likely work better if we used cross-entropy instead.

Luckily its easy to modify both of these in Keras.  The loss function is simple to change, and the fit function includes a parameter for 'class_weights' which accepts a dictionary of weights for each class value to use when computing the loss function.  This way we can tell the model which class is "more important".  Let's try it.

In [21]:
model = Model(inputs=inputs, outputs=predictions)
model.compile(optimizer='sgd',
              loss='binary_crossentropy',
              metrics=['accuracy'])


model.fit(X_train, y_train, epochs=100, batch_size=50, verbose=0, class_weight={0 : 0.20, 1 : 0.80})

yhat = np.round(model.predict(X_test))
print(mt.confusion_matrix(y_test,yhat),mt.recall_score(y_test,yhat))

[[168   0]
 [  0  32]] 1.0


It ended up getting a much better reacll score, at the expense of overall accuracy however.

### Parameter tuning in Deep Network

We identify the following as parameters that we can tune

- Number of layers
- Number of neurons per layer
- The weights of the class in the loss function
- number of epochs

Keras has a wrapper class for a model to be used as an sklearn estimator which we can that pass to GridSearchCV.  The only caveat is that we must use Keras' Sequential Model to do so which means we can only have one input branch.  This is okay though because if we only build it on the deep side we can use the result of that with our wide, cross-category branch later on.

In [22]:
# taken from https://machinelearningmastery.com/grid-search-hyperparameters-deep-learning-models-python-keras/
# this uses the Keras Wrapper to make the model usable by sk-learn

# Use scikit-learn to grid search the batch size and epochs
from sklearn.model_selection import GridSearchCV
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier

# Function to create model, required for KerasClassifier
def create_model(num_neurons=12, input_dim=8):
    # create model
    model = Sequential()
    
    # num_neurons is a list of the number nuerons at each layer
    for layer, num in enumerate(num_neurons):
        if layer == 0:
            model.add(Dense(num, input_dim=input_dim, activation='relu'))
        else:
            model.add(Dense(units=num, activation='relu'))
    
    model.add(Dense(1, activation='sigmoid'))
    
    # Compile model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

Now that we can creat models in that can be used by SKLearn, lets GridSearch over the parameters we identified.  Note that the number of neurons and number of layers is combined into one parameter called `num_neurons` which is a list of the number of output nerouns at each layer.

In [23]:
# this will move inside nested CV
from sklearn.model_selection import GridSearchCV
num_neurons = [[5, 10], [5, 10, 20]]
epochs = [5]
class_weight = [{0:x, 1:1-x} for x in np.linspace(0.1, 0.5, 2)]
param_grid = dict(num_neurons=num_neurons,
                  epochs=epochs,
                  class_weight=class_weight)

model = KerasClassifier(build_fn=create_model, input_dim=X_train.shape[-1], epochs=10, verbose=0)
g = GridSearchCV(estimator=model, param_grid=param_grid, verbose=3, scoring='f1')
r = g.fit(X_train, y_train)

Fitting 3 folds for each of 4 candidates, totalling 12 fits
[CV] class_weight={0: 0.10000000000000001, 1: 0.90000000000000002}, epochs=5, num_neurons=[5, 10] 
[CV]  class_weight={0: 0.10000000000000001, 1: 0.90000000000000002}, epochs=5, num_neurons=[5, 10], score=0.28054298642533937, total=   0.6s
[CV] class_weight={0: 0.10000000000000001, 1: 0.90000000000000002}, epochs=5, num_neurons=[5, 10] 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.7s remaining:    0.0s


[CV]  class_weight={0: 0.10000000000000001, 1: 0.90000000000000002}, epochs=5, num_neurons=[5, 10], score=0.6190476190476191, total=   0.7s
[CV] class_weight={0: 0.10000000000000001, 1: 0.90000000000000002}, epochs=5, num_neurons=[5, 10] 


[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:    1.4s remaining:    0.0s


[CV]  class_weight={0: 0.10000000000000001, 1: 0.90000000000000002}, epochs=5, num_neurons=[5, 10], score=0.3369803063457331, total=   0.7s
[CV] class_weight={0: 0.10000000000000001, 1: 0.90000000000000002}, epochs=5, num_neurons=[5, 10, 20] 
[CV]  class_weight={0: 0.10000000000000001, 1: 0.90000000000000002}, epochs=5, num_neurons=[5, 10, 20], score=0.5740740740740741, total=   0.9s
[CV] class_weight={0: 0.10000000000000001, 1: 0.90000000000000002}, epochs=5, num_neurons=[5, 10, 20] 
[CV]  class_weight={0: 0.10000000000000001, 1: 0.90000000000000002}, epochs=5, num_neurons=[5, 10, 20], score=0.35915492957746475, total=   0.8s
[CV] class_weight={0: 0.10000000000000001, 1: 0.90000000000000002}, epochs=5, num_neurons=[5, 10, 20] 
[CV]  class_weight={0: 0.10000000000000001, 1: 0.90000000000000002}, epochs=5, num_neurons=[5, 10, 20], score=0.5045592705167173, total=   0.9s
[CV] class_weight={0: 0.5, 1: 0.5}, epochs=5, num_neurons=[5, 10] ....
[CV]  class_weight={0: 0.5, 1: 0.5}, epochs=5, 

  'precision', 'predicted', average, warn_for)


[CV]  class_weight={0: 0.5, 1: 0.5}, epochs=5, num_neurons=[5, 10], score=0.0, total=   0.9s
[CV] class_weight={0: 0.5, 1: 0.5}, epochs=5, num_neurons=[5, 10] ....
[CV]  class_weight={0: 0.5, 1: 0.5}, epochs=5, num_neurons=[5, 10], score=0.0, total=   0.8s
[CV] class_weight={0: 0.5, 1: 0.5}, epochs=5, num_neurons=[5, 10, 20] 
[CV]  class_weight={0: 0.5, 1: 0.5}, epochs=5, num_neurons=[5, 10, 20], score=0.0, total=   1.0s
[CV] class_weight={0: 0.5, 1: 0.5}, epochs=5, num_neurons=[5, 10, 20] 
[CV]  class_weight={0: 0.5, 1: 0.5}, epochs=5, num_neurons=[5, 10, 20], score=0.8404255319148937, total=   0.9s
[CV] class_weight={0: 0.5, 1: 0.5}, epochs=5, num_neurons=[5, 10, 20] 
[CV]  class_weight={0: 0.5, 1: 0.5}, epochs=5, num_neurons=[5, 10, 20], score=0.8571428571428571, total=   1.1s


[Parallel(n_jobs=1)]: Done  12 out of  12 | elapsed:   11.4s finished


In [24]:
pd.DataFrame(r.cv_results_)

Unnamed: 0,mean_fit_time,mean_score_time,mean_test_score,mean_train_score,param_class_weight,param_epochs,param_num_neurons,params,rank_test_score,split0_test_score,split0_train_score,split1_test_score,split1_train_score,split2_test_score,split2_train_score,std_fit_time,std_score_time,std_test_score,std_train_score
0,0.754994,0.026572,0.41219,0.42027,"{0: 0.1, 1: 0.9}",5,"[5, 10]","{'class_weight': {0: 0.1, 1: 0.9}, 'epochs': 5...",3,0.280543,0.291916,0.619048,0.578544,0.33698,0.390351,0.025422,0.006356,0.148074,0.118912
1,0.916197,0.051911,0.479263,0.495547,"{0: 0.1, 1: 0.9}",5,"[5, 10, 20]","{'class_weight': {0: 0.1, 1: 0.9}, 'epochs': 5...",2,0.574074,0.570175,0.359155,0.340798,0.504559,0.575668,0.024116,0.0072,0.089545,0.109447
2,0.84647,0.072703,0.102102,0.115702,"{0: 0.5, 1: 0.5}",5,"[5, 10]","{'class_weight': {0: 0.5, 1: 0.5}, 'epochs': 5...",4,0.306306,0.347107,0.0,0.0,0.0,0.0,0.05286,0.005777,0.144394,0.163628
3,1.0221,0.098479,0.565856,0.553362,"{0: 0.5, 1: 0.5}",5,"[5, 10, 20]","{'class_weight': {0: 0.5, 1: 0.5}, 'epochs': 5...",1,0.0,0.0,0.840426,0.833876,0.857143,0.826211,0.054312,0.005944,0.400179,0.391299


In [25]:
from copy import copy, deepcopy
m = r.best_estimator_.model
m_copy = copy(m)

In [26]:
inp = Input(shape=(X_train.shape[1],),sparse=False)
tt = Model(inputs=m.inputs, outputs=m.output)

In [27]:
r.best_params_

{'class_weight': {0: 0.5, 1: 0.5}, 'epochs': 5, 'num_neurons': [5, 10, 20]}

In [28]:
m.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_45 (Dense)             (None, 5)                 80        
_________________________________________________________________
dense_46 (Dense)             (None, 10)                60        
_________________________________________________________________
dense_47 (Dense)             (None, 20)                220       
_________________________________________________________________
dense_48 (Dense)             (None, 1)                 21        
Total params: 381
Trainable params: 381
Non-trainable params: 0
_________________________________________________________________


In [29]:
epochs = r.best_params_.epochs
class_weights = r.best_params_.class_weights
m.compile(loss='binary_crossentropy', epochs=epochs, )

AttributeError: 'dict' object has no attribute 'epochs'

## Nested Cross Validation set up

In order to generalize performance we'll do a nested cross validation scheme with an outer loop of 5-folds and the inner loop tuning the hyper parameters in the deep network using 3-folds.  We should probably use more, but we don't have all day people.


In [30]:
df_train.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1800 entries, 1351 to 1714
Data columns (total 25 columns):
Age                             1800 non-null float64
Attrition                       1800 non-null int64
Department                      1800 non-null object
DistanceFromHome                1800 non-null float64
Education                       1800 non-null object
EnvironmentSatisfaction         1800 non-null object
Gender                          1800 non-null object
JobSatisfaction                 1800 non-null object
MaritalStatus                   1800 non-null object
MonthlyIncome                   1800 non-null float64
OverTime                        1800 non-null object
PerformanceRating               1800 non-null object
RelationshipSatisfaction        1800 non-null object
TotalWorkingYears               1800 non-null float64
YearsAtCompany                  1800 non-null float64
MaritalStatus_int               1800 non-null int64
JobSatisfaction_int             1800 no

In [31]:
from sklearn.model_selection import StratifiedKFold

outer_loop = StratifiedKFold(n_splits=5)

X = df.drop('Attrition', axis=1).values
y = df.Attrition.values

scores = []

for train_idx, test_idx in outer_loop.split(X, y):
    
    # split data
    df_train = pd.DataFrame(X[train_idx], columns=df.columns.drop('Attrition'))
    df_test = pd.DataFrame(X[test_idx], columns=df.columns.drop('Attrition'))
    
    X_train = ss.fit_transform(df_train[feature_columns].values).astype(np.float32)
    X_test = ss.fit_transform(df_test[feature_columns].values).astype(np.float32)

    y_train = df_train['Attrition_int'].values.astype(np.int)
    y_test = df_test['Attrition_int'].values.astype(np.int)

    # the whole thing
    X_train_num =  df_train[numerics].values
    X_test_num = df_test[numerics].values

    
    # train deep
    print('training embedings')
    # we need to create separate sequential models for each embedding
    embed_branches = []
    X_ints_train = []
    X_ints_test = []
    all_inputs = []
    all_branch_outputs = []


    # reset this input branch
    all_branch_outputs = []
    # add in the embeddings
    for col in categorical_features:
        # encode as ints for the embedding
        X_ints_train.append( df_train[col].values )
        X_ints_test.append( df_test[col].values )

        # get the number of categories
        N = max(X_ints_train[-1]+1) # same as the max(df_train[col])

        # create embedding branch from the number of categories
        inputs = Input(shape=(1,),dtype='int32', name=col)
        all_inputs.append(inputs)
        x = Embedding(input_dim=N, output_dim=int(np.sqrt(N)), input_length=1)(inputs)
        x = Flatten()(x)
        all_branch_outputs.append(x)

    # also get a dense branch of the numeric features
    all_inputs.append(Input(shape=(X_train_num.shape[1],),sparse=False,name='numeric_data'))
    x = Dense(units=20, activation='relu')(all_inputs[-1])
    all_branch_outputs.append( x )

    # merge the branches together
    deep_branch = concatenate(all_branch_outputs)

    final_branch = Dense(units=1,activation='sigmoid')(deep_branch)

    model = Model(inputs=all_inputs, outputs=final_branch)

    model.compile(optimizer='adagrad',
                  loss='mean_squared_error',
                  metrics=['accuracy'])

    model.fit(X_ints_train+ [X_train_num],
            y_train, epochs=10, batch_size=32, verbose=1)
    
    
    # get output from embedding
    model2 = Model(inputs=model.inputs, outputs=model.get_layer(index=-2).output)

    X_embed = model2.predict(X_ints_train+ [X_train_num])
    
    # Grid Search Deep
    print('Grid Searching the layers of the Deep')
    from sklearn.model_selection import GridSearchCV
    num_neurons = [[5, 10], [5, 10, 20]]
    class_weight=class_weight = [{0:x, 1:1-x} for x in np.linspace(0.1, 0.5, 2)]
    param_grid = dict(num_neurons=num_neurons,
                      class_weight=class_weight)

    model = KerasClassifier(build_fn=create_model, input_dim=X_embed.shape[-1], epochs=10, verbose=0)
    g = GridSearchCV(estimator=model, param_grid=param_grid, verbose=1, scoring='recall')
    r = g.fit(X_embed, y_train)
    
    
    # Wide and best deep
    
    # we need to create separate sequential models for each embedding
    embed_branches = []
    X_ints_train = []
    X_ints_test = []
    all_inputs = []
    all_branch_outputs = []

    for cols in cross_columns:
        # encode crossed columns as ints for the embedding
        enc = LabelEncoder()

        # create crossed labels
        # needs to be commented better, Eric!
        X_crossed_train = df_train[cols].apply(lambda x: '_'.join(x), axis=1)
        X_crossed_test = df_test[cols].apply(lambda x: '_'.join(x), axis=1)

        enc.fit(np.hstack((X_crossed_train.values,  X_crossed_test.values)))
        X_crossed_train = enc.transform(X_crossed_train)
        X_crossed_test = enc.transform(X_crossed_test)
        X_ints_train.append( X_crossed_train )
        X_ints_test.append( X_crossed_test )

        # get the number of categories
        N = max(X_ints_train[-1]+1) # same as the max(df_train[col])

        # create embedding branch from the number of categories
        inputs = Input(shape=(1,),dtype='int32', name = '_'.join(cols))
        all_inputs.append(inputs)
        x = Embedding(input_dim=N, output_dim=int(np.sqrt(N)), input_length=1)(inputs)
        x = Flatten()(x)
        all_branch_outputs.append(x)

    # merge the branches together
    wide_branch = concatenate(all_branch_outputs)

    # reset this input branch
    all_branch_outputs = []
    # add in the embeddings
    for col in categorical_features:
        # encode as ints for the embedding
        X_ints_train.append( df_train[col].values )
        X_ints_test.append( df_test[col].values )

        # get the number of categories
        N = max(X_ints_train[-1]+1) # same as the max(df_train[col])

        # create embedding branch from the number of categories
        inputs = Input(shape=(1,),dtype='int32', name=col)
        all_inputs.append(inputs)
        x = Embedding(input_dim=N, output_dim=int(np.sqrt(N)), input_length=1)(inputs)
        x = Flatten()(x)
        all_branch_outputs.append(x)

    # also get a dense branch of the numeric features
    all_inputs.append(Input(shape=(X_train_num.shape[1],),sparse=False,name='numeric_data'))
    x = Dense(units=20, activation='relu')(all_inputs[-1])
    all_branch_outputs.append( x )

    # merge the branches together
    deep_branch = concatenate(all_branch_outputs)

    # here is where we'll use the result of the GridSearch
    for layer in best_deep.layers[:-1]:
        deep_branch = layer(deep_branch)

    final_branch = concatenate([wide_branch, deep_branch])
    final_branch = Dense(units=1,activation='sigmoid')(final_branch)

    model = Model(inputs=all_inputs, outputs=final_branch)

    model.compile(optimizer='adagrad',
                  loss='binary_crossentropy',
                  metrics=['accuracy'])

    print('fitting the whole thing')
    model.fit(X_ints_train+ [X_train_num],
            y_train, epochs=10, batch_size=32, verbose=1, class_weight=best_deep_params['class_weight'])
    
    yhat = np.round(model.predict(X_ints_test + [X_test_num]))
    score = mt.recall_score(y_test,yhat)
    print('Overall Accuracy', score)
    scores.append(score)



training embedings
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Grid Searching the layers of the Deep
Fitting 3 folds for each of 4 candidates, totalling 12 fits


ValueError: pos_label=1 is not a valid label: array([0, 2])

In [None]:
type(X.values)