# Neural Network Pipeline

This notebook builds a neural network with Keras and runs it inside a scikit-learn pipeline. The data in this example is simple, so the pipeline is used only for scaling, but a pipeline can also handle other preprocessing steps such as encoding categorical features. The data is the HIGGS dataset, available on the [UCI](http://archive.ics.uci.edu/ml/datasets/HIGGS) website.
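The introduction mentions that a pipeline can also encode categorical features. The following is a minimal sketch of that idea, not part of the original notebook. It assumes a hypothetical toy DataFrame with one numeric column (`energy`) and one categorical column (`detector`). Both names are made up for illustration, since the HIGGS features are all numeric.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: one numeric and one categorical column.
toy = pd.DataFrame({
    "energy": [1.0, 2.0, 3.0, 4.0],
    "detector": ["a", "b", "a", "c"],
})

# Scale the numeric column and one-hot encode the categorical one.
preprocess = ColumnTransformer([
    ("scale", StandardScaler(), ["energy"]),
    ("encode", OneHotEncoder(handle_unknown="ignore"), ["detector"]),
])

# A classifier step could be appended after "prep" in the same way
# the notebook appends its network after the scaler.
prep_pipeline = Pipeline([("prep", preprocess)])
encoded = prep_pipeline.fit_transform(toy)
print(encoded.shape)
```

The result has one scaled column plus one indicator column per category, so a downstream estimator only ever sees numeric input.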

In [1]:
# Use PlaidML as the Keras backend to run on my eGPU (set before importing keras)
import os

os.environ["KERAS_BACKEND"] = "plaidml.keras.backend"

In [2]:
# Import pandas and numpy to work with data
import pandas as pd
import numpy as np

# Load and read data file
data = pd.read_csv('HIGGS.csv')
data.head()

Unnamed: 0,1.000000000000000000e+00,8.692932128906250000e-01,-6.350818276405334473e-01,2.256902605295181274e-01,3.274700641632080078e-01,-6.899932026863098145e-01,7.542022466659545898e-01,-2.485731393098831177e-01,-1.092063903808593750e+00,0.000000000000000000e+00,...,-1.045456994324922562e-02,-4.576716944575309753e-02,3.101961374282836914e+00,1.353760004043579102e+00,9.795631170272827148e-01,9.780761599540710449e-01,9.200048446655273438e-01,7.216574549674987793e-01,9.887509346008300781e-01,8.766783475875854492e-01
0,1.0,0.907542,0.329147,0.359412,1.49797,-0.31301,1.095531,-0.557525,-1.58823,2.173076,...,-1.13893,-0.000819,0.0,0.30222,0.833048,0.9857,0.978098,0.779732,0.992356,0.798343
1,1.0,0.798835,1.470639,-1.635975,0.453773,0.425629,1.104875,1.282322,1.381664,0.0,...,1.128848,0.900461,0.0,0.909753,1.10833,0.985692,0.951331,0.803252,0.865924,0.780118
2,0.0,1.344385,-0.876626,0.935913,1.99205,0.882454,1.786066,-1.646778,-0.942383,0.0,...,-0.678379,-1.360356,0.0,0.946652,1.028704,0.998656,0.728281,0.8692,1.026736,0.957904
3,1.0,1.105009,0.321356,1.522401,0.882808,-1.205349,0.681466,-1.070464,-0.921871,0.0,...,-0.373566,0.113041,0.0,0.755856,1.361057,0.98661,0.838085,1.133295,0.872245,0.808487
4,0.0,1.595839,-0.607811,0.007075,1.81845,-0.111906,0.84755,-0.566437,1.581239,2.173076,...,-0.654227,-1.274345,3.101961,0.823761,0.938191,0.971758,0.789176,0.430553,0.961357,0.957818
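The column names in the preview above look like data values, which suggests that `read_csv` took the first row of the file as a header. The sketch below illustrates the difference on a tiny in-memory stand-in, assuming (this is not confirmed in the notebook) that the file has no header row. The sample values and the names `label`, `f1` and `f2` are hypothetical. The later cells select the label by the name `'1.000000000000000000e+00'`, so they would need to change if the file were read this way.

```python
import io

import pandas as pd

# Hypothetical two-row stand-in for a headerless CSV (label first, then features).
raw = "1.0,0.869,-0.635\n0.0,0.907,0.329\n"

# Default behaviour: the first row is consumed as the column names.
with_default = pd.read_csv(io.StringIO(raw))

# header=None keeps every row as data; the names are made up for illustration.
names = ["label", "f1", "f2"]
with_names = pd.read_csv(io.StringIO(raw), header=None, names=names)

print(len(with_default), len(with_names))
```

With the default settings one row of data is lost to the header, while `header=None` keeps both rows and gives the label column a readable name.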


In [3]:
# Split data into training and testing sets
from sklearn.model_selection import train_test_split

# The label column is named after the value in the file's first row,
# because read_csv used that row as the header
label_col = '1.000000000000000000e+00'

x_train, x_test = train_test_split(data, test_size=0.2)

# Separate the labels from the features
y_train = x_train[label_col]
y_test = x_test[label_col]

x_train = x_train.drop([label_col], axis=1)
x_test = x_test.drop([label_col], axis=1)

In [10]:
# Import Keras
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.wrappers.scikit_learn import KerasClassifier

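# Build the neural network: three 128-unit ReLU hidden layers (dropout after
# the first two) and a sigmoid output for binary classification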
def nn():
    model = Sequential()
    model.add(Dense(128, input_dim=x_train.shape[1], activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(128, activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(128, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
    return model


In [13]:
# Import sklearn
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Build the pipeline: scale the features, then train the network
estimators = [
    ('ss', StandardScaler()),
    ('nn', KerasClassifier(build_fn=nn, epochs=1, batch_size=128)),
]
# Importing the class directly keeps the `pipeline` variable from
# shadowing the sklearn.pipeline module when the cell is re-run
pipeline = Pipeline(estimators)

In [14]:
# Fit model
pipeline.fit(x_train, y_train)

Epoch 1/1


Pipeline(memory=None,
     steps=[('ss', StandardScaler(copy=True, with_mean=True, with_std=True)), ('nn', <keras.wrappers.scikit_learn.KerasClassifier object at 0x121809550>)])

The model, trained for a single epoch, did not reach a very high prediction accuracy, but that was not the goal of this notebook. The goal was to run a Keras neural network inside a scikit-learn pipeline, and the pipeline worked well.
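The notebook does not show how accuracy was measured, so here is a hedged sketch of one way to score a fitted pipeline on held-out data. To keep it self-contained it uses synthetic data and swaps in `LogisticRegression` for the Keras step. Both are stand-ins, not the notebook's data or model. For the notebook's own pipeline the analogous call would presumably be `pipeline.score(x_test, y_test)`.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (not HIGGS): the label depends on the first feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = (X[:, 0] > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# LogisticRegression stands in for the KerasClassifier step.
demo_pipeline = Pipeline([
    ("ss", StandardScaler()),
    ("clf", LogisticRegression()),
])
demo_pipeline.fit(X_tr, y_tr)

# score() scales X_te with the training-set statistics learned in fit(),
# then reports the classifier's mean accuracy.
test_accuracy = demo_pipeline.score(X_te, y_te)
print(f"held-out accuracy: {test_accuracy:.3f}")
```

Because the scaler sits inside the pipeline, it is fit only on the training split and reused unchanged on the test split, which avoids leaking test statistics into preprocessing.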