# Prediction using AutoEncoder network

Autoencoders are unsupervised neural networks that compress an input into a lower-dimensional representation and then reconstruct it, which forces the network to capture the major features of the data.
* After trying several sizes for the bottleneck, I noticed that a size smaller than the input dimension (here, 8) learns more meaningful representations of the data. A bottleneck that is too wide gives the network too much flexibility, and it behaves almost like an identity mapping. A bottleneck that is too narrow loses too much information.
* The representations were learned more accurately when the dimensions fell gradually in the encoder (and so rose gradually in the decoder).
* If you constrain the encoder, for example with an activity regularizer, you push the network to learn sparser, more informative features rather than simply copying its input.
* Batch Normalization made a huge difference.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from imblearn.over_sampling import SMOTE
from sklearn.model_selection import train_test_split
from keras.models import Sequential, Model
from keras.layers import Dense, Dropout, BatchNormalization, Input
from keras import regularizers
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.linear_model import LogisticRegression

np.set_printoptions(suppress=True)

In [None]:
raw_data = pd.read_csv('../input/pima-indians-diabetes-database/diabetes.csv')
display(raw_data.head())
print(raw_data.Outcome.value_counts())

Since we are going to pass this data into neural networks, we MUST scale the columns. Here I will use the Min-Max method, with the minimum and maximum computed on the training set only and then applied to both the training and test data.

In [None]:
x_train, x_test, y_train, y_test = train_test_split(
    raw_data.drop('Outcome', axis=1),
    raw_data.Outcome,
    test_size=0.15,
    stratify=raw_data.Outcome,
)

In [None]:
train_min = x_train.min()
train_max = x_train.max()
x_train = (x_train - train_min) / (train_max - train_min)
x_test = (x_test - train_min) / (train_max - train_min)
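
The same scaling can be wrapped in a small reusable helper. This is a minimal sketch and not part of the original notebook: the names `fit_min_max` and `apply_min_max` are hypothetical, the toy arrays stand in for `x_train` and `x_test`, and the guard for constant columns (where the maximum equals the minimum) is an added assumption to avoid division by zero.

```python
import numpy as np

def fit_min_max(train):
    """Return per-column minimum and range computed from the training data only."""
    train = np.asarray(train, dtype=float)
    col_min = train.min(axis=0)
    col_range = train.max(axis=0) - col_min
    # Assumption: a constant column has zero range; use 1.0 so it maps to 0.
    col_range[col_range == 0] = 1.0
    return col_min, col_range

def apply_min_max(data, col_min, col_range):
    """Scale any split with statistics that were fitted on the training split."""
    return (np.asarray(data, dtype=float) - col_min) / col_range

# Toy data standing in for x_train / x_test
toy_train = np.array([[1.0, 10.0], [3.0, 10.0], [5.0, 10.0]])
toy_test = np.array([[2.0, 10.0], [7.0, 10.0]])

col_min, col_range = fit_min_max(toy_train)
toy_train_scaled = apply_min_max(toy_train, col_min, col_range)
toy_test_scaled = apply_min_max(toy_test, col_min, col_range)
```

Note that scaled test values can fall outside [0, 1] when a test record lies beyond the training range, as the value 7.0 does in this toy example.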

In [None]:
positives = x_train[y_train == 1]
negatives = x_train[y_train == 0]

## The AutoEncoder Architecture

There are no fixed rules for building an autoencoder network, but make sure that it is symmetric around a bottleneck.
* The network has 3 parts: encoder, bottleneck and decoder.
* The 100, 75, 50 and 25-unit layers of the encoder are mirrored in the decoder.
* The two 7-unit layers in the middle form the bottleneck.
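
The symmetry described in the list can be made explicit by generating the hidden-layer widths from the encoder side alone. This is only an illustrative sketch; `symmetric_widths` is a hypothetical helper, not something the notebook defines.

```python
def symmetric_widths(encoder_widths, bottleneck):
    """List hidden-layer widths: encoder, two bottleneck layers, mirrored decoder."""
    return list(encoder_widths) + [bottleneck, bottleneck] + list(reversed(encoder_widths))

widths = symmetric_widths([100, 75, 50, 25], bottleneck=7)
print(widths)  # [100, 75, 50, 25, 7, 7, 25, 50, 75, 100]
```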

In [None]:
## input layer 
input_layer = Input(shape=negatives.shape[1:])

## encoding part
# The L1 activity regularizer encourages sparse activations (note: 10e-5 equals 1e-4).
encoded = Dense(100, activation='tanh', activity_regularizer=regularizers.l1(10e-5))(input_layer)
encoded = BatchNormalization()(encoded)
encoded = Dense(75, activation='tanh')(encoded)
encoded = BatchNormalization()(encoded)
encoded = Dense(50, activation='relu')(encoded)
encoded = BatchNormalization()(encoded)
encoded = Dense(25, activation='relu')(encoded)
encoded = BatchNormalization()(encoded)
encoded = Dense(7, activation='relu')(encoded)

## decoding part
decoded = Dense(7, activation='relu')(encoded)
decoded = BatchNormalization()(decoded)
decoded = Dense(25, activation='relu')(decoded)
decoded = BatchNormalization()(decoded)
decoded = Dense(50, activation='relu')(decoded)
decoded = BatchNormalization()(decoded)
decoded = Dense(75, activation='tanh')(decoded)
decoded = BatchNormalization()(decoded)
decoded = Dense(100, activation='tanh')(decoded)

## output layer
output_layer = Dense(negatives.shape[1], activation='relu')(decoded)

In [None]:
autoencoder = Model(input_layer, output_layer)
autoencoder.compile(optimizer="adadelta", loss="mse")

We train the autoencoder only on the negative examples in the training set (persons without diabetes). The idea is that if the network learns to represent these records accurately, it should misrepresent the positive records, which helps to separate the two classes.

In [None]:
autoencoder.fit(negatives, negatives, batch_size = 15, epochs = 1000, shuffle = True)
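
One way to check the premise that positive records are represented less faithfully is to compare the reconstruction error per record. The following is a sketch under assumptions: `reconstruction_error` is a hypothetical helper, and the toy arrays stand in for scaled records and the reconstructions the network would return.

```python
import numpy as np

def reconstruction_error(original, reconstructed):
    """Mean squared error per record (one value per row)."""
    original = np.asarray(original, dtype=float)
    reconstructed = np.asarray(reconstructed, dtype=float)
    return np.mean((original - reconstructed) ** 2, axis=1)

# Toy values standing in for scaled records and their reconstructions
toy_records = np.array([[0.2, 0.4], [0.9, 0.1]])
toy_reconstructions = np.array([[0.2, 0.5], [0.5, 0.5]])

errors = reconstruction_error(toy_records, toy_reconstructions)
```

With the trained network, one would pass `negatives` together with `autoencoder.predict(negatives)`, do the same for `positives`, and compare the two error distributions.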

## Transforming the Data

Now that the autoencoder is trained, we will create another network using only the encoding part.
We will use this network to transform both the training and test sets (this includes the positive as well as the negative examples).

In [None]:
# Reuse the input layer and the trained encoder layers, up to and including
# the first 7-unit bottleneck layer (autoencoder.layers[0] through layers[9]).
hidden_representation = Sequential()
for layer in autoencoder.layers[:10]:
    hidden_representation.add(layer)

In [None]:
x_train_transformed = hidden_representation.predict(x_train)
x_test_transformed = hidden_representation.predict(x_test)

## Prediction

You can now perform your regular classification with this transformed data. I will use a neural network again (because I love it).

In [None]:
predictor = Sequential()
predictor.add(Dense(64, activation='relu', input_shape=x_train_transformed.shape[1:]))
predictor.add(BatchNormalization())
predictor.add(Dropout(0.25))
predictor.add(Dense(64, activation='relu'))
predictor.add(BatchNormalization())
predictor.add(Dense(64, activation='relu'))
predictor.add(BatchNormalization())
predictor.add(Dense(64, activation='tanh'))
predictor.add(BatchNormalization())
predictor.add(Dense(1, activation='sigmoid'))

In [None]:
predictor.compile(optimizer='adam', loss = 'binary_crossentropy', metrics=['accuracy'])
predictor.fit(x_train_transformed, y_train, batch_size=10, epochs=1000, shuffle=True)

The finale!
Let's see how we perform on the test set.

In [None]:
y_predict = predictor.predict(x_test_transformed)

In [None]:
print(classification_report(y_test, y_predict >= 0.35))
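
To see why the choice of threshold matters, precision and recall can be computed at several cut-offs. This is a minimal sketch with made-up toy labels and probabilities, not results from this notebook; `precision_recall_at` is a hypothetical helper.

```python
import numpy as np

def precision_recall_at(y_true, y_prob, threshold):
    """Precision and recall of the positive class at a given probability cut-off."""
    y_true = np.asarray(y_true).ravel().astype(bool)
    y_pred = np.asarray(y_prob).ravel() >= threshold
    tp = np.sum(y_pred & y_true)    # true positives
    fp = np.sum(y_pred & ~y_true)   # false positives
    fn = np.sum(~y_pred & y_true)   # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return float(precision), float(recall)

# Toy labels and predicted probabilities (illustrative only)
toy_labels = np.array([1, 1, 1, 0, 0, 0])
toy_probs = np.array([0.9, 0.6, 0.4, 0.45, 0.2, 0.1])

for threshold in (0.35, 0.5):
    print(threshold, precision_recall_at(toy_labels, toy_probs, threshold))
```

In this toy data, lowering the threshold from 0.5 to 0.35 raises recall at the cost of precision. With the notebook's variables, one would call the helper with `y_test` and `y_predict`.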

## Conclusion

Using a classification threshold of 0.35, lower than the usual 0.5, we correctly identified a large share of the diabetes patients in the test set without augmenting the data.