In [1]:
import numpy as np
import random
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

Read data and understand it. The dataset contains 2 continous feature variables and 1 target categorical variable

In [2]:
test_df = pd.read_csv('train_set.txt',sep=',')

In [3]:
test_df.columns = ['x1','x2','target']

Build new attributes

Define new attributes as per definition below.

X 3 = X1^2 
X4 = X2 
X5 = X1X2

In [4]:
test_df['x3'] = test_df.x1*test_df.x1
test_df['x4'] = test_df.x2*test_df.x2
test_df['x5'] = test_df.x1*test_df.x2

In [5]:
test_df.head(5)

Unnamed: 0,x1,x2,target,x3,x4,x5
0,0.775408,23.986692,r,0.601257,575.361405,18.599466
1,29.170503,-3.287474,r,850.918251,10.807487,-95.897279
2,6.739044,-28.033329,r,45.414707,785.867535,-188.917824
3,3.2161,22.013695,r,10.343297,484.602776,70.798239
4,47.374906,7.925541,g,2244.381691,62.814197,375.471748


In [6]:
simple_df_X = test_df[['x1','x2']]
simple_df_y = test_df['target']

In [7]:
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.model_selection import train_test_split


Target categorical variable is converted to nummerical values using one-hot encoding method

In [8]:
# One hot encoding
enc = OneHotEncoder()
Y = enc.fit_transform(simple_df_y[:, np.newaxis]).toarray()

Now, lets scale data using StandardScaler with mean 0 and variance 1

In [9]:
# Scale data to have mean 0 and variance 1 
# which is importance for convergence of the neural network
scaler = StandardScaler()
X_scaled = scaler.fit_transform(simple_df_X)

Data is now split into train and test data

In [10]:
# Split the data set into training and testing
X_train, X_test, Y_train, Y_test = train_test_split(
    X_scaled, Y, test_size=0.5, random_state=2)

In [11]:
n_features = simple_df_X.shape[1]
n_classes = Y.shape[1]

Lets resuse method with minor modifications from TA lecture. 
The modfifcations is to take different number of neurons and activation for each layer on a given network model.

The final output layer is softmax since this is classification task. We could use as well sigmoid linear activation function for binary classification task. For this task, since we do one-hot encoding, we see output values are 0 or 1. So, I choose sigmoid function on final layer vs softmax. This increased the accuracy.

Some of different choice of activations functions tried below are:

sigmoid - Output goes between 0 to 1. This mostly used on the final layer instead of hidden layers. Its not good idea to combine hidden layers with linear activations and final layer with linear activations. Which means its simple logistic regression without learning much

tanh - Output goes between -1 to 1.

Relu - Output goes between 0 to infinity.

Leaky Relu - Output goes between (-infinity to infinity)

Based on observations, few factors improve the model like number of neuron going high, choice of activations and layers. Its evident from below with respect to activations that sigmoid is poor choice on hidden layers and good for final layer when we have binary classification. 

The RELU activations functions on hidden layers (layers between input and output) performs better than 'tanh'. With regards to number of neuron the accucarcy increases number of neurons to learn. As of number of neurons increases accuracy increase along with runtime. So, its trade off to make.

In [27]:
from keras.models import Sequential
from keras.layers import Dense

def create_model_network(input_dim, output_dim, neurons, activations, layers=1, name='model'):
    def create_model():
        # Create model
        model = Sequential(name=name)
        for i in range(layers):
            model.add(Dense(neurons[i], input_dim=input_dim, activation=activations[i]))
        model.add(Dense(output_dim, activation='sigmoid'))

        # Compile model
        model.compile(loss='categorical_crossentropy', 
                      optimizer='adam', 
                      metrics=['accuracy'])
        return model
    return create_model

Below method is enhanced version from our TA lecture nb. This method takes care of reporting metrics for given models along with train, test data

In [28]:
from keras.callbacks import TensorBoard

history_dict = {}

# TensorBoard Callback
cb = TensorBoard()

def measure(models,X_train,Y_train,X_test,Y_test):
    for create_model in models:
        model = create_model()
        print('Model name:', model.name)
        history_callback = model.fit(X_train, Y_train,
                                     batch_size=5,
                                     epochs=50,
                                     verbose=0,
                                     validation_data=(X_test, Y_test),
                                     callbacks=[cb])
        score = model.evaluate(X_test, Y_test, verbose=0)
        print('Test loss:', score[0])
        print('Test accuracy:', score[1])

        history_dict[model.name] = [history_callback, model]

Find the simplest neural networks

Use above method to create multiple models with varying activation, neurons and layer.
Then, for each model print summary.

Create upto 4 models with each model having 'i+1' layers from 1 to 4. The activations and neuron for each layer is passed along as argument.

Now, first lets try relu activation models with varied layers and neurons

In [29]:
activations_list = [
    ['relu','relu','relu','relu'],
    ['relu','relu','relu','relu'],
    ['relu','relu','relu','relu'],
    ['relu','relu','relu','relu']
]
neurons_list = [
    [8,8,8,8],
    [16,16,16,16],
    [32,32,32,32],
    [64,64,64,64]
               ]

model_relu = [create_model_network(n_features, n_classes, neurons_list[i-1], activations_list[i-1], i, 'model_relu_{}-layer'.format(i)) 
          for i in range(1, 4)]

for create_model in model_relu:
    create_model().summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_109 (Dense)            (None, 8)                 24        
_________________________________________________________________
dense_110 (Dense)            (None, 3)                 27        
Total params: 51
Trainable params: 51
Non-trainable params: 0
_________________________________________________________________
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_111 (Dense)            (None, 16)                48        
_________________________________________________________________
dense_112 (Dense)            (None, 16)                272       
_________________________________________________________________
dense_113 (Dense)            (None, 3)                 51        
Total params: 371
Trainable params: 371
Non-trainable params: 0
________________

In [30]:
measure(model_relu,X_train,Y_train,X_test,Y_test)

Model name: model_relu_1-layer
Test loss: 0.20454975152224825
Test accuracy: 0.9005847960187677
Model name: model_relu_2-layer
Test loss: 0.06124815869836779
Test accuracy: 0.9385964901823747
Model name: model_relu_3-layer
Test loss: 0.05101616887582673
Test accuracy: 0.9502923966151232


Now, lets try tanh activation models with varied layers and neurons

In [31]:
activations_list = [
    ['tanh','tanh','tanh','tanh'],
    ['tanh','tanh','tanh','tanh'],
    ['tanh','tanh','tanh','tanh'],
    ['tanh','tanh','tanh','tanh']
]
neurons_list = [
    [8,8,8,8],
   [16,16,16,16],
    [32,32,32,32],
    [64,64,64,64]
               ]

model_tanh = [create_model_network(n_features, n_classes, neurons_list[i-1], activations_list[i-1], i, 'model_tanh_{}-layer'.format(i)) 
          for i in range(1, 4)]

for create_model in model_tanh:
    create_model().summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_127 (Dense)            (None, 8)                 24        
_________________________________________________________________
dense_128 (Dense)            (None, 3)                 27        
Total params: 51
Trainable params: 51
Non-trainable params: 0
_________________________________________________________________
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_129 (Dense)            (None, 16)                48        
_________________________________________________________________
dense_130 (Dense)            (None, 16)                272       
_________________________________________________________________
dense_131 (Dense)            (None, 3)                 51        
Total params: 371
Trainable params: 371
Non-trainable params: 0
________________

In [32]:
measure(model_tanh,X_train,Y_train,X_test,Y_test)

Model name: model_tanh_1-layer
Test loss: 0.4029647450000919
Test accuracy: 0.8801169597614579
Model name: model_tanh_2-layer
Test loss: 0.23033862086067422
Test accuracy: 0.8976608204562762
Model name: model_tanh_3-layer
Test loss: 0.10796705358906795
Test accuracy: 0.8918128672399019


Now, lets try sigmoid activation models with varied layers and neurons

In [25]:
activations_list = [
    ['sigmoid','sigmoid','sigmoid','sigmoid'],
    ['sigmoid','sigmoid','sigmoid','sigmoid'],
    ['sigmoid','sigmoid','sigmoid','sigmoid'],
    ['sigmoid','sigmoid','sigmoid','sigmoid']
]
neurons_list = [
    [8,8,8,8],
    [16,16,16,16],
    [32,32,32,32],
    [64,64,64,64]               
]
model_sigmoid = [create_model_network(n_features, n_classes, neurons_list[i-1], activations_list[i-1], i, 'model_sigmoid_{}-layer'.format(i)) 
          for i in range(1, 4)]

for create_model in model_sigmoid:
    create_model().summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_91 (Dense)             (None, 8)                 24        
_________________________________________________________________
dense_92 (Dense)             (None, 3)                 27        
Total params: 51
Trainable params: 51
Non-trainable params: 0
_________________________________________________________________
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_93 (Dense)             (None, 16)                48        
_________________________________________________________________
dense_94 (Dense)             (None, 16)                272       
_________________________________________________________________
dense_95 (Dense)             (None, 3)                 51        
Total params: 371
Trainable params: 371
Non-trainable params: 0
________________

In [26]:
measure(model_sigmoid,X_train,Y_train,X_test,Y_test)

Model name: model_sigmoid_1-layer
Test loss: 0.46810212975357013
Test accuracy: 0.8684210533287093
Model name: model_sigmoid_2-layer
Test loss: 0.46840840578079224
Test accuracy: 0.8684210533287093
Model name: model_sigmoid_3-layer
Test loss: 0.42832385447987337
Test accuracy: 0.8888888906317147


We already defined new features from definitions below. We added 3 new columns like x3,x4,x5.

X3 = X1^2 

X4 = X2 

X5 = X1X2

Below we will use different subset of data from combinations of x1...x5 features and create datasets

In [35]:
simple_df_X_3_4 = test_df[['x3','x4']]
simple_df_X_3_5 = test_df[['x3','x5']]
simple_df_X_3_4_5 = test_df[['x3','x4','x5']]
simple_df_X_1_2_3_4_5 = test_df[['x1','x2','x3','x4','x5']]

We will create models for features with x3, x4.

We dont have any change on the target variables which is using one-hot encoding values. As discussed previously, we will go with 'sigmoid' for final layer. For hidden layers, we will choose activations functions like relu, tanh and will test with different neurons count. 

Its evident from below, that tanh activation function performs better than relu with high accuracy comparitvely. its consistent irrespective of number of neurons it has.

In [55]:
# Scale data to have mean 0 and variance 1 
# which is importance for convergence of the neural network
scaler = StandardScaler()
X_scaled_3_4 = scaler.fit_transform(simple_df_X_3_4)

# Split the data set into training and testing
X_train_3_4, X_test_3_4, Y_train, Y_test = train_test_split(
    X_scaled_3_4, Y, test_size=0.5, random_state=2)

n_features = simple_df_X_3_4.shape[1]
n_classes = Y.shape[1]

In [58]:
activations_list = [
    ['relu','relu','relu','relu'],
    ['relu','relu','relu','relu'],
    ['tanh','tanh','tanh','tanh'],
    ['tanh','tanh','tanh','tanh']
]
neurons_list = [
    [8,8,8,8],
    [64,64,64,64],
    [8,8,8,8],
    [64,64,64,64]
               ]
model_3_4 = [create_model_network(n_features, n_classes, neurons_list[i-1], activations_list[i-1], i, 'model_{}_{}-layer'.format(activations_list[i-1][0],i)) 
          for i in range(1, 5)]

for create_model in model_relu:
    create_model().summary()

measure(model_3_4,X_train_3_4,Y_train,X_test_3_4,Y_test)

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_332 (Dense)            (None, 8)                 24        
_________________________________________________________________
dense_333 (Dense)            (None, 3)                 27        
Total params: 51
Trainable params: 51
Non-trainable params: 0
_________________________________________________________________
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_334 (Dense)            (None, 16)                48        
_________________________________________________________________
dense_335 (Dense)            (None, 16)                272       
_________________________________________________________________
dense_336 (Dense)            (None, 3)                 51        
Total params: 371
Trainable params: 371
Non-trainable params: 0
________________

We will create models for features with x3, x5.

We dont have any change on the target variables which is using one-hot encoding values. As discussed previously, we will go with 'sigmoid' for final layer. For hidden layers, we will choose activations functions like relu, tanh and will test with different neurons count. 

It evident from below results that activation function with relu performs better than tanh functions. As, neuron counts increase so does the accuracy increase.

In [48]:
# Scale data to have mean 0 and variance 1 
# which is importance for convergence of the neural network
scaler = StandardScaler()
X_scaled_3_5 = scaler.fit_transform(simple_df_X_3_5)

# Split the data set into training and testing
X_train_3_5, X_test_3_5, Y_train, Y_test = train_test_split(
    X_scaled_3_5, Y, test_size=0.5, random_state=2)

n_features = simple_df_X_3_5.shape[1]
n_classes = Y.shape[1]

In [59]:
activations_list = [
    ['relu','relu','relu','relu'],
    ['relu','relu','relu','relu'],
    ['tanh','tanh','tanh','tanh'],
    ['tanh','tanh','tanh','tanh']
]
neurons_list = [
    [8,8,8,8],
    [64,64,64,64],
    [8,8,8,8],
    [64,64,64,64]
               ]

model_3_5 = [create_model_network(n_features, n_classes, neurons_list[i-1], activations_list[i-1], i, 'model_{}_{}-layer'.format(activations_list[i-1][0],i))
          for i in range(1, 5)]

for create_model in model_relu:
    create_model().summary()

measure(model_3_5,X_train_3_5,Y_train,X_test_3_5,Y_test)

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_355 (Dense)            (None, 8)                 24        
_________________________________________________________________
dense_356 (Dense)            (None, 3)                 27        
Total params: 51
Trainable params: 51
Non-trainable params: 0
_________________________________________________________________
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_357 (Dense)            (None, 16)                48        
_________________________________________________________________
dense_358 (Dense)            (None, 16)                272       
_________________________________________________________________
dense_359 (Dense)            (None, 3)                 51        
Total params: 371
Trainable params: 371
Non-trainable params: 0
________________

We will create models for features with x3, x4, x5.

We dont have any change on the target variables which is using one-hot encoding values. As discussed previously, we will go with 'sigmoid' for final layer. For hidden layers, we will choose activations functions like relu, tanh and will test with different neurons count. 

It evident from below results that activation function with relu performs better than tanh functions. As, neuron counts increase so does the accuracy increase.

In [62]:
# Scale data to have mean 0 and variance 1 
# which is importance for convergence of the neural network
scaler = StandardScaler()
X_scaled_3_4_5 = scaler.fit_transform(simple_df_X_3_4_5)

# Split the data set into training and testing
X_train_3_4_5, X_test_3_4_5, Y_train, Y_test = train_test_split(
    X_scaled_3_4_5, Y, test_size=0.5, random_state=2)

n_features = simple_df_X_3_4_5.shape[1]
n_classes = Y.shape[1]

In [63]:
activations_list = [
    ['relu','relu','relu','relu'],
    ['relu','relu','relu','relu'],
    ['tanh','tanh','tanh','tanh'],
    ['tanh','tanh','tanh','tanh']
]
neurons_list = [
    [8,8,8,8],
    [64,64,64,64],
    [8,8,8,8],
    [64,64,64,64]
               ]
model_3_4_5 = [create_model_network(n_features, n_classes, neurons_list[i-1], activations_list[i-1], i, 'model_{}_{}-layer'.format(activations_list[i-1][0],i)) 
          for i in range(1, 5)]

for create_model in model_relu:
    create_model().summary()

measure(model_3_4_5,X_train_3_4_5,Y_train,X_test_3_4_5,Y_test)

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_401 (Dense)            (None, 8)                 24        
_________________________________________________________________
dense_402 (Dense)            (None, 3)                 27        
Total params: 51
Trainable params: 51
Non-trainable params: 0
_________________________________________________________________
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_403 (Dense)            (None, 16)                48        
_________________________________________________________________
dense_404 (Dense)            (None, 16)                272       
_________________________________________________________________
dense_405 (Dense)            (None, 3)                 51        
Total params: 371
Trainable params: 371
Non-trainable params: 0
________________

We will create models for features with x1, x2, x3, x4, x5.

We dont have any change on the target variables which is using one-hot encoding values. As discussed previously, we will go with 'sigmoid' for final layer. For hidden layers, we will choose activations functions like relu, tanh and will test with different neurons count. 

It evident from below results that activation function with tanh performs better than relu functions. As, neuron counts increase so does the accuracy increase.

In [64]:
# Scale data to have mean 0 and variance 1 
# which is importance for convergence of the neural network
scaler = StandardScaler()
X_scaled_1_2_3_4_5 = scaler.fit_transform(simple_df_X_1_2_3_4_5)

# Split the data set into training and testing
X_train_1_2_3_4_5, X_test_1_2_3_4_5, Y_train, Y_test = train_test_split(
    X_scaled_1_2_3_4_5, Y, test_size=0.5, random_state=2)

n_features = simple_df_X_1_2_3_4_5.shape[1]
n_classes = Y.shape[1]

In [65]:
activations_list = [
    ['relu','relu','relu','relu'],
    ['relu','relu','relu','relu'],
    ['tanh','tanh','tanh','tanh'],
    ['tanh','tanh','tanh','tanh']
]
neurons_list = [
    [8,8,8,8],
    [64,64,64,64],
    [8,8,8,8],
    [64,64,64,64]
               ]

model_1_2_3_4_5 = [create_model_network(n_features, n_classes, neurons_list[i-1], activations_list[i-1], i, 'model_{}_{}-layer'.format(activations_list[i-1][0],i)) 
          for i in range(1, 5)]

for create_model in model_relu:
    create_model().summary()

measure(model_1_2_3_4_5,X_train_1_2_3_4_5,Y_train,X_test_1_2_3_4_5,Y_test)

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_424 (Dense)            (None, 8)                 24        
_________________________________________________________________
dense_425 (Dense)            (None, 3)                 27        
Total params: 51
Trainable params: 51
Non-trainable params: 0
_________________________________________________________________
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_426 (Dense)            (None, 16)                48        
_________________________________________________________________
dense_427 (Dense)            (None, 16)                272       
_________________________________________________________________
dense_428 (Dense)            (None, 3)                 51        
Total params: 371
Trainable params: 371
Non-trainable params: 0
________________