<a href="https://colab.research.google.com/github/paulowe/ml-lambda/blob/main/colab-train1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Import packages 


In [2]:
import sklearn
import pandas as pd
import numpy as np
import csv
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import classification_report
from sklearn import metrics
import joblib  # sklearn.externals.joblib is no longer available in scikit-learn 0.23+
from sklearn.preprocessing import label_binarize
from sklearn.metrics import accuracy_score
from sklearn.metrics import precision_recall_fscore_support
from sklearn.metrics import roc_auc_score



- Verify that you are running sklearn version 0.23.1 or higher. Some of the functions used for model evaluation only work with this version or higher.

- Run <> to upgrade sklearn

In [3]:
sklearn.__version__

'0.22.2.post1'
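
The version string can also be compared programmatically instead of by eye. The following is a minimal sketch that assumes the version starts with plain numeric components; `version_tuple` and `meets_minimum` are hypothetical helper names, not part of sklearn.

```python
import re

def version_tuple(version):
    """Extract the leading numeric components, e.g. '0.22.2.post1' -> (0, 22, 2)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component such as 'post1'
        parts.append(int(match.group()))
    return tuple(parts)

def meets_minimum(version, minimum="0.23.1"):
    """Return True if `version` is at least `minimum` (tuple comparison)."""
    return version_tuple(version) >= version_tuple(minimum)

print(meets_minimum("0.22.2.post1"))  # False
print(meets_minimum("0.23.1"))        # True
```

In the notebook, `meets_minimum(sklearn.__version__)` would be the natural call.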

## Import Data

X - all training examples
y - all true labels

In [4]:
data = pd.read_csv('./syntheticData.csv')
X, y = data.iloc[:, 1:], data.iloc[:,0]

## Visualize Data 

(80100 * 377) training matrix

(80100 * 1) label vector


In [5]:
print(X.head())
print(X.shape)
print(y.head())
print(y.shape)

   Abdominal distention  ...  Wrist weakness
0                     0  ...               0
1                     0  ...               0
2                     0  ...               0
3                     0  ...               0
4                     0  ...               0

[5 rows x 377 columns]
(80100, 377)
0    Abdominal aortic aneurysm
1    Abdominal aortic aneurysm
2    Abdominal aortic aneurysm
3    Abdominal aortic aneurysm
4    Abdominal aortic aneurysm
Name: Conditions_name, dtype: object
(80100,)


## Split into training, cross validation and test sets

- Shuffle dataset

- Perform Split (60-20-20)

In [6]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.40, stratify=y)

X_cv, X_test, y_cv, y_test = train_test_split(X_test, y_test, test_size=0.5, stratify=y_test)

print("Training data dimensions")
print(X_train.shape)
print(y_train.shape)

print("Cross validation data dimensions")
print(X_cv.shape)
print(y_cv.shape)

print("Test data dimensions")
print(X_test.shape)
print(y_test.shape)


Training data dimensions
(48060, 377)
(48060,)
Cross validation data dimensions
(16020, 377)
(16020,)
Test data dimensions
(16020, 377)
(16020,)
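
The two-stage split can be tried on a small toy dataset to see how the 60-20-20 proportions come about. This is only a sketch: the toy arrays, the `random_state` values and the variable names are assumptions made for illustration and do not come from the notebook's data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy stand-in for the real data: 100 rows, 5 classes with 20 rows each.
X_toy = np.arange(200).reshape(100, 2)
y_toy = np.repeat(np.arange(5), 20)

# First split: 60% for training, 40% held out.
X_tr, X_rest, y_tr, y_rest = train_test_split(
    X_toy, y_toy, test_size=0.40, stratify=y_toy, random_state=0)

# Second split: halve the held-out part into cross validation and test sets.
X_val, X_te, y_val, y_te = train_test_split(
    X_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=0)

print(len(y_tr), len(y_val), len(y_te))  # 60 20 20
# Per-class counts in each subset; stratify keeps the class balance.
print(np.bincount(y_tr), np.bincount(y_val), np.bincount(y_te))
```

Passing `stratify` in both calls is what keeps every class represented in all three subsets.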


## Train default MLP Classifier

In [7]:
clf = MLPClassifier()
clf = clf.fit(X_train, y_train)



## Training Variant: Bottom Up implementation

In this variant I will implement, from the ground up, a classifier of the same kind as the one we trained above. The objective is to expose the underlying components of the training process so that they can be optimized and monitored directly.

- Random initialization for weights
- Feedforward Propagation - Prediction function
- Neural Network Cost Function
- Backpropagation
- Sigmoid Gradient



### Random initialization
Select values for $\Theta^{(l)}$ uniformly in the range $[-\epsilon_{init}, \epsilon_{init}]$. One effective strategy for choosing $\epsilon_{init}$ is to base it on the number of units in the network:

$\epsilon_{init} = \frac{\sqrt{6}}{\sqrt{L_{in} + L_{out}}}$

where $L_{in}$ and $L_{out}$ are the number of units in the layers adjacent to $\Theta^{(l)}$.

In [None]:
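def randInitializeWeights(L_in, L_out):
    """
    Randomly initializes the weights of a layer with L_in incoming connections and L_out outgoing connections.
    """
    # epsilon_init = sqrt(6) / sqrt(L_in + L_out)
    epi = np.sqrt(6) / np.sqrt(L_in + L_out)
    
    # np.random.rand samples from [0, 1); rescale to [-epi, epi) and add one column for the bias unit
    W = np.random.rand(L_out, L_in + 1) * (2 * epi) - epi
    
    return W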
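
As a quick sanity check of the initialization range, here is a minimal sketch that evaluates $\epsilon_{init}$ for a layer with 377 inputs and 25 outputs and confirms that all sampled weights stay inside the interval. The helper name `init_weights` and the use of NumPy's `default_rng` generator (instead of `np.random.rand`) are assumptions made for this illustration.

```python
import numpy as np

def init_weights(l_in, l_out, rng):
    """Uniform weights in [-eps, eps) with eps = sqrt(6) / sqrt(l_in + l_out)."""
    eps = np.sqrt(6) / np.sqrt(l_in + l_out)
    # One extra column holds the weights of the bias unit.
    return rng.uniform(-eps, eps, size=(l_out, l_in + 1)), eps

rng = np.random.default_rng(0)
W, eps = init_weights(377, 25, rng)

print(W.shape)                          # (25, 378)
print(float(eps))                       # value of epsilon_init for this layer
print(bool(np.all(np.abs(W) <= eps)))   # True
```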

### Initialize Theta vectors

Here we randomly initialize the Theta weights for each layer and unroll them into a single parameter vector.



In [None]:
input_layer_size  = 377  # one input unit per feature column in X
hidden_layer_size = 25
num_labels = 801

Theta1 = randInitializeWeights(input_layer_size, hidden_layer_size)
Theta2 = randInitializeWeights(hidden_layer_size, num_labels)
nn_params = np.append(Theta1.flatten(),Theta2.flatten())

In [None]:
def sigmoid(z):
    """
    computes the sigmoid of z (used by predict and nnCostFunction)
    """
    return 1/(1 + np.exp(-z))

def sigmoidGradient(z):
    """
    computes the gradient of the sigmoid function
    """
    s = sigmoid(z)
    
    return s * (1 - s)
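
The sigmoid gradient is easy to verify numerically. The sketch below compares the analytic expression with a central finite difference; the snake_case names are stand-ins chosen for this example so that the block is self-contained.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_gradient(z):
    s = sigmoid(z)
    return s * (1 - s)

z = np.array([-2.0, 0.0, 2.0])
h = 1e-5

# A central finite difference approximates the derivative numerically.
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)
analytic = sigmoid_gradient(z)

print(analytic[1])                         # 0.25, the gradient at z = 0
print(np.max(np.abs(numeric - analytic)))  # largest deviation, should be tiny
```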

In [None]:
def predict(Theta1, Theta2, X):
    """
    Predict the label of an input given a trained neural network
    """
    m= X.shape[0]
    X = np.hstack((np.ones((m,1)),X))
    
    a1 = sigmoid(X @ Theta1.T)
    a1 = np.hstack((np.ones((m,1)), a1)) # hidden layer
    a2 = sigmoid(a1 @ Theta2.T) # output layer
    
    # argmax returns a 0-based column index; +1 maps it to the 1-based labels (1..num_labels) used in nnCostFunction
    return np.argmax(a2,axis=1)+1

In [None]:
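pred = predict(Theta1, Theta2, X)
# numEx - is the number of examples in the training set
numEx = X.shape[0]
# y holds condition names, so encode them as integers 1..num_labels to compare against predict()
y_idx = np.unique(y, return_inverse=True)[1] + 1
print("Training Set Accuracy:", np.sum(pred == y_idx)/numEx*100, "%")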

## Computing Neural Network Cost function

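$J(\Theta) = \frac{1}{m} \sum_{i=1}^m \sum_{k=1}^K \left[-y_k^{(i)} \log(h_\Theta(x^{(i)})_k) - (1 - y_k^{(i)}) \log(1 - h_\Theta(x^{(i)})_k)\right] + \frac{\lambda}{2m}\left[\sum_{j=1}^{s_2} \sum_{k=1}^{s_1} (\Theta_{j,k}^{(1)})^2 + \sum_{j=1}^{K} \sum_{k=1}^{s_2} (\Theta_{j,k}^{(2)})^2\right]$

Here $m$ is the number of examples, $K$ the number of labels, and $s_1$ and $s_2$ the number of input and hidden units. The bias weights are not regularized.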
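
To make the two parts of the cost concrete, here is a small vectorized sketch evaluated on hand-picked numbers. The function name `regularized_cost` and the toy values are assumptions for illustration; the network outputs `h` are supplied directly rather than computed by a forward pass.

```python
import numpy as np

def regularized_cost(h, Y, thetas, lam):
    """Cross-entropy cost plus L2 penalty.

    h      : (m, K) predicted probabilities
    Y      : (m, K) one-hot labels
    thetas : list of weight matrices whose first column is the bias
    lam    : regularization strength lambda
    """
    m = Y.shape[0]
    cross_entropy = np.sum(-Y * np.log(h) - (1 - Y) * np.log(1 - h)) / m
    # Bias columns are excluded from the penalty.
    penalty = lam / (2 * m) * sum(np.sum(t[:, 1:] ** 2) for t in thetas)
    return cross_entropy + penalty

h = np.array([[0.5, 0.5], [0.5, 0.5]])
Y = np.array([[1, 0], [0, 1]])
thetas = [np.array([[9.0, 1.0, 1.0]])]  # the bias weight 9.0 is not penalized

cost_plain = regularized_cost(h, Y, thetas, lam=0.0)
cost_reg = regularized_cost(h, Y, thetas, lam=2.0)
print(cost_plain)  # equals 2 * ln(2)
print(cost_reg)    # the same value plus a penalty of 1.0
```

With every output at 0.5, each of the four label entries contributes $\ln 2$, and dividing by $m = 2$ gives $2 \ln 2$. The penalty is $\frac{2}{2 \cdot 2}(1^2 + 1^2) = 1$.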

## Computing Backpropagation

Implementation of Backpropagation to compute gradients.


In [None]:
def nnCostFunction(nn_params,input_layer_size, hidden_layer_size, num_labels,X, y,Lambda):
    """
    nn_params contains the parameters unrolled into a vector
    
    compute the cost and gradient of the neural network
    """
    # Reshape nn_params back into the parameters Theta1 and Theta2
    Theta1 = nn_params[:((input_layer_size+1) * hidden_layer_size)].reshape(hidden_layer_size,input_layer_size+1)
    Theta2 = nn_params[((input_layer_size +1)* hidden_layer_size ):].reshape(num_labels,hidden_layer_size+1)
    
    m = X.shape[0]
    J=0
    X = np.hstack((np.ones((m,1)),X))
    y10 = np.zeros((m,num_labels))
    
    a1 = sigmoid(X @ Theta1.T)
    a1 = np.hstack((np.ones((m,1)), a1)) # hidden layer
    a2 = sigmoid(a1 @ Theta2.T) # output layer
    
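    # One-hot encode the labels; y is expected to hold integer labels 1..num_labels
    y_flat = np.asarray(y).ravel()
    for i in range(1,num_labels+1):
        y10[:,i-1] = np.where(y_flat==i,1,0)
    # Accumulate the cross-entropy cost label by label
    for j in range(num_labels):
        J = J + np.sum(-y10[:,j] * np.log(a2[:,j]) - (1-y10[:,j])*np.log(1-a2[:,j]))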
    
    cost = 1/m* J
    reg_J = cost + Lambda/(2*m) * (np.sum(Theta1[:,1:]**2) + np.sum(Theta2[:,1:]**2))
    
    # Implement the backpropagation algorithm to compute the gradients
    
    grad1 = np.zeros((Theta1.shape))
    grad2 = np.zeros((Theta2.shape))
    
    for i in range(m):
        xi= X[i,:] # 1 X (input_layer_size + 1)
        a1i = a1[i,:] # 1 X (hidden_layer_size + 1)
        a2i =a2[i,:] # 1 X num_labels
        d2 = a2i - y10[i,:] # output layer error
        d1 = Theta2.T @ d2.T * sigmoidGradient(np.hstack((1,xi @ Theta1.T))) # hidden layer error
        grad1= grad1 + d1[1:][:,np.newaxis] @ xi[:,np.newaxis].T
        grad2 = grad2 + d2.T[:,np.newaxis] @ a1i[:,np.newaxis].T
        
    grad1 = 1/m * grad1
    grad2 = 1/m*grad2
    
    grad1_reg = grad1 + (Lambda/m) * np.hstack((np.zeros((Theta1.shape[0],1)),Theta1[:,1:]))
    grad2_reg = grad2 + (Lambda/m) * np.hstack((np.zeros((Theta2.shape[0],1)),Theta2[:,1:]))
    
    return cost, grad1, grad2,reg_J, grad1_reg,grad2_reg
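
A common way to gain confidence in a backpropagation implementation is gradient checking: compare the analytic gradients with finite-difference estimates on a tiny network. The sketch below is self-contained and does not call `nnCostFunction`; the helpers `cost_and_grads` and `numerical_grad`, the network sizes and the random data are all assumptions made for this illustration, and regularization is left out.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost_and_grads(T1, T2, X, Y):
    """Unregularized cost and backprop gradients for one hidden layer."""
    m = X.shape[0]
    A0 = np.hstack((np.ones((m, 1)), X))                    # input plus bias
    A1 = np.hstack((np.ones((m, 1)), sigmoid(A0 @ T1.T)))   # hidden plus bias
    H = sigmoid(A1 @ T2.T)                                  # output layer
    J = np.sum(-Y * np.log(H) - (1 - Y) * np.log(1 - H)) / m
    D2 = H - Y                                              # output error
    # Hidden error; a * (1 - a) is the sigmoid gradient, bias column dropped.
    D1 = (D2 @ T2)[:, 1:] * A1[:, 1:] * (1 - A1[:, 1:])
    return J, D1.T @ A0 / m, D2.T @ A1 / m

def numerical_grad(f, T, h=1e-5):
    """Central-difference estimate of df/dT, one entry at a time."""
    G = np.zeros_like(T)
    for idx in np.ndindex(T.shape):
        old = T[idx]
        T[idx] = old + h
        plus = f()
        T[idx] = old - h
        minus = f()
        T[idx] = old  # restore the original weight
        G[idx] = (plus - minus) / (2 * h)
    return G

rng = np.random.default_rng(0)
X_small = rng.normal(size=(5, 3))
Y_small = np.eye(2)[rng.integers(0, 2, size=5)]  # one-hot labels, 2 classes
T1 = rng.uniform(-0.5, 0.5, size=(4, 4))         # 3 inputs + bias -> 4 hidden
T2 = rng.uniform(-0.5, 0.5, size=(2, 5))         # 4 hidden + bias -> 2 outputs

_, g1, g2 = cost_and_grads(T1, T2, X_small, Y_small)
cost_only = lambda: cost_and_grads(T1, T2, X_small, Y_small)[0]
diff1 = np.max(np.abs(g1 - numerical_grad(cost_only, T1)))
diff2 = np.max(np.abs(g2 - numerical_grad(cost_only, T2)))
print(diff1, diff2)  # both should be very close to zero
```

Gradient checking is far too slow for the full dataset, so it is usually run once on a network of this size and then switched off.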
    

## In Action: Cost Function

Piece together the components defined above to compute the cost of our neural network (regularized and unregularized).


**Note:** the parameters are still the random initial values, so this is the cost of an untrained (underfitted) model.

In [None]:
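# nnCostFunction expects integer labels 1..num_labels, so encode the condition names first
y_idx = np.unique(y, return_inverse=True)[1] + 1
# The function returns (cost, grad1, grad2, reg_J, grad1_reg, grad2_reg); keep only the two costs
J, _, _, reg_J, _, _ = nnCostFunction(nn_params, input_layer_size, hidden_layer_size, num_labels, X, y_idx, 1)
print("Cost at parameters (non-regularized):",J,"\nCost at parameters (Regularized):",reg_J)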


## Model Evaluation

Model evaluation is an important part of understanding how your model performs.

It is therefore crucial to choose a good evaluation metric that you can monitor. In our case accuracy makes the most sense.

We will monitor

- Accuracy on Test (clf)
- AUC (implementation requires sklearn v0.23.1 +) 

- Accuracy on Test (eng)
- AUC

- Accuracy on other variants (vnt)
- AUC


In [8]:
# Accuracy
testsetPred = clf.predict(X_test)
accuracy_score(y_test, testsetPred)

#AUC (multiclass AUC needs per-class probabilities, not predicted labels)
#roc_auc_score(y_test, clf.predict_proba(X_test), multi_class='ovr')

0.8111111111111111
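
The difference between the two metrics is easiest to see on a small example. The sketch below uses a toy 3-class problem and a `LogisticRegression` model as a fast stand-in for the MLP; the data, the model choice and the variable names are assumptions for illustration and the numbers say nothing about the notebook's classifier. For brevity the scores are computed on the same toy data the model was fitted on.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score

# Toy 3-class problem: well-separated Gaussian blobs, not the notebook's data.
rng = np.random.default_rng(0)
centers = np.array([[0, 0], [4, 0], [0, 4]])
y_toy = np.repeat(np.arange(3), 50)
X_toy = centers[y_toy] + rng.normal(size=(150, 2))

model = LogisticRegression(max_iter=1000).fit(X_toy, y_toy)

# Accuracy compares hard label predictions with the true labels.
acc = accuracy_score(y_toy, model.predict(X_toy))

# Multiclass AUC needs per-class probabilities of shape (n_samples, n_classes).
auc = roc_auc_score(y_toy, model.predict_proba(X_toy), multi_class='ovr')

print(acc, auc)
```

With `multi_class='ovr'` the AUC is computed one class against the rest and then averaged, which is why the probability matrix rather than the predicted labels has to be passed in.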

## Serialize Model Variant

Serialize the classifier you want to keep:

(1) Default Sklearn Model (clf)

(2) Variant 1 (eng)

(3) Variant 2

(4) Variant 3

In [None]:
"""
Serialize Model
"""
joblib.dump(clf, 'mlp.pkl')
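
Before shipping a serialized model it is worth confirming that it loads back correctly. The following is a minimal round-trip sketch that assumes the standalone `joblib` package is installed; it uses a plain dictionary as a stand-in object and a temporary directory so that it leaves no files behind.

```python
import os
import tempfile

import joblib

# Stand-in object; any picklable Python object, including a fitted
# scikit-learn estimator, can be dumped the same way.
model_stub = {"name": "mlp", "hidden_layer_sizes": (100,)}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "mlp.pkl")
    joblib.dump(model_stub, path)   # write the object to disk
    restored = joblib.load(path)    # read it back

print(restored == model_stub)  # True
```

For the real classifier, comparing `restored.predict(X_test)` with `clf.predict(X_test)` after loading would be a reasonable equivalent check.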