###### Implementation of mYLogisticRegression using the Gradient Descent Technique

Binary-Logistic Regression:

In [116]:
# Import numpy for functions like norm, Matmul...
import numpy as np
# Import loadmat from scipy for loading .mat file as a dictionary...
from scipy.io import loadmat

In [117]:
# Function for sigmoid or Logistic function...
def mYLogisticFunction(X,theta):
    """This function returns the probability of each row of X belonging to the positive class (label 1)"""
    return 1/(1+(np.exp(-(np.matmul(theta,X.T)))))
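
With unscaled inputs, `np.matmul(theta, X.T)` can become large in magnitude, and `np.exp` may then overflow and emit a RuntimeWarning. A minimal sketch of a numerically stable logistic function is shown below; `stable_sigmoid` is a hypothetical helper name, not part of the notebook, and it takes the pre-activation `z` rather than `X` and `theta`.

```python
import numpy as np

def stable_sigmoid(z):
    """Logistic function evaluated without overflowing np.exp (hypothetical helper)."""
    z = np.asarray(z, dtype=float)
    out = np.empty_like(z)
    pos = z >= 0
    # For z >= 0, exp(-z) <= 1, so this form cannot overflow.
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))
    # For z < 0, use the equivalent form exp(z) / (1 + exp(z)); exp(z) < 1 here.
    ez = np.exp(z[~pos])
    out[~pos] = ez / (1.0 + ez)
    return out

probs = stable_sigmoid(np.array([-1000.0, 0.0, 1000.0]))
```

In the notebook this could be called as `stable_sigmoid(np.matmul(theta, X.T))`, assuming the same argument layout as `mYLogisticFunction`.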

In [118]:
# Function for computing the gradient used in each gradient descent step...
def Derivative(X,theta,y):
    """This function returns the gradient of the cost function (negative log-likelihood) with respect to theta, which is used for updating theta"""
    return np.matmul(X.T,(np.array(mYLogisticFunction(X,theta)) - y))
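
The expression `X.T (h - y)` is the gradient of the summed negative log-likelihood of the logistic model. One way to convince yourself of this is a finite-difference check on small synthetic data. The sketch below is illustrative only: `log_loss`, `gradient`, and the `*_demo` arrays are hypothetical names and random data, not the MNIST setup used here.

```python
import numpy as np

def log_loss(X, theta, y):
    """Negative log-likelihood of the logistic model, summed over samples."""
    h = 1.0 / (1.0 + np.exp(-X.dot(theta)))
    return -np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))

def gradient(X, theta, y):
    """Analytic gradient X^T (h - y), the same expression Derivative returns."""
    h = 1.0 / (1.0 + np.exp(-X.dot(theta)))
    return X.T.dot(h - y)

# Small random problem (assumed data, for the check only).
rng = np.random.RandomState(0)
X_demo = rng.randn(20, 3)
y_demo = (rng.rand(20) > 0.5).astype(float)
theta_demo = 0.1 * rng.randn(3)

# Central finite differences, one coordinate at a time.
eps = 1e-6
numeric = np.zeros(3)
for j in range(3):
    step = np.zeros(3)
    step[j] = eps
    numeric[j] = (log_loss(X_demo, theta_demo + step, y_demo)
                  - log_loss(X_demo, theta_demo - step, y_demo)) / (2 * eps)

max_diff = np.max(np.abs(numeric - gradient(X_demo, theta_demo, y_demo)))
```

A small `max_diff` indicates that the analytic and numerical gradients agree.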

In [119]:
# Main program...
# Load the Training data and their labels into X and y
X = loadmat('mnistTrainImages.mat')
y = loadmat('mnistTrainLabels.mat')
# Save the 'trainData' into X_train from the dictionary 'X' and 'trainLabels' into y_train from dictionary 'y'
X_train = X['trainData']
y_train = y['trainLabels']

In [120]:
# Apply the same steps to the test data and their labels
XTest = loadmat('mnistTestImages.mat')
yTest = loadmat('mnistTestLabels.mat')
X_Test = XTest['testData']
y_Test = yTest['testLabels']

In [121]:
# Flatten y_train, which loadmat returns as a 2-D array, into a 1-D numpy array
y_train = np.array([i[0] for i in y_train])

In [122]:
# Introduce the bias term to account for the intercept
X_train = np.insert(X_train,0,1,axis = 1)
X_Test = np.insert(X_Test,0,1,axis = 1)

In [123]:
# Initialize the parameters of the model
theta = np.ones(X_train[0].shape)
thetaOld = theta * 9999
# The learning rate
alpha = 0.05
# Calculate the norm using np.linalg.norm()
Norm = np.linalg.norm(theta - thetaOld,ord = 2)

In [129]:
# Find the optimum theta iteratively
while 1:
    thetaOld = theta
    theta = thetaOld - (alpha * Derivative(X_train,theta,y_train))
    if Norm < np.linalg.norm(theta - thetaOld, ord = 2):
        break
    else:
        Norm = np.linalg.norm(theta - thetaOld, ord = 2)
print "Error Tolerance: ", Norm
# Here, we find the error tolerance ADAPTIVELY instead of fixing it beforehand:
# the loop stops as soon as the step norm stops decreasing, in order to avoid the local minimum to some extent

Error Tolerance:  14.666470294
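
For comparison, a more conventional stopping rule fixes the tolerance in advance and caps the number of iterations. The sketch below is not the method used above: `fit_logistic`, its default arguments, and the synthetic data are assumptions for illustration, and the gradient is averaged over samples so that the learning rate does not depend on the dataset size.

```python
import numpy as np

def fit_logistic(X, y, alpha=0.1, tol=1e-6, max_iter=10000):
    """Batch gradient descent; stops when the update norm drops below tol."""
    theta = np.zeros(X.shape[1])
    step_norm = np.inf
    n_iter = 0
    for n_iter in range(1, max_iter + 1):
        h = 1.0 / (1.0 + np.exp(-X.dot(theta)))
        # Average the gradient so alpha does not scale with the sample count.
        grad = X.T.dot(h - y) / X.shape[0]
        theta_new = theta - alpha * grad
        step_norm = np.linalg.norm(theta_new - theta, ord=2)
        theta = theta_new
        if step_norm < tol:
            break
    return theta, step_norm, n_iter

# Noisy synthetic labels (assumed data), so the classes are not separable.
rng = np.random.RandomState(0)
x = rng.randn(200, 2)
y_demo = (x[:, 0] + 0.5 * x[:, 1] + rng.randn(200) > 0).astype(float)
X_demo = np.insert(x, 0, 1, axis=1)  # bias column, as in the notebook

theta_hat, last_step, n_iter = fit_logistic(X_demo, y_demo, alpha=0.5)
```

The iteration cap guarantees termination even when the chosen tolerance is never reached.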




In [130]:
# Use theta computed above to calculate/ compute probabilities for the test dataset
predictions = mYLogisticFunction(X_Test, theta)



In [135]:
# Threshold the probabilities at P = 0.5 to get class predictions for the test data
for i in range(predictions.size):
    if predictions[i] > 0.5:
        predictions[i] = 1
    else:
        predictions[i] = 0

In [136]:
# Find the accuracy of our method
count = 0
for i in range(y_Test.size):
    if predictions[i] == y_Test[i]:
        count = count +1
print "Accuracy is ", count/float(y_Test.size)

Accuracy is  0.9907
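
The thresholding and counting loops above can also be written as array operations. A minimal sketch, using made-up probabilities and labels in place of `predictions` and `y_Test`, and assuming the labels are stored as an (n, 1) column as the flattening of `y_train` suggests:

```python
import numpy as np

# Hypothetical probabilities and labels, standing in for predictions and y_Test.
probs_demo = np.array([0.91, 0.12, 0.55, 0.49, 0.50])
labels_demo = np.array([[1], [0], [0], [0], [1]])  # assumed (n, 1) column layout

# Strictly greater than 0.5 maps to class 1, matching the loop above.
predicted = (probs_demo > 0.5).astype(int)
# Flatten the labels before comparing so the shapes line up element-wise.
accuracy = np.mean(predicted == labels_demo.ravel())
```

Flattening the labels first avoids an accidental (n, n) broadcast when a 1-D array is compared with an (n, 1) column.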


| Alpha (learning rate) | Error tolerance | Initial Theta[i] | Accuracy (%) |
| :---: |:----------: | :----: | :---: |
| 0.1 | 38.26  | 9999 | 99 |
| 0.1 | 27979.62| 999 | 90.2 |
| 0.1 | 2747.5 | 99 | 90.2 |
| 0.05 | 14.67  | 9999 | 99 |
| 0.05 | 2747.5| 999 | 99 |
| 0.05 | 2747.5 | 99 | 90.2 |
| 0.01 | 497.13  | 9999 | 90.2 |
| 0.01 | 497.13| 999 | 90.2 |
| 0.01 | 2747.5 | 99 | 90.2 |

For different values of the learning rate alpha and the initial theta, the accuracy varies between 90.2% and 99%, and the error tolerance changes considerably. Because the updates oscillate, the model does not reach a small error tolerance.
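
The effect of the learning rate on the size of the final update can be explored on a small synthetic problem. The sketch below is illustrative only: `final_step_norm`, the iteration count, and the random data are assumptions, and it does not reproduce the MNIST numbers in the table. It uses the summed gradient, as `Derivative` does.

```python
import numpy as np

def final_step_norm(X, y, alpha, n_iter=3000):
    """Run n_iter summed-gradient updates and return the norm of the last update."""
    theta = np.zeros(X.shape[1])
    step_norm = 0.0
    for _ in range(n_iter):
        z = np.clip(X.dot(theta), -500, 500)  # keep np.exp from overflowing
        h = 1.0 / (1.0 + np.exp(-z))
        update = alpha * X.T.dot(h - y)
        theta = theta - update
        step_norm = np.linalg.norm(update, ord=2)
    return step_norm

# Noisy synthetic labels (assumed data), with a bias column as in the notebook.
rng = np.random.RandomState(0)
x = rng.randn(200, 2)
y_demo = (x[:, 0] + rng.randn(200) > 0).astype(float)
X_demo = np.insert(x, 0, 1, axis=1)

# Final update norm for each learning rate in the sweep.
norms = {alpha: final_step_norm(X_demo, y_demo, alpha) for alpha in (0.1, 0.01, 0.001)}
```

Comparing the entries of `norms` shows how the final update size depends on alpha for this data; on this problem the smallest rate settles to a very small update norm.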

The logistic regression in scikit-learn (`sklearn.linear_model`) gives an accuracy of 99.23%, which is slightly better than our method, which gives 99.07% with a learning rate of 0.05 and an error tolerance of 14.67.

With respect to training and test accuracies:

| Type of data | sklearn.linear_model accuracy (%) | Our method accuracy (%) |
|:-:|:--:|:--:|
|Training|99.41|82.34|
|Test|99.23|99.07|

We observe that our method, although it gives a lower accuracy on the training set (82.34%), performs well on the test data (99.07%). The linear_model of sklearn does well on both, with a test accuracy (99.23%) close to its training accuracy (99.41%).