# Logistic Regression

In this exercise you will implement the logistic regression algorithm and use your code to learn to classify images of digits from the MNIST dataset. The MNIST database is a large database of handwritten digits that is commonly used for training and testing in the field of machine learning. Here are some sample images from the dataset:

<img src="mnist_sample.png", height="200" width="200">

## Task 1: Load the Data

Using the np.loadtxt() function, import all the data.

In [1]:
import numpy as np
import math

"""
    For digits 5 vs. 8, use the datasets starts with 58***.gz
"""

training_images = np.loadtxt("58Train_Images.gz")
labels = np.loadtxt("58Train_Labels.gz")
test_images = np.loadtxt("58Test_Images.gz")
test_labels = np.loadtxt("58Test_Labels.gz")

## Initialization

In [2]:
"""
Initialize w and b using a normal distribution N ~ (0, 0.1).
Note: we do not pre-pend 1 in our data. b represent w0 in our slides.
Set Learning rate L = 0.01.
Set numImages, numTestingImages to the number of training and test images respectively.

Initialized variables are global.
"""
w = np.random.normal(0, math.sqrt(0.1), training_images.shape[1])
b = np.random.normal(0, math.sqrt(0.1), 1)
L = 0.01
numImages = training_images.shape[0]
numTestingImages = test_images.shape[0]

In [3]:
def sigmoid(v):
    return 1 / (1 + np.exp(-v))

## Softmax
Implement the softmax activation function below. 

In [4]:
def softmax(v):
    """Returns the softmax of a vector v
    
    Input:
    - v: a vector that needs to be normalized
    
    Output:
    A vector whose entries have been normalized according to the softmax activation function.
    """
    e_v = np.exp(v - np.max(v))
    return e_v / e_v.sum(axis=0)


## Batch Gradient Descent
Write a procedure that updates w and b with a gradient descent step.

In [5]:
def train_batch(imags, labels, w, b):
    """Updates the parameters w and b according to the gradient descent algorithm in our slides
       Note: b is w0 in our slides
    
    Inputs:
    - imags:  Batch of images
              Note: Each image is in the form of a (1 x 784) list of normalized pixel values)
    - labels: Batch of labels
              Note: Each label is a one-hot vectors corresponding to the correct answer ([0,1] for one class 
                                                                                     and [1,0] for the other)
    
    Output:
    None
    """
    iteration = 300
    eps = 0.0001
    iteration_count = 0
    while True:
        iteration_count = iteration_count + 1
        gradient = np.zeros_like(w)
        for img, label in zip(training_images, labels):
            binary_label = label[1] # get binary label form one-hot label. If one-hot label is [0, 1], it means
                                    # the data sample has label 1. If one-hot label is [1, 0], it means label 0
                                    # one-hot[1] will always give you the one dimension binary label.
            posteriori = sigmoid(np.dot(img, w) + b) # this is (P(yi = 1 | xi))
            b -= (posteriori - binary_label)/numImages
            gradient = gradient + img*(posteriori - binary_label)
        w -= L*gradient/numImages
        if (iteration_count == iteration) or (np.linalg.norm(gradient) < eps):
            break


### Prediction
We will now write a function that returns True if our network produced a correct answer, and false otherwise:

In [6]:
def predict(imag, label):
    """Returns True if our model predicts the correct answer
    
    Inputs:
    - imag: An image in the form of a (1 x 784) list of pixel values
    - label: A one-hot vector corresponding to the correct answer ([0,1] for one class and [1,0] for the other)
    
    Outputs:
    A boolean representing whether the network succesfully predicted the correct class.
    """
    posteriori = sigmoid(np.dot(imag, w))
    if posteriori >= 0.5:
        if label[1] == 1:
            return True
        else:
            return False
    else:
        if label[1] == 1:
            return False
        else:
            return True
   

## Training the Network and Testing our Results
Now that we have all these functions at our disposal, let's train our network and see how it does!

In [7]:
# Training the network
train_batch(training_images, labels, w, b)
    
correct = 0  # Number of correct predictions the network makes

# Test Accuracy
for i in range(numTestingImages):
    if predict(test_images[i], test_labels[i]):
        correct += 1

# Train Accuracy
train_correct = 0
for i in range(numImages):
    if predict(training_images[i], labels[i]):
        train_correct += 1

print("Training Accuracy: " + str(float(train_correct) / len(training_images)))
print("Testing Accuracy: " + str(float(correct) / len(test_images)))


Training Accuracy: 0.6171453822359699
Testing Accuracy: 0.6237942122186495


Now try the same with the other data. Notice how the network performs much better when asked to classify 1s and 0s compared to the task of classifying 8s and 5s. Later we will find a way to improve our performance with 8s and 5s using tensorflow.