# CSCI-UA 0473 - Introduction to Machine Learning
## Wednesday, March 1, 2017

In the last class, you saw how to use multinomial logistic regression in sklearn. In this programming assignment, you will implement multinomial logistic regression yourself using numpy (imported through autograd so that gradients can be computed automatically).

The required libraries are imported for you and they are sufficient.

In [1]:
import autograd.numpy as np
from sklearn.datasets import make_blobs
from autograd import grad
import scipy.optimize

## 1. Data

To keep things simple, we restrict the problem to just 3 classes.

In [2]:
# Sample dataset preparation

n_dim = 2
x_train, y_train = make_blobs(n_samples=200, n_features=n_dim, centers=[[2,2],[0,-3], [-2, 2]], shuffle=True)
x_test, y_test = make_blobs(n_samples=100, n_features=n_dim, centers=[[2,2],[0,-3], [-2, 2]], shuffle=True)

In [3]:
# Append a constant 1 to each data vector (the bias term) in both the training and test data

x_train = np.insert(x_train, 2, 1, axis=1)
x_test = np.insert(x_test, 2, 1, axis=1)

In [4]:
# Class distribution
np.bincount(y_train)

array([67, 67, 66])

## 2. Model Definition

In [5]:
'''
Definition of the multinomial logistic regression model.

INPUT: Feature matrix (x), one data point per row, and weight matrix (w)
OUTPUT: The probability of each data point belonging to each class. If you have 'm' data points and 'k' classes, this
        function should return a matrix of dimension (m x k) whose rows each sum to 1.
'''

def multinomial_logreg(x, w):
    # The optimizer passes w as a flat vector; restore the (features + bias) x classes shape
    w = np.reshape(w, (3, 3))
    # Class scores for every data point, shape (m, 3)
    a = np.dot(x, w)
    # Softmax over each row of scores
    return np.array([np.exp(i) / np.sum(np.exp(i)) for i in a])
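
The row-by-row softmax above can overflow when a score is large, because `np.exp` is applied to the raw scores. A minimal sketch of a vectorized variant that subtracts each row's maximum first is shown below. It uses plain numpy, and the names `softmax_rows`, `multinomial_logreg_stable`, `x_demo` and `w_demo` are hypothetical, introduced only for illustration.

```python
import numpy as np

def softmax_rows(scores):
    """Row-wise softmax; subtracting the row max leaves the result unchanged
    mathematically but keeps np.exp from overflowing."""
    shifted = scores - np.max(scores, axis=1, keepdims=True)
    exp_scores = np.exp(shifted)
    return exp_scores / np.sum(exp_scores, axis=1, keepdims=True)

def multinomial_logreg_stable(x, w, n_classes=3):
    # Accept w either flat or already shaped (n_features + 1, n_classes)
    w = np.reshape(w, (x.shape[1], n_classes))
    return softmax_rows(np.dot(x, w))

# Hypothetical inputs: 2 features plus the bias column, with one extreme row
x_demo = np.array([[2.0, 2.0, 1.0],
                   [0.0, -3.0, 1.0],
                   [-2.0, 2.0, 1.0],
                   [1000.0, -1000.0, 1.0]])
w_demo = np.eye(3).ravel()
probs_demo = multinomial_logreg_stable(x_demo, w_demo)
print(probs_demo.sum(axis=1))
```

The last row of `x_demo` would produce `inf / inf` with the unshifted formula; with the shift, every row remains a valid probability vector.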

In [6]:
'''
Loss function of the multinomial logistic regression model (popularly called cross-entropy loss).

INPUT: True labels (y), feature matrix (x) and weight matrix (w)
OUTPUT: Negative log-likelihood of the true labels for the given 'w'
'''

def multinomial_lr_distance(y, x, w):
    y_ = multinomial_logreg(x, w)
    # Sum of -log(probability assigned to the true class) over all data points
    return np.sum([-np.log(y_[i][y[i]]) for i in range(len(y))])
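
The same loss can be written without a Python loop by computing log-probabilities directly and picking out the true-class entry of each row with integer indexing. The sketch below uses plain numpy; `log_softmax_rows` and `negative_log_likelihood` are hypothetical names, and it has not been checked against autograd's differentiation.

```python
import numpy as np

def log_softmax_rows(scores):
    # log(softmax) computed via the log-sum-exp trick for numerical stability
    shifted = scores - np.max(scores, axis=1, keepdims=True)
    return shifted - np.log(np.sum(np.exp(shifted), axis=1, keepdims=True))

def negative_log_likelihood(y, x, w, n_classes=3):
    w = np.reshape(w, (x.shape[1], n_classes))
    log_probs = log_softmax_rows(np.dot(x, w))
    # Row i contributes -log p(y[i] | x[i])
    return -np.sum(log_probs[np.arange(len(y)), y])

# Hypothetical data: with all-zero weights every class has probability 1/3,
# so the loss should equal m * log(3) for m data points.
x_demo = np.array([[2.0, 2.0, 1.0],
                   [0.0, -3.0, 1.0],
                   [-2.0, 2.0, 1.0],
                   [1.0, 1.0, 1.0]])
y_demo = np.array([0, 1, 2, 0])
nll_demo = negative_log_likelihood(y_demo, x_demo, np.zeros(9))
print(nll_demo, 4 * np.log(3))
```

Checking the zero-weight case against `m * log(k)` is a quick sanity test for any cross-entropy implementation.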

In [7]:
def cost(w, x, y):
    return multinomial_lr_distance(y, x, w)

# Gradient of the cost with respect to w (its first argument), computed by autograd
multinomial_lr_rule = grad(cost)

In [8]:
def _multinomial_lr_dist(w, x, y):
    # Return (loss, gradient) together, as scipy.optimize.minimize expects when jac=True
    return multinomial_lr_distance(y, x, w), multinomial_lr_rule(w, x, y)
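
For this loss the gradient also has a well-known closed form, `X^T (P - Y)`, where `P` holds the predicted probabilities and `Y` the one-hot labels. The sketch below is not part of the assignment code. It uses plain numpy and hypothetical names (`nll`, `nll_grad`, `numerical_grad`) to compare the closed form with a central finite-difference estimate, which is one way to sanity-check a gradient from any source.

```python
import numpy as np

def softmax_rows(scores):
    shifted = scores - np.max(scores, axis=1, keepdims=True)
    exp_scores = np.exp(shifted)
    return exp_scores / np.sum(exp_scores, axis=1, keepdims=True)

def nll(w_flat, x, y, n_classes=3):
    probs = softmax_rows(x.dot(w_flat.reshape(x.shape[1], n_classes)))
    return -np.sum(np.log(probs[np.arange(len(y)), y]))

def nll_grad(w_flat, x, y, n_classes=3):
    probs = softmax_rows(x.dot(w_flat.reshape(x.shape[1], n_classes)))
    one_hot = np.eye(n_classes)[y]          # (m, k) one-hot label matrix
    return x.T.dot(probs - one_hot).ravel() # same layout as the flat w

def numerical_grad(f, w_flat, eps=1e-6):
    # Central differences, one coordinate at a time
    g = np.zeros_like(w_flat)
    for j in range(w_flat.size):
        step = np.zeros_like(w_flat)
        step[j] = eps
        g[j] = (f(w_flat + step) - f(w_flat - step)) / (2 * eps)
    return g

# Hypothetical random data, seeded so the check is repeatable
rng = np.random.RandomState(0)
x_demo = np.hstack([rng.randn(20, 2), np.ones((20, 1))])
y_demo = rng.randint(0, 3, size=20)
w_demo = 0.01 * rng.randn(9)

analytic = nll_grad(w_demo, x_demo, y_demo)
numeric = numerical_grad(lambda w: nll(w, x_demo, y_demo), w_demo)
max_gap = np.max(np.abs(analytic - numeric))
print(max_gap)
```

If the two gradients disagree by more than a small tolerance, the usual culprits are a mismatch between the flat and reshaped weight layouts or a sign error in the loss.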

## 3. Training 

In [9]:
# Random starting point created for you
w0 = 0.01 * np.random.randn(3, 3)
w0[:, -1] = 0.
w = np.copy(w0)
# TO DO: Use scipy.optimize
trained_model = scipy.optimize.minimize(_multinomial_lr_dist, w0, args=(x_train, y_train), jac=True)

In [10]:
# TO DO: Print the learned weight matrix
print(trained_model.x)
w = trained_model.x

[ 18.17731745   1.41502408 -19.6041589    9.15491323 -13.96458741
   4.80269389   5.75645394  -1.04116878  -4.72659675]
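
Softmax probabilities do not change when the same value is added to every class score of a data point, so the learned weight matrix is determined only up to a per-feature shift across classes. As an illustration, the sketch below reshapes the weights printed above to 3 x 3, subtracts the last column from every column, and checks that the predicted probabilities stay the same. The helper `softmax_rows` and the points in `x_demo` are assumptions made for this example.

```python
import numpy as np

def softmax_rows(scores):
    shifted = scores - np.max(scores, axis=1, keepdims=True)
    exp_scores = np.exp(shifted)
    return exp_scores / np.sum(exp_scores, axis=1, keepdims=True)

# Weights copied from the printed output, reshaped as in multinomial_logreg
w_learned = np.array([18.17731745, 1.41502408, -19.6041589,
                      9.15491323, -13.96458741, 4.80269389,
                      5.75645394, -1.04116878, -4.72659675]).reshape(3, 3)

# Subtract the last class's weights from every class (last column becomes zero)
w_shifted = w_learned - w_learned[:, [-1]]

# Hypothetical inputs: the three blob centers with the bias term appended
x_demo = np.array([[2.0, 2.0, 1.0],
                   [0.0, -3.0, 1.0],
                   [-2.0, 2.0, 1.0]])

probs_learned = softmax_rows(x_demo.dot(w_learned))
probs_shifted = softmax_rows(x_demo.dot(w_shifted))
same_predictions = np.allclose(probs_learned, probs_shifted)
print(same_predictions)
```

Because of this invariance, two training runs can print quite different weight values while making identical predictions.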


## 4. Testing

In [11]:
# TO DO: Predict the class for test dataset
lr = multinomial_logreg(x_test, w)
err = 0.0
# Visit every test point, including the last one
for i in range(len(y_test)):
    ref = y_test[i]
    res = np.argmax(lr[i])
    if ref != res:
        err += 1
# Fraction of misclassified test points
err = err / len(y_test)

In [12]:
# TO DO: Calculate the accuracy of test predictions
# err holds the fraction of misclassified test points, so the accuracy is 1 - err

print('Test Error Rate = {}'.format(err))

Test Error Rate = 0.05
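
The counting loop can also be expressed in vectorized form: take the `argmax` of each row of predicted probabilities and compare it with the true labels. The sketch below uses a hypothetical helper `accuracy` and made-up probabilities to show the idea. The error rate is then `1 - accuracy`.

```python
import numpy as np

def accuracy(probs, y_true):
    # Predicted class = index of the largest probability in each row
    predictions = np.argmax(probs, axis=1)
    # Mean of a boolean array is the fraction of True entries
    return np.mean(predictions == y_true)

# Hypothetical predictions for 4 points and 3 classes
probs_demo = np.array([[0.7, 0.2, 0.1],
                       [0.1, 0.8, 0.1],
                       [0.3, 0.3, 0.4],
                       [0.5, 0.4, 0.1]])
y_demo = np.array([0, 1, 2, 1])

acc_demo = accuracy(probs_demo, y_demo)
print('accuracy = {}, error rate = {}'.format(acc_demo, 1.0 - acc_demo))
```

Here three of the four points are classified correctly, so `acc_demo` is 0.75.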
