# CAP 5610 - Introduction to Machine Learning <br>Florida International University - Fall 2018
## Example Set #3 - Logistic Regression

We use sklearn's [make_classification](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_classification.html#sklearn.datasets.make_classification) function to generate a two-class dataset with two features, which is used in Problems 1 and 2 below.

In [5]:
%matplotlib widget
import numpy as np 
from matplotlib import pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from sklearn import linear_model, datasets

# number of samples to generate
n_samples = 50
n_classes = 2

# generate the dataset
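# pass arguments by keyword: the second positional parameter is n_features, not n_classes
X, y = datasets.make_classification(n_samples=n_samples, n_features=2, n_informative=2,
                                    n_redundant=0, n_classes=n_classes, random_state=0)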
print(X.shape)

# visualize the dataset
fig = plt.figure()
plt.scatter(X[:, 0], X[:, 1], marker='o', c=y, s=25, edgecolor='k')

(50, 2)


[Figure: scatter plot of the two features of the generated dataset, colored by class label]

### Problem 1 - Logistic Regression with gradient ascent

Develop a logistic regression model and train it using gradient ascent with the generated dataset.

**Answer:** Recall the following (from the logistic regression videos)

hypothesis: $h_\theta(x)=g(\theta^Tx)=\frac{1}{1+e^{-\theta^Tx}}$

sigmoid: $g(z)=\frac{1}{1+e^{-z}}$

Stochastic gradient ascent: $\theta_j:=\theta_j+\alpha\left(y^{(i)} - h_\theta(x^{(i)})\right)x_j^{(i)}$
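
For reference, the update rule follows from the log-likelihood that the `loglikelihood` function below evaluates. A short derivation sketch, using the same symbols as above and writing $m$ for the number of samples (a symbol assumed here to match the code):

$$\ell(\theta) = \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right) \log\left(1 - h_\theta(x^{(i)})\right) \right]$$

$$\frac{\partial \ell(\theta)}{\partial \theta_j} = \sum_{i=1}^{m} \left( y^{(i)} - h_\theta(x^{(i)}) \right) x_j^{(i)}$$

Gradient ascent moves $\theta$ in the direction of this gradient; the stochastic version uses one term of the sum at a time.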

In [6]:
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def loglikelihood(h, y):
    return np.dot(y, np.log(h)) + np.dot(1 - y, np.log(1 - h))

def stochastic_gradient_ascent(X, y, alpha=1e-4, max_iterations=10000):
    # number of samples
    m = X.shape[0]

    # append a constant column so the last weight acts as the intercept
    X = np.append(X, np.ones((m, 1)), axis=1)

    # initialize one weight per column (two features plus the intercept)
    theta = np.ones(X.shape[1])

    # log-likelihood at the start of each pass, used as the stopping criterion
    llh = []

    # update weights iteratively until convergence or max iterations
    for i in range(max_iterations):
        h = sigmoid(np.dot(X, theta))
        llh.append(loglikelihood(h, y))
        if i > 1 and abs(llh[i] - llh[i - 1]) < 0.001:
            break
        # note: h is computed once per pass, so these per-sample updates
        # add up to one batch gradient step
        for k in range(m):
            theta = theta + alpha * (y[k] - h[k]) * X[k]

    # compute predictions for learned weights
    predictions = sigmoid(np.dot(X, theta))

    return theta, predictions, llh

weights, predictions, llh = stochastic_gradient_ascent(X, y)
weights

array([0.0970719 , 2.68853938, 0.02957662])
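
Note that `stochastic_gradient_ascent` evaluates `h` once per pass over the data, so each pass amounts to a single batch gradient step. A minimal sketch of a per-sample variant, which recomputes the prediction after every weight update, is shown below. The function name `sgd_per_sample`, the synthetic two-blob dataset, and the hyperparameters are assumptions for illustration and are not part of the original exercise.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sgd_per_sample(X, y, alpha=0.1, epochs=200, seed=0):
    """Hypothetical helper: stochastic gradient ascent, one sample at a time."""
    rng = np.random.default_rng(seed)
    # append a constant column so the last weight acts as the intercept
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    theta = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        # visit the samples in a fresh random order each epoch
        for i in rng.permutation(Xb.shape[0]):
            h_i = sigmoid(Xb[i] @ theta)  # prediction with the current weights
            theta = theta + alpha * (y[i] - h_i) * Xb[i]
    return theta

# toy data (assumed): two Gaussian blobs, not the make_classification set
rng = np.random.default_rng(0)
X_toy = np.vstack([rng.normal(-2.0, 1.0, size=(25, 2)),
                   rng.normal(2.0, 1.0, size=(25, 2))])
y_toy = np.concatenate([np.zeros(25), np.ones(25)])

theta_toy = sgd_per_sample(X_toy, y_toy)
probs = sigmoid(np.hstack([X_toy, np.ones((50, 1))]) @ theta_toy)
accuracy = np.mean((probs >= 0.5) == y_toy)
print(theta_toy, accuracy)
```

Because each update uses the current weights, this variant typically tolerates a larger step size than the batch-style loop, at the cost of a noisier path.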

In [7]:
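x = np.linspace(-4, 4, 100)
# the boundary is where theta^T x = 0, i.e. w0*x1 + w1*x2 + w2 = 0,
# so x2 = -(w0/w1)*x1 - w2/w1 (w2 is the intercept weight)
a = -weights[0] / weights[1]
b = -weights[2] / weights[1]
decision_boundary = a*x + b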
plt.figure()
plt.scatter(X[:, 0], X[:, 1], marker='o', c=y, s=25, edgecolor='k')
plt.plot(x, decision_boundary)

[Figure: the dataset scatter plot with the learned decision boundary overlaid]

Let's see how well we're doing. We'll use sklearn's F1 score to measure the performance of the classifier; see [f1_score](http://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html#sklearn.metrics.f1_score). Remember that the output of the classifier is the probability that the class is positive (1). To turn it into a class label, we threshold the probabilities so that $p\geq0.5$ becomes 1 and $p\lt0.5$ becomes 0.

In [8]:
from sklearn.metrics import f1_score

def threshold(probabilities, level):
    # return new 0/1 labels without modifying the input array
    return (probabilities >= level).astype(float)

y_pred = threshold(predictions, 0.5)

print(y_pred)
print(f1_score(y, y_pred))

[1. 0. 1. 0. 1. 0. 1. 1. 1. 1. 1. 0. 0. 0. 0. 0. 1. 1. 1. 0. 0. 1. 0. 0.
 1. 1. 1. 0. 0. 1. 1. 0. 1. 0. 0. 1. 0. 1. 0. 1. 1. 1. 0. 1. 1. 0. 0. 0.
 1. 0.]
0.9803921568627451
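
As a sanity check on what `f1_score` reports, the F1 score can be computed by hand from the counts of true positives, false positives, and false negatives. The sketch below uses a small made-up label vector (an assumption for illustration, not the notebook's data) and a hypothetical helper `f1_from_labels`.

```python
def f1_from_labels(y_true, y_pred):
    """Hypothetical helper: F1 score for binary labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: define F1 as 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall
    return 2 * precision * recall / (precision + recall)

# made-up labels: 3 true positives, 1 false positive, 1 false negative
y_true_demo = [1, 0, 1, 1, 0, 1]
y_pred_demo = [1, 0, 0, 1, 1, 1]
f1_demo = f1_from_labels(y_true_demo, y_pred_demo)
print(f1_demo)
```

Here precision and recall are both 3/4, so their harmonic mean is also 3/4.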


### Problem 2 - Logistic Regression with Newton's method
Building on your work from Problem 1, train a logistic regression classifier with Newton's method on the same generated dataset and compare the results.
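
A minimal sketch of one possible approach to Problem 2, not an official solution. Newton's method replaces the fixed step size $\alpha$ with the inverse Hessian of the log-likelihood: $\theta := \theta - H^{-1}\nabla_\theta \ell(\theta)$, where $\nabla_\theta \ell(\theta) = X^T(y - h)$ and $H = -X^T \mathrm{diag}(h(1-h)) X$. The function name `newton_logistic`, the small ridge term added for numerical stability, and the synthetic overlapping dataset are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def newton_logistic(X, y, max_iterations=25, tol=1e-8, ridge=1e-8):
    """Hypothetical helper: logistic regression fitted with Newton's method."""
    # append a constant column so the last weight acts as the intercept
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    theta = np.zeros(Xb.shape[1])
    for _ in range(max_iterations):
        h = sigmoid(Xb @ theta)
        gradient = Xb.T @ (y - h)
        # negative Hessian of the log-likelihood: X^T diag(h(1-h)) X
        neg_hessian = (Xb * (h * (1.0 - h))[:, None]).T @ Xb
        neg_hessian += ridge * np.eye(Xb.shape[1])  # assumed stabilizer
        # theta - H^{-1} grad equals theta + (-H)^{-1} grad
        step = np.linalg.solve(neg_hessian, gradient)
        theta = theta + step
        if np.max(np.abs(step)) < tol:
            break
    return theta

# toy data (assumed): two overlapping Gaussian blobs, so the maximum is finite
rng = np.random.default_rng(0)
X_toy = np.vstack([rng.normal(-1.0, 1.5, size=(100, 2)),
                   rng.normal(1.0, 1.5, size=(100, 2))])
y_toy = np.concatenate([np.zeros(100), np.ones(100)])

theta_newton = newton_logistic(X_toy, y_toy)
Xb_toy = np.hstack([X_toy, np.ones((200, 1))])
probs = sigmoid(Xb_toy @ theta_newton)
grad_norm = np.max(np.abs(Xb_toy.T @ (y_toy - probs)))
accuracy = np.mean((probs >= 0.5) == y_toy)
print(theta_newton, grad_norm, accuracy)
```

Newton's method usually needs only a handful of iterations where gradient ascent needs thousands, but each iteration solves a linear system in the number of weights. On perfectly separable data the maximum-likelihood weights are unbounded, so the iteration may not settle without regularization.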

### Problem 3 - Naive Bayes Classification
Consider the following dataset:



In [9]:
import pandas as pd
data = pd.DataFrame()

# Target labels
data['gender'] = ['male','male','male','male','female','female','female','female']

# features
data['height'] = [6,5.92,5.58,5.92,5,5.5,5.42,5.75]
data['weight'] = [180,190,170,165,100,150,130,150]
data['shoe size'] = [12,11,12,10,6,8,7,9]

data

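| | gender | height | weight | shoe size |
|---|---|---|---|---|
| 0 | male | 6.0 | 180 | 12 |
| 1 | male | 5.92 | 190 | 11 |
| 2 | male | 5.58 | 170 | 12 |
| 3 | male | 5.92 | 165 | 10 |
| 4 | female | 5.0 | 100 | 6 |
| 5 | female | 5.5 | 150 | 8 |
| 6 | female | 5.42 | 130 | 7 |
| 7 | female | 5.75 | 150 | 9 |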


In [10]:
person = pd.DataFrame()

# unknown person's features
person['height'] = [6]
person['weight'] = [130]
person['shoe size'] = [8]

person

| | height | weight | shoe size |
|---|---|---|---|
| 0 | 6 | 130 | 8 |


In [18]:
# calculate priors from the class counts
number_of_males = data['gender'][data['gender'] == 'male'].count()
number_of_females = data['gender'][data['gender'] == 'female'].count()
total = data['gender'].count()

P_male = number_of_males/total
P_female = number_of_females/total

# the likelihoods need the mean of each feature by gender,
# so we group by gender and let pandas compute the means
data_means = data.groupby('gender').mean()

# view the prior (only the last expression in a cell is displayed)
P_female


0.5

In [12]:
# same with the variance of each feature by gender
data_variance = data.groupby('gender').var()

# View the values
data_variance

| gender | height | weight | shoe size |
|---|---|---|---|
| female | 0.097225 | 558.333333 | 1.666667 |
| male | 0.035033 | 122.916667 | 0.916667 |


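We have now computed the mean and variance of each feature for each gender. Assuming that each feature is Gaussian-distributed within a class, the likelihood of a feature value $x$ given class $y$ is $p(x|y)=\frac{1}{\sqrt{2\pi\sigma_y^2}}e^{-\frac{(x-\mu_y)^2}{2\sigma_y^2}}$, where $\mu_y$ and $\sigma_y^2$ are the class mean and variance. We can compute it as follows.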

In [13]:
# calculates the Gaussian density p(x | y) for one feature
def p_x_given_y(x, mean_y, variance_y):
    # normal probability density with the class mean and variance
    p = 1/(np.sqrt(2*np.pi*variance_y)) * np.exp((-(x-mean_y)**2)/(2*variance_y))
    return p

So, if we store the per-gender means and variances in named variables for ease of use, we get:

In [14]:
# Means for male (look up each value by row label and column name)
male_height_mean = data_means.loc['male', 'height']
male_weight_mean = data_means.loc['male', 'weight']
male_footsize_mean = data_means.loc['male', 'shoe size']

# Variance for male
male_height_variance = data_variance.loc['male', 'height']
male_weight_variance = data_variance.loc['male', 'weight']
male_footsize_variance = data_variance.loc['male', 'shoe size']

# Means for female
female_height_mean = data_means.loc['female', 'height']
female_weight_mean = data_means.loc['female', 'weight']
female_footsize_mean = data_means.loc['female', 'shoe size']

# Variance for female
female_height_variance = data_variance.loc['female', 'height']
female_weight_variance = data_variance.loc['female', 'weight']
female_footsize_variance = data_variance.loc['female', 'shoe size']

To classify our person, we compute, for each gender, the prior times the product of the feature likelihoods. This is the numerator of the posterior $p(y|x)$; the denominator is the same for both genders, so we choose the gender label with the larger numerator.

In [15]:
# Numerator of the posterior if the unclassified observation is a male
P_male * \
p_x_given_y(person['height'][0], male_height_mean, male_height_variance) * \
p_x_given_y(person['weight'][0], male_weight_mean, male_weight_variance) * \
p_x_given_y(person['shoe size'][0], male_footsize_mean, male_footsize_variance)

6.197071843878078e-09

In [16]:
# Numerator of the posterior if the unclassified observation is a female
P_female * \
p_x_given_y(person['height'][0], female_height_mean, female_height_variance) * \
p_x_given_y(person['weight'][0], female_weight_mean, female_weight_variance) * \
p_x_given_y(person['shoe size'][0], female_footsize_mean, female_footsize_variance)

0.0005377909183630018
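
To finish the classification, the two numerators can be compared directly, or normalized by their sum (the evidence) to obtain posterior probabilities. The sketch below recomputes both numerators using only the standard library, with sample variances (dividing by $n-1$, as pandas `var()` does by default). The helper names `gaussian_pdf` and `posterior_numerator` and the dictionary layout are assumptions for illustration.

```python
import math
from statistics import mean, variance  # variance() divides by n - 1

# training data from the table above, grouped by gender
training = {
    'male': {'height': [6, 5.92, 5.58, 5.92],
             'weight': [180, 190, 170, 165],
             'shoe size': [12, 11, 12, 10]},
    'female': {'height': [5, 5.5, 5.42, 5.75],
               'weight': [100, 150, 130, 150],
               'shoe size': [6, 8, 7, 9]},
}
person = {'height': 6, 'weight': 130, 'shoe size': 8}

def gaussian_pdf(x, mu, var):
    """Normal density with mean mu and variance var, evaluated at x."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def posterior_numerator(label):
    """Hypothetical helper: prior times the product of feature likelihoods."""
    prior = 0.5  # four of the eight training rows belong to each gender
    likelihood = 1.0
    for feature, values in training[label].items():
        likelihood *= gaussian_pdf(person[feature], mean(values), variance(values))
    return prior * likelihood

numerators = {label: posterior_numerator(label) for label in training}
evidence = sum(numerators.values())
posteriors = {label: n / evidence for label, n in numerators.items()}
predicted = max(posteriors, key=posteriors.get)
print(numerators, posteriors, predicted)
```

The female numerator is larger than the male numerator by several orders of magnitude, so the person is classified as female.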