# Lab 04

## Conrad Appel, Erik Gabrielson, Danh Nguyen

### Preparation and Overview (30 points total)

[5 points] Explain the task and what business-case or use-case it is designed to solve (or designed to investigate). Detail exactly what the classification task is and what parties would be interested in the results.

[10 points] (mostly the same processes as from lab one) Define and prepare your class variables. Use proper variable representations (int, float, one-hot, etc.). Use pre-processing methods (as needed) for dimensionality reduction, scaling, etc. Remove variables that are not needed/useful for the analysis. Describe the final dataset that is used for classification/regression (include a description of any newly formed variables you created).

[15 points] Divide you data into training and testing data using an 80% training and 20% testing split. Use the cross validation modules that are part of scikit-learn. Argue for or against splitting your data using an 80/20 split. That is, why is the 80/20 split appropriate (or not) for your dataset?  

### Modeling (50 points total)

[20 points] Create a custom, one-versus-all logistic regression classifier using numpy and scipy to optimize. Use object oriented conventions identical to scikit-learn. You should start with the template used in the course. You should add the following functionality to the logistic regression classifier:
Ability to choose optimization technique when class is instantiated: either steepest descent, stochastic gradient descent, or Newton's method. Update the gradient calculation to include a customizable regularization term (either using no regularization, L1 regularization, L2 regularization, or both L1/L2 norm of the weights). Associate a cost with the regularization term, "C", that can be adjusted when the class is instantiated.  

[15 points] Train your classifier to achieve good generalization performance. That is, adjust the optimization technique and the value of the regularization term "C" to achieve the best performance on your test set. Is your method of selecting parameters justified? That is, do you think there is any "data snooping" involved with this method of selecting parameters?

[15 points] Compare the performance of your "best" logistic regression optimization procedure to the procedure used in scikit-learn. Visualize the performance differences in terms of training time, training iterations, and memory usage while training. Discuss the results. 

### Deployment (10 points total)

Which implementation of logistic regression would you advise be used in a deployed machine learning model, your implementation or scikit-learn (or other third party)? Why?

### Exceptional Work (10 points total)

You have free reign to provide additional analyses.
One idea: Make your implementation of logistic regression compatible with the GridSearchCV function that is part of scikit-learn.

In [4]:
import numpy as np
import pandas as p
from scipy.optimize import fmin_bfgs
from scipy.special import expit

In [59]:
class BinaryClassifierBase:
    def __init__(self, eta, iterations=20, cost=0.001, norm=2):
        self.eta = eta
        self.cost = cost
        self.iters = iterations
        self.norm = norm
    
    def normalize(self, w, gradient):
        # regularization (does both if 3)
        if self.norm & 1: # L1 norm
            gradient[1:] += -1 * w[1:] * self.cost
        elif self.norm & 2: # L2 norm
            gradient[1:] += -2 * w[1:] * self.cost
    
    def fit(self, x, y):
        self.w_ = np.zeros((x.shape[1],1))
        for _ in range(self.iters):
            gradient = self._get_gradient(x,y)
            self.normalize(self.w_, gradient)
            self.w_ += gradient*self.eta
    
    def predict_proba(self,x):
        return 1/(1+np.exp(-(x @ self.w_)))
    
    def predict(self, x):
        return (self.predict_proba(x)>0.5)
    
    
class BinaryStochDescClassifier(BinaryClassifierBase):
    def _get_gradient(self, x, y):
        idx = int(np.random.rand()*len(y)) # grab random instance
        ydiff = y[idx]-self.predict_proba(x[idx]) # get y difference (now scalar)
        gradient = x[idx] * ydiff[:,np.newaxis] # make ydiff a column vector and multiply through
        gradient = gradient.reshape(self.w_.shape)
        return gradient
    
    
class BinarySteepDescClassifier(BinaryClassifierBase):
    def _get_gradient(self, x, y):
        ydiff = y-self.predict_proba(x).ravel()
        gradient = np.mean(x * ydiff[:,np.newaxis], axis=0)
        return gradient.reshape(self.w_.shape)

    
class BinaryNewtonClassifier(BinaryClassifierBase):
    def fit(self, x, y):
        def obj_fn(w, x, y, c):
            g = expit(x @ w)
            return -np.sum(np.log(g[y==1]))-np.sum(np.log(1-g[y==0])) + c*sum(w**2)
        
        def obj_grad(w, x, y, c):
            g = expit(x @ w)
            ydiff = y-g
            gradient = np.mean(x * ydiff[:,np.newaxis], axis=0)
            gradient = gradient.reshape(w.shape)
            self.normalize(w, gradient)
            return -gradient
        
        self.w_ = fmin_bfgs(obj_fn, 
                            np.zeros((x.shape[1], 1)), 
                            fprime=obj_grad, 
                            args=(x, y, self.cost), 
                            gtol=1e-03, 
                            maxiter=self.iters,
                            disp=False).reshape((x.shape[1], 1))

        
class LogRegClassifier:
    def __init__(self, eta, iterations=20, optimize='steepdesc', cost=0.001, norm=2):
        typesofoptimize = {
            'steepdesc': BinarySteepDescClassifier, 
            'stochdesc': BinaryStochDescClassifier, 
            'newton': BinaryNewtonClassifier
        }
        if optimize not in typesofoptimize.keys():
            raise ValueError('optimize must be one of: ' + ' '.join(typesofoptimize.keys()))
        
        self.eta = eta
        self.iters = iterations
        self.optimize = optimize
        self.classifier = typesofoptimize[optimize]
        self.cost = cost
        self.norm = norm
        self.classifiers = [] # fill with binary classifiers during fit
    
    def _add_bias(self, x):
        return np.hstack((np.ones((x.shape[0],1)),x))
    
    def fit(self, x, y):
        Xb = self._add_bias(x)
        classes = np.unique(y)
        
        for cl in classes:
            cur_y = y==cl
            cur_classifier = self.classifier(self.eta, self.iters, cost=self.cost, norm=self.norm)
            cur_classifier.fit(x, cur_y)
            self.classifiers.append(cur_classifier)
    
    def predict(self, x):
        if not self.classifiers:
            raise RuntimeError('Classifier not fit!')
        
        probabilities = []
        for classifier in self.classifiers:
            probabilities.append(classifier.predict_proba(x))
        probabilities = np.hstack(probabilities)
        return np.argmax(probabilities,axis=1)

In [64]:
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
ds = load_iris()
kwargs = {
    'norm': 2,
    'cost': .1,
    'iterations': 500
}
regrs = {
    'newton': LogRegClassifier(.1, optimize='newton', **kwargs),
    'stoch': LogRegClassifier(.1, optimize='stochdesc', **kwargs), # Different results every time, random
    'steep': LogRegClassifier(.1, optimize='steepdesc', **kwargs)
}
for regr in regrs.values():
    regr.fit(ds.data, ds.target)

for key, val in regrs.items():
    res = val.predict(ds.data)
    print(key + ': ' +str(accuracy_score(ds.target, res)))

steep: 0.813333333333
newton: 0.813333333333
stoch: 0.666666666667
