# Lab 04

## Conrad Appel, Erik Gabrielson, Danh Nguyen

### Preparation and Overview (30 points total)

[5 points] Explain the task and what business-case or use-case it is designed to solve (or designed to investigate). Detail exactly what the classification task is and what parties would be interested in the results.

[10 points] (mostly the same processes as from lab one) Define and prepare your class variables. Use proper variable representations (int, float, one-hot, etc.). Use pre-processing methods (as needed) for dimensionality reduction, scaling, etc. Remove variables that are not needed/useful for the analysis. Describe the final dataset that is used for classification/regression (include a description of any newly formed variables you created).

[15 points] Divide you data into training and testing data using an 80% training and 20% testing split. Use the cross validation modules that are part of scikit-learn. Argue for or against splitting your data using an 80/20 split. That is, why is the 80/20 split appropriate (or not) for your dataset?  

### Modeling (50 points total)

[20 points] Create a custom, one-versus-all logistic regression classifier using numpy and scipy to optimize. Use object oriented conventions identical to scikit-learn. You should start with the template used in the course. You should add the following functionality to the logistic regression classifier:
Ability to choose optimization technique when class is instantiated: either steepest descent, stochastic gradient descent, or Newton's method. Update the gradient calculation to include a customizable regularization term (either using no regularization, L1 regularization, L2 regularization, or both L1/L2 norm of the weights). Associate a cost with the regularization term, "C", that can be adjusted when the class is instantiated.  

[15 points] Train your classifier to achieve good generalization performance. That is, adjust the optimization technique and the value of the regularization term "C" to achieve the best performance on your test set. Is your method of selecting parameters justified? That is, do you think there is any "data snooping" involved with this method of selecting parameters?

[15 points] Compare the performance of your "best" logistic regression optimization procedure to the procedure used in scikit-learn. Visualize the performance differences in terms of training time, training iterations, and memory usage while training. Discuss the results. 

### Deployment (10 points total)

Which implementation of logistic regression would you advise be used in a deployed machine learning model, your implementation or scikit-learn (or other third party)? Why?

### Exceptional Work (10 points total)

You have free reign to provide additional analyses.
One idea: Make your implementation of logistic regression compatible with the GridSearchCV function that is part of scikit-learn.

In [9]:
import numpy as np
import pandas as p
from scipy.optimize import fmin_bfgs
from scipy.special import expit
from sklearn.model_selection import GridSearchCV
from sklearn.utils.estimator_checks import check_estimator
from sklearn.utils.validation import check_X_y, check_array, check_is_fitted
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score

In [2]:
class BinaryClassifierBase:
    def __init__(self, eta, iters=20, cost=0.001, norm=2):
        self.eta = eta
        self.cost = cost
        self.iters = iters
        self.norm = norm
    
    def normalize(self, w, gradient):
        # regularization (does both if 3)
        if self.norm & 1: # L1 norm
            gradient[1:] += -1 * w[1:] * self.cost
        elif self.norm & 2: # L2 norm
            gradient[1:] += -2 * w[1:] * self.cost
    
    def fit(self, x, y):
        self.w_ = np.zeros((x.shape[1],1))
        for _ in range(self.iters):
            gradient = self._get_gradient(x,y)
            self.normalize(self.w_, gradient)
            self.w_ += gradient*self.eta
    
    def predict_proba(self,x):
        return 1/(1+np.exp(-(x @ self.w_)))
    
    def predict(self, x):
        return (self.predict_proba(x)>0.5)
    
    
class BinaryStochDescClassifier(BinaryClassifierBase):
    def _get_gradient(self, x, y):
        idx = int(np.random.rand()*len(y)) # grab random instance
        ydiff = y[idx]-self.predict_proba(x[idx]) # get y difference (now scalar)
        gradient = x[idx] * ydiff[:,np.newaxis] # make ydiff a column vector and multiply through
        gradient = gradient.reshape(self.w_.shape)
        return gradient
    
    
class BinarySteepDescClassifier(BinaryClassifierBase):
    def _get_gradient(self, x, y):
        ydiff = y-self.predict_proba(x).ravel()
        gradient = np.mean(x * ydiff[:,np.newaxis], axis=0)
        return gradient.reshape(self.w_.shape)

    
class BinaryNewtonClassifier(BinaryClassifierBase):
    def fit(self, x, y):
        def obj_fn(w, x, y, c):
            g = expit(x @ w)
            return -np.sum(np.log(g[y==1]))-np.sum(np.log(1-g[y==0])) + c*sum(w**2)
        
        def obj_grad(w, x, y, c):
            g = expit(x @ w)
            ydiff = y-g
            gradient = np.mean(x * ydiff[:,np.newaxis], axis=0)
            gradient = gradient.reshape(w.shape)
            self.normalize(w, gradient)
            return -gradient
        
        self.w_ = fmin_bfgs(obj_fn, 
                            np.zeros((x.shape[1], 1)), 
                            fprime=obj_grad, 
                            args=(x, y, self.cost), 
                            gtol=1e-03, 
                            maxiter=self.iters,
                            disp=False).reshape((x.shape[1], 1))

        
class LogRegClassifier:
    def __init__(self, eta=.0001, iters=20, optimize='steepdesc', cost=0.001, norm=2):
        self._set_optimization(optimize)
        self.eta = eta
        self.iters = iters
        self.cost = cost
        self.norm = norm
        
        self.classifiers = [] # fill with binary classifiers during fit
        self._estimator_type = 'classifier'
    
    def _add_bias(self, x):
        return np.hstack((np.ones((x.shape[0],1)),x))
    
    def _set_optimization(self, optimize):
        typesofoptimize = {
            'steepdesc': BinarySteepDescClassifier, 
            'stochdesc': BinaryStochDescClassifier, 
            'newton': BinaryNewtonClassifier
        }
        if optimize not in typesofoptimize.keys():
            raise ValueError('optimize must be one of: ' + ' '.join(typesofoptimize.keys()))
            
        self.optimize = optimize
        self.classifier = typesofoptimize[optimize]
        
    def fit(self, x, y):
        x, y = check_X_y(x, y)
        
        Xb = self._add_bias(x)
        self.classes_ = np.unique(y)
        self.X_ = x
        self.y_ = y
        
        for cl in self.classes_:
            cur_y = y==cl
            cur_classifier = self.classifier(self.eta, self.iters, cost=self.cost, norm=self.norm)
            cur_classifier.fit(x, cur_y)
            self.classifiers.append(cur_classifier)
        return self
    
    def predict(self, x):
        check_is_fitted(self, ['X_', 'y_'])
        x = check_array(x)
        
        probabilities = []
        for classifier in self.classifiers:
            probabilities.append(classifier.predict_proba(x))
        probabilities = np.hstack(probabilities)
        return np.argmax(probabilities,axis=1)
    
    def score(self, x, y):
        res = self.predict(x)
        return accuracy_score(y, res)
    
    def set_params(self, **parameters):
        for parameter, value in parameters.items():
            setattr(self, parameter, value)
        return self
            
    def get_params(self, deep=True):
        return {
            'eta': self.eta,
            'iters': self.iters,
            'optimize': self.optimize,
            'cost': self.cost,
            'norm': self.norm
        }
    
check_estimator(LogRegClassifier)

In [3]:

ds = load_iris()
kwargs = {
    'norm': 2,
    'cost': .001,
    'iters': 500
}
regrs = {
    'newton': LogRegClassifier(eta=.1, optimize='newton', **kwargs),
    'stoch': LogRegClassifier(eta=.1, optimize='stochdesc', **kwargs), # Different results every time, random
    'steep': LogRegClassifier(eta=.1, optimize='steepdesc', **kwargs)
}
for regr in regrs.values():
    regr.fit(ds.data, ds.target)

for key, val in regrs.items():
    res = val.predict(ds.data)
    print(key + ': ' +str(accuracy_score(ds.target, res)))


stoch: 0.693333333333
steep: 0.953333333333
newton: 0.966666666667


In [4]:
methods = ['newton', 'stochdesc', 'steepdesc']
etas = [.0001, .001, .01, .1, .25, .5]
norms = [0, 1, 2, 3]
costs = [.0001, .001, .01, .1, .25, .5]
optimize_results = {}
method_res = {method: list() for method in methods}
eta_res = {str(eta): list() for eta in etas}
norm_res = {str(norm): list() for norm in norms}
cost_res = {str(cost): list() for cost in costs}

for method in methods:
    for eta in etas:
        for norm in norms:
            for cost in costs:
                regr = LogRegClassifier(eta, optimize=method, cost=cost, norm=norm, iters=500)
                regr.fit(ds.data, ds.target)
                res = regr.predict(ds.data)
                acc = accuracy_score(ds.target, res)
                method_res[method].append(acc)
                eta_res[str(eta)].append(acc)
                cost_res[str(cost)].append(acc)
                norm_res[str(norm)].append(acc)
                s = "method: "+method+", eta: "+str(eta)+", norm: "+str(norm)+", cost: "+str(cost)+" ----> "+str(acc)+"%"
                print(s)

method: newton, eta: 0.0001, norm: 0, cost: 0.0001 ----> 0.96%
method: newton, eta: 0.0001, norm: 0, cost: 0.001 ----> 0.966666666667%
method: newton, eta: 0.0001, norm: 0, cost: 0.01 ----> 0.973333333333%
method: newton, eta: 0.0001, norm: 0, cost: 0.1 ----> 0.966666666667%
method: newton, eta: 0.0001, norm: 0, cost: 0.25 ----> 0.966666666667%
method: newton, eta: 0.0001, norm: 0, cost: 0.5 ----> 0.953333333333%
method: newton, eta: 0.0001, norm: 1, cost: 0.0001 ----> 0.96%
method: newton, eta: 0.0001, norm: 1, cost: 0.001 ----> 0.966666666667%
method: newton, eta: 0.0001, norm: 1, cost: 0.01 ----> 0.953333333333%
method: newton, eta: 0.0001, norm: 1, cost: 0.1 ----> 0.873333333333%
method: newton, eta: 0.0001, norm: 1, cost: 0.25 ----> 0.786666666667%
method: newton, eta: 0.0001, norm: 1, cost: 0.5 ----> 0.666666666667%
method: newton, eta: 0.0001, norm: 2, cost: 0.0001 ----> 0.973333333333%
method: newton, eta: 0.0001, norm: 2, cost: 0.001 ----> 0.966666666667%
method: newton, eta: 

In [6]:
print("Average method accuracy:")
for method, vals in method_res.items():
    print('    '+method+': '+str(np.mean(vals)))
    
print("Average eta accuracy:")
for eta, vals in eta_res.items():
    print('    '+str(eta)+': '+str(np.mean(vals)))
    
print("Average cost accuracy:")
for cost, vals in cost_res.items():
    print('    '+str(cost)+': '+str(np.mean(vals)))
    
print("Average norm accuracy:")
for norm, vals in norm_res.items():
    print('    '+str(norm)+': '+str(np.mean(vals)))

Average method accuracy:
    steepdesc: 0.689351851852
    newton: 0.885
    stochdesc: 0.610740740741
Average eta accuracy:
    0.0001: 0.5175
    0.1: 0.815
    0.01: 0.791851851852
    0.5: 0.722962962963
    0.001: 0.753888888889
    0.25: 0.768981481481
Average cost accuracy:
    0.0001: 0.791481481481
    0.1: 0.730462962963
    0.01: 0.772962962963
    0.5: 0.623981481481
    0.001: 0.781388888889
    0.25: 0.669907407407
Average norm accuracy:
    3: 0.713395061728
    0: 0.784938271605
    1: 0.716790123457
    2: 0.698333333333


In [8]:
params = {
    "optimize": ['newton', 'stochdesc', 'steepdesc'],
    "eta": [.0001, .001, .01, .1, .25, .5],
    "norm": [0, 1, 2, 3],
    "cost": [.0001, .001, .01, .1, .25, .5]
}
cv = GridSearchCV(LogRegClassifier(), params, n_jobs=-1)
cv.fit(ds.data, ds.target)
print('Best score: '+str(cv.best_score_))
print(cv.best_params_)

Best score: 0.68
{'optimize': 'newton', 'eta': 0.1, 'norm': 0, 'cost': 0.0001}
