# Stochastic Gradient Descent Classifier
Classifing student success data by means of the [SGDClassifier](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDClassifier.html#sklearn.linear_model.SGDClassifier) from the sklearn module.

## Import Data
Import the data and create the response vector (r *x* 1) and design matrix (r *x* c). Create a scaled and normalized design matrix for comparison of accuracy to the original design matrix.

In [1]:
import time
import random
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import SGDClassifier
from sklearn.utils.extmath import cartesian
from sklearn import metrics
from sklearn import preprocessing

df = pd.read_csv('student-por2.csv')
df = pd.get_dummies(df)#, drop_first=True)

def response_conv(arr):
    new = []
    for i in arr:
        if (i > 0 and i < 10):            # condition where student failed
            new.append(0)                 
                                          
        elif (i >= 10):                   # condition where student passed
            new.append(1)                 
    
        else:                             # condition where student received an incomplete
            new.append(2)
    return(new)                           # 1-dimensional response varibale returned

X = df.drop('G3',1)                       # this is the design matrix
y = list(df.G3)                           # this is the discrete response vector
y_new = response_conv(y)                  # this is the multinomial response vector

clf = SGDClassifier()
clf.fit(X,y)

model = SelectFromModel(clf,prefit=True)
newX = model.transform(X)                 # design matrix with most influential predictors only

X_scale = preprocessing.scale(newX)
X_norm = preprocessing.normalize(newX)

random.seed(42)
X1_train, X1_test, y1_train, y1_test = train_test_split(X, y_new, test_size=0.33, random_state=42)
X2_train, X2_test, y2_train, y2_test = train_test_split(X_scale, y_new, test_size=0.33, random_state=42)
X3_train, X3_test, y3_train, y3_test = train_test_split(X_norm, y_new, test_size=0.33, random_state=42)



## Niave Accuracy
Before we start training and selecting parametrs for our model, we must find the distribution of the classes amongst the response variable. Depnding on which class is the dominate class, our model should preform better than just guessing the dominate class for each observation. For example, if the dominate class is 1 and 1's comprise of 83% of the response data, then our model should have higher than 83% accuracy. 

In [2]:
zero = 0
one = 0
two = 0

for i in y1_test:
    if i == 0:
        zero += 1
    elif i == 1:
        one += 1
    else:
        two += 1

num1 = round((zero/len(y1_test))*100,2)
num2 = round((one/len(y1_test))*100,2)
num3 = round((two/len(y1_test))*100,2)
print("The testing response vector has the following distribution: \nzeros: %r zeros comprising of %r percent of the response data. \nones: %r ones comprising of %r percent of the response data. \ntwos: %r twos comprising of %r percent of the response data." % (zero,num1,one,num2,two,num3))
print("\n")

The testing response vector has the following distribution: 
zeros: 23 zeros comprising of 10.7 percent of the response data. 
ones: 187 ones comprising of 86.98 percent of the response data. 
twos: 5 twos comprising of 2.33 percent of the response data.




## Optimal Loss Function and Design Matrix Split
Next we find the optimal loss function by 10-fold cross validation finding the maximum accuracy obtained by each option of loss function. Retrun the optimal method for loss function.

In [3]:
start_time = time.time()
combos = ['hinge','log','modified_huber','squared_hinge','perceptron','squared_loss','huber','epsilon_insensitive','squared_epsilon_insensitive']
    
def opt(X,y):
    acc = []
    for l in combos:
        sgd = SGDClassifier(loss=str(l),random_state=42)
        scores = cross_val_score(sgd, X, y, cv=10, scoring='accuracy')
        acc.append(scores.mean())
    

    opt_ = combos[acc.index(max(acc))]
    return(opt_)

l1 = opt(X1_train,y1_train)
l2 = opt(X2_train,y2_train)
l3 = opt(X3_train,y3_train)

print ("The optimal loss function for Non-standardized SGD is %s." % (l1))
print ("The optimal loss function for Standardized SGD is %s." % (l2))
print ("The optimal loss function for Normalized SGD is %s." % (l3))
print("Run time: %r minutes" % (round((int(time.time() - start_time)/60),2)))

The optimal loss function for Non-standardized SGD is perceptron.
The optimal loss function for Standardized SGD is log.
The optimal loss function for Normalized SGD is log.
Run time: 0.0 minutes


## Fit and Predict
Fit the SGDClassifier model to each design matrix and create a dataframe comparing each models predictions to the actual value of the test set.

In [4]:
sgd1 = SGDClassifier(loss=l1,random_state=42).fit(X1_train,y1_train)
sgd2 = SGDClassifier(loss=l2,random_state=42).fit(X2_train,y2_train)
sgd3 = SGDClassifier(loss=l3,random_state=42).fit(X3_train,y3_train)


sgd_pred1 = sgd1.predict(X1_test)
sgd_pred2 = sgd2.predict(X2_test)
sgd_pred3 = sgd3.predict(X3_test)

pred = pd.DataFrame(list(zip(y1_test, sgd_pred1,sgd_pred2,sgd_pred3)), columns=['y_act','y_sgd','y_sgd_stand','y_sgd_norm'])
pred.index.name = 'Obs'
pred


Unnamed: 0_level_0,y_act,y_sgd,y_sgd_stand,y_sgd_norm
Obs,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
0,1,1,1,1
1,1,1,1,1
2,1,1,1,1
3,1,1,1,1
4,1,1,1,1
5,1,1,1,1
6,1,1,1,1
7,0,0,1,0
8,1,0,1,1
9,1,1,1,1


## Results
Returns model accuracay, confusion matrix and classifiaction report for each respective model. The standardized design matrix yields the most powerful predictive model for SGD. This will be the model we select for our final model. **Note** Accuracy may vary as a random generator was used to shuffle the data.

In [5]:
cm_sgd1 = pd.DataFrame(metrics.confusion_matrix(y1_test, sgd_pred1), index = ['Fail(0)','Pass(1)','Inc(2)'],columns=['Fail(0)','Pass(1)','Inc(2)'])
cm_sgd2 = pd.DataFrame(metrics.confusion_matrix(y2_test, sgd_pred2), index = ['Fail(0)','Pass(1)','Inc(2)'],columns=['Fail(0)','Pass(1)','Inc(2)'])
cm_sgd3 = pd.DataFrame(metrics.confusion_matrix(y3_test, sgd_pred3), index = ['Fail(0)','Pass(1)','Inc(2)'],columns=['Fail(0)','Pass(1)','Inc(2)'])


print ("The accuracy of the Non-standarized SGD model is: ", sgd1.score(X1_test,y1_test))
print("\n")
print ("The accuracy of the Standardized SGD model is: ", sgd2.score(X2_test,y2_test))
print("\n")
print ("The accuracy of the Normalized SGD model is: ", sgd3.score(X3_test,y3_test))
print("\n")

print("Non-standarized SGD Confusion Matrix: \n", cm_sgd1)
print("\n")
print("Standarized SGD Confusion Matrix: \n", cm_sgd2)
print("\n")
print("Normalized SGD Confusion Matrix: \n", cm_sgd3)
print("\n")

print("Classification report for Non-standardized design matrix:\n", metrics.classification_report(y1_test,sgd_pred1))
print("\n")
print("Classification report for standardized design matrix:\n", metrics.classification_report(y2_test,sgd_pred2))
print("\n")
print("Classification report for Normalized design matrix:\n", metrics.classification_report(y3_test,sgd_pred3))

The accuracy of the Non-standarized SGD model is:  0.827906976744


The accuracy of the Standardized SGD model is:  0.906976744186


The accuracy of the Normalized SGD model is:  0.883720930233


Non-standarized SGD Confusion Matrix: 
          Fail(0)  Pass(1)  Inc(2)
Fail(0)       19        4       0
Pass(1)       29      157       1
Inc(2)         2        1       2


Standarized SGD Confusion Matrix: 
          Fail(0)  Pass(1)  Inc(2)
Fail(0)       13       10       0
Pass(1)        7      180       0
Inc(2)         2        1       2


Normalized SGD Confusion Matrix: 
          Fail(0)  Pass(1)  Inc(2)
Fail(0)       16        7       0
Pass(1)       13      174       0
Inc(2)         4        1       0


Classification report for Non-standardized design matrix:
              precision    recall  f1-score   support

          0       0.38      0.83      0.52        23
          1       0.97      0.84      0.90       187
          2       0.67      0.40      0.50         5

avg / 

  'precision', 'predicted', average, warn_for)
