# Stochastic Gradient Descent
Load the `mnist` dataset and split it into training and test sets. Train and test a stochastic gradient descent (SGD) classifier using scikit-learn. Check the documentation to identify the model's most important hyperparameters, attributes, and methods, and use them in practice.

## Import Modules

In [7]:
import pandas as pd
import sklearn.metrics
import sklearn.linear_model
import sklearn.model_selection
import sklearn.preprocessing  # needed for StandardScaler in the scaling step
import plotly.express as px

## Loading the Dataset

In [8]:
df = pd.read_csv("../../datasets/mnist.csv")
df = df.set_index("id")
df.head(3)

id,class,pixel1,pixel2,pixel3,pixel4,pixel5,pixel6,pixel7,pixel8,pixel9,...,pixel775,pixel776,pixel777,pixel778,pixel779,pixel780,pixel781,pixel782,pixel783,pixel784
31953,5,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
34452,8,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
60897,5,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


## Splitting Data into Training and Test Sets

In [9]:
x = df.drop(["class"], axis=1)
y = df["class"]
x_train, x_test, y_train, y_test = sklearn.model_selection.train_test_split(x, y)
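
The split above uses the defaults of `train_test_split`, so it changes on every run and does not explicitly balance the classes. A minimal sketch of a reproducible, stratified split follows. The synthetic arrays `x_demo` and `y_demo`, and the values chosen for `random_state` and `test_size`, are illustrative assumptions, not part of the original notebook.

```python
import numpy as np
import sklearn.model_selection

# Synthetic stand-in data (assumption): 400 samples, 16 features, 10 balanced classes.
rng = np.random.default_rng(0)
x_demo = rng.integers(0, 256, size=(400, 16))
y_demo = np.repeat(np.arange(10), 40)

# random_state fixes the shuffle; stratify keeps class proportions equal in both parts.
x_tr, x_te, y_tr, y_te = sklearn.model_selection.train_test_split(
    x_demo, y_demo, test_size=0.25, random_state=42, stratify=y_demo
)

# Per-class counts in the test part.
test_counts = np.bincount(y_te)
print(test_counts)
```

With the notebook's data, the same two arguments could be passed as `random_state=...` and `stratify=y`.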

## Scaling the Features

In [10]:
scaler = sklearn.preprocessing.StandardScaler(with_mean=False)
scaler.fit(x_train)

x_train = scaler.transform(x_train)
x_test = scaler.transform(x_test)
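
The scaler above is fitted once on the whole training set, so during the later cross-validation each validation fold has already influenced the scaling statistics. One way to avoid this is to put the scaler and the classifier into a single pipeline, so the scaler is refitted inside every fold. The sketch below assumes scikit-learn's small built-in digits dataset as a stand-in for `mnist.csv`; the variable names are hypothetical.

```python
import sklearn.datasets
import sklearn.linear_model
import sklearn.model_selection
import sklearn.pipeline
import sklearn.preprocessing

# Stand-in data (assumption): 8x8 digits from scikit-learn, not the notebook's mnist.csv.
x_demo, y_demo = sklearn.datasets.load_digits(return_X_y=True)
x_tr, x_te, y_tr, y_te = sklearn.model_selection.train_test_split(
    x_demo, y_demo, random_state=0, stratify=y_demo
)

# Scaler and classifier travel together, so each CV fold fits its own scaler.
pipeline = sklearn.pipeline.make_pipeline(
    sklearn.preprocessing.StandardScaler(with_mean=False),
    sklearn.linear_model.SGDClassifier(random_state=0),
)

cv_scores = sklearn.model_selection.cross_val_score(
    pipeline, x_tr, y_tr, cv=5, scoring="accuracy"
)
print(cv_scores.mean())
```

When such a pipeline is passed to a search object, hyperparameter names need the step prefix, for example `sgdclassifier__alpha`.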

## Model Selection and Hyperparameter Tuning 

In [11]:
parameters_grid = {
    "loss": ["squared_error", "hinge"],  # "squared_loss" was renamed to "squared_error" in scikit-learn 1.0
    "penalty": ["l1", "l2"],
    "alpha": [0.1, 0.001, 0.0001, 0.00001, 0.00005],
    "max_iter": [100, 1000, 3000],
    "tol": [None, 0.0001, 0.001, 0.01],
    "shuffle": [True, False],
    "learning_rate": ["constant", "optimal"],
    "eta0": [0.001, 0.01, 0.1],
    "early_stopping": [True, False],
    "n_iter_no_change": [1, 5, 10]
}
model_1 = sklearn.model_selection.RandomizedSearchCV(sklearn.linear_model.SGDClassifier(), 
                                                     parameters_grid, n_iter=100, scoring="accuracy", cv=5, n_jobs=-1)
model_1.fit(x_train, y_train)
print("Accuracy of best SGD classifier = {:.2f}".format(model_1.best_score_))
print("Best found hyperparameters of SGD classifier = {}".format(model_1.best_params_))

Accuracy of best SGD classifier = 0.88
Best found hyperparameters of SGD classifier = {'tol': None, 'shuffle': False, 'penalty': 'l2', 'n_iter_no_change': 5, 'max_iter': 3000, 'loss': 'hinge', 'learning_rate': 'optimal', 'eta0': 0.001, 'early_stopping': False, 'alpha': 0.1}
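
The task also asks for the model's attributes and methods. A minimal sketch of the ones that are usually inspected on a fitted `SGDClassifier` follows. It assumes the built-in digits dataset as stand-in data, and it borrows only `loss`, `penalty`, and `alpha` from the search result above while leaving `max_iter` and `tol` at their defaults to keep the run short.

```python
import numpy as np
import sklearn.datasets
import sklearn.linear_model
import sklearn.preprocessing

# Stand-in data (assumption): 8x8 digits from scikit-learn, not the notebook's mnist.csv.
x_demo, y_demo = sklearn.datasets.load_digits(return_X_y=True)
x_demo = sklearn.preprocessing.StandardScaler(with_mean=False).fit_transform(x_demo)

clf = sklearn.linear_model.SGDClassifier(
    loss="hinge", penalty="l2", alpha=0.1, random_state=0
)
clf.fit(x_demo, y_demo)

# Fitted attributes: one linear model per class (one-vs-rest).
print(clf.classes_)          # class labels seen during fit
print(clf.coef_.shape)       # (n_classes, n_features)
print(clf.intercept_.shape)  # (n_classes,)
print(clf.n_iter_)           # epochs actually run before stopping

# Methods: decision_function returns one score per class;
# predict picks the class with the highest score.
scores = clf.decision_function(x_demo[:3])
predicted = clf.classes_[np.argmax(scores, axis=1)]
same_as_predict = np.array_equal(predicted, clf.predict(x_demo[:3]))
print(same_as_predict)
```

In the notebook itself, the refitted classifier should be reachable as `model_1.best_estimator_`, because `RandomizedSearchCV` refits the best configuration on the full training set by default.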




## Testing the Trained Model

In [6]:
y_predicted = model_1.predict(x_test)
accuracy = sklearn.metrics.accuracy_score(y_test, y_predicted)
cm = sklearn.metrics.confusion_matrix(y_test, y_predicted)
precision, recall, f1, support = sklearn.metrics.precision_recall_fscore_support(y_test, y_predicted)

print("Accuracy =", accuracy)
print("Precision =", precision)
print("Recall =", recall)
print("F1-Score =", f1)
print("Confusion Matrix:\n", cm)

Accuracy = 0.87
Precision = [0.95192308 0.95081967 0.85555556 0.89690722 0.93617021 0.72151899
 0.88043478 0.87850467 0.77165354 0.82954545]
Recall = [0.94285714 0.93548387 0.83695652 0.87878788 0.88       0.75
 0.9        0.88679245 0.83760684 0.8021978 ]
F1-Score = [0.94736842 0.94308943 0.84615385 0.8877551  0.90721649 0.73548387
 0.89010989 0.88262911 0.80327869 0.81564246]
Confusion Matrix:
 [[ 99   0   0   0   0   3   1   0   2   0]
 [  0 116   0   0   1   3   1   0   3   0]
 [  0   0  77   2   2   2   2   3   3   1]
 [  2   2   3  87   0   2   1   0   0   2]
 [  0   0   1   0  88   0   2   2   2   5]
 [  1   0   0   2   0  57   3   0  12   1]
 [  0   0   2   0   1   3  81   0   3   0]
 [  1   2   3   0   0   0   0  94   0   6]
 [  0   2   3   4   0   8   1   1  98   0]
 [  1   0   1   2   2   1   0   7   4  73]]
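
The per-class scores printed above can be recovered directly from the confusion matrix, which makes it easier to see what each metric measures. The sketch below copies the matrix from the output above and relies on scikit-learn's convention that rows are true classes and columns are predicted classes. The variable names are illustrative.

```python
import numpy as np

# Confusion matrix copied from the output above (rows: true class, columns: predicted class).
cm = np.array([
    [99, 0, 0, 0, 0, 3, 1, 0, 2, 0],
    [0, 116, 0, 0, 1, 3, 1, 0, 3, 0],
    [0, 0, 77, 2, 2, 2, 2, 3, 3, 1],
    [2, 2, 3, 87, 0, 2, 1, 0, 0, 2],
    [0, 0, 1, 0, 88, 0, 2, 2, 2, 5],
    [1, 0, 0, 2, 0, 57, 3, 0, 12, 1],
    [0, 0, 2, 0, 1, 3, 81, 0, 3, 0],
    [1, 2, 3, 0, 0, 0, 0, 94, 0, 6],
    [0, 2, 3, 4, 0, 8, 1, 1, 98, 0],
    [1, 0, 1, 2, 2, 1, 0, 7, 4, 73],
])

true_positives = np.diag(cm)

# Precision: correct predictions of a class divided by all predictions of that class (column sum).
precision_from_cm = true_positives / cm.sum(axis=0)
# Recall: correct predictions of a class divided by all true members of that class (row sum).
recall_from_cm = true_positives / cm.sum(axis=1)
# F1: harmonic mean of precision and recall.
f1_from_cm = 2 * precision_from_cm * recall_from_cm / (precision_from_cm + recall_from_cm)
# Accuracy: all correct predictions divided by all samples.
accuracy_from_cm = true_positives.sum() / cm.sum()

print(accuracy_from_cm)
print(precision_from_cm)
print(recall_from_cm)
print(f1_from_cm)
```

Read this way, the weakest class is the digit 5: twelve of its 76 test samples are predicted as 8, which accounts for most of its missed samples.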


