# Stochastic Gradient Descent
You should build an end-to-end machine learning pipeline using a stochastic gradient descent model. In particular, you should do the following:
- Load the `mnist` dataset using [Pandas](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html). You can find this dataset in the datasets folder.
- Split the dataset into training and test sets using [Scikit-Learn](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html).
- Build an end-to-end machine learning pipeline, including a [stochastic gradient descent](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDClassifier.html) model.
- Optimize your pipeline by validating your design decisions.
- Test the best pipeline on the test set and report various [evaluation metrics](https://scikit-learn.org/0.15/modules/model_evaluation.html).  
- Check the documentation to identify the most important hyperparameters, attributes, and methods of the model. Use them in practice.

Importing Libraries

In [57]:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split, RandomizedSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

Reading the Dataset

In [58]:
df = pd.read_csv('https://raw.githubusercontent.com/m-mahdavi/teaching/refs/heads/main/datasets/mnist.csv')

Splitting the Dataset

In [59]:
df_train, df_test = train_test_split(df, test_size=0.2, random_state=42)
df_train.shape,df_test.shape

((3200, 786), (800, 786))
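`train_test_split` draws a purely random split. Passing `stratify` keeps each digit's share nearly identical in the training and test sets, which makes per-class test metrics easier to compare. The following is a sketch on scikit-learn's `load_digits`. That dataset is an assumption standing in for `mnist.csv`, and its `target` column is renamed to `class`:

```python
import pandas as pd
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

# Stand-in for mnist.csv: pixel columns plus a 'class' label column.
df = load_digits(as_frame=True).frame.rename(columns={"target": "class"})

# Stratify on the label so every digit keeps its share in both splits.
df_train, df_test = train_test_split(
    df, test_size=0.2, random_state=42, stratify=df["class"]
)

# Side-by-side class proportions of the two splits.
shares = pd.DataFrame({
    "train": df_train["class"].value_counts(normalize=True),
    "test": df_test["class"].value_counts(normalize=True),
}).sort_index()
print(shares.round(3))
max_gap = (shares["train"] - shares["test"]).abs().max()
```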

EDA

In [60]:
df_train.head()

,id,class,pixel1,pixel2,pixel3,pixel4,pixel5,pixel6,pixel7,pixel8,...,pixel775,pixel776,pixel777,pixel778,pixel779,pixel780,pixel781,pixel782,pixel783,pixel784
3994,13260,1,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
423,10953,3,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2991,37374,3,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1221,31597,1,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
506,69405,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [61]:
df_train.isnull().sum()

,0
id,0
class,0
pixel1,0
pixel2,0
pixel3,0
...,...
pixel780,0
pixel781,0
pixel782,0
pixel783,0


In [62]:
df_train.dtypes.value_counts()

,count
int64,786
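Two more checks besides missing values and dtypes are useful before scaling. The first is the label distribution. The second is the number of pixel columns that never change: digits are centered, so border pixels are typically always 0, and such columns carry no information. The sketch below uses `load_digits` as a stand-in. Its pixels range from 0 to 16 rather than the 0 to 255 common for MNIST:

```python
from sklearn.datasets import load_digits

digits = load_digits(as_frame=True)
X = digits.data      # pixel features as a DataFrame
y = digits.target    # digit labels as a Series

# Label distribution: is any digit much rarer than the others?
print(y.value_counts().sort_index())

# Overall pixel intensity range.
print(X.to_numpy().min(), X.to_numpy().max())

# Columns with a single unique value are constant and uninformative.
constant_cols = X.columns[X.nunique() == 1].tolist()
print(len(constant_cols), "constant pixel columns:", constant_cols)
```

`StandardScaler` leaves zero-variance columns at 0 rather than dividing by zero, so constant pixels do not break scaling. They only add dimensions the model cannot use.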


Data Preprocessing

In [63]:
x_train = df_train.drop(['id', 'class'], axis=1)  # features: pixel columns only
y_train = df_train['class']                       # target: digit label
x_test = df_test.drop(['id', 'class'], axis=1)
y_test = df_test['class']

In [64]:
print(x_train.shape, y_train.shape)
print(x_test.shape, y_test.shape)

(3200, 784) (3200,)
(800, 784) (800,)


Feature Scaling

In [65]:
# Fit the scaler on the training set only, then apply it to both splits.
scaler = StandardScaler()
x_train = scaler.fit_transform(x_train)
x_test = scaler.transform(x_test)
x_train.shape, x_test.shape

((3200, 784), (800, 784))
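Two details matter at this step. First, the label column must never be among the features. Second, when the scaler is fit once on the full training set and cross-validation runs afterwards, each validation fold has already influenced the scaling statistics. The sketch below handles both. It again assumes `load_digits` as a stand-in for the MNIST frame and places the scaler in a `Pipeline`, so `cross_val_score` refits it on every training fold:

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in frame: 64 pixel columns plus a label column renamed to 'class'.
df = load_digits(as_frame=True).frame.rename(columns={"target": "class"})
X = df.drop(columns=["class"])
y = df["class"]
assert "class" not in X.columns  # the label must never be a feature

# Each CV split fits the scaler on its training folds only.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("sgd", SGDClassifier(random_state=42)),
])
scores = cross_val_score(pipe, X, y, cv=5, scoring="accuracy")
print(scores.round(3), round(scores.mean(), 3))
```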

Model Training

In [None]:
sd_param = {
    'loss': ['hinge', 'log_loss', 'modified_huber'],
    'penalty': ['l2', 'l1', 'elasticnet'],
    'alpha': [1e-6, 1e-5, 1e-4, 1e-3, 1e-2],              # regularization strength
    'learning_rate': ['constant', 'optimal', 'invscaling', 'adaptive'],
    'eta0': [1e-4, 1e-3, 1e-2, 0.05, 0.1],                # initial rate; ignored by 'optimal'
    'power_t': [0.1, 0.25, 0.5, 0.75, 0.9],               # used only by 'invscaling'
    'max_iter': [500, 1000, 1500, 2000],
    'tol': [1e-4, 1e-3, 1e-2],
    'early_stopping': [True, False],                      # True holds out validation_fraction of the data
    'fit_intercept': [True, False]
}

# random_state on the estimator too, so each sampled candidate is fit reproducibly.
sd_search = RandomizedSearchCV(
    SGDClassifier(random_state=42), sd_param,
    n_iter=50, scoring='accuracy', n_jobs=-1, cv=5, verbose=1, random_state=42
)
sd_search.fit(x_train, y_train)

Fitting 5 folds for each of 50 candidates, totalling 250 fits


Model Evaluation
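The notebook imports `precision_score`, `recall_score` and `f1_score`. With ten digit classes, these need an explicit `average`, because the default `average='binary'` raises a `ValueError` on multiclass targets. Because `RandomizedSearchCV` refits the best candidate on the full training set by default, `sd_search.predict(x_test)` (or `sd_search.best_estimator_.predict(x_test)`) gives the test predictions. The sketch below shows the metric calls on the `load_digits` stand-in, with a default-parameter pipeline taking the place of the tuned model:

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix, f1_score, precision_score,
                             recall_score)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
model = make_pipeline(StandardScaler(), SGDClassifier(random_state=42))
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# 'macro' averages per-class scores, weighting all ten digits equally.
metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred, average="macro"),
    "recall": recall_score(y_test, y_pred, average="macro"),
    "f1": f1_score(y_test, y_pred, average="macro"),
}
for name, value in metrics.items():
    print(f"{name:>9}: {value:.3f}")

# Per-class breakdown and the confusion matrix (rows: true, columns: predicted).
print(classification_report(y_test, y_pred))
cm = confusion_matrix(y_test, y_pred)
print(cm)
```

The confusion matrix shows which digit pairs the linear model mixes up, which a single accuracy number hides.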