## **Classsification using Ridge Classifier**

* Ridge Classifier casts the problem as the **least-squares classification** & finds the optimal weight using some matrix decompostion technique such as **Singular-Value Decompostion (SVD)**.

* To train the ridge classifier, the labels should be $y ∈ {+1 ,-1}$.

* The classifer also by default implements **L2 regularization**. However, we first implement it without regularization by setting `alpha = 0`

#### **Importing new libraries**

In [1]:
# Common imports
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib as mpl

from scipy.stats import loguniform
from sklearn.metrics import roc_curve, roc_auc_score
from sklearn.metrics import precision_recall_curve
from sklearn.metrics import precision_score, recall_score, classification_report
from sklearn.metrics import ConfusionMatrixDisplay
from sklearn.metrics import log_loss

from sklearn.model_selection import cross_validate, RandomizedSearchCV, cross_val_predict
from sklearn.linear_model import SGDClassifier, RidgeClassifier, LogisticRegression
from sklearn.dummy import DummyClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

from pprint import pprint

# to make this notebook's output stable across runs
np.random.seed(42)

# To plot pretty figures
%matplotlib inline
sns.set()

# global settings
mpl.rc('axes', labelsize=14)
mpl.rc('xtick', labelsize=12)
mpl.rc('ytick', labelsize=12)
mpl.rc('figure', figsize=(8, 6))

import warnings 
warnings.filterwarnings('ignore')

#### **Getting Data**

In [2]:
from sklearn.datasets import fetch_openml
X, y = fetch_openml('mnist_784', version=1, return_X_y=True)

#### **Data Preprocessing and Splitting**

In [3]:
X = X.to_numpy()
y = y.to_numpy()

In [4]:
scaler = MinMaxScaler()
X = scaler.fit_transform(X)

In [5]:
X_train, X_test, y_train, y_test = X[:60000], X[60000:], y[:60000], y[60000:]

In [6]:
# initialize new variables names with all -1.
y_train_0 = np.ones((len(y_train)))
y_test_0 = np.ones((len(y_test)))

# find indices of digit 0 image
# remember original labels are of type str not int
indx_0 = np.where(y_train == '0')

# use those indices to modify y_train_0 & y_test_0
y_train_0[indx_0] = 1
indx_0 = np.where(y_test == '0')
y_test_0[indx_0] = 1

#### **Model Building**

First taking a look into the parameters of the class : 

**RidgeClassifier** (
    `alpha=1.0,
    *,
    fit_intercept=True,
    normalize='deprecated',
    copy_X=True,
    max_iter=None,
    tol=0.001,
    class_weight=None,
    solver='auto',
    positive=False,
    random_state=None,`
)
 
**Note :** The parameter `normalize` is depreciated.

In [7]:
estimator = RidgeClassifier(normalize=False ,alpha=0)

pipe_ridge = make_pipeline(MinMaxScaler() ,estimator)
pipe_ridge.fit(X_train ,y_train_0)

**Checking on performance of model**

In [8]:
y_hat_test_0 = pipe_ridge.predict(X_test)
print(classification_report(y_test_0, y_hat_test_0))

              precision    recall  f1-score   support

         1.0       1.00      1.00      1.00     10000

    accuracy                           1.00     10000
   macro avg       1.00      1.00      1.00     10000
weighted avg       1.00      1.00      1.00     10000



#### **Cross Validation**

In [9]:
cv_ridge_clf = cross_validate(
                            pipe_ridge,
                            X_train ,y_train_0 ,cv=5,
                            scoring=['precision' ,'recall', 'f1'],
                            return_train_score=True ,
                            return_estimator=True)

pprint(cv_ridge_clf)

{'estimator': [Pipeline(steps=[('minmaxscaler', MinMaxScaler()),
                ('ridgeclassifier', RidgeClassifier(alpha=0, normalize=False))]),
               Pipeline(steps=[('minmaxscaler', MinMaxScaler()),
                ('ridgeclassifier', RidgeClassifier(alpha=0, normalize=False))]),
               Pipeline(steps=[('minmaxscaler', MinMaxScaler()),
                ('ridgeclassifier', RidgeClassifier(alpha=0, normalize=False))]),
               Pipeline(steps=[('minmaxscaler', MinMaxScaler()),
                ('ridgeclassifier', RidgeClassifier(alpha=0, normalize=False))]),
               Pipeline(steps=[('minmaxscaler', MinMaxScaler()),
                ('ridgeclassifier', RidgeClassifier(alpha=0, normalize=False))])],
 'fit_time': array([4.25958037, 4.51285076, 4.62808609, 4.67159963, 4.39858842]),
 'score_time': array([0.09599876, 0.10518098, 0.1021142 , 0.10399866, 0.09025025]),
 'test_f1': array([1., 1., 1., 1., 1.]),
 'test_precision': array([1., 1., 1., 1., 1.]),
 'test_re

Best estimator ID

In [10]:
best_estimator_id = np.argmax(cv_ridge_clf['train_f1']) 
best_estimator_id

0

Best Estimator

In [11]:
best_estimator = cv_ridge_clf['estimator'][best_estimator_id]
best_estimator

Lets evaluate the performance of the best classsifier on the test set.

In [12]:
y_hat_test_0 = best_estimator.predict(X_test)
print(classification_report(y_test_0 ,y_hat_test_0))

              precision    recall  f1-score   support

         1.0       1.00      1.00      1.00     10000

    accuracy                           1.00     10000
   macro avg       1.00      1.00      1.00     10000
weighted avg       1.00      1.00      1.00     10000



#### **Further Exploration**

Let's see what these classifiers learnt about the digit 0.

In [13]:
# models = (pipe_sgd ,pipe_sgd_l2 ,pipe_logit ,pipe_ridge)
# titles = ('SGD' ,'Regularized SGD', 'Logit' ,'Ridge')

# plt.figure(figsize=(5,5))
# for i in range(0,4):
#     w = models[i][1].coef_
#     w_matrix = w.reshape(28,28)
#     plt.subplot(2,2,i+1)
#     plt.imshow(w_matrix ,cmap='gray')
#     plt.title(titles[i])
#     plt.axis('off')
#     plt.grid(False)
# plt.show()