In [None]:
import numpy as np
import pandas as pd

### PRACTICE ASSIGNMENT

Q1. Write a function `compute_score(X_train, y_train, X_test, y_test)` that does the following on the `Iris` dataset:

Write your code keeping in mind:

Split the `Iris` dataset into `train` and `test` sets in a $70:30$ ratio.

Import `svm.SVC` as the *model*, with `kernel='rbf'`, regularization parameter `C=20` and `gamma='auto'`.

Take `random_state=42`.

Train the model on the training data and compute its score on the test data.
  
Which of the following options is the computed score?

In [None]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split 
from sklearn.svm import SVC

In [None]:
def compute_score(X_train, y_train, X_test, y_test) :
  svc = SVC(kernel = 'rbf', C = 20, gamma = 'auto', random_state = 42)
  svc.fit(X_train, y_train)
  # score() returns the mean accuracy of the predictions on the test set
  return svc.score(X_test, y_test)

In [None]:
data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size = .3, random_state = 42)
compute_score(X_train, y_train, X_test, y_test)


1.0
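For a classifier, `score` is the mean accuracy on the data passed in. A minimal sketch that checks this on the same Iris split, assuming the Q1 settings (`kernel='rbf'`, `C=20`, `gamma='auto'`, `random_state=42` for the split):

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

svc = SVC(kernel="rbf", C=20, gamma="auto").fit(X_train, y_train)

# score() and accuracy_score() should agree: both are the fraction of correct test labels
score = svc.score(X_test, y_test)
acc = accuracy_score(y_test, svc.predict(X_test))
print(score, acc)
```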

Q2. Building on *Question 1*, apply a `pipeline` containing a `MinMaxScaler()` step named `scaler` and an `svm.SVC()` step named `classifier` with the parameters `kernel='linear', decision_function_shape='ovr', C=1, class_weight=None`. Calculate the `precision` and `f1` scores and mark the correct option from the list below.

In [None]:
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import precision_score, f1_score, recall_score, accuracy_score

In [None]:
def compute_score(X_train, y_train, X_test, y_test) :
  svc = SVC(kernel = 'linear', decision_function_shape='ovr', C = 1, gamma = 'auto', class_weight=None, random_state = 42)
  model = Pipeline([('scaler', MinMaxScaler()),
                    ('classifier', svc)])
  model.fit(X_train, y_train)
  y_test_pred = model.predict(X_test)
  return precision_score(y_test, y_test_pred, average = 'weighted'), f1_score(y_test, y_test_pred, average = 'weighted')

In [None]:
compute_score(X_train, y_train, X_test, y_test)

(1.0, 1.0)
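With `average='weighted'`, the per-class scores are averaged using each class's support (its number of true samples) as the weight. A small sketch on hypothetical hand-made labels that reproduces the weighted precision by hand:

```python
import numpy as np
from sklearn.metrics import precision_score

# Hypothetical labels, for illustration only
y_true = np.array([0, 0, 1, 1, 1, 2])
y_pred = np.array([0, 1, 1, 1, 0, 2])

per_class = precision_score(y_true, y_pred, average=None)  # one precision per label
support = np.bincount(y_true)                               # true samples per label: [2, 3, 1]
manual = np.sum(per_class * support) / support.sum()        # support-weighted mean

weighted = precision_score(y_true, y_pred, average="weighted")
print(per_class, manual, weighted)
```

The same weighting applies to `f1_score(..., average='weighted')`.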

Q3. Import the `iris` dataset and drop the rows where `class=Iris-versicolor`. Apply a `pipeline` containing a `MinMaxScaler()` step named `scaler` and an `svm.SVC()` step named `classifier`. Split the remaining data in a 50:50 ratio with `random_state=0`. Mark the correct recall value from the given options.

In [None]:
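# Drop Iris-versicolor (target 1); load_iris labels are 0 = setosa, 1 = versicolor, 2 = virginica
X, y = data.data[data.target != 1], data.target[data.target != 1]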

In [None]:
X_train_, X_test_, y_train_, y_test_ = train_test_split(X, y, test_size = .5, random_state = 0)

In [None]:
def compute_score(X_train, y_train, X_test, y_test) :
  svc = SVC(kernel = 'linear', decision_function_shape='ovr', C = 1, gamma = 'auto', class_weight=None, random_state = 42)
  model = Pipeline([('scaler', MinMaxScaler()),
                    ('classifier', svc)])
  model.fit(X_train, y_train)
  y_test_pred = model.predict(X_test)
  return recall_score(y_test, y_test_pred, average = 'weighted')

In [None]:
compute_score(X_train_, y_train_, X_test_, y_test_)

1.0
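`load_iris` stores rows ordered by class (50 each) and uses short class names such as `'versicolor'` rather than `'Iris-versicolor'`, so slicing by position (e.g. `data.data[:100]`) keeps setosa and versicolor. A sketch that selects rows by class name instead, assuming the standard `load_iris` bundle:

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
print(iris.target_names)  # ['setosa' 'versicolor' 'virginica']

# Look up the integer label of the class named in the question, then keep all other rows
versicolor = list(iris.target_names).index("versicolor")
keep = iris.target != versicolor
X_sub, y_sub = iris.data[keep], iris.target[keep]
print(X_sub.shape, np.unique(y_sub))
```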

Q4. Write the function `compute_score(X_train, y_train, X_test, y_test)` that does the following on the `Iris` dataset:

Write your code keeping in mind:

Split the `Iris` dataset into train and test sets in a 70:30 ratio with `random_state=42`.

Import `sklearn.svm.LinearSVC` as the *model*.

Use the loss function `loss='hinge'`, `random_state=42` and `penalty='l2'`.

Train the *model* and mark the computed score.

In [None]:
from sklearn.svm import LinearSVC

In [None]:
def compute_score(X_train, y_train, X_test, y_test) :
  svm = LinearSVC(loss = 'hinge', random_state = 42, penalty = 'l2')
  svm.fit(X_train, y_train)
  return svm.score(X_test, y_test)

In [None]:
compute_score(X_train, y_train, X_test, y_test)



0.9777777777777777
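`LinearSVC` and `SVC(kernel='linear')` are not interchangeable: `LinearSVC` is built on liblinear, trains one-vs-rest and also regularizes the intercept, while `SVC` is built on libsvm and handles multiclass one-vs-one. A sketch comparing both on the same split; `max_iter=10000` and silencing `ConvergenceWarning` are assumptions added because liblinear can stop early on unscaled features:

```python
import warnings

from sklearn.datasets import load_iris
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC, LinearSVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

models = {
    # liblinear, one-vs-rest, hinge loss with L2 penalty (the Q4 settings)
    "LinearSVC(hinge)": LinearSVC(loss="hinge", penalty="l2", random_state=42, max_iter=10000),
    # libsvm, one-vs-one, linear kernel
    "SVC(linear)": SVC(kernel="linear", C=1),
}

scores = {}
for name, model in models.items():
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", ConvergenceWarning)
        model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)
    print(f"{name}: {scores[name]:.4f}")
```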

Q5. Write a function `hyperparameter_search` that accepts the kernels and regularization parameters to search over and returns the test-set `accuracy_score` of the best model found with the hyperparameters below.

Split the `Iris` dataset into train and test sets in a $70:30$ ratio.
`kernels = ['linear', 'rbf']`, regularization `C = [5, 10, 100]`, cross-validation `cv=10`, `random_state=42`.

Which of the following options gives the `accuracy_score` for the `iris` dataset?

In [None]:
from sklearn.model_selection import GridSearchCV

In [None]:
def hyperparameter_search(kernel, C) :
  data = load_iris()
  X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size = .3, random_state = 42)
  params = {'kernel': kernel, 'C': C}
  svm = SVC(gamma = 'auto', random_state = 42)
  gscv = GridSearchCV(svm, param_grid=params, cv = 10)
  gscv.fit(X_train, y_train)
  y_test_pred = gscv.predict(X_test)
  return accuracy_score(y_test, y_test_pred)

In [None]:
kernels = ['linear', 'rbf']
C = [5, 10, 100]
hyperparameter_search(kernels, C)

0.9777777777777777
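Grid search fits every `(kernel, C)` combination with k-fold cross-validation on the training set, keeps the one with the best mean CV score and refits it on all of `X_train`. A rough hand-written equivalent using `cross_val_score`, assuming the Q5 grid and `cv=10` (both use stratified folds without shuffling for a classifier, so the best mean scores should match):

```python
from itertools import product

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

kernels, Cs = ["linear", "rbf"], [5, 10, 100]

# Mean 10-fold CV accuracy for each (kernel, C) pair, computed on the training set only
manual = {}
for kernel, C in product(kernels, Cs):
    cv_scores = cross_val_score(SVC(kernel=kernel, C=C, gamma="auto"), X_train, y_train, cv=10)
    manual[(kernel, C)] = cv_scores.mean()

best = max(manual, key=manual.get)
print("best (kernel, C):", best, "mean CV accuracy:", manual[best])

gscv = GridSearchCV(SVC(gamma="auto"), {"kernel": kernels, "C": Cs}, cv=10).fit(X_train, y_train)
print("GridSearchCV best_score_:", gscv.best_score_)
```

If several combinations tie, the two approaches may report different "best" parameters even though the best score is the same.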

### GRADED ASSIGNMENT

Q1. Write a function `compute_GridSearchCV` that accepts the kernels and regularization parameters as inputs and returns the mean cross-validated score of the best estimator (`best_score_`) for the hyperparameters below:

Split the `Iris` dataset into train and test sets in a $70:30$ ratio.

Import `svm.SVC` as the *model*, with `kernels = ['linear', 'rbf']`, regularization `C = [1, 15, 25]`, `gamma='auto'`, cross-validation `cv=4` and `random_state=0`.

In [None]:
from sklearn.model_selection import GridSearchCV

In [None]:
def compute_GridSearchCV(kernel, C) :
  data = load_iris()
  X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size = .3, random_state = 0)
  params = {'kernel': kernel, 'C': C}
  svm = SVC(gamma = 'auto', random_state = 0)
  gscv = GridSearchCV(svm, param_grid=params, cv = 4)
  gscv.fit(X_train, y_train)
  return gscv.best_score_

In [None]:
kernels = ['linear', 'rbf']
C = [1, 15, 25]
compute_GridSearchCV(kernels, C)

0.9807692307692308
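`best_score_` is the highest mean CV accuracy over the training folds; it is not the accuracy on the held-out test set. A sketch, assuming the graded Q1 grid and split, that prints the per-combination scores from `cv_results_` alongside both numbers:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

gscv = GridSearchCV(SVC(gamma="auto"), {"kernel": ["linear", "rbf"], "C": [1, 15, 25]}, cv=4)
gscv.fit(X_train, y_train)

mean_scores = gscv.cv_results_["mean_test_score"]  # one entry per parameter combination
for params, score in zip(gscv.cv_results_["params"], mean_scores):
    print(params, round(score, 4))

# Best mean CV score on the training folds
print("best_params_:", gscv.best_params_, "best_score_:", gscv.best_score_)
# Separate estimate: the refitted best_estimator_ scored on the held-out test set
print("test accuracy:", gscv.score(X_test, y_test))
```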

**(The following statement applies to Q2 and Q3)**

Use the instructions below to answer both questions.

Split the *Social_Network_Ads* dataset (https://drive.google.com/file/d/1qUa1GlG4X4ZY_4E0e7jPR-z7AG7NIDbE/view?usp=sharing) into training and test sets in a 75:25 ratio.

Fit and transform the training and test sets of the feature matrix with the `StandardScaler` transformer.

Fit a linear SVM (with `random_state = 0` and a `linear` kernel) to the training data.

In [None]:
from sklearn.preprocessing import StandardScaler

In [None]:
data = pd.read_csv("/content/drive/MyDrive/Assignment_files/MLP/Social_Network_Ads.csv")

In [None]:
X_train, X_test, y_train, y_test = train_test_split(data.iloc[:,:-1], data.iloc[:,-1], test_size = .25, random_state = 0)

svm = SVC(kernel = 'linear', random_state = 0)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
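# As instructed, the scaler is refit on the test set; scaler.transform(X_test) would reuse the training statistics instead
X_test_scaled = scaler.fit_transform(X_test)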
svm.fit(X_train_scaled, y_train)

SVC(kernel='linear', random_state=0)
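The cell above refits `StandardScaler` on `X_test`, which lets test-set statistics influence the preprocessing. A common alternative is a `Pipeline`, which fits the scaler on the training data only and reuses those statistics when predicting. Since the CSV path above is specific to one Drive, this sketch uses a synthetic two-feature dataset from `make_classification` as a stand-in (an assumption; it is not the Social_Network_Ads data):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (hypothetical), two numeric features and a binary target
X, y = make_classification(n_samples=400, n_features=2, n_informative=2, n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# fit() scales with X_train statistics; score()/predict() transform X_test with those same statistics
model = make_pipeline(StandardScaler(), SVC(kernel="linear", random_state=0))
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)
print("test accuracy:", test_accuracy)

scaler = model.named_steps["standardscaler"]
```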

Q2. Predict the labels of the test data and compute the `accuracy_score`. Which of the following options represents the calculated accuracy_score?

In [None]:
y_test_pred = svm.predict(X_test_scaled)
accuracy_score(y_test, y_test_pred)

0.88

Q3. Compute the confusion matrix for the predictions above.

In [None]:
from sklearn.metrics import confusion_matrix
confusion_matrix(y_test, y_test_pred)

array([[63,  5],
       [ 7, 25]])
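The accuracy in Q2 can be read straight off this matrix: `confusion_matrix` puts true classes in rows and predicted classes in columns (labels sorted, so class 0 first). A worked check, treating class 1 as the positive class:

```python
import numpy as np

# Confusion matrix reported above: rows = true class, columns = predicted class
cm = np.array([[63, 5],
               [7, 25]])
tn, fp, fn, tp = cm.ravel()

accuracy = np.trace(cm) / cm.sum()  # (63 + 25) / 100
precision = tp / (tp + fp)          # 25 / 30
recall = tp / (tp + fn)             # 25 / 32
print(accuracy, precision, recall)
```

The first value is the 0.88 reported in Q2.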

**(The following statement applies to Q4 and Q5)**

Use the instructions below to answer both questions.

From the `MNIST` dataset, take the first $20,000$ samples as training data and the next $5,000$ samples as test data. Fit a `pipeline` with a `MinMaxScaler` and an `SVC` *classifier* with a linear kernel, one-vs-rest `decision_function_shape` and `class_weight = None` to this dataset.

In [None]:
from sklearn.datasets import fetch_openml
data = fetch_openml('mnist_784', version = 1)

In [None]:
X_train, X_test, y_train, y_test = data.data[:20000], data.data[20000:25000], data.target[:20000], data.target[20000:25000]

model = Pipeline([('scaler', MinMaxScaler()),
                  ('classifier', SVC(kernel = 'linear', decision_function_shape = 'ovr', class_weight = None))])

model.fit(X_train, y_train)
y_test_pred = model.predict(X_test)

In [None]:
X_train.shape, X_test.shape

((20000, 784), (5000, 784))

Q4. What is the sum of the main diagonal elements of the confusion matrix?

In [None]:
confusion_matrix(y_test, y_test_pred).trace()

4623
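Each diagonal entry counts the correctly classified samples of one class, so the trace divided by the number of test samples is the accuracy. A sketch on hypothetical toy labels, followed by the same arithmetic for the MNIST trace above:

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical toy labels
y_true = np.array([3, 3, 7, 7, 7, 9])
y_pred = np.array([3, 7, 7, 7, 3, 9])
cm = confusion_matrix(y_true, y_pred)
toy_accuracy = cm.trace() / cm.sum()
print(toy_accuracy, accuracy_score(y_true, y_pred))

# MNIST run above: trace of 4623 over 5000 test samples
mnist_accuracy = 4623 / 5000
print(mnist_accuracy)  # 0.9246
```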

Q5. Which of the following are the correct values of precision, recall and f1_Score?

In [None]:
print("Precision:", precision_score(y_test, y_test_pred, average='weighted'), "Recall:", recall_score(y_test, y_test_pred, average='weighted'), "f1_Score:", f1_score(y_test, y_test_pred, average='weighted'))

Precision: 0.9245834577056025 Recall: 0.9246 f1_Score: 0.9242341822537996
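The weighted recall of 0.9246 equals 4623 / 5000, the accuracy implied by the trace in Q4. This holds in general for single-label classification: weighting each class recall $TP_k / n_k$ by $n_k / n$ leaves $\sum_k TP_k / n$, which is the accuracy. A check on hypothetical toy labels:

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical labels, for illustration only
y_true = np.array([0, 0, 1, 1, 1, 2])
y_pred = np.array([0, 1, 1, 1, 0, 2])

# sum_k (n_k / n) * (TP_k / n_k) = sum_k TP_k / n = accuracy
weighted_recall = recall_score(y_true, y_pred, average="weighted")
accuracy = accuracy_score(y_true, y_pred)
print(weighted_recall, accuracy)
```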


Q6. Consider the `MNIST` dataset and split it into training and test sets in a $50:50$ ratio with `random_state = 42`. Fit an SVM model using a `pipeline` with `StandardScaler` and an `SVC` classifier with `kernel='poly'`, `degree = 3`, `decision_function_shape='ovr'`, `class_weight='balanced'` and `C=10`. Train the model on the training data and make predictions for the test data. Generate the classification report and choose the correct value for the weighted average of the `f1-score`.

In [None]:
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size = 0.5, random_state = 42)

model = Pipeline([('scaler', StandardScaler()),
                  ('classifier', SVC(kernel = 'poly', degree = 3, decision_function_shape = 'ovr', class_weight = 'balanced', C = 10))])

model.fit(X_train, y_train)
y_test_pred = model.predict(X_test)
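With `class_weight='balanced'`, `SVC` scales `C` for each class by a weight inversely proportional to that class's frequency, computed as `n_samples / (n_classes * np.bincount(y))`. A sketch using scikit-learn's `compute_class_weight` on hypothetical imbalanced labels:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical imbalanced labels: three samples of class 0, one of class 1
y = np.array([0, 0, 0, 1])
classes = np.unique(y)

weights = compute_class_weight(class_weight="balanced", classes=classes, y=y)
# Same formula written out: n_samples / (n_classes * count per class)
manual = len(y) / (len(classes) * np.bincount(y))
print(weights, manual)  # both equal [4/6, 2.0]
```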

In [None]:
from sklearn.metrics import classification_report
print(classification_report(y_test, y_test_pred))

              precision    recall  f1-score   support

           0       0.99      0.98      0.99      3463
           1       0.99      0.99      0.99      3927
           2       0.96      0.97      0.96      3520
           3       0.98      0.96      0.97      3551
           4       0.96      0.98      0.97      3333
           5       0.97      0.97      0.97      3144
           6       0.98      0.98      0.98      3490
           7       0.98      0.97      0.97      3718
           8       0.96      0.96      0.96      3344
           9       0.96      0.96      0.96      3510

    accuracy                           0.97     35000
   macro avg       0.97      0.97      0.97     35000
weighted avg       0.97      0.97      0.97     35000
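The printed report rounds to two decimals. Passing `output_dict=True` returns the same numbers as a nested dict, so the weighted-average f1-score can be read at full precision. Iris stands in for MNIST in this sketch (an assumption, so it runs in seconds):

```python
from sklearn.datasets import load_iris
from sklearn.metrics import classification_report, f1_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=42)
y_pred = SVC(kernel="poly", degree=3, C=10, gamma="auto").fit(X_train, y_train).predict(X_test)

report = classification_report(y_test, y_pred, output_dict=True)
weighted_f1 = report["weighted avg"]["f1-score"]
print(round(weighted_f1, 4))
```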



Q7. Write the function `compute_score(X_train, y_train, X_test, y_test)` that does the following on the `Iris` dataset:

Write your code keeping in mind:

Split the `Iris` dataset into `train` and `test` sets in a $70:30$ ratio.

Import `svm.SVC` as the *model*, with `kernel` as `'poly'`, `regularization parameter` as `10` and `gamma` as `'auto'`.

Train the *model* and mark the computed score.

In [None]:
data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size = 0.3, random_state = 42)

def compute_score(X_train, y_train, X_test, y_test) :
  model = SVC(kernel = 'poly', C = 10, gamma = 'auto')
  model.fit(X_train, y_train)
  return model.score(X_test, y_test)

compute_score(X_train, y_train, X_test, y_test)

1.0
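For `SVC`, `gamma='auto'` means $1 / n_{features}$ (0.25 for Iris), and the polynomial kernel is $K(x, z) = (\gamma \langle x, z \rangle + r)^d$ with `degree` $d = 3$ and `coef0` $r = 0$ by default. A sketch that checks the formula against `sklearn.metrics.pairwise.polynomial_kernel`; note that this pairwise helper defaults to `coef0=1`, so `coef0=0.0` is passed explicitly to match `SVC`:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics.pairwise import polynomial_kernel

X, _ = load_iris(return_X_y=True)
gamma = 1.0 / X.shape[1]  # what gamma='auto' resolves to: 1 / n_features
degree, coef0 = 3, 0.0    # SVC defaults for degree and coef0

# Kernel matrix for the first five samples, written out by hand
manual = (gamma * X[:5] @ X[:5].T + coef0) ** degree
library = polynomial_kernel(X[:5], X[:5], degree=degree, gamma=gamma, coef0=coef0)
print(np.allclose(manual, library))
```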

Q8. Write the function `compute_score(X_train, y_train, X_test, y_test)` that does the following on the `Iris` dataset:

Write your code keeping in mind:

Split the `Iris` dataset into train and test sets in a $70:30$ ratio.

Import `svm.SVC` as the *model*, with `kernel` as `'sigmoid'`, `regularization parameter` as `25` and `gamma` as `'auto'`.

Train the *model* and mark the computed score.

In [None]:
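def compute_score(X_train, y_train, X_test, y_test) :
  model = SVC(kernel = 'sigmoid', C = 25, gamma = 'auto')
  model.fit(X_train, y_train)
  return model.score(X_test, y_test)
compute_score(X_train, y_train, X_test, y_test)  # reuses the Q7 Iris split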

0.28888888888888886
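The sigmoid kernel is not positive semidefinite in general, which is one commonly cited reason it can underperform. A sketch that holds the split and settings fixed (`C=25`, `gamma='auto'`, split with `random_state=42`) and varies only the kernel, to see how the sigmoid result compares on this data:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Same split and hyperparameters for every kernel, so only the kernel changes
scores = {}
for kernel in ["linear", "poly", "rbf", "sigmoid"]:
    model = SVC(kernel=kernel, C=25, gamma="auto").fit(X_train, y_train)
    scores[kernel] = model.score(X_test, y_test)
    print(f"{kernel:>8}: {scores[kernel]:.4f}")
```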

Q9. Import the `iris` dataset and drop the rows where `class=Iris-setosa`. Apply a `pipeline` containing a `MinMaxScaler()` step named *scaler* and an `svm.SVC()` step named *classifier*. Split the remaining data in a $75:25$ ratio with `random_state=0`. Mark the correct `precision_score`.

In [None]:
data = load_iris()
X, y = data.data[50:], data.target[50:]

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = .25, random_state = 0)

In [None]:
def compute_score(X_train, y_train, X_test, y_test) :
  svc = SVC(kernel = 'linear', decision_function_shape='ovr', C = 1, gamma = 'auto', class_weight=None, random_state=0)
  model = Pipeline([('scaler', MinMaxScaler()),
                    ('classifier', svc)])
  model.fit(X_train, y_train)
  y_test_pred = model.predict(X_test)
  return precision_score(y_test, y_test_pred, average = 'weighted')

In [None]:
compute_score(X_train, y_train, X_test, y_test)

0.963076923076923