# Assignment

We already did some hyper-parameter tuning in previous lectures, but we were a little loose about how we did it: (1) we didn't use the validation data like we should have, and (2) we had to write a lot of custom code to collect the results. If we only try a few different models, we can get away with being a little sloppy, but now we're going to do things right. Since hyper-parameter tuning is a common ML task, it should come as no surprise that `sklearn` has functionality to help us with it. In this assignment, we are going to use it to try different combinations of hyper-parameters for the SVM classifier we trained in the lecture.

In [1]:
import pandas as pd
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.svm import SVC
from sklearn.metrics import roc_curve, auc
import matplotlib.pyplot as plt

from sklearn.model_selection import GridSearchCV

In [None]:
# Split the data into training and test sets, then one-hot encode the categorical columns and normalize the numeric columns as pre-processing.
bank = pd.read_csv("data/bank-full.csv", sep = ";")

num_cols = bank.select_dtypes(['integer', 'float']).columns
cat_cols = bank.select_dtypes(['object']).drop(columns = "y").columns

X_train, X_test, y_train, y_test = train_test_split(bank.drop(columns = "y"), bank["y"], 
                                                    test_size = 0.10, random_state = 42)

X_train = X_train.reset_index(drop = True)
X_test = X_test.reset_index(drop = True)

onehoter = OneHotEncoder(sparse_output= False)
onehoter.fit(X_train[cat_cols])
onehot_cols = onehoter.get_feature_names_out(cat_cols)
X_train_onehot = pd.DataFrame(onehoter.transform(X_train[cat_cols]), columns = onehot_cols)
X_test_onehot = pd.DataFrame(onehoter.transform(X_test[cat_cols]), columns = onehot_cols)

znormalizer = StandardScaler()
znormalizer.fit(X_train[num_cols])
X_train_norm = pd.DataFrame(znormalizer.transform(X_train[num_cols]), columns = num_cols)
X_test_norm = pd.DataFrame(znormalizer.transform(X_test[num_cols]), columns = num_cols)

X_train_featurized = X_train_onehot # add one-hot-encoded columns
X_test_featurized = X_test_onehot   # add one-hot-encoded columns
X_train_featurized[num_cols] = X_train_norm # add numeric columns
X_test_featurized[num_cols] = X_test_norm   # add numeric columns

del X_train_norm, X_test_norm, X_train_onehot, X_test_onehot

print("Featurized training data has {} rows and {} columns.".format(*X_train_featurized.shape))
print("Featurized test data has {} rows and {} columns.".format(*X_test_featurized.shape))

X_train_featurized.head()

NameError: name 'pd' is not defined

There are three main ways to search the **hyper-parameter space**:

- **Grid search:** tries every combination of hyper-parameters
- **Random search:** tries a random subset of all combinations of hyper-parameters
- **Bayesian optimization:** tries a subset of all combinations of hyper-parameters (like random search) but does so in a more intelligent way, based on trading off the need to **explore** (trying a part of the hyper-parameter space thus far unexplored) and the need to **exploit** (focusing on a part of the hyper-parameter space that thus far seems promising)
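To make the difference between the first two strategies concrete, here is a minimal sketch that runs a grid search and a random search side by side. It uses a small synthetic data set from `make_classification` as a stand-in for the bank data, and the value ranges and the names `X_demo`, `grid_demo` and `random_demo` are assumptions chosen for illustration. Bayesian optimization is left out of the sketch.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.svm import SVC

# Small synthetic data set (an assumption; it stands in for the bank data).
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=0)

# Grid search: tries every combination of the listed values (2 x 2 = 4 candidates).
grid_demo = GridSearchCV(SVC(), {"C": [0.1, 1], "gamma": [0.1, 1]}, cv=3)
grid_demo.fit(X_demo, y_demo)

# Random search: samples a fixed number of candidates (n_iter) from distributions.
random_demo = RandomizedSearchCV(
    SVC(),
    {"C": loguniform(1e-2, 1e2), "gamma": loguniform(1e-3, 1e1)},
    n_iter=4,
    cv=3,
    random_state=0,
)
random_demo.fit(X_demo, y_demo)

n_grid = len(grid_demo.cv_results_["params"])
n_random = len(random_demo.cv_results_["params"])
print(n_grid, n_random)
```

With a grid, the number of candidates grows multiplicatively with every hyper-parameter we add, whereas with random search we set the budget directly through `n_iter`.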

We will use a grid search algorithm here, as implemented by the `GridSearchCV` class. As a bonus, the grid search algorithm uses cross-validation (CV) to evaluate each model. Cross-validation can slow down the process, but we can use a lower number of **folds** to speed it up.

SVMs have two important **high-level hyper-parameters** and then some lower-level ones that depend on the high-level ones. The high-level hyper-parameters are `C` and `kernel`. Depending on the choice of `kernel`, we can also specify `degree` and `gamma`. You can read more about that [here](https://scikit-learn.org/stable/modules/svm.html#kernel-functions).
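Because `degree` only matters for the polynomial kernel, one way to express this dependency is to pass a list of dictionaries as the grid, one per kernel. The sketch below is only an illustration (the name `param_grid_by_kernel` and the chosen values are assumptions, not the grid used in this assignment) and uses `ParameterGrid` to count the candidates without training anything.

```python
from sklearn.model_selection import ParameterGrid

# One dictionary per kernel, so that `degree` is only varied for 'poly'.
# The values are illustrative assumptions.
param_grid_by_kernel = [
    {"kernel": ["rbf"], "C": [0.1, 1], "gamma": [0.1, 1]},
    {"kernel": ["poly"], "C": [0.1, 1], "gamma": [0.1, 1], "degree": [2, 3]},
]

# ParameterGrid expands the grid the same way GridSearchCV does.
n_candidates = len(ParameterGrid(param_grid_by_kernel))
print(n_candidates)  # 4 rbf combinations + 8 poly combinations = 12
```

A list like this can be passed to `GridSearchCV` in place of a single dictionary, which avoids spending fits on combinations where a hyper-parameter has no effect.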

In addition to the hyper-parameters mentioned above, `SVC` also has some important arguments that we should be aware of, such as `max_iter`, `class_weight`, and `cache_size`.
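As a rough sketch of how these arguments are passed, the snippet below creates an `SVC` with all three set explicitly. The specific values are assumptions for illustration, not recommendations for the bank data.

```python
from sklearn.svm import SVC

# Illustrative values only (assumptions, not tuned for any data set).
svc_demo = SVC(
    kernel="rbf",
    class_weight="balanced",  # weight classes inversely to their frequency
    max_iter=10_000,          # hard cap on solver iterations (-1 means no limit)
    cache_size=500,           # kernel cache size in MB
)

print(svc_demo.get_params()["class_weight"])
```

`class_weight` is worth a look when the target is imbalanced, while `max_iter` and `cache_size` mainly affect how long training takes.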

- Use `GridSearchCV` to train multiple `SVC` classifiers with different hyper-parameter combinations. <span style="color:red" float:right>[20 point]</span>
  - The hyper-parameters you want to try are `kernel`, `degree`, `C` and `gamma`. You should pick two or three different choices for each. 
  - For `SVC`, setting `probability = True` slows down training considerably, so it's not a good idea to use it during grid search. (Instead, we can retrain the final model using the hyper-parameter combination that we found and set `probability = True` if we need soft predictions, but we won't worry about that here.) 
  - We leave it to you to read the documentation for `SVC` to see what choices make sense. Moreover, your grid search should perform 5-fold cross-validation to select the best model.
  - It's best to avoid running everything in one line. So try to break your code into a few different steps to make it easy to follow.
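The retraining step mentioned in the list above could look roughly like the following sketch. It assumes a hypothetical `chosen_params` dictionary standing in for whatever combination the grid search selects, and it uses synthetic data so that it runs on its own.

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic stand-in data (an assumption).
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=0)

# Hypothetical winning combination; in practice this would come from the grid search.
chosen_params = {"C": 1, "gamma": 0.1, "kernel": "rbf"}

# Retrain once with probability=True so that predict_proba becomes available.
final_model = SVC(probability=True, random_state=0, **chosen_params)
final_model.fit(X_demo, y_demo)

proba = final_model.predict_proba(X_demo[:3])
print(proba.shape)  # one row per sample, one column per class
```

This way the expensive probability calibration runs once, on the final model, instead of once per fit during the search.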

In [3]:
# Define the SVC model. degree = 2 only takes effect with the 'poly' kernel; other kernels ignore it.
model = SVC(degree = 2)


In [12]:
from sklearn.model_selection import GridSearchCV

In [5]:
# Define the hyper-parameter grid: two values each for C and gamma, and two kernel functions.
param_grid={'C':[0.1, 1],'gamma':[1, 0.1],'kernel':['rbf','poly']}

- Run your grid search to train all the models. (Later, we will see how to pick the model with the best combination of hyper-parameters: in this context, hyper-parameter tuning is often also referred to as **model selection**.)  <span style="color:red" float:right>[5 point]</span>

In [6]:
# Create the GridSearchCV object, which finds the best hyper-parameter combination
# using 5-fold cross-validation and refits the best estimator on the full training set.
grid = GridSearchCV(model, param_grid, cv = 5, refit = True, verbose = 3)


In [7]:
grid.fit(X_train_featurized, y_train)


Fitting 5 folds for each of 8 candidates, totalling 40 fits
[CV 1/5] END ........C=0.1, gamma=1, kernel=rbf;, score=0.884 total time= 3.6min
[CV 2/5] END ........C=0.1, gamma=1, kernel=rbf;, score=0.884 total time= 3.7min
[CV 3/5] END ........C=0.1, gamma=1, kernel=rbf;, score=0.884 total time= 3.6min
[CV 4/5] END ........C=0.1, gamma=1, kernel=rbf;, score=0.884 total time= 3.5min
[CV 5/5] END ........C=0.1, gamma=1, kernel=rbf;, score=0.884 total time= 4.0min
[CV 1/5] END .......C=0.1, gamma=1, kernel=poly;, score=0.903 total time= 1.7min
[CV 2/5] END .......C=0.1, gamma=1, kernel=poly;, score=0.906 total time= 1.7min
[CV 3/5] END .......C=0.1, gamma=1, kernel=poly;, score=0.904 total time= 1.6min
[CV 4/5] END .......C=0.1, gamma=1, kernel=poly;, score=0.904 total time= 1.7min
[CV 5/5] END .......C=0.1, gamma=1, kernel=poly;, score=0.905 total time= 1.7min
[CV 1/5] END ......C=0.1, gamma=0.1, kernel=rbf;, score=0.898 total time=  34.4s
[CV 2/5] END ......C=0.1, gamma=0.1, kernel=rbf;,

In [13]:
# With 8 hyper-parameter combinations and 5 folds each, the grid search runs 40 fits in total.

All the results generated by the grid search are stored in the `cv_results_` attribute. For example, if we want to know the combination of hyper-parameters that was tried in the 10th iteration, we can run `clf.cv_results_['params'][9]` (assuming the trained model is called `clf`) and if we want to know the cross-validated evaluation score for that 10th iteration, we can run `clf.cv_results_['mean_test_score'][9]`.
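Since `cv_results_` is a dictionary of equal-length arrays, a convenient way to inspect it is to load it into a `pandas` data frame. The sketch below assumes a small synthetic data set and a tiny grid so that it runs quickly; the names `X_demo`, `results` and `summary` are illustrative.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in data and a tiny grid (assumptions, chosen to run quickly).
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=0)
clf = GridSearchCV(SVC(), {"C": [0.1, 1], "kernel": ["rbf", "poly"]}, cv=3)
clf.fit(X_demo, y_demo)

# One row per hyper-parameter combination.
results = pd.DataFrame(clf.cv_results_)
summary = results[["params", "mean_test_score", "std_test_score", "rank_test_score"]]
print(summary.sort_values("rank_test_score"))

# Indexing the raw dictionary works as described in the text (0-based).
print(clf.cv_results_["params"][2], clf.cv_results_["mean_test_score"][2])
```

Sorting by `rank_test_score` puts the best combination first, and `std_test_score` shows how much the score varied across folds.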

Note that we need to be careful about terminology here. Unfortunately, the hyper-parameters are called `params` by `GridSearchCV`. But in ML **parameters** are the things that the algorithm learns from the data (such as the coefficients in the prediction equation), whereas **hyper-parameters** cannot be learned from the data, which is why we have to tune them by trying different combinations. Also, the cross-validated score is called `mean_test_score` even though we are not using the test data to evaluate it, at least not during model selection. We will use the test data later to evaluate the final model.
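The distinction can be seen directly on a fitted model. In the sketch below, which uses logistic regression on synthetic data purely as an illustration, `C` is a hyper-parameter that we set before training, while `coef_` holds the parameters learned from the data.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (an assumption).
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=0)

# C is a hyper-parameter: we choose it before training.
logit = LogisticRegression(C=0.5)
logit.fit(X_demo, y_demo)

# get_params() returns the hyper-parameters (what sklearn calls "params").
print(logit.get_params()["C"])

# coef_ and intercept_ are the parameters learned from the data.
print(logit.coef_.shape, logit.intercept_.shape)
```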

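Time to pull the best model. We can explicitly access it through the `clf.best_estimator_` attribute, which is available because the grid search refits the best combination on the whole training set (`refit = True`). However, accessing `clf.best_estimator_` explicitly is not necessary: methods called on `clf` itself are **implicitly** delegated to the best estimator. This means that if we call `clf.predict`, we would be using the best estimator to get predictions. (Do not confuse this with `clf.estimator`, which is just the unfitted model we passed to `GridSearchCV`.)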
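A quick way to convince ourselves of this delegation is to compare the two sets of predictions. The sketch below assumes synthetic data and a one-parameter grid, with `refit=True` so that `best_estimator_` exists.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in data (an assumption).
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=0)

clf = GridSearchCV(SVC(), {"C": [0.1, 1]}, cv=3, refit=True)
clf.fit(X_demo, y_demo)

# Predicting through the search object uses the refitted best estimator.
same = np.array_equal(clf.predict(X_demo), clf.best_estimator_.predict(X_demo))
print(same)

# clf.estimator is only the unfitted template that was passed in.
print(clf.estimator)
```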

- Get predictions on the training and test data for the best model. Finally, get the precision and recall of the best estimator to see how they compare to what we got from logistic regression during the lecture. <span style="color:red" float:right>[5 point]</span>
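For the precision and recall themselves, `precision_score` and `recall_score` can be used as an alternative to reading them off a full report. Because the target here is a string (`yes`/`no`), the positive class has to be named through `pos_label`. The labels below are made up for illustration and are not taken from the bank data.

```python
from sklearn.metrics import precision_score, recall_score

# Made-up labels for illustration only.
y_true = ["no", "no", "yes", "yes", "yes", "no"]
y_pred = ["no", "yes", "yes", "no", "yes", "no"]

# With string labels, tell sklearn which class counts as positive.
precision = precision_score(y_true, y_pred, pos_label="yes")
recall = recall_score(y_true, y_pred, pos_label="yes")

print("Precision = {:.0%} and recall = {:.0%}".format(precision, recall))
```

Here two of the three predicted `yes` labels are correct and two of the three actual `yes` cases are found, so both metrics come out at two thirds.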

In [24]:
# Print the best hyper-parameter combination found by the grid search.
print(grid.best_params_)


{'C': 1, 'gamma': 0.1, 'kernel': 'rbf'}


In [25]:
# grid.best_estimator_ holds the best model, which we can use for prediction.
best_model=grid.best_estimator_

In [26]:
best_model

In [20]:
predict_train=best_model.predict(X_train_featurized)


In [22]:
# After prediction, we can look at the results of our model.
# The report shows an improvement in the predictions on the training set.
from sklearn.metrics import classification_report
print(classification_report(y_train, predict_train))

              precision    recall  f1-score   support

          no       0.93      0.98      0.96     35954
         yes       0.78      0.46      0.58      4735

    accuracy                           0.92     40689
   macro avg       0.86      0.72      0.77     40689
weighted avg       0.91      0.92      0.91     40689



# Get predictions on the test data from the best model.
predict_test = best_model.predict(X_test_featurized)

In [27]:
from sklearn.metrics import classification_report, confusion_matrix
# After prediction, we can look at the results of our model on the test set.
# Compared to the training set and to the logistic regression results,
# there is not much improvement in the predictions on the test set.

print(classification_report(y_test, predict_test))

              precision    recall  f1-score   support

          no       0.92      0.97      0.95      3968
         yes       0.67      0.39      0.49       554

    accuracy                           0.90      4522
   macro avg       0.79      0.68      0.72      4522
weighted avg       0.89      0.90      0.89      4522



The results of the logistic regression from the lecture were:

- Precision = 65% and recall = 35% on the training data.
- Precision = 63% and recall = 34% on the validation data.

Compared with the logistic regression used in the lecture, the SVM predicts the training set better: precision is 78% and recall is 46% for the `yes` class. On the test set, however, there is not much difference between the two models: the SVM reaches a precision of 67% and a recall of 39%.

# End of assignment