# Parameter Play

There are a lot of parameters that are set by default when working with these classifiers. Intellisense in VS Code can help you dig into them. Adopt one of the ML Classification Techniques in this lesson and retrain models tweaking various parameter values. Build a notebook explaining why some changes help the model quality while others degrade it. Be detailed in your answer.

# Linear SVC
The Linear Support Vector Classifier (Linear SVC) is a variant of the Support Vector Machine (SVM) algorithm that uses a linear kernel. The Linear SVC model has its own set of parameters that affect its performance. 

It's important to carefully select and tune these parameters to achieve the best results for a specific problem. Hyperparameter optimization techniques, such as grid search combined with cross-validation, can be used to find a good combination of parameters.
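
As a rough illustration of grid search with cross-validation, the sketch below tries a few C values on synthetic data. The data from `make_classification`, the candidate values in `param_grid`, and the names `X_demo`, `y_demo` and `search` are assumptions made for this example; they are not part of the cuisine notebook.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): 300 samples, 20 features, 3 classes.
X_demo, y_demo = make_classification(
    n_samples=300, n_features=20, n_informative=8,
    n_classes=3, random_state=0,
)

# Candidate C values to try (illustrative only, not a recommendation).
param_grid = {"C": [0.1, 1, 10]}

# 5-fold cross-validation scores every candidate on held-out folds.
search = GridSearchCV(SVC(kernel="linear"), param_grid, cv=5, scoring="accuracy")
search.fit(X_demo, y_demo)

print("Best C:", search.best_params_["C"])
print("Mean CV accuracy: %0.3f" % search.best_score_)
```

Because every candidate is scored on the same folds, the comparison between C values does not depend on one particular train/test split.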

Linear SVC has many parameters that influence its performance. Here we take the regularization parameter (C) and random_state, tweak them, and see how the model performs.

### Tuning Regularization (C) Parameter of Linear SVC

The regularization parameter, denoted C, is important in Linear SVC. It controls the trade-off between maximizing the margin and minimizing the training errors. A smaller C value allows for a larger margin but may result in misclassification of some training points. A larger C value aims to classify all training points correctly but may lead to a smaller margin.
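
To make the trade-off concrete, here is a minimal sketch on a two-class toy problem. For a binary linear SVM the margin width is 2 / ||w||, so it can be read off the fitted `coef_`. The blobs from `make_blobs` and the names `X_toy`, `y_toy` and `margins` are assumptions for this illustration only.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two overlapping 2-D clusters (synthetic, assumed for illustration).
X_toy, y_toy = make_blobs(n_samples=200, centers=2, cluster_std=2.5, random_state=0)

margins = {}
for C in (0.01, 1, 100):
    model = SVC(kernel="linear", C=C).fit(X_toy, y_toy)
    # For a binary linear SVM the margin width is 2 / ||w||.
    margins[C] = 2.0 / np.linalg.norm(model.coef_)
    train_acc = model.score(X_toy, y_toy)
    print("C=%-6s margin=%.3f  training accuracy=%.3f" % (C, margins[C], train_acc))
```

On overlapping data like this, the smallest C should report the widest margin, since margin violations are penalized least.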

Let's use the cleaned cuisine dataset to train the model. We first load the data into a pandas DataFrame.

In [2]:
import pandas as pd
cuisines_df = pd.read_csv("data/cleaned_cuisines.csv")
cuisines_df.head()

Unnamed: 0.1,Unnamed: 0,cuisine,almond,angelica,anise,anise_seed,apple,apple_brandy,apricot,armagnac,...,whiskey,white_bread,white_wine,whole_grain_wheat_flour,wine,wood,yam,yeast,yogurt,zucchini
0,0,indian,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,1,indian,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,2,indian,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,3,indian,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,4,indian,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,1,0


We then divide the data into features (X) and labels (y), held in two separate dataframes. The cuisine column serves as the labels:

In [3]:
cuisines_label_df = cuisines_df['cuisine']
cuisines_label_df.head()

0    indian
1    indian
2    indian
3    indian
4    indian
Name: cuisine, dtype: object

In [4]:
cuisines_feature_df = cuisines_df.drop(['Unnamed: 0', 'cuisine'], axis=1)
cuisines_feature_df.head()

Unnamed: 0,almond,angelica,anise,anise_seed,apple,apple_brandy,apricot,armagnac,artemisia,artichoke,...,whiskey,white_bread,white_wine,whole_grain_wheat_flour,wine,wood,yam,yeast,yogurt,zucchini
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,1,0
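
The output above still lists an `Unnamed: 0` column after the drop. If that column is a leftover row index rather than an ingredient, it would be safer to remove it too, since a row index can correlate with the label when the file is sorted by cuisine. A minimal sketch, using a tiny made-up frame (`demo_df` and its values are assumptions, not the real CSV):

```python
import pandas as pd

# Tiny stand-in frame (assumed) mimicking a CSV saved twice with its index.
demo_df = pd.DataFrame({
    "Unnamed: 0.1": [0, 1, 2],
    "Unnamed: 0": [0, 1, 2],
    "cuisine": ["indian", "indian", "thai"],
    "almond": [0, 1, 0],
    "yogurt": [0, 0, 1],
})

# Collect every index-like column, then drop those together with the label.
index_like = [col for col in demo_df.columns if col.startswith("Unnamed")]
demo_features = demo_df.drop(columns=index_like + ["cuisine"])

print(list(demo_features.columns))
```

The same pattern could be applied to `cuisines_df` once it is confirmed that the `Unnamed` columns hold no real feature.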


We can now import the necessary libraries and the SVC classifier:

In [5]:
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import accuracy_score,precision_score,confusion_matrix,classification_report, precision_recall_curve
import numpy as np

Let's split the data into training and testing:

In [7]:
X_train, X_test, y_train, y_test = train_test_split(cuisines_feature_df, cuisines_label_df, test_size=0.3)
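
The call above does not pass `random_state`, so each execution draws a different split and scores from separate runs are not directly comparable. A sketch of a reproducible, stratified split follows; the synthetic arrays `X_all` and `y_all`, the helper `split`, and the seed 42 are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the features and labels (assumption).
rng = np.random.default_rng(0)
X_all = rng.integers(0, 2, size=(100, 5))
y_all = np.repeat(["chinese", "indian", "japanese", "korean", "thai"], 20)

def split(seed=42):
    # random_state fixes the shuffle; stratify keeps class proportions equal.
    return train_test_split(
        X_all, y_all, test_size=0.3, random_state=seed, stratify=y_all
    )

_, _, _, y_test_a = split()
_, _, _, y_test_b = split()

# Both calls return the same test labels in the same order.
print(np.array_equal(y_test_a, y_test_b))
```

With the seed fixed, any change in accuracy between two models can be attributed to the model settings rather than to the split.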

Now we can create a linear-kernel SVC model with the regularization parameter C set to 10:

In [11]:
C=10
Linear_SVC_model = SVC(kernel='linear', C=C, probability=True,random_state=0)

Training the model:

In [12]:
Linear_SVC_model.fit(X_train, np.ravel(y_train))

Now that the model is trained, we can use it to predict the classes of the test data and check its performance:

In [15]:
y_pred = Linear_SVC_model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy (train) for %s with C = %d is:   %0.1f%% " % ('Linear_SVC_model', C, accuracy * 100))
print(classification_report(y_test,y_pred))

Accuracy (train) for Linear_SVC_model with C = 10 is:   77.0% 
              precision    recall  f1-score   support

     chinese       0.66      0.71      0.68       248
      indian       0.87      0.86      0.87       234
    japanese       0.77      0.73      0.75       234
      korean       0.87      0.75      0.80       249
        thai       0.71      0.80      0.76       234

    accuracy                           0.77      1199
   macro avg       0.78      0.77      0.77      1199
weighted avg       0.78      0.77      0.77      1199



With C set to 10, the accuracy of our model is 77%. Although the printed label says "train", the score is computed on the held-out test set (X_test). Let's now set the value to 20 to see what happens:

In [18]:
C=20
Linear_SVC_model = SVC(kernel='linear', C=C, probability=True,random_state=0)


In [19]:
Linear_SVC_model.fit(X_train, np.ravel(y_train))

y_pred = Linear_SVC_model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy (train) for %s with C = %d is:   %0.1f%% " % ('Linear_SVC_model', C, accuracy * 100))
print(classification_report(y_test,y_pred))

Accuracy (train) for Linear_SVC_model with C = 20 is:   77.4% 
              precision    recall  f1-score   support

     chinese       0.67      0.71      0.69       248
      indian       0.87      0.87      0.87       234
    japanese       0.78      0.74      0.76       234
      korean       0.86      0.75      0.80       249
        thai       0.72      0.80      0.76       234

    accuracy                           0.77      1199
   macro avg       0.78      0.78      0.78      1199
weighted avg       0.78      0.77      0.77      1199



The accuracy has increased slightly to 77.4%. However, the training time has also increased, from 37 seconds to 43 seconds.
Let's change the value to 50:

In [20]:
C=50
Linear_SVC_model = SVC(kernel='linear', C=C, probability=True,random_state=0)

In [21]:
Linear_SVC_model.fit(X_train, np.ravel(y_train))

y_pred = Linear_SVC_model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy (train) for %s with C = %d is:   %0.1f%% " % ('Linear_SVC_model', C, accuracy * 100))
print(classification_report(y_test,y_pred))

Accuracy (train) for Linear_SVC_model with C = 50 is:   77.5% 
              precision    recall  f1-score   support

     chinese       0.67      0.71      0.69       248
      indian       0.87      0.88      0.88       234
    japanese       0.79      0.73      0.76       234
      korean       0.86      0.75      0.80       249
        thai       0.71      0.82      0.76       234

    accuracy                           0.77      1199
   macro avg       0.78      0.78      0.78      1199
weighted avg       0.78      0.77      0.78      1199



The model's accuracy has edged up again to 77.5%, and the training time has risen to 55 seconds.
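
The training times quoted in this notebook are not produced by any cell shown here. One way to measure them is sketched below with `time.perf_counter`. The synthetic data and the helper name `timed_fit` are assumptions. Note that `probability=True` makes `fit` slower, because scikit-learn runs an internal cross-validation to calibrate the probabilities.

```python
import time
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic stand-in data (assumption), small enough to run in seconds.
X_time, y_time = make_classification(n_samples=400, n_features=20, random_state=0)

def timed_fit(C):
    """Fit a linear-kernel SVC and return (model, seconds spent in fit)."""
    model = SVC(kernel="linear", C=C, probability=True, random_state=0)
    start = time.perf_counter()
    model.fit(X_time, y_time)
    return model, time.perf_counter() - start

fit_times = {}
for C in (1, 10, 50):
    _, fit_times[C] = timed_fit(C)
    print("C=%d  fit time=%.2fs" % (C, fit_times[C]))
```

Timings from a single run vary with machine load, so repeating each fit a few times gives a steadier picture.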

From this, it appears that a higher regularization parameter lets the model classify slightly more test samples correctly, although a gain of 0.5 percentage points amounts to only about six of the 1,199 test samples. The training time, however, grows noticeably, which is a disadvantage.

Let us now reduce the value to 5 and see:

In [22]:
C=5
Linear_SVC_model = SVC(kernel='linear', C=C, probability=True,random_state=0)

In [23]:
Linear_SVC_model.fit(X_train, np.ravel(y_train))

y_pred = Linear_SVC_model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy (train) for %s with C = %d is:   %0.1f%% " % ('Linear_SVC_model', C, accuracy * 100))
print(classification_report(y_test,y_pred))

Accuracy (train) for Linear_SVC_model with C = 5 is:   77.1% 
              precision    recall  f1-score   support

     chinese       0.69      0.70      0.69       248
      indian       0.84      0.85      0.85       234
    japanese       0.78      0.73      0.75       234
      korean       0.87      0.76      0.81       249
        thai       0.70      0.82      0.75       234

    accuracy                           0.77      1199
   macro avg       0.78      0.77      0.77      1199
weighted avg       0.78      0.77      0.77      1199



Changing the C value from 10 to 5 increases the accuracy from 77% to 77.1%. The training time also dropped from 37s to 26s.

This is pretty good!

Let's reduce the Regularization parameter to 2:

In [8]:
C=2
Linear_SVC_model = SVC(kernel='linear', C=C, probability=True,random_state=0)

In [9]:
Linear_SVC_model.fit(X_train, np.ravel(y_train))

y_pred = Linear_SVC_model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy (train) for %s with C = %d is:   %0.1f%% " % ('Linear_SVC_model', C, accuracy * 100))
print(classification_report(y_test,y_pred))

Accuracy (train) for Linear_SVC_model with C = 2 is:   79.2% 
              precision    recall  f1-score   support

     chinese       0.74      0.70      0.72       256
      indian       0.92      0.87      0.89       248
    japanese       0.74      0.78      0.76       226
      korean       0.87      0.76      0.81       253
        thai       0.71      0.87      0.78       216

    accuracy                           0.79      1199
   macro avg       0.80      0.79      0.79      1199
weighted avg       0.80      0.79      0.79      1199



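The training time has dropped a little, to 25s, and the accuracy has risen to 79.2%. Note, however, that the per-class support counts in this report (256, 248, 226, 253, 216) differ from those in the earlier runs (248, 234, 234, 249, 234), so the data was re-split before this run. Part of the jump may therefore come from the new train/test split rather than from C.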

Wow!

Let's now change the value of C to 1:

In [10]:
C=1
Linear_SVC_model = SVC(kernel='linear', C=C, probability=True,random_state=0)

In [11]:
Linear_SVC_model.fit(X_train, np.ravel(y_train))

y_pred = Linear_SVC_model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy (train) for %s with C = %d is:   %0.1f%% " % ('Linear_SVC_model', C, accuracy * 100))
print(classification_report(y_test,y_pred))

Accuracy (train) for Linear_SVC_model with C = 1 is:   79.1% 
              precision    recall  f1-score   support

     chinese       0.74      0.70      0.72       256
      indian       0.92      0.88      0.90       248
    japanese       0.73      0.77      0.75       226
      korean       0.87      0.75      0.81       253
        thai       0.71      0.87      0.78       216

    accuracy                           0.79      1199
   macro avg       0.79      0.79      0.79      1199
weighted avg       0.80      0.79      0.79      1199



The training time drops to 24s, whereas the accuracy remains essentially the same (79.1% versus 79.2%).

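This suggests that, on this dataset, a lower regularization parameter gives faster training without hurting accuracy. The accuracy gain itself should be treated with caution: the support counts in the reports show that the train/test split changed between the C=5 and C=2 runs.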
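
To judge how much of an accuracy difference could come from the split alone, one can hold C fixed and vary only the split seed. The sketch below does this on synthetic data; `X_rep`, `y_rep`, the five seeds and C=2 are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): 400 samples, 3 classes.
X_rep, y_rep = make_classification(
    n_samples=400, n_features=20, n_informative=8,
    n_classes=3, random_state=1,
)

split_scores = []
for seed in range(5):
    # Only the split changes between iterations; the model settings do not.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X_rep, y_rep, test_size=0.3, random_state=seed
    )
    model = SVC(kernel="linear", C=2).fit(X_tr, y_tr)
    split_scores.append(model.score(X_te, y_te))

print("Accuracy per split:", np.round(split_scores, 3))
print("Spread (max - min): %.3f" % (max(split_scores) - min(split_scores)))
```

If the spread across seeds is as large as the difference between two C values, that difference is not strong evidence in favor of either value.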

That's fantastic!

### Tuning Regularization and random_state Parameters

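Let's set both C and random_state to 10 and see. In scikit-learn's SVC, random_state only seeds the data shuffling used for probability estimates (probability=True).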

In [38]:
C=10
Linear_SVC_model = SVC(kernel='linear', C=C, probability=True,random_state=10)

In [39]:
Linear_SVC_model.fit(X_train, np.ravel(y_train))

y_pred = Linear_SVC_model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy (train) for %s with C = %d is:   %0.1f%% " % ('Linear_SVC_model', C, accuracy * 100))
print(classification_report(y_test,y_pred))

Accuracy (train) for Linear_SVC_model with C = 10 is:   79.0% 
              precision    recall  f1-score   support

     chinese       0.73      0.70      0.71       256
      indian       0.90      0.88      0.89       248
    japanese       0.75      0.76      0.75       226
      korean       0.83      0.75      0.79       253
        thai       0.73      0.87      0.80       216

    accuracy                           0.79      1199
   macro avg       0.79      0.79      0.79      1199
weighted avg       0.79      0.79      0.79      1199



The accuracy has dropped slightly to 79.0%.

Let's change both to 5:

In [43]:
C=5
Linear_SVC_model = SVC(kernel='linear', C=C, probability=True,random_state=5)

In [44]:
Linear_SVC_model.fit(X_train, np.ravel(y_train))

y_pred = Linear_SVC_model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy (train) for %s with C = %d is:   %0.1f%% " % ('Linear_SVC_model', C, accuracy * 100))
print(classification_report(y_test,y_pred))

Accuracy (train) for Linear_SVC_model with C = 5 is:   79.1% 
              precision    recall  f1-score   support

     chinese       0.73      0.71      0.72       256
      indian       0.93      0.87      0.90       248
    japanese       0.72      0.76      0.74       226
      korean       0.87      0.76      0.81       253
        thai       0.72      0.88      0.79       216

    accuracy                           0.79      1199
   macro avg       0.80      0.79      0.79      1199
weighted avg       0.80      0.79      0.79      1199



The accuracy has now gone back to 79.1%, but training is slower.

Let's set both to 3:

In [45]:
C=3
Linear_SVC_model = SVC(kernel='linear', C=C, probability=True,random_state=3)

In [46]:
Linear_SVC_model.fit(X_train, np.ravel(y_train))

y_pred = Linear_SVC_model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy (train) for %s with C = %d is:   %0.1f%% " % ('Linear_SVC_model', C, accuracy * 100))
print(classification_report(y_test,y_pred))

Accuracy (train) for Linear_SVC_model with C = 3 is:   79.4% 
              precision    recall  f1-score   support

     chinese       0.72      0.70      0.71       256
      indian       0.92      0.88      0.89       248
    japanese       0.76      0.78      0.77       226
      korean       0.87      0.76      0.81       253
        thai       0.72      0.87      0.79       216

    accuracy                           0.79      1199
   macro avg       0.80      0.80      0.79      1199
weighted avg       0.80      0.79      0.79      1199



Wow! The accuracy is now 79.4%

Let's check when C=2 and random_state=3:

In [51]:
C=2
Linear_SVC_model = SVC(kernel='linear', C=C, probability=True,random_state=3)

In [52]:
Linear_SVC_model.fit(X_train, np.ravel(y_train))

y_pred = Linear_SVC_model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy (train) for %s with C = %d is:   %0.1f%% " % ('Linear_SVC_model', C, accuracy * 100))
print(classification_report(y_test,y_pred))

Accuracy (train) for Linear_SVC_model with C = 2 is:   79.2% 
              precision    recall  f1-score   support

     chinese       0.74      0.70      0.72       256
      indian       0.92      0.87      0.89       248
    japanese       0.74      0.78      0.76       226
      korean       0.87      0.76      0.81       253
        thai       0.71      0.87      0.78       216

    accuracy                           0.79      1199
   macro avg       0.80      0.79      0.79      1199
weighted avg       0.80      0.79      0.79      1199



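The accuracy is 79.2%, and the whole report is identical to the earlier C=2 run with random_state=0, which suggests that random_state does not change the predictions here.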
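
The scikit-learn documentation describes `random_state` in `SVC` as controlling the shuffling used for probability estimates, so it would not be expected to alter what `predict` returns. A small check of that expectation, on synthetic data (`X_rs`, `y_rs` and the three seeds are assumptions):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): 300 samples, 3 classes.
X_rs, y_rs = make_classification(
    n_samples=300, n_features=20, n_informative=8,
    n_classes=3, random_state=0,
)

preds = {}
for seed in (0, 3, 10):
    # Same C and same data; only the SVC random_state differs.
    model = SVC(kernel="linear", C=2, probability=True, random_state=seed)
    preds[seed] = model.fit(X_rs, y_rs).predict(X_rs)

same = all(np.array_equal(preds[0], p) for p in preds.values())
print("Identical predictions across seeds:", same)
```

If the predictions match across seeds, differences between runs that change both C and random_state can be attributed to C or to the data split.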

### Conclusion

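We can say that, among the settings tried, our model performs best with an accuracy of 79.4% when C is set to 3. Because the C=2 results were identical for random_state=0 and random_state=3, the improvement is most likely due to C rather than to random_state.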