<a href="https://colab.research.google.com/github/shreyashrestha07/CUS615_ShreyaShrestha/blob/master/Problem_set_03_SVM_Classifer.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This notebook is part of Dr. Christoforos Christoforou's course materials. You may not, nor may you knowingly allow others to reproduce or distribute lecture notes, course materials or any of their derivatives without the instructor's express written consent.

# Problem Set 03 - Support Vector Machines Classifiers
**Professor:** Dr. Christoforos Christoforou

For this problem set you will need the following libraries, which are pre-installed with the colab environment: 

* [Numpy](https://www.numpy.org/) is an array manipulation library, used for linear algebra, Fourier transform, and random number capabilities.
* [Pandas](https://pandas.pydata.org/) is a library for data manipulation and data analysis.
* [Matplotlib](https://matplotlib.org/) is a library for generating figures and plots.

You can load them using the following import statement:

In [3]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


from sklearn.model_selection import train_test_split
from sklearn import datasets
from sklearn import metrics
from sklearn.metrics import confusion_matrix
from sklearn.metrics import ConfusionMatrixDisplay  # plot_confusion_matrix was removed in scikit-learn 1.2
from sklearn.metrics import classification_report


from sklearn.svm import SVC
from sklearn.svm import LinearSVC



## 1. Objective 
As part of this problem set, you will work on the `wine quality dataset` in order to: 
- Explore the physicochemical properties of red wine
- Determine an optimal machine learning model for red wine quality classification

For that, you will be using a Support Vector Machine (SVM) classifier. Review the information provided in the problem set, and complete all challenges listed.  

## 2. Wine Quality Dataset - Data Description

For this dataset you will be using the `wine quality dataset`. Below is a description of the various parameters listed in that dataset (i.e. potential features):

* fixed.acidity (tartaric acid - g / dm^3): most acids involved with wine are fixed or nonvolatile (do not evaporate readily) 
* volatile.acidity (acetic acid - g / dm^3): the amount of acetic acid in wine, which at too high of levels can lead to an unpleasant, vinegar taste 
* citric.acid (g / dm^3): found in small quantities, citric acid can add 'freshness' and flavor to wines 
* residual.sugar (g / dm^3): the amount of sugar remaining after fermentation stops; it's rare to find wines with less than 1 gram/liter, and wines with greater than 45 grams/liter are considered sweet 
* chlorides (sodium chloride - g / dm^3): the amount of salt in the wine 
* free.sulfur.dioxide (mg / dm^3): the free form of SO2 exists in equilibrium between molecular SO2 (as a dissolved gas) and bisulfite ion; it prevents microbial growth and the oxidation of wine 
* total.sulfur.dioxide (mg / dm^3): amount of free and bound forms of SO2; in low concentrations, SO2 is mostly undetectable in wine, but at free SO2 concentrations over 50 ppm, SO2 becomes evident in the nose and taste of wine 
* density (g / cm^3): the density of wine is close to that of water, depending on the percent alcohol and sugar content 
* pH: describes how acidic or basic a wine is on a scale from 0 (very acidic) to 14 (very basic); most wines are between 3-4 on the pH scale 
* sulphates (potassium sulphate - g / dm^3): a wine additive which can contribute to sulfur dioxide gas (SO2) levels, which acts as an antimicrobial and antioxidant 
* alcohol (% by volume): the percent alcohol content of the wine 
* quality: quality score between 0 and 10



## Download dataset from kaggle
You will use the Kaggle CLI to download the `Wine Quality Dataset` to your Colab environment. You will need to upload your Kaggle API key (see Problem Set 01 for directions on how to obtain your API key). 

In [4]:
# install kaggle CLI
!pip install -q kaggle

In [5]:
# Upload the kaggle API key of your account 
from google.colab import files 
files.upload()
!mkdir ~/.kaggle
!cp kaggle.json ~/.kaggle
!chmod 600 ~/.kaggle/kaggle.json

Saving kaggle.json to kaggle (2).json
mkdir: cannot create directory ‘/root/.kaggle’: File exists


In [6]:
# View list of data files available in the dataset. 
# Format: kaggle datasets files <dataset-URI>
!kaggle datasets files cchristoforou/practice-dataset-for-tutorials

name                  size  creationDate         
-------------------  -----  -------------------  
wine.data             11KB  2021-01-23 15:26:18  
countries.csv          2KB  2021-01-23 15:26:18  
country_total.csv    533KB  2021-01-23 15:26:18  
wineQualityReds.csv   92KB  2021-01-23 15:26:18  


In [7]:
# Download - Specify the parameters.  
kaggle_dataset_URI = "cchristoforou/practice-dataset-for-tutorials"
output_folder = "sample_data/problem_set02"
kaggle_data_file1 = "wineQualityReds.csv"

In [8]:
# Download the file specified above (wineQualityReds.csv) from the dataset
!kaggle datasets download $kaggle_dataset_URI --file $kaggle_data_file1 --path $output_folder 


wineQualityReds.csv: Skipping, found more recently modified local copy (use --force to force download)


## Load the data 
The code below shows how to load the data into a pandas `DataFrame` and apply `train_test_split` to the data. 

In [9]:
# Code to load the data from file. Here we use the pandas library to read the csv file. 
datafile = "./sample_data/problem_set02/wineQualityReds.csv"
wine_df = pd.read_csv(datafile)
wine_df.drop(wine_df.columns[0],axis=1,inplace=True)
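
The first column of the CSV file is a row index, which is why the cell above drops it. As a minimal sketch of an equivalent alternative, `pd.read_csv` can be told to treat that column as the index instead. The tiny in-memory CSV below is a hypothetical stand-in for `wineQualityReds.csv` and only contains three of its columns.

```python
import io
import pandas as pd

# Hypothetical stand-in for wineQualityReds.csv: the first, unnamed column is a row index.
csv_text = """,fixed.acidity,volatile.acidity,quality
1,7.4,0.70,5
2,7.8,0.88,5
3,11.2,0.28,6
"""

# Option 1 (as in the notebook): load everything, then drop the first column.
df_drop = pd.read_csv(io.StringIO(csv_text))
df_drop = df_drop.drop(df_drop.columns[0], axis=1)

# Option 2: tell pandas that the first column holds the row index.
df_index = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(df_drop.columns))
print(list(df_index.columns))
```

Both options leave the same feature and target columns; they differ only in whether the original row numbers are kept as the index.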

In [10]:
# Split the data into a training and testing set using the sklearn function train_test_split
# Note that the 'quality' column is the target (y); all remaining columns are the features (X)
X_train, X_test, y_train, y_test = train_test_split(wine_df.drop('quality',axis=1), wine_df['quality'], test_size=.25, random_state=42)
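
The split above is purely random, so rare classes can end up unevenly distributed between the training and testing sets. A possible variation, not used in this notebook, is the `stratify` argument of `train_test_split`, which keeps the class proportions the same in both parts. The sketch below uses synthetic data (the arrays `X_demo` and `y_demo` are assumptions for illustration only).

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced toy data: 160 samples of class 0 and 40 samples of class 1.
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(200, 3))
y_demo = np.array([0] * 160 + [1] * 40)

# stratify=y_demo asks for the same class proportions in both splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=42, stratify=y_demo
)

print("train counts per class:", np.bincount(y_tr))
print("test counts per class: ", np.bincount(y_te))
```

Stratification requires at least two samples in every class, so it may need extra care with very rare quality scores.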


## Challenge 1
Use the variables `X_train`, `X_test`, `y_train`, and `y_test` to explore your data. In particular, calculate and display the following information.

* Number of samples in the training set in total and in each class.
* Number of samples in the testing set in total and in each class.
* Number of features in the dataset. 
* Number of classes in the dataset.
* IDs (labels) of the classes.


In [11]:
# Solution 1:

wine_df_train, wine_df_features = X_train.shape 
wine_df_test, _ = X_test.shape
wine_df_classes = len(np.unique(y_train))

#1 Number of samples in the training set in total and in each class
print("Number of samples in training set:  %d \n(%d : Class 3, %d : Class 4, %d : Class 5, %d : Class 6, %d : Class 7, %d : Class 8) " % (wine_df_train, np.sum(y_train==3), np.sum(y_train==4), np.sum(y_train==5), np.sum(y_train==6), np.sum(y_train==7), np.sum(y_train==8)))

#2 Number of samples in the testing set in total and in each class
print("Number of samples in testing set: %d \n(%d : Class 3, %d : Class 4, %d : Class 5, %d : Class 6, %d : Class 7, %d : Class 8)" % (wine_df_test, np.sum(y_test==3), np.sum(y_test==4), np.sum(y_test==5), np.sum(y_test==6), np.sum(y_test==7), np.sum(y_test==8)))

#3 Number of features in the dataset.
print("Number of features: "+ str(wine_df_features))

#4 Number of classes in the dataset.
print("Number of classes: " + str(wine_df_classes))

#5 IDs (labels) of the classes.
print("IDs for class labels: " + str(np.unique(y_train)))




Number of samples in training set:  1199 
(9 : Class 3, 40 : Class 4, 517 : Class 5, 469 : Class 6, 151 : Class 7, 13 : Class 8) 
Number of samples in testing set: 400 
(1 : Class 3, 13 : Class 4, 164 : Class 5, 169 : Class 6, 48 : Class 7, 5 : Class 8)
Number of features: 11
Number of classes: 6
IDs for class labels: [3 4 5 6 7 8]
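
The per-class counts above are computed one class at a time with `np.sum`. As a sketch of a more compact alternative, `value_counts` on a pandas `Series` returns all counts at once and does not require the class labels to be hard-coded. The series `y_demo` below is a hypothetical stand-in for `y_train`.

```python
import pandas as pd

# Hypothetical stand-in for y_train (quality scores).
y_demo = pd.Series([5, 6, 5, 7, 6, 5, 4])

# Count the samples in each class and order the result by class label.
counts = y_demo.value_counts().sort_index()

print(counts)
print("Number of classes:", counts.size)
print("IDs for class labels:", counts.index.tolist())
```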


# Challenge 2

Train an **SVM** classifier using the `(X_train,y_train)` dataset and use the trained model to predict the underlying classes for the observations in the test dataset `X_test`. Store your predictions in a variable called `y_pred`.

In [12]:
# Create the linear SVM classifier
model = LinearSVC()

# Fit the model on the training data
model.fit(X_train, y_train)




LinearSVC(C=1.0, class_weight=None, dual=True, fit_intercept=True,
          intercept_scaling=1, loss='squared_hinge', max_iter=1000,
          multi_class='ovr', penalty='l2', random_state=None, tol=0.0001,
          verbose=0)

In [13]:
# To predict output
y_pred = model.predict(X_test)
print(f"x_test.shape {X_test.shape} y_pred.shape{y_pred.shape}")

x_test.shape (400, 11) y_pred.shape(400,)


In [14]:
# Inspect the content of y_pred
y_pred

array([5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 7, 5, 5, 5, 5, 7, 5, 5, 7, 5, 5, 5,
       5, 7, 5, 5, 7, 5, 5, 7, 5, 5, 5, 5, 5, 5, 6, 5, 5, 6, 5, 5, 7, 5,
       5, 6, 7, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 5, 5, 5, 5, 5, 5, 5, 6, 5,
       7, 5, 7, 5, 7, 5, 5, 5, 6, 5, 5, 5, 5, 5, 5, 7, 5, 5, 5, 5, 5, 5,
       7, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
       7, 5, 5, 5, 5, 5, 6, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
       5, 7, 5, 5, 5, 5, 5, 5, 5, 5, 6, 5, 5, 5, 6, 5, 5, 5, 5, 6, 5, 5,
       5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 7, 5, 5, 5,
       5, 5, 5, 5, 5, 5, 5, 7, 5, 5, 5, 5, 7, 5, 5, 5, 5, 6, 5, 5, 5, 5,
       5, 5, 5, 5, 5, 5, 6, 5, 5, 5, 5, 5, 5, 5, 5, 5, 7, 5, 5, 5, 5, 5,
       5, 5, 5, 5, 5, 5, 5, 7, 5, 5, 5, 5, 7, 6, 5, 5, 5, 5, 5, 5, 5, 5,
       5, 5, 5, 5, 6, 5, 7, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 5, 7, 5, 7,
       6, 6, 5, 5, 5, 7, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 7, 5, 5, 5, 5, 5,
       5, 5, 5, 5, 5, 7, 5, 5, 5, 5, 5, 7, 5, 7, 5,
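
Almost all predictions above fall into class 5. Linear SVMs are sensitive to the scale of the input features, and the wine features span very different numeric ranges, so one common remedy (not part of the solution above) is to standardize the features before fitting. The following is a minimal sketch on synthetic data; the data, the `max_iter` value, and all variable names are assumptions, and no claim is made about how much this would change the results on the wine dataset.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic 3-class problem with 11 features placed on very different scales.
X_demo, y_demo = make_classification(
    n_samples=400, n_features=11, n_informative=5, n_classes=3, random_state=42
)
X_demo = X_demo * np.logspace(-2, 2, 11)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=42
)

# The scaler is fitted on the training data only and re-applied at prediction time.
scaled_svm = make_pipeline(StandardScaler(), LinearSVC(max_iter=10000))
scaled_svm.fit(X_tr, y_tr)

y_pred_demo = scaled_svm.predict(X_te)
acc_demo = scaled_svm.score(X_te, y_te)
print(y_pred_demo.shape, acc_demo)
```

Wrapping the scaler and the classifier in one pipeline avoids leaking test-set statistics into the scaling step.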

# Challenge 3

Evaluate the performance of your classifier. Calculate and display the following:
* print the `confusion matrix`.
* `normalized confusion matrix`. 
* the probability of correct classification (accuracy score). 
* the `precision`, `recall`, and `f1-score` for each class.

In [15]:
y_true = y_test

In [16]:

# Print the confusion matrix  
print("\n This is the confusion matrix")
cnf_mx = metrics.confusion_matrix(y_true, y_pred)
print(cnf_mx)

# Normalized confusion matrix (each count divided by the total number of test samples)
print("\n This is the normalized confusion matrix")
cnf_mx_joint = cnf_mx.astype('float')/cnf_mx.sum()
print(cnf_mx_joint)




 This is the confusion matrix
[[  0   0   1   0   0   0]
 [  0   0  13   0   0   0]
 [  0   0 159   1   4   0]
 [  0   0 141  13  15   0]
 [  0   0  25   7  16   0]
 [  0   0   1   1   3   0]]

 This is the normalized confusion matrix
[[0.     0.     0.0025 0.     0.     0.    ]
 [0.     0.     0.0325 0.     0.     0.    ]
 [0.     0.     0.3975 0.0025 0.01   0.    ]
 [0.     0.     0.3525 0.0325 0.0375 0.    ]
 [0.     0.     0.0625 0.0175 0.04   0.    ]
 [0.     0.     0.0025 0.0025 0.0075 0.    ]]
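
The normalization above divides every entry by the total number of test samples, so the entries are joint proportions. Another common convention is to divide each row by its own sum, which puts the per-class recall on the diagonal. The sketch below applies both conventions to the confusion matrix printed above (rows are the true classes 3 to 8, columns the predicted classes). Recent scikit-learn versions also offer a `normalize` argument on `confusion_matrix` for the same purpose.

```python
import numpy as np

# Confusion matrix copied from the output above (rows: true class, columns: predicted class).
cnf_mx = np.array([
    [0, 0,   1,  0,  0, 0],
    [0, 0,  13,  0,  0, 0],
    [0, 0, 159,  1,  4, 0],
    [0, 0, 141, 13, 15, 0],
    [0, 0,  25,  7, 16, 0],
    [0, 0,   1,  1,  3, 0],
])

# Joint normalization: every entry divided by the total number of samples.
joint = cnf_mx / cnf_mx.sum()

# Row normalization: every row divided by the number of true samples in that class.
per_class = cnf_mx / cnf_mx.sum(axis=1, keepdims=True)

# The diagonal of the row-normalized matrix is the recall of each class.
recall = np.diag(per_class)

# Accuracy is the sum of the diagonal counts divided by the total.
accuracy = np.trace(cnf_mx) / cnf_mx.sum()

print(np.round(recall, 2))
print(accuracy)
```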


In [17]:
# The probability of correct classification (accuracy score).
acc = metrics.accuracy_score(y_true, y_pred)
print("\n Accuracy: %.3f" % acc)


 Accuracy: 0.470


In [18]:
# Report classification results (precision, recall, and f1-score for each class)
print(classification_report(y_test,y_pred)) 

              precision    recall  f1-score   support

           3       0.00      0.00      0.00         1
           4       0.00      0.00      0.00        13
           5       0.47      0.97      0.63       164
           6       0.59      0.08      0.14       169
           7       0.42      0.33      0.37        48
           8       0.00      0.00      0.00         5

    accuracy                           0.47       400
   macro avg       0.25      0.23      0.19       400
weighted avg       0.49      0.47      0.36       400
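
Classes 3, 4, and 8 are never predicted by the model, so their precision is undefined (a division by zero) and is reported as 0.00; scikit-learn emits a warning in that situation. In scikit-learn 0.22 and later, the `zero_division` argument makes this choice explicit. The labels below are made up purely for illustration.

```python
from sklearn.metrics import classification_report

# Made-up labels: classes 3 and 7 occur in y_true_demo but are never predicted.
y_true_demo = [3, 5, 5, 6, 6, 7]
y_pred_demo = [5, 5, 5, 6, 5, 6]

# zero_division=0 reports undefined precision as 0.0 instead of warning about it.
print(classification_report(y_true_demo, y_pred_demo, zero_division=0))

# The same report as a dictionary, which is convenient for further processing.
report_dict = classification_report(
    y_true_demo, y_pred_demo, zero_division=0, output_dict=True
)
print(report_dict["3"]["precision"])
```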





# Challenge 4

The code below loads the same dataset, but treats it as a binary classification problem. That is, instead of classifying an observation into one of 11 categories (scores 0 to 10), we consider all observations with a score above 5 as good and all observations with a score of 5 or below as bad.





In [19]:
# Code to load the data from file. Here we use the pandas library to read the csv file. 
datafile = "./sample_data/problem_set02/wineQualityReds.csv"
wine_df = pd.read_csv(datafile)
wine_df.drop(wine_df.columns[0],axis=1,inplace=True)

wine_df['quality'] = np.where(wine_df['quality']>5,"Good","Bad")
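
The relabeling relies on `np.where(condition, value_if_true, value_if_false)`, applied element by element. A small sketch with made-up scores shows where the boundary falls: a score of exactly 5 is mapped to `"Bad"`.

```python
import numpy as np
import pandas as pd

# Made-up quality scores, including the boundary value 5.
quality_demo = pd.Series([3, 5, 6, 8])

# Scores strictly above 5 become "Good"; everything else becomes "Bad".
labels_demo = np.where(quality_demo > 5, "Good", "Bad")

print(labels_demo)
```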

In [20]:
wine_df.head()

|   | fixed.acidity | volatile.acidity | citric.acid | residual.sugar | chlorides | free.sulfur.dioxide | total.sulfur.dioxide | density | pH | sulphates | alcohol | quality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7.4 | 0.7 | 0.0 | 1.9 | 0.076 | 11.0 | 34.0 | 0.9978 | 3.51 | 0.56 | 9.4 | Bad |
| 1 | 7.8 | 0.88 | 0.0 | 2.6 | 0.098 | 25.0 | 67.0 | 0.9968 | 3.2 | 0.68 | 9.8 | Bad |
| 2 | 7.8 | 0.76 | 0.04 | 2.3 | 0.092 | 15.0 | 54.0 | 0.997 | 3.26 | 0.65 | 9.8 | Bad |
| 3 | 11.2 | 0.28 | 0.56 | 1.9 | 0.075 | 17.0 | 60.0 | 0.998 | 3.16 | 0.58 | 9.8 | Good |
| 4 | 7.4 | 0.7 | 0.0 | 1.9 | 0.076 | 11.0 | 34.0 | 0.9978 | 3.51 | 0.56 | 9.4 | Bad |


In [21]:
X_train, X_test, y_train, y_test = train_test_split(wine_df.drop('quality',axis=1), wine_df['quality'], test_size=.25, random_state=42)


## Challenge 4.1
Use the variables `X_train`, `X_test`, `y_train`, and `y_test` to explore your data. In particular, calculate and display the following information.
* Number of samples in the training set in total and in each class.
* Number of samples in the testing set in total and in each class.
* Number of features in the dataset. 
* Number of classes in the dataset.
* IDs (labels) of the classes.




In [22]:
# Solution 4.1:

wine_df_train, wine_df_features = X_train.shape 
wine_df_test, _ = X_test.shape
wine_df_classes = len(np.unique(y_train))

#1 Number of samples in the training set in total and in each class
print("Number of samples in training set:  %d  (%d Good, %d Bad) " % (wine_df_train, np.sum(y_train=="Good"), np.sum(y_train=="Bad")))

#2 Number of samples in the testing set in total and in each class
print("Number of samples in testing set:  %d  (%d Good, %d Bad) " % (wine_df_test, np.sum(y_test=="Good"), np.sum(y_test=="Bad")))

#3 Number of features in the dataset.
print("Number of features: "+ str(wine_df_features))

#4 Number of classes in the dataset.
print("Number of classes: " + str(wine_df_classes))

#5 IDs (labels) of the classes.
print("IDs for class labels: " + str(np.unique(y_train)))


Number of samples in training set:  1199  (633 Good, 566 Bad) 
Number of samples in testing set:  400  (222 Good, 178 Bad) 
Number of features: 11
Number of classes: 2
IDs for class labels: ['Bad' 'Good']


## Challenge 4.2 
Train a **Support Vector Machine** classifier using the `(X_train,y_train)` dataset and use the trained model to predict the underlying classes for the observations in the test dataset `X_test`. Store your predictions in a variable called `y_pred`.

In [23]:
# Create the linear SVM classifier
model = LinearSVC()

# Fit the model on the training data
model.fit(X_train, y_train)




LinearSVC(C=1.0, class_weight=None, dual=True, fit_intercept=True,
          intercept_scaling=1, loss='squared_hinge', max_iter=1000,
          multi_class='ovr', penalty='l2', random_state=None, tol=0.0001,
          verbose=0)

In [24]:
# To predict output
y_pred = model.predict(X_test)
print(f"x_test.shape {X_test.shape} y_pred.shape{y_pred.shape}")

# Inspect the content of y_pred
y_pred

x_test.shape (400, 11) y_pred.shape(400,)


array(['Bad', 'Bad', 'Good', 'Good', 'Good', 'Bad', 'Bad', 'Bad', 'Good',
       'Good', 'Good', 'Bad', 'Good', 'Bad', 'Bad', 'Good', 'Bad', 'Bad',
       'Good', 'Bad', 'Bad', 'Bad', 'Good', 'Good', 'Bad', 'Bad', 'Good',
       'Bad', 'Bad', 'Good', 'Bad', 'Bad', 'Good', 'Bad', 'Good', 'Bad',
       'Good', 'Good', 'Bad', 'Good', 'Bad', 'Bad', 'Good', 'Bad', 'Good',
       'Good', 'Good', 'Bad', 'Bad', 'Good', 'Bad', 'Bad', 'Good', 'Good',
       'Bad', 'Bad', 'Good', 'Bad', 'Good', 'Bad', 'Bad', 'Good', 'Bad',
       'Bad', 'Good', 'Bad', 'Good', 'Bad', 'Good', 'Bad', 'Good', 'Bad',
       'Good', 'Good', 'Good', 'Bad', 'Good', 'Good', 'Good', 'Good',
       'Bad', 'Good', 'Good', 'Bad', 'Good', 'Good', 'Bad', 'Good',
       'Good', 'Bad', 'Good', 'Bad', 'Good', 'Good', 'Bad', 'Good', 'Bad',
       'Good', 'Bad', 'Good', 'Bad', 'Bad', 'Good', 'Good', 'Good',
       'Good', 'Bad', 'Bad', 'Good', 'Bad', 'Good', 'Bad', 'Good', 'Bad',
       'Good', 'Good', 'Good', 'Bad', 'Bad', 'Good', 

## Challenge 4.3
Evaluate the performance of your classifier. Calculate and display the following:
* print the `confusion matrix`.
* `normalized confusion matrix`. 
* the probability of correct classification (accuracy score). 
* the `precision`, `recall`, and `f1-score` for each class.

In [25]:
y_true = y_test

In [26]:

# Print the confusion matrix  
print("\n This is the confusion matrix")
cnf_mx = metrics.confusion_matrix(y_true, y_pred)
print(cnf_mx)

# Normalized confusion matrix (each count divided by the total number of test samples)
print("\n This is the normalized confusion matrix")
cnf_mx_joint = cnf_mx.astype('float')/cnf_mx.sum()
print(cnf_mx_joint)



 This is the confusion matrix
[[132  46]
 [ 65 157]]

 This is the normalized confusion matrix
[[0.33   0.115 ]
 [0.1625 0.3925]]


In [27]:
# The probability of correct classification (accuracy score).
acc = metrics.accuracy_score(y_true, y_pred)
print("\n Accuracy: %.3f" % acc)


 Accuracy: 0.723


In [28]:
# Report classification results (precision, recall, and f1-score for each class)
print(classification_report(y_test,y_pred)) 

              precision    recall  f1-score   support

         Bad       0.67      0.74      0.70       178
        Good       0.77      0.71      0.74       222

    accuracy                           0.72       400
   macro avg       0.72      0.72      0.72       400
weighted avg       0.73      0.72      0.72       400
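
The figures in this report can be reproduced by hand from the binary confusion matrix printed earlier in this challenge. The sketch below does the arithmetic with NumPy, assuming the usual scikit-learn layout: rows are the true classes and columns the predicted classes, both in the order `Bad`, `Good`.

```python
import numpy as np

# Confusion matrix from Challenge 4.3 (rows: true Bad/Good, columns: predicted Bad/Good).
cnf_mx = np.array([
    [132,  46],
    [ 65, 157],
])

correct = np.diag(cnf_mx)  # correctly classified samples per class

accuracy = correct.sum() / cnf_mx.sum()    # (132 + 157) / 400
precision = correct / cnf_mx.sum(axis=0)   # column sums: everything predicted as that class
recall = correct / cnf_mx.sum(axis=1)      # row sums: everything truly in that class
f1 = 2 * precision * recall / (precision + recall)

print("accuracy :", accuracy)
print("precision:", np.round(precision, 2))
print("recall   :", np.round(recall, 2))
print("f1-score :", np.round(f1, 2))
```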



# Challenge 5

The **SVM** classifier accepts a number of parameters. These include `C` (the regularization parameter), `kernel`, which specifies the kernel function to be used, and `gamma`, which sets the kernel coefficient for certain kernels (`rbf`, `poly` and `sigmoid`). You can find more information about the parameters of the SVM classifier implementation on the following pages:

- [SVM documentation on sklearn](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html#sklearn.svm.SVC)
- [User Guide on Support Vector Machines](https://scikit-learn.org/stable/modules/svm.html#svm-classification)
- [Kernel Function Supported by sklearn library](https://scikit-learn.org/stable/modules/svm.html#svm-kernels)


After reading the documentation to understand how the various parameters are used, evaluate the classifier for different values of the `C`, `gamma` and `kernel` parameters and identify which configuration achieves the best performance on the testing set. Plot or print your results.
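
Trying configurations one cell at a time works, but the search can also be automated. The sketch below uses `GridSearchCV` with a scaling pipeline on synthetic data. Note that it swaps in 5-fold cross-validation on the training set for the test-set comparison the challenge asks for, so that the test set is only used once at the end. The data, the grid values, and the variable names are all assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic binary problem standing in for the wine data.
X_demo, y_demo = make_classification(
    n_samples=300, n_features=11, n_informative=5, random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=42
)

# make_pipeline names the SVC step "svc", hence the "svc__" prefix in the grid.
pipe = make_pipeline(StandardScaler(), SVC())
param_grid = {
    "svc__C": [0.1, 1, 10],
    "svc__gamma": ["scale", 0.01, 0.1],
    "svc__kernel": ["rbf", "sigmoid"],
}

# Every combination is scored by 5-fold cross-validation on the training set.
search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_tr, y_tr)

test_acc = search.score(X_te, y_te)
print("best parameters:", search.best_params_)
print("best CV accuracy:", search.best_score_)
print("test accuracy of the best model:", test_acc)
```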


In [29]:
# Support Vector Machine - Kernel  
model = SVC(C=10, kernel="rbf", gamma=25)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Report classification results
print(classification_report(y_test,y_pred)) 



              precision    recall  f1-score   support

         Bad       1.00      0.19      0.32       178
        Good       0.61      1.00      0.76       222

    accuracy                           0.64       400
   macro avg       0.80      0.60      0.54       400
weighted avg       0.78      0.64      0.56       400



In [31]:
# Support Vector Machine - Kernel  
model = SVC(C=10, kernel="sigmoid", gamma=25)

# To fit the model
model.fit(X_train, y_train)

# To predict output
y_pred = model.predict(X_test)

# Report classification results
print(classification_report(y_test,y_pred)) 

              precision    recall  f1-score   support

         Bad       0.00      0.00      0.00       178
        Good       0.56      1.00      0.71       222

    accuracy                           0.56       400
   macro avg       0.28      0.50      0.36       400
weighted avg       0.31      0.56      0.40       400
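
One plausible explanation for a model that predicts `Good` for every sample is kernel saturation on unscaled features; this is offered as a hypothesis, not a verified diagnosis. With `gamma=25`, the sigmoid kernel `tanh(gamma * <x, x'> + coef0)` evaluates to 1 for every pair of wines, and the RBF kernel `exp(-gamma * ||x - x'||^2)` evaluates to 0 for every pair of distinct wines, so neither carries usable similarity information. The sketch below checks this numerically on the feature values of the first four rows shown earlier by `wine_df.head()`.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel, sigmoid_kernel

# Unscaled feature values of rows 0 to 3 from the wine_df.head() output.
X_demo = np.array([
    [7.4,  0.70, 0.00, 1.9, 0.076, 11.0, 34.0, 0.9978, 3.51, 0.56, 9.4],
    [7.8,  0.88, 0.00, 2.6, 0.098, 25.0, 67.0, 0.9968, 3.20, 0.68, 9.8],
    [7.8,  0.76, 0.04, 2.3, 0.092, 15.0, 54.0, 0.9970, 3.26, 0.65, 9.8],
    [11.2, 0.28, 0.56, 1.9, 0.075, 17.0, 60.0, 0.9980, 3.16, 0.58, 9.8],
])

# Sigmoid kernel with gamma=25: the inner products are large, so tanh saturates at 1.
K_sigmoid = sigmoid_kernel(X_demo, gamma=25)

# RBF kernel with gamma=25: distinct wines are so far apart that the kernel underflows to 0.
K_rbf = rbf_kernel(X_demo, gamma=25)

print(K_sigmoid)
print(K_rbf)
```

A constant kernel matrix cannot separate the classes, and an identity-like kernel matrix only memorizes the training points, which would be consistent with the reports in this section.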





In [32]:
# Support Vector Machine - Kernel  
model = SVC(C=15, kernel="rbf", gamma=10)

# To fit the model
model.fit(X_train, y_train)

# To predict output
y_pred = model.predict(X_test)

# Report classification results
print(classification_report(y_test,y_pred)) 

              precision    recall  f1-score   support

         Bad       0.97      0.20      0.33       178
        Good       0.61      1.00      0.76       222

    accuracy                           0.64       400
   macro avg       0.79      0.60      0.55       400
weighted avg       0.77      0.64      0.57       400



In [44]:
# Support Vector Machine - Kernel  
model = SVC(C=20, kernel="rbf", gamma=50)

# To fit the model
model.fit(X_train, y_train)

# To predict output
y_pred = model.predict(X_test)

# Report classification results
print(classification_report(y_test,y_pred))

              precision    recall  f1-score   support

         Bad       1.00      0.19      0.31       178
        Good       0.60      1.00      0.75       222

    accuracy                           0.64       400
   macro avg       0.80      0.59      0.53       400
weighted avg       0.78      0.64      0.56       400



In [34]:
# Support Vector Machine - Kernel  
model = SVC(C=15, kernel="sigmoid", gamma=10)

# To fit the model
model.fit(X_train, y_train)

# To predict output
y_pred = model.predict(X_test)

# Report classification results
print(classification_report(y_test,y_pred))

              precision    recall  f1-score   support

         Bad       0.00      0.00      0.00       178
        Good       0.56      1.00      0.71       222

    accuracy                           0.56       400
   macro avg       0.28      0.50      0.36       400
weighted avg       0.31      0.56      0.40       400






Copyright Statement: Copyright © 2020 Christoforou. The materials provided by the instructor of this course, including this notebook, are for the use of the students enrolled in the course. Materials are presented in an educational context for personal use and study and should not be shared, distributed, disseminated or sold in print — or digitally — outside the course without permission. You may not, nor may you knowingly allow others to reproduce or distribute lecture notes, course materials as well as any of their derivatives without the instructor's express written consent.

In [35]:
# Support Vector Machine - Kernel  
model = SVC(C=20, kernel="rbf", gamma=15)

# To fit the model
model.fit(X_train, y_train)

# To predict output
y_pred = model.predict(X_test)

# Report classification results
print(classification_report(y_test,y_pred))

              precision    recall  f1-score   support

         Bad       0.97      0.19      0.32       178
        Good       0.61      1.00      0.75       222

    accuracy                           0.64       400
   macro avg       0.79      0.59      0.54       400
weighted avg       0.77      0.64      0.56       400



In [47]:
# Support Vector Machine - Kernel  without gamma
model = SVC(C=10, kernel="rbf")

# To fit the model
model.fit(X_train, y_train)

# To predict output
y_pred = model.predict(X_test)

# Report classification results
print(classification_report(y_test,y_pred)) 

              precision    recall  f1-score   support

         Bad       0.64      0.69      0.66       178
        Good       0.73      0.69      0.71       222

    accuracy                           0.69       400
   macro avg       0.69      0.69      0.69       400
weighted avg       0.69      0.69      0.69       400
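
When `gamma` is not passed, recent scikit-learn versions default to `gamma="scale"`, which the documentation defines as `1 / (n_features * X.var())`; whether that default applies here depends on the installed version, which is an assumption. This value adapts to the spread of the data and is typically much smaller than the hand-picked values of 10 to 50 tried in other runs. The sketch below, on synthetic data with a deliberately wide numeric range, computes that value and compares how many training points end up as support vectors under each setting.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic data, multiplied by 10 to mimic unscaled features with a wide range.
X_demo, y_demo = make_classification(
    n_samples=200, n_features=11, n_informative=5, random_state=0
)
X_demo = 10.0 * X_demo

# The value that gamma="scale" resolves to, according to the scikit-learn documentation.
gamma_scale = 1.0 / (X_demo.shape[1] * X_demo.var())

svc_scale = SVC(C=10, kernel="rbf", gamma="scale").fit(X_demo, y_demo)
svc_large = SVC(C=10, kernel="rbf", gamma=25).fit(X_demo, y_demo)

# support_ holds the indices of the training points kept as support vectors.
n_sv_scale = svc_scale.support_.size
n_sv_large = svc_large.support_.size

print("gamma='scale' resolves to:", gamma_scale)
print("support vectors with gamma='scale':", n_sv_scale)
print("support vectors with gamma=25:     ", n_sv_large)
```

A model in which nearly every training point is a support vector has essentially memorized the training set, which is a common sign that `gamma` is too large for the scale of the features.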



In [45]:
# Support Vector Machine - Kernel  
model = SVC(C=20, kernel="sigmoid", gamma=45)

# To fit the model
model.fit(X_train, y_train)

# To predict output
y_pred = model.predict(X_test)

# Report classification results
print(classification_report(y_test,y_pred))

              precision    recall  f1-score   support

         Bad       0.00      0.00      0.00       178
        Good       0.56      1.00      0.71       222

    accuracy                           0.56       400
   macro avg       0.28      0.50      0.36       400
weighted avg       0.31      0.56      0.40       400





In [48]:
# Support Vector Machine - Kernel  without gamma
model = SVC(C=20, kernel="sigmoid")

# To fit the model
model.fit(X_train, y_train)

# To predict output
y_pred = model.predict(X_test)

# Report classification results
print(classification_report(y_test,y_pred))

              precision    recall  f1-score   support

         Bad       0.35      0.35      0.35       178
        Good       0.47      0.46      0.47       222

    accuracy                           0.41       400
   macro avg       0.41      0.41      0.41       400
weighted avg       0.42      0.41      0.42       400



In [39]:
# Support Vector Machine - Kernel  
model = SVC(C=25, kernel="rbf", gamma=15)

# To fit the model
model.fit(X_train, y_train)

# To predict output
y_pred = model.predict(X_test)

# Report classification results
print(classification_report(y_test,y_pred))

              precision    recall  f1-score   support

         Bad       0.97      0.19      0.32       178
        Good       0.61      1.00      0.75       222

    accuracy                           0.64       400
   macro avg       0.79      0.59      0.54       400
weighted avg       0.77      0.64      0.56       400



In [40]:
# Support Vector Machine - Kernel  
model = SVC(C=20, kernel="sigmoid", gamma=15)

# To fit the model
model.fit(X_train, y_train)

# To predict output
y_pred = model.predict(X_test)

# Report classification results
print(classification_report(y_test,y_pred))

              precision    recall  f1-score   support

         Bad       0.00      0.00      0.00       178
        Good       0.56      1.00      0.71       222

    accuracy                           0.56       400
   macro avg       0.28      0.50      0.36       400
weighted avg       0.31      0.56      0.40       400





In [49]:
# Support Vector Machine - Kernel  without gamma
model = SVC(C=50, kernel="rbf")

# To fit the model
model.fit(X_train, y_train)

# To predict output
y_pred = model.predict(X_test)

# Report classification results
print(classification_report(y_test,y_pred))

              precision    recall  f1-score   support

         Bad       0.64      0.75      0.69       178
        Good       0.77      0.67      0.71       222

    accuracy                           0.70       400
   macro avg       0.71      0.71      0.70       400
weighted avg       0.71      0.70      0.71       400



In [43]:
# Support Vector Machine - Kernel  
model = SVC(C=30, kernel="sigmoid", gamma=25)

# To fit the model
model.fit(X_train, y_train)

# To predict output
y_pred = model.predict(X_test)

# Report classification results
print(classification_report(y_test,y_pred))

              precision    recall  f1-score   support

         Bad       0.00      0.00      0.00       178
        Good       0.56      1.00      0.71       222

    accuracy                           0.56       400
   macro avg       0.28      0.50      0.36       400
weighted avg       0.31      0.56      0.40       400





In [50]:
# Support Vector Machine - Kernel  without gamma
model = SVC(C=35, kernel="rbf")

# To fit the model
model.fit(X_train, y_train)

# To predict output
y_pred = model.predict(X_test)

# Report classification results
print(classification_report(y_test,y_pred)) 

              precision    recall  f1-score   support

         Bad       0.65      0.75      0.69       178
        Good       0.77      0.67      0.72       222

    accuracy                           0.70       400
   macro avg       0.71      0.71      0.70       400
weighted avg       0.71      0.70      0.71       400
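
To identify the best configuration, the accuracies reported in the cells of this challenge can be collected into one table and sorted. The sketch below transcribes those accuracy values by hand (two decimals, as printed in the reports); `"default"` marks the runs where `gamma` was not passed.

```python
import pandas as pd

# (C, kernel, gamma, test accuracy) transcribed from the classification reports above.
results = pd.DataFrame(
    [
        (10, "rbf", "25", 0.64),
        (10, "sigmoid", "25", 0.56),
        (15, "rbf", "10", 0.64),
        (20, "rbf", "50", 0.64),
        (15, "sigmoid", "10", 0.56),
        (20, "rbf", "15", 0.64),
        (10, "rbf", "default", 0.69),
        (20, "sigmoid", "45", 0.56),
        (20, "sigmoid", "default", 0.41),
        (25, "rbf", "15", 0.64),
        (20, "sigmoid", "15", 0.56),
        (50, "rbf", "default", 0.70),
        (30, "sigmoid", "25", 0.56),
        (35, "rbf", "default", 0.70),
    ],
    columns=["C", "kernel", "gamma", "accuracy"],
)

# Sort from best to worst and pick out the top-scoring configurations.
ranked = results.sort_values("accuracy", ascending=False)
best = results[results["accuracy"] == results["accuracy"].max()]

print(ranked.to_string(index=False))
print(best.to_string(index=False))
```

Among these runs, the `rbf` kernel with the default `gamma` and `C` of 35 or 50 scores highest at 0.70, which is still slightly below the 0.723 accuracy of the `LinearSVC` model from Challenge 4.3.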

