
# Support Vector Machines (SVM)

Support Vector Machines (SVMs) are supervised learning algorithms used for classification and regression tasks. An SVM:

- Finds the best hyperplane (decision boundary) to separate the classes.
- Maximizes the margin, the distance between the hyperplane and the closest data points (the support vectors).

A hyperplane is a flat subspace with one dimension fewer than the feature space: a line in 2D, a plane in 3D. It is the set of points x that satisfy w·x + b = 0, where w is the normal vector and b is the offset.
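As a minimal sketch of these ideas, the snippet below fits a linear SVM to four hand-picked, linearly separable points and reads off the hyperplane, the support vectors, and the margin width 2/||w||. The toy points and the variable names are assumptions for illustration and are not part of this notebook's datasets.

```python
import numpy as np
from sklearn.svm import SVC

# Toy, linearly separable points (assumed for illustration only)
X_toy = np.array([[-1, 0], [-2, 1], [1, 0], [2, -1]])
y_toy = np.array([0, 0, 1, 1])

# A large C approximates a hard margin on separable data
toy_clf = SVC(kernel="linear", C=1000).fit(X_toy, y_toy)

w = toy_clf.coef_[0]        # normal vector of the hyperplane w·x + b = 0
b = toy_clf.intercept_[0]   # offset of the hyperplane
margin_width = 2 / np.linalg.norm(w)

print("w =", w, "b =", b)
print("support vectors:", toy_clf.support_vectors_)
print("margin width:", margin_width)
```

Only the two points closest to the boundary end up as support vectors; moving the other two farther away would not change the fitted hyperplane.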

Types of SVMs:

- Linear SVM: Separates classes using a linear hyperplane.
- Non-Linear SVM: Uses kernel functions to handle non-linear relationships.
- Soft Margin SVM: Allows some misclassifications, trading margin width against training accuracy (controlled by the `C` parameter in scikit-learn).
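To illustrate the difference between a linear and a kernel SVM, here is a small sketch on synthetic concentric circles generated with `make_circles`. The data and the variable names are assumptions for illustration, and the scores are training accuracies rather than a proper evaluation.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: no straight line can separate them
X_c, y_c = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# Same data, two kernels; both use the default soft-margin setting C=1
linear_acc = SVC(kernel="linear").fit(X_c, y_c).score(X_c, y_c)
rbf_acc = SVC(kernel="rbf").fit(X_c, y_c).score(X_c, y_c)

print("linear kernel training accuracy:", linear_acc)
print("rbf kernel training accuracy:", rbf_acc)
```

The RBF kernel should fit the rings far better than the linear kernel, which is the kind of case a non-linear SVM is meant for.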

**Input features need to be rescaled, for example to [-1, 1], because SVMs are sensitive to feature scale.**
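A minimal sketch of that rescaling with `MinMaxScaler`, using a made-up array whose two columns live on very different scales (the array and the variable names are assumptions for illustration):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Two features on very different scales (toy values)
X_raw = np.array([[1.0, 100.0],
                  [2.0, 300.0],
                  [3.0, 500.0]])

# Map each column independently onto [-1, 1]
scaler_demo = MinMaxScaler(feature_range=(-1, 1))
X_scaled = scaler_demo.fit_transform(X_raw)

print(X_scaled)
# [[-1. -1.]
#  [ 0.  0.]
#  [ 1.  1.]]
```

In a real workflow the scaler is usually fitted on the training split only and then applied to the test split with `transform`.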

# Pet Adoption dataset

In [63]:
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler,LabelEncoder,OrdinalEncoder
from sklearn import svm, datasets
from sklearn.model_selection import train_test_split,GridSearchCV
# confusion_matrix is the classification metric; pair_confusion_matrix is meant for clustering
from sklearn.metrics import accuracy_score,confusion_matrix,classification_report,precision_score,recall_score,f1_score


In [2]:
!wget 'http://storage.googleapis.com/download.tensorflow.org/data/petfinder-mini.zip'

--2024-10-05 15:14:52--  http://storage.googleapis.com/download.tensorflow.org/data/petfinder-mini.zip
Resolving storage.googleapis.com (storage.googleapis.com)... 64.233.181.207, 173.194.193.207, 173.194.194.207, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|64.233.181.207|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1668792 (1.6M) [application/zip]
Saving to: ‘petfinder-mini.zip’


2024-10-05 15:14:52 (64.9 MB/s) - ‘petfinder-mini.zip’ saved [1668792/1668792]



In [3]:
!unzip /content/petfinder-mini.zip

Archive:  /content/petfinder-mini.zip
   creating: petfinder-mini/
  inflating: petfinder-mini/README.md  
  inflating: __MACOSX/petfinder-mini/._README.md  
  inflating: petfinder-mini/petfinder-mini.csv  
  inflating: __MACOSX/petfinder-mini/._petfinder-mini.csv  


In [62]:
data = pd.read_csv("/content/petfinder-mini/petfinder-mini.csv")
data.head()

| | Type | Age | Breed1 | Gender | Color1 | Color2 | MaturitySize | FurLength | Vaccinated | Sterilized | Health | Fee | Description | PhotoAmt | AdoptionSpeed |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Cat | 3 | Tabby | Male | Black | White | Small | Short | No | No | Healthy | 100 | Nibble is a 3+ month old ball of cuteness. He ... | 1 | 2 |
| 1 | Cat | 1 | Domestic Medium Hair | Male | Black | Brown | Medium | Medium | Not Sure | Not Sure | Healthy | 0 | I just found it alone yesterday near my apartm... | 2 | 0 |
| 2 | Dog | 1 | Mixed Breed | Male | Brown | White | Medium | Medium | Yes | No | Healthy | 0 | Their pregnant mother was dumped by her irresp... | 7 | 3 |
| 3 | Dog | 4 | Mixed Breed | Female | Black | Brown | Medium | Short | Yes | No | Healthy | 150 | Good guard dog, very alert, active, obedience ... | 8 | 2 |
| 4 | Dog | 1 | Mixed Breed | Male | Black | No Color | Medium | Short | No | No | Healthy | 0 | This handsome yet cute boy is up for adoption.... | 3 | 2 |


In the original dataset, the target column AdoptionSpeed specifies how quickly a pet was adopted (e.g. in the first week, the first month, the first three months, and so on).

In this example we convert it to a binary classification problem: whether the pet was adopted or not. A value of 4 marks a pet that was not adopted, so it maps to 0 and every other value maps to 1.

In [64]:
data["AdoptionSpeed"] = np.where(data["AdoptionSpeed"]==4,0,1)
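The same `np.where` mapping can be checked on a tiny, made-up frame that contains each possible AdoptionSpeed value once. The frame `toy` and the column name `Adopted` are hypothetical and only serve the illustration.

```python
import numpy as np
import pandas as pd

# One row per original AdoptionSpeed value (toy data)
toy = pd.DataFrame({"AdoptionSpeed": [0, 1, 2, 3, 4]})

# 4 -> 0 (not adopted), everything else -> 1 (adopted)
toy["Adopted"] = np.where(toy["AdoptionSpeed"] == 4, 0, 1)

print(toy)
```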

**Drop the unnecessary Description column from the dataset**

In [65]:
data.drop("Description",axis=1,inplace=True)

In [66]:
data["AdoptionSpeed"].value_counts()

| AdoptionSpeed | count |
|---|---|
| 1 | 8457 |
| 0 | 3080 |
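The classes are imbalanced, so it can help to know what accuracy a trivial "always predict the majority class" rule would reach. A small arithmetic sketch using the counts shown above (the variable names are assumptions):

```python
# Class counts taken from the value_counts() output above
adopted, not_adopted = 8457, 3080
total = adopted + not_adopted

# Accuracy of always predicting the larger class
majority_baseline = max(adopted, not_adopted) / total

print(total)                        # 11537
print(f"{majority_baseline:.4f}")   # 0.7330
```

Any classifier trained on this target is worth comparing against that figure, not only against 50%.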


In [67]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 11537 entries, 0 to 11536
Data columns (total 14 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   Type           11537 non-null  object
 1   Age            11537 non-null  int64 
 2   Breed1         11537 non-null  object
 3   Gender         11537 non-null  object
 4   Color1         11537 non-null  object
 5   Color2         11537 non-null  object
 6   MaturitySize   11537 non-null  object
 7   FurLength      11537 non-null  object
 8   Vaccinated     11537 non-null  object
 9   Sterilized     11537 non-null  object
 10  Health         11537 non-null  object
 11  Fee            11537 non-null  int64 
 12  PhotoAmt       11537 non-null  int64 
 13  AdoptionSpeed  11537 non-null  int64 
dtypes: int64(4), object(10)
memory usage: 1.2+ MB


**Separate the features from the target**

In [68]:
x = data.drop("AdoptionSpeed",axis=1)
y = data["AdoptionSpeed"]

In [69]:
# Categorical (object) columns vs. numeric columns
cat_columns = x.select_dtypes(include="object").columns.tolist()
int_columns = x.select_dtypes(exclude="object").columns.tolist()
print(cat_columns)
print(int_columns)

['Type', 'Breed1', 'Gender', 'Color1', 'Color2', 'MaturitySize', 'FurLength', 'Vaccinated', 'Sterilized', 'Health']
['Age', 'Fee', 'PhotoAmt']


In [70]:
oe = OrdinalEncoder()
x[cat_columns] = oe.fit_transform(x[cat_columns])

In [71]:
scaler = MinMaxScaler(feature_range=(-1,1))
x = scaler.fit_transform(x)
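Note that the encoder and scaler above are fitted on the full dataset before the train/test split, so the test rows influence the preprocessing. One possible alternative, sketched here under assumptions, is to wrap the steps in a `Pipeline` that is fitted on the training rows only. The toy frame, its values, and the step names are hypothetical; `handle_unknown="use_encoded_value"` is used so that categories unseen during fitting do not raise an error.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OrdinalEncoder
from sklearn.svm import SVC

# Hypothetical toy frame standing in for the pet features
toy_x = pd.DataFrame({
    "Type": ["Cat", "Dog", "Cat", "Dog", "Cat", "Dog"],
    "Age": [3, 1, 4, 2, 6, 1],
    "Fee": [100, 0, 150, 0, 50, 0],
})
toy_y = [1, 0, 1, 0, 1, 0]

# Encode the categorical column, pass numeric columns through unchanged
encode = ColumnTransformer(
    [("cat",
      OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1),
      ["Type"])],
    remainder="passthrough",
)

pipe = Pipeline([
    ("encode", encode),
    ("scale", MinMaxScaler(feature_range=(-1, 1))),
    ("svc", SVC()),
])

# Fit on the first four rows only; the last two play the role of a test set
pipe.fit(toy_x.iloc[:4], toy_y[:4])
preds = pipe.predict(toy_x.iloc[4:])
print(preds)
```

A pipeline like this can also be passed directly to `GridSearchCV`, so the preprocessing is refitted inside each cross-validation fold.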

In [72]:
x.shape,y.shape

((11537, 13), (11537,))

In [73]:
x_train,x_test,y_train,y_test = train_test_split(x,y,test_size=0.2,random_state=42,stratify=y)

In [118]:
clf = svm.SVC()

In [121]:
parameters = {"kernel":["linear","poly","rbf"],
              "C":range(1,10)}

In [122]:
grid_search = GridSearchCV(clf,parameters)
grid_search.fit(x_train,y_train)



In [123]:
grid_search.best_params_

{'C': 9, 'kernel': 'rbf'}
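For readers who want to see what the grid search exposes beyond `best_params_`, here is a small self-contained sketch on the built-in iris data. The dataset choice, the reduced grid, and the variable names are assumptions for illustration, not the setup used in this notebook.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X_iris, y_iris = load_iris(return_X_y=True)

# 2 kernels x 2 values of C = 4 candidates, each scored with 5-fold CV
demo_search = GridSearchCV(SVC(), {"kernel": ["linear", "rbf"], "C": [1, 10]}, cv=5)
demo_search.fit(X_iris, y_iris)

print(demo_search.best_params_)              # winning combination
print(round(demo_search.best_score_, 3))     # its mean cross-validated accuracy
print(len(demo_search.cv_results_["params"]))  # number of candidates tried

# With the default refit=True, best_estimator_ is already fitted
print(demo_search.best_estimator_.predict(X_iris[:3]))
```

`best_score_` is a cross-validated estimate on the training data, so it is a useful companion to the held-out test score.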

In [125]:
# GridSearchCV (refit=True by default) has already refit the best estimator on the full training set
best_clf = grid_search.best_estimator_

In [126]:
best_clf.score(x_test,y_test)

0.7365684575389948

In [131]:
# Predict the whole test set at once; this also avoids overwriting the x and y variables
pred = best_clf.predict(x_test)
correct = int((pred == np.asarray(y_test)).sum())
wrong = len(y_test) - correct

In [132]:
correct, wrong

(1700, 608)
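Raw correct/wrong counts hide which class the errors fall on. A minimal sketch of a per-class breakdown with `confusion_matrix` and `classification_report`, using made-up labels (the arrays and names are assumptions; in the notebook one would pass `y_test` and the model's predictions):

```python
from sklearn.metrics import classification_report, confusion_matrix

# Toy labels: two true 0s, four true 1s, one 0 misclassified as 1
y_true_demo = [1, 1, 1, 0, 0, 1]
y_pred_demo = [1, 1, 1, 1, 0, 1]

# Rows are true classes (0, 1), columns are predicted classes (0, 1)
cm = confusion_matrix(y_true_demo, y_pred_demo)
print(cm)
# [[1 1]
#  [0 4]]

# Precision, recall and F1 per class
print(classification_report(y_true_demo, y_pred_demo))
```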

# Mushroom dataset

The dataset is available under ML-Work/Data in my repository.

In [135]:
mush_data = pd.read_csv("/content/mushrooms-full-dataset.csv")
mush_data.head()

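| | poisonous | cap-shape | cap-surface | cap-color | bruises | odor | gill-attachment | gill-spacing | gill-size | gill-color | ... | stalk-surface-below-ring | stalk-color-above-ring | stalk-color-below-ring | veil-type | veil-color | ring-number | ring-type | spore-print-color | population | habitat |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | p | x | s | n | t | p | f | c | n | k | ... | s | w | w | p | w | o | p | k | s | u |
| 1 | e | x | s | y | t | a | f | c | b | k | ... | s | w | w | p | w | o | p | n | n | g |
| 2 | e | b | s | w | t | l | f | c | b | n | ... | s | w | w | p | w | o | p | n | n | m |
| 3 | p | x | y | w | t | p | f | c | n | n | ... | s | w | w | p | w | o | p | k | s | u |
| 4 | e | x | s | g | f | n | f | w | b | k | ... | s | w | w | p | w | o | e | n | a | g |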


In [137]:
mush_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8124 entries, 0 to 8123
Data columns (total 22 columns):
 #   Column                    Non-Null Count  Dtype 
---  ------                    --------------  ----- 
 0   poisonous                 8124 non-null   object
 1   cap-shape                 8124 non-null   object
 2   cap-surface               8124 non-null   object
 3   cap-color                 8124 non-null   object
 4   bruises                   8124 non-null   object
 5   odor                      8124 non-null   object
 6   gill-attachment           8124 non-null   object
 7   gill-spacing              8124 non-null   object
 8   gill-size                 8124 non-null   object
 9   gill-color                8124 non-null   object
 10  stalk-shape               8124 non-null   object
 11  stalk-surface-above-ring  8124 non-null   object
 12  stalk-surface-below-ring  8124 non-null   object
 13  stalk-color-above-ring    8124 non-null   object
 14  stalk-color-below-ring  

**Split the dataset, then encode and rescale it**

In [138]:
x = mush_data.drop("poisonous",axis=1)
y = mush_data["poisonous"]

In [139]:
x_train, x_test, y_train, y_test = train_test_split(x,y,test_size=0.2,random_state=42,stratify=y)

In [143]:
label_en = OrdinalEncoder()
target_en = LabelEncoder()

In [144]:
x_train = label_en.fit_transform(x_train)
x_test = label_en.transform(x_test)
y_train = target_en.fit_transform(y_train)
y_test = target_en.transform(y_test)

In [147]:
rescale = MinMaxScaler(feature_range=(-1,1))
x_train = rescale.fit_transform(x_train)
x_test = rescale.transform(x_test)

**Create a grid search and get the best model parameters**

In [148]:
params = {"kernel":["linear","poly","rbf"],
          "C":range(1,10)}

In [149]:
svm_clf = svm.SVC()

In [150]:
grid_search = GridSearchCV(svm_clf,params)
grid_search.fit(x_train,y_train)

In [151]:
# check for best parameter for classifier
grid_search.best_params_

{'C': 2, 'kernel': 'poly'}

In [152]:
# take the best estimator; GridSearchCV has already refit it on the full training set
grid_clf = grid_search.best_estimator_

In [153]:
# Check the score on test dataset
grid_clf.score(x_test,y_test)

1.0