In [1]:
## Support Vector Machines - 1

Q1. What is the mathematical formula for a linear SVM?

Ans:
    
    f(x) = w^T x + b

    x is the input (feature) vector
    w is the weight vector that determines the orientation of the hyperplane
    b is the bias term that shifts the hyperplane away from the origin.

    The predicted class is sign(f(x)), and the decision boundary is the set of points where f(x) = 0.
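As a minimal sketch of the formula above, assuming NumPy and made-up values for the weight vector, bias term, and input vectors (none of these numbers come from the notebook), the score and the predicted class can be computed like this:

```python
import numpy as np

# Hypothetical parameters of an already-trained linear SVM (illustrative values only)
weights = np.array([2.0, -1.0])   # weight vector: sets the hyperplane's orientation
bias = -1.0                       # bias term: shifts the hyperplane from the origin

def decision_function(inputs):
    """Return the signed score weights^T inputs + bias for each row of inputs."""
    return inputs @ weights + bias

points = np.array([[2.0, 1.0],    # score = 2*2 - 1*1 - 1 = 2
                   [0.0, 1.0],    # score = 0 - 1 - 1 = -2
                   [1.0, 1.0]])   # score = 2 - 1 - 1 = 0 (on the hyperplane)

scores = decision_function(points)
labels = np.where(scores >= 0, 1, -1)  # the sign of the score gives the class
print(scores)
print(labels)
```

Treating a score of exactly zero as the positive class is a convention chosen for this sketch.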

Q2. What is the objective function of a linear SVM?

Ans:

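    The objective of a linear Support Vector Machine (SVM) is to find the hyperplane that maximizes the margin between the two classes, where the margin is the distance from the hyperplane to the closest training examples of either class. For a hard margin with weight vector w, bias b, and labels y_i in {-1, +1}, this is equivalent to minimizing (1/2) ||w||^2 subject to y_i (w^T x_i + b) >= 1 for every training example, because the width of the margin is 2 / ||w||.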
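To make the margin concrete, here is a small NumPy sketch. The four points and the hyperplane parameters are invented for illustration. The code only checks that every point respects the margin constraint and reports the resulting margin width, 2 / ||w||; it does not solve the optimization problem.

```python
import numpy as np

# Toy, linearly separable data (hypothetical values); labels are -1 / +1
X_toy = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 0.0], [2.0, 1.0]])
y_toy = np.array([-1, -1, 1, 1])

# A candidate hyperplane w^T x + b = 0, here the vertical line x1 = 1
w = np.array([1.0, 0.0])
b = -1.0

# Functional margins y_i * (w^T x_i + b); the hard-margin constraint is >= 1
functional_margins = y_toy * (X_toy @ w + b)
feasible = bool(np.all(functional_margins >= 1))

# Width of the band between the two marginal planes
margin_width = 2.0 / np.linalg.norm(w)

# The quantity a hard-margin linear SVM minimizes
objective = 0.5 * np.dot(w, w)
print(feasible, margin_width, objective)
```

A smaller ||w|| gives a wider margin, which is why maximizing the margin and minimizing (1/2) ||w||^2 describe the same goal.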

Q3. What is the kernel trick in SVM?

Ans:

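    The kernel trick lets an SVM separate data that is not linearly separable in the original space by working implicitly in a higher-dimensional feature space, where a linear hyperplane can separate the classes. Instead of computing the transformation explicitly, the algorithm replaces every inner product between two points with a kernel function K(x, z) that returns the inner product of their images in that space. This allows SVMs to perform complex, non-linear classification at roughly the cost of working in the original space. Common choices are the polynomial and RBF (Gaussian) kernels.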
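The idea can be checked numerically. The sketch below assumes a degree-2 homogeneous polynomial kernel on 2-D inputs (chosen only because its explicit feature map is short to write down) and shows that the kernel value equals the inner product in the mapped 3-D space:

```python
import numpy as np

def poly_kernel(a, b):
    """Degree-2 homogeneous polynomial kernel: K(a, b) = (a . b)^2."""
    return np.dot(a, b) ** 2

def explicit_map(v):
    """Explicit feature map for the same kernel in 2-D: (v1^2, sqrt(2) v1 v2, v2^2)."""
    return np.array([v[0] ** 2, np.sqrt(2) * v[0] * v[1], v[1] ** 2])

a = np.array([1.0, 2.0])
b = np.array([3.0, 4.0])

k_implicit = poly_kernel(a, b)                           # computed in the original 2-D space
k_explicit = np.dot(explicit_map(a), explicit_map(b))    # computed in the mapped 3-D space
print(k_implicit, k_explicit)
```

Both numbers agree, but the kernel never builds the 3-D vectors. For kernels such as RBF the mapped space is infinite-dimensional, so the implicit route is the only practical one.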

Q4. What is the role of support vectors in SVM? Explain with an example.

Ans:

    In Support Vector Machine (SVM) algorithms, support vectors are the training examples that lie closest to the decision boundary (hyperplane) between the two classes, on or inside the margin. They have the smallest distance to the boundary and are the only examples that determine its position: removing any other training example and retraining leaves the boundary unchanged.

    Example:

        Consider a binary classification problem with two classes, red and blue. The SVM algorithm identifies the support vectors as the red and blue data points closest to the decision boundary. The decision boundary is the hyperplane that maximizes the margin between the support vectors of the two classes. When a new data point is presented to the SVM, it is classified based on which side of the decision boundary it falls on.
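The example can be sketched with scikit-learn. The six points below are invented for illustration, with label 0 standing in for red and label 1 for blue; a very large `C` is used as an approximation of a hard margin.

```python
import numpy as np
from sklearn.svm import SVC

# Toy data (hypothetical): three "red" points (label 0) and three "blue" points (label 1)
X_toy = np.array([[0.0, 0.0], [0.0, 2.0], [1.0, 1.0],
                  [3.0, 1.0], [4.0, 0.0], [4.0, 2.0]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

# A very large C approximates a hard margin on separable data
clf = SVC(kernel="linear", C=1e6)
clf.fit(X_toy, y_toy)

# Only the points nearest the boundary are kept as support vectors
support_vectors = clf.support_vectors_
print(support_vectors)

# A new point is classified by the side of the boundary it falls on
new_points = np.array([[0.5, 1.0], [3.5, 1.0]])
predictions = clf.predict(new_points)
print(predictions)
```

In this layout the closest pair across the two classes is (1, 1) and (3, 1), so those are the points expected to define the boundary; the other four could be removed without moving it.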

Q5. Illustrate with examples and graphs of Hyperplane, Marginal plane, Soft margin and Hard margin in SVM?

Ans:
    
    1. Hyperplane: In SVM, the hyperplane is the decision boundary that separates the two classes in a binary classification problem.
    
    2. Marginal Plane: The marginal planes are the two planes parallel to the hyperplane, one on each side, that pass through the closest data points of each class. The distance between the hyperplane and a marginal plane is called the margin. The goal of SVM is to find the hyperplane with the maximum margin.
    
    3. Soft Margin: In a soft margin SVM, some training points are allowed to fall inside the margin or on the wrong side of the hyperplane. The goal is to balance a wide margin against the total amount of violation, with the trade-off controlled by the regularization parameter C. This is useful when the data is not perfectly separable, or when there is noise or outliers in the data.
    
    4. Hard Margin: In a hard margin SVM, the goal is to find a hyperplane that completely separates the two classes without any misclassifications or points inside the margin. This is only possible if the data is linearly separable.
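The difference between hard and soft margins can be shown with slack values. The sketch below uses NumPy only; the five points, the hyperplane, and the value of `C` are all assumptions chosen so that one outlier violates the margin.

```python
import numpy as np

# Toy data (hypothetical): the last point is an outlier on the wrong side
X_toy = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [4.0, 0.0], [1.5, 0.0]])
y_toy = np.array([-1, -1, 1, 1, 1])

# Candidate hyperplane x1 = 2, written as w^T x + b = 0
w = np.array([1.0, 0.0])
b = -2.0
C = 1.0  # penalty weight for margin violations (assumed value)

# Slack xi_i = max(0, 1 - y_i (w^T x_i + b)): zero when a point respects the margin
slack = np.maximum(0.0, 1.0 - y_toy * (X_toy @ w + b))

# Hard margin requires every slack to be zero; soft margin tolerates some at a cost
hard_margin_feasible = bool(np.all(slack == 0))
soft_margin_objective = 0.5 * np.dot(w, w) + C * slack.sum()
print(slack, hard_margin_feasible, soft_margin_objective)
```

Only the outlier gets a non-zero slack, so this hyperplane is ruled out under a hard margin but is acceptable under a soft margin, where the violation is simply added to the cost. A larger `C` makes violations more expensive and pushes the solution toward hard-margin behaviour.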

Q6. SVM Implementation through Iris dataset.

In [2]:
import seaborn as sns

In [3]:
df = sns.load_dataset("iris")

In [4]:
df.head()

,sepal_length,sepal_width,petal_length,petal_width,species
0,5.1,3.5,1.4,0.2,setosa
1,4.9,3.0,1.4,0.2,setosa
2,4.7,3.2,1.3,0.2,setosa
3,4.6,3.1,1.5,0.2,setosa
4,5.0,3.6,1.4,0.2,setosa


In [5]:
df['species'].unique()

array(['setosa', 'versicolor', 'virginica'], dtype=object)

In [6]:
## Encoding of species
# 0 = setosa
# 1 = versicolor
# 2 = virginica

from sklearn.preprocessing import LabelEncoder
encoder = LabelEncoder()
df['species'] = encoder.fit_transform(df['species'])
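To confirm which integer each species receives, the fitted encoder can be inspected. This is a small stand-alone sketch: the short `species` list is a stand-in for the real column, and `demo_encoder` is a separate object so the notebook's `encoder` is left untouched.

```python
from sklearn.preprocessing import LabelEncoder

# Small stand-in for the species column (the three names come from the notebook output)
species = ["virginica", "setosa", "versicolor", "setosa"]

demo_encoder = LabelEncoder()
codes = demo_encoder.fit_transform(species)

# classes_ lists the labels in sorted order; the position is the integer code
mapping = {name: code for code, name in enumerate(demo_encoder.classes_)}
print(mapping)

# inverse_transform converts codes back to the original names
decoded = list(demo_encoder.inverse_transform(codes))
print(decoded)
```

Because `LabelEncoder` sorts the labels, the codes follow alphabetical order rather than the order in which the species appear in the data.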

In [7]:
df.head()

,sepal_length,sepal_width,petal_length,petal_width,species
0,5.1,3.5,1.4,0.2,0
1,4.9,3.0,1.4,0.2,0
2,4.7,3.2,1.3,0.2,0
3,4.6,3.1,1.5,0.2,0
4,5.0,3.6,1.4,0.2,0


In [8]:
df['species'].unique()

array([0, 1, 2])

In [9]:
# Check Null Values

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   sepal_length  150 non-null    float64
 1   sepal_width   150 non-null    float64
 2   petal_length  150 non-null    float64
 3   petal_width   150 non-null    float64
 4   species       150 non-null    int64  
dtypes: float64(4), int64(1)
memory usage: 6.0 KB
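`df.info()` shows non-null counts, so missing values have to be inferred by comparing them with the row count. A more direct check is to count missing cells per column. The sketch below uses a tiny made-up frame with one missing value (not the Iris data) so that the check has something to report:

```python
import numpy as np
import pandas as pd

# Tiny made-up frame with one missing value, so the check has something to find
demo = pd.DataFrame({
    "sepal_length": [5.1, 4.9, np.nan],
    "sepal_width": [3.5, 3.0, 3.2],
})

null_counts = demo.isna().sum()              # missing values per column
has_nulls = bool(demo.isna().any().any())    # True if any cell is missing
print(null_counts)
print(has_nulls)
```

Applied to the notebook's `df`, the same two calls would be expected to report zero missing values, consistent with the 150 non-null counts above.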


In [10]:
df.describe()

,sepal_length,sepal_width,petal_length,petal_width,species
count,150.0,150.0,150.0,150.0,150.0
mean,5.843333,3.057333,3.758,1.199333,1.0
std,0.828066,0.435866,1.765298,0.762238,0.819232
min,4.3,2.0,1.0,0.1,0.0
25%,5.1,2.8,1.6,0.3,0.0
50%,5.8,3.0,4.35,1.3,1.0
75%,6.4,3.3,5.1,1.8,2.0
max,7.9,4.4,6.9,2.5,2.0


In [11]:
# Dependent(y) and independent(X) variable
X = df[['sepal_length', 'sepal_width', 'petal_length','petal_width']]
y = df['species']

In [12]:
X

,sepal_length,sepal_width,petal_length,petal_width
0,5.1,3.5,1.4,0.2
1,4.9,3.0,1.4,0.2
2,4.7,3.2,1.3,0.2
3,4.6,3.1,1.5,0.2
4,5.0,3.6,1.4,0.2
...,...,...,...,...
145,6.7,3.0,5.2,2.3
146,6.3,2.5,5.0,1.9
147,6.5,3.0,5.2,2.0
148,6.2,3.4,5.4,2.3


In [13]:
y

0      0
1      0
2      0
3      0
4      0
      ..
145    2
146    2
147    2
148    2
149    2
Name: species, Length: 150, dtype: int64

In [14]:
# Train Test Split

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42, test_size=20)
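Note that `test_size=20` is an integer, so it means 20 rows rather than 20%. The stand-alone sketch below reproduces the split sizes; it loads Iris from scikit-learn instead of seaborn (an assumption made to keep it self-contained), uses separate variable names, and adds `stratify` as an optional extra that the notebook does not use.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Iris loaded from scikit-learn instead of seaborn, to keep the sketch self-contained
X_demo, y_demo = load_iris(return_X_y=True)

# An integer test_size is an absolute number of rows; a float would be a fraction
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo,
    test_size=20,        # 20 rows held out, 130 left for training
    random_state=42,     # fixed seed for a reproducible split
    stratify=y_demo,     # optional: keep class proportions similar in both parts
)
print(X_tr.shape, X_te.shape)
```

Writing `test_size=0.2` instead would hold out 30 of the 150 rows.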

In [15]:
X_train.count()

sepal_length    130
sepal_width     130
petal_length    130
petal_width     130
dtype: int64

In [16]:
y_train.count()

130

In [17]:
X_test.count()

sepal_length    20
sepal_width     20
petal_length    20
petal_width     20
dtype: int64

In [18]:
y_test.count()

20

In [19]:
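# Model training with SVC
# SVR is a regressor and returns continuous values. Species is a class label,
# so the classifier SVC is the appropriate estimator here.

from sklearn.svm import SVC
svc = SVC()
svc.fit(X_train, y_train)

In [20]:
# Predict the test values
y_pred = svc.predict(X_test)

In [21]:
# Check the performance

from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

In [22]:
# With SVR predictions this call raised:
# ValueError: Classification metrics can't handle a mix of continuous and multiclass targets
# scikit-learn metrics expect the true labels first, then the predictions.
confusion_matrix(y_test, y_pred)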
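The error message comes from the type of the predictions rather than from the metric itself. The stand-alone sketch below compares a regressor and a classifier on the same split. It loads Iris from scikit-learn instead of seaborn and uses its own variable names, both of which are assumptions made so that it can run independently of the cells above.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC, SVR
from sklearn.utils.multiclass import type_of_target

X_all, y_all = load_iris(return_X_y=True)
X_fit, X_eval, y_fit, y_eval = train_test_split(
    X_all, y_all, random_state=42, test_size=20
)

# SVR is a regressor: its predictions are real numbers, not class labels
reg_pred = SVR().fit(X_fit, y_fit).predict(X_eval)
reg_kind = type_of_target(reg_pred)

# SVC is a classifier: its predictions are the integer labels themselves
clf_pred = SVC().fit(X_fit, y_fit).predict(X_eval)
clf_kind = type_of_target(clf_pred)
print(reg_kind, clf_kind)

# Metrics take the true labels first, then the predictions
cm = confusion_matrix(y_eval, clf_pred)
acc = accuracy_score(y_eval, clf_pred)
print(cm)
print(acc)
```

`type_of_target` is the helper scikit-learn uses to categorize label arrays, and the two categories it reports here are the ones named in the error message. Rounding the regressor's output would silence the error, but switching to `SVC` is the cleaner fix because the task is classification.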