1. Introduction:

Support Vector Machine (SVM) is a supervised learning algorithm used for both classification and regression tasks (the regression variant is called Support Vector Regression, SVR). For classification, it finds the optimal boundary (a hyperplane) that separates the data into two classes. The key goal is to maximize the margin, which is the distance between the hyperplane and the nearest data points of each class. These nearest points are called support vectors, because they alone determine where the hyperplane lies.
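To make the margin concrete, here is a minimal sketch on a hypothetical four-point toy dataset (not part of the original notebook). For a linear SVM with weight vector w, the full margin width between the two classes is 2/‖w‖. A very large `C` is used so that the soft-margin `SVC` behaves like a hard-margin SVM on this separable data.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: class 0 lies on the line x1 = 0, class 1 on x1 = 2
X = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 0.0], [2.0, 1.0]])
y = np.array([0, 0, 1, 1])

# A large C allows (almost) no margin violations, approximating a hard margin
clf = SVC(kernel='linear', C=1e3)
clf.fit(X, y)

w = clf.coef_[0]        # normal vector of the separating hyperplane
b = clf.intercept_[0]   # offset: the hyperplane is w . x + b = 0
margin = 2.0 / np.linalg.norm(w)  # distance between the two margin boundaries

print("w =", w, "b =", b)
print("margin width =", margin)
print("support vectors:\n", clf.support_vectors_)
```

Here the best separating line is x1 = 1, so the margin width should come out close to 2, the gap between the two classes.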

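SVM is particularly well suited to binary classification (multi-class problems are handled by combining several binary classifiers; scikit-learn's SVC uses a one-vs-one scheme) and performs well on small to medium-sized datasets. It also copes well with high-dimensional data, including cases where the number of features exceeds the number of samples. However, training time for a kernel SVM grows at least quadratically with the number of samples, so it becomes slow on large datasets, and heavily overlapping or noisy classes reduce its accuracy. The choice of kernel and of data preprocessing, especially feature scaling, plays a crucial role in the model's performance.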
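As a rough illustration of how much the kernel choice can matter, the sketch below compares a linear and an RBF kernel on scikit-learn's synthetic `make_moons` data (two interleaving half-circles that no straight line can separate). The dataset and its parameters are arbitrary choices made for this example; features are standardized inside a pipeline so that scaling is learned from the training split only.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic, non-linearly separable data (illustrative only)
X, y = make_moons(n_samples=400, noise=0.15, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

scores = {}
for kernel in ['linear', 'rbf']:
    # Scale features, then fit an SVM with the chosen kernel
    model = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    model.fit(X_train, y_train)
    scores[kernel] = model.score(X_test, y_test)  # mean test accuracy

for kernel, acc in scores.items():
    print(f"{kernel:>6}: test accuracy = {acc:.3f}")
```

On curved class boundaries like these the RBF kernel typically scores clearly higher; on data that really is linearly separable, a linear kernel is usually just as accurate and cheaper to train.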

2. Types of Kernels in SVM:


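SVM uses kernel functions to handle data that is not linearly separable. A kernel computes the inner product of two points in a higher-dimensional feature space without ever constructing that space explicitly (the "kernel trick"). A linear decision boundary in that feature space corresponds to a non-linear boundary in the original input space. There are several common kernels: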

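Linear Kernel: k(x, x') = ⟨x, x'⟩. Suitable for linearly separable data. It separates the classes with a straight line in two dimensions, or more generally with a hyperplane in the original feature space.
Polynomial Kernel: k(x, x') = (γ⟨x, x'⟩ + r)^d. Used for data that is not linearly separable but can be separated by a polynomial decision boundary of degree d.
Gaussian (RBF) Kernel: k(x, x') = exp(−γ‖x − x'‖²). Useful for non-linear data; it scores similarity by distance and corresponds to an infinite-dimensional feature space, so it can form flexible, curved boundaries.
Sigmoid Kernel: k(x, x') = tanh(γ⟨x, x'⟩ + r). Based on the hyperbolic tangent, the activation function of early neural networks. Unlike the other three, it is not a valid (positive semidefinite) kernel for every choice of γ and r, and in practice it is used less often than RBF.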
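The four kernels can be written out directly in NumPy. This sketch, using arbitrary random inputs and arbitrary parameter values, checks hand-written formulas against scikit-learn's `sklearn.metrics.pairwise` functions (with `gamma`, `coef0` and `degree` as the kernel parameters, as in `SVC`). It then verifies the kernel trick for a degree-2 polynomial kernel using an explicit feature map `phi`, a hypothetical helper written only for this example.

```python
import numpy as np
from sklearn.metrics.pairwise import (
    linear_kernel, polynomial_kernel, rbf_kernel, sigmoid_kernel)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
Z = rng.normal(size=(4, 3))
gamma, coef0, degree = 0.5, 1.0, 3  # arbitrary illustrative values

# Kernel matrices K[i, j] = k(X[i], Z[j])
K_lin = X @ Z.T                                   # <x, z>
K_poly = (gamma * X @ Z.T + coef0) ** degree      # (gamma <x, z> + coef0)^degree
sq_dists = ((X ** 2).sum(axis=1)[:, None]
            + (Z ** 2).sum(axis=1)[None, :]
            - 2 * X @ Z.T)                        # ||x - z||^2
K_rbf = np.exp(-gamma * sq_dists)                 # exp(-gamma ||x - z||^2)
K_sig = np.tanh(gamma * X @ Z.T + coef0)          # tanh(gamma <x, z> + coef0)

checks = {
    'linear': np.allclose(K_lin, linear_kernel(X, Z)),
    'poly': np.allclose(K_poly, polynomial_kernel(X, Z, degree=degree, gamma=gamma, coef0=coef0)),
    'rbf': np.allclose(K_rbf, rbf_kernel(X, Z, gamma=gamma)),
    'sigmoid': np.allclose(K_sig, sigmoid_kernel(X, Z, gamma=gamma, coef0=coef0)),
}
print(checks)

# Kernel trick: for 2-D inputs, (x . z)^2 equals an ordinary dot product
# after the explicit map phi(v) = (v1^2, sqrt(2) * v1 * v2, v2^2)
def phi(v):
    return np.array([v[0] ** 2, np.sqrt(2) * v[0] * v[1], v[1] ** 2])

x, z = np.array([1.0, 2.0]), np.array([3.0, 1.0])
print((x @ z) ** 2, phi(x) @ phi(z))
```

Both printed values in the last line equal 25 up to floating-point rounding: the kernel gets the same answer as the explicit 3-D mapping without ever building it.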

In [16]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

In [17]:
df = pd.read_csv('/content/diabetes.csv')
df.head()

Unnamed: 0,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
0,6,148,72,35,0,33.6,0.627,50,1
1,1,85,66,29,0,26.6,0.351,31,0
2,8,183,64,0,0,23.3,0.672,32,1
3,1,89,66,23,94,28.1,0.167,21,0
4,0,137,40,35,168,43.1,2.288,33,1
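The `Unnamed: 0` column above looks like a row index that was saved into the CSV file rather than a real measurement. Assuming that is the case, passing `index_col=0` to `pd.read_csv` turns it back into the DataFrame index. This sketch reproduces the effect on the five rows shown, using an in-memory copy whose first header cell is empty (an assumption about how the raw file is laid out).

```python
import io
import pandas as pd

# The five rows printed above; the empty first header cell is an assumption
csv_text = """,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
0,6,148,72,35,0,33.6,0.627,50,1
1,1,85,66,29,0,26.6,0.351,31,0
2,8,183,64,0,0,23.3,0.672,32,1
3,1,89,66,23,94,28.1,0.167,21,0
4,0,137,40,35,168,43.1,2.288,33,1
"""

df_raw = pd.read_csv(io.StringIO(csv_text))               # index becomes a column named 'Unnamed: 0'
df = pd.read_csv(io.StringIO(csv_text), index_col=0)     # first column is used as the index instead

print(list(df_raw.columns)[:2])
print(list(df.columns)[:2])
print(df.shape)
```

For the actual file this would be `pd.read_csv('/content/diabetes.csv', index_col=0)`; alternatively, `df.drop(columns=['Unnamed: 0'])` removes the column after loading.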


In [18]:
df.isnull().sum()

Unnamed: 0,0
Pregnancies,0
Glucose,0
BloodPressure,0
SkinThickness,0
Insulin,0
BMI,0
DiabetesPedigreeFunction,0
Age,0
Outcome,0
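`isnull()` reports no missing values, but in this dataset a value of 0 is not physiologically plausible for Glucose, BloodPressure, SkinThickness, Insulin or BMI, and such zeros are commonly treated as placeholders for missing measurements. A minimal sketch for counting them and, optionally, making them visible to `isnull()`, shown on the five rows printed earlier:

```python
import io
import numpy as np
import pandas as pd

# The five rows printed earlier (without the saved index column)
csv_text = """Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
6,148,72,35,0,33.6,0.627,50,1
1,85,66,29,0,26.6,0.351,31,0
8,183,64,0,0,23.3,0.672,32,1
1,89,66,23,94,28.1,0.167,21,0
0,137,40,35,168,43.1,2.288,33,1
"""
df = pd.read_csv(io.StringIO(csv_text))

# Columns where 0 cannot be a real measurement
zero_as_missing = ['Glucose', 'BloodPressure', 'SkinThickness', 'Insulin', 'BMI']
zero_counts = (df[zero_as_missing] == 0).sum()
print(zero_counts)

# One option: turn those zeros into NaN so that isnull() reports them
df[zero_as_missing] = df[zero_as_missing].replace(0, np.nan)
print(df.isnull().sum())
```

`SVC` cannot handle NaN values, so zeros converted this way would need to be imputed afterwards (for example with `sklearn.impute.SimpleImputer`) or the affected rows dropped.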


In [19]:
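X = df.drop(columns=['Outcome'])  # features; note this still includes 'Unnamed: 0', a saved row index
y = df['Outcome']                 # target: 1 = diabetic, 0 = not diabetic
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # mean 0, std 1 per feature; fit on all rows, before the split

In [20]:
# Hold out 30% of the rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.3, random_state=42)
model = SVC(kernel='linear', random_state=42)  # linear-kernel SVM classifier
model.fit(X_train, y_train)
y_pred = model.predict(X_test)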
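The cells above fit the scaler on the whole dataset before splitting, so test-set statistics leak into the preprocessing, and they keep the `Unnamed: 0` index column as a feature. A minimal sketch of one way to avoid both, and to choose the kernel by cross-validation instead of fixing `kernel='linear'`: split first, wrap `StandardScaler` and `SVC` in a `Pipeline`, and search over kernels and `C` with `GridSearchCV`. The function name `fit_svm`, the parameter grid and the use of `stratify=y` are illustrative choices; the demo runs on a synthetic stand-in generated by `make_classification`, not on the diabetes data.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

FEATURES = ['Pregnancies', 'Glucose', 'BloodPressure', 'SkinThickness',
            'Insulin', 'BMI', 'DiabetesPedigreeFunction', 'Age']

def fit_svm(df):
    """Hypothetical helper: leakage-free scaling plus a kernel/C grid search."""
    # Drop the saved row index if present; it is not a clinical feature
    df = df.drop(columns=['Unnamed: 0'], errors='ignore')
    X = df.drop(columns=['Outcome'])
    y = df['Outcome']
    # Split first, so the scaler is only ever fit on training rows
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=42, stratify=y)
    pipe = Pipeline([('scaler', StandardScaler()), ('svc', SVC())])
    # 5-fold cross-validation over kernel type and regularization strength C
    param_grid = {'svc__kernel': ['linear', 'poly', 'rbf', 'sigmoid'],
                  'svc__C': [0.1, 1, 10]}
    search = GridSearchCV(pipe, param_grid, cv=5, scoring='accuracy')
    search.fit(X_train, y_train)
    # Accuracy of the refit best pipeline on the untouched test split
    return search, search.score(X_test, y_test)

# Synthetic stand-in with the same column names (NOT the diabetes data)
Xs, ys = make_classification(n_samples=300, n_features=8, random_state=0)
demo = pd.DataFrame(Xs, columns=FEATURES)
demo['Outcome'] = ys
demo.insert(0, 'Unnamed: 0', np.arange(len(demo)))  # mimic the saved index column

search, test_acc = fit_svm(demo)
print("best params:", search.best_params_)
print(f"held-out accuracy: {test_acc:.3f}")
```

On the real data the call would be `fit_svm(pd.read_csv('/content/diabetes.csv'))`. Because the scaler now sees only training rows, the resulting scores will generally differ somewhat from the ones reported below.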

In [21]:
print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))
print("Classification Report:")
print(classification_report(y_test, y_pred))
print("Accuracy Score:")
print(accuracy_score(y_test, y_pred))



Confusion Matrix:
[[123  28]
 [ 30  50]]
Classification Report:
              precision    recall  f1-score   support

           0       0.80      0.81      0.81       151
           1       0.64      0.62      0.63        80

    accuracy                           0.75       231
   macro avg       0.72      0.72      0.72       231
weighted avg       0.75      0.75      0.75       231

Accuracy Score:
0.7489177489177489
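The report's numbers can be recovered by hand from the confusion matrix, whose rows are the true classes and whose columns are the predicted classes (scikit-learn's convention). This short check uses only the counts printed above.

```python
import numpy as np

# Confusion matrix from the output above: rows = true class, columns = predicted class
cm = np.array([[123, 28],
               [30, 50]])
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()      # correct predictions / all predictions
precision_1 = tp / (tp + fp)         # of predicted diabetic, fraction truly diabetic
recall_1 = tp / (tp + fn)            # of truly diabetic, fraction detected
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)
precision_0 = tn / (tn + fn)
recall_0 = tn / (tn + fp)

print(f"accuracy           = {accuracy:.4f}")
print(f"class 1: precision = {precision_1:.2f}, recall = {recall_1:.2f}, f1 = {f1_1:.2f}")
print(f"class 0: precision = {precision_0:.2f}, recall = {recall_0:.2f}")
```

The recall of 0.62 for class 1 means the model misses 30 of the 80 diabetic patients in the test set. If catching positive cases matters more than overall accuracy, one option worth trying is `SVC(class_weight='balanced')`, which weights errors on the smaller class more heavily.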
