# SVM (Support Vector Machine)

SVM is a supervised machine learning algorithm used for classification and regression tasks. It works by finding the optimal hyperplane that separates data points of different classes with the maximum margin. The objective of SVM is to classify data by mapping it to a higher-dimensional space where the classes become linearly separable.



#### SVM Binary Classifier
A binary SVM classifier is used for classifying data into two classes (e.g., positive and negative). It works by finding a decision boundary (hyperplane) that maximizes the margin between the two classes. The model is trained using the features of the data points and tries to minimize classification errors while ensuring the margin between classes is as wide as possible.



### Concepts of Hyperplane and Support Vectors

- **Hyperplane**: A hyperplane is a decision boundary that separates different classes in the feature space. In an SVM, it is defined by the equation:

  $
  w \cdot x + b = 0
  $

  where `w` is the weight vector and `b` is the bias term.

- **Support Vectors**: Support vectors are the data points that are closest to the hyperplane and directly influence the position of the hyperplane. These points are critical in defining the optimal margin.

---

### Hinge Loss

Hinge loss is used in SVMs to penalize misclassified data points. It ensures that the margin is as wide as possible while allowing some flexibility for misclassification. The hinge loss function for a single data point is:

$$
L(y, f(x)) = \max(0, 1 - y \cdot f(x))
$$

where `y` is the true label and `f(x)` is the predicted value.

- **Gradients for dw and db**: To update the weights and bias, the gradients of the hinge loss with respect to `w` and `b` are computed as follows:

  $$ 
  dw = \frac{1}{m} \sum_{i=1}^{m} \left[ \lambda w - y_i x_i \right] \text{ if } y_i (w \cdot x_i + b) < 1
  $$

  $$ 
  db = \frac{1}{m} \sum_{i=1}^{m} \left[ -y_i \right] \text{ if } y_i (w \cdot x_i + b) < 1
  $$

  where `m` is the number of data points, and `x_i` and `y_i` are the feature and label of the i-th data point.

### Gradient Descent

Gradient descent is an optimization algorithm used to minimize the loss function by iteratively adjusting the model parameters (weights and bias). In SVMs, it is used to find the optimal values for `w` and `b` by updating them in the direction of the steepest decrease in the loss function. The update rules are:

$
$$ w \leftarrow w - \eta \cdot \frac{\partial L}{\partial w} $$
$

$
$$ b \leftarrow b - \eta \cdot \frac{\partial L}{\partial b} $$
$

where `η` is the learning rate and `J(w, b)` is the loss function.


### SVM Binary Classifier class
The classifier is initialized with three parameters: learning rate, number of iterations, and lambda parameter (which helps in regularization).  
The fit function trains the model by iterating through the dataset and updating the weights (w) and bias (b) to minimize classification errors.  
The update_weights function adjusts the model’s weights based on whether a data point is correctly classified or not. If a point is correctly classified, the weights are slightly adjusted; otherwise, they are updated more significantly.     
Finally, the predict function takes new data and uses the trained weights to classify each point as 0 or 1. The model uses dot product and sign function to determine the classification output.

In [138]:
import numpy as np

In [139]:
class SVM_classifier():
    
    def __init__(self,learning_rate,num_iterations,lambda_parameter):
        self.learning_rate=learning_rate
        self.num_iterations=num_iterations
        self.lambda_parameter=lambda_parameter
        
    def fit(self,X,Y):
        self.X=X
        self.Y=Y
        self.m,self.n=X.shape # m->no of rows, n->no of cols(features)
        
        self.w=np.zeros(self.n)
        self.b=0
        
        for i in range(self.num_iterations):
            self.update_weights()
        
    def update_weights(self):
        y_label= np.where(self.Y <=0, -1, 1)
        
        for index, x_i in enumerate (self.X):
            condition= y_label[index]*(np.dot(x_i,self.w)-self.b)>= 1
            
            if condition:
                dw=2*self.lambda_parameter*self.w
                db=0
            else:
                dw=2*self.lambda_parameter*self.w- np.dot(x_i,y_label[index])
                db=y_label[index]
        
        self.w=self.w- self.learning_rate*dw
        self.b=self.b- self.learning_rate*db
        
    def predict(self,X):
        output=np.dot(X, self.w)-self.b
        predicted_labels = np.sign(output)
        y_hat=np.where(predicted_labels<=-1,0,1)
        
        return y_hat
        

In [140]:
# IMPORTING LIBRARIES
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

### Loading Dataset and Preprocessing

In [141]:
data=pd.read_csv('diabetes_data.csv')

In [142]:
data.head(4)

Unnamed: 0,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
0,6,148,72,35,0,33.6,0.627,50,1
1,1,85,66,29,0,26.6,0.351,31,0
2,8,183,64,0,0,23.3,0.672,32,1
3,1,89,66,23,94,28.1,0.167,21,0


In [143]:
data.shape

(768, 9)

In [144]:
data.describe()

Unnamed: 0,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
count,768.0,768.0,768.0,768.0,768.0,768.0,768.0,768.0,768.0
mean,3.845052,120.894531,69.105469,20.536458,79.799479,31.992578,0.471876,33.240885,0.348958
std,3.369578,31.972618,19.355807,15.952218,115.244002,7.88416,0.331329,11.760232,0.476951
min,0.0,0.0,0.0,0.0,0.0,0.0,0.078,21.0,0.0
25%,1.0,99.0,62.0,0.0,0.0,27.3,0.24375,24.0,0.0
50%,3.0,117.0,72.0,23.0,30.5,32.0,0.3725,29.0,0.0
75%,6.0,140.25,80.0,32.0,127.25,36.6,0.62625,41.0,1.0
max,17.0,199.0,122.0,99.0,846.0,67.1,2.42,81.0,1.0


In [145]:
data['Outcome'].value_counts()
# 0 --> Non-diabetic
# 1 --> Diabetic

0    500
1    268
Name: Outcome, dtype: int64

In [146]:
# Splitting data into features and target variables
features=data.drop(columns='Outcome',axis=1)
target=data['Outcome']

In [147]:
features.head(3)

Unnamed: 0,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age
0,6,148,72,35,0,33.6,0.627,50
1,1,85,66,29,0,26.6,0.351,31
2,8,183,64,0,0,23.3,0.672,32


In [148]:
target.head(3)

0    1
1    0
2    1
Name: Outcome, dtype: int64

####  Data Standardization
This ensures that features are scaled appropriately for algorithms that are sensitive to feature magnitudes, such as gradient descent-based methods.

In [149]:
# data Standardization
scaler= StandardScaler()

In [150]:
scaler.fit(features)

In [151]:
 std_data=scaler.transform(features)

In [152]:
std_data

array([[ 0.63994726,  0.84832379,  0.14964075, ...,  0.20401277,
         0.46849198,  1.4259954 ],
       [-0.84488505, -1.12339636, -0.16054575, ..., -0.68442195,
        -0.36506078, -0.19067191],
       [ 1.23388019,  1.94372388, -0.26394125, ..., -1.10325546,
         0.60439732, -0.10558415],
       ...,
       [ 0.3429808 ,  0.00330087,  0.14964075, ..., -0.73518964,
        -0.68519336, -0.27575966],
       [-0.84488505,  0.1597866 , -0.47073225, ..., -0.24020459,
        -0.37110101,  1.17073215],
       [-0.84488505, -0.8730192 ,  0.04624525, ..., -0.20212881,
        -0.47378505, -0.87137393]])

In [153]:
# traintest split
X_train,X_test,Y_train,Y_test=train_test_split(std_data,target,test_size=0.2,random_state=2)

#### Creating an object of the SVM_Classifier class

In [154]:
model=SVM_classifier(0.05,1000,0.01)

In [155]:
#Fitting (training) model on diabetes dataset
model.fit(X_train,Y_train)

In [156]:
# Model Evaluation

# training data accuracy
X_train_preds=model.predict(X_train)
training_data_acc=accuracy_score(Y_train,X_train_preds)
print(f"Training accuracy: {training_data_acc}")

Training accuracy: 0.6856677524429967


In [157]:
# testing data accuracy
X_test_preds=model.predict(X_test)
testing_data_acc=accuracy_score(Y_test,X_test_preds)
print(f"Testing accuracy: {testing_data_acc}")

Testing accuracy: 0.7207792207792207


### Prediction System
Model predicts for new data points unknown to the model.

In [158]:
input_data = (7,107,74,0,0,29.6,0.254,31)

# change the input data to numpy array
input_data_as_numpy_array = np.asarray(input_data)

# reshape the array
input_data_reshaped = input_data_as_numpy_array.reshape(1,-1)

# standardizing the input data
std_data_new = scaler.transform(input_data_reshaped)
print(std_data_new)

prediction = model.predict(std_data_new)
print(prediction)

if (prediction[0] == 0):
  print('The person is not diabetic')

else:
  print('The Person is diabetic')

[[ 0.93691372 -0.43485916  0.25303625 -1.28821221 -0.69289057 -0.30366421
  -0.65801229 -0.19067191]]
[0]
The person is not diabetic


