### Q1. What is the relationship between polynomial functions and kernel functions in machine learning algorithms?

In machine learning, especially in Support Vector Machines (SVMs), kernel functions play a crucial role in enabling algorithms to handle non-linear data by implicitly mapping it into a higher-dimensional space. Polynomial functions are one type of kernel function that can be used for this purpose.

Kernel Functions
A kernel function is a function that computes the dot product of two vectors in a higher-dimensional space without explicitly performing the transformation. This is known as the "kernel trick," which allows SVMs to operate in a high-dimensional space efficiently.

Polynomial Kernel
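A polynomial kernel is a specific type of kernel function. The polynomial kernel of degree d between two vectors 𝑥 and 𝑦 is defined as:
$$k(x, y) = (x \cdot y + c)^d$$
Here x · y is the dot product of the two vectors, c ≥ 0 is a constant that balances the influence of higher-order and lower-order terms, and d is the degree of the polynomial.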

By using a polynomial kernel, SVMs can learn non-linear decision boundaries. This is particularly useful for datasets that are not linearly separable in the original feature space.
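
To make the kernel trick concrete, here is a minimal sketch for the special case d = 2 with two input features. The helper names `poly_kernel` and `explicit_map_degree2` are made up for this illustration. The point is that the kernel value computed in the original 2-D space matches the dot product of explicitly mapped 6-D vectors.

```python
import numpy as np

def poly_kernel(x, y, c=1.0, d=2):
    """Polynomial kernel k(x, y) = (x . y + c)^d, computed in the input space."""
    return (np.dot(x, y) + c) ** d

def explicit_map_degree2(x, c=1.0):
    """Explicit feature map whose dot product equals the degree-2 kernel (2 features only)."""
    x1, x2 = x
    return np.array([
        x1 ** 2,
        x2 ** 2,
        np.sqrt(2) * x1 * x2,
        np.sqrt(2 * c) * x1,
        np.sqrt(2 * c) * x2,
        c,
    ])

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

k_implicit = poly_kernel(x, y)                                          # no mapping needed
k_explicit = np.dot(explicit_map_degree2(x), explicit_map_degree2(y))   # map, then dot product
print(k_implicit, k_explicit)
```

Both numbers agree, but the kernel version never builds the 6-D vectors. For larger d and more features the explicit map grows very quickly, which is why the implicit computation matters.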

### Q2. How can we implement an SVM with a polynomial kernel in Python using Scikit-learn?

#### When creating the model, we can simply pass kernel="poly"

In [11]:
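from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Generate a synthetic binary classification dataset
x, y = make_classification(n_samples=100, n_features=20, n_classes=2, n_clusters_per_class=2)
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.20, random_state=42)
# Train an SVM with a polynomial kernel and evaluate it on the held-out split
svc_ = SVC(kernel="poly")
svc_.fit(X_train, y_train)
y_predict = svc_.predict(X_test)
print(f"accuracy of the model is : {accuracy_score(y_test, y_predict)}")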

accuracy of the model is : 0.95
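
Besides `kernel="poly"`, `SVC` exposes `degree`, `coef0` and `gamma`, which correspond to d, c and a scaling of the dot product in scikit-learn's form of the kernel, (gamma * x · y + coef0)^degree. The sketch below tries a few degrees on a separate synthetic dataset. The data, the variable names and the choice `coef0=1.0` are assumptions for illustration, not tuned values.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in data with a fixed seed so the run is repeatable
X_demo, y_demo = make_classification(n_samples=200, n_features=20, n_classes=2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=0)

poly_scores = {}
for degree in (2, 3, 4):
    # coef0 plays the role of c in the kernel formula; degree plays the role of d
    clf = SVC(kernel="poly", degree=degree, coef0=1.0, gamma="scale")
    clf.fit(X_tr, y_tr)
    poly_scores[degree] = accuracy_score(y_te, clf.predict(X_te))

print(poly_scores)
```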


### Q3. How does increasing the value of epsilon affect the number of support vectors in SVR?

In Support Vector Regression (SVR), the parameter ϵ (epsilon) defines a margin of tolerance where no penalty is given to errors. It essentially specifies a tube (or band) around the regression function within which predictions are considered acceptable and do not contribute to the loss function. The value of ϵ has a direct impact on the number of support vectors used in the model. Here's how:

Impact of Increasing ϵ on the Number of Support Vectors 

Larger 𝜖 Value:
Wider Margin: Increasing ϵ widens the tube around the regression function where no penalty is applied.
Fewer Support Vectors: With a wider tube, more data points fall strictly inside it and incur no loss. Only the points that lie on or outside the boundary of the ϵ-tube become support vectors, so increasing ϵ typically decreases the number of support vectors.

Smaller ϵ Value:
Narrower Margin: Decreasing ϵ narrows the tube around the regression function.
More Support Vectors: With a narrower tube, more data points lie on or outside its boundary and thus become support vectors. Therefore, decreasing ϵ typically increases the number of support vectors, as more points contribute to the cost function.


Increasing ϵ in SVR results in a wider ϵ-tube, which typically leads to fewer support vectors, as more data points fall within the margin of tolerance.
Decreasing ϵ results in a narrower ϵ-tube, which typically leads to more support vectors, as fewer data points fall within the margin of tolerance and more points contribute to the cost function.
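
This effect is easy to check empirically. The sketch below fits `SVR` on a noisy sine curve (synthetic data chosen only for illustration) with several ϵ values and counts the support vectors through the fitted model's `support_` attribute.

```python
import numpy as np
from sklearn.svm import SVR

# Noisy sine curve as toy regression data
rng = np.random.default_rng(0)
X_sine = np.sort(rng.uniform(0, 5, size=(100, 1)), axis=0)
y_sine = np.sin(X_sine).ravel() + rng.normal(scale=0.1, size=100)

sv_counts = {}
for eps in (0.01, 0.1, 0.5, 1.0):
    model = SVR(kernel="rbf", C=1.0, epsilon=eps).fit(X_sine, y_sine)
    sv_counts[eps] = len(model.support_)  # indices of the support vectors

print(sv_counts)
```

On this data the count should drop sharply as ϵ grows, since a wide tube leaves most points strictly inside it.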

### Q4. How does the choice of kernel function, C parameter, epsilon parameter, and gamma parameter affect the performance of Support Vector Regression (SVR)? Can you explain how each parameter works and provide examples of when you might want to increase or decrease its value?

The performance of Support Vector Regression (SVR) is influenced by several key parameters: the choice of kernel function, the regularization parameter C, the epsilon parameter ϵ, and the gamma parameter γ (when using certain kernels like the RBF kernel). Each parameter plays a specific role, and adjusting them can help tailor the model to the specific characteristics of the data.

#### Kernel Function
The kernel function determines how the input data is implicitly mapped into a higher-dimensional space, where a linear function may fit the target well even when the relationship is non-linear in the original features.

Common kernel functions include:

Linear Kernel: Use when the relationship between the features and the target is approximately linear, or when the dataset is high-dimensional.

Polynomial Kernel: Use when the data has polynomial relationships. Higher-degree polynomials can capture more complex relationships but can also overfit.

RBF Kernel: Use when the data has complex relationships and you do not know the exact form of the non-linearity. It is a versatile kernel and a common default choice.


#### Regularization Parameter 𝐶

The regularization parameter 𝐶 controls the trade-off between keeping the regression function flat (smooth) and penalizing training errors larger than ϵ.

High 𝐶: The model tries to fit the training data as closely as possible, with less emphasis on keeping the function flat. This can lead to overfitting, especially in the presence of noise.
Low 𝐶: The model tolerates more slack (errors) and favors a flatter function. This can improve generalization, but too low a value leads to underfitting.
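
As a rough illustration of the effect of C, the sketch below fits `SVR` with three C values on a noisy sine curve and records the R² score on the training data. The data and variable names are invented for this example.

```python
import numpy as np
from sklearn.svm import SVR

# Noisy sine curve as toy regression data
rng = np.random.default_rng(1)
X_c = np.sort(rng.uniform(0, 5, size=(80, 1)), axis=0)
y_c = np.sin(X_c).ravel() + rng.normal(scale=0.3, size=80)

train_r2_by_C = {}
for C in (0.01, 1.0, 100.0):
    model = SVR(kernel="rbf", C=C, epsilon=0.1).fit(X_c, y_c)
    train_r2_by_C[C] = model.score(X_c, y_c)  # R^2 on the training data

print(train_r2_by_C)
```

A very small C should give a noticeably worse training fit (underfitting), while a large C follows the training points more closely. Whether the larger C also generalizes better has to be checked on held-out data.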

#### Epsilon Parameter 𝜖

The epsilon parameter ϵ specifies the margin of tolerance where no penalty is given to errors. It defines a tube within which the predictions are considered acceptable.

High 𝜖: The tube is wider, allowing more deviations from the actual values without penalty. This results in a simpler model with fewer support vectors.
Low ϵ: The tube is narrower, penalizing more deviations from the actual values. This results in a more complex model with more support vectors.

#### Gamma Parameter 𝛾

The gamma parameter γ is specific to certain kernels like the RBF kernel. It defines how far the influence of a single training example reaches.

High γ: The influence of each training example is confined to its close neighborhood, leading to a more complex model that can capture fine details in the data but may overfit.
Low 𝛾: The influence of each training example reaches far, leading to a smoother regression function that may underfit.
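
A quick way to see this is to compare training and test R² for a few γ values. The following is a sketch on synthetic data; the γ grid and all variable names are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# Noisy sine curve, split into training and test parts
rng = np.random.default_rng(2)
X_g = rng.uniform(0, 5, size=(120, 1))
y_g = np.sin(X_g).ravel() + rng.normal(scale=0.3, size=120)
X_tr, X_te, y_tr, y_te = train_test_split(X_g, y_g, test_size=0.25, random_state=0)

r2_by_gamma = {}
for gamma in (0.01, 1.0, 100.0):
    model = SVR(kernel="rbf", C=1.0, epsilon=0.1, gamma=gamma).fit(X_tr, y_tr)
    r2_by_gamma[gamma] = (model.score(X_tr, y_tr), model.score(X_te, y_te))

for gamma, (train_r2, test_r2) in r2_by_gamma.items():
    print(f"gamma={gamma}: train R^2={train_r2:.3f}, test R^2={test_r2:.3f}")
```

A large gap between the training and test scores at high γ would point to overfitting, while low scores on both at low γ would point to underfitting.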



Kernel Function: Choose based on the complexity and nature of the data.

C Parameter: Balances the trade-off between a smooth model and fitting the training data closely.

Epsilon Parameter: Defines the tolerance margin where no penalty is given to errors.

Gamma Parameter: Controls the influence of individual training points in certain kernels.


### Q5. Assignment:
     Import the necessary libraries and load the dataset
     Split the dataset into training and testing sets
     Preprocess the data using any technique of your choice (e.g. scaling, normalization)
     Create an instance of the SVC classifier and train it on the training data
     Use the trained classifier to predict the labels of the testing data
     Evaluate the performance of the classifier using any metric of your choice (e.g. accuracy, precision, recall, F1-score)
     Tune the hyperparameters of the SVC classifier using GridSearchCV or RandomizedSearchCV to improve its performance
     Train the tuned classifier on the entire dataset
     Save the trained classifier to a file for future use.

In [36]:
import numpy as np 
import pandas as pd
import seaborn as sns 
import matplotlib.pyplot as plt
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

In [31]:
from sklearn.model_selection import train_test_split 
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()

In [12]:
data = pd.read_csv(r"C:\Users\dhruv\Documents\ML\datasets\diabetes.csv")
data.head()

,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
0,6,148,72,35,0,33.6,0.627,50,1
1,1,85,66,29,0,26.6,0.351,31,0
2,8,183,64,0,0,23.3,0.672,32,1
3,1,89,66,23,94,28.1,0.167,21,0
4,0,137,40,35,168,43.1,2.288,33,1


In [13]:
data.isnull().sum()

Pregnancies                 0
Glucose                     0
BloodPressure               0
SkinThickness               0
Insulin                     0
BMI                         0
DiabetesPedigreeFunction    0
Age                         0
Outcome                     0
dtype: int64

In [15]:
data.duplicated().sum()

0

In [17]:
scaler

In [19]:
X,y = data.drop("Outcome",axis=1), data["Outcome"]
X.head()

,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age
0,6,148,72,35,0,33.6,0.627,50
1,1,85,66,29,0,26.6,0.351,31
2,8,183,64,0,0,23.3,0.672,32
3,1,89,66,23,94,28.1,0.167,21
4,0,137,40,35,168,43.1,2.288,33


In [20]:
y

0      1
1      0
2      1
3      0
4      1
      ..
763    0
764    0
765    0
766    1
767    0
Name: Outcome, Length: 768, dtype: int64

In [26]:
col = X.columns
col

Index(['Pregnancies', 'Glucose', 'BloodPressure', 'SkinThickness', 'Insulin',
       'BMI', 'DiabetesPedigreeFunction', 'Age'],
      dtype='object')

In [29]:
X = pd.DataFrame(scaler.fit_transform(X), columns=col)

In [30]:
X

,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age
0,0.639947,0.848324,0.149641,0.907270,-0.692891,0.204013,0.468492,1.425995
1,-0.844885,-1.123396,-0.160546,0.530902,-0.692891,-0.684422,-0.365061,-0.190672
2,1.233880,1.943724,-0.263941,-1.288212,-0.692891,-1.103255,0.604397,-0.105584
3,-0.844885,-0.998208,-0.160546,0.154533,0.123302,-0.494043,-0.920763,-1.041549
4,-1.141852,0.504055,-1.504687,0.907270,0.765836,1.409746,5.484909,-0.020496
...,...,...,...,...,...,...,...,...
763,1.827813,-0.622642,0.356432,1.722735,0.870031,0.115169,-0.908682,2.532136
764,-0.547919,0.034598,0.046245,0.405445,-0.692891,0.610154,-0.398282,-0.531023
765,0.342981,0.003301,0.149641,0.154533,0.279594,-0.735190,-0.685193,-0.275760
766,-0.844885,0.159787,-0.470732,-1.288212,-0.692891,-0.240205,-0.371101,1.170732


In [32]:
y

0      1
1      0
2      1
3      0
4      1
      ..
763    0
764    0
765    0
766    1
767    0
Name: Outcome, Length: 768, dtype: int64

In [33]:
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.20, random_state=42)
X_train.shape, X_test.shape, y_train.shape, y_test.shape

((614, 8), (154, 8), (614,), (154,))
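
One caveat about the steps above: the scaler was fitted on the full dataset before the split, so statistics from the test rows influence the training features. A common alternative is to split first and wrap the scaler and the classifier in a pipeline, so scaling is learned from the training rows only. The sketch below uses synthetic stand-in data from `make_classification` (an assumption, since the CSV is not available here) with the same shape as the diabetes dataset.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in data: 768 rows, 8 features, binary target (NOT the real diabetes data)
X_syn, y_syn = make_classification(n_samples=768, n_features=8, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X_syn, y_syn, test_size=0.20, random_state=42)

# The pipeline fits StandardScaler on X_tr only, then applies it to X_te at predict time
pipe = make_pipeline(StandardScaler(), SVC())
pipe.fit(X_tr, y_tr)
pipe_accuracy = pipe.score(X_te, y_te)
print(f"accuracy is : {pipe_accuracy}")
```

A pipeline like this can also be passed directly to GridSearchCV, in which case the scaler is refitted inside every cross-validation fold.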

In [37]:
svc_ = SVC()
svc_.fit(X_train, y_train)
y_predict = svc_.predict(X_test)
score = accuracy_score(y_test, y_predict)
print(f"accuracy is : {score}")

accuracy is : 0.7272727272727273


In [38]:
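# Let's optimize the model with a grid search over these hyperparameters.
# Note: "degree" only affects the "poly" kernel, and "gamma" is ignored by "linear".

model_params = {
    "C": [1, 0.01, 0.1, 0.001],
    "kernel": ["linear", "poly", "rbf"],
    "degree": [2, 3, 4],
    "gamma": ["scale", "auto"],
}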

In [39]:
svc_ = SVC()
grid = GridSearchCV(svc_, param_grid= model_params, cv=5,verbose=True, scoring="accuracy")

In [40]:
grid

In [41]:
grid.fit(X_train,y_train)

Fitting 5 folds for each of 72 candidates, totalling 360 fits


In [42]:
grid.best_estimator_
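
After fitting, a `GridSearchCV` object also exposes `best_params_` and `best_score_`, and with the default `refit=True` its `best_estimator_` is already retrained on the data passed to `fit`. The sketch below shows these attributes on synthetic stand-in data (not the diabetes data). It also passes the grid as a list of dictionaries, one possible way to avoid fitting redundant combinations such as `degree` with a non-polynomial kernel.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Stand-in data for illustration only
X_syn, y_syn = make_classification(n_samples=300, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_syn, y_syn, test_size=0.2, random_state=0)

# One dictionary per kernel, each listing only the parameters that kernel uses
param_grid = [
    {"kernel": ["linear"], "C": [0.01, 0.1, 1]},
    {"kernel": ["rbf"], "C": [0.01, 0.1, 1], "gamma": ["scale", "auto"]},
    {"kernel": ["poly"], "C": [0.01, 0.1, 1], "gamma": ["scale", "auto"], "degree": [2, 3]},
]
search = GridSearchCV(SVC(), param_grid=param_grid, cv=5, scoring="accuracy")
search.fit(X_tr, y_tr)

print(search.best_params_)       # best hyperparameter combination
print(search.best_score_)        # its mean cross-validated accuracy
tuned = search.best_estimator_   # already refit on X_tr with the best parameters
test_accuracy = tuned.score(X_te, y_te)
print(test_accuracy)
```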

In [43]:
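# Hyperparameters are hard-coded here; SVC(**grid.best_params_) would take them
# from the search result directly.
best_model = SVC(C=1, degree=2)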
best_model

In [44]:
best_model.fit(X_train,y_train)
y_pre = best_model.predict(X_test)
score_ = accuracy_score(y_test, y_pre)
print(f"the accuracy score is : {score_}")

the accuracy score is : 0.7272727272727273


In [45]:
score_

0.7272727272727273

In [46]:
best_model.fit(X,y)

In [47]:
import pickle

In [52]:
# Use a context manager so the file is closed once the model is written
with open("model.pkl", "wb") as f:
    pickle.dump(best_model, f)

In [54]:
# Use a context manager so the file is closed once the model is read
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

In [55]:
model
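
A simple sanity check after saving is to confirm that the reloaded model predicts exactly what the original does. The sketch below does this round trip on synthetic stand-in data, writing to a temporary directory. The data and the path are assumptions for illustration. Note also that a pickled `SVC` on its own does not contain the `StandardScaler`, so new inputs have to be scaled the same way before calling `predict`.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Stand-in data and model for illustration only
X_syn, y_syn = make_classification(n_samples=100, n_features=8, random_state=0)
clf = SVC(C=1, degree=2).fit(X_syn, y_syn)

# Save to a temporary location and load it back
path = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(path, "wb") as f:
    pickle.dump(clf, f)
with open(path, "rb") as f:
    restored = pickle.load(f)

# The restored model should reproduce the original predictions exactly
same_predictions = np.array_equal(clf.predict(X_syn), restored.predict(X_syn))
print(same_predictions)
```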