In [None]:
""" Q1. What is the relationship between polynomial functions and kernel functions in machine learning
algorithms? """

# ans
""" In machine learning, polynomial functions and kernel functions are related through the kernel trick, which lets linear algorithms such as Support Vector Machines (SVMs) handle nonlinear data by implicitly mapping it into a higher-dimensional feature space. The polynomial kernel is one such kernel function: K(x, y) = (gamma * <x, y> + coef0)^degree, in scikit-learn's notation.

The kernel trick computes the dot product between data points in the transformed feature space without ever performing the transformation explicitly. A polynomial kernel of degree d corresponds to a feature space built from monomials of the input features (all monomials up to degree d when coef0 > 0, and only those of exactly degree d when coef0 = 0), so it captures polynomial relationships and interactions between features. """
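
As a minimal sketch of the kernel trick described above, the snippet below checks numerically that a degree-2 polynomial kernel equals a dot product in an explicit feature space. The helper names `poly_kernel` and `phi` are illustrative, and fixing gamma = 1, coef0 = 1, and two-dimensional inputs is an assumption made only to keep the feature map short.

```python
import math

def poly_kernel(x, y, degree=2, coef0=1.0):
    """Polynomial kernel (<x, y> + coef0) ** degree, with gamma fixed to 1."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + coef0) ** degree

def phi(x):
    """Explicit degree-2 feature map for 2-D input that matches poly_kernel with coef0=1."""
    x1, x2 = x
    s = math.sqrt(2.0)
    return [1.0, s * x1, s * x2, x1 * x1, x2 * x2, s * x1 * x2]

x, y = (1.0, 2.0), (3.0, -1.0)

# Implicit route: evaluate the kernel directly in the 2-D input space.
implicit = poly_kernel(x, y)

# Explicit route: map both points to 6-D, then take an ordinary dot product.
explicit = sum(a * b for a, b in zip(phi(x), phi(y)))

print(implicit, explicit)  # the two values agree up to floating-point rounding
```

The kernel evaluation costs one 2-D dot product, while the explicit route needs the 6-D vectors. The gap grows quickly with the degree and the number of features, which is why kernel methods avoid the explicit map.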

In [None]:
""" Q2. How can we implement an SVM with a polynomial kernel in Python using Scikit-learn? """

# ans
""" from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error

# Load a sample dataset (e.g., the diabetes dataset)
data = datasets.load_diabetes()
X = data.data
y = data.target

# Split the data into a training and testing set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create an SVR model with a polynomial kernel
# (hyperparameters such as degree, C, epsilon, and gamma can be adjusted)
svr = SVR(kernel='poly', degree=3, C=1.0, epsilon=0.2, gamma='scale')

# Fit the model to the training data
svr.fit(X_train, y_train)

# Make predictions on the test data
y_pred = svr.predict(X_test)

# Evaluate the model's performance
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse}")

The code above loads a sample dataset, splits it into training and testing sets, and fits an SVR model (the regression variant of the SVM) with a polynomial kernel, selected with kernel='poly'. It then predicts on the test set and evaluates the predictions with mean squared error. For classification, SVC accepts the same kernel='poly' and degree arguments.

Hyperparameters such as degree, C, epsilon, and gamma can be tuned for your specific problem.
"""
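
Since the question asks about an SVM and the answer above uses the regression variant, here is a hedged classification counterpart: a minimal sketch that fits `SVC` with a polynomial kernel. The iris dataset, the `StandardScaler` step, `coef0=1.0`, and the variable names are assumptions chosen for illustration, not part of the original answer.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Small built-in classification dataset (assumption: any labeled dataset would do).
X_iris, y_iris = load_iris(return_X_y=True)
Xi_train, Xi_test, yi_train, yi_test = train_test_split(
    X_iris, y_iris, test_size=0.25, random_state=0, stratify=y_iris
)

# Scale the features, then fit an SVC with a degree-3 polynomial kernel.
# coef0=1.0 adds lower-order terms to the kernel; scikit-learn's default is 0.0.
poly_svc = make_pipeline(
    StandardScaler(),
    SVC(kernel="poly", degree=3, coef0=1.0, C=1.0, gamma="scale"),
)
poly_svc.fit(Xi_train, yi_train)

poly_acc = accuracy_score(yi_test, poly_svc.predict(Xi_test))
print(f"Test accuracy: {poly_acc:.3f}")
```

Putting the scaler and the classifier in one pipeline means the scaling statistics are learned from the training split only.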

In [None]:
""" Q3. How does increasing the value of epsilon affect the number of support vectors in SVR?  """

# ans
""" In Support Vector Regression (SVR), the epsilon parameter (ϵ) sets the half-width of the ϵ-insensitive tube around the regression function. Training points whose residual is smaller than ϵ incur no loss. Only points that lie on or outside the tube become support vectors, so epsilon directly affects how many there are.

Smaller Epsilon: The tube becomes narrower, so fewer data points fall inside it and more points lie on or outside its boundary. This typically leads to a larger number of support vectors.

Larger Epsilon: The tube becomes wider, so more data points fall inside it and incur no loss. This typically results in a smaller number of support vectors.

Epsilon therefore controls a trade-off between how closely the model follows the training data and how sparse it is. A smaller epsilon produces more support vectors and a fit that tracks the training data, including its noise, more closely. A larger epsilon yields a sparser, flatter model that ignores deviations smaller than ϵ; if it is too large, the model underfits.

Tune epsilon carefully, ideally by cross-validation and relative to the scale of the target, to balance accuracy against overfitting to noise in the training data. """
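
The effect of epsilon on the support-vector count can be checked empirically. The following is a minimal sketch on synthetic data; the noisy sine curve, the three epsilon values, and the variable names are assumptions picked for illustration.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic 1-D regression problem: a sine curve plus Gaussian noise.
rng = np.random.default_rng(0)
X_demo = np.linspace(0, 5, 200).reshape(-1, 1)
y_demo = np.sin(X_demo).ravel() + rng.normal(scale=0.1, size=200)

sv_counts = {}
for eps in (0.01, 0.1, 0.5):
    model = SVR(kernel="rbf", C=1.0, epsilon=eps, gamma="scale").fit(X_demo, y_demo)
    # support_ holds the indices of the training points used as support vectors.
    sv_counts[eps] = len(model.support_)
    print(f"epsilon={eps}: {sv_counts[eps]} support vectors out of {len(X_demo)}")
```

On data like this, the count should drop as epsilon grows, because more points fall strictly inside the tube and stop contributing to the solution.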

In [None]:
""" Q4. How does the choice of kernel function, C parameter, epsilon parameter, and gamma parameter
affect the performance of Support Vector Regression (SVR)? Can you explain how each parameter works
and provide examples of when you might want to increase or decrease its value?  """

# ans
""" The choice of kernel function, C parameter, epsilon parameter, and gamma parameter in Support Vector Regression (SVR) significantly affects the performance of the model. Here's an explanation of each parameter and how it can be adjusted:

Kernel Function:
The kernel function determines how SVR models nonlinear relationships in the data.
Common choices include the linear kernel, polynomial kernel, and radial basis function (RBF) kernel.
When to use:
Linear Kernel: When the data has a linear relationship.
Polynomial Kernel: When the data has polynomial relationships.
RBF Kernel: When the data has complex, nonlinear relationships.

C Parameter:
The C parameter controls the trade-off between keeping the model simple (flat) and penalizing training points that fall outside the epsilon tube.
Smaller C values mean stronger regularization: deviations beyond epsilon are penalized lightly, giving a smoother function that tolerates more training error.
Larger C values mean weaker regularization: deviations beyond epsilon are penalized heavily, so the model follows the training data more closely.
When to increase/decrease:
Increase C: If the model underfits and you want it to fit the training data more closely, for example when the targets contain little noise.
Decrease C: If the model overfits, for example when the targets are noisy or contain outliers, and you prefer a smoother function.

Epsilon Parameter:
The epsilon parameter (ϵ) defines the half-width of the tube around the regression function within which errors are not penalized.
Smaller epsilon values result in a narrower tube, while larger epsilon values create a wider tube.
When to increase/decrease:
Increase epsilon: If the targets are noisy and deviations of that size are acceptable. This also tends to reduce the number of support vectors.
Decrease epsilon: If you need predictions accurate to a finer tolerance and can accept more support vectors.

Gamma Parameter (RBF, Polynomial, and Sigmoid Kernels):
The gamma parameter (γ) is the kernel coefficient. For the RBF kernel, it sets how far the influence of a single training point reaches.
Smaller gamma values give each point a wide influence and a smoother function, while larger gamma values give each point a narrow influence and a more flexible, wiggly function.
When to increase/decrease:
Increase gamma: If you want a more flexible model that can capture intricate patterns in the data, at the risk of overfitting.
Decrease gamma: If you want a smoother, less complex model that avoids overfitting to noise. """
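
To make the gamma discussion concrete, the sketch below fits an RBF-kernel SVR with three gamma values and compares training and test R². The synthetic data, the gamma grid, and `C=10.0` are assumptions chosen so that the contrast is easy to see; they are not recommended defaults.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# Noisy sine curve on an evenly spaced grid.
rng = np.random.default_rng(42)
X_syn = np.linspace(0, 5, 200).reshape(-1, 1)
y_syn = np.sin(X_syn).ravel() + rng.normal(scale=0.3, size=200)
X_tr, X_te, y_tr, y_te = train_test_split(X_syn, y_syn, test_size=0.3, random_state=0)

r2_scores = {}
for gamma in (0.01, 1.0, 1000.0):
    model = SVR(kernel="rbf", C=10.0, epsilon=0.1, gamma=gamma).fit(X_tr, y_tr)
    # score() returns the coefficient of determination R^2.
    r2_scores[gamma] = (model.score(X_tr, y_tr), model.score(X_te, y_te))
    train_r2, test_r2 = r2_scores[gamma]
    print(f"gamma={gamma}: train R^2={train_r2:.3f}, test R^2={test_r2:.3f}")
```

A very large gamma lets the model chase individual noisy points, so the training score rises while the test score typically does not follow. A gap between the two is the usual sign of overfitting.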

In [5]:
from sklearn.datasets import load_breast_cancer

In [6]:
X, y = load_breast_cancer(return_X_y=True)

In [7]:
from sklearn.model_selection import train_test_split

In [8]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=10)

In [9]:
from sklearn.preprocessing import StandardScaler

In [10]:
scaler=StandardScaler()

In [11]:
X_train_scaled=scaler.fit_transform(X_train)

In [12]:
X_test_scaled=scaler.transform(X_test)

In [13]:
from sklearn.svm import SVC

In [14]:
clf=SVC()

In [15]:
clf.fit(X_train_scaled, y_train)

In [16]:
y_pred=clf.predict(X_test_scaled)

In [17]:
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

In [18]:
accuracy_score(y_test, y_pred)

0.9790209790209791

In [19]:
print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.95      1.00      0.97        52
           1       1.00      0.97      0.98        91

    accuracy                           0.98       143
   macro avg       0.97      0.98      0.98       143
weighted avg       0.98      0.98      0.98       143
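
The notebook imports `confusion_matrix` but never calls it. As a hedged, self-contained sketch, the block below rebuilds the same split and default `SVC` and prints the confusion matrix behind the report above. The variable names are illustrative, and no output is shown because it was not part of the original run.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Same dataset, split, and scaling as in the cells above.
X_bc, y_bc = load_breast_cancer(return_X_y=True)
Xb_train, Xb_test, yb_train, yb_test = train_test_split(
    X_bc, y_bc, test_size=0.25, random_state=10
)
scaler_bc = StandardScaler().fit(Xb_train)
clf_bc = SVC().fit(scaler_bc.transform(Xb_train), yb_train)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(yb_test, clf_bc.predict(scaler_bc.transform(Xb_test)))
print(cm)
```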



## Hyperparameter Tuning

In [20]:
from sklearn.model_selection import GridSearchCV

In [21]:
parameters={"kernel":["poly", "linear", "rbf"], 
           "C":[1,5,10],
           "gamma":["scale","auto"]}

In [22]:
grid=GridSearchCV(estimator=SVC(), param_grid=parameters, cv=5)

In [23]:
grid.fit(X_train_scaled, y_train)

In [24]:
grid.best_params_

{'C': 5, 'gamma': 'scale', 'kernel': 'rbf'}

In [25]:
y_pred=grid.predict(X_test_scaled)

In [26]:
from sklearn.metrics import accuracy_score

In [27]:
accuracy_score(y_test, y_pred)

0.986013986013986
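
One design note on the search above: the scaler was fit on the whole training set before cross-validation, so each validation fold has already influenced the scaling statistics. A possible alternative, sketched here under the assumption that the same dataset, split, and grid are wanted, is to put the scaler inside a `Pipeline` so it is refit on each training fold. The step names `scaler` and `svc` and the variable names are illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X_all, y_all = load_breast_cancer(return_X_y=True)
X_fit, X_hold, y_fit, y_hold = train_test_split(
    X_all, y_all, test_size=0.25, random_state=10
)

# The scaler is a pipeline step, so GridSearchCV refits it inside every fold.
pipe = Pipeline([("scaler", StandardScaler()), ("svc", SVC())])

# Parameters of a pipeline step are addressed as "<step name>__<parameter>".
param_grid = {
    "svc__kernel": ["poly", "linear", "rbf"],
    "svc__C": [1, 5, 10],
    "svc__gamma": ["scale", "auto"],
}

search = GridSearchCV(pipe, param_grid=param_grid, cv=5)
search.fit(X_fit, y_fit)

tuned_acc = accuracy_score(y_hold, search.predict(X_hold))
print(search.best_params_)
print(f"Held-out accuracy: {tuned_acc:.3f}")
```

The selected parameters and the held-out accuracy may differ slightly from the values reported above, since the cross-validation scores are computed without the scaling leak.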