# Assignment

### Ans1)

Polynomial functions are sums of terms in which the variables are raised to non-negative integer powers and multiplied by coefficients. In machine learning, polynomial terms of the input features can be used to map the original data into a higher-dimensional feature space, where a relationship that is non-linear in the original inputs can become linear, or the classes can become linearly separable.

Kernel functions are used in kernel methods such as SVMs. A kernel function returns the inner product of two data points in a transformed feature space without computing the transformation explicitly (the kernel trick). This inner product acts as a similarity measure, which lets an algorithm capture complex relationships and make non-linear classifications. The two ideas meet in the polynomial kernel, K(x, y) = (γ x·y + r)^d: its value equals the inner product of x and y after a degree-d polynomial feature mapping, so the model gets the benefit of polynomial features without ever building them.
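
As a small numerical check of this link, the sketch below compares a degree-2 polynomial kernel with an explicit polynomial feature map. It assumes the homogeneous kernel K(x, y) = (x·y)^2 on 2-D inputs; the helper names `phi` and `poly_kernel` are illustrative and not part of any library.

```python
import numpy as np

def phi(v):
    """Explicit degree-2 feature map for a 2-D vector (illustrative helper)."""
    x1, x2 = v
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

def poly_kernel(a, b, degree=2):
    """Homogeneous polynomial kernel: (a . b) ** degree."""
    return np.dot(a, b) ** degree

x = np.array([1.0, 2.0])
y = np.array([3.0, 0.5])

explicit = np.dot(phi(x), phi(y))  # inner product in the 3-D feature space
implicit = poly_kernel(x, y)       # same quantity, computed in the 2-D input space

print(explicit, implicit)
```

Both values agree up to floating-point rounding, which is the point of the kernel trick: the kernel never has to construct `phi(x)`.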

### Ans2)

In [4]:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load the iris data and hold out 20% for testing (the split is unseeded, so results vary per run)
iris = datasets.load_iris()
X = iris.data
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Train an SVM with a degree-3 polynomial kernel
svm_pol = SVC(kernel='poly', degree=3, C=1000)
svm_pol.fit(X_train, y_train)

# Evaluate on the held-out test set
y_pred = svm_pol.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print('Accuracy:', accuracy)

Accuracy: 0.9333333333333333


### Ans3)

In Support Vector Regression (SVR), epsilon (ε) is a hyperparameter that sets the half-width of the ε-insensitive tube around the regression function. Training points whose targets lie within ε of the prediction fall inside the tube and incur no penalty; only deviations larger than ε are penalized.
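
Written as a formula, this is the standard ε-insensitive loss. Here f(x) denotes the model's prediction for input x, a symbol introduced only for this illustration:

$$
L_\varepsilon\bigl(y, f(x)\bigr) = \max\bigl(0,\; |y - f(x)| - \varepsilon\bigr)
$$

Any point with |y − f(x)| ≤ ε therefore contributes zero loss.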

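Increasing the value of epsilon in SVR typically reduces the number of support vectors. Only training points that lie on or outside the ε-tube become support vectors, and a wider tube contains more of the training points, so fewer remain to define the regression function. The result is a sparser but coarser model. Decreasing epsilon has the opposite effect: more points fall outside the tube and become support vectors.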
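
The effect of epsilon on the support-vector count can be checked empirically. This is a minimal sketch, assuming synthetic noisy-sine data and arbitrary epsilon values chosen for illustration; on data like this the count normally shrinks as epsilon grows, because points inside the tube are not support vectors.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic 1-D data (an assumption for illustration): a noisy sine curve.
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, size=(100, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=100)

# Fit one SVR per epsilon and record how many support vectors it keeps.
n_support = {}
for eps in (0.01, 0.1, 0.5):
    model = SVR(kernel="rbf", C=1.0, epsilon=eps).fit(X, y)
    n_support[eps] = len(model.support_)

print(n_support)
```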

### Ans4)

The choice of kernel function, C parameter, epsilon parameter, and gamma parameter in Support Vector Regression (SVR) can significantly impact the performance of the model. Here's an explanation of each parameter and how they influence SVR, along with examples of when you might want to increase or decrease their values:

1. Kernel Function:
   - The kernel function determines the type of non-linear transformation implicitly applied to the input data. It maps the data into a higher-dimensional feature space where the relationship between inputs and target can be approximated by a linear function.
   - Example: If the data exhibits complex non-linear patterns, you might choose a kernel function like the Radial Basis Function (RBF) kernel. If the data has a clear polynomial relationship, you might opt for a polynomial kernel.
   
2. C Parameter:
   - The C parameter controls the trade-off between fitting the training data closely and keeping the regression function flat (simple). It sets the penalty for deviations that fall outside the epsilon tube.
   - Example: If you want the model to fit the training data more closely, you can increase the value of C. On the other hand, if you want a smoother regression line that is less affected by outliers, you can decrease the value of C.

3. Epsilon Parameter:
   - The epsilon parameter defines the half-width of the epsilon-insensitive tube around the regression function. Deviations smaller than epsilon are tolerated without any penalty.
   - Example: If you want to allow more errors or deviations from the regression line, you can increase the value of epsilon. Conversely, if you want to enforce a stricter fit to the training data, you can decrease the value of epsilon.

4. Gamma Parameter:
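   - The gamma parameter applies to the RBF, polynomial, and sigmoid kernels. For the RBF kernel it determines how far the influence of a single training sample reaches: a large gamma means a short reach, a small gamma a long one. It therefore controls the shape and smoothness of the regression curve.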
   - Example: Increasing the value of gamma makes the model more sensitive to individual training samples, resulting in a more wiggly or intricate regression curve. Decreasing gamma makes the model less sensitive to individual samples, leading to a smoother regression curve.
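
The effect of gamma described above can be sketched on toy data. The example below assumes synthetic noisy-sine data and two deliberately extreme gamma values; it compares the training R² of a very smooth RBF model with that of a very flexible one.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic noisy-sine data (assumed for illustration only).
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, size=(100, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=100)

# Compare a very smooth model (small gamma) with a very flexible one (large gamma).
train_r2 = {}
for gamma in (0.001, 100):
    model = SVR(kernel="rbf", C=1.0, epsilon=0.1, gamma=gamma).fit(X, y)
    train_r2[gamma] = model.score(X, y)  # R^2 on the training data

print(train_r2)
```

A higher training score at large gamma is not evidence of a better model, since the curve may be following noise. In practice these parameters are usually compared on held-out data, for example with cross-validation.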


### Ans5)

In [1]:
import pandas as pd
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score


In [7]:
df=pd.read_csv('winequality-red.csv')

In [8]:
df.head()

|   | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7.4 | 0.7 | 0.0 | 1.9 | 0.076 | 11.0 | 34.0 | 0.9978 | 3.51 | 0.56 | 9.4 | 5 |
| 1 | 7.8 | 0.88 | 0.0 | 2.6 | 0.098 | 25.0 | 67.0 | 0.9968 | 3.2 | 0.68 | 9.8 | 5 |
| 2 | 7.8 | 0.76 | 0.04 | 2.3 | 0.092 | 15.0 | 54.0 | 0.997 | 3.26 | 0.65 | 9.8 | 5 |
| 3 | 11.2 | 0.28 | 0.56 | 1.9 | 0.075 | 17.0 | 60.0 | 0.998 | 3.16 | 0.58 | 9.8 | 6 |
| 4 | 7.4 | 0.7 | 0.0 | 1.9 | 0.076 | 11.0 | 34.0 | 0.9978 | 3.51 | 0.56 | 9.4 | 5 |


In [10]:
# Separate the feature columns from the target column
X = df.drop(columns='quality')
y = df['quality']

In [11]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=35)

In [15]:
# Fit the scaler on the training set only, then apply the same scaling to the test set
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

In [16]:
# Baseline: SVC with default hyperparameters (RBF kernel, C=1.0)
svc = SVC()
svc.fit(X_train_scaled, y_train)

In [17]:
y_pred = svc.predict(X_test_scaled)

In [18]:
accuracy = accuracy_score(y_test, y_pred)
print(accuracy)

0.56875


In [19]:
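# Search over C, gamma, and kernel with 5-fold cross-validation.
# gamma is ignored by the linear kernel, so the linear candidates differ only in C.
param_grid = {'C': [0.1, 1, 10, 100], 'gamma': [0.1, 1, 10], 'kernel': ['linear', 'rbf']}
grid = GridSearchCV(SVC(), param_grid, refit=True, verbose=3)
grid.fit(X_train_scaled, y_train)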

Fitting 5 folds for each of 24 candidates, totalling 120 fits
[CV 1/5] END ...C=0.1, gamma=0.1, kernel=linear;, score=0.576 total time=   0.1s
[CV 2/5] END ...C=0.1, gamma=0.1, kernel=linear;, score=0.583 total time=   0.1s
[CV 3/5] END ...C=0.1, gamma=0.1, kernel=linear;, score=0.618 total time=   0.1s
[CV 4/5] END ...C=0.1, gamma=0.1, kernel=linear;, score=0.594 total time=   0.1s
[CV 5/5] END ...C=0.1, gamma=0.1, kernel=linear;, score=0.554 total time=   0.1s
[CV 1/5] END ......C=0.1, gamma=0.1, kernel=rbf;, score=0.576 total time=   0.1s
[CV 2/5] END ......C=0.1, gamma=0.1, kernel=rbf;, score=0.587 total time=   0.1s
[CV 3/5] END ......C=0.1, gamma=0.1, kernel=rbf;, score=0.618 total time=   0.1s
[CV 4/5] END ......C=0.1, gamma=0.1, kernel=rbf;, score=0.615 total time=   0.1s
[CV 5/5] END ......C=0.1, gamma=0.1, kernel=rbf;, score=0.568 total time=   0.1s
[CV 1/5] END .....C=0.1, gamma=1, kernel=linear;, score=0.576 total time=   0.1s
[CV 2/5] END .....C=0.1, gamma=1, kernel=linear

In [20]:
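# best_estimator_ is the SVC refit with the best cross-validated parameters
best_params_svc = grid.best_estimator_
# Refit the scaler and the tuned model on the full dataset before saving
X_scaled = scaler.fit_transform(X)
best_params_svc.fit(X_scaled, y)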

In [22]:
import pickle
# Save the tuned model; inputs must be scaled with `scaler` before calling predict
with open("model.pkl", "wb") as f:
    pickle.dump(best_params_svc, f)
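
Because the pickled classifier expects already-scaled inputs, one option is to bundle the scaler and the classifier in a single scikit-learn `Pipeline` and pickle that instead. This is a minimal sketch, not the approach used above: the data is a synthetic stand-in generated with `make_classification`, and the names `X_demo`, `y_demo`, `pipeline`, and `restored` are illustrative.

```python
import pickle

from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in data (assumption): replace with the real feature matrix and labels.
X_demo, y_demo = make_classification(n_samples=200, n_features=11, random_state=0)

# Bundle scaling and classification so both are saved and applied together.
pipeline = make_pipeline(StandardScaler(), SVC(C=1.0, kernel="rbf"))
pipeline.fit(X_demo, y_demo)

# Round-trip through pickle in memory; the restored object scales inputs itself.
restored = pickle.loads(pickle.dumps(pipeline))
same_predictions = (restored.predict(X_demo) == pipeline.predict(X_demo)).all()
print(same_predictions)
```

With this layout, whoever loads the pickle can pass raw feature values to `predict` without having to reproduce the scaling step separately.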