# Support Vector Machines-2

#### Q1. What is the relationship between polynomial functions and kernel functions in machine learning algorithms?

Polynomial functions and kernel functions are connected through the "kernel trick." A kernel function returns the inner product of two data points in a higher-dimensional feature space without explicitly transforming the data into that space. The polynomial kernel is one such kernel, used in Support Vector Machines (SVM) and other kernel-based algorithms.

A polynomial kernel of degree d has the form K(x, z) = (γ⟨x, z⟩ + r)^d, where γ and r correspond to the gamma and coef0 parameters in Scikit-learn. Its value equals the inner product of the two points after a polynomial feature map built from monomials of the original features up to degree d. Because an SVM only needs inner products between points, a linear separator in that implicit feature space corresponds to a non-linear decision boundary in the original feature space.
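The equivalence between a kernel value and an explicit feature-space inner product can be checked numerically. The sketch below is only an illustration: the two points and the helper `phi` are made up for this example, and it uses the homogeneous degree-2 kernel (gamma=1, coef0=0) because its feature map is easy to write by hand.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel

# Two small 2-D points (illustrative values only).
x = np.array([[1.0, 2.0]])
z = np.array([[3.0, -1.0]])


def phi(v):
    """Explicit degree-2 feature map for the kernel (x . z)^2 in 2-D."""
    v1, v2 = v[0, 0], v[0, 1]
    return np.array([v1 ** 2, np.sqrt(2) * v1 * v2, v2 ** 2])


# Inner product computed explicitly in the 3-D feature space.
explicit = phi(x) @ phi(z)

# Same quantity computed in the original 2-D space via the kernel.
# gamma=1 and coef0=0 give the homogeneous kernel (x . z)^2.
implicit = polynomial_kernel(x, z, degree=2, gamma=1.0, coef0=0.0)[0, 0]

print(explicit, implicit)
```

Both numbers should agree up to floating-point error, even though the kernel never builds the 3-D vectors.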

#### Q2. How can we implement an SVM with a polynomial kernel in Python using Scikit-learn?

To implement an SVM with a polynomial kernel in Python using Scikit-learn, we can use the SVC class with kernel='poly'. The degree parameter sets the degree of the polynomial (the default is 3).

In [1]:
# Here's an example:
from sklearn.svm import SVC
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train SVM with polynomial kernel
svm_poly = SVC(kernel='poly', degree=3)
svm_poly.fit(X_train, y_train)

# Predict and evaluate
y_pred = svm_poly.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy}")

Accuracy: 1.0


#### Q3. How does increasing the value of epsilon affect the number of support vectors in SVR?

In Support Vector Regression (SVR), the epsilon parameter (often denoted as ε) defines the margin of tolerance for errors in the regression prediction. It sets the width of the ε-tube around the predicted regression function within which errors are not penalized. Only points that lie on or outside the tube boundary become support vectors. Increasing epsilon widens the tube, so more data points fall inside it and the number of support vectors generally decreases, giving a sparser but coarser model. Decreasing epsilon has the opposite effect.
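A small experiment makes the effect of epsilon on the support vector count concrete. The sketch below assumes a synthetic noisy sine curve and arbitrarily chosen epsilon values; it fits one SVR per value and reports how many training points end up as support vectors.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic 1-D regression data: a noisy sine curve (illustrative only).
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, size=200)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.1, size=200)

epsilons = [0.01, 0.1, 0.5]  # arbitrary values chosen for the comparison
sv_counts = []
for eps in epsilons:
    model = SVR(kernel="rbf", C=1.0, epsilon=eps).fit(X, y)
    # support_ holds the indices of the training points used as support vectors.
    sv_counts.append(len(model.support_))
    print(f"epsilon={eps}: {sv_counts[-1]} support vectors out of {len(X)}")
```

Points strictly inside the tube have zero dual coefficients, so they do not appear in `support_`. The exact counts depend on the data, the kernel, and C.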

#### Q4. How does the choice of kernel function, C parameter, epsilon parameter, and gamma parameter affect the performance of Support Vector Regression (SVR)? Can you explain how each parameter works and provide examples of when you might want to increase or decrease its value?

The performance of Support Vector Regression (SVR) is influenced by various parameters:
* **Kernel Function:** The choice of kernel function (linear, polynomial, radial basis function, etc.) defines the transformation used to map the data into a higher-dimensional space. The appropriate kernel depends on the data's distribution and the complexity of the underlying relationship.
* **C Parameter:** C is the regularization parameter. It trades off the flatness (simplicity) of the fitted function against the penalty for training points that fall outside the ε-tube. Smaller values of C regularize more strongly and tolerate larger deviations, while larger values of C penalize deviations heavily and fit the training data more closely, at the risk of overfitting.
* **Epsilon Parameter:** As mentioned earlier, the epsilon parameter defines the width of the ε-tube in SVR. A larger epsilon places more data points inside the tube, which generally results in fewer support vectors and a smoother, less precise fit. A smaller epsilon tracks the data more closely and uses more support vectors.
* **Gamma Parameter:** For non-linear kernels like RBF, the gamma parameter defines how far the influence of a single training example reaches. Higher values of gamma make that influence more local, which leads to more complex fitted functions and can cause overfitting if not properly tuned. Lower values give smoother functions.

For example, decreasing the C parameter can help when the dataset is noisy, because the model is then penalized less for ignoring outlying points, while increasing C can help when the model underfits. Similarly, increasing epsilon suits targets whose noise need not be fitted exactly, and decreasing it suits cases where small errors matter.
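In practice these parameters interact, so they are usually tuned together with cross-validation rather than one at a time. The following is a minimal sketch under stated assumptions: the data is a synthetic noisy sine curve, and the grid values are arbitrary starting points, not recommendations.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic regression data (illustrative only).
rng = np.random.default_rng(0)
X = rng.uniform(0, 5, size=(150, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.1, size=150)

# Scaling lives inside the pipeline so each CV fold is scaled on its own training part.
pipe = make_pipeline(StandardScaler(), SVR(kernel="rbf"))

# make_pipeline names the SVR step "svr", hence the "svr__" prefix.
param_grid = {
    "svr__C": [0.1, 1, 10],
    "svr__epsilon": [0.01, 0.1, 0.5],
    "svr__gamma": [0.1, 1, 10],
}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="neg_mean_squared_error")
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Best CV score (negative MSE):", search.best_score_)
```

Swapping `kernel="rbf"` for `"linear"` or `"poly"`, or adding `svr__kernel` to the grid, extends the same search to the choice of kernel.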

#### Q5. Assignment:
* Import the necessary libraries and load the dataset.
* Split the dataset into training and testing sets.
* Preprocess the data using any technique of your choice (e.g. scaling, normalization).
* Create an instance of the SVC classifier and train it on the training data.
* Use the trained classifier to predict the labels of the testing data.
* Evaluate the performance of the classifier using any metric of your choice (e.g. accuracy, precision, recall, F1-score).
* Tune the hyperparameters of the SVC classifier using GridSearchCV or RandomizedSearchCV to improve its performance.
* Train the tuned classifier on the entire dataset.
* Save the trained classifier to a file for future use.

**Note:** You can use any dataset of your choice for this assignment, but make sure it is suitable for
classification and has a sufficient number of features and samples.

In [2]:
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV
from joblib import dump

In [3]:
# Load the dataset and split it into training and testing sets
ds = load_wine()
x = ds.data[:, :2]  # keep only the first two features
y = ds.target
X_train, X_test, y_train, y_test = train_test_split(
    x, y, test_size=0.10, random_state=27, stratify=y
)

# Scaling the X dataset
ss = StandardScaler()
x_train = ss.fit_transform(X_train)
x_test = ss.transform(X_test)

# Model Training
model = SVC()
model.fit(x_train,y_train)

# Prediction & Evaluation
y_pred = model.predict(x_test)
acc = accuracy_score(y_test,y_pred)
print(f"Accuracy: {acc:.1f}")

# Tune hyperparameters using GridSearchCV
param_grid = {'C': [0.1, 1, 10], 'kernel': ['linear', 'rbf', 'poly'], 'gamma': ['scale', 'auto']}
grid_search = GridSearchCV(model, param_grid, cv=5)
grid_search.fit(x_train, y_train)

# Get the best tuned classifier
best_model = grid_search.best_estimator_

# Train the tuned classifier on the entire dataset
# (all samples, scaled with the scaler fitted on the training split)
best_model.fit(ss.transform(x), y)

# Save the trained classifier to a file
dump(best_model, 'tuned_svc_classifier.joblib')

Accuracy: 0.9


['tuned_svc_classifier.joblib']
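A saved classifier is only usable later if new data is scaled the same way as the training data. One possible approach, sketched below, is to wrap the scaler and the classifier in a Pipeline and save that single object. This is an alternative to the cell above, not a description of it: it uses all wine features, default SVC settings, and a placeholder file name inside a temporary directory.

```python
import os
import tempfile

import numpy as np
from joblib import dump, load
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.10, random_state=27, stratify=y
)

# Scaler and classifier travel together, so new data is scaled consistently.
pipe = make_pipeline(StandardScaler(), SVC())
pipe.fit(X_train, y_train)

# Write to a temporary directory; the file name is a placeholder.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "svc_pipeline.joblib")
    dump(pipe, path)
    restored = load(path)

# The reloaded pipeline should reproduce the original predictions.
same = np.array_equal(pipe.predict(X_test), restored.predict(X_test))
print(f"Predictions identical after reload: {same}")
```

The reloaded object accepts raw, unscaled feature rows, because the scaling step is applied inside `predict`.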