Q1. What is the relationship between polynomial functions and kernel functions in machine learning
algorithms?


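Polynomial functions can be used as kernel functions in machine learning algorithms, particularly in Support Vector Machines (SVMs). A polynomial kernel computes the inner product of two samples in a higher-dimensional feature space without constructing that space explicitly (the kernel trick). A hyperplane found in the transformed space corresponds to a nonlinear decision boundary in the original space, which enables nonlinear classification.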
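
As a small worked illustration, the degree-2 polynomial kernel K(x, z) = (x·z + 1)^2 on 2-D inputs gives the same value as an ordinary dot product after an explicit 6-dimensional feature map. The names `poly_kernel` and `phi` are hypothetical helpers chosen for this sketch; Scikit-learn's own polynomial kernel has the form (gamma * x·z + coef0)^degree.

```python
import numpy as np

def poly_kernel(x, z, degree=2, coef0=1.0):
    # Kernel value computed directly in the original 2-D space
    return (np.dot(x, z) + coef0) ** degree

def phi(x):
    # Explicit feature map whose dot product reproduces (x.z + 1)^2
    x1, x2 = x
    s = np.sqrt(2.0)
    return np.array([1.0, s * x1, s * x2, x1 ** 2, x2 ** 2, s * x1 * x2])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

k_implicit = poly_kernel(x, z)        # no feature map needed
k_explicit = np.dot(phi(x), phi(z))   # same value via the 6-D mapping
print(k_implicit, k_explicit)
```

The two numbers agree, which is why an SVM can work in the larger space while only ever evaluating the kernel.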







Q2. How can we implement an SVM with a polynomial kernel in Python using Scikit-learn?

- Import the necessary libraries (`SVC` from `sklearn.svm`, plus helpers for splitting and scoring).
- Split the data into training and testing sets.
- Create an SVM model with a polynomial kernel (`SVC(kernel='poly')`) and train it on the training data.
- Evaluate the model on the testing data.
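
The steps above can be sketched as follows. The Iris dataset, the 70/30 split, and the values `degree=3` and `C=1.0` are assumptions made for this example, not choices stated in the answer.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Load a small example dataset and split it into training and testing sets
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Create an SVM with a polynomial kernel and train it
poly_svc = SVC(kernel="poly", degree=3, C=1.0)
poly_svc.fit(X_train, y_train)

# Evaluate on the held-out data
acc = accuracy_score(y_test, poly_svc.predict(X_test))
print(f"Test accuracy: {acc:.3f}")
```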

Q3. How does increasing the value of epsilon affect the number of support vectors in SVR?

In Support Vector Regression (SVR), epsilon sets the half-width of the tube around the fitted function inside which errors are not penalized. Only training points lying on or outside the tube become support vectors, so increasing epsilon generally reduces their number:

- Small Epsilon:
With a small epsilon value, the tube is narrow and most training points fall on or outside it, leading to more support vectors. The model fits the training data more tightly, which can give a more complex model with higher variance but potentially lower bias.
- Large Epsilon:
A larger epsilon widens the tube, so more points fall inside it and are ignored, leading to fewer support vectors. This can result in a simpler model with lower variance but potentially higher bias.
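
A quick empirical check is to count the support vectors for several epsilon values. The synthetic noisy sine data, the RBF kernel, and `C=1.0` are assumptions of this sketch. Points strictly inside the tube are not support vectors, so the count should fall as epsilon grows.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic 1-D regression problem: a sine curve with Gaussian noise
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, size=(100, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=100)

sv_counts = {}
for eps in (0.01, 0.1, 0.5):
    model = SVR(kernel="rbf", C=1.0, epsilon=eps).fit(X, y)
    sv_counts[eps] = len(model.support_)  # indices of the support vectors
    print(f"epsilon={eps}: {sv_counts[eps]} support vectors out of {len(X)}")
```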

Q4. How does the choice of kernel function, C parameter, epsilon parameter, and gamma parameter
affect the performance of Support Vector Regression (SVR)? Can you explain how each parameter works
and provide examples of when you might want to increase or decrease its value?

Each parameter in Support Vector Regression (SVR) affects the model's performance as follows:

1. **Kernel Function**:
   - **Linear Kernel**: Suitable when the target depends roughly linearly on the features. Use when the data has a clear linear pattern.
   - **RBF (Radial Basis Function) Kernel**: The default in Scikit-learn; works well for non-linear data. Use when the relationship between features and target is non-linear or has complex patterns.
   - **Polynomial Kernel**: Suitable for data with polynomial relationships. Use when the data shows polynomial trends.

2. **C Parameter**:
   - Controls the trade-off between the smoothness (flatness) of the fitted function and the penalty for training errors that fall outside the epsilon tube.
   - **Increase C**: Leads to a more complex model, often with fewer support vectors. Use when you want the model to fit the training data closely, at the risk of overfitting.
   - **Decrease C**: Results in a simpler model, often with more support vectors. Use when you want to avoid overfitting and prioritize smoother predictions.

3. **Epsilon Parameter**:
   - Defines the margin of tolerance where no penalty is given to errors. It's crucial in determining the width of the tube around the predicted function.
   - **Increase Epsilon**: Allows for more errors to be tolerated, resulting in a wider tube. Use when you want the model to be less sensitive to small variations in the data.
   - **Decrease Epsilon**: Makes the model more sensitive to errors, resulting in a narrower tube. Use when you want the model to fit the training data tightly.

4. **Gamma Parameter**:
   - Defines how far the influence of a single training example reaches (used by the RBF, polynomial, and sigmoid kernels). A higher gamma value means each training point influences only its close neighbors; a lower value spreads that influence over a wider region.
   - **Increase Gamma**: Leads to a more complex fitted function that can capture intricate patterns in the data. Use when the data is non-linear and requires a more detailed model.
   - **Decrease Gamma**: Results in a smoother fitted function and a simpler model. Use when the data is simpler or when you want to avoid overfitting.
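
To see the effect of gamma in practice, the sketch below fits an RBF-kernel SVR with three gamma values and prints the R^2 score on the training and test splits. The synthetic sine data and the values of `C`, `epsilon`, and `gamma` are assumptions chosen for illustration. Typically the training score rises with gamma, while the test score stops improving or drops once the model starts fitting noise.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# Synthetic noisy sine data
rng = np.random.default_rng(1)
X = rng.uniform(0, 6, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=200)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

scores = {}
for gamma in (0.001, 1.0, 1000.0):
    model = SVR(kernel="rbf", C=10.0, epsilon=0.1, gamma=gamma)
    model.fit(X_train, y_train)
    # SVR.score returns the coefficient of determination (R^2)
    scores[gamma] = (model.score(X_train, y_train), model.score(X_test, y_test))
    print(f"gamma={gamma}: train R^2={scores[gamma][0]:.3f}, "
          f"test R^2={scores[gamma][1]:.3f}")
```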

Examples:
- **Use Case 1 - High Complexity**:
  - Data: Non-linear and complex patterns.
  - Approach: RBF kernel with high gamma and C values for detailed modeling and tight fitting.
- **Use Case 2 - Avoid Overfitting**:
  - Data: Linear or simple relationships.
  - Approach: Linear kernel or low gamma and C values to avoid overfitting and prioritize generalization.

In summary, the choice of kernel function, C parameter, epsilon parameter, and gamma parameter in SVR depends on the data's complexity, desired model complexity, and the balance between fitting the training data and generalizing to new data. Adjust these parameters based on the specific characteristics of your dataset and the goals of your modeling task.
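
In practice these four settings are usually tuned together with cross-validation rather than by hand. A minimal sketch, assuming synthetic data and an arbitrary small grid; the step name `svr` is the one `make_pipeline` derives from the class name.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic regression data with one non-linear and one linear feature
rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, size=(150, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=150)

# Scaling lives inside the pipeline so each CV fold is scaled on its own training part
pipe = make_pipeline(StandardScaler(), SVR())
param_grid = {
    "svr__kernel": ["linear", "rbf"],
    "svr__C": [0.1, 1, 10],
    "svr__epsilon": [0.01, 0.1, 0.5],
    "svr__gamma": ["scale", 0.1, 1.0],  # ignored by the linear kernel
}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="r2")
search.fit(X, y)
print("Best parameters:", search.best_params_)
print(f"Best mean CV R^2: {search.best_score_:.3f}")
```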

Q5. Assignment:
- Import the necessary libraries and load the dataset.
- Split the dataset into training and testing sets.
- Preprocess the data using any technique of your choice (e.g. scaling, normalization).
- Create an instance of the SVC classifier and train it on the training data.
- Use the trained classifier to predict the labels of the testing data.
- Evaluate the performance of the classifier using any metric of your choice (e.g. accuracy, precision, recall, F1-score).
- Tune the hyperparameters of the SVC classifier using GridSearchCV or RandomizedSearchCV to improve its performance.
- Train the tuned classifier on the entire dataset.
- Save the trained classifier to a file for future use.

In [18]:
from sklearn.datasets import load_iris
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
import pandas as pd

In [19]:
iris_data = load_iris()

# Convert features (X) to a DataFrame
df = pd.DataFrame(data=iris_data.data, columns=iris_data.feature_names)

# Add the target variable (y) to the DataFrame
df['target'] = iris_data.target

In [20]:
df

,sepal length (cm),sepal width (cm),petal length (cm),petal width (cm),target
0,5.1,3.5,1.4,0.2,0
1,4.9,3.0,1.4,0.2,0
2,4.7,3.2,1.3,0.2,0
3,4.6,3.1,1.5,0.2,0
4,5.0,3.6,1.4,0.2,0
...,...,...,...,...,...
145,6.7,3.0,5.2,2.3,2
146,6.3,2.5,5.0,1.9,2
147,6.5,3.0,5.2,2.0,2
148,6.2,3.4,5.4,2.3,2


In [21]:
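# Optional preprocessing, left disabled: the results below use the unscaled features.
# from sklearn.preprocessing import MinMaxScaler
# scaler = MinMaxScaler()
# df[iris_data.feature_names] = scaler.fit_transform(df[iris_data.feature_names])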
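
One way to cover the preprocessing step of the assignment is to chain a scaler and the classifier in a single pipeline, so the scaler is fitted on the training split only. This is a sketch under that assumption; `StandardScaler` and the name `scaled_svc` are choices made here, not part of the notebook.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# fit() learns the scaling statistics from X_train only,
# then applies the same transformation when scoring X_test
scaled_svc = make_pipeline(StandardScaler(), SVC(kernel="linear", random_state=42))
scaled_svc.fit(X_train, y_train)

scaled_acc = scaled_svc.score(X_test, y_test)
print(f"Test accuracy with scaling: {scaled_acc:.3f}")
```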

In [22]:
X, y = load_iris(return_X_y=True)

In [23]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

In [24]:
from sklearn.svm import SVC
svc_classifier = SVC(kernel='linear', random_state=42)
svc_classifier.fit(X_train, y_train)

In [25]:
y_pred = svc_classifier.predict(X_test)
y_pred

array([1, 0, 2, 1, 1, 0, 1, 2, 1, 1, 2, 0, 0, 0, 0, 1, 2, 1, 1, 2, 0, 2,
       0, 2, 2, 2, 2, 2, 0, 0, 0, 0, 1, 0, 0, 2, 1, 0, 0, 0, 2, 1, 1, 0,
       0])

In [26]:
from sklearn.metrics import accuracy_score
from sklearn.metrics import precision_score
from sklearn.metrics import f1_score

In [27]:
print("Accuracy Score:", accuracy_score(y_test, y_pred))
print("Precision Score:", precision_score(y_test, y_pred, average='macro'))
print("F1 Score:", f1_score(y_test, y_pred, average='macro'))

Accuracy Score: 1.0
Precision Score: 1.0
F1 Score: 1.0
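
The assignment also lists recall as a possible metric. A sketch of how recall, a confusion matrix, and a per-class report could be obtained for the same linear-kernel classifier; it rebuilds the split and the model so that it runs on its own, and the name `clf` is an assumption of this example.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import classification_report, confusion_matrix, recall_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

iris = load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

clf = SVC(kernel="linear", random_state=42).fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Macro averaging gives each class equal weight
macro_recall = recall_score(y_test, y_pred, average="macro")
cm = confusion_matrix(y_test, y_pred)  # rows: true class, columns: predicted class

print("Recall Score:", macro_recall)
print(cm)
print(classification_report(y_test, y_pred, target_names=iris.target_names))
```

A perfect score on a 45-sample test set is plausible for Iris, but the confusion matrix makes it easier to see which classes would be confused on a harder split.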


In [28]:
param_grid = {
    'C': [0.1, 1, 10, 100],
    'kernel': ['linear', 'poly', 'rbf', 'sigmoid'],
    'gamma': ['scale', 'auto'],
}

In [29]:
svc_classifier = SVC(random_state=42)


In [31]:
from sklearn.model_selection import GridSearchCV
grid_search = GridSearchCV(estimator=svc_classifier, param_grid=param_grid, cv=5)


In [32]:
grid_search.fit(X_train, y_train)

In [33]:
best_params = grid_search.best_params_
print("Best Hyperparameters:", best_params)


Best Hyperparameters: {'C': 1, 'gamma': 'scale', 'kernel': 'poly'}


In [35]:
# Unpack the best hyperparameters found by the grid search as keyword arguments
best_svc_classifier = SVC(**best_params, random_state=42)
best_svc_classifier.fit(X_train, y_train)

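
The last two assignment steps (training the tuned classifier on the entire dataset and saving it to a file) are not shown in the cells above. A minimal sketch, assuming the hyperparameters reported by the grid search and using `joblib` for persistence; the file name `svc_iris.joblib` and the temporary directory are hypothetical choices.

```python
import os
import tempfile

import joblib
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Hyperparameters as reported by the grid search in this notebook
best_params = {"C": 1, "gamma": "scale", "kernel": "poly"}

# Train the tuned classifier on the entire dataset
final_model = SVC(**best_params, random_state=42)
final_model.fit(X, y)

# Save the model, then reload it to confirm the file is usable
model_path = os.path.join(tempfile.gettempdir(), "svc_iris.joblib")  # hypothetical path
joblib.dump(final_model, model_path)
restored = joblib.load(model_path)

same = (restored.predict(X) == final_model.predict(X)).all()
print("Reloaded model reproduces predictions:", same)
```

A model saved this way should be loaded with the same Scikit-learn version that produced it.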