<h2 style="text-align:center;">Support Vector Machine with GridSearchCV</h2>

---

## 🔹 Introduction

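SVM has several important **hyperparameters** that strongly affect performance:

- **C** → Inverse regularization strength (large C = weaker regularization: low bias, high variance; small C = stronger regularization: high bias, low variance)  
- **Kernel** → Similarity function that determines the shape of the decision boundary (linear, rbf, poly, etc.)  
- **Gamma** (for RBF/poly/sigmoid kernels) → Kernel coefficient; for RBF it sets how far the influence of a single training point reaches (large gamma = short reach, small gamma = long reach)  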
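
To make the role of gamma concrete, here is a minimal sketch of the RBF kernel, `exp(-gamma * ||x - z||^2)`, evaluated for one pair of points. The helper name `rbf_kernel` and the two sample points are illustrative assumptions, not part of this notebook's pipeline.

```python
import math


def rbf_kernel(x, z, gamma):
    """RBF kernel value: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


# Two illustrative points (hypothetical, already on a standardized scale)
x = (0.0, 0.0)
z = (1.0, 1.0)  # squared distance from x is 2

# The same pair looks less and less similar as gamma grows
for gamma in (0.1, 0.5, 0.9):
    print(f"gamma={gamma}: similarity={rbf_kernel(x, z, gamma):.3f}")
```

A larger gamma makes the similarity fall off faster with distance, so each training point influences a smaller neighbourhood and the decision boundary can become more wiggly.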

👉 Choosing these values manually is slow and error-prone.  

**GridSearchCV** helps by:  
- Trying every combination of hyperparameters in a user-defined grid (grid search).  
- Evaluating each combination with cross-validation.  
- Returning the best parameters and the best mean cross-validated score (accuracy here).  
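
The three steps above can be sketched by hand with `ParameterGrid` and `cross_val_score`. This is only an approximation of what `GridSearchCV` does internally; the synthetic data from `make_classification` and the small `demo_grid` are assumptions made to keep the example self-contained.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import ParameterGrid, cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in data (assumption: not the notebook's dataset)
X_demo, y_demo = make_classification(
    n_samples=200, n_features=2, n_informative=2, n_redundant=0, random_state=0
)

# A deliberately small, hypothetical grid
demo_grid = {"C": [1, 10], "gamma": [0.1, 0.5]}

best_score, best_params = -1.0, None
for params in ParameterGrid(demo_grid):          # 1. enumerate every combination
    scores = cross_val_score(                    # 2. score it with 5-fold CV
        SVC(kernel="rbf", **params), X_demo, y_demo, cv=5, scoring="accuracy"
    )
    if scores.mean() > best_score:               # 3. keep the best mean score
        best_score, best_params = scores.mean(), params

print("Best params:", best_params)
print("Best mean CV accuracy: {:.3f}".format(best_score))
```

`GridSearchCV` additionally refits the winning combination on the full training data by default, which is what makes `best_estimator_` available afterwards.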

In this notebook:
1. Train a baseline SVM.  
2. Apply GridSearchCV to tune hyperparameters.  
3. Compare results (before vs after tuning).  


In [1]:
# 📌 Import libraries
import numpy as np
import pandas as pd

from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix, accuracy_score


In [2]:
# 📌 Load dataset
dataset = pd.read_csv("../data/Social_Network_Ads.csv")

X = dataset.iloc[:, 2:4].values   # Age, EstimatedSalary
y = dataset.iloc[:, -1].values    # Purchased

# Feature Scaling
scaler = StandardScaler()
X = scaler.fit_transform(X)

# Train/Test Split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
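
The cell above fits the scaler on every row before the split, so the test rows contribute to the scaling statistics. A common alternative is to split first and put the scaler and the classifier in a `Pipeline`, so the scaler is only ever fitted on training data. The sketch below uses synthetic data and hypothetical variable names (`X_demo`, `svm_pipeline`, ...) rather than this notebook's dataset.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption: not Social_Network_Ads.csv)
X_demo, y_demo = make_classification(
    n_samples=200, n_features=2, n_informative=2, n_redundant=0, random_state=0
)

# Split first, so the test rows never reach the scaler during fitting
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0
)

# The pipeline fits StandardScaler on X_tr only, then trains the SVM
svm_pipeline = make_pipeline(StandardScaler(), SVC(kernel="rbf", random_state=0))
svm_pipeline.fit(X_tr, y_tr)

print("Test accuracy: {:.2f} %".format(svm_pipeline.score(X_te, y_te) * 100))
```

When such a pipeline is passed to `GridSearchCV`, the scaler is refitted inside every fold as well, and the grid keys take the step-name prefix, for example `svc__C` and `svc__gamma`.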


In [3]:
# 📌 Baseline SVM model (before tuning)
baseline_clf = SVC(kernel="rbf", random_state=0)
baseline_clf.fit(X_train, y_train)

y_pred_base = baseline_clf.predict(X_test)
base_acc = accuracy_score(y_test, y_pred_base)

print("Baseline Accuracy (default SVM): {:.2f} %".format(base_acc*100))  # ➤ 93.00 % in this run


Baseline Accuracy (default SVM): 93.00 %


In [4]:
# 📌 Define parameter grid for GridSearchCV
param_grid = [
    {"C": [1, 10, 100, 1000], "kernel": ["linear"]},
    {"C": [1, 10, 100, 1000], "kernel": ["rbf"], 
     "gamma": [0.1, 0.2, 0.3, 0.5, 0.7, 0.9]}
]

# 📌 Apply GridSearchCV with 10-fold CV
grid_search = GridSearchCV(
    estimator=SVC(random_state=0),
    param_grid=param_grid,
    scoring="accuracy",
    cv=10,
    n_jobs=-1
)

grid_search.fit(X_train, y_train)

best_accuracy = grid_search.best_score_
best_params = grid_search.best_params_

print("Best CV Accuracy: {:.2f} %".format(best_accuracy*100))   # ➤ 91.00 % in this run
print("Best Parameters:", best_params)                          # ➤ {'C': 1, 'gamma': 0.7, 'kernel': 'rbf'} in this run


Best CV Accuracy: 91.00 %
Best Parameters: {'C': 1, 'gamma': 0.7, 'kernel': 'rbf'}
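
The output above reports only the winning combination. To see how close the other candidates came, the fitted search object also exposes `cv_results_`, which loads directly into a DataFrame. The sketch below runs a small search on synthetic data; `demo_search` and its grid are assumptions for illustration, not the search from this notebook.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in data (assumption: not the notebook's dataset)
X_demo, y_demo = make_classification(
    n_samples=200, n_features=2, n_informative=2, n_redundant=0, random_state=0
)

# Small hypothetical search, just to have a fitted object to inspect
demo_search = GridSearchCV(
    estimator=SVC(kernel="rbf"),
    param_grid={"C": [1, 10], "gamma": [0.1, 0.5, 0.9]},
    scoring="accuracy",
    cv=5,
)
demo_search.fit(X_demo, y_demo)

# One row per candidate: mean and spread of the fold scores, plus its rank
results = pd.DataFrame(demo_search.cv_results_)
columns = ["param_C", "param_gamma", "mean_test_score", "std_test_score", "rank_test_score"]
print(results[columns].sort_values("rank_test_score"))
```

If the gap between the top rows is smaller than `std_test_score`, the ranking among them is probably not meaningful.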


In [5]:
# 📌 Evaluate best model on Test set
best_model = grid_search.best_estimator_
y_pred_best = best_model.predict(X_test)
best_acc = accuracy_score(y_test, y_pred_best)

print("Test Accuracy (Tuned Model): {:.2f} %".format(best_acc*100))   # ➤ 93.00 % in this run


Test Accuracy (Tuned Model): 93.00 %
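
Accuracy alone hides which kind of error the model makes. `confusion_matrix` is imported at the top of this notebook but never called, so here is a minimal sketch of how it could be used on test predictions. The data is synthetic and the names `demo_clf` and `y_hat` are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in data (assumption: not the notebook's dataset)
X_demo, y_demo = make_classification(
    n_samples=200, n_features=2, n_informative=2, n_redundant=0, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0
)

demo_clf = SVC(kernel="rbf", random_state=0).fit(X_tr, y_tr)
y_hat = demo_clf.predict(X_te)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_te, y_hat, labels=[0, 1])
tn, fp, fn, tp = cm.ravel()

print(cm)
print("False positives:", fp, "| False negatives:", fn)
print("Accuracy from matrix: {:.2f} %".format((tn + tp) / cm.sum() * 100))
```

Two models with the same accuracy can differ in how they split their errors between false positives and false negatives, which matters when the two mistakes have different costs.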


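## 🔹 Summary

- **Baseline SVM (default params)** → Test Accuracy = 93.00 %  
- **After GridSearchCV tuning** → CV Accuracy = 91.00 %, Test Accuracy = 93.00 %  

✅ GridSearchCV systematically searched over the parameter combinations in the grid.  
✅ In this run the tuned model matched the default SVM on the test set (93.00 % each) rather than beating it. The selected `C=1`, `gamma=0.7` are close to the defaults (`C=1`, `gamma="scale"`, roughly 0.5 for two standardized features). The CV accuracy is averaged over held-out folds of the training set, so it is not directly comparable to the test accuracy.  
✅ Tradeoff: GridSearch is computationally expensive (it fits every combination on every fold).  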
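
To put a number on that cost, the sketch below counts the model fits implied by the grid and the 10-fold setting used in this notebook. It assumes the default `refit=True`, which adds one final fit of the best candidate; the variable names `n_folds`, `n_candidates`, and `n_fits` are illustrative.

```python
from itertools import product

# Same grid as in the tuning cell
param_grid = [
    {"C": [1, 10, 100, 1000], "kernel": ["linear"]},
    {"C": [1, 10, 100, 1000], "kernel": ["rbf"],
     "gamma": [0.1, 0.2, 0.3, 0.5, 0.7, 0.9]},
]
n_folds = 10

# Each sub-grid contributes the Cartesian product of its value lists
n_candidates = sum(
    len(list(product(*sub_grid.values()))) for sub_grid in param_grid
)

# Every candidate is fitted once per fold, plus one refit of the winner
n_fits = n_candidates * n_folds + 1

print("Candidates:", n_candidates)
print("Total fits:", n_fits)
```

Adding one more value to `C` or `gamma` multiplies the size of its sub-grid, which is why grids grow expensive quickly as more hyperparameters are tuned.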

👉 Next step (Notebook 3): Compare with **RandomizedSearchCV**, which samples a fixed number of combinations instead of trying them all, so it is faster and often nearly as effective.
