<h2 style="text-align:center;">Support Vector Machine with RandomizedSearchCV</h2>

---

## 🔹 Introduction

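- **GridSearchCV** evaluates *every* combination in the parameter grid → it is guaranteed to find the best setting *within that grid*, but it is **computationally expensive**, because the number of fits grows multiplicatively with each parameter added.  
- **RandomizedSearchCV** evaluates only a fixed number (`n_iter`) of combinations, sampled from lists or from continuous distributions.  
- It is faster, especially when:
  - The parameter space is very large or includes continuous parameters (e.g., `gamma`).
  - We want a good-enough model quickly.  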
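To make the cost difference concrete, the sketch below counts model fits for a hypothetical exhaustive grid and for a 20-candidate random search, both scored with 10-fold CV. The grid values are illustrative assumptions, not the grid used in Notebook 2. `ParameterGrid` and `ParameterSampler` are the scikit-learn helpers that enumerate and sample candidate settings.

```python
from scipy.stats import uniform
from sklearn.model_selection import ParameterGrid, ParameterSampler

n_folds = 10  # matches cv=10 used later in this notebook

# Hypothetical exhaustive grid: 5 values of C x 2 kernels x 10 values of gamma
grid = {
    "C": [0.1, 1, 10, 100, 1000],
    "kernel": ["linear", "rbf"],
    "gamma": [round(0.1 * k, 1) for k in range(1, 11)],
}
n_grid = len(ParameterGrid(grid))  # every combination gets evaluated
print("GridSearchCV fits:", n_grid * n_folds)  # → 1000

# Randomized search: a fixed budget of n_iter candidates;
# gamma is drawn from a continuous range instead of a fixed list
dist = {
    "C": [0.1, 1, 10, 100, 1000],
    "kernel": ["linear", "rbf"],
    "gamma": uniform(0.1, 1.0),
}
candidates = list(ParameterSampler(dist, n_iter=20, random_state=0))
print("RandomizedSearchCV fits:", len(candidates) * n_folds)  # → 200
```

Both counts exclude the single extra refit on the full training set that the search classes perform by default (`refit=True`). Adding more `gamma` values multiplies the grid cost, while the randomized cost stays at `n_iter × cv`.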

In this notebook:
1. Train a baseline SVM with default hyperparameters.  
2. Apply RandomizedSearchCV with parameter distributions.  
3. Compare performance against GridSearchCV (Notebook 2).  


In [1]:
# 📌 Import libraries
import numpy as np
import pandas as pd

from sklearn.model_selection import train_test_split, RandomizedSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix, accuracy_score
from scipy.stats import uniform


In [2]:
# 📌 Load dataset
dataset = pd.read_csv("../data/Social_Network_Ads.csv")

X = dataset.iloc[:, 2:4].values   # Age, EstimatedSalary
y = dataset.iloc[:, -1].values    # Purchased

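# Train/Test Split (split first so the test set stays unseen during preprocessing)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Feature Scaling: fit on the training set only, then apply the same transform to the test set
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)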


In [3]:
# 📌 Baseline SVM (before tuning)
baseline_clf = SVC(kernel="rbf", random_state=0)  # defaults: C=1.0, gamma="scale"
baseline_clf.fit(X_train, y_train)

y_pred_base = baseline_clf.predict(X_test)
base_acc = accuracy_score(y_test, y_pred_base)

print("Baseline Accuracy (default SVM): {:.2f} %".format(base_acc * 100))


Baseline Accuracy (default SVM): 93.00 %
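The baseline relies on the defaults `C=1.0` and `gamma="scale"`, which scikit-learn documents as `1 / (n_features * X.var())`. The sketch below computes that value on standardized two-feature data. Because the CSV is not bundled here, it uses made-up stand-in values for Age and EstimatedSalary (an assumption); after standardization the overall variance is 1, so the default `gamma` works out to 0.5, inside the `[0.1, 1.1]` range searched in the next cell.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for (Age, EstimatedSalary); these values are made up
rng = np.random.default_rng(0)
X_raw = np.column_stack([
    rng.integers(18, 61, size=300),           # ages 18..60
    rng.integers(15_000, 150_001, size=300),  # salaries
]).astype(float)

X_std = StandardScaler().fit_transform(X_raw)

# What SVC(gamma="scale") resolves to: 1 / (n_features * X.var())
n_features = X_std.shape[1]
gamma_scale = 1.0 / (n_features * X_std.var())
print("gamma='scale' ->", round(gamma_scale, 3))  # → 0.5
```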


In [4]:
# 📌 Define parameter distributions for RandomizedSearchCV
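param_dist = {
    "C": [0.1, 1, 10, 100, 1000],
    "kernel": ["linear", "rbf"],
    "gamma": uniform(0.1, 1.0)   # uniform(loc=0.1, scale=1.0): continuous on [0.1, 1.1]; ignored when kernel="linear"
}

# 📌 Apply RandomizedSearchCV: 20 sampled candidates × 10-fold CV = 200 fits (plus one refit on the full training set)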
random_search = RandomizedSearchCV(
    estimator=SVC(random_state=0),
    param_distributions=param_dist,
    n_iter=20,
    scoring="accuracy",
    cv=10,
    random_state=0,
    n_jobs=-1
)

random_search.fit(X_train, y_train)

best_accuracy = random_search.best_score_
best_params = random_search.best_params_

print("Best CV Accuracy (RandomizedSearchCV): {:.2f} %".format(best_accuracy*100))
print("Best Parameters:", best_params)


Best CV Accuracy (RandomizedSearchCV): 91.00 %
Best Parameters: {'C': 100, 'gamma': 0.9579456176227568, 'kernel': 'rbf'}
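The `gamma` entry uses `scipy.stats.uniform`, whose arguments are `loc` and `scale` rather than lower and upper bounds, so `uniform(0.1, 1.0)` covers `[0.1, 1.1]`; the tuned value of about 0.958 reported above falls inside it. The sketch checks that support and, as an alternative that this notebook does not use, builds log-uniform distributions with `scipy.stats.loguniform`, a common choice for scale-like parameters such as `C` and `gamma`. The log-uniform ranges are illustrative assumptions.

```python
from scipy.stats import loguniform, uniform

# uniform(loc, scale) is continuous on [loc, loc + scale]
gamma_dist = uniform(loc=0.1, scale=1.0)
lo, hi = gamma_dist.support()
print("gamma support:", lo, hi)

draws = gamma_dist.rvs(size=1000, random_state=0)
print("min/max of 1000 draws:", draws.min(), draws.max())

# Alternative (not used in this notebook): log-uniform ranges spend equal
# sampling effort on each order of magnitude of C and gamma
C_dist = loguniform(1e-1, 1e3)
gamma_log_dist = loguniform(1e-3, 1e1)
log_param_dist = {"C": C_dist, "kernel": ["linear", "rbf"], "gamma": gamma_log_dist}

C_draws = C_dist.rvs(size=5, random_state=0)
print("C samples:", C_draws)
```

`log_param_dist` could be passed as `param_distributions` in place of `param_dist`; whether it finds a better model on this dataset is untested here.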


In [5]:
# 📌 Evaluate best model on Test set
best_model = random_search.best_estimator_
y_pred_best = best_model.predict(X_test)
best_acc = accuracy_score(y_test, y_pred_best)

print("Test Accuracy (Tuned Model): {:.2f} %".format(best_acc*100))


Test Accuracy (Tuned Model): 93.00 %
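In the cells above the scaler is fit once on the whole training set, so each CV validation fold has already influenced the scaling statistics. A minimal sketch of one way to avoid that is to put `StandardScaler` and `SVC` in a `Pipeline` and search over it, so scaling is refit inside every fold. It uses synthetic data from `make_classification` as a stand-in for the CSV (an assumption), and the `_syn`/`pipe_` names are hypothetical, chosen so they do not overwrite the notebook's variables. It also lists the top candidates from `cv_results_`.

```python
import pandas as pd
from scipy.stats import uniform
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the two-feature Social_Network_Ads data (assumption)
X_syn, y_syn = make_classification(n_samples=400, n_features=2, n_informative=2,
                                   n_redundant=0, random_state=0)
Xs_train, Xs_test, ys_train, ys_test = train_test_split(
    X_syn, y_syn, test_size=0.25, random_state=0
)

# The scaler is refit on the training part of every CV fold,
# so the validation fold never influences the scaling statistics
pipe = Pipeline([("scaler", StandardScaler()), ("svc", SVC(random_state=0))])

# Pipeline parameters are addressed as "<step name>__<parameter>"
pipe_dist = {
    "svc__C": [0.1, 1, 10, 100, 1000],
    "svc__kernel": ["linear", "rbf"],
    "svc__gamma": uniform(0.1, 1.0),
}

pipe_search = RandomizedSearchCV(
    pipe, param_distributions=pipe_dist, n_iter=20,
    scoring="accuracy", cv=10, random_state=0,
)
pipe_search.fit(Xs_train, ys_train)

# score() applies the "accuracy" scorer to best_estimator_ on the test set
test_acc = pipe_search.score(Xs_test, ys_test)
print("Best parameters:", pipe_search.best_params_)
print("Test accuracy: {:.2f} %".format(test_acc * 100))

# Top candidates ranked by mean CV accuracy
results = pd.DataFrame(pipe_search.cv_results_)
print(results.sort_values("rank_test_score")[["params", "mean_test_score", "std_test_score"]].head())
```

On the real dataset the numbers would differ from the outputs above; the point of the design is that every CV score comes from a fold whose validation part was never used for preprocessing.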


## 🔹 Summary

- **Baseline SVM (default)** → Test Accuracy = 93.00 %  
- **GridSearchCV (Notebook 2)** → CV Accuracy = YY %, Test Accuracy = ZZ %  
- **RandomizedSearchCV** → CV Accuracy = 91.00 %, Test Accuracy = 93.00 % (same as the baseline on this test set)  

✅ RandomizedSearchCV is **faster** than GridSearchCV: its cost is set by `n_iter` × `cv` (here 20 × 10 = 200 fits) rather than by the size of the full grid.  
✅ Performance is often close to GridSearchCV, although random sampling can miss the single best setting.  
✅ Useful when time is limited or hyperparameter ranges are large or continuous.  

👉 Next step (Notebook 4): Evaluate model performance with **ROC Curve & AUC**, to move beyond accuracy.
