### **2.6 Hyperparameter Tuning**
Steps:
1. Use GridSearchCV & RandomizedSearchCV to optimize model hyperparameters.
2. Compare optimized models with baseline performance.

Deliverable: Best-performing model with optimized hyperparameters.

#### **Using RandomizedSearchCV for Random Forest and GridSearchCV for the Rest**
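
The split between the two search strategies comes down to grid size. As a rough sketch of the arithmetic, the snippet below counts the combinations in the Random Forest and Decision Tree grids used in this section (copied here so the snippet runs on its own) and multiplies by the 5 cross-validation folds. `count_combinations` is a helper name introduced for illustration only; `len(ParameterGrid(grid))` from `sklearn.model_selection` should give the same counts.

```python
from math import prod

# Grids copied from the tuning cell in this section
rf_grid = {
    "n_estimators": [100, 200, 300, 500],
    "max_depth": [None, 8, 10, 20],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4, 5],
    "max_features": ["sqrt", "log2"],
    "bootstrap": [True, False],
    "class_weight": [None, "balanced"],
}
dt_grid = {
    "max_depth": [None, 5, 10, 20],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4, 5],
    "criterion": ["gini", "entropy"],
    "class_weight": [None, "balanced"],
}


def count_combinations(grid):
    """Number of distinct settings an exhaustive grid search would try (hypothetical helper)."""
    return prod(len(values) for values in grid.values())


CV_FOLDS = 5  # cv=5 in both searches
N_ITER = 20   # n_iter passed to RandomizedSearchCV for Random Forest

rf_combos = count_combinations(rf_grid)
dt_combos = count_combinations(dt_grid)

print(f"Random Forest grid: {rf_combos} combinations, "
      f"{rf_combos * CV_FOLDS} cross-validation fits if searched exhaustively")
print(f"Random Forest randomized search: {N_ITER * CV_FOLDS} cross-validation fits")
print(f"Decision Tree grid: {dt_combos} combinations, "
      f"{dt_combos * CV_FOLDS} cross-validation fits")
```

Sampling 20 settings covers only a small fraction of the Random Forest grid, so the result is a good setting found within budget rather than a guaranteed optimum of the grid.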

In [129]:
# --- Define parameter grids for each model ---
param_grids = {
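    # A list of grids keeps only solver/penalty pairs that LogisticRegression accepts:
    # liblinear supports l1 and l2 only, and "elasticnet" is dropped because it
    # needs l1_ratio, which this search does not set.
    "Logistic Regression": [
        {
            "C": [0.01, 0.1, 1, 10, 100],
            "penalty": ["l1", "l2"],
            "solver": ["liblinear", "saga"],
            "class_weight": [None, "balanced"],
        },
        {
            # Unpenalized fit: C has no effect, so it is not searched here
            "penalty": [None],
            "solver": ["saga"],
            "class_weight": [None, "balanced"],
        },
    ],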
    "Decision Tree": {
        "max_depth": [None, 5, 10, 20],
        "min_samples_split": [2, 5, 10],
        "min_samples_leaf": [1, 2, 4, 5],
        "criterion": ["gini", "entropy"],
        "class_weight": [None, "balanced"]
    },
    "Random Forest": {
        "n_estimators": [100, 200, 300, 500],
        "max_depth": [None, 8, 10, 20],
        "min_samples_split": [2, 5, 10],
        "min_samples_leaf": [1, 2, 4, 5],
        "max_features": ["sqrt", "log2"],
        "bootstrap": [True, False],
        "class_weight": [None, "balanced"]
    },
    "SVM": {
        "C": [0.01, 0.1, 1, 10, 100],
        "gamma": [0.01, 0.1, 1, "scale", "auto"],
        "kernel": ["linear", "rbf", "poly"],
        "class_weight": [None, "balanced"],
    }
}

# --- Run GridSearchCV / RandomizedSearchCV ---
best_models = {}
results_tuned = []

for name, model in fitted_models.items():
    print(f"🔎 Tuning {name}...")
    
    # Choose search strategy (Grid or Random)
    if name == "Random Forest":
        search = RandomizedSearchCV(
            model, param_distributions=param_grids[name], 
            n_iter=20, cv=5, scoring="accuracy", n_jobs=-1, random_state=42
        )
    else:
        search = GridSearchCV(
            model, param_grid=param_grids[name],
            cv=5, scoring="accuracy", n_jobs=-1
        )
    
    # Scale for LR and SVM
    X_eval = X_train_scaled if name in ["Logistic Regression", "SVM"] else X_train
    search.fit(X_eval, y_train)
    
    best_models[name] = search.best_estimator_
    
    # Evaluate on test set
    X_test_eval = X_test_scaled if name in ["Logistic Regression", "SVM"] else X_test
    y_pred = best_models[name].predict(X_test_eval)
    
    acc = accuracy_score(y_test, y_pred)
    prec = precision_score(y_test, y_pred, average="weighted")
    rec = recall_score(y_test, y_pred, average="weighted")
    f1 = f1_score(y_test, y_pred, average="weighted")
    
    results_tuned.append([name, acc, prec, rec, f1, search.best_params_])

# --- Collect results ---
results_tuned_df = pd.DataFrame(
    results_tuned,
    columns=["Model","Accuracy","Precision","Recall","F1-score","Best Params"]
)

# Show full params without truncation
pd.set_option("display.max_colwidth", None)

results_tuned_df


🔎 Tuning Logistic Regression...
🔎 Tuning Decision Tree...
🔎 Tuning Random Forest...
🔎 Tuning SVM...


| | Model | Accuracy | Precision | Recall | F1-score | Best Params |
|---|---|---|---|---|---|---|
| 0 | Logistic Regression | 0.901639 | 0.903896 | 0.901639 | 0.901798 | `{'C': 0.01, 'class_weight': None, 'penalty': 'l2', 'solver': 'saga'}` |
| 1 | Decision Tree | 0.852459 | 0.852441 | 0.852459 | 0.852219 | `{'class_weight': None, 'criterion': 'gini', 'max_depth': None, 'min_samples_leaf': 2, 'min_samples_split': 5}` |
| 2 | Random Forest | 0.934426 | 0.936647 | 0.934426 | 0.934532 | `{'n_estimators': 100, 'min_samples_split': 2, 'min_samples_leaf': 2, 'max_features': 'log2', 'max_depth': 8, 'class_weight': None, 'bootstrap': False}` |
| 3 | SVM | 0.901639 | 0.901639 | 0.901639 | 0.901639 | `{'C': 0.01, 'class_weight': None, 'gamma': 0.01, 'kernel': 'linear'}` |
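
Step 2 asks for a comparison against baseline performance. The baseline scores come from an earlier section and are not repeated here, so the following is only a minimal sketch of one way to set up that comparison. It runs on synthetic data from `make_classification`, not on the notebook's dataset, and the small grid and the `score` helper are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption); the notebook's X_train / X_test are not used here
X, y = make_classification(n_samples=300, n_features=10, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Baseline: default hyperparameters
baseline = DecisionTreeClassifier(random_state=42).fit(X_tr, y_tr)

# Tuned: exhaustive search over a small illustrative grid
search = GridSearchCV(
    DecisionTreeClassifier(random_state=42),
    param_grid={"max_depth": [None, 5, 10], "min_samples_leaf": [1, 2, 4]},
    cv=5,
    scoring="accuracy",
    n_jobs=1,
)
search.fit(X_tr, y_tr)


def score(model):
    """Test-set accuracy and weighted F1 for a fitted model (hypothetical helper)."""
    y_pred = model.predict(X_te)
    return {
        "accuracy": accuracy_score(y_te, y_pred),
        "f1": f1_score(y_te, y_pred, average="weighted"),
    }


comparison = {"baseline": score(baseline), "tuned": score(search.best_estimator_)}

for label, metrics in comparison.items():
    print(f"{label:>8}: accuracy={metrics['accuracy']:.3f}  f1={metrics['f1']:.3f}")

delta = comparison["tuned"]["accuracy"] - comparison["baseline"]["accuracy"]
print(f"Accuracy change after tuning: {delta:+.3f}")
print(f"Mean CV accuracy of the best setting: {search.best_score_:.3f}")
```

Reporting `search.best_score_` next to the test-set figures helps show whether a gain on a small test set is also visible in cross-validation.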


#### **Classification Report for Random Forest**

In [100]:
from sklearn.metrics import classification_report

# --- Select best Random Forest model ---
best_rf = best_models["Random Forest"]

print("\nBest Model Selected: Random Forest")
print(best_rf)

# --- Predict on test set ---
y_pred_rf = best_rf.predict(X_test)

# --- Evaluation ---
print("\nClassification Report (Random Forest):")
print(classification_report(y_test, y_pred_rf))


Best Model Selected: Random Forest
RandomForestClassifier(bootstrap=False, max_depth=8, max_features='log2',
                       min_samples_leaf=2, random_state=42)

Classification Report (Random Forest):
              precision    recall  f1-score   support

           0       0.97      0.91      0.94        33
           1       0.90      0.96      0.93        28

    accuracy                           0.93        61
   macro avg       0.93      0.94      0.93        61
weighted avg       0.94      0.93      0.93        61
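
To make the report easier to read, the sketch below recomputes its figures from raw counts. The confusion-matrix counts are not printed in the notebook. They are inferred here from the supports (33 and 28) and the rounded precision and recall values, so treat them as an assumption. With those counts, the accuracy and weighted averages agree with the Random Forest row of the tuned-results table.

```python
# Assumed counts inferred from the report (rows = true class, columns = predicted class)
confusion = [
    [30, 3],   # true 0: 30 predicted as 0, 3 predicted as 1
    [1, 27],   # true 1: 1 predicted as 0, 27 predicted as 1
]

n_classes = len(confusion)
total = sum(sum(row) for row in confusion)
support = [sum(row) for row in confusion]  # true samples per class
predicted = [
    sum(confusion[i][j] for i in range(n_classes)) for j in range(n_classes)
]  # predictions per class

# Per-class metrics: the diagonal holds the correct predictions
precision = [confusion[k][k] / predicted[k] for k in range(n_classes)]
recall = [confusion[k][k] / support[k] for k in range(n_classes)]
f1 = [2 * p * r / (p + r) for p, r in zip(precision, recall)]

accuracy = sum(confusion[k][k] for k in range(n_classes)) / total


def weighted(values):
    """Support-weighted mean, as in the report's 'weighted avg' row."""
    return sum(v * s for v, s in zip(values, support)) / total


for k in range(n_classes):
    print(
        f"class {k}: precision={precision[k]:.2f} recall={recall[k]:.2f} "
        f"f1={f1[k]:.2f} support={support[k]}"
    )
print(f"accuracy={accuracy:.6f}")
print(
    f"weighted avg: precision={weighted(precision):.6f} "
    f"recall={weighted(recall):.6f} f1={weighted(f1):.6f}"
)
```

Under these assumed counts, the model misses 1 of the 28 class-1 cases and mislabels 3 of the 33 class-0 cases as class 1.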

