# Hyperparameter Tuning
Hyperparameter tuning is the process of choosing values for the settings of a machine learning model that are not learned from the data but are fixed before training, such as a tree's maximum depth or a regularization strength. These settings can strongly affect the model's performance and its ability to generalize.

In this section, we use Grid Search with cross-validation (GridSearchCV) to systematically search for the best combination of hyperparameters for four classifiers: Logistic Regression, Decision Tree, Random Forest, and Support Vector Machine (SVM). The goal is to improve prediction accuracy and model robustness.
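To make the idea concrete, here is a minimal sketch of what a grid search with cross-validation does internally: it enumerates every combination in the grid, scores each one with k-fold cross-validation, and keeps the combination with the highest mean score. The synthetic data (`X_demo`, `y_demo`) and the small grid (`demo_grid`) are assumptions made for illustration; they are not the notebook's dataset or grids.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import ParameterGrid, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption: not the notebook's dataset).
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=0)

# Hypothetical grid, chosen only for illustration.
demo_grid = {"max_depth": [2, 4], "min_samples_split": [2, 10]}

results = []
for params in ParameterGrid(demo_grid):
    model = DecisionTreeClassifier(random_state=0, **params)
    # Accuracy on each of the 5 folds for this parameter combination.
    scores = cross_val_score(model, X_demo, y_demo, cv=5, scoring="accuracy")
    results.append((scores.mean(), params))

# Keep the combination with the highest mean cross-validation accuracy.
best_score, best_params = max(results, key=lambda item: item[0])
print("Best parameters:", best_params)
print("Best mean CV accuracy:", round(best_score, 3))
```

GridSearchCV wraps this loop and, by default, also refits the best combination on the full training data.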

Load Data and Prepare for Modeling
In this step, we:

Load the preprocessed and scaled dataset from a CSV file.

Separate the features (X) from the target variable (y), which is stored in the column named '13'.

Prepare the data for further modeling and hyperparameter tuning.

This sets up the data so that machine learning models can be trained and evaluated efficiently.

In [None]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

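# Load the preprocessed, scaled dataset.
df = pd.read_csv('/content/scaled_selected_features.csv')

# The target is stored in the column named '13'; all other columns are features.
X = df.drop('13', axis=1)
y = df['13']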

Hyperparameter Tuning for Logistic Regression
In this step, we:

Split the dataset into training and testing sets.

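Standardize the features with StandardScaler, fitting it on the training set only and applying the same transformation to the test set. Because the loaded dataset is described as already scaled, this step is likely to change the values only slightly.

Define a parameter grid for C, the inverse of the regularization strength (smaller values mean stronger regularization). The solver is fixed to 'lbfgs', so only C is actually searched.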

Use GridSearchCV with 5-fold cross-validation to find the best hyperparameters for Logistic Regression.

Print out the best parameters and the best cross-validation accuracy score.

This process identifies the best-performing settings among the candidates in the grid, as measured by mean cross-validation accuracy.

In [None]:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit the scaler on the training split only, then apply it to both splits.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)


param_grid_lr = {
    'C': [0.01, 0.1, 1, 10],
    'solver': ['lbfgs']
}

grid_lr = GridSearchCV(LogisticRegression(), param_grid_lr, cv=5, scoring='accuracy')
grid_lr.fit(X_train, y_train)

print("🔶 Logistic Regression Best Parameters:", grid_lr.best_params_)
print("🔶 Logistic Regression Best CV Score:", grid_lr.best_score_)

🔶 Logistic Regression Best Parameters: {'C': 1, 'solver': 'lbfgs'}
🔶 Logistic Regression Best CV Score: 0.6073129251700681
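In the cell above, the scaler is fitted on the whole training split before cross-validation starts, so every validation fold has already contributed to the scaling statistics. The effect is usually small, but one common way to avoid it is to put the scaler and the classifier in a `Pipeline`, so that scaling is refitted inside each fold. The sketch below illustrates this on synthetic data; the names `X_demo`, `pipe`, `pipe_grid`, and `grid_pipe`, and the `max_iter=1000` setting, are assumptions for illustration and not part of the original notebook.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption: not the notebook's dataset).
X_demo, y_demo = make_classification(n_samples=300, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Scaling becomes a pipeline step, so it is refitted on each training fold.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# The "clf__" prefix routes the parameter to the classifier step.
pipe_grid = {"clf__C": [0.01, 0.1, 1, 10]}

grid_pipe = GridSearchCV(pipe, pipe_grid, cv=5, scoring="accuracy")
grid_pipe.fit(X_tr, y_tr)

print("Best parameters:", grid_pipe.best_params_)
print("Best CV score:", grid_pipe.best_score_)
```

With this layout, the unscaled training data is passed to `fit`, and the same pipeline scales the test data automatically when predicting.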


Hyperparameter Tuning for Decision Tree Classifier
In this step, we:

Define a parameter grid to tune the decision tree’s maximum depth and the minimum number of samples required to split a node.

Use GridSearchCV with 5-fold cross-validation to search for the best combination of hyperparameters.

Fit the model on the training data and evaluate each parameter set.

Print the best parameters found and the corresponding cross-validation accuracy score.

Tuning these two parameters controls how complex the tree can become, which helps balance overfitting (trees that grow too deep) against underfitting (trees that are too shallow).

In [None]:
param_grid_dt = {
    'max_depth': [None, 5, 10, 20],
    'min_samples_split': [2, 5, 10]
}

grid_dt = GridSearchCV(DecisionTreeClassifier(), param_grid_dt, cv=5, scoring='accuracy')
grid_dt.fit(X_train, y_train)

print("🔶 Decision Tree Best Parameters:", grid_dt.best_params_)
print("🔶 Decision Tree Best CV Score:", grid_dt.best_score_)

🔶 Decision Tree Best Parameters: {'max_depth': 5, 'min_samples_split': 5}
🔶 Decision Tree Best CV Score: 0.5699829931972789
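The best score alone does not show how close the other candidates were. A fitted GridSearchCV object stores per-candidate results in its `cv_results_` attribute, which can be loaded into a DataFrame for inspection. The following is a small sketch on synthetic data; `X_demo`, `demo_grid`, `search`, and `cv_table` are hypothetical names, and the grid is a reduced stand-in for the one used above.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption: not the notebook's dataset).
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=0)

# Reduced, hypothetical grid for illustration.
demo_grid = {"max_depth": [3, 5, 10], "min_samples_split": [2, 10]}

search = GridSearchCV(
    DecisionTreeClassifier(random_state=0), demo_grid, cv=5, scoring="accuracy"
)
search.fit(X_demo, y_demo)

# One row per parameter combination, best-ranked first.
columns = [
    "param_max_depth",
    "param_min_samples_split",
    "mean_test_score",
    "std_test_score",
    "rank_test_score",
]
cv_table = (
    pd.DataFrame(search.cv_results_)[columns]
    .sort_values("rank_test_score")
    .reset_index(drop=True)
)
print(cv_table)
```

Comparing `mean_test_score` differences against `std_test_score` gives a rough sense of whether the winning combination is clearly better or only marginally ahead.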


Hyperparameter Tuning for Random Forest Classifier
In this step, we:

Define a grid of hyperparameters to tune:

Number of trees (n_estimators)

Maximum depth of each tree (max_depth)

Minimum number of samples required to split a node (min_samples_split)

Use GridSearchCV with 5-fold cross-validation to exhaustively search for the best combination.

Fit the random forest model on the training data for each hyperparameter combination.

Output the best hyperparameters and their corresponding mean cross-validation accuracy.

This process selects the best-performing combination among those in the grid, as measured by mean cross-validation accuracy.

In [None]:
param_grid_rf = {
    'n_estimators': [100, 200],
    'max_depth': [None, 10, 20],
    'min_samples_split': [2, 5]
}

grid_rf = GridSearchCV(RandomForestClassifier(), param_grid_rf, cv=5, scoring='accuracy')
grid_rf.fit(X_train, y_train)

print("🔶 Random Forest Best Parameters:", grid_rf.best_params_)
print("🔶 Random Forest Best CV Score:", grid_rf.best_score_)

🔶 Random Forest Best Parameters: {'max_depth': 10, 'min_samples_split': 2, 'n_estimators': 100}
🔶 Random Forest Best CV Score: 0.6279761904761905
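Because the search is exhaustive, its cost grows with the product of the grid sizes. As a quick check, the number of model fits for the Random Forest grid above can be counted with `ParameterGrid`; the variable names `n_folds`, `n_candidates`, and `n_fits` are introduced here only for illustration.

```python
from sklearn.model_selection import ParameterGrid

# Same grid as in the Random Forest cell above.
param_grid_rf = {
    "n_estimators": [100, 200],
    "max_depth": [None, 10, 20],
    "min_samples_split": [2, 5],
}

n_folds = 5
n_candidates = len(ParameterGrid(param_grid_rf))  # 2 * 3 * 2 combinations
n_fits = n_candidates * n_folds  # one fit per candidate per fold

print(n_candidates, "candidates x", n_folds, "folds =", n_fits, "fits")
# prints: 12 candidates x 5 folds = 60 fits
```

GridSearchCV performs one additional fit at the end to retrain the best combination on the full training data. If the search is slow, passing `n_jobs=-1` to GridSearchCV runs the fits in parallel, and passing a `random_state` to RandomForestClassifier makes the results repeatable between runs.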


Hyperparameter Tuning for Support Vector Machine (SVM)
In this section, we:

Define a grid of hyperparameters to tune:

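Regularization parameter C, which controls the trade-off between a wide margin and a low training error: small values tolerate more misclassified training points, while large values penalize them more heavily.

Kernel type, either 'linear' or 'rbf' (Radial Basis Function), which determines whether the decision boundary is linear or non-linear in the original feature space.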

Use GridSearchCV with 5-fold cross-validation to test all combinations of these parameters.

Fit the SVM model on the training data with each parameter combination.

Print out the best hyperparameters found and the corresponding cross-validation accuracy score.

This tuning helps optimize the SVM’s ability to classify the data effectively.

In [None]:
param_grid_svm = {
    'C': [0.1, 1, 10],
    'kernel': ['linear', 'rbf']
}

grid_svm = GridSearchCV(SVC(), param_grid_svm, cv=5, scoring='accuracy')
grid_svm.fit(X_train, y_train)

print("🔶 SVM Best Parameters:", grid_svm.best_params_)
print("🔶 SVM Best CV Score:", grid_svm.best_score_)

🔶 SVM Best Parameters: {'C': 1, 'kernel': 'rbf'}
🔶 SVM Best CV Score: 0.6201530612244898
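The RBF kernel has an additional hyperparameter, `gamma`, which the grid above leaves at its default. One possible extension, shown here only as a sketch, is to pass GridSearchCV a list of grids so that `gamma` is searched for the RBF kernel but not for the linear kernel, where it has no effect. The data (`X_demo`, `y_demo`), the grid values in `svm_grid`, and the name `search_svm` are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, ParameterGrid
from sklearn.svm import SVC

# Synthetic stand-in data (assumption: not the notebook's dataset).
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=0)

# A list of grids: each dictionary is expanded separately,
# so gamma is only combined with the RBF kernel.
svm_grid = [
    {"kernel": ["linear"], "C": [0.1, 1, 10]},
    {"kernel": ["rbf"], "C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1.0]},
]

# 3 linear candidates + 9 RBF candidates.
print("Number of candidates:", len(ParameterGrid(svm_grid)))

search_svm = GridSearchCV(SVC(), svm_grid, cv=5, scoring="accuracy")
search_svm.fit(X_demo, y_demo)

print("Best parameters:", search_svm.best_params_)
print("Best CV score:", search_svm.best_score_)
```

Splitting the grid this way avoids fitting redundant linear-kernel models that differ only in an unused `gamma` value.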


Summary of Best Models and Their Performance
This code prints a concise summary of the best hyperparameters and cross-validation accuracy scores found for each of the four classification models:

Logistic Regression

Decision Tree

Random Forest

Support Vector Machine (SVM)

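It helps quickly compare which model and parameter settings performed best during tuning. Each accuracy value is the mean 5-fold cross-validation score on the training split, so the held-out test set has not yet been used. In the run shown, Random Forest scores highest (about 0.628), followed by SVM (0.620), Logistic Regression (0.607), and Decision Tree (0.570).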

In [None]:
print("\n✅ Summary of Best Models:")
print("Logistic Regression:", grid_lr.best_params_, " | Accuracy:", grid_lr.best_score_)
print("Decision Tree:", grid_dt.best_params_, " | Accuracy:", grid_dt.best_score_)
print("Random Forest:", grid_rf.best_params_, " | Accuracy:", grid_rf.best_score_)
print("SVM:", grid_svm.best_params_, " | Accuracy:", grid_svm.best_score_)


✅ Summary of Best Models:
Logistic Regression: {'C': 1, 'solver': 'lbfgs'}  | Accuracy: 0.6073129251700681
Decision Tree: {'max_depth': 5, 'min_samples_split': 5}  | Accuracy: 0.5699829931972789
Random Forest: {'max_depth': 10, 'min_samples_split': 2, 'n_estimators': 100}  | Accuracy: 0.6279761904761905
SVM: {'C': 1, 'kernel': 'rbf'}  | Accuracy: 0.6201530612244898
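The scores above come from cross-validation on the training split only. A natural follow-up, sketched here under assumptions, is to evaluate the selected model once on the held-out test split. By default GridSearchCV refits the best combination on the full training data and exposes it as `best_estimator_`. The synthetic data, the reduced grid, and the names `search` and `test_acc` are illustrative and not taken from the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data (assumption: not the notebook's dataset).
X_demo, y_demo = make_classification(n_samples=300, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Reduced, hypothetical grid to keep the example fast.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    {"n_estimators": [20, 50], "max_depth": [None, 5]},
    cv=5,
    scoring="accuracy",
)
search.fit(X_tr, y_tr)

# best_estimator_ was refitted on all of X_tr with the best parameters.
test_acc = accuracy_score(y_te, search.best_estimator_.predict(X_te))

print("Best CV accuracy:", round(search.best_score_, 3))
print("Held-out test accuracy:", round(test_acc, 3))
```

Reporting the test accuracy only for the finally chosen model keeps it an honest estimate; choosing among models by their test scores would turn the test split into another tuning set.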
