# What is Parameter Tuning (Hyperparameter Tuning)?
Parameter tuning (more precisely, hyperparameter tuning) is the process of finding the hyperparameter values that give a machine learning model its best performance. Hyperparameters are settings chosen before training, as opposed to the weights the model learns from the data.

In Scikit-learn models (like LogisticRegression), hyperparameters such as C, penalty, and solver control how the model learns. Choosing good values can noticeably improve accuracy, precision, and generalization to unseen data.
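
To make the distinction between settings chosen up front and values learned during training concrete, here is a minimal sketch. It assumes a synthetic dataset from make_classification (a stand-in, not the CSV used in this notebook), and the names X_demo, y_demo, and model are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (an assumption; not the notebook's dataset)
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=0)

# Hyperparameters are passed to the constructor, before any data is seen
model = LogisticRegression(C=0.5, max_iter=1000)
hyperparams = model.get_params()
print("C (chosen by us):", hyperparams["C"])

# Learned parameters only exist after fit()
model.fit(X_demo, y_demo)
print("coef_ shape (learned from data):", model.coef_.shape)
```

Tuning means searching over values like C above; the coefficients are then learned for each candidate value.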


![image.png](attachment:image.png)

# Load Dataset

In [2]:
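import pandas as pd

# Load the sample dataset. The "Unnamed: 0" column in the output is the
# row index that was saved into the CSV file.
df = pd.read_csv("binary_classification_sample.csv")
df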

Unnamed: 0,Age,Salary,Experience,Gender,Department,Education,LocationScore,Purchased
0,56,51905.183591,27,Female,HR,Bachelors,67.964728,0
1,69,31258.344158,16,Female,Engineering,High School,21.825389,0
2,46,79176.734217,4,Male,HR,PhD,94.996118,0
3,32,47699.953137,4,Male,Engineering,High School,78.634501,1
4,60,36395.191619,5,Male,Marketing,High School,8.941100,1
...,...,...,...,...,...,...,...,...
195,69,69228.805705,10,Female,Engineering,Bachelors,77.985099,1
196,30,49573.678136,14,Female,Marketing,Bachelors,3.961883,1
197,58,24253.633311,27,Female,HR,High School,48.050695,0
198,20,,12,Female,Sales,High School,,0


# Data Preprocessing and Cleaning

In [3]:
# Drop every row that has at least one missing value (e.g. row 198 above)
df.dropna(inplace=True)

# Encoding

In [4]:
from sklearn.preprocessing import OrdinalEncoder

# Encode the categorical columns as integers (categories are numbered in sorted order)
categorical_cols = ['Gender', 'Education', 'Department']
encoder = OrdinalEncoder()
df[categorical_cols] = encoder.fit_transform(df[categorical_cols]).astype(int)
df

Unnamed: 0,Age,Salary,Experience,Gender,Department,Education,LocationScore,Purchased
0,56,51905.183591,27,0,1,0,67.964728,0
1,69,31258.344158,16,0,0,1,21.825389,0
2,46,79176.734217,4,1,1,3,94.996118,0
3,32,47699.953137,4,1,0,1,78.634501,1
4,60,36395.191619,5,1,2,1,8.941100,1
...,...,...,...,...,...,...,...,...
194,44,33125.723933,13,1,3,3,86.012240,1
195,69,69228.805705,10,0,0,0,77.985099,1
196,30,49573.678136,14,0,2,0,3.961883,1
197,58,24253.633311,27,0,1,1,48.050695,0


# Train test split

In [5]:
from sklearn.model_selection import train_test_split

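# Features: every column except the target.
# Note: this keeps "Unnamed: 0" (the saved row index) as a feature; it is usually
# better to drop it too, e.g. df.drop(['Purchased', 'Unnamed: 0'], axis=1).
X = df.drop('Purchased', axis=1)
y = df['Purchased']

# Hold out 20% of the rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)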

# Logistic Regression Model with parameter tuning

![image.png](attachment:image.png)

In [11]:
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

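# Define the model
# Note: with unscaled features, 'saga' may stop before converging at the default
# max_iter=100; scaling the features or raising max_iter usually helps.
lg = LogisticRegression()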

# Define the parameter grid
param_grid = {
    'C': [0.1, 1, 10],  # Inverse of regularization strength (smaller C = stronger regularization, which helps prevent overfitting)
    
    'solver': ['liblinear', 'saga'],  # Algorithm to use in optimization:
                                      # 'liblinear' = good for small datasets, supports L1 & L2
                                      # 'saga' = good for large datasets, supports L1 & L2 and works with multinomial loss

    'penalty': ['l1', 'l2']  # Type of regularization:
                             # 'l1' = Lasso (sparse features, can zero out some weights)
                             # 'l2' = Ridge (keeps all weights small but not zero)
}


# Grid search with 5-fold cross-validation
grid = GridSearchCV(lg, param_grid, cv=5)
grid.fit(X_train, y_train)

# Best model prediction
y_pred = grid.predict(X_test)

# Accuracy
print("Best Parameters:", grid.best_params_)
print("Accuracy:", accuracy_score(y_test, y_pred))




Best Parameters: {'C': 0.1, 'penalty': 'l1', 'solver': 'saga'}
Accuracy: 0.5
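
The accuracy printed above is measured on the held-out test set. The cross-validation score that the search used to pick the winner is stored separately on the fitted object, and comparing the two can help when the test result looks weak. The sketch below shows where to find it; it assumes synthetic data from make_classification and a smaller grid over C only, so the numbers it prints are not the notebook's results. The names X_demo, y_demo, and demo_grid are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (an assumption; not the notebook's CSV)
X_demo, y_demo = make_classification(n_samples=200, n_features=6, random_state=0)

demo_grid = GridSearchCV(
    LogisticRegression(max_iter=1000),
    {"C": [0.01, 0.1, 1, 10]},
    cv=5,
)
demo_grid.fit(X_demo, y_demo)

# Mean cross-validated accuracy of the winning combination
print("Best CV score:", demo_grid.best_score_)

# One entry per candidate: its parameters and its mean validation score
results = demo_grid.cv_results_
for params, mean_score in zip(results["params"], results["mean_test_score"]):
    print(params, round(float(mean_score), 3))
```

If every candidate scores about the same in cv_results_, the grid itself is probably not the limiting factor.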




# What is GridSearchCV?

GridSearchCV is a class in Scikit-learn that automatically searches for the best hyperparameters for your machine learning model.

How it works:

It tries all combinations of the hyperparameters you provide in a grid (like C, kernel, max_depth, etc.).

It uses cross-validation to evaluate each combination.

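Finally, it selects the combination with the best mean cross-validation score (accuracy by default for classifiers) and, by default, refits the model with those settings on the whole training set, which is why grid.predict can be called directly.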
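
The "all combinations" step can be sketched with the standard library alone. This is an illustration of the idea, not Scikit-learn's internal code, and the order of candidates may differ from the order GridSearchCV uses. It reuses the logistic regression grid from this notebook; the "+1" assumes the default behaviour of refitting the best candidate once at the end.

```python
from itertools import product

# Same grid as the logistic regression example
param_grid = {
    "C": [0.1, 1, 10],
    "solver": ["liblinear", "saga"],
    "penalty": ["l1", "l2"],
}

# Cartesian product of all value lists, one dict per candidate
names = list(param_grid)
combinations = [dict(zip(names, values)) for values in product(*param_grid.values())]

cv_folds = 5
n_candidates = len(combinations)
n_fits = n_candidates * cv_folds + 1  # +1 for the final refit (assumes the default refit)

print("first candidate:", combinations[0])
print("candidates:", n_candidates, "| total fits:", n_fits)
```

The count grows multiplicatively with every hyperparameter added, which is why large grids quickly become slow.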


![image.png](attachment:image.png)



# What is cv=5 in GridSearchCV?

cv=5 means 5-fold cross-validation.

Explanation:

The training dataset is split into 5 roughly equal parts (folds). For classifiers, Scikit-learn keeps the class proportions similar in each fold.

The model is trained on 4 folds and tested on the 1 remaining fold.

This process is repeated 5 times, each time using a different fold as the validation set.

The 5 validation scores are averaged to give a single score for each hyperparameter combination.
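
The fold rotation described above can be sketched in plain Python. This is a simplified illustration: it uses contiguous folds with no shuffling and no stratification, so it does not reproduce the exact splits Scikit-learn would make for a classifier. The function name k_fold_indices is hypothetical, and 11 rows are used only to show what happens when the data does not divide evenly.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for k contiguous, near-equal folds."""
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds get one additional row
        size = base + (1 if fold < extra else 0)
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size

folds = list(k_fold_indices(n_samples=11, k=5))
fold_sizes = [len(val_idx) for _, val_idx in folds]

for i, (train_idx, val_idx) in enumerate(folds, start=1):
    print(f"fold {i}: train on {len(train_idx)} rows, validate on rows {val_idx}")
```

Every row is used for validation exactly once and for training in the other four rounds.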

# Parameter Tuning for Decision Tree

![image.png](attachment:image.png)

In [9]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import accuracy_score

# Define the model
dt = DecisionTreeClassifier(random_state=42)

# Define the parameter grid
param_grid = {
    'criterion': ['gini', 'entropy'],
    'max_depth': [3, 5, 10, None],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4]
}

# Grid search with 5-fold cross-validation
grid = GridSearchCV(dt, param_grid, cv=5)
grid.fit(X_train, y_train)

# Best model prediction
y_pred = grid.predict(X_test)

# Accuracy
print("Best Parameters:", grid.best_params_)
print("Accuracy:", accuracy_score(y_test, y_pred))


Best Parameters: {'criterion': 'gini', 'max_depth': 5, 'min_samples_leaf': 2, 'min_samples_split': 2}
Accuracy: 0.5294117647058824


# Parameter Tuning for Random Forest

![image.png](attachment:image.png)

In [13]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import accuracy_score

# Define the model
rf = RandomForestClassifier(random_state=42)

# Define the parameter grid
param_grid = {
    'n_estimators': [50, 100, 200],              # Number of trees
    'max_depth': [None, 10, 20],                 # Max depth of each tree
    'min_samples_split': [2, 5],                 # Min samples to split a node
    'min_samples_leaf': [1, 2],                  # Min samples at a leaf node
    'criterion': ['gini', 'entropy']             # Split criterion
}

# Grid search with 5-fold cross-validation
grid = GridSearchCV(rf, param_grid, cv=5)
grid.fit(X_train, y_train)

# Best model prediction
y_pred = grid.predict(X_test)

# Accuracy
print("Best Parameters:", grid.best_params_)
print("Accuracy:", accuracy_score(y_test, y_pred))


Best Parameters: {'criterion': 'entropy', 'max_depth': None, 'min_samples_leaf': 1, 'min_samples_split': 5, 'n_estimators': 100}
Accuracy: 0.5


# Parameter Tuning for SVM (Support Vector Classifier)

![image.png](attachment:image.png)


In [None]:
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import accuracy_score

# Define the model
svm = SVC()

# Define the parameter grid
param_grid = {
    'C': [0.1, 1, 10],               # Regularization parameter (smaller C = stronger regularization)
    'kernel': ['linear', 'rbf'],     # Kernel type
    'gamma': ['scale', 'auto']       # Kernel coefficient for 'rbf', 'poly' and 'sigmoid' (ignored by 'linear')
}

# Grid search with 5-fold cross-validation
grid = GridSearchCV(svm, param_grid, cv=5)
grid.fit(X_train, y_train)

# Best model prediction
y_pred = grid.predict(X_test)

# Accuracy
print("Best Parameters:", grid.best_params_)
print("Accuracy:", accuracy_score(y_test, y_pred))
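
Because gamma has no effect with the linear kernel, the grid above evaluates some candidates that are effectively duplicates. One possible way to avoid this is to pass a list of dictionaries, one per kernel, which GridSearchCV also accepts as param_grid. The sketch below only counts the candidates with ParameterGrid and does not fit any model; the names flat_grid and split_grid are illustrative.

```python
from sklearn.model_selection import ParameterGrid

# The grid as written in the SVM cell: gamma is varied for both kernels
flat_grid = {
    "C": [0.1, 1, 10],
    "kernel": ["linear", "rbf"],
    "gamma": ["scale", "auto"],
}

# One sub-grid per kernel: gamma is only varied where it matters
split_grid = [
    {"kernel": ["linear"], "C": [0.1, 1, 10]},
    {"kernel": ["rbf"], "C": [0.1, 1, 10], "gamma": ["scale", "auto"]},
]

n_flat = len(ParameterGrid(flat_grid))
n_split = len(ParameterGrid(split_grid))
print("flat grid candidates:", n_flat)
print("split grid candidates:", n_split)
```

With cv=5, each candidate removed saves five model fits.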