# Implementing GridSearchCV for hyper-parameter tuning

**Dataset**: Heart Disease Dataset ([https://www.kaggle.com/datasets/johnsmith88/heart-disease-dataset/data](https://www.kaggle.com/datasets/johnsmith88/heart-disease-dataset/data))

## 1. Prerequisites

In [1]:
# Importing modules
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import classification_report

## 2. Data Loading

In [2]:
# Reading data
df = pd.read_csv("heart.csv")

# Feature matrix
x = df.drop("target", axis=1)

# Target label
y = df["target"]

## 3. Data Summary

In [3]:
df.head()

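|   | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target |
|---|-----|-----|----|----------|------|-----|---------|---------|-------|---------|-------|----|------|--------|
| 0 | 52 | 1 | 0 | 125 | 212 | 0 | 1 | 168 | 0 | 1.0 | 2 | 2 | 3 | 0 |
| 1 | 53 | 1 | 0 | 140 | 203 | 1 | 0 | 155 | 1 | 3.1 | 0 | 0 | 3 | 0 |
| 2 | 70 | 1 | 0 | 145 | 174 | 0 | 1 | 125 | 1 | 2.6 | 0 | 0 | 3 | 0 |
| 3 | 61 | 1 | 0 | 148 | 203 | 0 | 1 | 161 | 0 | 0.0 | 2 | 1 | 3 | 0 |
| 4 | 62 | 0 | 0 | 138 | 294 | 1 | 1 | 106 | 0 | 1.9 | 1 | 3 | 2 | 0 |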


## 4. Preprocessing

In [4]:
# Splitting the data into training (70%) and validation (30%) sets
x_train, x_val, y_train, y_val = train_test_split(x, y, test_size=0.3, random_state=42)

# Feature scaling: fit the scaler on the training set only, then apply it to both sets
sds = StandardScaler()

x_train = sds.fit_transform(x_train)

x_val = sds.transform(x_val)
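
Because the scaler above is fitted on the whole training split, any later cross-validation on `x_train` (such as the grid search in section 5) scores folds that already influenced the scaling statistics. One possible way to avoid this is to put the scaler and the model in a single `Pipeline`, so the scaler is refitted inside each fold. The following is only a minimal sketch under assumptions: it uses synthetic data from `make_classification` instead of `heart.csv`, and the names `X_demo`, `pipe`, `param_grid`, and `search` are illustrative and not part of the original notebook.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): 13 features, like the heart dataset
X_demo, y_demo = make_classification(n_samples=300, n_features=13, random_state=42)
X_tr, X_va, y_tr, y_va = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42
)

# Scaler and classifier chained, so scaling is refitted on each CV training fold
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression()),
])

# Parameters of a pipeline step are addressed as "<step name>__<parameter>"
param_grid = {
    "clf__solver": ["lbfgs", "liblinear"],
    "clf__max_iter": [50, 100, 500],
}

search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_tr, y_tr)

# Accuracy of the refitted best pipeline on the held-out validation split
val_score = search.score(X_va, y_va)
print(search.best_params_, val_score)
```

With this layout, the raw (unscaled) features are passed to `fit`, and the validation split is scaled automatically when `score` or `predict` is called.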

## 5. Hyper-parameter tuning

In [5]:
# Model
LR = LogisticRegression()

# Parameters
parameters = {'solver': ['lbfgs', 'liblinear'],
              'max_iter': [50, 100, 500]}

# Running grid search with 5-fold cross-validation
modelGS = GridSearchCV(LR, parameters, verbose=3, cv=5)

modelGS.fit(x_train, y_train)

Fitting 5 folds for each of 6 candidates, totalling 30 fits
[CV 1/5] END .........max_iter=50, solver=lbfgs;, score=0.826 total time=   0.0s
[CV 2/5] END .........max_iter=50, solver=lbfgs;, score=0.875 total time=   0.0s
[CV 3/5] END .........max_iter=50, solver=lbfgs;, score=0.881 total time=   0.0s
[CV 4/5] END .........max_iter=50, solver=lbfgs;, score=0.860 total time=   0.0s
[CV 5/5] END .........max_iter=50, solver=lbfgs;, score=0.797 total time=   0.0s
[CV 1/5] END .....max_iter=50, solver=liblinear;, score=0.826 total time=   0.0s
[CV 2/5] END .....max_iter=50, solver=liblinear;, score=0.875 total time=   0.0s
[CV 3/5] END .....max_iter=50, solver=liblinear;, score=0.881 total time=   0.0s
[CV 4/5] END .....max_iter=50, solver=liblinear;, score=0.860 total time=   0.0s
[CV 5/5] END .....max_iter=50, solver=liblinear;, score=0.797 total time=   0.0s
[CV 1/5] END ........max_iter=100, solver=lbfgs;, score=0.826 total time=   0.0s
[CV 2/5] END ........max_iter=100, solver=lbfgs;,

In [6]:
# Best parameters after hyper-parameter tuning
print(modelGS.best_params_)

{'max_iter': 50, 'solver': 'lbfgs'}
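
In the log above, the fold scores shown for `lbfgs` and `liblinear` at `max_iter=50` are identical, so `best_params_` alone may hide the fact that several candidates perform equally well. The `cv_results_` attribute holds the mean score and rank of every candidate. The sketch below shows one way to tabulate it; it runs on synthetic data from `make_classification` rather than `heart.csv` (an assumption made so the example is self-contained), and `gs_demo` and `summary` are illustrative names.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (assumption), not the heart-disease dataset
X_demo, y_demo = make_classification(n_samples=300, n_features=13, random_state=42)

grid = {"solver": ["lbfgs", "liblinear"], "max_iter": [50, 100, 500]}
gs_demo = GridSearchCV(LogisticRegression(), grid, cv=5)
gs_demo.fit(X_demo, y_demo)

# One row per candidate: parameters, mean/std of the fold scores, and rank
results = pd.DataFrame(gs_demo.cv_results_)
summary = results[
    ["param_solver", "param_max_iter", "mean_test_score",
     "std_test_score", "rank_test_score"]
].sort_values("rank_test_score")

print(summary)
```

Candidates that share rank 1 scored the same on average, in which case the reported best parameters are just one of several equivalent choices.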


## 6. Training

In [7]:
# Final model with the best parameters found by the grid search
model = LogisticRegression(max_iter=50, solver="lbfgs")

# Model fitting
model.fit(x_train, y_train)
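
Retraining by hand works, but it is not strictly required: with the default `refit=True`, `GridSearchCV` refits the best candidate on the full training data and exposes it as `best_estimator_`. The sketch below illustrates this on synthetic data (an assumption, used in place of `heart.csv`); `gs_demo` and `best_model` are illustrative names, not part of the original notebook.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data (assumption), not the heart-disease dataset
X_demo, y_demo = make_classification(n_samples=300, n_features=13, random_state=42)
X_tr, X_va, y_tr, y_va = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42
)

grid = {"solver": ["lbfgs", "liblinear"], "max_iter": [50, 100, 500]}

# refit=True is the default: the best candidate is retrained on all of X_tr
gs_demo = GridSearchCV(LogisticRegression(), grid, cv=5)
gs_demo.fit(X_tr, y_tr)

# The refitted model can be used directly, without copying parameters by hand
best_model = gs_demo.best_estimator_
same_predictions = bool((gs_demo.predict(X_va) == best_model.predict(X_va)).all())
print(best_model.get_params()["solver"], same_predictions)
```

Using `best_estimator_` avoids transcription mistakes when the grid or the winning parameters change between runs.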

## 7. Validation

In [8]:
# Predicting labels for the validation set
y_pred = model.predict(x_val)

In [9]:
# Classification report
print(classification_report(y_val, y_pred))

              precision    recall  f1-score   support

           0       0.86      0.75      0.80       159
           1       0.76      0.87      0.81       149

    accuracy                           0.81       308
   macro avg       0.81      0.81      0.80       308
weighted avg       0.81      0.81      0.80       308
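
As a quick sanity check on the report, the F1 score of each class is the harmonic mean of its precision and recall, and the overall accuracy is the support-weighted average of the per-class recalls. The sketch below recomputes these from the rounded values printed above; `f1_from` is a helper introduced here for illustration. Because the inputs are already rounded to two decimals, averages derived from them can differ from the report in the last digit.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Rounded values taken from the classification report above
f1_class0 = f1_from(0.86, 0.75)
f1_class1 = f1_from(0.76, 0.87)

# Accuracy = correctly classified samples / all samples,
# where correct samples per class = recall * support
accuracy = (0.75 * 159 + 0.87 * 149) / (159 + 149)

print(round(f1_class0, 2), round(f1_class1, 2), round(accuracy, 2))
```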

