## Hyperparameter Tuning of an ML Model

In [119]:
import pandas as pd
import numpy as np 
dataset=pd.read_csv('diabetes.csv' , header=None)
dataset.head()

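|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age | Outcome |
| 1 | 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | 1 |
| 2 | 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | 0 |
| 3 | 8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 | 1 |
| 4 | 1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 | 0 |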
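
Row 0 of the output above contains the column names because the file was read with `header=None`. A minimal sketch of the alternative follows, assuming the first line of the CSV is a header row. The inline `sample_csv` string is a stand-in built from the rows displayed above, not the full file.

```python
import io
import pandas as pd

# Stand-in for diabetes.csv: the header line plus the four rows displayed above.
sample_csv = """Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
6,148,72,35,0,33.6,0.627,50,1
1,85,66,29,0,26.6,0.351,31,0
8,183,64,0,0,23.3,0.672,32,1
1,89,66,23,94,28.1,0.167,21,0
"""

# header=0 (the pandas default) treats the first line as column names,
# so no row of strings ends up in the data.
named = pd.read_csv(io.StringIO(sample_csv), header=0)

print(named.columns.tolist())
print(named.dtypes)
```

Read this way, every column is parsed as numeric directly and can be selected by name, for example `named['Outcome']`.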


In [None]:
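# Row 0 holds the column names as strings (the file was read with header=None),
# so errors='coerce' turns that row into NaN in every column.
numeric_cols = [0,1,2,3,4,5,6,7,8]
dataset[numeric_cols] = dataset[numeric_cols].apply(pd.to_numeric, errors='coerce')
print(dataset.info())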


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 769 entries, 0 to 768
Data columns (total 9 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   0       768 non-null    float64
 1   1       768 non-null    float64
 2   2       768 non-null    float64
 3   3       768 non-null    float64
 4   4       768 non-null    float64
 5   5       768 non-null    float64
 6   6       768 non-null    float64
 7   7       768 non-null    float64
 8   8       768 non-null    float64
dtypes: float64(9)
memory usage: 54.2 KB
None


In [121]:
dataset.isnull().sum()

0    1
1    1
2    1
3    1
4    1
5    1
6    1
7    1
8    1
dtype: int64

In [122]:
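# Fill the NaN values (the former header row, row 0) with each column's median.
# Note: this keeps row 0 as a synthetic sample; dropping that row is an alternative.
dataset.fillna(dataset.median(), inplace=True)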


In [123]:
dataset.isnull().sum()

0    0
1    0
2    0
3    0
4    0
5    0
6    0
7    0
8    0
dtype: int64
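
Median filling keeps the former header row in the data as an extra sample made of median values. A hedged alternative is to drop rows that are entirely NaN. The sketch below uses a small hypothetical frame (`demo`) rather than the real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in: row 0 mimics the coerced header row (all NaN).
demo = pd.DataFrame({0: [np.nan, 6.0, 1.0], 8: [np.nan, 1.0, 0.0]})

# how='all' removes only rows in which every column is missing.
cleaned = demo.dropna(how='all').reset_index(drop=True)

print(cleaned.shape)
print(cleaned.isnull().sum().sum())
```

With this approach the row count matches the number of real samples, so no artificial row influences training.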

### The data is now clean, so we move on to model implementation

In [124]:
values=dataset.values
X=values[:,0:8]   # first eight columns are the features
y=values[:,8]     # last column (Outcome) is the target


In [125]:
y = dataset.iloc[:, 8].values.ravel()

In [None]:
from sklearn.linear_model import LogisticRegression
logReg=LogisticRegression(penalty='l1',max_iter=210 , solver='liblinear', dual=False)


In [266]:
print(y.shape)

(769,)


In [267]:
logReg.fit(X,y)

In [268]:
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=210, multi_class='ovr', n_jobs=1,
          penalty='l1', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)


In [269]:
logReg.score(X,y)

0.7802340702210663

#### Our model reaches about 78 percent accuracy on the training data. That score is measured on the same rows the model was fitted on, so we now use cross-validation to estimate how it performs on unseen data.

In [270]:
from sklearn.model_selection import KFold , cross_val_score

In [271]:
kfold_cross_validator=KFold(n_splits=3,shuffle=True,random_state=42)

In [272]:
results=cross_val_score(logReg,X,y,cv=kfold_cross_validator,scoring='accuracy')
print(results.mean())

0.7776528047989624
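
To make the cross-validation step concrete, here is a minimal sketch of the loop that `cross_val_score` runs with a `KFold` splitter: fit on the training folds, score on the held-out fold, then average. It uses synthetic data from `make_classification` as a stand-in (an assumption, not the diabetes dataset), and `random_state=0` is added to the estimator only to make the run repeatable.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in data (assumption), not the diabetes dataset.
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=0)

model = LogisticRegression(penalty='l1', solver='liblinear', max_iter=210, random_state=0)
splitter = KFold(n_splits=3, shuffle=True, random_state=42)

fold_scores = []
for train_idx, test_idx in splitter.split(X_demo):
    fold_model = clone(model)  # fresh, unfitted copy for every fold
    fold_model.fit(X_demo[train_idx], y_demo[train_idx])
    # Accuracy on the fold that was held out from fitting.
    fold_scores.append(fold_model.score(X_demo[test_idx], y_demo[test_idx]))

manual_mean = float(np.mean(fold_scores))
library_mean = float(
    cross_val_score(model, X_demo, y_demo, cv=splitter, scoring='accuracy').mean()
)
print(manual_mean, library_mean)
```

Because the splitter has a fixed integer seed, both approaches see the same folds and the two means should agree.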


#### Under 3-fold cross-validation the mean accuracy on the held-out folds is about 77.8 percent, slightly below the training accuracy. Next we apply hyperparameter tuning to see whether it can be improved.

In [273]:
from sklearn.model_selection import RandomizedSearchCV
import time
from scipy.stats import loguniform

In [274]:
param_dist = {
    'C': loguniform(0.0001, 10),  # Sample C on a log scale between 1e-4 and 10
    'penalty': ['l1'],
    'solver': ['liblinear'],  # liblinear supports the L1 penalty
    'max_iter': [100, 200, 300, 400, 500],  # Allow more iterations
    'dual': [False]  # dual=True is not supported with the L1 penalty
}
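
A log-uniform distribution spreads samples evenly across orders of magnitude, which suits a regularization strength such as `C`. The sketch below illustrates the idea with plain NumPy by sampling uniformly in log space. It mirrors the concept only and is not SciPy's internal implementation; the names `low`, `high` and `samples` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)
low, high = 1e-4, 10.0

# Draw uniformly in log space, then map back with exp:
# each decade between low and high is (roughly) equally likely.
samples = np.exp(rng.uniform(np.log(low), np.log(high), size=10_000))

# The range 1e-4..10 spans five decades; two of them lie below 1e-2,
# so about 40 percent of the samples are expected there.
share_below_0_01 = float(np.mean(samples < 1e-2))
print(samples.min(), samples.max(), share_below_0_01)
```

A plain uniform distribution over the same range would place almost all samples between 1 and 10 and would rarely try small values of `C`.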

In [None]:
random_search = RandomizedSearchCV(
                                   estimator=logReg, 
                                   param_distributions=param_dist, 
                                   n_iter=20,  
                                   cv=10,  # 10-fold cross-validation
                                   scoring='accuracy',
                                   n_jobs=-1,  # Use all CPU cores
                                   random_state=42)

In [264]:
random_search.fit(X, y)

# Print Best Parameters & Score
print("Best Parameters:", random_search.best_params_)
print("Best Accuracy:", random_search.best_score_)

Best Parameters: {'C': 1.9938938493554557, 'dual': False, 'max_iter': 500, 'penalty': 'l1', 'solver': 'liblinear'}
Best Accuracy: 0.7738038277511962


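#### After tuning with RandomizedSearchCV, the best 10-fold cross-validated accuracy is about 77.4 percent. This is not higher than the 77.8 percent from the untuned 3-fold run, and because the two numbers come from different cross-validation splits they are not directly comparable.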
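
One way to compare the untuned and tuned models on equal terms is to pass the same splitter to both `cross_val_score` and `RandomizedSearchCV`. The following is a minimal sketch under stated assumptions: synthetic data from `make_classification` stands in for the diabetes dataset, and the 5-fold splitter, `n_iter=10` and the reduced search space are illustrative choices, not the settings used above.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, RandomizedSearchCV, cross_val_score

# Synthetic stand-in data (assumption), not the diabetes dataset.
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=0)

base = LogisticRegression(penalty='l1', solver='liblinear', max_iter=210, random_state=0)

# One splitter shared by both evaluations, so the folds are identical.
splitter = KFold(n_splits=5, shuffle=True, random_state=42)

baseline_score = cross_val_score(
    base, X_demo, y_demo, cv=splitter, scoring='accuracy'
).mean()

search = RandomizedSearchCV(
    estimator=base,
    param_distributions={'C': loguniform(1e-4, 10)},
    n_iter=10,
    cv=splitter,
    scoring='accuracy',
    random_state=42,
)
search.fit(X_demo, y_demo)

print(f"baseline: {baseline_score:.3f}, tuned: {search.best_score_:.3f}")
```

With shared folds, a difference between the two scores reflects the hyperparameters rather than the split. The tuned score can still be optimistic, since it is the best of several candidates evaluated on the same folds.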