### LinearRegression: Ridge and Lasso

### House-Price-Prediction


In [822]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

In [823]:
csv_file_path = "boston_housing.csv"
dataset = pd.read_csv(csv_file_path)

In [824]:
dataset.head()


Unnamed: 0,crim,zn,indus,chas,nox,rm,age,dis,rad,tax,ptratio,b,lstat,Price
0,0.00632,18.0,2.31,0,0.538,6.575,65.2,4.09,1,296,15.3,396.9,4.98,24.0
1,0.02731,0.0,7.07,0,0.469,6.421,78.9,4.9671,2,242,17.8,396.9,9.14,21.6
2,0.02729,0.0,7.07,0,0.469,7.185,61.1,4.9671,2,242,17.8,392.83,4.03,34.7
3,0.03237,0.0,2.18,0,0.458,6.998,45.8,6.0622,3,222,18.7,394.63,2.94,33.4
4,0.06905,0.0,2.18,0,0.458,7.147,54.2,6.0622,3,222,18.7,396.9,5.33,36.2


### Note:
- Independent features (X): "crim-zn-indus-chas-nox-rm-age-dis-rad-tax-ptratio-b-lstat"
- Dependent variable (y): "Price"

In [825]:
dataset.columns

Index(['crim', 'zn', 'indus', 'chas', 'nox', 'rm', 'age', 'dis', 'rad', 'tax',
       'ptratio', 'b', 'lstat', 'Price'],
      dtype='object')

In [826]:
dataset.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 506 entries, 0 to 505
Data columns (total 14 columns):
 #   Column   Non-Null Count  Dtype  
---  ------   --------------  -----  
 0   crim     506 non-null    float64
 1   zn       506 non-null    float64
 2   indus    506 non-null    float64
 3   chas     506 non-null    int64  
 4   nox      506 non-null    float64
 5   rm       506 non-null    float64
 6   age      506 non-null    float64
 7   dis      506 non-null    float64
 8   rad      506 non-null    int64  
 9   tax      506 non-null    int64  
 10  ptratio  506 non-null    float64
 11  b        506 non-null    float64
 12  lstat    506 non-null    float64
 13  Price    506 non-null    float64
dtypes: float64(11), int64(3)
memory usage: 55.5 KB


In [827]:
dataset.describe()

Unnamed: 0,crim,zn,indus,chas,nox,rm,age,dis,rad,tax,ptratio,b,lstat,Price
count,506.0,506.0,506.0,506.0,506.0,506.0,506.0,506.0,506.0,506.0,506.0,506.0,506.0,506.0
mean,3.613524,11.363636,11.136779,0.06917,0.554695,6.284634,68.574901,3.795043,9.549407,408.237154,18.455534,356.674032,12.653063,22.532806
std,8.601545,23.322453,6.860353,0.253994,0.115878,0.702617,28.148861,2.10571,8.707259,168.537116,2.164946,91.294864,7.141062,9.197104
min,0.00632,0.0,0.46,0.0,0.385,3.561,2.9,1.1296,1.0,187.0,12.6,0.32,1.73,5.0
25%,0.082045,0.0,5.19,0.0,0.449,5.8855,45.025,2.100175,4.0,279.0,17.4,375.3775,6.95,17.025
50%,0.25651,0.0,9.69,0.0,0.538,6.2085,77.5,3.20745,5.0,330.0,19.05,391.44,11.36,21.2
75%,3.677083,12.5,18.1,0.0,0.624,6.6235,94.075,5.188425,24.0,666.0,20.2,396.225,16.955,25.0
max,88.9762,100.0,27.74,1.0,0.871,8.78,100.0,12.1265,24.0,711.0,22.0,396.9,37.97,50.0


In [828]:
##  Extract the independent features (X) using iloc
##  which selects all rows (:) and all columns except the last one (:-1).

x = dataset.iloc[:,:-1]                                         ## independent_features_x

## Extract the dependent variable (y) using iloc
## which selects all rows (:) and only the last column (-1)

y = dataset.iloc[:, -1]                                         ## dependent_variable_y
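Positional slicing with `iloc` works here because `Price` happens to be the last column. A name-based alternative, sketched below on a toy frame, does not depend on column order. The toy frame is an assumption for illustration: it holds the first two rows shown above and only three of the thirteen feature columns.

```python
import pandas as pd

# Toy frame: first two rows of the dataset, restricted to three features
# plus the target (an illustrative subset, not the full table).
toy = pd.DataFrame({
    "crim": [0.00632, 0.02731],
    "rm": [6.575, 6.421],
    "lstat": [4.98, 9.14],
    "Price": [24.0, 21.6],
})

x_toy = toy.drop(columns="Price")   # every column except the target
y_toy = toy["Price"]                # the target as a Series

print(list(x_toy.columns))
print(y_toy.tolist())
```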

![image-2.png](attachment:image-2.png)

In [829]:
x.head()

Unnamed: 0,crim,zn,indus,chas,nox,rm,age,dis,rad,tax,ptratio,b,lstat
0,0.00632,18.0,2.31,0,0.538,6.575,65.2,4.09,1,296,15.3,396.9,4.98
1,0.02731,0.0,7.07,0,0.469,6.421,78.9,4.9671,2,242,17.8,396.9,9.14
2,0.02729,0.0,7.07,0,0.469,7.185,61.1,4.9671,2,242,17.8,392.83,4.03
3,0.03237,0.0,2.18,0,0.458,6.998,45.8,6.0622,3,222,18.7,394.63,2.94
4,0.06905,0.0,2.18,0,0.458,7.147,54.2,6.0622,3,222,18.7,396.9,5.33


In [830]:
y.head()

0    24.0
1    21.6
2    34.7
3    33.4
4    36.2
Name: Price, dtype: float64

In [831]:
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
lin_reg = LinearRegression()

## Compute the negated mean squared error (MSE) of a linear regression model using cross-validation.
## scikit-learn negates the MSE so that a higher score is always better: values closer to 0 mean lower error.
## cv=5: the dataset is divided into 5 folds for cross-validation.
mse=cross_val_score(lin_reg ,  x ,  y , scoring='neg_mean_squared_error' , cv=5) 
mean_mse=np.mean(mse)
print(mean_mse)

-37.13180746769923
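Because the scorer is `neg_mean_squared_error`, the printed value is the MSE with its sign flipped. A minimal sketch of turning that score back into an MSE and an RMSE (which is in the same units as `Price`), using the number printed above:

```python
import math

# Mean cross-validated score printed above (negated MSE).
mean_neg_mse = -37.13180746769923

mse = -mean_neg_mse        # flip the sign to recover the MSE
rmse = math.sqrt(mse)      # root of the MSE, in the units of Price

print(f"MSE  = {mse:.2f}")
print(f"RMSE = {rmse:.2f}")
```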


### Ridge Regression

- https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Ridge.html

#### Parameters:
- class sklearn.linear_model.Ridge ( ):
    - alpha=1.0
    - fit_intercept=True
    - copy_X=True
    - max_iter=None
    - tol=0.0001
    - solver='auto'
    - positive=False
    - random_state=None

#### Attributes:
- class sklearn.linear_model.Ridge ( ):
    - coef_
    - intercept_
    - n_iter_
    - n_features_in_
    - feature_names_in_  
 
#### Methods :
- class sklearn.linear_model.Ridge ( ):
    - fit(X, y[, sample_weight]) : Fit Ridge regression model.
    - get_params([deep]) : Get parameters for this estimator.
    - predict(X) : Predict using the linear model.
    - score(X, y[, sample_weight]) : Return the coefficient of determination of the prediction.
    - set_params(**params) : Set the parameters of this estimator.
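To see what `alpha` does, it can help to write the Ridge solution by hand. The sketch below assumes no intercept and uses small synthetic data (both are assumptions for illustration); it solves `(X^T X + alpha * I) w = X^T y` with NumPy and shows the coefficient vector shrinking as `alpha` grows. The helper name `ridge_closed_form` is hypothetical, not part of scikit-learn.

```python
import numpy as np

def ridge_closed_form(X, y, alpha):
    """Solve (X^T X + alpha * I) w = X^T y for w (no intercept)."""
    n_features = X.shape[1]
    A = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)

# Synthetic data: 50 samples, 3 features, known coefficients plus noise.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 3))
true_w = np.array([2.0, -1.0, 0.5])
y_demo = X_demo @ true_w + rng.normal(scale=0.1, size=50)

norms = []
for alpha in [0.0, 1.0, 10.0, 100.0]:
    w = ridge_closed_form(X_demo, y_demo, alpha)
    norms.append(np.linalg.norm(w))
    print(f"alpha={alpha:>5}: w={np.round(w, 3)}  ||w||={norms[-1]:.3f}")
```

With `alpha=0` this reduces to ordinary least squares; larger values pull every coefficient toward zero without setting any of them exactly to zero.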

In [832]:
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
ridge = Ridge()

params = {'alpha' :[1e-15,1e-10,1e-8,1e-3,1e-2,1,5,10,20,30,35,40,45,50,55,100]}
ridge_regressor = GridSearchCV(ridge,params,scoring = "neg_mean_squared_error",cv=10)
ridge_regressor.fit(x , y)


In [833]:
ridge_regressor.best_params_

{'alpha': 100}

In [834]:
print(ridge_regressor.best_params_)
print(ridge_regressor.best_score_)

{'alpha': 100}
-29.615220097335133


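### Compare the two results:

- Linear Regression: -37.13180746769923
    - Larger error (further from 0).

- Ridge (alpha=100): -29.615220097335133
    - Regularization reduces over-fitting.
    - Smaller error (closer to 0), so Ridge performs better here.

- Note: the Linear Regression score used cv=5 and the Ridge search used cv=10, so the two numbers come from different folds and are only roughly comparable.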
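The Linear Regression score above was computed with `cv=5` and the Ridge search with `cv=10`, so the two numbers are not measured on the same folds. One way to make such a comparison fair is to pass the same splitter object to every model. The sketch below uses synthetic data from `make_regression` as a stand-in (an assumption: it is not the housing data), and `alpha=1.0` is an arbitrary illustrative value.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in data (not the housing dataset).
X_demo, y_demo = make_regression(
    n_samples=200, n_features=13, noise=10.0, random_state=0
)

# One splitter shared by both models, so each model sees identical folds.
folds = KFold(n_splits=10, shuffle=True, random_state=0)

results = {}
for name, model in [("linear", LinearRegression()), ("ridge", Ridge(alpha=1.0))]:
    scores = cross_val_score(
        model, X_demo, y_demo, scoring="neg_mean_squared_error", cv=folds
    )
    results[name] = scores.mean()
    print(f"{name}: mean neg MSE = {results[name]:.2f}")
```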


### Lasso Regression

- https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Lasso.html

#### Parameters:
- class sklearn.linear_model.Lasso ( ):
    - alpha : float, default=1.0
    - fit_intercept : bool, default=True
    - copy_X : bool, default=True
    - max_iter : int, default=1000
    - tol : float, default=1e-4
    - precompute : bool or array-like of shape (n_features, n_features), default=False
    - positive : bool, default=False
    - selection : {‘cyclic’, ‘random’}, default=’cyclic’
    - random_state : int, RandomState instance, default=None
    - warm_start : bool, default=False

#### Attributes:
- class sklearn.linear_model.Lasso ( ):
    - coef_ : ndarray of shape (n_features,) or (n_targets, n_features)
    - dual_gap_ : float or ndarray of shape (n_targets,) 
    - sparse_coef_ : sparse matrix of shape (n_features, 1) or (n_targets, n_features)
    - intercept_ : float or ndarray of shape (n_targets,)
    - n_iter_ : int or list of int
    - n_features_in_ : int
    - feature_names_in_ : ndarray of shape (n_features_in_,)
 
#### Methods :
- class sklearn.linear_model.Lasso ( ):
    - fit(X, y[, sample_weight]) : Fit model with coordinate descent.
    - get_params([deep]) : Get parameters for this estimator.
    - predict(X) : Predict using the linear model.
    - score(X, y[, sample_weight]) : Return the coefficient of determination of the prediction.
    - set_params(**params) : Set the parameters of this estimator.
    - path(X, y, *[, l1_ratio, eps, n_alphas, ...]) : Compute elastic net path with coordinate descent.
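Lasso is fitted by coordinate descent, and the key step in each coordinate update is soft-thresholding, which is what lets the L1 penalty set coefficients exactly to zero (hence the `sparse_coef_` attribute). The function below is a minimal sketch of that operator alone, not of scikit-learn's full solver; the name `soft_threshold` is hypothetical.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t; return exactly 0.0 when |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

# Values with magnitude above the threshold are shrunk,
# values inside [-t, t] are zeroed out.
for z in [3.0, 0.5, -0.5, -3.0]:
    print(z, "->", soft_threshold(z, 1.0))
```

A larger threshold (which grows with `alpha`) zeroes out more coefficients, which is why Lasso can act as a feature selector while Ridge only shrinks.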

In [835]:
from sklearn.linear_model import Lasso 
from sklearn.model_selection import GridSearchCV
lasso = Lasso()

params = {'alpha' :[1e-15,1e-10,1e-8,1e-3,1e-2,1,5,10,20,30,35,40,45,50,55,100]}   ## For alpha add more data (30,35,40,45,50,55,100)
lasso_regressor = GridSearchCV(lasso,params,scoring = "neg_mean_squared_error",cv=10)
lasso_regressor.fit( x , y )


  model = cd_fast.enet_coordinate_descent(


In [836]:
lasso_regressor.best_params_

{'alpha': 0.01}

In [837]:
print(lasso_regressor.best_params_)
print(lasso_regressor.best_score_)

{'alpha': 0.01}
-34.45554381307912


### train_test_split

- https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html

- sklearn.model_selection.train_test_split (*arrays, test_size=None, train_size=None, random_state=None, shuffle=True, stratify=None)


In [838]:
x.head()

Unnamed: 0,crim,zn,indus,chas,nox,rm,age,dis,rad,tax,ptratio,b,lstat
0,0.00632,18.0,2.31,0,0.538,6.575,65.2,4.09,1,296,15.3,396.9,4.98
1,0.02731,0.0,7.07,0,0.469,6.421,78.9,4.9671,2,242,17.8,396.9,9.14
2,0.02729,0.0,7.07,0,0.469,7.185,61.1,4.9671,2,242,17.8,392.83,4.03
3,0.03237,0.0,2.18,0,0.458,6.998,45.8,6.0622,3,222,18.7,394.63,2.94
4,0.06905,0.0,2.18,0,0.458,7.147,54.2,6.0622,3,222,18.7,396.9,5.33


### Compare the three results:

- Linear Regression (cv=5) ----> -37.13180746769923

- Ridge (cv=10) ---------------> -29.615220097335133

- Lasso (cv=10) ---------------> -34.45554381307912

- The scores are negated MSE, so the value closest to 0 (Ridge) is the best of the three.


### train_test_split:

- https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html

In [839]:

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.33, random_state=42)
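With `test_size=0.33`, the number of test rows is a fraction of the dataset size. The sketch below reproduces the arithmetic in plain Python, assuming the test count is rounded up and the training set takes the remainder. The 188 test rows implied by the support column of the classification report later in this notebook (569 rows in total) are consistent with that assumption. The helper `split_sizes` is hypothetical.

```python
import math

def split_sizes(n_samples, test_size):
    """Return (n_train, n_test) for a fractional test_size."""
    n_test = math.ceil(test_size * n_samples)   # assumed: rounded up
    n_train = n_samples - n_test                # remainder goes to training
    return n_train, n_test

print(split_sizes(506, 0.33))   # housing data: 506 rows
print(split_sizes(569, 0.33))   # breast cancer data: 569 rows
```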

In [840]:
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
lin_reg = LinearRegression()
lin_reg.fit(x_train , y_train )   
mse=cross_val_score(lin_reg , x_train , y_train , scoring='neg_mean_squared_error' , cv=5) 
mean_mse=np.mean(mse)
print(mean_mse)

-25.18787473928514


In [841]:
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
ridge = Ridge()

params = {'alpha' :[1e-15,1e-10,1e-8,1e-3,1e-2,1,5,10,20,30,35,40,45,50,55,100]}
ridge_regressor = GridSearchCV(ridge,params,scoring = "neg_mean_squared_error",cv=10)
ridge_regressor.fit(x_train , y_train)


In [842]:
print(ridge_regressor.best_params_)
print(ridge_regressor.best_score_)

{'alpha': 0.01}
-25.472067363367742


In [843]:
from sklearn.linear_model import Lasso 
from sklearn.model_selection import GridSearchCV
lasso = Lasso()

params = {'alpha' :[1e-15,1e-10,1e-8,1e-3,1e-2,1,5,10,20,30,35,40,45,50,55,100]}   ## For alpha add more data (30,35,40,45,50,55,100)
lasso_regressor = GridSearchCV(lasso,params,scoring = "neg_mean_squared_error",cv=10)
lasso_regressor.fit(x_train , y_train)

  model = cd_fast.enet_coordinate_descent(


In [844]:
print(lasso_regressor.best_params_)
print(lasso_regressor.best_score_)

{'alpha': 1e-08}
-25.473094572833244


### Compare after train_test_split:

- Linear Regression result ----> -25.18787473928514 (best of the three: negated MSE closest to 0)

- Ridge result ----------------> -25.472067363367742

- Lasso result ----------------> -25.473094572833244

- The differences are small. The best alphas found (0.01 for Ridge, 1e-08 for Lasso) apply almost no regularization, so all three models are close to plain least squares.

In [845]:
lasso_regressor.predict(x_test)

array([28.53469457, 36.61870038, 15.63751051, 25.50144953, 18.70967356,
       23.16471553, 17.31011033, 14.0773636 , 23.01064349, 20.5422349 ,
       24.91632311, 18.41098048, -6.52079694, 21.83372577, 19.14903066,
       26.05873213, 20.30232607,  5.74943563, 40.33137805, 17.4579146 ,
       27.47486675, 30.21707564, 10.80555628, 23.8772175 , 17.99492226,
       16.02608761, 23.26828778, 14.36825218, 22.38116931, 19.30920694,
       22.17284558, 25.05925451, 25.13780726, 18.46730239, 16.60405678,
       17.46564111, 30.71367735, 20.05106816, 23.98977653, 24.94322399,
       13.97945361, 31.64706961, 42.48057194, 17.70042803, 26.92507866,
       17.15897728, 13.68918092, 26.14924236, 20.27823036, 29.99003508,
       21.21260346, 34.03649177, 15.41837559, 25.95781066, 39.13897287,
       22.9611842 , 18.8031058 , 33.07865363, 24.74384153, 12.83640948,
       22.41963416, 30.64804998, 31.5956712 , 16.34088222, 20.95043064,
       16.70145827, 20.23215651, 26.1437865 , 31.12160899, 11.89

### sklearn.metrics.r2_score

- https://scikit-learn.org/stable/modules/generated/sklearn.metrics.r2_score.html

- sklearn.metrics.r2_score(y_true, y_pred, *, sample_weight=None, multioutput='uniform_average', force_finite=True)

#### Parameters:
- function sklearn.metrics.r2_score:
    - y_true : array-like of shape (n_samples,) or (n_samples, n_outputs)
    - y_pred : array-like of shape (n_samples,) or (n_samples, n_outputs)
    - sample_weight : array-like of shape (n_samples,), default=None
    - multioutput : {‘raw_values’, ‘uniform_average’, ‘variance_weighted’}, array-like of shape (n_outputs,) or None, default=’uniform_average’
        - raw_values : return one score per output.
        - uniform_average : average the scores of all outputs with uniform weight.
        - variance_weighted : average the scores of all outputs, weighted by the variance of each output.
    - force_finite : bool, default=True

#### Returns:
- function sklearn.metrics.r2_score:
    - z : float or ndarray of floats
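The signature puts `y_true` first and `y_pred` second, and the order matters because R-squared normalizes by the variance of the first argument. The sketch below computes R-squared by hand on four made-up values (an assumption for illustration, not taken from this dataset) and shows that swapping the arguments changes the result. The helper `r2` is hypothetical and only mirrors the single-output formula `1 - SS_res / SS_tot`.

```python
def r2(y_true, y_pred):
    """R-squared for a single output: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Made-up values for illustration.
y_true_demo = [3.0, -0.5, 2.0, 7.0]
y_pred_demo = [2.5, 0.0, 2.0, 8.0]

correct = r2(y_true_demo, y_pred_demo)   # intended argument order
swapped = r2(y_pred_demo, y_true_demo)   # arguments reversed

print(f"r2(y_true, y_pred) = {correct:.4f}")
print(f"r2(y_pred, y_true) = {swapped:.4f}")
```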

### Lasso :

In [846]:
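from sklearn.metrics import r2_score

y_pred = lasso_regressor.predict(x_test)

## r2_score expects (y_true, y_pred); R-squared is not symmetric, so the order matters.
## The score printed below came from a run with the two arguments swapped.
r2_score1 = r2_score(y_test, y_pred)
r2_score_percentage = r2_score1 * 100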

In [847]:
print("R-squared score: {:.2f}%".format(r2_score_percentage))

R-squared score: 67.10%


### Ridge :

In [848]:
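from sklearn.metrics import r2_score

y_pred = ridge_regressor.predict(x_test)

## r2_score expects (y_true, y_pred); R-squared is not symmetric, so the order matters.
## The score printed below came from a run with the two arguments swapped.
r2_score1 = r2_score(y_test, y_pred)
r2_score_percentage = r2_score1 * 100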

In [849]:
print("R-squared score: {:.2f}%".format(r2_score_percentage))

R-squared score: 67.09%


### Linear Regression :

In [850]:
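from sklearn.metrics import r2_score

y_pred = lin_reg.predict(x_test)

## r2_score expects (y_true, y_pred); R-squared is not symmetric, so the order matters.
## The score printed below came from a run with the two arguments swapped.
r2_score1 = r2_score(y_test, y_pred)
r2_score_percentage = r2_score1 * 100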

In [851]:
print("R-squared score: {:.2f}%".format(r2_score_percentage))

R-squared score: 67.10%


### Compare after sklearn.metrics.r2_score:

- Linear Regression result --------------------> 0.6709558976744432

- Ridge result --------------------------------> 0.6708743257533069

- Lasso result --------------------------------> 0.6709558959121945

- All three scores round to about 67.1%, so regularization makes no practical difference on this split.

### Logistic Regression :

- https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
- https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_breast_cancer.html



#### Parameters:
- class sklearn.linear_model.LogisticRegression:
    - penalty : {‘l1’, ‘l2’, ‘elasticnet’, None}, default=’l2’
    - dual : bool, default=False 
    - tol : float, default=1e-4 
    - C : float, default=1.0
    - fit_intercept : bool, default=True
    - intercept_scaling : float, default=1 
    - class_weight : dict or ‘balanced’, default=None
    - random_state : int, RandomState instance, default=None
    - solver : {‘lbfgs’, ‘liblinear’, ‘newton-cg’, ‘newton-cholesky’, ‘sag’, ‘saga’}, default=’lbfgs’ 
    - And ...........

#### Attributes:
- classes_ : ndarray of shape (n_classes, ) 
- coef_ : ndarray of shape (1, n_features) or (n_classes, n_features)
- intercept_ : ndarray of shape (1,) or (n_classes,)
- n_features_in_ : int
- And ..........
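For a binary problem, `coef_` and `intercept_` define a linear score, and the logistic (sigmoid) function turns that score into the probability of the positive class. The sketch below shows that mapping in plain Python. The coefficient values are made up for illustration, and `predict_proba_positive` is a hypothetical helper, not a scikit-learn function.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real score to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba_positive(features, coef, intercept):
    """Probability of class 1 from a linear score (binary case)."""
    z = intercept + sum(w * v for w, v in zip(coef, features))
    return sigmoid(z)

# Made-up coefficients and one made-up sample with two features.
coef_demo = [1.5, -2.0]
intercept_demo = 0.25
p = predict_proba_positive([1.0, 0.5], coef_demo, intercept_demo)

label = 1 if p >= 0.5 else 0   # default decision threshold of 0.5
print(f"P(class 1) = {p:.4f}, predicted label = {label}")
```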


### Notes (load_breast_cancer)
- Data Set Characteristics:
    - Number of Instances: 569

    - Number of Attributes: 30 numeric, predictive attributes and the class

    - Attribute Information:
        - radius (mean of distances from center to points on the perimeter)
        - texture (standard deviation of gray-scale values)
        - perimeter
        - area
        - smoothness (local variation in radius lengths)
        - compactness (perimeter^2 / area - 1.0)
        - concavity (severity of concave portions of the contour)
        - concave points (number of concave portions of the contour)
        - symmetry 
        - fractal dimension ("coastline approximation" - 1)


In [852]:
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression


df = pd.read_csv('breast_cancer.csv')

In [853]:
df.head()

Unnamed: 0,radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean,compactness_mean,concavity_mean,concave points_mean,symmetry_mean,fractal_dimension_mean,...,texture_worst,perimeter_worst,area_worst,smoothness_worst,compactness_worst,concavity_worst,concave points_worst,symmetry_worst,fractal_dimension_worst,target
0,17.99,10.38,122.8,1001.0,0.1184,0.2776,0.3001,0.1471,0.2419,0.07871,...,17.33,184.6,2019.0,0.1622,0.6656,0.7119,0.2654,0.4601,0.1189,1
1,20.57,17.77,132.9,1326.0,0.08474,0.07864,0.0869,0.07017,0.1812,0.05667,...,23.41,158.8,1956.0,0.1238,0.1866,0.2416,0.186,0.275,0.08902,1
2,19.69,21.25,130.0,1203.0,0.1096,0.1599,0.1974,0.1279,0.2069,0.05999,...,25.53,152.5,1709.0,0.1444,0.4245,0.4504,0.243,0.3613,0.08758,1
3,11.42,20.38,77.58,386.1,0.1425,0.2839,0.2414,0.1052,0.2597,0.09744,...,26.5,98.87,567.7,0.2098,0.8663,0.6869,0.2575,0.6638,0.173,1
4,20.29,14.34,135.1,1297.0,0.1003,0.1328,0.198,0.1043,0.1809,0.05883,...,16.67,152.2,1575.0,0.1374,0.205,0.4,0.1625,0.2364,0.07678,1


In [866]:
df.keys()

Index(['radius_mean', 'texture_mean', 'perimeter_mean', 'area_mean',
       'smoothness_mean', 'compactness_mean', 'concavity_mean',
       'concave points_mean', 'symmetry_mean', 'fractal_dimension_mean',
       'radius_se', 'texture_se', 'perimeter_se', 'area_se', 'smoothness_se',
       'compactness_se', 'concavity_se', 'concave points_se', 'symmetry_se',
       'fractal_dimension_se', 'radius_worst', 'texture_worst',
       'perimeter_worst', 'area_worst', 'smoothness_worst',
       'compactness_worst', 'concavity_worst', 'concave points_worst',
       'symmetry_worst', 'fractal_dimension_worst', 'target'],
      dtype='object')

In [893]:
## Check the data types of each column
df.dtypes

radius_mean                float64
texture_mean               float64
perimeter_mean             float64
area_mean                  float64
smoothness_mean            float64
compactness_mean           float64
concavity_mean             float64
concave points_mean        float64
symmetry_mean              float64
fractal_dimension_mean     float64
radius_se                  float64
texture_se                 float64
perimeter_se               float64
area_se                    float64
smoothness_se              float64
compactness_se             float64
concavity_se               float64
concave points_se          float64
symmetry_se                float64
fractal_dimension_se       float64
radius_worst               float64
texture_worst              float64
perimeter_worst            float64
area_worst                 float64
smoothness_worst           float64
compactness_worst          float64
concavity_worst            float64
concave points_worst       float64
symmetry_worst      

In [894]:
df.describe()

Unnamed: 0,radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean,compactness_mean,concavity_mean,concave points_mean,symmetry_mean,fractal_dimension_mean,...,texture_worst,perimeter_worst,area_worst,smoothness_worst,compactness_worst,concavity_worst,concave points_worst,symmetry_worst,fractal_dimension_worst,target
count,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0,...,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0
mean,14.127292,19.289649,91.969033,654.889104,0.09636,0.104341,0.088799,0.048919,0.181162,0.062798,...,25.677223,107.261213,880.583128,0.132369,0.254265,0.272188,0.114606,0.290076,0.083946,0.372583
std,3.524049,4.301036,24.298981,351.914129,0.014064,0.052813,0.07972,0.038803,0.027414,0.00706,...,6.146258,33.602542,569.356993,0.022832,0.157336,0.208624,0.065732,0.061867,0.018061,0.483918
min,6.981,9.71,43.79,143.5,0.05263,0.01938,0.0,0.0,0.106,0.04996,...,12.02,50.41,185.2,0.07117,0.02729,0.0,0.0,0.1565,0.05504,0.0
25%,11.7,16.17,75.17,420.3,0.08637,0.06492,0.02956,0.02031,0.1619,0.0577,...,21.08,84.11,515.3,0.1166,0.1472,0.1145,0.06493,0.2504,0.07146,0.0
50%,13.37,18.84,86.24,551.1,0.09587,0.09263,0.06154,0.0335,0.1792,0.06154,...,25.41,97.66,686.5,0.1313,0.2119,0.2267,0.09993,0.2822,0.08004,0.0
75%,15.78,21.8,104.1,782.7,0.1053,0.1304,0.1307,0.074,0.1957,0.06612,...,29.72,125.4,1084.0,0.146,0.3391,0.3829,0.1614,0.3179,0.09208,1.0
max,28.11,39.28,188.5,2501.0,0.1634,0.3454,0.4268,0.2012,0.304,0.09744,...,49.54,251.2,4254.0,0.2226,1.058,1.252,0.291,0.6638,0.2075,1.0


In [908]:
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score
from sklearn.preprocessing import StandardScaler

# Load the dataset
df = pd.read_csv('breast_cancer.csv')

# Independent features (X): all rows, every column except the last one
x = df.iloc[:, :-1]

# Dependent variable (y): the last column ('target'), kept as a one-column DataFrame
y = df.iloc[:, [-1]]


# Scale the features using StandardScaler
scaler = StandardScaler()
x_scaled = scaler.fit_transform(x)

# Split the scaled data into training and testing sets
x_train, x_test, y_train, y_test = train_test_split(x_scaled, y, test_size=0.33, random_state=42)

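# Define the parameter grid for GridSearchCV
## A list of two dicts defines two separate grids, which are not searched jointly:
##   1) C in {1, 5, 10}, with max_iter left at the estimator's value (100)
##   2) max_iter in {100, 150}, with C left at the estimator's value (100)

params = [{'C': [1, 5, 10]}, {'max_iter': [100, 150]}]

# Create an instance of Logistic Regression (these values are used whenever a grid does not override them)
model_1 = LogisticRegression(C=100, max_iter=100)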

# Perform grid search with cross-validation
model = GridSearchCV(model_1, param_grid=params, scoring='f1', cv=5)

# Fit the model on the training data
model.fit(x_train, y_train.values.ravel())  ## .values.ravel() is used to convert it into a flattened 1D array


# Print the best parameters and best score
print("Best Parameters: ", model.best_params_)
print("Best Score: ", model.best_score_)

# Make predictions on the test data
y_pred = model.predict(x_test)

# Calculate and print the accuracy score
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy: ", accuracy)

# Calculate and print the confusion matrix
confusion_mat = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:")
print(confusion_mat)

# Print the classification report
print("Classification Report:")
print(classification_report(y_test, y_pred))


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(

Best Parameters:  {'C': 1}
Best Score:  0.9717726133515606
Accuracy:  0.9787234042553191
Confusion Matrix:
[[118   3]
 [  1  66]]
Classification Report:
              precision    recall  f1-score   support

           0       0.99      0.98      0.98       121
           1       0.96      0.99      0.97        67

    accuracy                           0.98       188
   macro avg       0.97      0.98      0.98       188
weighted avg       0.98      0.98      0.98       188
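In the cell above, `StandardScaler` is fitted on all rows before the split, so statistics from the test rows leak into the training features. A common way to avoid this is to put the scaler and the classifier in a `Pipeline`, so the scaler is refitted on the training part of every fold. The sketch below makes several assumptions: it loads the data with `sklearn.datasets.load_breast_cancer` instead of `breast_cancer.csv` (the class labels may be encoded differently), fixes `max_iter=1000`, and searches only `C`.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: scikit-learn's bundled copy of the dataset, not the CSV file.
X_bc, y_bc = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_bc, y_bc, test_size=0.33, random_state=42
)

# Scaler and classifier travel together, so scaling is learned
# only from the training portion of each cross-validation fold.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Step parameters are addressed as "<step name>__<parameter>".
grid = GridSearchCV(pipe, param_grid={"clf__C": [1, 5, 10]}, scoring="f1", cv=5)
grid.fit(X_tr, y_tr)

test_accuracy = accuracy_score(y_te, grid.predict(X_te))
print("Best parameters:", grid.best_params_)
print("Test accuracy:", round(test_accuracy, 3))
```

Putting several keys in one `param_grid` dict would search their combinations jointly, unlike a list of separate dicts.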



### Result : 

- This table summarizes the best parameters, best score, accuracy, precision, recall, F1-score, and support for each class based on the best performance achieved.

![image-2.png](attachment:image-2.png)

- "Support" is the number of test samples whose true label is a given class.

- In the classification report above, there are 121 instances of class 0 and 67 instances of class 1 (188 test samples in total). Support shows how the test samples are distributed across the classes, which gives context for the per-class precision, recall, and F1-score.
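The figures in the classification report can be recomputed directly from the confusion matrix printed above, where rows are true classes and columns are predicted classes. A minimal sketch in plain Python:

```python
# Confusion matrix printed above: rows = true class, columns = predicted class.
cm = [[118, 3],
      [1, 66]]

tn, fp = cm[0]   # true class 0: correctly predicted 0, wrongly predicted 1
fn, tp = cm[1]   # true class 1: wrongly predicted 0, correctly predicted 1

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision_1 = tp / (tp + fp)      # of all predicted 1, how many are truly 1
recall_1 = tp / (tp + fn)         # of all true 1, how many were found
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)
support = [sum(row) for row in cm]   # true samples per class

print(f"accuracy    = {accuracy:.4f}")
print(f"precision_1 = {precision_1:.2f}")
print(f"recall_1    = {recall_1:.2f}")
print(f"f1_1        = {f1_1:.2f}")
print(f"support     = {support}")
```

The results match the report: accuracy of about 0.98, and 0.96 / 0.99 / 0.97 for precision, recall, and F1-score of class 1, with supports of 121 and 67.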
