# <center>Heart Disease Prediction with Logistic Regression</center>

<p>This notebook uses logistic regression to predict the presence of heart disease (the <code>AHD</code> column) from medical attributes such as age, chest-pain type, resting blood pressure, and cholesterol. The dataset is preprocessed, and logistic regression models with several values of the regularization parameter <code>C</code> are trained and compared by their accuracy on a training set and a held-out test set.</p>

- Importing necessary libraries and modules


In [14]:
import pandas as pd
from sklearn.model_selection import train_test_split 
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

- Loading the heart disease dataset from a CSV file

In [4]:
data = pd.read_csv('heart.csv')
data

,Unnamed: 0,Age,Sex,ChestPain,RestBP,Chol,Fbs,RestECG,MaxHR,ExAng,Oldpeak,Slope,Ca,Thal,AHD
0,1,63,1,typical,145,233,1,2,150,0,2.3,3,0.0,fixed,No
1,2,67,1,asymptomatic,160,286,0,2,108,1,1.5,2,3.0,normal,Yes
2,3,67,1,asymptomatic,120,229,0,2,129,1,2.6,2,2.0,reversable,Yes
3,4,37,1,nonanginal,130,250,0,0,187,0,3.5,3,0.0,normal,No
4,5,41,0,nontypical,130,204,0,2,172,0,1.4,1,0.0,normal,No
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
298,299,45,1,typical,110,264,0,0,132,0,1.2,2,0.0,reversable,Yes
299,300,68,1,asymptomatic,144,193,1,0,141,0,3.4,2,2.0,reversable,Yes
300,301,57,1,asymptomatic,130,131,0,0,115,1,1.2,2,1.0,reversable,Yes
301,302,57,0,nontypical,130,236,0,2,174,0,0.0,2,1.0,normal,Yes


- Preprocessing the dataset by dropping the redundant index column `Unnamed: 0` and converting the categorical columns (`ChestPain`, `Thal`, `AHD`) to integer codes

In [5]:
data = data.drop(columns="Unnamed: 0")

In [6]:
data['ChestPain'] = data['ChestPain'].astype('category')
data['ChestPain'] = data['ChestPain'].cat.codes

data['Thal'] = data['Thal'].astype('category')
data['Thal'] = data['Thal'].cat.codes

data['AHD'] = data['AHD'].astype('category')
data['AHD'] = data['AHD'].cat.codes
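
`cat.codes` assigns integers by the sorted order of the category labels, and it encodes a missing value as -1 rather than NaN. The sketch below illustrates both points on a small made-up column (`toy` and its values are illustrative, not rows from `heart.csv`). Keeping the code-to-label mapping makes the encoded column easier to interpret later.

```python
import numpy as np
import pandas as pd

# Hypothetical toy column standing in for a categorical feature such as Thal.
toy = pd.DataFrame({"Thal": ["fixed", "normal", "reversable", np.nan, "normal"]})

thal = toy["Thal"].astype("category")

# Categories are stored in sorted order; each code is a position in that list.
code_to_label = dict(enumerate(thal.cat.categories))
codes = thal.cat.codes

print(code_to_label)
print(codes.tolist())

# Missing entries become -1, so isnull() on the encoded column reports nothing.
n_hidden_missing = int((codes == -1).sum())
print(n_hidden_missing, "missing value(s) encoded as -1")
```

If the real columns contain gaps, checking `isnull()` before the conversion, or counting the -1 codes afterwards, would surface them.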

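- Checking for missing values and dropping the rows that contain them (only `Ca` reports any; because `cat.codes` encodes a missing category as -1, this check cannot reveal gaps in `ChestPain`, `Thal`, or `AHD`)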

In [7]:
data.isnull().sum()

Age          0
Sex          0
ChestPain    0
RestBP       0
Chol         0
Fbs          0
RestECG      0
MaxHR        0
ExAng        0
Oldpeak      0
Slope        0
Ca           4
Thal         0
AHD          0
dtype: int64

In [9]:
data = data.dropna()
data.shape

(299, 14)
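
When only some columns should drive the decision to discard a row, `dropna` accepts a `subset` argument. A minimal sketch on made-up data (the `toy` frame is hypothetical) that also records how many rows were removed:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with gaps in one column only.
toy = pd.DataFrame({
    "Ca": [0.0, np.nan, 2.0, 1.0, np.nan],
    "Chol": [233, 286, 229, 250, 204],
})

# Count missing entries per column before deciding what to drop.
missing_per_column = toy.isnull().sum()

# Drop only the rows where Ca is missing and track how many were lost.
cleaned = toy.dropna(subset=["Ca"])
rows_dropped = len(toy) - len(cleaned)

print(missing_per_column.to_dict())
print(cleaned.shape, rows_dropped)
```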

- Splitting the dataset into features (X) and target variable (Y)

In [10]:
X = data.drop(columns="AHD")
Y = data['AHD']

- Splitting the data into training (70%) and testing (30%) sets with a fixed `random_state` so the split is reproducible

In [12]:
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=22)
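
With a small dataset, a purely random split can leave the two classes in slightly different proportions in the training and test sets. One option, not used in this notebook, is the `stratify` argument of `train_test_split`. The sketch below uses synthetic arrays (`X_toy`, `y_toy` are assumptions for illustration) to show that the class ratio is preserved on both sides.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 100 rows, 60 negatives and 40 positives.
rng = np.random.default_rng(22)
X_toy = rng.normal(size=(100, 3))
y_toy = np.array([0] * 60 + [1] * 40)

# stratify=y_toy keeps the 60/40 class ratio in both partitions.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.3, random_state=22, stratify=y_toy
)

print(X_tr.shape, X_te.shape)
print("positive rate (train, test):", y_tr.mean(), y_te.mean())
```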

In [13]:
X_train

,Age,Sex,ChestPain,RestBP,Chol,Fbs,RestECG,MaxHR,ExAng,Oldpeak,Slope,Ca,Thal
151,42,0,0,102,265,0,2,122,0,0.6,2,0.0,1
301,57,0,2,130,236,0,2,174,0,0.0,2,1.0,1
73,65,1,0,110,248,0,2,158,0,0.6,1,2.0,0
30,69,0,3,140,239,0,0,151,0,1.8,1,2.0,1
86,47,1,1,138,257,0,2,156,0,0.0,1,0.0,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...
254,43,1,0,115,303,0,0,181,0,1.2,2,0.0,1
14,52,1,1,172,199,1,0,162,0,0.5,1,0.0,2
146,57,1,0,165,289,1,2,124,0,1.0,2,3.0,2
84,52,1,2,120,325,0,0,172,0,0.2,1,0.0,1


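- Scaling the features using StandardScaler (the scaler is fit on the training set only, and the same training means and standard deviations are then applied to the test set)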


In [16]:
scaler = StandardScaler()

In [18]:
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
X_train_scaled

array([[-1.35768912, -1.42442462, -0.83710367, ...,  0.59989352,
        -0.72333971, -0.48718784],
       [ 0.28000053, -1.42442462,  1.27078388, ...,  0.59989352,
         0.30508206, -0.48718784],
       [ 1.15343501,  0.70203785, -0.83710367, ..., -1.00751348,
         1.33350382, -2.10341417],
       ...,
       [ 0.28000053,  0.70203785, -0.83710367, ...,  0.59989352,
         2.36192559,  1.12903849],
       [-0.26589602,  0.70203785,  1.27078388, ..., -1.00751348,
        -0.72333971, -0.48718784],
       [-2.77702015,  0.70203785,  1.27078388, ..., -1.00751348,
        -0.72333971, -0.48718784]])

In [19]:
X_test_scaled

array([[ 0.71671777, -1.42442462, -0.83710367, ...,  0.59989352,
        -0.72333971,  1.12903849],
       [-0.48425464,  0.70203785,  0.21684011, ...,  0.59989352,
         0.30508206,  1.12903849],
       [-0.37507533,  0.70203785,  0.21684011, ..., -1.00751348,
         0.30508206,  1.12903849],
       ...,
       [-0.70261326,  0.70203785,  0.21684011, ..., -1.00751348,
         1.33350382, -0.48718784],
       [ 0.17082122,  0.70203785,  1.27078388, ...,  2.20730051,
        -0.72333971, -0.48718784],
       [ 0.38917984,  0.70203785,  0.21684011, ...,  0.59989352,
        -0.72333971,  1.12903849]])
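
`StandardScaler` replaces each feature value $x$ with $z = (x - \mu) / \sigma$, where $\mu$ and $\sigma$ are the mean and (population) standard deviation learned during `fit`. The sketch below checks this on tiny made-up arrays (`X_tr` and `X_te` are illustrative, not the notebook's data) and shows that the test rows are scaled with the training statistics.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up training and test rows with two features.
X_tr = np.array([[40.0, 120.0], [50.0, 130.0], [60.0, 140.0]])
X_te = np.array([[50.0, 150.0]])

scaler = StandardScaler()
X_tr_s = scaler.fit_transform(X_tr)  # learns mean/std from the training rows
X_te_s = scaler.transform(X_te)      # reuses those training statistics

# Manual z-score of the test rows using training mean and std (ddof=0).
manual = (X_te - X_tr.mean(axis=0)) / X_tr.std(axis=0)

print(X_tr_s.mean(axis=0), X_tr_s.std(axis=0))
print(X_te_s, manual)
```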

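- Training a logistic regression model on the training data. The cell below fits on the unscaled `X_train`, which is why scikit-learn reports that the solver reached its iteration limit. The later `predict` and `score` calls pass the scaled arrays, so the model is trained and evaluated in different feature spaces; fitting on `X_train_scaled` would make the two consistent.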


In [21]:
logisticRegr = LogisticRegression(random_state=0)
logisticRegr.fit(X_train, Y_train)

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
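
The warning suggests scaling the data or raising `max_iter`. One way to guarantee that the model is fit and scored on identically scaled features is to wrap the scaler and the classifier in a scikit-learn pipeline. This is a minimal sketch on synthetic data from `make_classification` (an assumption standing in for the heart features), so its accuracies say nothing about the notebook's results.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a 13-feature binary classification problem.
X_syn, y_syn = make_classification(
    n_samples=300, n_features=13, n_informative=6, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_syn, y_syn, test_size=0.3, random_state=22
)

# The pipeline scales inside fit/predict/score, so raw arrays are passed
# everywhere and the scaler is only ever fit on the training rows.
model = make_pipeline(StandardScaler(), LogisticRegression(random_state=0))
model.fit(X_tr, y_tr)

train_acc = model.score(X_tr, y_tr)
test_acc = model.score(X_te, y_te)
print(f"train accuracy: {train_acc:.3f}, test accuracy: {test_acc:.3f}")
```

Because scaling happens inside the pipeline, there is no separate scaled array to mix up with the raw one.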


In [22]:
logisticRegr.predict(X_train_scaled)



array([0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1,
       0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0,
       1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1,
       1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0,
       0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0,
       1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1,
       1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,
       1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1,
       1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0,
       1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0], dtype=int8)
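
The 0/1 labels returned by `predict` are the model's estimated probabilities cut at 0.5. When the probabilities themselves are of interest, `predict_proba` exposes them. A small sketch on synthetic data (the arrays are assumptions, not the heart dataset):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary problem used only to illustrate the two methods.
X_syn, y_syn = make_classification(n_samples=200, n_features=5, random_state=0)
clf = LogisticRegression(random_state=0).fit(X_syn, y_syn)

# Column 1 holds the estimated probability of the positive class.
proba_pos = clf.predict_proba(X_syn)[:, 1]
labels = clf.predict(X_syn)

# Thresholding the probability at 0.5 reproduces predict() for a binary model.
manual_labels = (proba_pos > 0.5).astype(int)
print(proba_pos[:5].round(3), labels[:5])
```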

- Evaluating the model's accuracy on the training and testing sets (`score` returns the fraction of correctly classified samples)


In [23]:
logisticRegr.score(X_train_scaled, Y_train)



0.8277511961722488

In [24]:
logisticRegr.score(X_test_scaled, Y_test)



0.8
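
Accuracy alone does not show which kind of mistake the model makes, and for a medical screening task missed cases (false negatives) and false alarms (false positives) usually carry different costs. A confusion matrix separates them. The sketch below uses made-up label vectors (`y_true` and `y_pred` are illustrative, not the notebook's predictions).

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up ground truth and predictions for ten patients.
y_true = np.array([0, 0, 0, 1, 1, 1, 1, 0, 1, 0])
y_pred = np.array([0, 0, 1, 1, 1, 0, 1, 0, 1, 0])

acc = accuracy_score(y_true, y_pred)

# Rows are true classes, columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()

print("accuracy:", acc)
print("TN, FP, FN, TP:", tn, fp, fn, tp)
```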

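- Training and evaluating logistic regression models with different values of `C`. `C` is the inverse of the regularization strength, so smaller values regularize more strongly (the default is `C=1.0`). Each model below is again fit on the unscaled `X_train` and scored on the scaled arrays.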


In [34]:
logisticRegr1 = LogisticRegression(random_state=0,
                                   C=.9,
                                   fit_intercept=True
                                   )
logisticRegr1.fit(X_train, Y_train)

print(logisticRegr1.score(X_train_scaled, Y_train))
print(logisticRegr1.score(X_test_scaled, Y_test))

0.8229665071770335
0.8


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


In [36]:
logisticRegr2 = LogisticRegression(random_state=0,
                                   C=.5,
                                   fit_intercept=True
                                   )
logisticRegr2.fit(X_train, Y_train)
print(logisticRegr2.score(X_train_scaled, Y_train))
print(logisticRegr2.score(X_test_scaled, Y_test))

0.8277511961722488
0.8111111111111111


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


In [38]:
logisticRegr4 = LogisticRegression(random_state=0,
                                   C=.3,
                                   fit_intercept=True
                                   )
logisticRegr4.fit(X_train, Y_train)
print(logisticRegr4.score(X_train_scaled, Y_train))
print(logisticRegr4.score(X_test_scaled, Y_test))

0.8516746411483254
0.8333333333333334


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


In [39]:
logisticRegr5 = LogisticRegression(random_state=0,
                                   C=.2,
                                   fit_intercept=True
                                   )
logisticRegr5.fit(X_train, Y_train)
print(logisticRegr5.score(X_train_scaled, Y_train))
print(logisticRegr5.score(X_test_scaled, Y_test))

0.8325358851674641
0.8222222222222222


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
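
The test set here holds 90 rows (30% of 299, rounded up), so the test accuracies above, which range from 0.8 to about 0.833, differ by at most three test rows. Cross-validation gives a less noisy comparison between `C` values than a single split. The following is a sketch under stated assumptions: it uses synthetic data from `make_classification` in place of the heart features, a scaling pipeline, and 5-fold cross-validation, none of which appear in the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a 13-feature binary classification problem.
X_syn, y_syn = make_classification(
    n_samples=300, n_features=13, n_informative=6, random_state=0
)

# The same C values tried in the cells above.
c_values = [0.9, 0.5, 0.3, 0.2]

mean_accuracy = {}
for c in c_values:
    # Scaling lives inside the pipeline, so each fold is scaled with
    # statistics from its own training portion only.
    model = make_pipeline(StandardScaler(), LogisticRegression(C=c, random_state=0))
    scores = cross_val_score(model, X_syn, y_syn, cv=5)
    mean_accuracy[c] = scores.mean()

best_c = max(mean_accuracy, key=mean_accuracy.get)
for c, acc in mean_accuracy.items():
    print(f"C={c}: mean CV accuracy {acc:.3f}")
print("best C on this synthetic data:", best_c)
```

`GridSearchCV` wraps the same loop and refits the best model, which may be more convenient when several hyperparameters are tuned together.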
