In [96]:
import warnings
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report

warnings.filterwarnings('ignore')

In [97]:
df = pd.read_csv('heart.csv')
df_copy = df.copy()

In [98]:
df.head()

Unnamed: 0,Age,Sex,ChestPainType,RestingBP,Cholesterol,FastingBS,RestingECG,MaxHR,ExerciseAngina,Oldpeak,ST_Slope,HeartDisease
0,40,M,ATA,140,289,0,Normal,172,N,0.0,Up,0
1,49,F,NAP,160,180,0,Normal,156,N,1.0,Flat,1
2,37,M,ATA,130,283,0,ST,98,N,0.0,Up,0
3,48,F,ASY,138,214,0,Normal,108,Y,1.5,Flat,1
4,54,M,NAP,150,195,0,Normal,122,N,0.0,Up,0


### Checking for NA or null values in our dataset

In [99]:
df.isna().sum()

Age               0
Sex               0
ChestPainType     0
RestingBP         0
Cholesterol       0
FastingBS         0
RestingECG        0
MaxHR             0
ExerciseAngina    0
Oldpeak           0
ST_Slope          0
HeartDisease      0
dtype: int64
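
In pandas, `isnull()` is an alias of `isna()`, so a single call gives the per-column count of missing values; adding the two results together would double any non-zero count. A minimal sketch on a made-up frame (`toy` is hypothetical and is not the heart dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with one missing value per column; not the heart dataset.
toy = pd.DataFrame({"Age": [40, np.nan, 37], "Sex": ["M", "F", None]})

na_counts = toy.isna().sum()       # missing values per column
null_counts = toy.isnull().sum()   # identical result: isnull is an alias of isna

print(na_counts)
print(na_counts.equals(null_counts))
```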

### Performing one-hot encoding on the following columns
- ChestPainType
- RestingECG
- ExerciseAngina
- ST_Slope
- Sex (Gender)

In [100]:
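def one_hot_encoding(data_frame, column_name, prefix_name):
    # Build one 0/1 indicator column per category of `column_name`.
    dummies = pd.get_dummies(data_frame[column_name], prefix=prefix_name, dtype=int)
    # Return a new frame without the original column; the caller's frame is left untouched.
    return pd.concat([data_frame.drop(columns=column_name), dummies], axis=1)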

In [101]:
# Maps each categorical column to the prefix used for its indicator columns.
columns_to_perform_one_hot_encoding = {
    'ChestPainType': 'Chest_Pain',
    'RestingECG': 'Resting_ECG',
    'ExerciseAngina': 'Exercise_Angina',
    'ST_Slope': 'ST_Slope',
    'Sex': 'Sex'
}
for column_name, prefix_name in columns_to_perform_one_hot_encoding.items():
    df = one_hot_encoding(df, column_name, prefix_name)
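
To make the effect of the encoding step concrete, here is a small sketch on a made-up three-row frame (`toy` and its values are hypothetical). It uses the `columns`, `prefix`, and `dtype` parameters of `pd.get_dummies` to encode several columns in one call, which is an alternative to the helper above rather than the notebook's own approach:

```python
import pandas as pd

# Hypothetical three-row sample with two categorical columns.
toy = pd.DataFrame({
    "Sex": ["M", "F", "M"],
    "ST_Slope": ["Up", "Flat", "Up"],
    "HeartDisease": [0, 1, 0],
})

# Encode both columns at once; each category becomes a 0/1 indicator column.
encoded = pd.get_dummies(
    toy,
    columns=["Sex", "ST_Slope"],
    prefix={"Sex": "Sex", "ST_Slope": "ST_Slope"},
    dtype=int,
)

print(encoded.columns.tolist())
print(encoded)
```

The original categorical columns are replaced by their indicator columns, and `toy` itself is not modified.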


### Applying standardization to the following columns
- Age
- RestingBP
- Cholesterol
- MaxHR
- Oldpeak

#### These columns are on different scales, so we standardize each one to zero mean and unit variance. Otherwise, the columns with larger values would have a disproportionate influence on the model

In [102]:
numerical_features = ['Age', 'RestingBP', 'Cholesterol', 'MaxHR', 'Oldpeak']
scaler = StandardScaler()
df[numerical_features] = scaler.fit_transform(df[numerical_features])
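
As a sketch of what `StandardScaler` computes, the example below reproduces its output by hand with z = (x - mean) / std, using the population standard deviation. The values are made up. It also illustrates a common variant, not the one used in this notebook: fitting the scaler on the training portion only and reusing those statistics for the test portion, so that no information from the test set enters the preprocessing.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical values standing in for one numeric column, already split.
train = np.array([[120.0], [130.0], [140.0], [150.0]])
test = np.array([[135.0]])

scaler = StandardScaler()
train_scaled = scaler.fit_transform(train)  # learns mean and std from train only
test_scaled = scaler.transform(test)        # reuses the training mean and std

# Same result by hand: z = (x - mean) / std, with the population std (ddof=0).
manual = (train - train.mean(axis=0)) / train.std(axis=0)

print(np.allclose(train_scaled, manual))
print(test_scaled)
```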

### Converting our pandas DataFrame to NumPy arrays to run the machine learning algorithms

In [103]:
y = df['HeartDisease'].values
X = df.drop('HeartDisease', axis=1).values

### Splitting our dataset into train and test sets. The test size will be 20% of the whole data

In [104]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
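
The sketch below shows how an 80/20 split behaves on made-up data (`X_demo` and `y_demo` are hypothetical). It also passes the optional `stratify` argument, which the notebook does not use, to keep the class proportions the same in both parts:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 100 samples, 2 features, 40% class 0 and 60% class 1.
X_demo = np.arange(200).reshape(100, 2)
y_demo = np.array([0] * 40 + [1] * 60)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

print(X_tr.shape, X_te.shape)     # sizes of the two parts
print(y_tr.mean(), y_te.mean())   # share of class 1 in each part
```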

### Building our logistic regression model with the scikit-learn library and training it on the training set

In [105]:
logistic_regression_model = LogisticRegression()
logistic_regression_model.fit(X_train, y_train)

In [106]:
logistic_regression_prediction = logistic_regression_model.predict(X_test)
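
For a binary problem, `predict` returns class 1 exactly when the predicted probability of class 1 exceeds 0.5. A minimal sketch on made-up, linearly separable data (`X_toy` and `y_toy` are hypothetical):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical one-feature data: negative values are class 0, positive are class 1.
X_toy = np.array([[-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

model = LogisticRegression().fit(X_toy, y_toy)

proba = model.predict_proba(X_toy)[:, 1]  # estimated P(class 1) per sample
labels = model.predict(X_toy)             # hard class labels

# The hard labels agree with thresholding the probabilities at 0.5.
print(np.array_equal(labels, (proba > 0.5).astype(int)))
```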

### Finding the accuracy of the model on the test set that we split off earlier (20% of the whole data)

In [107]:
accuracy_of_logistic_regression_model = accuracy_score(y_test, logistic_regression_prediction)
report_of_logistic_regression_model = classification_report(y_test, logistic_regression_prediction)
print("Accuracy:", accuracy_of_logistic_regression_model)
print("Classification Report:\n", report_of_logistic_regression_model)

Accuracy: 0.8532608695652174
Classification Report:
               precision    recall  f1-score   support

           0       0.80      0.87      0.83        77
           1       0.90      0.84      0.87       107

    accuracy                           0.85       184
   macro avg       0.85      0.86      0.85       184
weighted avg       0.86      0.85      0.85       184




- **Accuracy**: The overall accuracy of the model is approximately 85.33%. This indicates the proportion of correct predictions made by the model.

- **Classification Report**:
  - **Precision**: Precision is the ratio of correctly predicted positive observations to the total predicted positives. For class 0, the precision is 80%, meaning that 80% of the instances predicted as class 0 were actually class 0. For class 1, the precision is 90%, indicating that 90% of the instances predicted as class 1 were actually class 1.
  
  - **Recall**: Recall, also known as sensitivity or true positive rate, is the ratio of correctly predicted observations of a class to all observations that actually belong to that class. For class 0, the recall is 87%, indicating that 87% of the actual instances of class 0 were correctly classified. For class 1, the recall is 84%, meaning that 84% of the actual instances of class 1 were correctly classified.
  
  - **F1-score**: The F1-score is the harmonic mean of precision and recall. It provides a balance between precision and recall. For class 0, the F1-score is 83%, and for class 1, the F1-score is 87%.
  
  - **Support**: Support refers to the number of actual occurrences of the class in the specified dataset. For class 0, there are 77 instances, and for class 1, there are 107 instances.
  
- **Macro Average**: The macro average computes each metric independently for each class and then takes the unweighted mean. Here, the macro-average precision and F1-score are approximately 85%, and the macro-average recall is approximately 86%.

- **Weighted Average**: The weighted average also computes each metric per class, but weights each class by its support. Here, the weighted-average precision is approximately 86%, and the weighted-average recall and F1-score are approximately 85%. The weighting accounts for the mild class imbalance (77 vs. 107 instances).
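
The definitions above can be checked by hand. The sketch below uses confusion-matrix counts inferred from the rounded report (67 and 90 correct predictions for classes 0 and 1); these counts are an assumption reconstructed from the supports and recalls, not something the notebook prints.

```python
# Counts inferred from the rounded report (an assumption, not notebook output).
tn, fp = 67, 10   # actual class 0 (77 instances): predicted 0, predicted 1
fn, tp = 17, 90   # actual class 1 (107 instances): predicted 0, predicted 1


def precision_recall_f1(true_pos, false_pos, false_neg):
    """Return precision, recall, and F1 for one class treated as positive."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


p1, r1, f1_1 = precision_recall_f1(tp, fp, fn)  # class 1 as the positive class
p0, r0, f1_0 = precision_recall_f1(tn, fn, fp)  # class 0 as the positive class

total = tn + fp + fn + tp
accuracy = (tp + tn) / total

# Macro average: plain mean over classes. Weighted average: weighted by support.
macro_recall = (r0 + r1) / 2
weighted_precision = ((tn + fp) * p0 + (fn + tp) * p1) / total

print("class 0:", round(p0, 2), round(r0, 2), round(f1_0, 2))
print("class 1:", round(p1, 2), round(r1, 2), round(f1_1, 2))
print("accuracy:", accuracy)
print("macro recall:", round(macro_recall, 2))
print("weighted precision:", round(weighted_precision, 2))
```

With these counts, the rounded values agree with every row of the classification report, including the 86% macro-average recall and weighted-average precision.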
