# Heart Disease Risk Prediction Using Logistic Regression

**Objectives:**
- Perform comprehensive EDA to understand feature relationships
- Implement logistic regression from scratch and using sklearn
- Compare regularization techniques (L1, L2, Elastic Net)
- Deploy model to AWS SageMaker for real-time inference
- Provide actionable clinical insights

## Setup


In [None]:
# Install required libraries (run this once if needed)
%pip install numpy pandas matplotlib

import pandas as pd
import matplotlib.pyplot as plt

## 1. Load and Prepare the Dataset

In [None]:
df = pd.read_csv("Heart_Disease_Prediction.csv")
df.head()

In [None]:
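# Binarize the target column: "Presence" -> 1, "Absence" -> 0
# Note: re-running this cell on already-mapped data would turn every value into NaN
df["Heart Disease"] = df["Heart Disease"].map({"Presence": 1, "Absence": 0})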
print(df)
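
`Series.map` returns `NaN` for any value that is not a key in the dictionary, so a misspelled or unexpected label would silently become a missing value. The following is a minimal sketch of a guarded version of the mapping. The `encode_target` helper and the `toy` frame are hypothetical illustrations, not part of the notebook or the dataset.

```python
import pandas as pd

def encode_target(series: pd.Series) -> pd.Series:
    """Map 'Presence'/'Absence' labels to 1/0 and fail loudly on anything else."""
    mapping = {"Presence": 1, "Absence": 0}
    encoded = series.map(mapping)
    # Any value missing from the mapping (including NaN) shows up as NaN here
    unmapped = series[encoded.isna()].unique()
    if len(unmapped) > 0:
        raise ValueError(f"Unmapped labels: {list(unmapped)}")
    return encoded.astype(int)

# Hypothetical toy data standing in for the real column
toy = pd.DataFrame({"Heart Disease": ["Presence", "Absence", "Absence"]})
toy["Heart Disease"] = encode_target(toy["Heart Disease"])
print(toy["Heart Disease"].tolist())
```

Raising an error is one possible design choice; logging the unexpected labels and dropping those rows would be another, depending on how the data is expected to behave.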


In [None]:
# Data types and missing values
# Normalize column names: replace hyphens with underscores
df.columns = df.columns.str.replace("-", "_")
df.info()
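
`df.info()` reports non-null counts per column, which leaves the number of missing values to be worked out by subtraction. A short sketch of counting them directly with `isna().sum()` is shown here. The `toy` frame and its column names are assumed for illustration and may not match the real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing value in "Age"
toy = pd.DataFrame({
    "Age": [63, np.nan, 41],
    "Cholesterol": [233, 250, 204],
})

# Count missing entries in each column
missing_per_column = toy.isna().sum()
print(missing_per_column)
```

The same call on `df` would give one count per feature, which can help decide whether imputation or row removal is needed before modeling.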

In [None]:
# Statistical summary
df.describe()

In [None]:
# Target variable distribution
plt.figure(figsize=(6,4))
# sort_index() fixes the bar order to 0, 1 so the colors match the classes
df['Heart Disease'].value_counts().sort_index().plot(kind='bar', color=['skyblue', 'salmon'])
plt.title('Distribution of Heart Disease')
plt.xlabel('Heart Disease (0 = Absence, 1 = Presence)')
plt.ylabel('Count')
plt.show()
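
The bar chart shows raw counts, while class proportions are often easier to read when judging how balanced the target is. A minimal sketch using `value_counts(normalize=True)` follows. The `target` series is made-up example data, not values from the dataset.

```python
import pandas as pd

# Hypothetical binarized target: 3 absences (0) and 2 presences (1)
target = pd.Series([1, 0, 0, 1, 0], name="Heart Disease")

# normalize=True returns fractions instead of counts; sort_index orders them 0, 1
proportions = target.value_counts(normalize=True).sort_index()
print(proportions)
```

Applied to `df['Heart Disease']`, this would give the share of each class, which could inform later choices such as stratified splitting.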
