# Heart Attack Risk Prediction using Gradient Boosting
In this notebook, we will predict the risk of heart attacks based on various medical, lifestyle, and demographic factors. We will use scikit-learn's `GradientBoostingClassifier`, an ensemble method that builds decision trees one after another, each tree correcting the errors of the trees before it.

### Steps involved:
1. Load the dataset
2. Data Cleaning and Preprocessing
3. Split the data into training and testing sets
4. Standardize the data
5. Train a Gradient Boosting Classifier
6. Evaluate the model
7. Perform a sample prediction


In [None]:
# Import necessary libraries
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import classification_report


## Step 1: Load the Dataset
We begin by loading the heart attack dataset, which contains various features related to patients' health, lifestyle, and demographic factors.
The dataset includes information such as age, cholesterol levels, blood pressure, exercise hours, and more.


In [None]:
# Load the dataset
data = pd.read_csv('/content/heart_attack_prediction_dataset.csv')
# Preview the first few rows of the dataset
data.head()


## Step 2: Data Cleaning and Preprocessing
Now we will clean and preprocess the data to make it suitable for machine learning.

### Key preprocessing steps:
- Split blood pressure into two features: systolic and diastolic pressure
- Convert categorical variables (e.g., sex, diet) to numerical values
- Apply one-hot encoding for country, continent, and hemisphere features


In [None]:
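# Data cleaning and processing
# Split "systolic/diastolic" strings into two numeric columns;
# values that cannot be parsed become NaN because of errors='coerce'
data[['Systolic BP', 'Diastolic BP']] = data['Blood Pressure'].str.split('/', expand=True)
data['Systolic BP'] = pd.to_numeric(data['Systolic BP'], errors='coerce')
data['Diastolic BP'] = pd.to_numeric(data['Diastolic BP'], errors='coerce')
# Binary encoding: 1 for 'Male' / 'Healthy', 0 for every other value (including missing)
data['Sex'] = (data['Sex'] == 'Male').astype(int)
data['Diet'] = (data['Diet'] == 'Healthy').astype(int)
# One-hot encode the geographic columns; drop_first=True drops one level per column
data = pd.get_dummies(data, columns=['Country', 'Continent', 'Hemisphere'], drop_first=True)
# Drop the raw blood pressure string and the patient identifier, which are not used as features
data_cleaned = data.drop(columns=['Blood Pressure', 'Patient ID'])
# Display cleaned data
data_cleaned.head()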
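
Because `errors='coerce'` silently turns malformed readings into `NaN`, it can be worth counting missing values after the split. The sketch below uses a small made-up DataFrame (the values and the `toy` name are illustrative, not taken from the dataset) to show the same split and a missing-value check.

```python
import pandas as pd

# Toy data with an assumed "systolic/diastolic" string format; values are made up
toy = pd.DataFrame({"Blood Pressure": ["158/88", "165/93", "bad"]})

# Split each string on "/" into two columns (missing parts become None)
bp = toy["Blood Pressure"].str.split("/", expand=True)

# Convert to numbers; anything that cannot be parsed becomes NaN
toy["Systolic BP"] = pd.to_numeric(bp[0], errors="coerce")
toy["Diastolic BP"] = pd.to_numeric(bp[1], errors="coerce")

# Count how many readings failed to parse in each new column
missing_counts = toy[["Systolic BP", "Diastolic BP"]].isna().sum()
print(missing_counts)
```

If the counts are non-zero on the real data, the affected rows would need to be dropped or imputed before training, since many estimators reject `NaN` inputs.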


## Step 3: Splitting the Data
We will now split the data into training and testing sets.
The training set will be used to build the model, while the testing set will be used to evaluate how well the model performs on unseen data.


In [None]:
# Define features (X) and target variable (y)
X = data_cleaned.drop('Heart Attack Risk', axis=1)
y = data_cleaned['Heart Attack Risk']

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
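
If the target classes are imbalanced, a plain random split can leave the training and testing sets with slightly different class proportions. One common option, shown here as a sketch on synthetic data (the arrays and the 35% positive rate are assumptions for illustration only), is to pass `stratify=y` to `train_test_split`.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 1000 rows, 35% positive class (made-up proportions)
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(1000, 4))
y_demo = np.array([1] * 350 + [0] * 650)

# stratify=y_demo keeps the class ratio the same in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42, stratify=y_demo
)

# Share of positive labels in each split
print(y_tr.mean(), y_te.mean())
```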


## Step 4: Standardizing the Data
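Standardization rescales each feature to zero mean and unit variance. Tree-based models such as Gradient Boosting split on feature thresholds and are largely insensitive to feature scale, so this step is not strictly required here; it is harmless, however, and keeps the preprocessing reusable with scale-sensitive models such as logistic regression or k-nearest neighbors.
The scaler is fit on the training set only and then applied to the testing set, so that no test-set statistics leak into training.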


In [None]:
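# Fit the scaler on the training set only, then apply the same transform to both sets
# (every column is scaled, including the 0/1 encoded ones)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)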
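
To make the effect of `StandardScaler` concrete, the following sketch standardizes synthetic data (the feature means and spreads are made up) and prints the per-column mean and standard deviation. The training columns end up with mean 0 and standard deviation 1, while the testing columns are only approximately so, because they reuse the training statistics.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two synthetic features on very different scales (values are illustrative)
rng = np.random.default_rng(0)
X_tr_demo = rng.normal(loc=[50.0, 200.0], scale=[10.0, 40.0], size=(500, 2))
X_te_demo = rng.normal(loc=[50.0, 200.0], scale=[10.0, 40.0], size=(200, 2))

# Learn mean and standard deviation from the training data only
demo_scaler = StandardScaler()
X_tr_s = demo_scaler.fit_transform(X_tr_demo)
# Reuse the training statistics on the testing data
X_te_s = demo_scaler.transform(X_te_demo)

print("train mean/std:", X_tr_s.mean(axis=0).round(3), X_tr_s.std(axis=0).round(3))
print("test  mean/std:", X_te_s.mean(axis=0).round(3), X_te_s.std(axis=0).round(3))
```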


## Step 5: Train the Gradient Boosting Classifier
We will now train a Gradient Boosting Classifier on the training data.
This model builds decision trees sequentially: each new tree is fit to correct the errors of the ensemble so far, and the trees' contributions are summed to produce the final prediction.


In [None]:
# Initialize and train the Gradient Boosting model
gb_model = GradientBoostingClassifier(random_state=42)
gb_model.fit(X_train_scaled, y_train)
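
The sequential nature of boosting can be observed with `staged_predict`, which yields the ensemble's predictions after each added tree. The sketch below runs on synthetic data from `make_classification` (not the heart attack dataset), and the hyperparameters are written out explicitly only for readability.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic binary classification problem used purely for illustration
X_demo, y_demo = make_classification(
    n_samples=600, n_features=10, n_informative=5, random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42
)

demo_model = GradientBoostingClassifier(
    n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42
)
demo_model.fit(X_tr, y_tr)

# Test accuracy after 1, 2, ..., 100 trees
staged_acc = [accuracy_score(y_te, pred) for pred in demo_model.staged_predict(X_te)]
for n_trees in (1, 10, 50, 100):
    print(n_trees, "trees:", round(staged_acc[n_trees - 1], 3))
```

Plotting `staged_acc` against the number of trees is one way to judge whether fewer or more estimators would be worthwhile.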


## Step 6: Evaluate the Model
We will now evaluate the trained model using the testing data.
We will calculate key metrics such as precision, recall, and F1-score to assess how well the model performs in predicting heart attack risk.


In [None]:
# Make predictions and evaluate the model
y_pred = gb_model.predict(X_test_scaled)
print(classification_report(y_test, y_pred))
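
Alongside the classification report, a confusion matrix and ROC AUC are often useful, particularly when the classes are imbalanced. The sketch below computes both on synthetic data (the 65/35 class weights and the `eval_model` name are assumptions for illustration).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, moderately imbalanced data used purely for illustration
X_demo, y_demo = make_classification(
    n_samples=800, n_features=10, n_informative=5,
    weights=[0.65, 0.35], random_state=42,
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42, stratify=y_demo
)

eval_model = GradientBoostingClassifier(random_state=42).fit(X_tr, y_tr)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_te, eval_model.predict(X_te))
# AUC uses the predicted probability of the positive class, not the hard labels
auc = roc_auc_score(y_te, eval_model.predict_proba(X_te)[:, 1])

print(cm)
print("ROC AUC:", round(auc, 3))
```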


## Step 7: Perform a Sample Prediction
Finally, we will use the trained model to make a prediction for a sample patient.
We will provide the model with the features of a test patient and check the predicted risk for a heart attack.


In [None]:
# Perform a sample prediction using the first test example
# Slicing with [:1] keeps the 2D shape (1, n_features) that predict() expects
sample_input = X_test_scaled[:1]
sample_prediction = gb_model.predict(sample_input)
print(f'Predicted Heart Attack Risk (0 = No Risk, 1 = High Risk): {sample_prediction[0]}')
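
A hard 0/1 label hides how confident the model is. As a sketch on synthetic data (the `proba_model` name and the data are illustrative), `predict_proba` returns the estimated probability of each class for a single sample.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic data used purely for illustration
X_demo, y_demo = make_classification(
    n_samples=600, n_features=10, n_informative=5, random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42
)

proba_model = GradientBoostingClassifier(random_state=42).fit(X_tr, y_tr)

# Take the first test row as a 2D array of shape (1, n_features)
sample = X_te[:1]
proba = proba_model.predict_proba(sample)[0]  # one probability per class
label = proba_model.predict(sample)[0]        # hard 0/1 prediction

print("Class probabilities:", dict(zip(proba_model.classes_, proba.round(3))))
print("Predicted label:", label)
```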
