# Objective: Build and evaluate a machine learning model for diagnosing heart disease from medical data, using metrics such as accuracy, precision, and recall.

**Import Libraries**

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, precision_score, recall_score, classification_report

**Load and Inspect the Dataset**

In [2]:
data = pd.read_csv('/kaggle/input/heart-disease-health-indicators/heart_disease_health_indicators_BRFSS2015.csv')

# Display dataset structure
print("First 5 rows of the dataset:")
print(data.head())

First 5 rows of the dataset:
   HeartDiseaseorAttack  HighBP  HighChol  CholCheck   BMI  Smoker  Stroke  \
0                   0.0     1.0       1.0        1.0  40.0     1.0     0.0   
1                   0.0     0.0       0.0        0.0  25.0     1.0     0.0   
2                   0.0     1.0       1.0        1.0  28.0     0.0     0.0   
3                   0.0     1.0       0.0        1.0  27.0     0.0     0.0   
4                   0.0     1.0       1.0        1.0  24.0     0.0     0.0   

   Diabetes  PhysActivity  Fruits  ...  AnyHealthcare  NoDocbcCost  GenHlth  \
0       0.0           0.0     0.0  ...            1.0          0.0      5.0   
1       0.0           1.0     0.0  ...            0.0          1.0      3.0   
2       0.0           0.0     1.0  ...            1.0          1.0      5.0   
3       0.0           1.0     1.0  ...            1.0          0.0      2.0   
4       0.0           1.0     1.0  ...            1.0          0.0      2.0   

   MentHlth  PhysHlth  Diff

**Feature Selection and Target Variable Identification**

In [3]:
# Separate the features (X) from the target column (y)
target_column = "HeartDiseaseorAttack"
X = data.drop(columns=[target_column])
y = data[target_column]

**Standardize the Features**

In [4]:
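# Standardize each feature to zero mean and unit variance.
# Note: the scaler is fitted on all rows before the train/test split, so test-set
# statistics influence the scaling. Tree-based models such as random forests are
# largely insensitive to feature scale, so this step mainly matters for other models.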
scaler = StandardScaler()
X = scaler.fit_transform(X)

**Split the Data into Training and Testing Sets**

In [5]:
# Hold out 30% of the rows for testing; random_state fixes the shuffle for reproducibility
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
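
Two refinements to the scaling and splitting steps may be worth considering. They are sketched here on synthetic data rather than on the BRFSS file. Passing `stratify` keeps the class ratio the same in the training and test portions, which matters when positives are rare. Fitting the scaler on the training rows only keeps test-set statistics out of the preprocessing. The `make_classification` data and every `*_demo` name are assumptions for illustration and are not part of the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data: roughly 10% positives (assumption).
X_demo, y_demo = make_classification(
    n_samples=2000, n_features=10, weights=[0.9, 0.1], random_state=42
)

# stratify=y_demo keeps the positive rate equal in the train and test portions.
X_train_demo, X_test_demo, y_train_demo, y_test_demo = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42, stratify=y_demo
)

# Fit the scaler on the training rows only, then reuse it on the test rows.
scaler_demo = StandardScaler()
X_train_scaled = scaler_demo.fit_transform(X_train_demo)
X_test_scaled = scaler_demo.transform(X_test_demo)

print(f"Positive rate (train): {y_train_demo.mean():.3f}")
print(f"Positive rate (test):  {y_test_demo.mean():.3f}")
```

The two printed rates should be nearly identical. Without `stratify`, they can drift apart by chance.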

**Initialize and Train the Model**

In [6]:
# Train a random forest of 100 trees; random_state makes the run reproducible
model = RandomForestClassifier(random_state=42, n_estimators=100)
model.fit(X_train, y_train)
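
When one class is rare, a random forest trained with default settings tends to favor the majority class. One common option is `class_weight="balanced"`, which weights samples inversely to class frequency. The sketch below compares it with the default on synthetic data. Whether it helps on this dataset is untested here. The synthetic data and the names `default_model` and `weighted_model` are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in data (assumption): roughly 10% positives.
X_demo, y_demo = make_classification(
    n_samples=2000, n_features=10, weights=[0.9, 0.1], random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42, stratify=y_demo
)

default_model = RandomForestClassifier(n_estimators=100, random_state=42)
# class_weight="balanced" reweights samples inversely to class frequency.
weighted_model = RandomForestClassifier(
    n_estimators=100, random_state=42, class_weight="balanced"
)

recalls = {}
for name, clf in [("default", default_model), ("balanced", weighted_model)]:
    clf.fit(X_tr, y_tr)
    # recall_score defaults to the positive class (label 1) for binary targets.
    recalls[name] = recall_score(y_te, clf.predict(X_te))
    print(f"{name:>8}: recall on the positive class = {recalls[name]:.2f}")
```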

**Make Predictions**

In [7]:
# Predict a class label (0.0 or 1.0) for each held-out test row
y_pred = model.predict(X_test)
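
For a binary problem, `predict` amounts to a 0.5 cutoff on the forest's averaged class probabilities. In a screening-style setting it can be useful to inspect the probabilities with `predict_proba` and choose a lower cutoff, trading precision for recall. The following is a minimal sketch on synthetic data. The 0.3 cutoff is arbitrary, and the data and variable names are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in data (assumption): roughly 10% positives.
X_demo, y_demo = make_classification(
    n_samples=2000, n_features=10, weights=[0.9, 0.1], random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42, stratify=y_demo
)

clf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_tr, y_tr)

# Column 1 of predict_proba holds the estimated probability of class 1.
proba_pos = clf.predict_proba(X_te)[:, 1]

results = {}
for threshold in (0.5, 0.3):  # 0.3 is an arbitrary illustrative cutoff
    y_hat = (proba_pos >= threshold).astype(int)
    results[threshold] = (
        int(y_hat.sum()),
        precision_score(y_te, y_hat, zero_division=0),
        recall_score(y_te, y_hat, zero_division=0),
    )
    n_flagged, prec, rec = results[threshold]
    print(f"threshold={threshold}: {n_flagged} flagged, "
          f"precision={prec:.2f}, recall={rec:.2f}")
```

Lowering the cutoff can only flag the same rows or more, so recall never decreases. Precision usually falls in exchange.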

**Evaluate the Model**

In [8]:
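# Evaluate the model.
# average='weighted' averages the per-class scores using class support as weights,
# so the much larger negative class (0.0) dominates these summary numbers.
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, average='weighted', zero_division=0)
recall = recall_score(y_test, y_pred, average='weighted', zero_division=0)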

# Display results
print("\nModel Evaluation:")
print(f"Accuracy: {accuracy:.2f}")
print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
print("\nClassification Report:")
print(classification_report(y_test, y_pred, zero_division=0))


Model Evaluation:
Accuracy: 0.90
Precision: 0.87
Recall: 0.90

Classification Report:
              precision    recall  f1-score   support

         0.0       0.92      0.98      0.95     69007
         1.0       0.44      0.12      0.18      7097

    accuracy                           0.90     76104
   macro avg       0.68      0.55      0.57     76104
weighted avg       0.87      0.90      0.88     76104
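
The headline numbers deserve a closer look because the classes are heavily imbalanced. The short calculation below uses only figures printed in the classification report. It shows that a classifier that always predicts 0.0 would already reach about 0.91 accuracy. It also shows that the weighted recall of 0.90 is driven by the 69,007 negatives, while the report's recall for class 1.0 is only 0.12. The variable names are illustrative.

```python
# Figures copied from the classification report above.
support = {0.0: 69007, 1.0: 7097}
recall_per_class = {0.0: 0.98, 1.0: 0.12}  # rounded to two decimals in the report

total = sum(support.values())

# Accuracy of a trivial model that always predicts the majority class (0.0).
majority_baseline = support[0.0] / total

# Weighted recall = per-class recall averaged with class support as weights.
weighted_recall = sum(recall_per_class[c] * support[c] for c in support) / total

print(f"Total test rows:           {total}")
print(f"Majority-class baseline:   {majority_baseline:.3f}")
print(f"Weighted recall (approx.): {weighted_recall:.2f}")
```

For a diagnostic task, the precision and recall for class 1.0 are the more informative figures, and reporting them alongside the weighted averages makes this gap visible.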

