# Classification Project: Heart Disease Prediction (Cleveland Dataset)

**Author:** Saratachandra Golla     
**Date:** 11/09/2025    
**Project Goal:** Predict the presence of heart disease (Class 1) versus no disease (Class 0) using clinical data from the Cleveland dataset. This project follows a structured approach: data cleaning, feature engineering, exploratory analysis, and comparative model evaluation using Logistic Regression and a Decision Tree Classifier.
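
The goal above is a binary problem, but the raw Cleveland target is graded from 0 (no disease) to 4, so values above 0 have to be collapsed into Class 1 before modeling. The following is a minimal sketch of one common way to do that, not necessarily the exact step used later in this project. The helper name `binarize_target`, the column name `target_binary`, and the toy values are hypothetical and serve only as illustration.

```python
import pandas as pd

# Hypothetical toy frame standing in for the raw data; values are illustrative only.
raw = pd.DataFrame({"target": [0, 2, 1, 0, 3, 4]})


def binarize_target(series: pd.Series) -> pd.Series:
    """Map 0 to 0 (no disease) and any positive grade to 1 (disease)."""
    return (series > 0).astype(int)


# Assumed column name for the binary label; the original column is left untouched.
raw["target_binary"] = binarize_target(raw["target"])
print(raw)
```

Keeping the original graded column alongside the binary one makes it easy to check how many cases of each severity ended up in Class 1.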

## 1. Import and Inspect the Data

In [1]:
# All imports at the top
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import StratifiedShuffleSplit, train_test_split
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.impute import SimpleImputer
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score

# Set plot style
sns.set_style("whitegrid")

### 1.1 Load the dataset and display the first 10 rows

In [12]:

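# Load the local copy of the Cleveland dataset (originally from the UCI archive).
# "?" marks missing values in the file; a forward slash keeps the path portable across operating systems.
df = pd.read_csv("data/heart_disease.data", header=0, na_values="?")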

print("First 10 rows of the Heart Disease dataset:")
print(df.head(10))

First 10 rows of the Heart Disease dataset:
    age   sex   cp   trestbps   chol   fbs   restecg   thalach   exang  \
0  63.0   1.0  1.0      145.0  233.0   1.0       2.0     150.0     0.0   
1  67.0   1.0  4.0      160.0  286.0   0.0       2.0     108.0     1.0   
2  67.0   1.0  4.0      120.0  229.0   0.0       2.0     129.0     1.0   
3  37.0   1.0  3.0      130.0  250.0   0.0       0.0     187.0     0.0   
4  41.0   0.0  2.0      130.0  204.0   0.0       2.0     172.0     0.0   
5  56.0   1.0  2.0      120.0  236.0   0.0       0.0     178.0     0.0   
6  62.0   0.0  4.0      140.0  268.0   0.0       2.0     160.0     0.0   
7  57.0   0.0  4.0      120.0  354.0   0.0       0.0     163.0     1.0   
8  63.0   1.0  4.0      130.0  254.0   0.0       2.0     147.0     0.0   
9  53.0   1.0  4.0      140.0  203.0   1.0       2.0     155.0     1.0   

    oldpeak   slope   ca   thal   target  
0       2.3     3.0  0.0    6.0        0  
1       1.5     2.0  3.0    3.0        2  
2       2.6 