# Cardiovascular Disease Dataset

This dataset describes features such as patient age, chest pain type, and other medical attributes, aiming to predict the presence of cardiovascular disease.

## Getting Started
1. `cd environment`
2. `conda env create -f 7641_project_env.yml`
3. `conda activate 7641_project_env`

If that doesn't work...
1. `!pip3 install -U ucimlrepo`

## Data Cleaning and Exploration

In [1]:
import pandas as pd
import numpy as np
import pprint
import math
import matplotlib.pyplot as plt

In [2]:
# Load the dataset (assuming you have the CSV)
data_url = 'Cardiovascular_Disease_Dataset.csv'
df = pd.read_csv(data_url)
print("Cardiovascular Disease Dataset loaded.")

Cardiovascular Disease Dataset loaded.


In [3]:
# Checking the shape of the dataset
df.shape

(303, 14)


In [4]:
# What does the data look like?
df.head()

Unnamed: 0,patientid,age,gender,chestpain,restingBP,serumcholestrol,fastingbloodsugar,restingelectro,maxheartrate,exerciseangia,oldpeak,slope,noofmajorvessels,target
0,1,63,1,1,145,233,1,2,150,0,2.3,3,0.0,1
1,2,67,1,4,160,286,0,2,108,1,1.5,2,3.0,1


In [5]:
# Check for null values in the dataset
print(df.isnull().any() == False)

age         True
gender      True
chestpain   True
restingBP   True
serumcholestrol True
fastingbloodsugar True
restingelectro True
maxheartrate True
exerciseangia True
oldpeak     True
slope       True
noofmajorvessels True
target      True


## Handling Missing Data
In case there are missing values, we'll handle them by filling missing numeric values with the mean.

In [6]:
# Filling missing values with the mean (for numerical columns)
noofmajorvessels_mean = int(math.ceil(df['noofmajorvessels'].mean()))
thal_mean = int(math.ceil(df['thal'].mean()))
print(f"noofmajorvessels mean: {noofmajorvessels_mean}")
print(f"thal mean: {thal_mean}")

# Fill missing values
df['noofmajorvessels'].fillna(noofmajorvessels_mean, inplace=True)
df['thal'].fillna(thal_mean, inplace=True)

noofmajorvessels mean: 1
thal mean: 2


### Data Export
Export the cleaned dataset into a CSV file.

In [7]:
# Export the cleaned dataset
df.to_csv('cardiovascular_disease_cleaned.csv', index=False)