# Personal Key Indicators of Heart Disease - Exploratory Data Analysis

According to the Public Health Agency of Canada (PHAC), about 1 in 12 Canadians aged 20 and older (~2.6 million people) live with diagnosed heart disease. PHAC also reports that every hour, about 14 Canadian adults aged 20 and older with diagnosed heart disease die. Although these numbers are high, heart disease is often preventable, and many factors can help prevent it. This project aims to identify those factors and how strongly each is associated with heart disease. The goal is to help individuals understand what they can do day to day to avoid developing heart disease, and what they can do to reduce its effects if they already have it. The data comes from the U.S. Centers for Disease Control and Prevention (CDC), so it describes the United States population, but the general ideas about prevention still apply to people in Canada.

Dataset: https://www.kaggle.com/datasets/kamilpytlak/personal-key-indicators-of-heart-disease?resource=download

### Import Necessary Packages

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
```

### Importing Data and Checking for N/A Values

```python
heart_disease_data = pd.read_csv('heart_2020_cleaned.csv')
heart_disease_data.head()
```

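| | HeartDisease | BMI | Smoking | AlcoholDrinking | Stroke | PhysicalHealth | MentalHealth | DiffWalking | Sex | AgeCategory | Race | Diabetic | PhysicalActivity | GenHealth | SleepTime | Asthma | KidneyDisease | SkinCancer |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | No | 16.6 | Yes | No | No | 3.0 | 30.0 | No | Female | 55-59 | White | Yes | Yes | Very good | 5.0 | Yes | No | Yes |
| 1 | No | 20.34 | No | No | Yes | 0.0 | 0.0 | No | Female | 80 or older | White | No | Yes | Very good | 7.0 | No | No | No |
| 2 | No | 26.58 | Yes | No | No | 20.0 | 30.0 | No | Male | 65-69 | White | Yes | Yes | Fair | 8.0 | Yes | No | No |
| 3 | No | 24.21 | No | No | No | 0.0 | 0.0 | No | Female | 75-79 | White | No | No | Good | 6.0 | No | No | Yes |
| 4 | No | 23.71 | No | No | No | 28.0 | 0.0 | Yes | Female | 40-44 | White | No | Yes | Very good | 8.0 | No | No | No |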
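The first rows show that the target, `HeartDisease`, is stored as `Yes`/`No` text, and all five rows shown are `No`. Before comparing factors, it helps to know how rare the positive class is, since a heavily imbalanced target changes how per-group rates should be read. The sketch below is a minimal illustration; the helper name `class_balance` and the toy rows are hypothetical, not part of the original analysis.

```python
import pandas as pd


def class_balance(df: pd.DataFrame, target: str = "HeartDisease") -> pd.Series:
    """Return the share of each value in the target column (the shares sum to 1)."""
    return df[target].value_counts(normalize=True)


# Made-up rows standing in for heart_disease_data (illustration only)
toy = pd.DataFrame({"HeartDisease": ["No", "No", "No", "Yes"]})
shares = class_balance(toy)
print(shares)
```

On the real data, `class_balance(heart_disease_data)` would report the shares for the full dataset; the toy frame only demonstrates the call.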


```python
heart_disease_data.isnull().sum()
```

```
HeartDisease        0
BMI                 0
Smoking             0
AlcoholDrinking     0
Stroke              0
PhysicalHealth      0
MentalHealth        0
DiffWalking         0
Sex                 0
AgeCategory         0
Race                0
Diabetic            0
PhysicalActivity    0
GenHealth           0
SleepTime           0
Asthma              0
KidneyDisease       0
SkinCancer          0
dtype: int64
```
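`isnull()` only counts NaN/None values, so the zeros above rule out true missing values but not placeholder text (such as an empty string) or unexpected category labels in the text columns. One way to check is to list the distinct values of each non-numeric column. This is a minimal sketch; the helper name `categorical_levels` and the toy rows are hypothetical.

```python
import pandas as pd


def categorical_levels(df: pd.DataFrame) -> dict:
    """Map every non-numeric column to its sorted distinct (non-missing) values."""
    text_cols = [c for c in df.columns if not pd.api.types.is_numeric_dtype(df[c])]
    return {c: sorted(df[c].dropna().unique()) for c in text_cols}


# Made-up rows for illustration; the empty string mimics a placeholder isnull() would miss
toy = pd.DataFrame({
    "Smoking": ["Yes", "No", "Yes"],
    "BMI": [16.6, 20.34, 26.58],
    "Diabetic": ["No", "Yes", ""],
})
print(toy.isnull().sum().sum())  # 0: no NaN/None anywhere
print(categorical_levels(toy))   # the "" level for Diabetic shows up here
```

On the real data, `categorical_levels(heart_disease_data)` lists every label in columns such as `AgeCategory`, `Race`, and `GenHealth`, which is also useful when choosing how to encode them.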

```python
print(heart_disease_data.dtypes)
```

```
HeartDisease         object
BMI                 float64
Smoking              object
AlcoholDrinking      object
Stroke               object
PhysicalHealth      float64
MentalHealth        float64
DiffWalking          object
Sex                  object
AgeCategory          object
Race                 object
Diabetic             object
PhysicalActivity     object
GenHealth            object
SleepTime           float64
Asthma               object
KidneyDisease        object
SkinCancer           object
dtype: object
```
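Only `BMI`, `PhysicalHealth`, `MentalHealth`, and `SleepTime` are numeric; most other columns are `Yes`/`No` text, so a correlation matrix would silently skip them. A common step is to map `Yes`/`No` columns to 1/0 first. The sketch below assumes that approach; the helper name `encode_yes_no` and the toy rows are hypothetical. For two 0/1 columns, the Pearson correlation computed by `DataFrame.corr` equals the phi coefficient.

```python
import pandas as pd


def encode_yes_no(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy in which every column containing only 'Yes'/'No' is mapped to 1/0."""
    out = df.copy()
    for col in out.columns:
        values = set(out[col].dropna().unique())
        if values and values <= {"Yes", "No"}:
            out[col] = out[col].map({"Yes": 1, "No": 0})
    return out


# Made-up rows for illustration only
toy = pd.DataFrame({
    "HeartDisease": ["No", "Yes", "No", "Yes"],
    "Smoking": ["No", "Yes", "No", "No"],
    "BMI": [20.0, 30.0, 22.0, 28.0],
    "Sex": ["Female", "Male", "Male", "Female"],
})
encoded = encode_yes_no(toy)

# Correlation of each numeric column with the 0/1 target; Sex stays text and is skipped
corr_with_target = encoded.corr(numeric_only=True)["HeartDisease"]
print(corr_with_target)
```

Columns such as `Sex`, `AgeCategory`, and `GenHealth` are not `Yes`/`No` and would need their own encoding (for example, an ordinal mapping for the ordered categories) before they can enter the same comparison.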


### Questions to be Answered

1. What factor(s) play the largest role in, and are most strongly correlated with, getting heart disease?
2. Are those with other health issues (diabetes, asthma, kidney disease, skin cancer) more likely to get heart disease than those with none? If so, which of these issues is most linked to heart disease?
3. Is there an age range in which heart disease is identified more often than in others?
4. Is there any correlation between physical or mental health and getting heart disease?
5. Do alcohol consumption and smoking have any effect on a person getting heart disease? Are those who use neither substance less likely to get heart disease?
6. Is there any correlation between sleep and getting heart disease? Are those who sleep less still able to avoid heart disease through other factors (physical activity, etc.)?
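Several of these questions (2, 3, and 5 in particular) reduce to comparing the share of people with heart disease across groups. One way to compute that is a grouped mean of a boolean flag. This is a minimal sketch under that assumption; the helper name `heart_disease_rate` and the toy rows are hypothetical.

```python
import pandas as pd


def heart_disease_rate(df: pd.DataFrame, by, target: str = "HeartDisease") -> pd.Series:
    """Fraction of rows with target == 'Yes' in each group of `by` (a column name or a list of names), highest first."""
    flagged = df.assign(_has_hd=df[target].eq("Yes"))
    # The mean of a True/False column is the share of True values in each group
    return flagged.groupby(by)["_has_hd"].mean().sort_values(ascending=False)


# Made-up rows for illustration only
toy = pd.DataFrame({
    "AgeCategory": ["40-44", "40-44", "80 or older", "80 or older", "80 or older"],
    "Smoking": ["Yes", "No", "Yes", "No", "No"],
    "AlcoholDrinking": ["No", "No", "Yes", "No", "No"],
    "HeartDisease": ["No", "No", "Yes", "No", "Yes"],
})
by_age = heart_disease_rate(toy, "AgeCategory")                           # question 3
by_substances = heart_disease_rate(toy, ["Smoking", "AlcoholDrinking"])  # question 5
print(by_age)
print(by_substances)
```

Passing a list groups by every combination, so the `("No", "No")` row of `by_substances` is the rate for people who neither smoke nor drink. For age, calling `.sort_index()` instead of sorting by rate restores age order, since labels like those in the head (40-44, 55-59, 80 or older) sort lexically by their lower bound. Rates from small groups are noisy, so it is worth checking group sizes with `value_counts()` alongside them.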