### Heart Disease Predictors

#### Background:
Heart disease, also known as cardiovascular disease, is a broad term for conditions affecting the heart and circulatory system. It is the leading cause of death worldwide. Because the heart is one of the body’s most essential organs, its disorders can affect other organs and body systems as well. Heart disease takes many forms; the most common involve narrowing or blockage of the coronary arteries, valve dysfunction, or enlargement of the heart, any of which can lead to a heart attack or heart failure. [Source](https://www.who.int/news-room/fact-sheets/detail/cardiovascular-diseases-%28cvds%29?)

#### Dataset Description:
This dataset dates from 1988 and combines data collected from four sources: Cleveland, Hungary, Switzerland, and Long Beach V. It contains 76 attributes in total, including the target variable, but most published studies use a subset of 14 of them. The `target` variable indicates whether a patient has heart disease, encoded as an integer: 0 for no disease and 1 for disease.

#### Objective: 
The objective of this project is to use exploratory analysis to identify predictors of heart disease in the provided dataset.

Data Source: [Kaggle](https://www.google.com/search?q=kaggle%2Finput%2Fheart-disease%2Fheart.csv&rlz=1C1KNTJ_enNG1087NG1088&oq=kaggle%2Finput%2Fheart-disease%2Fheart.csv&gs_lcrp=EgZjaHJvbWUqBggAEEUYOzIGCAAQRRg7MgYIARBFGDrSAQgxMTM0ajBqN6gCCLACAfEFy5I5ff9HO2Y&sourceid=chrome&ie=UTF-8)
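
One common first pass at the objective above is to rank features by how strongly they co-vary with `target`. The sketch below is an assumption about how such a screen might look, not the analysis this notebook performs. The helper name `rank_by_target_correlation` is hypothetical, and Pearson correlation is only a rough screen here, because several columns hold categorical codes rather than quantities.

```python
import pandas as pd


def rank_by_target_correlation(df: pd.DataFrame, target: str = "target") -> pd.Series:
    """Rank numeric columns by absolute Pearson correlation with `target`.

    Hypothetical helper for illustration; it is a screening step only.
    """
    # Keep numeric columns so that corr() is well defined for every column.
    numeric = df.select_dtypes(include="number")
    # Correlation of every column with the target, minus the target itself.
    corr = numeric.corr()[target].drop(target)
    # Sort by strength of association, ignoring the sign.
    return corr.abs().sort_values(ascending=False)
```

Once `heart.csv` has been loaded into `data`, calling `rank_by_target_correlation(data)` would return one score per feature. A high score suggests an association worth plotting, not a causal predictor.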

In [1]:
import pandas as pd
data = pd.read_csv('heart.csv')

In [2]:
data.head()

,age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
0,63,1,3,145,233,1,0,150,0,2.3,0,0,1,1
1,37,1,2,130,250,0,1,187,0,3.5,0,0,2,1
2,41,0,1,130,204,0,0,172,0,1.4,2,0,2,1
3,56,1,1,120,236,0,1,178,0,0.8,2,0,2,1
4,57,0,0,120,354,0,1,163,1,0.6,2,0,2,1


In [3]:
# Display column data types
data.dtypes

age           int64
sex           int64
cp            int64
trestbps      int64
chol          int64
fbs           int64
restecg       int64
thalach       int64
exang         int64
oldpeak     float64
slope         int64
ca            int64
thal          int64
target        int64
dtype: object

In [4]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 303 entries, 0 to 302
Data columns (total 14 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   age       303 non-null    int64  
 1   sex       303 non-null    int64  
 2   cp        303 non-null    int64  
 3   trestbps  303 non-null    int64  
 4   chol      303 non-null    int64  
 5   fbs       303 non-null    int64  
 6   restecg   303 non-null    int64  
 7   thalach   303 non-null    int64  
 8   exang     303 non-null    int64  
 9   oldpeak   303 non-null    float64
 10  slope     303 non-null    int64  
 11  ca        303 non-null    int64  
 12  thal      303 non-null    int64  
 13  target    303 non-null    int64  
dtypes: float64(1), int64(13)
memory usage: 33.3 KB
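
The `info()` output reports 303 rows, 14 columns, and no null values. A minimal sketch of turning those observations into repeatable checks follows. The helper `check_heart_frame` and the constant `EXPECTED_COLUMNS` are hypothetical names introduced for illustration; the column list is copied from the output above.

```python
import pandas as pd

# Column names as listed by data.info() above.
EXPECTED_COLUMNS = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
    "thalach", "exang", "oldpeak", "slope", "ca", "thal", "target",
]


def check_heart_frame(df: pd.DataFrame) -> list:
    """Return a list of problems found; an empty list means every check passed.

    Hypothetical helper, not part of the original notebook.
    """
    problems = []

    # 1. Every expected column should be present.
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        problems.append(f"missing columns: {missing}")

    # 2. No column should contain nulls.
    null_counts = df.isna().sum()
    with_nulls = null_counts[null_counts > 0]
    if not with_nulls.empty:
        problems.append(f"columns with nulls: {list(with_nulls.index)}")

    # 3. The label should be binary (0 = no disease, 1 = disease).
    if "target" in df.columns and not df["target"].dropna().isin([0, 1]).all():
        problems.append("target has values other than 0 and 1")

    return problems
```

Calling `check_heart_frame(data)` right after loading would be expected to return an empty list if the file matches the output shown above.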


### Data Glossary
- age: age in years
- sex: gender
    - 1 = male
    - 0 = female
- cp: chest pain type
    - value 0: typical angina
    - value 1: atypical angina
    - value 2: non-anginal pain
    - value 3: asymptomatic
- trestbps: resting blood pressure (in mm Hg on admission to the hospital)
- chol: serum cholesterol in mg/dl
- fbs: fasting blood sugar > 120 mg/dl
    - 1 = true
    - 0 = false
- restecg: resting electrocardiographic results
    - value 0: normal
    - value 1: having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV)
    - value 2: showing probable or definite left ventricular hypertrophy by Estes' criteria
- thalach: maximum heart rate achieved
- exang: exercise induced angina
    - 1 = yes
    - 0 = no
- oldpeak: ST depression induced by exercise relative to rest
- slope: the slope of the peak exercise ST segment
    - value 0: upsloping
    - value 1: flat
    - value 2: downsloping
- ca: number of major vessels (0-3) colored by fluoroscopy
- thal:
    - 0 = error (in the original dataset, 0 maps to NaN)
    - 1 = fixed defect
    - 2 = normal
    - 3 = reversible defect
- target (the label):
    - 0 = no disease
    - 1 = disease
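
The glossary can also be applied in code, so that tables and plots show labels instead of integer codes. The sketch below is one possible way to do that. `LABELS` and `decode_labels` are hypothetical names, the label strings are transcribed from the glossary, and the wording of the `fbs` labels is an assumption derived from the true/false definition. `thal` code 0 is deliberately left out of the mapping so that it becomes `NaN`, in line with the glossary note that 0 is an error value.

```python
import pandas as pd

# Code-to-label mappings transcribed from the glossary above.
LABELS = {
    "sex": {0: "female", 1: "male"},
    "cp": {0: "typical angina", 1: "atypical angina",
           2: "non-anginal pain", 3: "asymptomatic"},
    "fbs": {0: "<= 120 mg/dl", 1: "> 120 mg/dl"},
    "restecg": {0: "normal", 1: "ST-T wave abnormality",
                2: "left ventricular hypertrophy"},
    "exang": {0: "no", 1: "yes"},
    "slope": {0: "upsloping", 1: "flat", 2: "downsloping"},
    # Code 0 is omitted on purpose: unmapped values become NaN.
    "thal": {1: "fixed defect", 2: "normal", 3: "reversible defect"},
    "target": {0: "no disease", 1: "disease"},
}


def decode_labels(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `df` with coded columns replaced by glossary labels."""
    decoded = df.copy()
    for column, mapping in LABELS.items():
        if column in decoded.columns:
            # Series.map turns any value missing from the mapping into NaN.
            decoded[column] = decoded[column].map(mapping)
    return decoded


# A few columns from the first three rows of the data.head() output.
sample = pd.DataFrame({
    "sex": [1, 1, 0],
    "cp": [3, 2, 1],
    "thal": [1, 2, 2],
    "target": [1, 1, 1],
})
print(decode_labels(sample))
```

Working on a copy keeps the integer codes in `data` available for any numeric analysis, while the decoded frame is used for display.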