In [1]:
%%html
<style>
table {align:left; display:block}
</style>

# Analysing and Predicting Heart Attacks using Clinical Parameters of Patients


## Data Dictionary

The clinical parameters used throughout this notebook are listed below. Each entry states what the variable stands for and, where necessary, what its values mean.

1. `age` - Age (in years)


2. `sex` - Gender
    * `0`: Female
    * `1`: Male


3. `cp` - Type of chest pain
    * `0`: `Typical Angina`: Substernal chest pain that is provoked by physical exertion or emotional stress and relieved by rest, nitroglycerine, or both.
    * `1`: `Atypical Angina`: Chest pain that does not meet all the criteria for typical angina. Angina is a pressure-like or squeezing sensation, typically caused when the heart muscle does not receive enough oxygenated blood.
    * `2`: `Non-Anginal Pain`: Chest pain that resembles cardiac pain but occurs in people who do not have heart disease. It is also known as "non-cardiac chest pain."
    * `3`: `Asymptomatic`: A condition with transient symptoms, few or no symptoms, or symptoms that are not recognised as a heart attack. It is also known as a "silent heart attack."



4. `trestbps` - Resting blood pressure (measured in mm Hg). The table below lists the standard blood pressure categories.


| BLOOD PRESSURE CATEGORY | SYSTOLIC mm Hg | DIASTOLIC mm Hg |
| :- | :-: | :-: |
| NORMAL | LESS THAN 120 | LESS THAN 80 |
| ELEVATED | 120-129 | LESS THAN 80 |
| HIGH BLOOD PRESSURE (HYPERTENSION) STAGE 1 | 130-139 | 80-89 |
| HIGH BLOOD PRESSURE (HYPERTENSION) STAGE 2 | 140 OR HIGHER | 90 OR HIGHER |
| HYPERTENSIVE CRISIS | HIGHER THAN 180 | HIGHER THAN 120 |


5. `chol` - Cholesterol (measured in mg/dl)


6. `fbs` - Whether fasting blood sugar exceeds 120 mg/dl
    * `0`: False
    * `1`: True
    
    
7. `restecg` - Resting electrocardiographic results
    * `0`: Normal
    * `1`: ST-T wave abnormality
    * `2`: Signs of left ventricular hypertrophy


8. `thalach` - The maximum heart rate achieved


9. `exang` - Exercise induced Angina
    * `0`: No
    * `1`: Yes


10. `oldpeak` - ST depression induced by exercise relative to rest


11. `slp` - Slope of the peak exercise ST segment
    * `0`: `Upsloping`: Heart rate improves with exercise
    * `1`: `Flatsloping`: Minimal change, typical of a healthy heart
    * `2`: `Downsloping`: Sign of an unhealthy heart


12. `caa` - Number of major vessels (0-4) coloured by fluoroscopy


13. `thal` - Result of thallium stress test (0-3)


14. `output` - Has had a heart attack or not
    * `0`: No
    * `1`: Yes
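
The blood pressure table under `trestbps` can be expressed as a small lookup. The sketch below is only illustrative: it assumes `trestbps` holds a systolic reading, so the diastolic column of the table is ignored, and the function name `bp_category` is a hypothetical helper rather than part of this notebook.

```python
def bp_category(systolic):
    """Map a systolic reading (mm Hg) to a category from the table above.

    Assumption: only the systolic thresholds are applied, because the
    dataset stores a single resting blood pressure value per patient.
    """
    if systolic > 180:
        return "Hypertensive crisis"
    if systolic >= 140:
        return "Hypertension stage 2"
    if systolic >= 130:
        return "Hypertension stage 1"
    if systolic >= 120:
        return "Elevated"
    return "Normal"


# Illustrative readings only, not taken from the dataset
for reading in (110, 125, 135, 150, 190):
    print(reading, bp_category(reading))
```

Once the data is loaded, the same helper could be applied to the whole column, for example with `patients['trestbps'].apply(bp_category)`.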

# Import Libraries

In [None]:
import sys

# Install xgboost and hvplot into the environment of the running kernel
!{sys.executable} -m pip install xgboost
!{sys.executable} -m pip install hvplot
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import xgboost as xgb

from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Models
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import BernoulliNB, GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

import hvplot.pandas
from IPython.display import Markdown as md

import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)

In [3]:
# Load the dataset (patients.csv) into the 'patients' DataFrame
patients = pd.read_csv('patients.csv')
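
After loading, it can help to confirm that the frame contains the columns described in the data dictionary and to attach readable labels to the coded values. The following is a minimal sketch under stated assumptions: `EXPECTED_COLUMNS`, `check_columns`, `add_labels`, and the `demo` frame are hypothetical names introduced for illustration, and the `demo` rows are made-up stand-ins, not real patient records.

```python
import pandas as pd

# Column names as listed in the data dictionary
EXPECTED_COLUMNS = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
    "thalach", "exang", "oldpeak", "slp", "caa", "thal", "output",
]

# Code-to-label mappings copied from the data dictionary
SEX_LABELS = {0: "Female", 1: "Male"}
CP_LABELS = {
    0: "Typical Angina",
    1: "Atypical Angina",
    2: "Non-Anginal Pain",
    3: "Asymptomatic",
}


def check_columns(df):
    """Return the expected columns that are missing from df."""
    return [col for col in EXPECTED_COLUMNS if col not in df.columns]


def add_labels(df):
    """Return a copy of df with readable label columns for `sex` and `cp`."""
    out = df.copy()
    out["sex_label"] = out["sex"].map(SEX_LABELS)
    out["cp_label"] = out["cp"].map(CP_LABELS)
    return out


# Illustrative stand-in rows (NOT real patient data)
demo = pd.DataFrame({"age": [52, 61], "sex": [1, 0], "cp": [0, 2]})
print(check_columns(demo))
print(add_labels(demo))
```

With the real data, `check_columns(patients)` should return an empty list if the file matches the data dictionary. `add_labels` works on a copy, so the numeric codes needed for modelling stay untouched.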