# Predicting Heart Disease using Machine Learning

This notebook uses various Python machine learning and data science libraries to build a model that predicts whether
a person has heart disease, using their medical records as reference.

The framework we've used:
1. Problem Definition
2. Data
3. Evaluation
4. Features
5. Modelling
6. Experimentation

## 1. Problem Definition

> Given clinical parameters of a patient, can we predict whether or not they have a heart disease?

## 2. Data

The original data is from the UCI Machine Learning Repository: [Heart-Disease-Dataset](https://archive.ics.uci.edu/ml/datasets/Heart+Disease)

## 3. Evaluation

> If our model predicts with an accuracy of 95% and above, we will pursue the project.

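The evaluation goal above can be written down as a small check. This is a minimal sketch, not the notebook's own evaluation code: `accuracy`, `meets_goal` and `ACCURACY_THRESHOLD` are hypothetical names introduced here, and the labels are made up purely for illustration.

```python
# Hypothetical helpers; only the 95% threshold comes from the goal stated above.
ACCURACY_THRESHOLD = 0.95


def accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def meets_goal(y_true, y_pred, threshold=ACCURACY_THRESHOLD):
    """Return True when accuracy reaches the project threshold."""
    return accuracy(y_true, y_pred) >= threshold


# Made-up labels for illustration only (not results from the dataset)
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 0, 1, 1]

print(f"accuracy: {accuracy(y_true, y_pred):.2f}")
print(f"meets goal: {meets_goal(y_true, y_pred)}")
```

With a fitted scikit-learn classifier, `model.score(X_test, y_test)` returns the same accuracy figure, so the comparison against the threshold would look the same.
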
## 4. Features

    1. age - age in years
    2. sex - (1 = male; 0 = female) 
    3. cp - chest pain type 
        * 0: Typical angina: chest pain related to decreased blood supply to the heart
        * 1: Atypical angina: chest pain not related to the heart
        * 2: Non-anginal pain: typically esophageal spasms (not heart related)
        * 3: Asymptomatic: chest pain not showing signs of disease
    4. trestbps - resting blood pressure (in mm Hg on admission to the hospital)
        * anything above 130-140 is typically cause for concern
    5. chol - serum cholesterol in mg/dl
        * serum = LDL + HDL + 0.2 * triglycerides
        * above 200 is cause for concern
    6. fbs - (fasting blood sugar > 120 mg/dl) (1 = true; 0 = false) 
        * > 126 mg/dL signals diabetes
    7. restecg - resting electrocardiographic results
        * 0: Nothing to note
        * 1: ST-T Wave abnormality
            - can range from mild symptoms to severe problems
            - signals an abnormal heartbeat
        * 2: Possible or definite left ventricular hypertrophy
            - enlargement of the heart's main pumping chamber
    8. thalach - maximum heart rate achieved 
    9. exang - exercise induced angina (1 = yes; 0 = no) 
    10. oldpeak - ST depression induced by exercise relative to rest 
        * looks at the stress on the heart during exercise
        * an unhealthy heart will stress more
    11. slope - the slope of the peak exercise ST segment
        * 0: Upsloping: better heart rate with exercise (uncommon)
        * 1: Flatsloping: minimal change (typical healthy heart)
        * 2: Downsloping: signs of an unhealthy heart
    12. ca - number of major vessels (0-3) colored by fluoroscopy
        * colored vessel means the doctor can see the blood passing through
        * the more blood movement the better (no clots)
    13. thal - thallium stress result
        * 1,3: normal
        * 6: fixed defect: used to be a defect but OK now
        * 7: reversible defect: no proper blood movement when exercising
    14. target - have disease or not (1=yes, 0=no) (= the predicted attribute)
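
The data dictionary above can also be encoded as a simple sanity check on a record. The following is a sketch under stated assumptions: `EXPECTED_COLUMNS`, `ALLOWED_VALUES` and `find_problems` are hypothetical names, the sample record is made up, and `thal` is left out of the value check on the assumption that its codes may not match every copy of the file.

```python
# Column names as listed in the data dictionary above
EXPECTED_COLUMNS = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
    "thalach", "exang", "oldpeak", "slope", "ca", "thal", "target",
]

# Allowed codes for the categorical columns, taken from the list above.
# "thal" is deliberately not checked (assumption: verify its codes in your file).
ALLOWED_VALUES = {
    "sex": {0, 1},
    "cp": {0, 1, 2, 3},
    "fbs": {0, 1},
    "restecg": {0, 1, 2},
    "exang": {0, 1},
    "slope": {0, 1, 2},
    "ca": {0, 1, 2, 3},
    "target": {0, 1},
}


def find_problems(record):
    """Return a list of human-readable problems found in one patient record."""
    problems = []
    missing = [col for col in EXPECTED_COLUMNS if col not in record]
    if missing:
        problems.append(f"missing columns: {missing}")
    for column, allowed in ALLOWED_VALUES.items():
        if column in record and record[column] not in allowed:
            problems.append(f"{column}={record[column]!r} not in {sorted(allowed)}")
    return problems


# Made-up record for illustration only
sample = {
    "age": 54, "sex": 1, "cp": 2, "trestbps": 130, "chol": 240, "fbs": 0,
    "restecg": 1, "thalach": 150, "exang": 0, "oldpeak": 1.2, "slope": 1,
    "ca": 0, "thal": 3, "target": 1,
}

print(find_problems(sample))
print(find_problems(dict(sample, cp=7)))  # cp=7 is outside the documented codes
```

With the data loaded into a pandas DataFrame `df`, the same check could be run over `df.to_dict("records")`.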

# Preparing the tools

We are going to use pandas, Matplotlib and NumPy for data analysis and manipulation.

In [8]:
# Regular EDA (exploratory data analysis) and plotting libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# plot inside of notebook
%matplotlib inline 


# Models from sklearn
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier

# Model Evaluation
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.model_selection import RandomizedSearchCV, GridSearchCV
from sklearn.metrics import confusion_matrix, classification_report
from sklearn.metrics import precision_score, f1_score, recall_score
# plot_roc_curve was removed in scikit-learn 1.2; RocCurveDisplay replaces it
from sklearn.metrics import RocCurveDisplay
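
To show how the imported estimators might fit together, here is a minimal sketch that trains each one and compares test accuracy. It runs on synthetic data from `make_classification` (an assumption standing in for the heart disease data, which is not loaded at this point), and `fit_and_score` is a hypothetical helper name.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data: 13 features to mirror the feature count listed earlier
X, y = make_classification(n_samples=300, n_features=13, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# The three estimators imported in the cell above
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(),
    "Random Forest": RandomForestClassifier(random_state=42),
}


def fit_and_score(models, X_train, X_test, y_train, y_test):
    """Fit each model on the training split and return its test accuracy."""
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = model.score(X_test, y_test)
    return scores


scores = fit_and_score(models, X_train, X_test, y_train, y_test)
print(scores)
```

The scores printed here describe the synthetic data only and say nothing about performance on the real dataset.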