reallyinvincible/Heart-Disease-Prediction

Machine Learning Engineering Capstone Proposal

Heart Disease Prediction: Saving lives using Machine Learning

by Sparsh Srivastava

A) Domain Background: Overview

Heart disease is the leading cause of death for people of most racial and ethnic groups in the United States, including African American, American Indian, Alaska Native, Hispanic, and white men. For women from the Pacific Islands and Asian American, American Indian, Alaska Native, and Hispanic women, heart disease is second only to cancer.

  • One person dies every 37 seconds in the United States from cardiovascular disease.
  • About 647,000 Americans die from heart disease each year—that's 1 in every 4 deaths.
  • Heart disease cost the United States about $219 billion each year from 2014 to 2015. This includes the cost of health care services, medicines, and lost productivity due to death.

B) Problem Statements

  • Analyze the Heart Disease UCI dataset completely, both visually and statistically, to obtain critical observations that can be used for inference.
  • Predict whether a person has heart disease based on various biological and physical parameters of the body.
  • Build a model with high accuracy and precision that can predict results with greater confidence.
  • Make these predictions accessible to users and patients anywhere, anytime, so that they can get a complete picture of their health.

C) Datasets and Inputs

1. Collecting Data

The data used for training and testing is the Heart Disease UCI dataset, downloaded from Kaggle. This database contains 76 attributes, but all published experiments refer to using a subset of 14 of them. In particular, the Cleveland database is the only one that has been used by ML researchers to date. The "goal" field refers to the presence of heart disease in the patient.

image

2. Exploratory Data Analysis

It's a clean, easy-to-understand set of data. However, the meaning of some of the column headers is not obvious. Here's what they mean:

  • age: The person's age in years
  • sex: The person's sex (1 = male, 0 = female)
  • cp: The chest pain experienced (Value 1: typical angina, Value 2: atypical angina, Value 3: non-anginal pain, Value 4: asymptomatic)
  • trestbps: The person's resting blood pressure (mm Hg on admission to the hospital)
  • chol: The person's cholesterol measurement in mg/dl
  • fbs: The person's fasting blood sugar (> 120 mg/dl, 1 = true; 0 = false)
  • restecg: Resting electrocardiographic measurement (0 = normal, 1 = having ST-T wave abnormality, 2 = showing probable or definite left ventricular hypertrophy by Estes' criteria)
  • thalach: The person's maximum heart rate achieved
  • exang: Exercise induced angina (1 = yes; 0 = no)
  • oldpeak: ST depression induced by exercise relative to rest ('ST' relates to positions on the ECG plot)
  • slope: the slope of the peak exercise ST segment (Value 1: upsloping, Value 2: flat, Value 3: downsloping)
  • ca: The number of major vessels (0-3)
  • thal: A blood disorder called thalassemia (3 = normal; 6 = fixed defect; 7 = reversible defect)
  • target: Heart disease (0 = no, 1 = yes)
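
As an illustration of the coding scheme above, here is a minimal sketch that turns integer codes into readable labels. The mappings are copied from the list above; the `decode_row` helper and the sample row are hypothetical and are not taken from the dataset or the notebook.

```python
# Code-to-label mappings copied from the column descriptions above.
CODE_LABELS = {
    "sex": {1: "male", 0: "female"},
    "cp": {
        1: "typical angina",
        2: "atypical angina",
        3: "non-anginal pain",
        4: "asymptomatic",
    },
    "exang": {1: "yes", 0: "no"},
    "slope": {1: "upsloping", 2: "flat", 3: "downsloping"},
    "target": {0: "no heart disease", 1: "heart disease"},
}


def decode_row(row):
    """Return a copy of `row` with coded columns replaced by their labels.

    Columns without a mapping, and codes missing from a mapping,
    are passed through unchanged.
    """
    decoded = {}
    for column, value in row.items():
        labels = CODE_LABELS.get(column, {})
        decoded[column] = labels.get(value, value)
    return decoded


# Hypothetical row for illustration only (not a real patient record).
example = {"age": 54, "sex": 1, "cp": 3, "chol": 230, "exang": 0, "target": 1}
print(decode_row(example))
```

Passing unknown codes through unchanged is a deliberate choice here: if a particular copy of the CSV uses a different numbering for a column, the raw value stays visible instead of being silently mislabeled.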

3. Data Visualization

Now let's look at various visual representations of the data to understand more about the relationships between features.

image

image

4. Correlation Matrix

A convenient way to compare the relationships between features is to look at the correlation matrix of those features.

image
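
For readers who want to see what goes into such a matrix, here is a minimal sketch of the Pearson correlation between every pair of columns, written with the standard library only. The `pearson` and `correlation_matrix` helpers and the toy values are assumptions made for illustration; they are not the notebook's code or real rows from the dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric lists.

    Raises ZeroDivisionError if either list is constant.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Map each pair of column names to its Pearson correlation."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Made-up values for illustration only.
toy = {
    "age": [40, 50, 60, 70],
    "thalach": [180, 170, 150, 140],
    "chol": [200, 240, 210, 250],
}
matrix = correlation_matrix(toy)
for name, row in matrix.items():
    print(name, {other: round(value, 2) for other, value in row.items()})
```

In a notebook, pandas' `DataFrame.corr()` computes the same Pearson matrix by default, so a hand-written version like this is mainly useful for understanding the numbers behind the heatmap.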

5. Inputs

Here our model is trained to predict whether a person has heart disease based on the following common features as input:

  • age
  • gender
  • chest pain
  • blood pressure
  • cholesterol level
  • max heart rate
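
The six inputs above can be bundled into one record before they are sent for prediction. The sketch below is one possible shape for that record. The class name `PatientInput`, the field order, and the comma-separated serialization are assumptions for illustration; the order a deployed model expects depends on how it was trained.

```python
from dataclasses import dataclass


@dataclass
class PatientInput:
    """The six model inputs, named after the dataset columns."""

    age: int          # years
    sex: int          # 1 = male, 0 = female
    cp: int           # chest pain type code
    trestbps: float   # resting blood pressure, mm Hg
    chol: float       # cholesterol, mg/dl
    thalach: float    # maximum heart rate achieved

    def __post_init__(self):
        # Basic sanity checks only; the ranges are deliberately loose.
        if self.sex not in (0, 1):
            raise ValueError("sex must be 0 (female) or 1 (male)")
        if self.age <= 0:
            raise ValueError("age must be positive")

    def to_csv(self) -> str:
        """Serialize the features as one comma-separated row (assumed order)."""
        values = (self.age, self.sex, self.cp, self.trestbps, self.chol, self.thalach)
        return ",".join(str(value) for value in values)


# Hypothetical values for illustration only.
print(PatientInput(age=54, sex=1, cp=3, trestbps=130, chol=230, thalach=150).to_csv())
```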

D) Solution Statements

  • To make a Linear Regression Model, since the problem is a binary classification with very little correlation between features.
  • To deploy the trained model on AWS SageMaker and then create an endpoint that can be used to make predictions.
  • To use the deployed endpoint, an AWS Lambda function, and Amazon API Gateway to create a publicly accessible API where predictions can be made from the input parameters.
  • To use the API to create an Android App that helps users and patients check their heart's health anytime and anywhere.

E) Benchmarks

The model will be benchmarked on a test dataset. The predicted labels will be compared to the original labels to find false positives and false negatives. The number of false positives and false negatives will tell us about the performance of the model.
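
The comparison described above can be sketched as a simple count over paired labels. The `confusion_counts` helper and the label lists are hypothetical; they show the bookkeeping only and do not contain results from the model.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for binary labels (1 = disease)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for truth, guess in zip(actual, predicted):
        if truth == 1 and guess == 1:
            counts["tp"] += 1   # disease correctly predicted
        elif truth == 0 and guess == 0:
            counts["tn"] += 1   # healthy correctly predicted
        elif truth == 0 and guess == 1:
            counts["fp"] += 1   # false positive: healthy flagged as diseased
        else:
            counts["fn"] += 1   # false negative: disease missed
    return counts


# Made-up labels for illustration only.
actual_labels = [1, 0, 1, 1, 0, 0]
predicted_labels = [1, 1, 0, 1, 0, 0]
print(confusion_counts(actual_labels, predicted_labels))
```

In a screening setting, a false negative (a missed case of heart disease) is usually the costlier mistake, which is why the two error types are counted separately rather than merged into one error rate.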

F) Evaluation Metrics

The model will be evaluated using metrics such as:

  • Accuracy: the fraction of all predictions that match the true labels, which can be calculated using the following formula

image

  • Precision: the fraction of positive predictions that are actually positive, which can be calculated using the following formula

image

  • Recall: the percentage of all relevant (actually positive) cases that the model correctly classifies, which can be calculated using the formula

image
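
For reference, the standard classification definitions of these three metrics can be written in terms of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN): accuracy is (TP + TN) / (TP + TN + FP + FN), precision is TP / (TP + FP), and recall is TP / (TP + FN). The sketch below applies them; the function name and the counts are made up for illustration and are not results from this project.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, and recall from confusion-matrix counts.

    A metric whose denominator is zero is reported as 0.0.
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Made-up counts for illustration only.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```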

G) Project Design

image

The diagram above gives an overview of how the various services will work together. On the far right is the model which we trained above and which is deployed using SageMaker. On the far left is the Android App that collects a user's information, sends it off and expects a prediction in return.

In the middle we will construct a Lambda function, which is a straightforward Python function that can be executed whenever a specified event occurs. We will give this function permission to send and receive data from a SageMaker endpoint.
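
A minimal sketch of such a Lambda function follows. It assumes the request arrives through an API Gateway proxy integration (so the raw request body is in `event["body"]`), that the endpoint accepts a `text/csv` row, and that the endpoint is named `heart-disease-endpoint`; that name and the optional `runtime` parameter (added so the function can be exercised without AWS access) are hypothetical and not taken from the project.

```python
ENDPOINT_NAME = "heart-disease-endpoint"  # hypothetical endpoint name


def lambda_handler(event, context, runtime=None):
    """Forward the request body to the SageMaker endpoint and return its reply."""
    if runtime is None:
        # boto3 is provided by the AWS Lambda Python runtime.
        import boto3

        runtime = boto3.client("sagemaker-runtime")

    # Assumed: the body is one comma-separated row of feature values.
    payload = event["body"]

    response = runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="text/csv",
        Body=payload.encode("utf-8"),
    )
    prediction = response["Body"].read().decode("utf-8")

    # Response shape expected by an API Gateway proxy integration.
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "text/plain"},
        "body": prediction,
    }
```

For the call to succeed, the function's execution role needs permission for the `sagemaker:InvokeEndpoint` action on the endpoint.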

Lastly, the method we will use to execute the Lambda function is a new endpoint that we will create using API Gateway. This endpoint will be a URL that listens for data to be sent to it. Once it gets some data, it will pass that data on to the Lambda function and then return whatever the Lambda function returns. Essentially, it will act as an interface that lets our Android App communicate with the Lambda function.
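
From the client's side, calling the API is a single HTTP POST. The sketch below uses Python's standard library as a stand-in for the app's networking code. The URL is a placeholder, and the `text/csv` body format and both function names are assumptions for illustration; the real values depend on how the API Gateway stage is configured.

```python
import urllib.request

# Placeholder URL; API Gateway generates the real one on deployment.
API_URL = "https://example.execute-api.us-east-1.amazonaws.com/prod/predict"


def build_prediction_request(csv_row, url=API_URL):
    """Build (but do not send) a POST request carrying one row of features."""
    return urllib.request.Request(
        url,
        data=csv_row.encode("utf-8"),
        headers={"Content-Type": "text/csv"},
        method="POST",
    )


def request_prediction(csv_row, url=API_URL, timeout=10):
    """Send the row to the API and return the response body as text."""
    request = build_prediction_request(csv_row, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Separating request construction from sending keeps the payload format easy to inspect without touching the network.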

H) Instructions

Everything has been explained and summarized in the Python Notebook.
I have made an Android App, Salveo, that goes with this model.
