# Using Logistic Regression for Classifying Heart Disease

## 1. Introduction

This is a guided project from Dataquest's course "Logistic Regression Modeling in Python".

The aim is to implement a logistic regression model on a sanitized version of the real-world [Heart Disease dataset](https://archive.ics.uci.edu/dataset/45/heart+disease) from the UC Irvine Machine Learning Repository. The data was donated by the Cleveland Clinic Foundation, which recorded patient characteristics such as age and chest pain type. The model uses these characteristics to classify the presence of heart disease in an individual.

The dataset contains these attributes:
1. **age**: age in years.
2. **sex**: gender (1 = male; 0 = female).
3. **cp**: chest pain type:
    - Value 1: typical angina
    - Value 2: atypical angina
    - Value 3: non-anginal pain
    - Value 4: asymptomatic
4. **trestbps**: resting blood pressure (in mm Hg on admission to the hospital).
5. **chol**: serum cholesterol in mg/dl.
6. **fbs**: fasting blood sugar > 120 mg/dl (1 = true; 0 = false).
7. **restecg**: resting electrocardiographic results.
8. **thalach**: maximum heart rate achieved.
9. **exang**: exercise-induced angina (1 = yes; 0 = no).
10. **oldpeak**: ST depression induced by exercise relative to rest.
11. **slope**: the slope of the peak exercise ST segment:
    - Value 1: upsloping
    - Value 2: flat
    - Value 3: downsloping
12. **ca**: number of major vessels (0-3) colored by fluoroscopy.
13. **thal**: 3 = normal; 6 = fixed defect; 7 = reversible defect.
14. **present** (the predicted attribute): diagnosis of heart disease:
    - Value 0: not present
    - Value 1: present
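
Several attributes are stored as integer codes, so it can help to translate them back into labels while exploring. The following is a minimal sketch, not part of the original project: the dictionary names and the three hand-typed sample rows are assumptions made for illustration, while the code-to-label pairs are copied from the list above.

```python
import pandas as pd

# Code-to-label mappings copied from the attribute list above.
# The dictionary names are hypothetical, chosen for this sketch.
SEX_LABELS = {1: "male", 0: "female"}
CP_LABELS = {
    1: "typical angina",
    2: "atypical angina",
    3: "non-anginal pain",
    4: "asymptomatic",
}
SLOPE_LABELS = {1: "upsloping", 2: "flat", 3: "downsloping"}

# Three hand-typed sample rows (illustrative values only).
sample = pd.DataFrame({"sex": [1, 1, 0], "cp": [1, 4, 2], "slope": [3, 2, 1]})

# Series.map replaces each code with its label; codes missing
# from a dictionary would become NaN.
decoded = sample.assign(
    sex=sample["sex"].map(SEX_LABELS),
    cp=sample["cp"].map(CP_LABELS),
    slope=sample["slope"].map(SLOPE_LABELS),
)
print(decoded)
```

Decoded labels are only a reading aid here; a model would still need numeric inputs.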

In [2]:
import pandas as pd

# Read the data into a dataframe
heart = pd.read_csv('heart_disease.csv')

## 2. Exploring the Dataset

In [3]:
# Display the first five rows of the dataframe
heart.head()

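| Unnamed: 0.1 | Unnamed: 0 | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | present |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 63 | 1 | 1 | 145 | 233 | 1 | 2 | 150 | 0 | 2.3 | 3 | 0.0 | 6.0 | 0 |
| 1 | 2 | 67 | 1 | 4 | 160 | 286 | 0 | 2 | 108 | 1 | 1.5 | 2 | 3.0 | 3.0 | 1 |
| 2 | 3 | 67 | 1 | 4 | 120 | 229 | 0 | 2 | 129 | 1 | 2.6 | 2 | 2.0 | 7.0 | 1 |
| 3 | 4 | 37 | 1 | 3 | 130 | 250 | 0 | 0 | 187 | 0 | 3.5 | 3 | 0.0 | 3.0 | 0 |
| 4 | 5 | 41 | 0 | 2 | 130 | 204 | 0 | 2 | 172 | 0 | 1.4 | 1 | 0.0 | 3.0 | 0 |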
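
Beyond the first rows, a quick check of column types, missing values, and class balance is a common next step. The sketch below is an illustration rather than the project's own code: it re-types the five preview rows as CSV text so it runs without `heart_disease.csv`, and the variable names are assumptions. In the preview, `ca` and `thal` display as floats although they hold whole-number codes; that may hint at missing values elsewhere in the full file, which is worth verifying rather than assuming.

```python
import io
import pandas as pd

# The five preview rows, re-typed as CSV text so this sketch is
# self-contained (the real project reads heart_disease.csv instead).
preview_csv = """age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,present
63,1,1,145,233,1,2,150,0,2.3,3,0.0,6.0,0
67,1,4,160,286,0,2,108,1,1.5,2,3.0,3.0,1
67,1,4,120,229,0,2,129,1,2.6,2,2.0,7.0,1
37,1,3,130,250,0,0,187,0,3.5,3,0.0,3.0,0
41,0,2,130,204,0,2,172,0,1.4,1,0.0,3.0,0
"""
preview = pd.read_csv(io.StringIO(preview_csv))

# Column types as inferred by pandas
print(preview.dtypes)

# Number of missing values in each column
missing_per_column = preview.isna().sum()
print(missing_per_column)

# How many rows fall into each class of the target column
class_counts = preview["present"].value_counts()
print(class_counts)
```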


In [4]:
# Drop the leftover 'Unnamed: 0' column, which only repeats the row numbering
heart = heart.drop('Unnamed: 0', axis=1)
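
The same cleanup can also be done defensively or at load time. The sketch below is one possible approach under stated assumptions: `csv_text` is a hypothetical two-row stand-in for `heart_disease.csv`, and the names `heart_a` and `heart_b` are illustrative. It uses `errors="ignore"` so the drop does not fail when the cell is re-run, and a `usecols` callable to skip any column whose name starts with `Unnamed`.

```python
import io
import pandas as pd

# Hypothetical two-row stand-in for heart_disease.csv, including
# the leftover 'Unnamed: 0' column.
csv_text = """Unnamed: 0,age,sex,present
1,63,1,0
2,67,1,1
"""

# Option 1: load everything, then drop the column.
# errors="ignore" makes the call safe to repeat if the column is already gone.
heart_a = pd.read_csv(io.StringIO(csv_text))
heart_a = heart_a.drop(columns="Unnamed: 0", errors="ignore")
heart_a = heart_a.drop(columns="Unnamed: 0", errors="ignore")  # no error on re-run

# Option 2: skip such columns while reading.
# The callable receives each column name and returns True to keep it.
heart_b = pd.read_csv(
    io.StringIO(csv_text),
    usecols=lambda name: not name.startswith("Unnamed"),
)

print(heart_a.columns.tolist())
print(heart_b.columns.tolist())
```

Either option leaves only the patient attributes; which one fits better depends on whether the extra column is ever needed for inspection.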