You are provided with the heart_tidy dataset. Explore it carefully by hand before beginning your analysis.

# Explanation of Dataset Fields:

Age: Age of the patient

Sex: Gender of the patient (1 = male, 0 = female)

ChestPainType: Type of chest pain experienced by the patient (1 = typical angina, 2 = atypical angina, 3 = non-anginal pain, 4 = asymptomatic)

RestingBP: Resting blood pressure (in mm Hg) of the patient

CholesterolLevel: Serum cholesterol level (in mg/dL) of the patient

FastingBloodSugar: Fasting blood sugar > 120 mg/dL (1 = true, 0 = false)

RestingElectrocardiographicResult: Resting electrocardiographic results (0 = normal, 1 = having ST-T wave abnormality, 2 = showing probable or definite left ventricular hypertrophy)

MaxHeartRate: Maximum heart rate achieved during exercise

ExerciseAngina: Exercise-induced angina (1 = yes, 0 = no)

STDepression: ST depression induced by exercise relative to rest

STSegmentSlope: Slope of the peak exercise ST segment (1 = upsloping, 2 = flat, 3 = downsloping)

NumMajorVessels: Number of major vessels colored by fluoroscopy (0-3)

ThalliumStressRest: Thallium stress test result (3 = normal, 6 = fixed defect, 7 = reversible defect)

HeartDiseasePresent: Presence of heart disease (1 = yes, 0 = no)
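The coded fields above are easier to read during manual exploration once the integers are turned back into labels. The following is a minimal sketch in plain Python; the names `CODE_LABELS` and `decode_record` are hypothetical helpers introduced here for illustration, and the label text is transcribed from the field descriptions above.

```python
# Code-to-label lookup tables, transcribed from the field descriptions.
CODE_LABELS = {
    "Sex": {0: "female", 1: "male"},
    "ChestPainType": {
        1: "typical angina",
        2: "atypical angina",
        3: "non-anginal pain",
        4: "asymptomatic",
    },
    "FastingBloodSugar": {0: "<= 120 mg/dL", 1: "> 120 mg/dL"},
    "RestingElectrocardiographicResult": {
        0: "normal",
        1: "ST-T wave abnormality",
        2: "probable or definite left ventricular hypertrophy",
    },
    "ExerciseAngina": {0: "no", 1: "yes"},
    "STSegmentSlope": {1: "upsloping", 2: "flat", 3: "downsloping"},
    "ThalliumStressRest": {3: "normal", 6: "fixed defect", 7: "reversible defect"},
    "HeartDiseasePresent": {0: "no", 1: "yes"},
}


def decode_record(record):
    """Return a copy of `record` with coded fields replaced by their labels.

    Numeric fields (Age, RestingBP, ...) and unknown codes pass through unchanged.
    """
    decoded = {}
    for field, value in record.items():
        labels = CODE_LABELS.get(field)
        decoded[field] = labels.get(value, value) if labels else value
    return decoded


# First row of the dataset as shown by df.head(10).
first_row = {
    "Age": 63, "Sex": 1, "ChestPainType": 1, "RestingBP": 145,
    "CholesterolLevel": 233, "FastingBloodSugar": 1,
    "RestingElectrocardiographicResult": 2, "MaxHeartRate": 150,
    "ExerciseAngina": 0, "STDepression": 2.3, "STSegmentSlope": 3,
    "NumMajorVessels": 0, "ThalliumStressRest": 6, "HeartDiseasePresent": 0,
}

for field, value in decode_record(first_row).items():
    print(f"{field}: {value}")
```

The lookup is only meant for reading rows; the model itself is still trained on the integer codes.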

# Identification of Target and Predictor Variables:

Target Variable: HeartDiseasePresent (presence of heart disease)

Predictor Variables: All other variables listed above (Age, Sex, ChestPainType, RestingBP, CholesterolLevel, FastingBloodSugar, RestingElectrocardiographicResult, MaxHeartRate, ExerciseAngina, STDepression, STSegmentSlope, NumMajorVessels, ThalliumStressRest)

# Definition of Machine Learning Problem:

This is a binary classification problem: predict whether a patient has heart disease (HeartDiseasePresent) from the medical attributes provided in the dataset.

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

In [2]:
df = pd.read_csv("heart_tidy.csv")
df.head(10)

Unnamed: 0,Age,Sex,ChestPainType,RestingBP,CholesterolLevel,FastingBloodSugar,RestingElectrocardiographicResult,MaxHeartRate,ExerciseAngina,STDepression,STSegmentSlope,NumMajorVessels,ThalliumStressRest,HeartDiseasePresent,Unnamed: 14,Unnamed: 15,Unnamed: 16,Unnamed: 17,Unnamed: 18
0,63,1,1,145,233,1,2,150,0,2.3,3,0,6,0,,,,,
1,67,1,4,160,286,0,2,108,1,1.5,2,3,3,1,,,,,
2,67,1,4,120,229,0,2,129,1,2.6,2,2,7,1,,,,,
3,37,1,3,130,250,0,0,187,0,3.5,3,0,3,0,,,,,
4,41,0,2,130,204,0,2,172,0,1.4,1,0,3,0,,,,,
5,56,1,2,120,236,0,0,178,0,0.8,1,0,3,0,,,,,
6,62,0,4,140,268,0,2,160,0,3.6,3,2,3,1,,,,,
7,57,0,4,120,354,0,0,163,1,0.6,1,0,3,0,,,,,
8,63,1,4,130,254,0,2,147,0,1.4,2,1,7,1,,,,,
9,53,1,4,140,203,1,2,155,1,3.1,3,0,7,1,,,,,


In [5]:
df = df.drop(['Unnamed: 14','Unnamed: 15','Unnamed: 16','Unnamed: 17','Unnamed: 18'], axis=1)
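pandas gives header-less CSV columns names of the form `Unnamed: <position>`, which is where the five empty columns dropped above come from. Instead of typing each name, the list can be derived from the column names. This is a sketch in plain Python; the `columns` list is copied from the `df.head(10)` output (standing in for `list(df.columns)`), and `placeholder_columns` is a hypothetical variable name.

```python
# Column names as shown in the first df.head(10) output
# (the leading index column of the display is not a real column).
columns = [
    "Age", "Sex", "ChestPainType", "RestingBP", "CholesterolLevel",
    "FastingBloodSugar", "RestingElectrocardiographicResult", "MaxHeartRate",
    "ExerciseAngina", "STDepression", "STSegmentSlope", "NumMajorVessels",
    "ThalliumStressRest", "HeartDiseasePresent",
    "Unnamed: 14", "Unnamed: 15", "Unnamed: 16", "Unnamed: 17", "Unnamed: 18",
]

# Names that pandas generated for header-less columns.
placeholder_columns = [name for name in columns if name.startswith("Unnamed:")]

# Names that remain after the drop.
kept_columns = [name for name in columns if name not in placeholder_columns]

print(placeholder_columns)
print(len(kept_columns))
```

With the real DataFrame, `df.drop(columns=placeholder_columns)` would be the equivalent call. The prefix match assumes no genuine column is named `Unnamed: ...`, so it is worth printing the list before dropping.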

In [6]:
df.head(10)

Unnamed: 0,Age,Sex,ChestPainType,RestingBP,CholesterolLevel,FastingBloodSugar,RestingElectrocardiographicResult,MaxHeartRate,ExerciseAngina,STDepression,STSegmentSlope,NumMajorVessels,ThalliumStressRest,HeartDiseasePresent
0,63,1,1,145,233,1,2,150,0,2.3,3,0,6,0
1,67,1,4,160,286,0,2,108,1,1.5,2,3,3,1
2,67,1,4,120,229,0,2,129,1,2.6,2,2,7,1
3,37,1,3,130,250,0,0,187,0,3.5,3,0,3,0
4,41,0,2,130,204,0,2,172,0,1.4,1,0,3,0
5,56,1,2,120,236,0,0,178,0,0.8,1,0,3,0
6,62,0,4,140,268,0,2,160,0,3.6,3,2,3,1
7,57,0,4,120,354,0,0,163,1,0.6,1,0,3,0
8,63,1,4,130,254,0,2,147,0,1.4,2,1,7,1
9,53,1,4,140,203,1,2,155,1,3.1,3,0,7,1


In [9]:
df.shape

(300, 14)

In [10]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 300 entries, 0 to 299
Data columns (total 14 columns):
 #   Column                             Non-Null Count  Dtype  
---  ------                             --------------  -----  
 0   Age                                300 non-null    int64  
 1   Sex                                300 non-null    int64  
 2   ChestPainType                      300 non-null    int64  
 3   RestingBP                          300 non-null    int64  
 4   CholesterolLevel                   300 non-null    int64  
 5   FastingBloodSugar                  300 non-null    int64  
 6   RestingElectrocardiographicResult  300 non-null    int64  
 7   MaxHeartRate                       300 non-null    int64  
 8   ExerciseAngina                     300 non-null    int64  
 9   STDepression                       300 non-null    float64
 10  STSegmentSlope                     300 non-null    int64  
 11  NumMajorVessels                    300 non-null    int64  

In [11]:
df.describe()

Unnamed: 0,Age,Sex,ChestPainType,RestingBP,CholesterolLevel,FastingBloodSugar,RestingElectrocardiographicResult,MaxHeartRate,ExerciseAngina,STDepression,STSegmentSlope,NumMajorVessels,ThalliumStressRest,HeartDiseasePresent
count,300.0,300.0,300.0,300.0,300.0,300.0,300.0,300.0,300.0,300.0,300.0,300.0,300.0,300.0
mean,54.48,0.68,3.153333,131.626667,246.93,0.146667,0.986667,149.683333,0.326667,1.049667,1.603333,0.67,4.726667,0.46
std,9.078049,0.467256,0.965884,17.687759,51.91798,0.354364,0.994881,22.87489,0.469778,1.162471,0.61692,0.936674,1.938508,0.49923
min,29.0,0.0,1.0,94.0,126.0,0.0,0.0,71.0,0.0,0.0,1.0,0.0,3.0,0.0
25%,48.0,0.0,3.0,120.0,211.0,0.0,0.0,133.75,0.0,0.0,1.0,0.0,3.0,0.0
50%,56.0,1.0,3.0,130.0,241.5,0.0,0.5,153.0,0.0,0.8,2.0,0.0,3.0,0.0
75%,61.0,1.0,4.0,140.0,275.25,0.0,2.0,166.0,1.0,1.6,2.0,1.0,7.0,1.0
max,77.0,1.0,4.0,200.0,564.0,1.0,2.0,202.0,1.0,6.2,3.0,3.0,7.0,1.0
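Because HeartDiseasePresent is coded 0/1, its mean in the summary above is the share of patients with heart disease. The short calculation below turns that into class counts and into the accuracy of a trivial model that always predicts the majority class. It is a sketch that uses only the two figures from the `describe()` output (300 rows, mean 0.46); the variable names are illustrative.

```python
n_rows = 300          # count row of describe()
positive_rate = 0.46  # mean of HeartDiseasePresent

# Class counts implied by the mean of a 0/1 column.
n_positive = round(n_rows * positive_rate)
n_negative = n_rows - n_positive

# Accuracy of always predicting the more frequent class on the full dataset.
majority_baseline = max(n_positive, n_negative) / n_rows

print(n_positive, n_negative, majority_baseline)  # 138 162 0.54
```

The classes are fairly balanced, and 0.54 is the reference point against which any accuracy on this dataset can be judged.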


In [12]:
df.isnull().sum()

Age                                  0
Sex                                  0
ChestPainType                        0
RestingBP                            0
CholesterolLevel                     0
FastingBloodSugar                    0
RestingElectrocardiographicResult    0
MaxHeartRate                         0
ExerciseAngina                       0
STDepression                         0
STSegmentSlope                       0
NumMajorVessels                      0
ThalliumStressRest                   0
HeartDiseasePresent                  0
dtype: int64

In [13]:
df.drop_duplicates()  # returns a new DataFrame; df itself is unchanged unless the result is assigned back

Unnamed: 0,Age,Sex,ChestPainType,RestingBP,CholesterolLevel,FastingBloodSugar,RestingElectrocardiographicResult,MaxHeartRate,ExerciseAngina,STDepression,STSegmentSlope,NumMajorVessels,ThalliumStressRest,HeartDiseasePresent
0,63,1,1,145,233,1,2,150,0,2.3,3,0,6,0
1,67,1,4,160,286,0,2,108,1,1.5,2,3,3,1
2,67,1,4,120,229,0,2,129,1,2.6,2,2,7,1
3,37,1,3,130,250,0,0,187,0,3.5,3,0,3,0
4,41,0,2,130,204,0,2,172,0,1.4,1,0,3,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
294,57,0,4,140,241,0,0,123,1,0.2,2,0,7,1
295,45,1,1,110,264,0,0,132,0,1.2,2,0,7,1
296,68,1,4,144,193,1,0,141,0,3.4,2,2,7,1
297,57,1,4,130,131,0,0,115,1,1.2,2,1,7,1


In [14]:
df.shape

(300, 14)

In [8]:
# Splitting the data into predictors (X) and target variable (y)
X = df.drop('HeartDiseasePresent', axis=1)
y = df['HeartDiseasePresent']

# Splitting the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Creating a Random Forest Classifier
rf_classifier = RandomForestClassifier(n_estimators=100, random_state=42)

# Training the Random Forest Classifier
rf_classifier.fit(X_train, y_train)

# Making predictions on the test set
y_pred = rf_classifier.predict(X_test)
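The `test_size=0.2` argument fixes how many of the 300 rows are held out for testing. The arithmetic is sketched below in plain Python, assuming the test share is rounded up and the remainder goes to training, which is how scikit-learn is understood to treat a fractional `test_size`; the variable names are illustrative.

```python
import math

n_samples = 300   # rows in df, from df.shape
test_size = 0.2   # fraction passed to train_test_split

# Assumption: the test count is the fraction of the rows, rounded up,
# and every remaining row is used for training.
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print(n_train, n_test)  # 240 60
```

With only 60 test rows, each single prediction moves the accuracy by about 1.7 percentage points, so small differences between runs with different `random_state` values are expected.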





# Model Evaluation - Accuracy Score

In [17]:
# Evaluating the accuracy of the model
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

Accuracy: 0.8333333333333334


On the 60-patient test set, our model correctly predicts the presence or absence of heart disease 83.3% of the time (50 of 60 patients). This is not a bad figure, and therefore we can proceed to deploying/using our model.
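Accuracy treats a missed case of heart disease and a false alarm as the same kind of error. The sketch below shows how the same predictions can be broken down into the four outcome counts, plus recall and precision, in plain Python. The functions `confusion_counts` and `summarize` are hypothetical helpers, and the `toy_true` / `toy_pred` labels are made up for illustration; they are not the notebook's test set.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels coded 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # disease, flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # healthy, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # disease, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # healthy, cleared
    return tp, fp, fn, tn


def summarize(y_true, y_pred):
    """Accuracy, recall and precision computed from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
    }


# Hypothetical labels for illustration only.
toy_true = [1, 1, 1, 0, 0, 0]
toy_pred = [1, 1, 0, 0, 0, 1]

print(confusion_counts(toy_true, toy_pred))  # (2, 1, 1, 2)
print(summarize(toy_true, toy_pred))
```

In the notebook itself, passing `list(y_test)` and `list(y_pred)` to `summarize` would give the same breakdown for the real test set; `sklearn.metrics` also offers `confusion_matrix` and `classification_report` for this purpose.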

In [15]:
y.head(10)

0    0
1    1
2    1
3    0
4    0
5    0
6    1
7    0
8    1
9    1
Name: HeartDiseasePresent, dtype: int64

In [16]:
X.head(10)

Unnamed: 0,Age,Sex,ChestPainType,RestingBP,CholesterolLevel,FastingBloodSugar,RestingElectrocardiographicResult,MaxHeartRate,ExerciseAngina,STDepression,STSegmentSlope,NumMajorVessels,ThalliumStressRest
0,63,1,1,145,233,1,2,150,0,2.3,3,0,6
1,67,1,4,160,286,0,2,108,1,1.5,2,3,3
2,67,1,4,120,229,0,2,129,1,2.6,2,2,7
3,37,1,3,130,250,0,0,187,0,3.5,3,0,3
4,41,0,2,130,204,0,2,172,0,1.4,1,0,3
5,56,1,2,120,236,0,0,178,0,0.8,1,0,3
6,62,0,4,140,268,0,2,160,0,3.6,3,2,3
7,57,0,4,120,354,0,0,163,1,0.6,1,0,3
8,63,1,4,130,254,0,2,147,0,1.4,2,1,7
9,53,1,4,140,203,1,2,155,1,3.1,3,0,7
