![](https://i1.wp.com/www.md2c.nl/wp-content/uploads/2016/06/predict-heart-disease.jpg?w=480)

The Framingham Heart Study (FHS) is dedicated to identifying common factors or characteristics that contribute to cardiovascular disease (CVD). In 1948, an original cohort of 5,209 men and women between 30 and 62 years old was recruited from Framingham, MA. An Offspring Cohort began in 1971, an Omni Cohort in 1994, a Third Generation Cohort in 2002, a Second Generation Omni Cohort in 2003, and a New Offspring Spouse Cohort in 2004. Core research in the dataset focuses on cardiovascular and cerebrovascular diseases. The data include biological specimens, molecular genetic data, phenotype data, samples, images, participant vascular functioning data, physiological data, demographic data, and ECG data. The study is a collaborative project of the National Heart, Lung, and Blood Institute and Boston University.

#### Attributes

#### Demographic
* Sex: male or female
* Age: Age of the patient
* Education: no further information provided

#### Behavioral
* Current Smoker: whether or not the patient is a current smoker
* Cigs Per Day: the number of cigarettes that the person smoked on average in one day

#### Information on medical history
* BP Meds: whether or not the patient was on blood pressure medication 
* Prevalent Stroke: whether or not the patient had previously had a stroke
* Prevalent Hyp: whether or not the patient was hypertensive 
* Diabetes: whether or not the patient had diabetes

#### Information on current medical condition
* Tot Chol: total cholesterol level
* Sys BP: systolic blood pressure
* Dia BP: diastolic blood pressure
* BMI: Body Mass Index
* Heart Rate: heart rate. In medical research, variables such as heart rate are technically discrete but are treated as continuous because they can take a large number of possible values.
* Glucose: glucose level

#### Target variable to predict
* TenYearCHD: 10-year risk of coronary heart disease (binary: 1 = yes, 0 = no)

## Importing Libraries

In [None]:
import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

## Importing Dataset and Creating X and y arrays

In [None]:
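df=pd.read_csv('/kaggle/input/heart-disease-prediction-using-logistic-regression/framingham.csv')
# Drop every row that has at least one missing value
df.dropna(axis=0, inplace=True)
# Features: all columns except the last; target: the last column (TenYearCHD)
X=df.iloc[:,:-1].values
y=df.iloc[:,-1].values
df.head()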

In [None]:
# Missing values per column; every count is 0 here because rows with NaN were already dropped
df.isnull().sum()
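
Counting missing values before dropping rows shows how much data the cleaning step discards. A minimal sketch on a made-up toy frame (the values and the name `toy` are illustrative assumptions, not rows from `framingham.csv`):

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real dataset; values are invented for illustration.
toy = pd.DataFrame({
    "glucose": [77.0, np.nan, 70.0, 103.0],
    "BMI": [26.97, 28.73, np.nan, 28.58],
    "TenYearCHD": [0, 0, 1, 1],
})

missing_before = toy.isnull().sum()   # per-column count of missing values
clean = toy.dropna(axis=0)            # drop rows with at least one missing value
missing_after = clean.isnull().sum()  # zero for every column by construction

print(missing_before)
print(f"rows kept: {len(clean)} of {len(toy)}")
```

Running the count first makes it easy to judge whether dropping rows is acceptable or whether imputation might be worth considering.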

## Mean Difference

In [None]:
# Split patients by outcome and compare mean glucose (CHD group minus non-CHD group)
df_1=df[df['TenYearCHD']==1]
df_2=df[df['TenYearCHD']==0]
print(df_1['glucose'].mean()-df_2['glucose'].mean())
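
The same comparison can be done for every numeric column at once with `groupby`. A small sketch on invented numbers (the toy frame and the `sysBP` column name are assumptions for illustration):

```python
import pandas as pd

# Toy data; values are invented so the arithmetic is easy to check by hand.
toy = pd.DataFrame({
    "glucose": [80.0, 90.0, 100.0, 120.0],
    "sysBP": [120.0, 130.0, 150.0, 160.0],
    "TenYearCHD": [0, 0, 1, 1],
})

# Mean of each feature within each outcome group
group_means = toy.groupby("TenYearCHD").mean()

# CHD group mean minus non-CHD group mean, per feature
mean_diff = group_means.loc[1] - group_means.loc[0]
print(mean_diff)
```

Raw mean differences depend on each feature's units, so they are best read feature by feature rather than compared across features.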

## Splitting the dataset into the Training set and Test set

In [None]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.30,random_state=42)
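
When the positive class is rare, a plain random split can leave the train and test sets with different class proportions. One possible variation, not used in this notebook, is to pass `stratify` to `train_test_split`. A sketch on synthetic data (the arrays `X_toy` and `y_toy` are assumptions for illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X_toy = rng.normal(size=(200, 3))        # synthetic features
y_toy = np.array([1] * 30 + [0] * 170)   # 15% positives, imbalanced on purpose

# stratify=y_toy keeps the class ratio roughly equal in both subsets
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.30, random_state=42, stratify=y_toy
)

print(f"positive rate, train: {y_tr.mean():.3f}")
print(f"positive rate, test:  {y_te.mean():.3f}")
```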


## Scaling

In [None]:
from sklearn.preprocessing import StandardScaler
sc=StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
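
The scaler is fitted on the training set only, and the test set is transformed with the training mean and standard deviation. A tiny sketch with hand-checkable numbers (the toy arrays are assumptions for illustration):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train_toy = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
X_test_toy = np.array([[2.0, 400.0]])

sc_toy = StandardScaler()
X_train_scaled = sc_toy.fit_transform(X_train_toy)  # learns mean/std from train only
X_test_scaled = sc_toy.transform(X_test_toy)        # reuses the train statistics

print(X_train_scaled.mean(axis=0))  # approximately 0 for each column
print(X_train_scaled.std(axis=0))   # approximately 1 for each column
print(X_test_scaled)                # test values are not forced to mean 0 / std 1
```

Fitting on the training data alone keeps information about the test set from leaking into preprocessing.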

## Logistic Regression

In [None]:
from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression(random_state=0)
classifier.fit(X_train,y_train)
y_pred = classifier.predict(X_test)
from sklearn.metrics import confusion_matrix, accuracy_score
cm=confusion_matrix(y_test,y_pred)
print(cm)
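# Recall for the positive class: TP / (TP + FN); rows of cm are true labels, columns are predictions
recall = cm[1][1]/(cm[1][0] + cm[1][1])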
print(recall)
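
Recall for the positive class is TP / (TP + FN). A small hand-checkable sketch, using made-up labels, of reading those counts from scikit-learn's confusion matrix (rows are true labels, columns are predictions) and cross-checking against `recall_score`:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, recall_score

# Made-up labels for illustration: 3 true positives-class cases, 5 negatives
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 0])
y_hat = np.array([0, 0, 1, 0, 1, 0, 0, 0])

cm_toy = confusion_matrix(y_true, y_hat)
# For binary labels {0, 1}, ravel() yields TN, FP, FN, TP in that order
tn, fp, fn, tp = cm_toy.ravel()

recall_manual = tp / (tp + fn)                 # share of actual positives that were caught
recall_sklearn = recall_score(y_true, y_hat)   # same quantity computed by scikit-learn

print(cm_toy)
print(recall_manual, recall_sklearn)
```

Note that `cm[0][0] / (cm[0][0] + cm[1][0])` is TN / (TN + FN), the negative predictive value, which is a different metric from recall.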

## Decision Tree

In [None]:
from sklearn.tree import DecisionTreeClassifier
classifier = DecisionTreeClassifier(random_state = 0)
classifier.fit(X_train,y_train)
y_pred = classifier.predict(X_test)
from sklearn.metrics import confusion_matrix, accuracy_score
cm=confusion_matrix(y_test,y_pred)
print(cm)
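# Recall for the positive class: TP / (TP + FN); rows of cm are true labels, columns are predictions
recall = cm[1][1]/(cm[1][0] + cm[1][1])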
print(recall)

## Random Forest

In [None]:
from sklearn.ensemble import RandomForestClassifier
classifier = RandomForestClassifier(n_estimators = 10, random_state = 0)
classifier.fit(X_train,y_train)
y_pred = classifier.predict(X_test)
from sklearn.metrics import confusion_matrix, accuracy_score
cm=confusion_matrix(y_test,y_pred)
print(cm)
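# Recall for the positive class: TP / (TP + FN); rows of cm are true labels, columns are predictions
recall = cm[1][1]/(cm[1][0] + cm[1][1])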
print(recall)
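
The three classifiers can also be compared in one loop with `recall_score`. A sketch on synthetic data from `make_classification` (the data, the `models` dictionary, and the stratified split are assumptions for illustration, not results on the Framingham data):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced binary problem standing in for the real dataset
X_syn, y_syn = make_classification(
    n_samples=500, n_features=8, weights=[0.85, 0.15], random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_syn, y_syn, test_size=0.30, random_state=42, stratify=y_syn
)

# Same model settings as the cells above
models = {
    "logistic_regression": LogisticRegression(random_state=0),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=10, random_state=0),
}

recalls = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    # Recall on the positive class (label 1) of the held-out set
    recalls[name] = recall_score(y_te, model.predict(X_te))

for name, value in recalls.items():
    print(f"{name}: recall = {value:.3f}")
```

Keeping the evaluation in a single loop ensures every model is scored on the same split with the same metric.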