## Be Heart Smart

In [71]:
# Import our dependencies
import pandas as pd
import numpy as np
import datetime as dt
import matplotlib.pyplot as plt

In [72]:
%matplotlib notebook

Features:

| # | Feature | Feature type | Column | Data type / coding |
|---|---------|--------------|--------|--------------------|
| 1 | Age | Objective Feature | age | int (days) |
| 2 | Height | Objective Feature | height | int (cm) |
| 3 | Weight | Objective Feature | weight | float (kg) |
| 4 | Gender | Objective Feature | gender | categorical code (1 = women, 2 = men) |
| 5 | Systolic blood pressure | Examination Feature | ap_hi | int |
| 6 | Diastolic blood pressure | Examination Feature | ap_lo | int |
| 7 | Cholesterol | Examination Feature | cholesterol | 1: Normal (<200), 2: Moderate (200 - 239), 3: High (>240) |
| 8 | Glucose | Examination Feature | gluc | 1: Normal (<100), 2: Moderate (100 - 125), 3: High (>126) |
| 9 | Smoking | Subjective Feature | smoke | binary |
| 10 | Alcohol intake | Subjective Feature | alco | binary |
| 11 | Physical activity | Subjective Feature | active | binary |
| 12 | Presence or absence of cardiovascular disease | Target Variable | cardio | binary |
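
The feature list gives age in days, while the `Age` column of the cleaned file holds values such as 51 and 49, which suggests a conversion to years during cleaning. The notebook does not show that step. A minimal sketch, assuming a raw `age` column in days and a 365.25-day year (both assumptions, with made-up sample values), might look like this:

```python
import pandas as pd

# Hypothetical raw rows; the ages in days are made up for illustration.
raw_df = pd.DataFrame({"age": [18393, 20228, 17474]})

# Assumption: a year is treated as 365.25 days and partial years are truncated.
DAYS_PER_YEAR = 365.25
raw_df["age_years"] = (raw_df["age"] / DAYS_PER_YEAR).astype(int)

print(raw_df)
```

Truncating rather than rounding keeps "age in completed years"; the actual cleaning script may have used a different rule.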


In [75]:
path = "../Resources/cardio_data_cleaned.csv"
cardio_df = pd.read_csv(path)
cardio_df.head()

Unnamed: 0,Age,gender,height,weight,systolic_bp,diastolic_bp,cholesterol,glucose,smoker,alcohol_intake,active,cardio
0,51,1,171,29.0,110.0,70.0,2,1,0,0,1,1
1,49,1,160,30.0,120.0,80.0,1,1,0,0,1,1
2,58,1,143,30.0,103.0,61.0,2,1,0,0,1,0
3,47,2,170,31.0,150.0,90.0,2,2,0,0,1,1
4,42,1,146,32.0,100.0,70.0,1,1,0,0,0,0


In [34]:
# list of column names
cardio_df.columns

Index(['Drop_me', 'id', 'age', 'gender', 'height', 'weight', 'systolic_bp',
       'diastolic_bp', 'cholesterol', 'glucose', 'smoker', 'alcohol_intake',
       'active', 'cardio'],
      dtype='object')

In [77]:
# Check the data type of each column
cardio_df.dtypes

Age                 int64
gender              int64
height              int64
weight            float64
systolic_bp       float64
diastolic_bp      float64
cholesterol         int64
glucose             int64
smoker              int64
alcohol_intake      int64
active              int64
cardio              int64
dtype: object

In [78]:
# Count non-null values per column; a count below the row total would indicate nulls
cardio_df.count()

Age               68297
gender            68297
height            68297
weight            68297
systolic_bp       68297
diastolic_bp      68297
cholesterol       68297
glucose           68297
smoker            68297
alcohol_intake    68297
active            68297
cardio            68297
dtype: int64

In [79]:
cardio_df.shape

(68297, 12)
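
Comparing `count()` against the row total works, but missing values can also be counted directly. The sketch below uses a small made-up frame (not the cardio data) to show `isna().sum()`, which reports the number of nulls per column:

```python
import numpy as np
import pandas as pd

# Small made-up frame with one missing weight value.
sample_df = pd.DataFrame({
    "weight": [62.0, np.nan, 80.5],
    "cardio": [0, 1, 1],
})

# isna() flags missing cells; sum() counts them per column.
null_counts = sample_df.isna().sum()
print(null_counts)

# True when at least one column has a missing value.
has_nulls = bool(null_counts.any())
print(has_nulls)
```

Applied to `cardio_df`, every count would be expected to be zero, since each column above reports all 68297 rows as non-null.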

## Cleaning up the dataset

In [80]:
data = [cardio_df["systolic_bp"]]
fig1, ax1 = plt.subplots(figsize = (8,8))
ax1.set_title('Systole outliers')
ax1.boxplot(data, notch= True)
plt.show()

[Figure: notched box plot, "Systole outliers"]

In [81]:
data = [cardio_df["diastolic_bp"]]
fig1, ax1 = plt.subplots(figsize = (8,8))
ax1.set_title('Diastole outliers')
ax1.boxplot(data, notch= True)
plt.show()

[Figure: notched box plot, "Diastole outliers"]

In [82]:
data = [cardio_df["weight"]]
fig1, ax1 = plt.subplots(figsize = (8,8))
ax1.set_title('Weight outliers')
ax1.boxplot(data, notch= True)
plt.show()

[Figure: notched box plot, "Weight outliers"]

In [83]:
data = [cardio_df["height"]]
fig1, ax1 = plt.subplots(figsize = (8,8))
ax1.set_title('Height outliers')
ax1.boxplot(data, notch= True)
plt.show()

[Figure: notched box plot, "Height outliers"]
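
The box plots flag outliers visually. With matplotlib's default `whis=1.5`, points beyond 1.5 times the interquartile range from the quartiles are drawn as fliers. The notebook does not say which thresholds were used to clean the data, so the following is only a sketch of the same rule in code; `iqr_bounds` is a hypothetical helper and the readings are made up:

```python
import pandas as pd

def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences at k * IQR beyond the quartiles."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Made-up systolic readings, including one implausible value.
systolic = pd.Series([100, 110, 120, 130, 140, 900])

lower, upper = iqr_bounds(systolic)
# Keep only the readings that fall outside the fences.
outliers = systolic[(systolic < lower) | (systolic > upper)]
print(lower, upper)
print(outliers.tolist())
```

The same helper could be pointed at `cardio_df["systolic_bp"]`, `cardio_df["diastolic_bp"]`, `cardio_df["weight"]`, or `cardio_df["height"]` to list the rows behind each plot's fliers.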

In [84]:
cholesterol_cardio_df = cardio_df["cardio"].groupby(cardio_df["cholesterol"]).mean()
cholesterol_cardio_df.head()

cholesterol
1    0.434299
2    0.594469
3    0.761380
Name: cardio, dtype: float64
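
Because `cardio` is coded 0/1, the group mean is the proportion of people in each group who have cardiovascular disease. A proportion is easier to interpret next to the group size, so a possible extension (a sketch on made-up data, with the hypothetical output names `cardio_rate` and `n`) aggregates both at once:

```python
import pandas as pd

# Made-up rows for illustration; not taken from the cardio dataset.
toy_df = pd.DataFrame({
    "cholesterol": [1, 1, 1, 2, 2, 3],
    "cardio":      [0, 1, 0, 1, 0, 1],
})

# mean of a 0/1 column = share of positives; count = group size.
summary = (
    toy_df.groupby("cholesterol")["cardio"]
    .agg(["mean", "count"])
    .rename(columns={"mean": "cardio_rate", "count": "n"})
)
print(summary)
```

A rate computed from a small group deserves more caution than one from a large group, which the `n` column makes visible.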

In [85]:
cholesterol_cardio_df.plot.bar(color='r', alpha=0.5, align="center")
# Create labels for the x and y axes.
plt.xlabel("Cholesterol levels")
plt.ylabel("Proportion with cardiovascular disease")
# Create a title.
plt.title("Cardiac disease based on Cholesterol Levels")
# Add the legend.
plt.legend()


[Figure: bar chart, "Cardiac disease based on Cholesterol Levels"]

In [86]:
print("In this dataset, people with high cholesterol (level 3) have the highest rate of cardiovascular disease (about 76%).")

In this dataset, people with high cholesterol (level 3) have the highest rate of cardiovascular disease (about 76%).


In [87]:
glucose_cardio_df = cardio_df["cardio"].groupby(cardio_df["glucose"]).mean()
glucose_cardio_df.head()

glucose
1    0.474391
2    0.584830
3    0.616438
Name: cardio, dtype: float64

In [88]:
glucose_cardio_df.plot.bar(color='green', alpha=0.5, align="center")
# Create labels for the x and y axes.
plt.xlabel("Glucose levels")
plt.ylabel("Proportion with cardiovascular disease")
# Create a title.
plt.title("Cardiac disease based on Glucose Levels")
# Add the legend.
plt.legend()

[Figure: bar chart, "Cardiac disease based on Glucose Levels"]

In [89]:
print("In this dataset, people with high glucose (level 3) have the highest rate of cardiovascular disease (about 62%).")

In this dataset, people with high glucose (level 3) have the highest rate of cardiovascular disease (about 62%).


In [90]:
gender_cardio_df = cardio_df["cardio"].groupby(cardio_df["gender"]).mean()
gender_cardio_df.head()

gender
1    0.490967
2    0.497583
Name: cardio, dtype: float64

In [91]:
gender_cardio_df.plot.bar(color='blue', alpha=0.5, align="center")
# Create labels for the x and y axes.
plt.xlabel("Gender")
plt.ylabel("Proportion with cardiovascular disease")
# Create a title.
plt.title("Cardiac disease based on Gender")
# Add the legend.
plt.legend()

[Figure: bar chart, "Cardiac disease based on Gender"]

In [92]:
alcohol_cardio_df = cardio_df["cardio"].groupby(cardio_df["alcohol_intake"]).mean()
alcohol_cardio_df.head()

alcohol_intake
0    0.494356
1    0.473974
Name: cardio, dtype: float64

In [94]:
alcohol_cardio_df.plot.bar(color='cyan', alpha=0.5, align="center")
# Create labels for the x and y axes.
plt.xlabel("Alcohol Consumption")
plt.ylabel("Proportion with cardiovascular disease")
# Create a title.
plt.title("Cardiac disease based on Alcohol Consumption")
# Add the legend.
plt.legend()

[Figure: bar chart, "Cardiac disease based on Alcohol Consumption"]

In [95]:
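print("In this dataset, alcohol consumption alone does not appear to raise the rate of cardiovascular disease (about 47% with alcohol intake versus 49% without).")

In this dataset, alcohol consumption alone does not appear to raise the rate of cardiovascular disease (about 47% with alcohol intake versus 49% without).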
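
One rough way to compare features is the gap between the highest and lowest group rate. From the outputs above, that gap is roughly 0.33 for cholesterol and 0.02 for alcohol intake. The sketch below computes it with a hypothetical helper, `rate_spread`, on made-up data. It is a descriptive summary only, not a statistical test:

```python
import pandas as pd

def rate_spread(df, feature, target="cardio"):
    """Return max minus min of the target's mean across the feature's groups."""
    rates = df.groupby(feature)[target].mean()
    return rates.max() - rates.min()

# Made-up data: feature_a tracks the target exactly, feature_b does not.
toy_df = pd.DataFrame({
    "feature_a": [0, 0, 1, 1],
    "feature_b": [0, 1, 0, 1],
    "cardio":    [0, 0, 1, 1],
})

# Spread of the cardio rate for each candidate feature.
spreads = {col: rate_spread(toy_df, col) for col in ["feature_a", "feature_b"]}
print(spreads)
```

On the real data, the call would be something like `rate_spread(cardio_df, "cholesterol")`, repeated for `"glucose"`, `"gender"`, and `"alcohol_intake"`.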
