# Predicting Blood Pressure Based on Diet
By: [Kelly Wu](https://www.linkedin.com/in/kelly-wu-nj/)

## Problem Statement

Most of us have been to the doctor at least once and gone through the basic health checks: height, weight, temperature, and blood pressure. We're all familiar with the process of having our arm squeezed tightly by a cuff while our doctor listens in with a stethoscope and watches the little monitor intently. Then we all hope never to hear that we have high blood pressure. High blood pressure, also known as a "silent killer," normally doesn't cause any symptoms, but it can lead to a heart attack or stroke. What's worse, hypertension is so common that it is a leading risk for death and disability worldwide, according to Dr. Paul Whelton, an expert in hypertension and kidney disease at Tulane University.

What is blood pressure? Blood pressure is given as two numbers. The first number represents the pressure in your blood vessels as the heart beats (systolic pressure). The second is the pressure as your heart relaxes and fills with blood (diastolic pressure). Normal blood pressure is considered to be 120/80 or lower, while high blood pressure is considered to be 140/90 or higher. So what affects blood pressure? There are numerous factors that can affect blood pressure and it's normal for it to fluctuate throughout the day. The time of day, the foods you eat, and stress are a few contributors to blood pressure. 
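
To make the two cutoffs above concrete, here is a minimal sketch that labels a single reading. The function name `classify_reading` and the "in between" label are illustrative choices, not part of the source, and treating a reading as high when *either* number reaches its cutoff is an assumption.

```python
def classify_reading(systolic, diastolic):
    """Label one blood pressure reading using only the cutoffs quoted above."""
    # Assumption: a reading counts as high if either number reaches 140/90.
    if systolic >= 140 or diastolic >= 90:
        return "high"
    # Normal only when both numbers are at or below 120/80.
    if systolic <= 120 and diastolic <= 80:
        return "normal"
    # The text does not name this range; "in between" is a placeholder label.
    return "in between"


# Made-up readings, for illustration only.
for reading in [(118, 76), (132, 84), (150, 95)]:
    print(reading, classify_reading(*reading))
```

Checking the high cutoff first means a reading such as 118/92 is labeled high even though its systolic number alone looks normal.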

Food is essential to life, but the majority of the population simply eats what tastes good or what's convenient. Few people actually go through the trouble of calculating the macronutrients they need on a daily basis. Culture is another factor that affects diet: rice may be the primary carbohydrate in one culture and pasta in another. With such differences in diet, and with diet being a factor that affects blood pressure, can we predict blood pressure simply based on what we eat? Maybe our predictions will cause us to rethink what we eat or to put more thought into eating a wider variety of foods.

Stakeholders and metrics.

## Executive Summary

We begin by gathering public data from the National Health and Nutrition Examination Survey (NHANES) for 2013-2014. Luckily, [Kaggle](https://www.kaggle.com/cdc/national-health-and-nutrition-examination-survey#diet.csv) has organized the data for us into simple `.csv` files.

## Contents: 
- [Imports](#Imports)
- [Cleaning Our Data](#Cleaning-Our-Data)
- [Exploratory Data Analysis](#Exploratory-Data-Analysis)
    - [Visualizations](#Visualizations)
- [Preprocessing](#Preprocessing)
- [Regression Modeling](#Regression-Modeling)
    - [Lasso](#Lasso)
    - [Ridge](#Ridge)
    - [Elastic Net](#Elastic-Net)
- [Outside Research](#Outside-Research)
- [Conclusions](#Conclusions)
- [Recommendations](#Recommendations)
- [Sources](#Sources)

### Imports
[Back to Contents](#Contents:)

In [1]:
import pandas as pd

In [2]:
diet = pd.read_csv('./datasets/diet.csv')
exams = pd.read_csv('./datasets/examination.csv')

In [3]:
diet.head()

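| | SEQN | WTDRD1 | WTDR2D | DR1DRSTZ | DR1EXMER | DRABF | DRDINT | DR1DBIH | DR1DAY | DR1LANG | ... | DRD370QQ | DRD370R | DRD370RQ | DRD370S | DRD370SQ | DRD370T | DRD370TQ | DRD370U | DRD370UQ | DRD370V |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 73557 | 16888.327864 | 12930.890649 | 1 | 49.0 | 2.0 | 2.0 | 6.0 | 2.0 | 1.0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | 73558 | 17932.143865 | 12684.148869 | 1 | 59.0 | 2.0 | 2.0 | 4.0 | 1.0 | 1.0 | ... | NaN | 2.0 | NaN | 2.0 | NaN | 2.0 | NaN | 2.0 | NaN | 2.0 |
| 2 | 73559 | 59641.81293 | 39394.236709 | 1 | 49.0 | 2.0 | 2.0 | 18.0 | 6.0 | 1.0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | 73560 | 142203.069917 | 125966.366442 | 1 | 54.0 | 2.0 | 2.0 | 21.0 | 3.0 | 1.0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | 73561 | 59052.357033 | 39004.892993 | 1 | 63.0 | 2.0 | 2.0 | 18.0 | 1.0 | 1.0 | ... | NaN | 2.0 | NaN | 2.0 | NaN | 2.0 | NaN | 2.0 | NaN | 2.0 |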


In [4]:
exams.head()

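| | SEQN | PEASCST1 | PEASCTM1 | PEASCCT1 | BPXCHR | BPAARM | BPACSZ | BPXPLS | BPXPULS | BPXPTY | ... | CSXLEAOD | CSXSOAOD | CSXGRAOD | CSXONOD | CSXNGSOD | CSXSLTRT | CSXSLTRG | CSXNART | CSXNARG | CSAEFFRT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 73557 | 1 | 620.0 | NaN | NaN | 1.0 | 4.0 | 86.0 | 1.0 | 1.0 | ... | 2.0 | 1.0 | 1.0 | 1.0 | 4.0 | 62.0 | 1.0 | NaN | NaN | 1.0 |
| 1 | 73558 | 1 | 766.0 | NaN | NaN | 1.0 | 4.0 | 74.0 | 1.0 | 1.0 | ... | 3.0 | 1.0 | 2.0 | 3.0 | 4.0 | 28.0 | 1.0 | NaN | NaN | 1.0 |
| 2 | 73559 | 1 | 665.0 | NaN | NaN | 1.0 | 4.0 | 68.0 | 1.0 | 1.0 | ... | 2.0 | 1.0 | 2.0 | 3.0 | 4.0 | 49.0 | 1.0 | NaN | NaN | 3.0 |
| 3 | 73560 | 1 | 803.0 | NaN | NaN | 1.0 | 2.0 | 64.0 | 1.0 | 1.0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | 73561 | 1 | 949.0 | NaN | NaN | 1.0 | 3.0 | 92.0 | 1.0 | 1.0 | ... | 3.0 | 1.0 | 4.0 | 3.0 | 4.0 | NaN | NaN | NaN | NaN | 1.0 |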


In [6]:
diet.shape

(9813, 168)

In [7]:
exams.shape

(9813, 224)
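
Both tables have 9,813 rows and both start with a `SEQN` column, so one plausible next step is to join them on that key. The sketch below is not the notebook's own code: it assumes `SEQN` identifies a respondent and appears at most once per table, the helper name `join_on_respondent` is hypothetical, and the toy frames use made-up values. `DR1TSODI` and `BPXSY1` are assumed NHANES column names (sodium intake and first systolic reading) that do not appear in the truncated output above.

```python
import pandas as pd


def join_on_respondent(diet_df, exams_df, key="SEQN"):
    """Inner-join the two tables on the shared respondent key."""
    # validate="one_to_one" raises a MergeError if the key repeats in either table.
    return diet_df.merge(exams_df, on=key, how="inner", validate="one_to_one")


# Toy stand-ins for the real tables; every value here is made up.
toy_diet = pd.DataFrame(
    {"SEQN": [73557, 73558, 73559], "DR1TSODI": [1323.0, 9726.0, 3943.0]}
)
toy_exams = pd.DataFrame(
    {"SEQN": [73557, 73558, 73560], "BPXSY1": [122.0, 156.0, None]}
)

merged = join_on_respondent(toy_diet, toy_exams)
print(merged.shape)
```

An inner join keeps only respondents present in both tables, which is why the toy result has two rows rather than three. On the real data the same call would be `join_on_respondent(diet, exams)`.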

### Cleaning Our Data
[Back to Contents](#Contents:)

### Exploratory Data Analysis 
[Back to Contents](#Contents:)

### Visualizations
[Back to Contents](#Contents:)

### Preprocessing 
[Back to Contents](#Contents:)

### Regression Modeling 
[Back to Contents](#Contents:)

### Lasso 
[Back to Contents](#Contents:)

### Ridge 
[Back to Contents](#Contents:)

### Elastic Net
[Back to Contents](#Contents:)

### Outside Research
[Back to Contents](#Contents:)

### Conclusions 
[Back to Contents](#Contents:)

### Recommendations 
[Back to Contents](#Contents:)

### Sources
[Back to Contents](#Contents:)
- [Blood Pressure Matters](https://newsinhealth.nih.gov/2016/01/blood-pressure-matters)
- [Dietary Data](https://wwwn.cdc.gov/nchs/nhanes/Search/DataPage.aspx?Component=Dietary&CycleBeginYear=2013)
- [Dietary Variable List](https://wwwn.cdc.gov/Nchs/Nhanes/Search/variablelist.aspx?Component=Dietary&CycleBeginYear=2013)
- [Examination Data](https://wwwn.cdc.gov/Nchs/Nhanes/Search/DataPage.aspx?Component=Examination&CycleBeginYear=2013)
- [Examination Variable List](https://wwwn.cdc.gov/Nchs/Nhanes/Search/variablelist.aspx?Component=Examination&CycleBeginYear=2013)
- [NHANES Datasets](https://www.kaggle.com/cdc/national-health-and-nutrition-examination-survey#diet.csv)