# Practice notebook for univariate analysis using NHANES data

This notebook gives you the opportunity to perform some univariate analyses on your own using the NHANES data.  These analyses are similar to those in the week 2 NHANES case study notebook.

You can enter your code into the cells that say "enter your code here", and you can type responses to the questions into the cells that say "Type Markdown and Latex".

Note that most of the code that you will need to write below is very similar to code that appears in the case study notebook.  You will need to edit code from that notebook in small ways to adapt it to the prompts below.

To get started, we will use the same module imports and read the data in the same way as we did in the case study:

In [3]:
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import statsmodels.api as sm
import numpy as np

df = pd.read_csv("nhanes_2015_2016.csv")

In [4]:
df.head()

,SEQN,ALQ101,ALQ110,ALQ130,SMQ020,RIAGENDR,RIDAGEYR,RIDRETH1,DMDCITZN,DMDEDUC2,...,BPXSY2,BPXDI2,BMXWT,BMXHT,BMXBMI,BMXLEG,BMXARML,BMXARMC,BMXWAIST,HIQ210
0,83732,1.0,,1.0,1,1,62,3,1.0,5.0,...,124.0,64.0,94.8,184.5,27.8,43.3,43.6,35.9,101.1,2.0
1,83733,1.0,,6.0,1,1,53,3,2.0,3.0,...,140.0,88.0,90.4,171.4,30.8,38.0,40.0,33.2,107.9,
2,83734,1.0,,,1,1,78,3,1.0,3.0,...,132.0,44.0,83.4,170.1,28.8,35.6,37.0,31.0,116.5,2.0
3,83735,2.0,1.0,1.0,2,2,56,3,1.0,5.0,...,134.0,68.0,109.8,160.9,42.4,38.5,37.7,38.3,110.1,2.0
4,83736,2.0,1.0,1.0,2,2,42,4,1.0,4.0,...,114.0,54.0,55.2,164.9,20.3,37.4,36.0,27.2,80.4,2.0


## Question 1

Relabel the marital status variable [DMDMARTL](https://wwwn.cdc.gov/Nchs/Nhanes/2015-2016/DEMO_I.htm#DMDMARTL) to have brief but informative character labels.  Then construct a frequency table of these values for all people, for women only, and for men only.  Finally, construct the same three frequency tables using only people whose age is between 30 and 40.

In [7]:
# Map the numeric DMDMARTL codes to short text labels
df['DMDMARTLx'] = df.DMDMARTL.replace({1: 'Married', 2: 'Widowed', 3: 'Divorced', 4: 'Separated',
                                       5: 'Never married', 6: 'Living w/partner', 77: 'Refused', 99: "Don't know"})

In [8]:
df.DMDMARTLx.value_counts()

Married             2780
Never married       1004
Divorced             579
Living w/partner     527
Widowed              396
Separated            186
Refused                2
Name: DMDMARTLx, dtype: int64
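By default, `value_counts()` drops missing values, so respondents with no recorded marital status do not appear in the table above. A minimal sketch of one way to make them visible is shown below. It runs on a small made-up data frame (`toy` is a hypothetical stand-in, not real NHANES data), so the counts it prints say nothing about the survey itself.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the NHANES column; these values are invented for illustration.
toy = pd.DataFrame({"DMDMARTL": [1, 5, 1, 3, np.nan, 77, np.nan, 6]})

labels = {1: "Married", 2: "Widowed", 3: "Divorced", 4: "Separated",
          5: "Never married", 6: "Living w/partner", 77: "Refused", 99: "Don't know"}

# map() leaves missing codes as NaN; fillna() turns them into an explicit category.
toy["DMDMARTLx"] = toy["DMDMARTL"].map(labels).fillna("Missing")

counts = toy["DMDMARTLx"].value_counts()
print(counts)
```

The same idea should carry over to `df.DMDMARTLx`; alternatively, `value_counts(dropna=False)` reports the missing values without relabeling them.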

In [9]:
df["RIAGENDRx"] = df.RIAGENDR.replace({1: "Male", 2: "Female"})

In [12]:
df.groupby('RIAGENDRx').DMDMARTLx.value_counts().unstack()

DMDMARTLx,Divorced,Living w/partner,Married,Never married,Refused,Separated,Widowed
RIAGENDRx,,,,,,,
Female,350,262,1303,520,1,118,296
Male,229,265,1477,484,1,68,100


In [15]:
# Divide each row by its total so the proportions within each gender sum to 1
df.groupby('RIAGENDRx').DMDMARTLx.value_counts().unstack().apply(lambda x: x/x.sum(), axis=1)

DMDMARTLx,Divorced,Living w/partner,Married,Never married,Refused,Separated,Widowed
RIAGENDRx,,,,,,,
Female,0.122807,0.09193,0.457193,0.182456,0.000351,0.041404,0.10386
Male,0.087271,0.100991,0.562881,0.184451,0.000381,0.025915,0.03811
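The row-wise proportions can also be obtained with `pd.crosstab` and its `normalize="index"` option, which avoids the explicit `apply`. The sketch below uses a small invented data frame (`toy` is hypothetical, not NHANES data) purely to show the shape of the call.

```python
import pandas as pd

# Toy data with the same column names as the notebook; values are invented.
toy = pd.DataFrame({
    "RIAGENDRx": ["Female", "Female", "Female", "Male", "Male", "Male", "Male"],
    "DMDMARTLx": ["Married", "Divorced", "Married",
                  "Married", "Never married", "Married", "Divorced"],
})

# Raw counts: one row per gender, one column per marital status.
counts = pd.crosstab(toy["RIAGENDRx"], toy["DMDMARTLx"])

# normalize="index" divides each row by its own total.
props = pd.crosstab(toy["RIAGENDRx"], toy["DMDMARTLx"], normalize="index")
print(props.round(3))
```

Unlike `unstack()`, `crosstab` fills combinations that never occur with 0 rather than `NaN`.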


In [19]:
# Keep only respondents aged 30 to 40 (both endpoints included)
df[(df.RIDAGEYR>=30) & (df.RIDAGEYR<=40)].DMDMARTLx.value_counts()

Married             560
Never married       217
Living w/partner    143
Divorced             70
Separated            30
Widowed               5
Refused               1
Name: DMDMARTLx, dtype: int64

In [20]:
df[(df.RIDAGEYR>=30) & (df.RIDAGEYR<=40)].groupby('RIAGENDRx').DMDMARTLx.value_counts().unstack()

DMDMARTLx,Divorced,Living w/partner,Married,Never married,Refused,Separated,Widowed
RIAGENDRx,,,,,,,
Female,46.0,65.0,285.0,116.0,,18.0,2.0
Male,24.0,78.0,275.0,101.0,1.0,12.0,3.0
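The empty cell for women under "Refused" is a `NaN`: no woman in this age range gave that response, and the `NaN` forces the whole table to display as floats. A possible way to keep integer counts is to pass `fill_value=0` to `unstack()`. The sketch below also uses `Series.between`, which includes both endpoints by default, as a shorter form of the age filter. The data frame `toy` is hypothetical and its values are invented.

```python
import pandas as pd

# Toy data with the notebook's column names; values are invented.
toy = pd.DataFrame({
    "RIDAGEYR": [29, 30, 35, 40, 41, 33, 38],
    "RIAGENDRx": ["Female", "Female", "Male", "Male", "Male", "Female", "Male"],
    "DMDMARTLx": ["Married", "Married", "Married", "Refused",
                  "Divorced", "Divorced", "Married"],
})

# between(30, 40) keeps ages 30 through 40 inclusive.
in_range = toy[toy["RIDAGEYR"].between(30, 40)]

# fill_value=0 replaces absent gender/status combinations with 0 instead of NaN.
table = in_range.groupby("RIAGENDRx")["DMDMARTLx"].value_counts().unstack(fill_value=0)
print(table)
```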
