# Practice notebook for univariate analysis using NHANES data

This notebook gives you the opportunity to perform some univariate analyses on your own using the NHANES data. These analyses are similar to those in the week 2 NHANES case study notebook.

You can enter your code into the cells that say "enter your code here", and you can type responses to the questions into the cells that say "Type Markdown and Latex".

Most of the code you will need to write below is very similar to code that appears in the case study notebook. You will need to adapt that code in small ways to fit the prompts below.

To get started, we will use the same module imports and read the data in the same way as we did in the case study:

In [31]:
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import statsmodels.api as sm
import numpy as np

da = pd.read_csv("nhanes_2015_2016.csv")

da.columns

Index(['SEQN', 'ALQ101', 'ALQ110', 'ALQ130', 'SMQ020', 'RIAGENDR', 'RIDAGEYR',
       'RIDRETH1', 'DMDCITZN', 'DMDEDUC2', 'DMDMARTL', 'DMDHHSIZ', 'WTINT2YR',
       'SDMVPSU', 'SDMVSTRA', 'INDFMPIR', 'BPXSY1', 'BPXDI1', 'BPXSY2',
       'BPXDI2', 'BMXWT', 'BMXHT', 'BMXBMI', 'BMXLEG', 'BMXARML', 'BMXARMC',
       'BMXWAIST', 'HIQ210'],
      dtype='object')

## Question 1

Relabel the marital status variable [DMDMARTL](https://wwwn.cdc.gov/Nchs/Nhanes/2015-2016/DEMO_I.htm#DMDMARTL) to have brief but informative character labels.  Then construct a frequency table of these values for all people, then for women only, and for men only.  Then construct these three frequency tables using only people whose age is between 30 and 40.
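One detail worth keeping in mind when relabeling a coded variable and tabulating it: `value_counts` omits missing values by default, so they must be requested or labeled explicitly. The sketch below uses a small made-up data frame (the `toy` frame and its values are hypothetical, not NHANES data) to compare the default counts, `dropna=False`, and filling with a `"Missing"` label.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for a coded column such as DMDMARTL
toy = pd.DataFrame({"DMDMARTL": [1, 5, 1, np.nan, 3, 77]})

# Map numeric codes to short labels; NaN is left untouched by replace
labels = {1: "Married", 3: "Divorced", 5: "Never married", 77: "Refused"}
toy["DMDMARTLx"] = toy["DMDMARTL"].replace(labels)

# Default behavior: missing values are silently dropped from the table
counts_default = toy["DMDMARTLx"].value_counts()

# Option 1: ask value_counts to keep NaN as its own row
counts_with_nan = toy["DMDMARTLx"].value_counts(dropna=False)

# Option 2: give missing values an explicit label before counting
counts_filled = toy["DMDMARTLx"].fillna("Missing").value_counts()

print(counts_default, "\n")
print(counts_with_nan, "\n")
print(counts_filled)
```

Either option keeps the total equal to the number of rows, which makes it easier to check the table against the size of the data set.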

In [33]:
da["DMDMARTLx"] = da.DMDMARTL.replace({1: "Married", 2: "Widowed", 3: "Divorced", 4: "Separated",
                                        5: "Never married", 6: "Living with partner", 77: "Refused",
                                        99: "Don't know"})

# Label missing values explicitly so that they appear in the frequency tables
da["DMDMARTLx"] = da.DMDMARTLx.fillna("Missing")
print("Status frequency table - All people\n")
print(da.DMDMARTLx.value_counts(), "\n")

da["RIAGENDRx"] = da.RIAGENDR.replace({1: "Male", 2: "Female"})

male = da.loc[da["RIAGENDRx"] == "Male"]
print("Status frequency table - Male\n")
print(male["DMDMARTLx"].value_counts(), "\n")

female = da.loc[da["RIAGENDRx"] == "Female"]
print("Status frequency table - Female\n")
print(female["DMDMARTLx"].value_counts(), "\n")

# Alternative: da.groupby("RIAGENDRx")["DMDMARTLx"].value_counts() shows the male and female tables together

# Alternative (not used): da["agegrp"] = pd.cut(da.RIDAGEYR, [30, 40])
# Caution: pd.cut builds the bin (30, 40], which excludes age 30, so the explicit filter below is used instead

# Inclusive age filter (30 <= age <= 40), reused for the three tables below
age_30_40 = da.RIDAGEYR.between(30, 40)

age_interval_all = da.loc[age_30_40]
print("Status frequency table - All people with age between 30-40\n")
print(age_interval_all["DMDMARTLx"].value_counts(), "\n")

age_interval_male = da.loc[age_30_40 & (da.RIAGENDRx == "Male")]
print("Status frequency table - Men with age between 30-40\n")
print(age_interval_male["DMDMARTLx"].value_counts(), "\n")

age_interval_female = da.loc[age_30_40 & (da.RIAGENDRx == "Female")]
print("Status frequency table - Women with age between 30-40\n")
print(age_interval_female["DMDMARTLx"].value_counts(), "\n")

# print(da.groupby("agegrp")["DMDMARTLx"].value_counts(), "\n")
# print(da.groupby(["agegrp", "RIAGENDRx"])["DMDMARTLx"].value_counts(), "\n")
da.columns

Status frequency table - All people

Married                2780
Never married          1004
Divorced                579
Living with partner     527
Widowed                 396
Missing                 261
Separated               186
Refused                   2
Name: DMDMARTLx, dtype: int64 

Status frequency table - Male

Married                1477
Never married           484
Living with partner     265
Divorced                229
Missing                 135
Widowed                 100
Separated                68
Refused                   1
Name: DMDMARTLx, dtype: int64 

Status frequency table - Female

Married                1303
Never married           520
Divorced                350
Widowed                 296
Living with partner     262
Missing                 126
Separated               118
Refused                   1
Name: DMDMARTLx, dtype: int64 

Status frequency table - All people with age between 30-40

Married                560
Never married          217
Living with partn

Index(['SEQN', 'ALQ101', 'ALQ110', 'ALQ130', 'SMQ020', 'RIAGENDR', 'RIDAGEYR',
       'RIDRETH1', 'DMDCITZN', 'DMDEDUC2', 'DMDMARTL', 'DMDHHSIZ', 'WTINT2YR',
       'SDMVPSU', 'SDMVSTRA', 'INDFMPIR', 'BPXSY1', 'BPXDI1', 'BPXSY2',
       'BPXDI2', 'BMXWT', 'BMXHT', 'BMXBMI', 'BMXLEG', 'BMXARML', 'BMXARMC',
       'BMXWAIST', 'HIQ210', 'DMDMARTLx', 'RIAGENDRx'],
      dtype='object')
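The three age-restricted tables above can also be produced in one step. The following is a minimal sketch, not the notebook's own method, using `pd.crosstab` on a small made-up data frame (the `toy` frame and its values are hypothetical; only the column names mirror the notebook). It also illustrates why `pd.cut(..., [30, 40])` is not equivalent to the inclusive filter: with default settings the bin is `(30, 40]`, so age 30 falls outside it.

```python
import pandas as pd

# Hypothetical toy data; column names mirror the notebook's variables
toy = pd.DataFrame({
    "RIDAGEYR": [30, 35, 40, 41, 29, 33],
    "RIAGENDRx": ["Male", "Female", "Female", "Male", "Female", "Male"],
    "DMDMARTLx": ["Married", "Married", "Divorced",
                  "Married", "Never married", "Never married"],
})

# Inclusive filter: keeps ages 30 through 40, both endpoints included
in_range = toy[toy["RIDAGEYR"].between(30, 40)]

# One table with a column per gender plus an "All" margin
table = pd.crosstab(in_range["DMDMARTLx"], in_range["RIAGENDRx"],
                    margins=True, margins_name="All")
print(table, "\n")

# Default pd.cut bins are open on the left: (30, 40] excludes age 30
cut_default = pd.cut(toy["RIDAGEYR"], [30, 40])

# include_lowest=True extends the first bin so that age 30 is included
cut_inclusive = pd.cut(toy["RIDAGEYR"], [30, 40], include_lowest=True)

print("Rows binned by default cut:  ", cut_default.notna().sum())
print("Rows binned with include_lowest:", cut_inclusive.notna().sum())
```

The crosstab layout puts the male, female, and combined counts side by side, which makes the gender comparison easier to read than three separate printouts.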