# NHANES Data Exploration using Python

We will start by importing the pandas library.

In [1]:
import pandas as pd

Now we will load the *nhanes* data using the `read_csv` function.

In [2]:
file = 'nhanes_2015_2016.csv'
df = pd.read_csv(file)
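
As a minimal sketch of what `read_csv` does, the cell below parses a tiny in-memory CSV instead of the real file. The three rows and their values are made up for illustration, and `usecols` is shown only as one optional way to read a subset of columns.

```python
import io

import pandas as pd

# Hypothetical three-row excerpt standing in for nhanes_2015_2016.csv
csv_text = """SEQN,RIAGENDR,RIDAGEYR,DMDEDUC2
1,1,62,5
2,1,53,3
3,2,78,
"""

# read_csv accepts a file path or any file-like object;
# usecols restricts parsing to the listed columns
sample = pd.read_csv(io.StringIO(csv_text), usecols=["SEQN", "RIDAGEYR", "DMDEDUC2"])

# The empty DMDEDUC2 field in the last row is read as NaN
print(sample)
```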

Get the number of rows and columns in the dataframe with the `shape` property.

In [3]:
df.shape

(5735, 28)
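
`shape` is a plain `(rows, columns)` tuple, so it can be unpacked directly. A small sketch on a made-up frame (the values are not from the survey):

```python
import pandas as pd

# Toy frame with hypothetical values
toy = pd.DataFrame({"SEQN": [1, 2, 3], "RIDAGEYR": [62, 53, 78]})

# Unpack the tuple into named counts
n_rows, n_cols = toy.shape
print(f"{n_rows} rows, {n_cols} columns")

# len() of a DataFrame is its row count
print(len(toy) == n_rows)
```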

### Exploring the Contents

To show the column names, use the `columns` property.

In [6]:
df.columns

Index(['SEQN', 'ALQ101', 'ALQ110', 'ALQ130', 'SMQ020', 'RIAGENDR', 'RIDAGEYR',
       'RIDRETH1', 'DMDCITZN', 'DMDEDUC2', 'DMDMARTL', 'DMDHHSIZ', 'WTINT2YR',
       'SDMVPSU', 'SDMVSTRA', 'INDFMPIR', 'BPXSY1', 'BPXDI1', 'BPXSY2',
       'BPXDI2', 'BMXWT', 'BMXHT', 'BMXBMI', 'BMXLEG', 'BMXARML', 'BMXARMC',
       'BMXWAIST', 'HIQ210'],
      dtype='object')
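
The `columns` property returns an `Index`, which supports membership tests and conversion to a list. The sketch below also renames two columns; the readable labels (`gender`, `age_years`) are our own hypothetical choices, not names defined by the dataset.

```python
import pandas as pd

# Toy frame with hypothetical values
toy = pd.DataFrame({"SEQN": [1, 2], "RIAGENDR": [1, 2], "RIDAGEYR": [62, 53]})

# Membership test and conversion to a plain Python list
print("RIDAGEYR" in toy.columns)
print(toy.columns.tolist())

# rename returns a new frame and leaves toy unchanged
labels = {"RIAGENDR": "gender", "RIDAGEYR": "age_years"}
readable = toy.rename(columns=labels)
print(readable.columns.tolist())
```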

To get the data type of each column, use the `dtypes` property.

In [7]:
df.dtypes

SEQN          int64
ALQ101      float64
ALQ110      float64
ALQ130      float64
SMQ020        int64
RIAGENDR      int64
RIDAGEYR      int64
RIDRETH1      int64
DMDCITZN    float64
DMDEDUC2    float64
DMDMARTL    float64
DMDHHSIZ      int64
WTINT2YR    float64
SDMVPSU       int64
SDMVSTRA      int64
INDFMPIR    float64
BPXSY1      float64
BPXDI1      float64
BPXSY2      float64
BPXDI2      float64
BMXWT       float64
BMXHT       float64
BMXBMI      float64
BMXLEG      float64
BMXARML     float64
BMXARMC     float64
BMXWAIST    float64
HIQ210      float64
dtype: object
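
Many of these columns hold integer codes yet are stored as `float64`. A likely explanation is that they contain missing values, since the default integer dtype cannot represent `NaN`. A small sketch with made-up values:

```python
import numpy as np
import pandas as pd

complete = pd.Series([1, 2, 3])
with_gap = pd.Series([1, np.nan, 3])

# Without missing values the integers stay integers;
# a single NaN forces the whole Series to float64
print(complete.dtype)
print(with_gap.dtype)

# select_dtypes picks columns by dtype (toy frame, hypothetical values)
toy = pd.DataFrame({"SEQN": [1, 2, 3], "ALQ101": [1.0, np.nan, 2.0]})
float_cols = toy.select_dtypes(include="float64").columns.tolist()
print(float_cols)
```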

### Slicing a data set

Extract the educational attainment column (`DMDEDUC2`) in four equivalent ways.

In [13]:
w = df['DMDEDUC2']          # bracket access by column name
x = df.loc[:, 'DMDEDUC2']   # label-based: all rows, one column
y = df.DMDEDUC2             # attribute access
z = df.iloc[:,9]            # position-based: DMDEDUC2 is column 9

In [14]:
w,x,y,z

(0       5.0
 1       3.0
 2       3.0
 3       5.0
 4       4.0
        ... 
 5730    3.0
 5731    5.0
 5732    4.0
 5733    1.0
 5734    5.0
 Name: DMDEDUC2, Length: 5735, dtype: float64,
 0       5.0
 1       3.0
 2       3.0
 3       5.0
 4       4.0
        ... 
 5730    3.0
 5731    5.0
 5732    4.0
 5733    1.0
 5734    5.0
 Name: DMDEDUC2, Length: 5735, dtype: float64,
 0       5.0
 1       3.0
 2       3.0
 3       5.0
 4       4.0
        ... 
 5730    3.0
 5731    5.0
 5732    4.0
 5733    1.0
 5734    5.0
 Name: DMDEDUC2, Length: 5735, dtype: float64,
 0       5.0
 1       3.0
 2       3.0
 3       5.0
 4       4.0
        ... 
 5730    3.0
 5731    5.0
 5732    4.0
 5733    1.0
 5734    5.0
 Name: DMDEDUC2, Length: 5735, dtype: float64)
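
To check that bracket, `loc`, attribute, and `iloc` access can all select the same column, here is a sketch on a toy frame with made-up values. Note the comma in `loc[:, 'DMDEDUC2']`: the part before it selects rows, the part after it selects columns.

```python
import pandas as pd

# Toy frame with hypothetical values
toy = pd.DataFrame({"SEQN": [1, 2, 3], "DMDEDUC2": [5.0, 3.0, 4.0]})

by_bracket = toy["DMDEDUC2"]
by_loc = toy.loc[:, "DMDEDUC2"]  # all rows, one column label
by_attr = toy.DMDEDUC2
by_iloc = toy.iloc[:, 1]         # DMDEDUC2 sits at position 1 in this toy frame

# Series.equals compares values, index, and dtype
same = (
    by_bracket.equals(by_loc)
    and by_bracket.equals(by_attr)
    and by_bracket.equals(by_iloc)
)
print(same)
```

Selecting by name is usually more robust than `iloc`, because a positional index silently points at a different column if the column order changes.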

Get the maximum value of educational attainment.

In [17]:
df.DMDEDUC2.max()

9.0
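
A maximum of 9.0 is probably not an education level. Assuming the NHANES codebook uses 7 for "Refused" and 9 for "Don't know" in `DMDEDUC2` (worth verifying against the official documentation), those codes can be treated as missing before summarizing. A sketch with made-up values:

```python
import numpy as np
import pandas as pd

# Hypothetical responses, including the assumed special codes 7 and 9
educ = pd.Series([5.0, 3.0, 9.0, 1.0, np.nan, 7.0], name="DMDEDUC2")

# Map the assumed non-response codes to NaN; max() skips NaN by default
cleaned = educ.replace({7.0: np.nan, 9.0: np.nan})

print(educ.max())     # 9.0, a special code rather than a level
print(cleaned.max())  # 5.0, the highest real level in this toy data
```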

Check the types of the objects used above.

In [18]:
print(type(df))
print(type(df['DMDEDUC2']))
print(type(df.DMDEDUC2))

<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
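
A related detail: single brackets return a `Series`, while a list of column names (double brackets) returns a `DataFrame`, even for one column. A sketch on a toy frame with made-up values:

```python
import pandas as pd

# Toy frame with hypothetical values
toy = pd.DataFrame({"SEQN": [1, 2], "DMDEDUC2": [5.0, 3.0]})

as_series = toy["DMDEDUC2"]    # one-dimensional Series
as_frame = toy[["DMDEDUC2"]]   # two-dimensional DataFrame with one column

print(type(as_series).__name__, as_series.shape)
print(type(as_frame).__name__, as_frame.shape)
```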


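To get one row of the dataframe, use `iloc` with the row position. The row comes back as a Series, so the integer columns are displayed as floats.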

In [22]:
df.iloc[3,:]

SEQN         83735.0
ALQ101           2.0
ALQ110           1.0
ALQ130           1.0
SMQ020           2.0
RIAGENDR         2.0
RIDAGEYR        56.0
RIDRETH1         3.0
DMDCITZN         1.0
DMDEDUC2         5.0
DMDMARTL         6.0
DMDHHSIZ         1.0
WTINT2YR    102718.0
SDMVPSU          1.0
SDMVSTRA       131.0
INDFMPIR         5.0
BPXSY1         132.0
BPXDI1          72.0
BPXSY2         134.0
BPXDI2          68.0
BMXWT          109.8
BMXHT          160.9
BMXBMI          42.4
BMXLEG          38.5
BMXARML         37.7
BMXARMC         38.3
BMXWAIST       110.1
HIQ210           2.0
Name: 3, dtype: float64
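
A single row is returned as a Series with one common dtype, which is why integer columns such as `SEQN` show up as floats here. Passing a list of positions keeps the result a DataFrame and preserves the per-column dtypes. A sketch with made-up values:

```python
import pandas as pd

# Toy frame mixing an integer and a float column (hypothetical values)
toy = pd.DataFrame({"SEQN": [1, 2], "BMXWT": [70.5, 82.0]})

row = toy.iloc[0, :]          # Series: values are upcast to a shared dtype
row_frame = toy.iloc[[0], :]  # DataFrame: each column keeps its own dtype

print(row.dtype)
print(row_frame.dtypes)
```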

### Missing values

To check for missing values, we use the functions `isnull` and `notnull`. Each returns a boolean Series, so `sum` counts the `True` entries.

In [29]:
pd.isnull(df.DMDEDUC2).sum()

261

In [25]:
pd.notnull(df.DMDEDUC2).sum()

5474
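
The two counts add up to the number of rows (261 + 5474 = 5735), as expected. The sketch below repeats that check on a toy frame with made-up values and also counts missing values per column using the method forms `isna` and `notna`.

```python
import numpy as np
import pandas as pd

# Toy frame with hypothetical values
toy = pd.DataFrame({
    "DMDEDUC2": [5.0, np.nan, 3.0, np.nan],
    "RIDAGEYR": [62, 53, 78, 56],
})

n_missing = toy["DMDEDUC2"].isna().sum()
n_present = toy["DMDEDUC2"].notna().sum()
print(n_missing + n_present == len(toy))

# Missing values per column, and the missing share of one column
per_column = toy.isna().sum()
share = toy["DMDEDUC2"].isna().mean()
print(per_column)
print(share)
```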