#### Unit 1 - Example 1 (NHANES)

The National Health and Nutrition Examination Survey (NHANES) is a survey conducted annually by the US National Center for Health Statistics (NCHS). The original data use a survey design that oversamples certain subpopulations, but the version used here has been reweighted to undo the effects of oversampling, so it can be treated as if it were a simple random sample from the American population.

The following questions will be explored with the NHANES data:

- At what age do Americans seem to reach full adult height?

- What proportion of Americans age 25 or older have a college degree?

- What is the relationship between education level and income?

- How much more likely is it that someone who is *not* physically active has diabetes, compared to someone who is active?

The reweighted NHANES data are available from the `NHANES` package. To view the complete list of study variables and their descriptions, open the NHANES documentation page with `?NHANES`.

For convenience, descriptions of the variables used in this lab exercise are included below.

`Age`: age in years at screening. Subjects 80 years or older were recorded as 80 years of age.

`Education`: highest educational level of study participant, reported for participants aged 20 years or older. Recorded as either `8th Grade`, `9 - 11th Grade`, `High School`, `Some College`, or `College Grad`.

`Poverty`: a ratio of family income to poverty guidelines. Smaller numbers indicate more poverty; i.e., a number below 1 indicates income below the poverty level.

`Weight`: weight, measured in kilograms.

`Height`: standing height, measured in centimeters.

`Diabetes`: `Yes` if the participant was told by a health professional that they have diabetes, `No` otherwise.

`PhysActive`: coded `Yes` if the participant does moderate or vigorous-intensity sports, fitness, or recreational activities; `No` otherwise. Reported for participants 12 years or older.

We load the data and print the first six rows as follows:

In [None]:
library(NHANES)  # stops with an error if the package is not installed
head(NHANES)     # print the first six rows

Remember to load packages that you might need for this example!

In [None]:
library(dplyr)    # data manipulation, e.g. sample_n()
library(ggplot2)  # graphical summaries

#### Question 1.

a) Describe in words the distribution of ages for the study participants.

b) Using numerical and graphical summaries, describe the distribution of heights among study participants, with height expressed in inches. Note that 1 centimeter is approximately 0.39 inches.

c) Use the following code to draw a random sample of 200 participants from the entire dataset. Using the random sample, `nhanes.samp`, investigate the age at which people generally reach their adult height. Is it possible to do the same for weight? Why or why not?

In [None]:
set.seed(5011)
nhanes.samp <- sample_n(NHANES, 200, replace = FALSE)

#### Question 2.

a) Calculate the median and interquartile range of the distribution of the variable `Poverty`. Write a sentence explaining the median in the context of these data.



b) Compare the distribution of `Poverty` across each group in `Education` among adults (defined as individuals 25 years of age or older). Describe any trends or interesting observations.
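Survey variables such as `Poverty` may contain missing values, which affects how the summaries in part a) are computed. The sketch below uses a small made-up vector (`poverty.toy` is hypothetical, not NHANES data) to show how `median()` and `IQR()` behave when `NA` values are present.

```r
# Hypothetical income-to-poverty ratios, invented for illustration;
# NA marks a participant with no recorded value.
poverty.toy <- c(0.5, 1.2, NA, 2.0, 3.1, 5.0, NA, 4.4)

# na.rm = TRUE drops missing values before computing each summary.
# Without it, median() returns NA and IQR() stops with an error.
median(poverty.toy, na.rm = TRUE)
IQR(poverty.toy, na.rm = TRUE)
```

The same `na.rm = TRUE` argument should carry over to a column of the real data, assuming that column is numeric.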

#### Question 3.

a) What proportion of Americans at least 25 years of age are college graduates?

b) What proportion of Americans with at least a high school degree are college graduates?

#### Question 4.

a) Construct a two-way table, with `PhysActive` as the row variable and `Diabetes` as the column variable. Among participants who are not physically active, what proportion have diabetes? What proportion of physically active participants have diabetes?

b) In this context, relative risk is the ratio of the proportion of participants who have diabetes among those who are not physically active to the proportion of participants with diabetes among those physically active. Relative risks greater than 1 indicate that people who are not physically active seem to be at a higher risk for diabetes than physically active people. Calculate the relative risk of diabetes for the participants.

From these calculations, is it possible to conclude that being physically active reduces one's chance of becoming diabetic?
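The relative risk defined in part b) can be illustrated with a small two-way table. The counts below are invented purely for illustration and are not NHANES results; the object names `counts`, `row.props`, and `rel.risk` are likewise hypothetical. This is a minimal sketch of the arithmetic, not a solution to the exercise.

```r
# Hypothetical counts, invented for illustration only (not NHANES results).
counts <- matrix(
  c( 80, 20,   # not active: 80 without diabetes, 20 with
    180, 20),  # active:    180 without diabetes, 20 with
  nrow = 2, byrow = TRUE,
  dimnames = list(PhysActive = c("No", "Yes"),
                  Diabetes   = c("No", "Yes"))
)

# Row proportions: each row sums to 1, so the "Yes" column gives the
# proportion with diabetes within each activity group.
row.props <- prop.table(counts, margin = 1)

# Relative risk: proportion with diabetes among the not active,
# divided by the proportion with diabetes among the active.
rel.risk <- row.props["No", "Yes"] / row.props["Yes", "Yes"]
rel.risk
```

With these made-up counts, the proportions are 20/100 = 0.2 and 20/200 = 0.1, so the relative risk is 2: the not-active group in the toy table has twice the proportion with diabetes of the active group.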