
Use kable for nicer tables
rdpeng committed Sep 4, 2020
1 parent 95d36ad commit 4d67460
Showing 6 changed files with 12 additions and 9 deletions.
Binary file modified manuscript/images/inferencepred-unnamed-chunk-10-1.png
Binary file modified manuscript/images/inferencepred-unnamed-chunk-11-1.png
Binary file modified manuscript/images/inferencepred-unnamed-chunk-3-1.png
Binary file modified manuscript/images/inferencepred-unnamed-chunk-4-1.png
Binary file modified manuscript/images/inferencepred-unnamed-chunk-5-1.png
21 changes: 12 additions & 9 deletions manuscript/inferencepred.Rmd
@@ -1,10 +1,11 @@
# Inference vs. Prediction: Implications for Modeling Strategy

```{r,message=FALSE,warning=FALSE,echo=FALSE}
```{r,include=FALSE}
knitr::opts_chunk$set(echo = FALSE, message = FALSE, warning = FALSE,
comment = NA, fig.path = "images/inferencepred-")
library(dplyr)
library(ggplot2)
library(tidyverse)
library(broom)
library(knitr)
```

Understanding whether you're answering an inferential question or a prediction question is important because the type of question you're answering can greatly influence the modeling strategy you pursue. If you do not clearly understand which type of question you are asking, you may end up using the wrong modeling approach and ultimately draw the wrong conclusions from your data. The purpose of this chapter is to show you what can happen when you confuse one type of question for the other.
@@ -60,7 +61,8 @@ There doesn't appear to be much going on there, and a simple linear regression m

```{r}
fit0 <- lm(log(death) ~ pm10tmean, data = ny)
-summary(fit0)$coefficients
+summary(fit0)$coefficients %>%
+kable(digits = 6)
```

In the table of coefficients above, the coefficient for `pm10tmean` is quite small and its standard error is relatively large. Effectively, this estimate of the association is zero.
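The `kable(digits = ...)` calls added in this commit only change how the coefficient matrix is displayed; the numbers themselves are unchanged (`x %>% kable(digits = 4)` is the same call as `kable(x, digits = 4)`). As a minimal sketch of what "effectively zero" means in terms of the table's columns, the R code below uses simulated data, not the chapter's `ny` data; the names `sim`, `x`, and `y` are hypothetical. The predictor has no true effect, the `t value` column is simply the estimate divided by its standard error, and the 95% confidence interval will typically straddle zero.

```r
library(knitr)

set.seed(1)
n <- 500
# Hypothetical simulated data: x has no true association with y
sim <- data.frame(x = rnorm(n))
sim$y <- rnorm(n)

fit <- lm(y ~ x, data = sim)
coefs <- summary(fit)$coefficients

# Render the coefficient matrix as a table, rounded to 4 digits
tab <- kable(coefs, digits = 4)
print(tab)

# The t value is the estimate divided by its standard error
t_manual <- coefs["x", "Estimate"] / coefs["x", "Std. Error"]

# 95% confidence interval for the slope of x
ci <- confint(fit)["x", ]
print(ci)
```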
@@ -72,7 +74,8 @@ Here are the results for a second model, which includes both PM10 and season. Se

```{r}
fit1 <- lm(log(death) ~ season + pm10tmean, data = ny)
-summary(fit1)$coefficients
+summary(fit1)$coefficients %>%
+kable(digits = 4)
```

Notice now that the `pm10tmean` coefficient is quite a bit larger than before and its `t value` is large, suggesting a strong association. How is this possible?
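One mechanism that can produce this pattern is confounding: a variable that is related to both the exposure and the outcome can mask an association until it is included in the model. The R sketch below simulates that situation with made-up data. The binary indicator `s` is a hypothetical stand-in for season, and the coefficients (4, 0.5, -2) are arbitrary assumptions chosen for illustration, not estimates from the `ny` data. With these values the slope of `x` is about 0.1 in expectation when `s` is omitted and about 0.5 (the true value) when `s` is included.

```r
set.seed(2)
n <- 5000
# Hypothetical two-level "season" indicator acting as a confounder
s <- rbinom(n, size = 1, prob = 0.5)
# The exposure x tends to be higher when s = 1 ...
x <- 4 * s + rnorm(n)
# ... while the outcome rises with x but falls with s
y <- 0.5 * x - 2 * s + rnorm(n)

fit_marginal <- lm(y ~ x)
fit_adjusted <- lm(y ~ x + factor(s))

# Slope of x ignoring the confounder
b_marginal <- coef(fit_marginal)[["x"]]
# Slope of x after adjusting for the confounder
b_adjusted <- coef(fit_adjusted)[["x"]]

c(marginal = b_marginal, adjusted = b_adjusted)
```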
@@ -86,7 +89,8 @@ In the following model we include temperature (`tmpd`) and dew point temperature

```{r}
fit2 <- lm(log(death) ~ date + season + tmpd + dptp + pm10tmean, data = ny)
-summary(fit2)$coefficients
+summary(fit2)$coefficients %>%
+kable(digits = 4)
```

Notice that the `pm10tmean` coefficient is even bigger than it was in the previous model. There appears to still be an association between PM10 and mortality. The effect size is small, but we will discuss that later.
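Because these models use `log(death)` as the outcome, a coefficient is easier to judge after converting it to a percent change. For a coefficient `beta` and an increase of `delta` units in the predictor, the log-scale change is `beta * delta`, which corresponds to multiplying deaths (on the geometric-mean scale) by `exp(beta * delta)`. The helper `pct_change()` below is a hypothetical name, `delta = 10` is an arbitrary default, and the coefficient value in the example is made up rather than taken from the tables above.

```r
# Percent change in the outcome implied by a coefficient `beta` from a model
# of log(outcome), for an increase of `delta` units in the predictor.
# On the log scale the change is beta * delta, i.e. a multiplicative
# change of exp(beta * delta) on the original scale.
pct_change <- function(beta, delta = 10) {
  100 * (exp(beta * delta) - 1)
}

# Hypothetical coefficient value, not taken from the chapter's tables
pct_change(0.001)   # about 1.005, i.e. roughly a 1% increase per 10 units
```

For small values of `beta * delta`, the result is close to `100 * beta * delta`, which is why small log-scale coefficients are often read directly as approximate percent changes.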
@@ -95,7 +99,8 @@ Finally, another class of potential confounders includes other pollutants. Befor

```{r}
fit3 <- lm(log(death) ~ date + season + tmpd + dptp + no2tmean + pm10tmean, data = ny)
-summary(fit3)$coefficients
+summary(fit3)$coefficients %>%
+kable(digits = 4)
```

Notice in the table of coefficients that the `no2tmean` coefficient is similar in magnitude to the `pm10tmean` coefficient, although its `t value` is not as large. The `pm10tmean` coefficient appears to be statistically significant, but it is somewhat smaller in magnitude now.
@@ -139,8 +144,6 @@ Notice that the variable `pm10tmean` comes near the bottom of the list in terms
However, just because PM10 is not a strong predictor of mortality doesn't mean that it does not have a relevant association with mortality. Given the tradeoffs that have to be made when developing a prediction model, PM10 is not high on the list of predictors that we would include--we simply cannot include every predictor.
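The gap between "statistically significant" and "useful for prediction" can be reproduced with simulated data. In the R sketch below (hypothetical variables `z`, `x`, `y`; all coefficients are arbitrary assumptions), `z` is a strong predictor and `x` has a real but tiny effect. With a large sample, the `x` coefficient is estimated precisely and has a very small p-value, yet adding `x` barely changes the proportion of variance explained. In-sample R-squared is used here only as a crude proxy; a more careful prediction assessment would use held-out data or cross-validation.

```r
set.seed(3)
n <- 20000
# Hypothetical data: z is a strong predictor, x has a real but tiny effect
z <- rnorm(n)
x <- rnorm(n)
y <- 2 * z + 0.05 * x + rnorm(n)

fit_small <- lm(y ~ z)
fit_full  <- lm(y ~ z + x)

# Inference: the x coefficient is precisely estimated and "significant"
p_x <- summary(fit_full)$coefficients["x", "Pr(>|t|)"]

# Prediction: adding x barely changes the proportion of variance explained
r2_gain <- summary(fit_full)$r.squared - summary(fit_small)$r.squared

c(p_value = p_x, r2_gain = r2_gain)
```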




## Summary

In any data analysis, you want to ask yourself "Am I asking an inferential question or a prediction question?" This should be cleared up *before* any data are analyzed, as the answer to the question can guide the entire modeling strategy. In the example here, if we had decided on a prediction approach, we might have erroneously thought that PM10 was not relevant to mortality. However, the inferential approach suggested a statistically significant association with mortality. Framing the question right, and applying the appropriate modeling strategy, can play a large role in the kinds of conclusions you draw from the data.
