Add FAQ section header
rdpeng committed Sep 4, 2020
1 parent 4d67460 commit eb7c315
Showing 7 changed files with 45 additions and 37 deletions.
Binary file modified manuscript/images/inferencepred-unnamed-chunk-10-1.png
Binary file modified manuscript/images/inferencepred-unnamed-chunk-11-1.png
Binary file modified manuscript/images/inferencepred-unnamed-chunk-3-1.png
Binary file modified manuscript/images/inferencepred-unnamed-chunk-4-1.png
Binary file modified manuscript/images/inferencepred-unnamed-chunk-5-1.png
7 changes: 7 additions & 0 deletions manuscript/inferencepred.Rmd
@@ -1,3 +1,8 @@
---
output:
  html_document:
    keep_md: yes
---
# Inference vs. Prediction: Implications for Modeling Strategy

```{r,include=FALSE}
@@ -143,6 +148,8 @@ Notice that the variable `pm10tmean` comes near the bottom of the list in terms

However, just because PM10 is not a strong predictor of mortality doesn't mean that it does not have a relevant association with mortality. Given the tradeoffs that have to be made when developing a prediction model, PM10 is not high on the list of predictors that we would include--we simply cannot include every predictor.

## Frequently Asked Questions


## Summary

75 changes: 38 additions & 37 deletions manuscript/inferencepred.md
@@ -1,3 +1,8 @@
---
output:
  html_document:
    keep_md: yes
---
# Inference vs. Prediction: Implications for Modeling Strategy


@@ -39,11 +44,10 @@ Let's take a look at the bivariate association between PM10 and mortality. Here
There doesn't appear to be much going on there, and a simple linear regression model of the log of daily mortality and PM10 seems to confirm that.


```
                 Estimate   Std. Error      t value  Pr(>|t|)
(Intercept) 5.08884308354 0.0069353779 733.75138151 0.0000000
pm10tmean   0.00004033446 0.0006913941   0.05833786 0.9535247
```
|            | Estimate| Std. Error|    t value| Pr(>&#124;t&#124;)|
|:-----------|--------:|----------:|----------:|------------------:|
|(Intercept) | 5.088843|   0.006935| 733.751382|           0.000000|
|pm10tmean   | 0.000040|   0.000691|   0.058338|           0.953525|

In the table of coefficients above, the coefficient for `pm10tmean` is quite small and its standard error is relatively large. Effectively, this estimate of the association is zero.
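To make "effectively zero" concrete, here is a minimal Python sketch that recomputes the `t value` for `pm10tmean` from the estimate and standard error printed in the table above. The rough 95% interval uses a normal approximation (the 1.96 multiplier), which is our assumption for illustration and not part of the original analysis.

```python
# Values copied from the pm10tmean row of the first coefficient table.
estimate = 0.00004033446
std_error = 0.0006913941

# The t value is the estimate divided by its standard error.
t_value = estimate / std_error

# Rough 95% interval under a normal approximation (an assumption made
# here for illustration; the original analysis does not report one).
lower = estimate - 1.96 * std_error
upper = estimate + 1.96 * std_error

print(f"t value: {t_value:.5f}")
print(f"approximate 95% interval: ({lower:.6f}, {upper:.6f})")
```

The interval comfortably covers zero, which is consistent with reading this estimate as no detectable association.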

@@ -53,14 +57,13 @@ Here are the results for a second model, which includes both PM10 and season. Se



```
                Estimate   Std. Error    t value     Pr(>|t|)
(Intercept)  5.166484285 0.0112629532 458.714886 0.000000e+00
seasonQ2    -0.109271301 0.0166902948  -6.546996 3.209291e-10
seasonQ3    -0.155503242 0.0169729148  -9.161847 1.736346e-17
seasonQ4    -0.060317619 0.0167189714  -3.607735 3.716291e-04
pm10tmean    0.001499111 0.0006156902   2.434847 1.558453e-02
```
|            | Estimate| Std. Error|  t value| Pr(>&#124;t&#124;)|
|:-----------|--------:|----------:|--------:|------------------:|
|(Intercept) |   5.1665|     0.0113| 458.7149|             0.0000|
|seasonQ2    |  -0.1093|     0.0167|  -6.5470|             0.0000|
|seasonQ3    |  -0.1555|     0.0170|  -9.1618|             0.0000|
|seasonQ4    |  -0.0603|     0.0167|  -3.6077|             0.0004|
|pm10tmean   |   0.0015|     0.0006|   2.4348|             0.0156|

Notice now that the `pm10tmean` coefficient is quite a bit larger than before and its `t value` is large, suggesting a strong association. How is this possible?
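One way a coefficient can grow once a covariate is added is confounding: the covariate is related to both the exposure and the outcome and masks the within-group relationship. The Python sketch below illustrates this with a tiny made-up data set (the numbers, the two-level `season` variable, and the helper names `slope` and `center_within` are all hypothetical, not the book's data or code). Adjusting for the grouping variable is done by centering within groups, which gives the same slope as including the group as a factor in a linear regression.

```python
from statistics import mean


def slope(x, y):
    """Least-squares slope from a simple linear regression of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


def center_within(values, groups):
    """Subtract each group's mean from the values in that group."""
    group_means = {
        g: mean(v for v, h in zip(values, groups) if h == g)
        for g in set(groups)
    }
    return [v - group_means[g] for v, g in zip(values, groups)]


# Made-up data: "summer" has higher exposure but a lower outcome baseline.
season = ["winter"] * 3 + ["summer"] * 3
exposure = [0, 1, 2, 3, 4, 5]
outcome = [10, 11, 12, 9, 10, 11]

# Ignoring season, the slope is close to zero.
unadjusted = slope(exposure, outcome)

# Adjusting for season recovers the within-season slope.
adjusted = slope(center_within(exposure, season),
                 center_within(outcome, season))

print(f"unadjusted slope: {unadjusted:.4f}")
print(f"season-adjusted slope: {adjusted:.4f}")
```

In this toy example the pooled slope is nearly flat because the between-season difference cancels the within-season trend, which is the same qualitative pattern as the jump in the `pm10tmean` coefficient between the first two models.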

@@ -72,35 +75,33 @@ In the following model we include temperature (`tmpd`) and dew point temperature



```
                  Estimate    Std. Error    t value     Pr(>|t|)
(Intercept)  5.62066568788 0.16471183741 34.1242365 1.851690e-96
date        -0.00002984198 0.00001315212 -2.2689856 2.411521e-02
seasonQ2    -0.05805970053 0.02299356287 -2.5250415 1.218288e-02
seasonQ3    -0.07655519887 0.02904104658 -2.6361033 8.906912e-03
seasonQ4    -0.03154694305 0.01832712585 -1.7213252 8.641910e-02
tmpd        -0.00295931276 0.00128835065 -2.2969777 2.244054e-02
dptp         0.00068342228 0.00103489541  0.6603781 5.096144e-01
pm10tmean    0.00237049992 0.00065856022  3.5995189 3.837886e-04
```
|            | Estimate| Std. Error| t value| Pr(>&#124;t&#124;)|
|:-----------|--------:|----------:|-------:|------------------:|
|(Intercept) |   5.6207|     0.1647| 34.1242|             0.0000|
|date        |   0.0000|     0.0000| -2.2690|             0.0241|
|seasonQ2    |  -0.0581|     0.0230| -2.5250|             0.0122|
|seasonQ3    |  -0.0766|     0.0290| -2.6361|             0.0089|
|seasonQ4    |  -0.0315|     0.0183| -1.7213|             0.0864|
|tmpd        |  -0.0030|     0.0013| -2.2970|             0.0224|
|dptp        |   0.0007|     0.0010|  0.6604|             0.5096|
|pm10tmean   |   0.0024|     0.0007|  3.5995|             0.0004|

Notice that the `pm10tmean` coefficient is even bigger than it was in the previous model. There appears to still be an association between PM10 and mortality. The effect size is small, but we will discuss that later.
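Because the outcome in these models is the log of daily mortality, a coefficient can be read as an approximate relative change in mortality. The short Python sketch below converts the `pm10tmean` estimate from the table above into a percent change; the 10-unit increment is an arbitrary choice for illustration, since the units of `pm10tmean` are not stated in this section.

```python
import math

# pm10tmean coefficient from the model with date, season, tmpd, and dptp.
beta = 0.00237049992

# For a log-outcome model, exp(beta * delta) - 1 is the relative change
# in the outcome for a delta-unit increase in the predictor.
pct_per_unit = (math.exp(beta) - 1) * 100
pct_per_10_units = (math.exp(beta * 10) - 1) * 100  # 10 is an illustrative increment

print(f"percent change per 1-unit increase: {pct_per_unit:.3f}%")
print(f"percent change per 10-unit increase: {pct_per_10_units:.3f}%")
```

A change of a fraction of a percent per unit is small in relative terms, which matches the description of the effect size as small.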

Finally, another class of potential confounders includes other pollutants. Before we place blame on PM10 as a harmful pollutant, it's important that we examine whether there might be another pollutant that can explain what we're observing. NO2 is a good candidate because it shares some of the same sources as PM10 and is known to be related to mortality. Let's see what happens when we include that in the model.


```
                  Estimate    Std. Error    t value     Pr(>|t|)
(Intercept)  5.61378604085 0.16440280471 34.1465345 2.548704e-96
date        -0.00002973484 0.00001312231 -2.2659756 2.430503e-02
seasonQ2    -0.05143935218 0.02338034983 -2.2001105 2.871069e-02
seasonQ3    -0.06569205605 0.02990520457 -2.1966764 2.895825e-02
seasonQ4    -0.02750381423 0.01849165119 -1.4873639 1.381739e-01
tmpd        -0.00296833498 0.00128542535 -2.3092239 2.174371e-02
dptp         0.00070306996 0.00103262057  0.6808599 4.965877e-01
no2tmean     0.00126556418 0.00086229169  1.4676753 1.434444e-01
pm10tmean    0.00174189857 0.00078432327  2.2208937 2.725117e-02
```
|            | Estimate| Std. Error| t value| Pr(>&#124;t&#124;)|
|:-----------|--------:|----------:|-------:|------------------:|
|(Intercept) |   5.6138|     0.1644| 34.1465|             0.0000|
|date        |   0.0000|     0.0000| -2.2660|             0.0243|
|seasonQ2    |  -0.0514|     0.0234| -2.2001|             0.0287|
|seasonQ3    |  -0.0657|     0.0299| -2.1967|             0.0290|
|seasonQ4    |  -0.0275|     0.0185| -1.4874|             0.1382|
|tmpd        |  -0.0030|     0.0013| -2.3092|             0.0217|
|dptp        |   0.0007|     0.0010|  0.6809|             0.4966|
|no2tmean    |   0.0013|     0.0009|  1.4677|             0.1434|
|pm10tmean   |   0.0017|     0.0008|  2.2209|             0.0273|

Notice in the table of coefficients that the `no2tmean` coefficient is similar in magnitude to the `pm10tmean` coefficient, although its `t value` is not as large. The `pm10tmean` coefficient appears to be statistically significant, but it is somewhat smaller in magnitude now.

@@ -123,7 +124,7 @@ Notice that the variable `pm10tmean` comes near the bottom of the list in terms

However, just because PM10 is not a strong predictor of mortality doesn't mean that it does not have a relevant association with mortality. Given the tradeoffs that have to be made when developing a prediction model, PM10 is not high on the list of predictors that we would include--we simply cannot include every predictor.


## Frequently Asked Questions


## Summary