Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

translate point estimates to quantile #4

Closed
vpnagraj opened this issue Dec 15, 2020 · 7 comments
Closed

translate point estimates to quantile #4

vpnagraj opened this issue Dec 15, 2020 · 7 comments

Comments

@vpnagraj
Copy link
Contributor

the covid 19 forecast hub requires point estimates and pre-determined quantile values for each target.

the quantile values required are enumerated in the technical readme:

https://github.com/reichlab/covid19-forecast-hub/blob/master/data-processed/README.md#quantile

once we decide on an initial model / targets to pursue, we will need establish how to go about translating a point estimate to quantiles.

@vpnagraj
Copy link
Contributor Author

a little more detail on one possible approach here ...

may be a naive way of thinking about it, but what if we used the point estimate from a model prediction along with its standard error to seed a simulated distribution of predicted values? to do so, we would have to assume the predicted values follow a certain distribution ... but from there we could cut the simulated values at relevant quantiles

below is some code to do that with a linear model of faithful data under the assumption that the predicted value follows a normal distribution:

## fit the model
eruptfit <- lm(eruptions ~ waiting, data = faithful)

## get the prediction interval
## from prediction interval calculate standard error
pred <- 
  predict(eruptfit, data.frame(waiting = 80), level = 0.95, interval = "predict") %>%
  as.data.frame() %>%
  dplyr::mutate(se = (upr - fit) / qt(0.975, nrow(faithful)))

## sidebar ... is this standard error formula correct?
## see https://stackoverflow.com/questions/38109501/how-does-predict-lm-compute-confidence-interval-and-prediction-interval

## what is standard error for CI?
tmp <- predict(eruptfit, data.frame(waiting = 80), level = 0.95, interval = "predict", se.fit = TRUE)
tmp$se.fit
[1] 0.03625177
## use standard error for CI to calculate standard error for PI
sqrt(tmp$se.fit ^ 2 + tmp$residual.scale ^ 2)
[1] 0.4978346
## what whas the se for the PI per our calculation?
pred$se
[1] 0.4978511
## assume distribution of forecast values follows normal distribution ...
## mean = prediction
## sd = standard error of prediction
## "simulate" values by generate random normal distribution
sims <- rnorm(1000, mean = pred$fit, sd = pred$se)

## cut the simulated values into the quantiles of interest
quantile(sims, probs = c(0.01, 0.025, seq(0.05, 0.95, by = 0.05), 0.975, 0.99))
      1%     2.5%       5%      10%      15%      20%      25%      30%      35%      40%      45%      50%      55%      60%      65% 
3.001737 3.149908 3.346475 3.560254 3.672461 3.768598 3.843439 3.925468 3.986631 4.069986 4.132211 4.175931 4.251500 4.288355 4.344637 
     70%      75%      80%      85%      90%      95%    97.5%      99% 
4.407399 4.499772 4.603745 4.719678 4.833324 5.030727 5.146453 5.357499 

@stephenturner
Copy link
Contributor

So the 50th percentile here, the median -- how does that relate to the point estimate?

@stephenturner
Copy link
Contributor

Few other approaches in the fable framework:

  1. See fable bookdown section on simulation: https://tidyverts.github.io/tidy-forecasting-principles/methods.html#simulation
  2. See about halfway down, the hilo() function.

The resulting forecasts are contained in a “fable” (forecast table), and both point forecasts and forecast distributions are available in the table for the next five years. Confidence intervals can be extracted from the distribution using the hilo() function.

The hilo() function can also be used on fable objects, which allows you to extract multiple intervals at once.

@vpnagraj
Copy link
Contributor Author

So the 50th percentile here, the median -- how does that relate to the point estimate?

good point. but in this case the 50th percentile from quantile is generated from a random normal distribution that is centered on the point estimate. rnorm(size=1000, ...) above so i would think that the median would be pretty close to the mean

looks like it is (using objects created from example code above):

pred$fit
[1] 4.17622

@stephenturner
Copy link
Contributor

stephenturner commented Dec 21, 2020

This section of the FPP3 book is a good primer on forecast distributions, and later on shows how to use bootstrapping to get prediction intervals. https://otexts.com/fpp3/prediction-intervals.html

Also, from: https://otexts.com/fpp3/forecasting-using-transformations.html

If a transformation has been used, then the prediction interval is first computed on the transformed scale, then the end points are back-transformed to give a prediction interval on the original scale.

stephenturner added a commit that referenced this issue Dec 21, 2020
@stephenturner
Copy link
Contributor

stephenturner added a commit that referenced this issue Dec 29, 2020
@vpnagraj
Copy link
Contributor Author

vpnagraj commented Jan 6, 2021

we have this working (see https://github.com/signaturescience/focustools/blob/master/scratch/fable-submission-mockup-allmetrics.R#L28-L38)

the quantiles meet the valid entry file format once bound, formatted with target names, etc.

closing for now, though we may need to revisit if/when we implement another kind of model outside of the fable framework

@vpnagraj vpnagraj closed this as completed Jan 6, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants