Track data used across forecasting #17

stephenturner · 2021-01-11T14:26:22Z

fable::model and focustools::ts_fit both take data from make_tsibble as input. ts_sumulative_forecast requires the incident forecast, as well as the data used to create that incident forecast. There's a check in there that the first forecasted week is 1 week ahead of the last week of recorded data in the supplied dataset, and this should catch egregious errors supplying the wrong data that wasn't used to generate the inc forecast. This could be done with something like this in the ts_fit function

attr(icases_forecast, ".data") <- deparse_substitute(.data)

Then later something like this in the ts_sumulative_forecast

stopifnot(deparse(substitute(.data))==attr(inc_forecast, ".data"))

Doesn't seem to change the tibble that's output. Example:

return_data_with_attr <- function(.data) {
  out <- tibble::as_tibble(head(.data))
  attr(out, ".data") <- deparse(substitute(.data))
  return(out)
}

> irishead <- return_data_with_attr(iris)
> irishead
# A tibble: 6 x 5
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
         <dbl>       <dbl>        <dbl>       <dbl> <fct>  
1          5.1         3.5          1.4         0.2 setosa 
2          4.9         3            1.4         0.2 setosa 
3          4.7         3.2          1.3         0.2 setosa 
4          4.6         3.1          1.5         0.2 setosa 
5          5           3.6          1.4         0.2 setosa 
6          5.4         3.9          1.7         0.4 setosa 
> attr(irishead, ".data")
[1] "iris"

The text was updated successfully, but these errors were encountered:

stephenturner · 2021-01-12T22:12:43Z

So it seems like the best place to put this would be in ts_fit but you can't use the TSLM with lag 3 (according to the readme, I still haven't dived in yet to grok why). As of now, both in the walkthrough in the readme and in the pipeline function, we're calling model() directly. If we were calling model() inside a function call in the package we could tack on this attribute easily, and this attribute could be picked up in ts_forecast. Any thoughts on an immediate solution to this one @vpnagraj or should we keep this one open for now and not tackle it just yet? It's a potential gotcha that probably won't bite us now, but may as we develop subnational forecasts with different data objects.

vpnagraj · 2021-01-12T22:52:35Z

so unless i'm missing it, the code above just checks the name of the object, right? it's not actually checking equivalent values in the data? to do that i guess we could make ts_forecast()return a list with the forecast tibble and the data as elements. slippery slope until we're basically writing our own s3 class for this stuff.

to be honest, thinking about this now i'm not sure how dangerous this is. even in the state-level forecasts, how would we get into a situation where we were making cumulative forecast from different data than we used for the incident forecast? especially with the one-stop forecast_pipeline()? i mean the possibility of that sort of goof will always be there interactively. but i'm having a hard time realistically imaging this biting us, especially as we automate more and more.

still bugs me. let's leave the issue open. maybe revisit after we get #26 working? and if we ever publish the pkg then i definitely want to address this.

stephenturner · 2021-01-13T13:36:42Z

Yep, just the name. Perhaps an in-between just storing the name versus storing the actual data in a list would be to store a hash eg with digest of the data tibble, and ensure the hashes match.

return_data_with_attr <- function(.data) {
  out <- tibble::as_tibble(head(.data))
  attr(out, ".data") <- digest::digest(.data)
  return(out)
}

irishead <- return_data_with_attr(iris)
attr(irishead, ".data")
#> [1] "d3c5d071001b61a9f6131d3004fd0988"

But, I think I'm in agreement with you all around. (A) this isn't that dangerous now, but (B) when we make public or try to publish, this wouldn't be terribly hard to implement I don't think.

Punt to post #26

vpnagraj · 2021-03-30T16:30:44Z

going to close this one out. havent run into any issues losing track of data between forecasts yet. function docs do provide a reminder that data should match:

https://github.com/signaturescience/focustools/blob/master/R/ts.R#L158

if needed we can reopen later

stephenturner mentioned this issue Jan 19, 2021

data prep to translate target forecasts to submission file format #6

Closed

vpnagraj mentioned this issue Mar 1, 2021

retrieve parameters from ARIMA models and store during pipeline #48

Closed

vpnagraj closed this as completed Mar 30, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Track data used across forecasting #17

Track data used across forecasting #17

stephenturner commented Jan 11, 2021

stephenturner commented Jan 12, 2021

vpnagraj commented Jan 12, 2021

stephenturner commented Jan 13, 2021

vpnagraj commented Mar 30, 2021 •

edited

Loading

Track data used across forecasting #17

Track data used across forecasting #17

Comments

stephenturner commented Jan 11, 2021

stephenturner commented Jan 12, 2021

vpnagraj commented Jan 12, 2021

stephenturner commented Jan 13, 2021

vpnagraj commented Mar 30, 2021 • edited Loading

vpnagraj commented Mar 30, 2021 •

edited

Loading