Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Track data used across forecasting #17

Closed
stephenturner opened this issue Jan 11, 2021 · 4 comments
Closed

Track data used across forecasting #17

stephenturner opened this issue Jan 11, 2021 · 4 comments

Comments

@stephenturner
Copy link
Contributor

fable::model and focustools::ts_fit both take data from make_tsibble as input. ts_sumulative_forecast requires the incident forecast, as well as the data used to create that incident forecast. There's a check in there that the first forecasted week is 1 week ahead of the last week of recorded data in the supplied dataset, and this should catch egregious errors supplying the wrong data that wasn't used to generate the inc forecast. This could be done with something like this in the ts_fit function

attr(icases_forecast, ".data") <- deparse_substitute(.data)

Then later something like this in the ts_sumulative_forecast

stopifnot(deparse(substitute(.data))==attr(inc_forecast, ".data"))

Doesn't seem to change the tibble that's output. Example:

return_data_with_attr <- function(.data) {
  out <- tibble::as_tibble(head(.data))
  attr(out, ".data") <- deparse(substitute(.data))
  return(out)
}
> irishead <- return_data_with_attr(iris)
> irishead
# A tibble: 6 x 5
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
         <dbl>       <dbl>        <dbl>       <dbl> <fct>  
1          5.1         3.5          1.4         0.2 setosa 
2          4.9         3            1.4         0.2 setosa 
3          4.7         3.2          1.3         0.2 setosa 
4          4.6         3.1          1.5         0.2 setosa 
5          5           3.6          1.4         0.2 setosa 
6          5.4         3.9          1.7         0.4 setosa 
> attr(irishead, ".data")
[1] "iris"
@stephenturner
Copy link
Contributor Author

So it seems like the best place to put this would be in ts_fit but you can't use the TSLM with lag 3 (according to the readme, I still haven't dived in yet to grok why). As of now, both in the walkthrough in the readme and in the pipeline function, we're calling model() directly. If we were calling model() inside a function call in the package we could tack on this attribute easily, and this attribute could be picked up in ts_forecast. Any thoughts on an immediate solution to this one @vpnagraj or should we keep this one open for now and not tackle it just yet? It's a potential gotcha that probably won't bite us now, but may as we develop subnational forecasts with different data objects.

@vpnagraj
Copy link
Contributor

so unless i'm missing it, the code above just checks the name of the object, right? it's not actually checking equivalent values in the data? to do that i guess we could make ts_forecast()return a list with the forecast tibble and the data as elements. slippery slope until we're basically writing our own s3 class for this stuff.

to be honest, thinking about this now i'm not sure how dangerous this is. even in the state-level forecasts, how would we get into a situation where we were making cumulative forecast from different data than we used for the incident forecast? especially with the one-stop forecast_pipeline()? i mean the possibility of that sort of goof will always be there interactively. but i'm having a hard time realistically imaging this biting us, especially as we automate more and more.

still bugs me. let's leave the issue open. maybe revisit after we get #26 working? and if we ever publish the pkg then i definitely want to address this.

@stephenturner
Copy link
Contributor Author

Yep, just the name. Perhaps an in-between just storing the name versus storing the actual data in a list would be to store a hash eg with digest of the data tibble, and ensure the hashes match.

return_data_with_attr <- function(.data) {
  out <- tibble::as_tibble(head(.data))
  attr(out, ".data") <- digest::digest(.data)
  return(out)
}

irishead <- return_data_with_attr(iris)
attr(irishead, ".data")
#> [1] "d3c5d071001b61a9f6131d3004fd0988"

But, I think I'm in agreement with you all around. (A) this isn't that dangerous now, but (B) when we make public or try to publish, this wouldn't be terribly hard to implement I don't think.

Punt to post #26

@vpnagraj
Copy link
Contributor

vpnagraj commented Mar 30, 2021

going to close this one out. havent run into any issues losing track of data between forecasts yet. function docs do provide a reminder that data should match:

https://github.com/signaturescience/focustools/blob/master/R/ts.R#L158

if needed we can reopen later

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants