Take 1 on stan-based predictions when model is fitted using stan::optimizing #865
This is really great to see, thanks a bunch. A whole bunch of comments, mostly minor, inline. This change will allow us to remove a lot of code, but there are a few other places that need some refactoring.

There is a whole chain of functions related to generating the future predictions that should be removed, since this is now being done in Stan: `sample_predictive_trend`, `sample_model`, and `sample_posterior_predictive` can all go.

`predictive_samples` is a function for returning the actual samples of `yhat` and `trend`. This function is exported and so should stay. Right now it calls `sample_posterior_predictive`. In this diff, it should instead get the samples of `yhat` and `trend` straight from the GQ (`yhat_samples` and `trend_samples`). To avoid duplicating logic with `predict.prophet`, we will want to take all of the logic in `predict.prophet` for generating the GQ and put it in a helper function, perhaps one that returns the stan fit. `predict.prophet` will call that function and will then pick up at L1378 here to extract the predictions and compute the components. `predictive_samples` will call that function, but then just directly return the samples without further computation.
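The proposed split can be sketched roughly as follows. This is a hypothetical Python analog (the actual code is R, and names like `run_prediction_gq` are made up here); the GQ call is faked with random draws just to show the shape of the refactor:

```python
import numpy as np

def run_prediction_gq(model, df, n_samp):
    """Hypothetical helper: run the Stan GQ block once and return the raw
    draws of yhat and trend. The real version would call Stan; random
    numbers stand in for the draws here."""
    T_pred = len(df)
    rng = np.random.default_rng(0)
    return {
        "yhat_samples": rng.normal(size=(T_pred, n_samp)),
        "trend_samples": rng.normal(size=(T_pred, n_samp)),
    }

def predictive_samples(model, df, n_samp=1000):
    # Exported function: return the raw GQ draws, no further computation.
    return run_prediction_gq(model, df, n_samp)

def predict(model, df, n_samp=1000):
    # predict calls the same helper, then picks up to extract point
    # predictions and uncertainty intervals from the draws.
    gq = run_prediction_gq(model, df, n_samp)
    yhat = gq["yhat_samples"]
    return {
        "yhat": yhat.mean(axis=1),
        "yhat_lower": np.quantile(yhat, 0.1, axis=1),
        "yhat_upper": np.quantile(yhat, 0.9, axis=1),
    }
```

Both entry points share the single helper, so the GQ-construction logic lives in exactly one place.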
I have made most of the changes and created a new function.
This is great. Handling of `cap` in predict is correct. I have a few minor issues. If you could also delete the `piecewise_linear` function at L1308 and `piecewise_logistic` at L1334, that'd be great, since we shouldn't be using them anymore.
Once you make these updates I'll go ahead and install it locally and test the performance, and then it'll just be a matter of handling MCMC and this will be finished.
For MCMC, what we currently do is run the simulation separately for each parameter value. It'd be nice, and I think it should be possible, to do the same thing here: basically, everything done in `predict` from L1306 to L1341 happens inside a loop, where each time the parameters put into `stan_init` would be a different value of the draw. The `n_samp` is then `m$uncertainty_samples / (number of MCMC samples)`. Does that make sense and sound reasonable to you? You can see how it is currently done in https://github.com/facebook/prophet/blob/master/R/R/prophet.R#L1435 . It would not require special-casing the MCMC vs. not-MCMC cases: if not MCMC, it can still be the same loop, just with a single draw to loop over.
But let's leave that for the future and I'll go ahead and do some performance testing on this before we polish it up.
I've made the changes. Let me know if anything comes up in the testing process. For predictions under the sampling option, looping over what I currently have with the optimizing option should be fine, I think. Let me go through that part of the code to better understand how it's done currently. Do you think we could also try the sampling option in Stan for predictions, to see if it provides any performance benefits?
Great, I'll test this next week. As for doing sampling in prediction: I think there are two ways we can handle MCMC. The first is to do the sampling on fit, and then on predict loop over the samples. This is what I had described earlier. The second approach is to not do MCMC on fit, but instead just do the full MCMC on predict. There are pros and cons to each.

The second approach (MCMC just on predict) would be a lot cleaner to implement (the pro). There are two cons, though. The first is that with the current approach, if you want to make a prediction for a different set of dates, you don't have to re-do the MCMC. Looping over samples for predict is kind of expensive, but still way cheaper than the MCMC. I'm not sure how important that use case is, but it is a nice property to have that you only have to do the MCMC once. The second issue is that we currently do MCMC only on fit, and so moving it to predict would be a pretty significant change and could affect people's workflows. I try to minimize those sorts of changes across versions unless there's a pretty compelling reason to.

That's the main reason why I prefer approach (1), even though it's definitely going to be a bit uglier in the implementation. If there's another advantage to moving it to predict that you can think of, let me know and we can discuss whether it is worth doing.
I agree with your points; approach 2 may be too big a change and could lead to instabilities. After the tests, I will go ahead and make the changes to implement approach 1.
OK, I did a head-to-head comparison on two of the documentation examples: the quickstart one at https://github.com/facebook/prophet/blob/master/notebooks/quick_start.ipynb and the sub-daily data one at the top of https://github.com/facebook/prophet/blob/master/notebooks/non-daily_data.ipynb .

The good news is that the predictions seem to be the same. The bad news is that it is quite a bit slower. The blocker for this last summer was an issue in rstan #519: transferring data from Stan into R was really slow. They pushed a fix for that issue that did make things much faster, but there's still a pretty noticeable lag, and most importantly it's making things quite a bit slower than they currently are. In the quickstart example I really want the flexibility that we get from moving predictions to Stan, and am super excited by how much code was deleted in this PR, but a 5-10x slowdown is just too much. I think we can do better.

The reason it is slow is that we are passing back from Stan these very large T_pred x n_samp matrices. When I make T_pred small (by not including the training data in

We do still have to have an option to return the full samples for the `predictive_samples` function, and so that would still be slow, but I think all of the other gains we get from doing this would be worth the slowdown in `predictive_samples`, as long as predict stays fast. This is going to require a bit of a refactor of the GQ block, to move it into a function that returns percentiles if those are requested, or the full samples if those are requested. And Stan doesn't seem to have a quantile function yet, but it does have a sort function that we can use, and we could in R compute the indices of the quantiles that we want and pass those in as ints. Hopefully computing the percentiles row by row like that won't be too slow (I'd expect not, given that it will be compiled down).

That's a lot of text, but here's what I would propose:

A few other minor things that were important for testing:
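The sort-plus-precomputed-indices idea above can be sketched outside Stan. A minimal Python illustration (function names are made up): the integer ranks are computed on the R side from the requested probabilities and the number of samples, passed in as int data, and the GQ-side work per row is then just a sort and an index lookup:

```python
import math

def quantile_indices(n_samp, probs):
    # Computed on the R/Python side and passed to Stan as int data:
    # the 1-based rank of each requested quantile in a sorted vector
    # of n_samp draws (Stan arrays are 1-indexed).
    return [min(n_samp, max(1, math.ceil(p * n_samp))) for p in probs]

def row_quantiles(row, idx):
    # What the GQ block would do per row: sort, then pick the
    # precomputed ranks, instead of shipping the whole row back.
    s = sorted(row)
    return [s[i - 1] for i in idx]
```

The payoff is bandwidth: instead of returning a `T_pred x n_samp` matrix, Stan returns `T_pred x len(probs)`, and only the small rank vector crosses the R-to-Stan boundary in the other direction.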
I did notice a slowdown, although I didn't test it enough to figure out the reason. I will try to move the quantile calculations to Stan; I think a simple sort should be close enough. Let me dig into some examples of the GQ block, and I will reach out to you if I have any questions.
This looks useful for computing quantiles: https://discourse.mc-stan.org/t/quantile-function-in-stan/3642/12 Hopefully this is fast and then the I/O costs drop down to a very small number and we're all good! I'd be happy to do the Python implementation (deleting code is my favorite). |
Thanks, will take a look. I should be able to make the changes and push the code in the next few days.
Returning the quantiles from Stan seems to help. In my test cases the predict function is much faster. Let me know if you see a similar reduction in prediction times in the notebook examples.
This looks great. On the quickstart example it was 40% faster, and on the sub-daily data example it was 22% faster. Let's get the missing components in, and then we can merge the PR and see if there's any additional performance tuning that can be done afterwards. I believe the missing components are:
I think the best way to handle this will probably be to add an additional flag input to GQ for whether all samples should be returned or just the quantiles. On predict, this will return an empty matrix for samples, on
Now that I think about it, this will be a bit of a challenge with this approach of computing quantiles in Stan. Suppose we have 200 MCMC samples and we want to estimate uncertainty intervals with 1000 samples. What we currently do is loop through those 200 MCMC samples and draw 5 uncertainty samples for each one. We stack all of those to get the total 1000 and then compute quantiles. The only way that would work here would be if for every MCMC sample we extracted the full set of samples, and then computed the quantiles back in R. As we know from above, transferring all of those samples from Stan back to R is very slow.

The alternative is, for MCMC, to do the sampling at predict time rather than fit time. We can still do MAP optimization at fit time, and then use it to initialize the sampling at predict time. We discussed this a bit above and I was not a fan of this approach, because it means you have to repeat the sampling if you change the set of dates at which you make predictions. This makes MCMC not an option if you want to do a "prophet as a service" type of design. Here, the more common pattern is to make predictions on all the dates you think you'll need and store those (where this wouldn't be an issue), but I don't know what other people externally might be doing. This would, however, make the total fit+predict time no slower for MCMC than it currently is.

@seanjtaylor what do you think? I lean a bit towards the 2nd.
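The stacking described above can be made concrete with a small Python sketch using made-up numbers: 200 MCMC draws, 5 uncertainty samples each, and quantiles taken over the pooled 1000. The point is that the quantiles are a property of the pooled set, which is why computing them inside Stan per draw doesn't compose:

```python
import numpy as np

rng = np.random.default_rng(42)
mcmc_draws = rng.normal(size=200)   # e.g. 200 posterior draws of a parameter
uncertainty_per_draw = 5            # 1000 total / 200 draws

# For each MCMC draw, simulate 5 uncertainty samples, then stack
# everything into one pooled set of 1000 samples.
pooled = np.concatenate([
    rng.normal(loc=d, scale=1.0, size=uncertainty_per_draw)
    for d in mcmc_draws
])

# Intervals come from quantiles over the pooled samples; per-draw
# quantiles of 5 values each carry almost no information and cannot
# be combined into the pooled quantiles.
lower, upper = np.quantile(pooled, [0.1, 0.9])
```

Under the Stan-side quantile scheme, getting `pooled` would require shipping every per-draw sample matrix back to R, which is exactly the transfer cost the scheme was meant to avoid.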
With the addition of the cmdstanpy backend I think using GQ for predictions in Stan is going to be a bit more difficult (more places to have to handle two different GQ APIs). I've been thinking now that the best approach for making it easier to customize the model will be to make the model more modular, with a clear API/separation between the functions that evaluate trends/link functions and the other Prophet pieces like constructing the regressor matrix. So I'm going to go ahead and close this now, since I don't think we'll be pursuing this option right now. But a big thanks @shubhankaray for exploring this approach and assessing its viability. You do great work!
I inadvertently introduced many small changes by auto-formatting the code in RStudio. Let me know if you would like me to revert these changes. I have also retained some of the older code by adding the prefix "older_", to compare new vs. old predictions. I will remove these once I am done with the remaining work with stan::sampling.