
Take 1 on stan-based predictions when model is fitted using stan::optimizing #865

Closed
wants to merge 5 commits

Conversation

shubhankaray

I inadvertently introduced many small changes by auto-formatting the code in R studio. Let me know if you would like me to revert these changes. I have also retained some of the older code by adding the prefix "older_" to compare new vs. old predictions. I will remove these once I am done with the remaining work with stan::sampling.

@facebook-github-bot
Contributor

Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please sign up at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need the corporate CLA signed.

If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks!

Contributor

@bletham bletham left a comment


This is really great to see, thanks a bunch. I've left a number of comments inline, mostly minor. This change will allow us to remove a lot of code, but there are a few other places that need some refactoring.

There is a whole chain of functions related to generating the future predictions that should be removed since this is now being done in Stan: sample_predictive_trend, sample_model, and sample_posterior_predictive can all go.

predictive_samples is a function for returning the actual samples of yhat and trend. This function is exported and so should stay. Right now it calls sample_posterior_predictive. In this diff, it should instead get the samples of yhat and trend straight from the GQ (yhat_samples and trend_samples). To avoid duplication of logic with predict.prophet, we will want to take all of the logic in predict.prophet for generating the GQ and put it in a helper function, perhaps that will return stan.fit. predict.prophet will call that function and will then pick up in L1378 here to extract the predictions and compute the components. predictive_samples will call that function, but then just directly return the samples without further computation.
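To make the proposed split concrete, here is a minimal sketch (in Python rather than the package's R, and with hypothetical names: run_prediction_gq stands in for the shared helper, and a plain dict stands in for the stan.fit object):

```python
# Sketch of the proposed refactor: one helper generates the GQ output,
# and both public entry points consume it. All names are hypothetical.

def run_prediction_gq(model, df):
    """Build the Stan data from df and run the generated-quantities block.

    Returns a dict of draws, standing in for the stan.fit object that the
    real helper would return.
    """
    # Stand-in for the actual stan::optimizing / GQ execution.
    return {
        "yhat_samples": [[1.0, 3.0], [2.0, 4.0]],   # T_pred x n_samp
        "trend_samples": [[0.5, 1.5], [1.0, 2.0]],
    }

def predictive_samples(model, df):
    """Exported function: return the raw samples with no further computation."""
    fit = run_prediction_gq(model, df)
    return {"yhat": fit["yhat_samples"], "trend": fit["trend_samples"]}

def predict(model, df):
    """predict.prophet analogue: call the same helper, then summarize.

    Here the summary is just a per-row mean, standing in for extracting
    the predictions and computing the components.
    """
    fit = run_prediction_gq(model, df)
    return [sum(row) / len(row) for row in fit["yhat_samples"]]
```

Both entry points share the expensive step; only predict does post-processing, which is the deduplication being described.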

@shubhankaray
Author

shubhankaray commented Mar 11, 2019

I have made most of the changes and created a new function data_for_stan, which is called from fit.prophet and predict.prophet to create the list dat. Previously I was assuming that the df used for predictions included historical data, but that is no longer the case; I am now using two separate dfs, fit.seasonal.features and predict.seasonal.features, in data_for_stan. Not sure if this is the best way, but please take a look and let me know what you think. I am also not sure how cap and cap_pred should be initialized in this function, having not worked with this part of the model in my forecasting work.

Contributor

@bletham bletham left a comment


This is great. Handling of cap in predict is correct. I have a few minor issues. If you could also delete the piecewise_linear function in L1308, and the piecewise_logistic in L1334 that'd be great, since we shouldn't be using them anymore.

Once you make these updates I'll go ahead and install it locally and test the performance, and then it'll just be a matter of handling MCMC and this will be finished.

For MCMC, we currently do the simulation separately for each parameter value. It'd be nice, and I think it should be possible, to do the same thing here: basically, everything done in predict from L1306 to L1341 happens inside a loop, where each iteration puts a different draw's parameter values into stan_init. The n_samp is then m$uncertainty_samples / (number of MCMC samples). Does that make sense and sound reasonable to you? You can see how it is currently done in https://github.com/facebook/prophet/blob/master/R/R/prophet.R#L1435 . It would not require special casing for the MCMC vs. not-MCMC cases: if not MCMC, it can still be the same loop, just with a single set of parameters to loop over.
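The loop described above can be sketched as follows (in Python rather than R; predict_uncertainty, run_gq, and fake_gq are hypothetical stand-ins for illustration, not Prophet's actual API):

```python
# Sketch of the unified MAP/MCMC prediction loop described above.
# All names here are hypothetical.

def predict_uncertainty(param_draws, uncertainty_samples, run_gq):
    """Loop over parameter draws, running the Stan GQ once per draw.

    With MAP estimation param_draws has length 1, so the same code path
    covers both cases with no special-casing.
    """
    n_samp = uncertainty_samples // len(param_draws)
    all_samples = []
    for draw in param_draws:
        # Each iteration would put this draw's parameters into stan_init
        # and generate n_samp simulated paths from the GQ block.
        all_samples.extend(run_gq(draw, n_samp))
    return all_samples

def fake_gq(draw, n_samp):
    """Toy GQ stand-in: returns n_samp dummy paths tagged with the draw."""
    return [(draw, i) for i in range(n_samp)]
```

With 200 MCMC draws and uncertainty_samples = 1000, each draw contributes 5 simulated paths; with MAP (one draw), the single iteration contributes all 1000.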

But let's leave that for the future and I'll go ahead and do some performance testing on this before we polish it up.

@shubhankaray
Author

I've made the changes. Let me know if anything comes up in the testing process.

For predictions under the sampling option, looping over what I currently have with the optimizing option should be fine, I think. Let me go through that part of the code to better understand how it's done currently. Do you think we could also try the sampling option in Stan for predictions, to see if it provides any performance benefits?

@bletham
Contributor

bletham commented Mar 16, 2019

Great, I'll test this next week.

As for doing sampling in prediction: I think there are two ways we can handle MCMC. The first is to do the sampling on fit, and then on predict loop over the samples. This is what I had described earlier.

The second approach is to not do MCMC on fit, but instead just do the full MCMC on predict.

There are pros and cons to each. The second approach (MCMC just on predict) would be a lot cleaner to implement (the pro). There are two cons though. The first is that if you want to make a prediction for a different set of dates, you'd have to re-do the MCMC. Looping over samples for predict is kind of expensive, but still way cheaper than the MCMC. I'm not sure how important that use case is, but it is a nice property that you only have to do the MCMC once. The second issue is that we currently do MCMC only on fit, so moving it to predict would be a pretty significant change and could affect people's workflows. I try to minimize those sorts of changes across versions unless there's a pretty compelling reason to. That's the main reason why I prefer approach (1), even though it's definitely going to be a bit uglier in the implementation. If there's another advantage to moving it to predict that you can think of, let me know and we can discuss whether it is worth doing.

@shubhankaray
Author

I agree with your points; approach 2 may be too big a change and could lead to instabilities. After the tests, I will go ahead and make the changes to implement approach 1.

@bletham
Contributor

bletham commented Mar 18, 2019

OK, I did a head-to-head comparison on two of the documentation examples: the quickstart one here, https://github.com/facebook/prophet/blob/master/notebooks/quick_start.ipynb , and the sub-daily data one at the top of https://github.com/facebook/prophet/blob/master/notebooks/non-daily_data.ipynb

The good news is that the predictions seem to be the same. The bad news is that it is quite a bit slower.

The blocker for this last summer was an issue in rstan (#519): transferring data from Stan into R was really slow. They pushed a fix for that issue that did make things much faster, but there's still a pretty noticeable lag, and most importantly it's making things quite a bit slower than they currently are. In the quickstart example predict was 5x slower (25s vs. 5s), and in the non-daily data example (which has a much larger history, so the matrix of samples being transferred from Stan is much larger) it was 10x slower (4 mins vs. 25 seconds).

I really want the flexibility that we get from moving predictions to Stan and am super excited by how much code was deleted in this PR, but a 5-10x slowdown is just too much.

I think we can do better. The reason it is slow is that we are passing back from Stan these very large T_pred x n_samp matrices. When I make T_pred small (by not including the training data in future and forecasting only 2 periods), this PR is actually 3x faster than current prophet. It is only for T_pred of about 80 that it starts to be slower. That is, for an 80x1000 matrix we break even; bigger than that and it starts to be slower. So I'm thinking that the right solution is to not return the T_pred x n_samp matrix unless we really need it. Most of the time (for a usual call to predict), we only care about the percentiles of trend_samples and yhat_samples (that's what predict computes, and then it throws away the samples). If we move the percentile computation into Stan, it would just have to return a T_pred x 2 matrix, which based on the test above I'd expect to be much faster.

We do still have to have an option to return the full samples for the predictive_samples function, and so that would still be slow, but I think all of the other gains that we get from doing this would be worth the slowdown in predictive_samples, as long as predict stays fast.

This is going to require a bit of a refactor of the GQ block, to move it into a function that returns either the percentiles or the full samples, whichever is requested. Stan doesn't seem to have a quantile function yet, but it does have a sort function that we can use: we can compute in R the indices of the quantiles that we want and pass those in as ints. Hopefully computing the percentiles row by row like that won't be too slow (I'd expect not, given that it will be compiled down).
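That sort-and-index scheme can be sketched as follows (in Python for illustration only; the real version would compute the indices in R and do the sort inside the Stan GQ block, and the 1-based indexing and rounding conventions here are assumptions):

```python
# Sketch of the quantile-via-sort scheme described above.
# Function names and conventions are hypothetical.

def quantile_indices(n_samp, interval_width):
    """1-based indices into a sorted vector of n_samp draws that bracket
    the requested uncertainty interval. These ints would be passed to Stan.
    """
    lower_p = (1 - interval_width) / 2
    upper_p = 1 - lower_p
    lo = max(1, round(lower_p * n_samp))
    hi = min(n_samp, round(upper_p * n_samp))
    return lo, hi

def row_quantiles(samples_row, lo, hi):
    """What the GQ block would do for each of the T_pred rows:
    sort the row of n_samp draws, then pick the two requested entries.
    """
    s = sorted(samples_row)
    return s[lo - 1], s[hi - 1]  # convert 1-based indices to 0-based
```

For n_samp = 1000 draws and an 80% interval, the indices are (100, 900), so each row of the T_pred x n_samp matrix collapses to just two numbers before leaving Stan.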

That's a lot of text, but here's what I would propose:

  • Let's move the quantile computation into Stan, so that we return a (T_pred x 2) matrix rather than a (T_pred x n_samp) one.
  • Everything happening in the GQ block can be moved into a function that returns those things so that all of the other kinda-large stuff in the GQ block (A_pred, A_sim) can be declared in the function rather than in the GQ block and (hopefully I think?) thus not have to be returned to R.
  • If that is working and is fast, then we can add in the option to return all of the samples, and will hook predictive_samples up to that.

@bletham
Contributor

bletham commented Mar 18, 2019

A few other minor things that were important for testing:

  • You can delete the predict_trend function
  • In predict.prophet, there are a bunch of functions that do not have their package specified. Every function that does not come from base needs to have its package declared, like dplyr::mutate, dplyr::select, dplyr::mutate_at, and dplyr::funs; I think those were all of them.

@shubhankaray
Author

I did notice a slowdown, although I didn't test it enough to figure out the reason. I will try to move the quantile calculations to Stan; I think a simple sort should be close enough. Let me dig into some examples of the GQ block; I will reach out to you if I have any questions.

@seanjtaylor
Contributor

This looks useful for computing quantiles: https://discourse.mc-stan.org/t/quantile-function-in-stan/3642/12

Hopefully this is fast, and then the I/O costs drop down to a very small number and we're all good! I'd be happy to do the Python implementation (deleting code is my favorite).

@shubhankaray
Author

Thanks, will take a look. I should be able to make the changes and push the code in the next few days.

@shubhankaray
Author

Returning the quantiles from Stan seems to help. In my test cases the predict function is much faster. Let me know if you see a similar reduction in prediction times in the notebook examples.

@bletham
Contributor

bletham commented Apr 3, 2019

This looks great. On the quickstart example it was 40% faster, and on the sub-daily data example it was 22% faster. Let's get the missing components in, and then we can merge the PR and see if there's any additional performance tuning that can be done afterwards.

I believe the missing components are:

  • The predictive_samples function.

I think the best way to handle this will probably be to add an additional flag input to the GQ block indicating whether all samples should be returned or just the quantiles. On predict, this will return an empty matrix for samples; on predictive_samples, it will incur the extra cost of returning the full samples matrix.

  • MCMC sampling.

Now that I think about it, this will be a bit of a challenge with the approach of computing quantiles in Stan. Suppose we have 200 MCMC samples and we want to estimate uncertainty intervals with 1000 samples. What we currently do is loop through those 200 MCMC samples and draw 5 uncertainty samples for each one. We stack all of those to get the total of 1000 and then compute quantiles. The only way that would work here would be to extract the full set of samples for every MCMC sample and then compute the quantiles back in R. As we know from above, transferring all of those samples from Stan back to R is very slow.

The alternative, for MCMC, is to do the sampling at predict time rather than fit time. We can still do MAP optimization at fit time and then use it to initialize the sampling at predict time. We discussed this a bit above, and I was not a fan of this approach because it means you have to repeat the sampling if you change the set of dates at which you make predictions. That makes MCMC not an option if you want a "prophet as a service" type of design. Here, the more common pattern is to make predictions on all the dates you think you'll need and store those (in which case this wouldn't be an issue), but I don't know what other people externally might be doing. This would, however, make the total fit+predict time no slower for MCMC than it currently is.

@seanjtaylor what do you think? I lean a bit towards the 2nd?

@facebook-github-bot
Contributor

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks!

@bletham
Contributor

bletham commented Feb 7, 2020

With the addition of the cmdstanpy backend, I think using GQ for predictions in Stan is going to be a bit more difficult (more places where we have to handle two different GQ APIs). I've been thinking that the best approach for making the model easier to customize will be to make it more modular, with a clear API/separation between the functions that evaluate trends/link functions and the other Prophet pieces like constructing the regressor matrix. So I'm going to go ahead and close this now, since I don't think we'll be pursuing this option right now. But a big thanks @shubhankaray for exploring this approach and assessing its viability. You do great work!

@bletham bletham closed this Feb 7, 2020
@bletham bletham mentioned this pull request Mar 27, 2020