
Speeding up the prediction #2030

Open
orenmatar opened this issue Sep 18, 2021 · 15 comments

@orenmatar
Contributor

I was working on scaling up Prophet as a forecast engine for hundreds of thousands of items on a project, and realized that prediction takes the vast majority of the running time. After some research I figured out a way to speed it up significantly, and wrote the following post:

https://towardsdatascience.com/how-to-run-facebook-prophet-predict-x100-faster-cce0282ca77d#4b11-31bf17c6772

I think it would be great if something like this were implemented.
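For anyone who wants the gist before reading the post: most of the predict time goes into the per-sample Python loop that re-draws the trend for every uncertainty sample, and the speed-up comes from drawing all samples at once as matrices. Below is a minimal, simplified NumPy sketch of that idea (not the exact code from the post; the function name, the cp_prob argument, and the assumption of a uniform time step are my own simplifications):

    import numpy as np

    def sample_trend_deviations(n_steps, k, deltas, cp_prob, t_scale=1.0, seed=None):
        """Draw k piecewise-linear trend deviations over n_steps future points
        in one shot, instead of looping sample-by-sample.

        deltas  : fitted changepoint rate adjustments (used only for their scale)
        cp_prob : probability that a given future step contains a changepoint
                  (in Prophet this comes from the historical changepoint frequency)
        t_scale : spacing of the (assumed uniform) future time grid
        """
        rng = np.random.default_rng(seed)
        # Laplace scale of future rate changes, estimated from the fitted deltas
        laplace_scale = np.mean(np.abs(deltas)) + 1e-8
        # (k x n_steps) matrix of rate changes, zero wherever no changepoint occurs
        cp_mask = rng.uniform(size=(k, n_steps)) < cp_prob
        rate_changes = cp_mask * rng.laplace(0.0, laplace_scale, size=(k, n_steps))
        # First cumsum: slope deviation at each step; second cumsum integrates the
        # slope into a trend deviation relative to the point-estimate trend
        slope_dev = np.cumsum(rate_changes, axis=1)
        return np.cumsum(slope_dev, axis=1) * t_scale  # shape (k, n_steps)

The uncertainty interval then comes from percentiles over the k rows of point_forecast + trend_deviation + observation_noise, where the noise is another (k x n_steps) normal draw with scale sigma_obs.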

@winedarksea
Contributor

winedarksea commented Apr 4, 2022

@orenmatar and @tcuongd do you know if anyone has made an attempt on this yet? Otherwise I might take a try at integrating that code.

@orenmatar
Contributor Author

@winedarksea I don't know of any attempt, and would love to see one. If you have any questions about the code, I'd be happy to help.
BTW, I'm now working on optimizing the fit as well; it can be done without using pystan, which consumes a lot of time.

@winedarksea
Contributor

@orenmatar thanks, hoping to get this done in the next week or two (perhaps a foolish hope). First I have to figure out the new cmdstanpy backend on the develop branch.

@winedarksea
Contributor

Some questions for you, @orenmatar:

  1. Easy first: uncertainty_samples is the same as "k" in your code, right?
  2. Seasonal components have an uncertainty part in the original code when Prophet(mcmc_samples=300) is used.
    • It doesn't look like it adds much compute to include: just take a percentile of the components in predict_seasonal_components when uncertainty_samples is True, then add or multiply the terms along with the trend and random noise.
    • It really looks like it is only used in plot_components; do you know whether it actually feeds into yhat_upper/yhat_lower when MCMC sampling is used?
  3. Flat growth, which is now one of the 3 official growth options: how does this look as an addition to your code?
        elif prophet_obj.growth == "flat":
            sample_trends = np.zeros((k, len(forecast_df)))

I also played around with this variation, based on the flat_trend function, but it is wrong since it is not centered around 0:

        elif prophet_obj.growth == "flat":
            sample_trends = np.repeat(
                prophet_obj.params['m'] * np.ones_like(future_time_series),
                k, axis=0
            )

@orenmatar
Contributor Author

@winedarksea

  1. That's right - k is just the number of samples we take for the trend.
  2. As far as I could tell from the code it doesn't influence yhat_upper and yhat_lower, and having tested it on many items without that component, I believe I'm right. But it may be best to ask one of the original authors of the code.
  3. Yeah, that seems like the simplest way to do it. But if the growth is flat, then I believe the only source of uncertainty is the sigma of the normal distribution of errors, right? In that case there's no need to create a matrix of zeros; instead we can read the required percentile off a normal distribution. E.g. if sigma = 5, then for a 95% interval the upper bound is yhat + 5*1.96 and the lower is yhat - 5*1.96 (see the sketch below). This takes even less time and is more precise than creating a random matrix and computing percentiles on it, but since the time saved here is insignificant, the simplest code to write would be what you suggested.
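For reference, here is a minimal sketch of what I mean by reading the interval straight off a normal distribution for flat growth (the function name and arguments are just for illustration; sigma is the fitted observation-noise scale):

    from scipy.stats import norm

    def flat_growth_interval(yhat, sigma, interval_width=0.80):
        # With a flat trend there is no trend uncertainty, so the interval is
        # just a normal quantile around the point forecast: yhat +/- z * sigma.
        z = norm.ppf(1 - (1 - interval_width) / 2)  # ~1.28 for 80%, ~1.96 for 95%
        return yhat - z * sigma, yhat + z * sigma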

Good luck! Looking forward to seeing this implemented!

@winedarksea
Contributor

winedarksea commented Apr 22, 2022

I've been away on vacation in Italy, but I'm back!

I have a question which is really for the Prophet maintainers, if they will come to the party. I am FB internal, and from what I can tell Ben Letham has entirely moved on to other projects - for example, in Workspaces for Prophet he ignored call-outs from others when there was a bug in the internal build. So this is a request to @dmitryvinn and @tcuongd: whom should I coordinate this work with?

My basic question is how many of the internal functions supporting prediction need to be maintained. The simplest way to integrate these speed-ups would be to cut the number of functions, because most of them are nested in for loops that will no longer be needed.

OLD/CURRENT WAY

  • predict
    • predict_uncertainty
      • generates percentiles from samples from:
      • sample_posterior_predictive
        • generates make_all_seasonality_features
        • generates many samples from:
        • sample_model
          • adds a random component to:
          • sample_predictive_trend
            • generates a new trend from piecewise_linear (or logistic)

New Way

  • predict
    • predict_uncertainty (method changed, output same)

Would become obsolete: sample_posterior_predictive, sample_model, sample_predictive_trend

Alternatives:
We could keep them unchanged but remove them from use in predict_uncertainty, or attempt to rewrite them to call into the new method. sample_posterior_predictive would be fairly easy to update to the new method; the others would be harder.

A possible compromise would be to use the old way when mcmc_samples > 0, which users already expect to be slow, and the new way (with a trimmed-down set of supporting functions) for all other cases, as sketched below.
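To make that compromise concrete, here is roughly the branching I have in mind inside predict_uncertainty. _vectorized_sample_posterior_predictive is a hypothetical new helper, not an existing Prophet method, and the percentile step below is a paraphrase of what the current code does rather than a copy of it:

    import numpy as np
    import pandas as pd

    def predict_uncertainty(self, df):
        if self.mcmc_samples > 0:
            # Old path, unchanged: loop-based sampling that MCMC users already expect to be slow
            sim_values = self.sample_posterior_predictive(df)
        else:
            # New path: build all trend/noise samples as matrices in a single pass
            sim_values = self._vectorized_sample_posterior_predictive(df)
        lower_p = 100 * (1.0 - self.interval_width) / 2
        upper_p = 100 * (1.0 + self.interval_width) / 2
        return pd.DataFrame({
            'yhat_lower': np.nanpercentile(sim_values['yhat'], lower_p, axis=1),
            'yhat_upper': np.nanpercentile(sim_values['yhat'], upper_p, axis=1),
        })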

@orenmatar
Contributor Author

@winedarksea How is it going with the project?
BTW, I just published another post on optimizing Prophet, which you may find interesting...
https://medium.com/towards-data-science/need-for-speed-optimizing-facebook-prophet-fit-method-to-run-20x-faster-166cd258f456

@winedarksea
Contributor

@orenmatar I messaged Ben Letham internally and his current priority is updating the internal platform for Prophet at Meta, which I may be helping with. But he did look at your work and liked it, so I think we will get this in, although that's probably a matter of months.

Neural Prophet (which is from a different team) uses PyTorch, and I think having a PyTorch backend as an alternative would be great since many people already have some familiarity with it. Although 60 ms is hardly slow to begin with!

@orenmatar
Contributor Author

@winedarksea I think 60 ms is slow compared to most linear regressions, which is what Prophet ultimately is. Some datasets really do contain hundreds of thousands of items, if not millions, so these gains could be meaningful. But for sure it is more important to optimize the predict method. Thanks for the update!

@nicolaerosia

@orenmatar do you have a fork where you implemented the changes? Thanks!

@orenmatar
Contributor Author

@nicolaerosia I don't... I only implemented it as a side function to be called separately, and @winedarksea started working on a fork. It's not too difficult to replace the existing confidence interval computation with the new one. I'll be happy to help if there's a real chance it will be incorporated into the main package.

@nicolaerosia

@orenmatar I have hopes for inclusion; it seems like @tcuongd is actively maintaining!
@winedarksea could you please share your branch? I see something here, https://github.com/winedarksea/prophet/tree/vectorization, but it doesn't seem related.

@winedarksea
Contributor

@nicolaerosia sure thing.
Right now I am working with the idea that the "old way" (unmodified) will run whenever mcmc_samples > 0. The reason is that the 'new' function drops some of the inner function calls (buried in slow for loops) that are not private and may well be used by people, and we don't want to break them. See my post above.
Basically, I'm still undecided on the best way to change the code base, since we don't want to break anything people may be depending on.

@nicolaerosia

@winedarksea I think it's a good idea, do you have a branch I can take a look at? Thanks a lot!

@winedarksea
Contributor

Sure, I need to publish my changes; I'll try to do that tomorrow.
