Use Negative Binomial or Poisson to handle counts data? #337
Comments
Stan has negative binomial parameterizations that don't require constraints, such as neg_binomial_2. I think this is a very reasonable thing to do. Basically the model is
where
My only concern would be that if the count values are relatively small, then we would likely need a lot of data to reasonably estimate seasonality. I think MCMC would be important here. On the other hand, if counts are large, then a normal distribution probably provides a reasonable enough approximation. Worth doing though!
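For reference, Stan's `neg_binomial_2` is parameterized by a mean $\mu > 0$ and a dispersion $\phi > 0$. The variance then exceeds the mean for any valid $\phi$, so no explicit constraint tying the two together is needed. This is the standard form of that distribution and not necessarily the exact model the comment had in mind; how $\mu$ would be tied to Prophet's trend and seasonality terms is left open here.

$$
y \sim \mathrm{NegBinomial2}(\mu, \phi), \qquad \mathbb{E}[y] = \mu, \qquad \mathrm{Var}[y] = \mu + \frac{\mu^2}{\phi}
$$

As $\phi \to \infty$ the variance approaches $\mu$ and the Poisson distribution is recovered.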
I recently modified the underlying Stan model to assume a negative binomial distribution like the one suggested above. My concern at the moment is that Prophet appears to scale the target values before fitting. This cannot be done with a distribution that assumes integer target values. Are there any thoughts on the sensitivity of the Prophet model to the scaling of the data? For instance, are parameters initialized under the assumption that the target values will be between 0 and 1?
Good point about the scaling for the negative binomial link function. Hard-coded in the Stan model is a prior on the noise term of N(0, 1/2). This is rather weak for $y \in [0, 1]$, but would probably need to be adjusted if the data were not scaled. The priors on changepoint delta, seasonality beta, and holiday beta would also be scale-dependent. These can be directly set with […]. I'm pretty sure there isn't anything else that is scale-dependent or assumes $y \le 1$. One other thing to note is that the Gaussian link function is encoded in both the Stan and in the R/Py, so you'll need to also adjust the […]
Regarding the Poisson link function: I thought it would be out in the 0.5 version. Do you know when you'll add this feature?
We were working on getting it in place with #865 but that got stuck by some perf issues in rstan for which we're still trying to figure out the best workaround, so unfortunately not yet.
Hi @bletham. Thank you for this amazing package called fbprophet!! It's moving the forecasting world to a new level.
Could we assume it is fine in PyStan? If so, could you please release a working Python version for now?
I was targeting for this to happen after #501, which would have made it easier / more generic, but after the addition of the cmdstanpy backend I've decided to no longer pursue #501 so this should just be done directly. This should be able to look a lot like how there is currently a switch between the linear and logistic trends; here we would have a switch between link functions. The link function is defined in Stan, right here: prophet/python/stan/unix/prophet.stan Line 117 in 46e5611
prophet/python/stan/unix/prophet.stan Line 124 in 46e5611
So that would need a switch to alternatively use a NB/Poisson. What is currently prophet/python/fbprophet/forecaster.py Lines 1422 to 1427 in 46e5611
Lines 1579 to 1583 in 46e5611
And I believe that should be it. So code-wise, this should not require massive changes. The main question I have is around validation: checking on some realistic small-count datasets that this is doing something reasonable, making sure that the fitting doesn't fail, and deciding whether we need NB or whether Poisson is sufficient.
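The kind of switch described above can be sketched in plain Python. This is only an illustration of dispatching on a likelihood family, not Prophet's Stan or Python code: the function names, the `family` strings, and the use of `exp` to turn the latent forecast into a positive rate are all assumptions made for the example.

```python
import math

def normal_logpdf(y, mu, sigma):
    """Log-density of a normal observation."""
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2 * math.pi)

def poisson_logpmf(y, rate):
    """Log-probability of count y under a Poisson with the given rate."""
    return y * math.log(rate) - rate - math.lgamma(y + 1)

def neg_binomial_2_logpmf(y, mu, phi):
    """Mean/dispersion negative binomial: variance is mu + mu**2 / phi."""
    return (math.lgamma(y + phi) - math.lgamma(phi) - math.lgamma(y + 1)
            + phi * math.log(phi / (phi + mu))
            + y * math.log(mu / (phi + mu)))

def log_likelihood(ys, latent, family, scale=1.0):
    """Sum the log-likelihood of ys given a latent forecast.

    `scale` is sigma for the normal family and phi for the negative
    binomial; it is ignored for Poisson. The exp() mapping to a positive
    rate is an assumption of this sketch.
    """
    total = 0.0
    for y, eta in zip(ys, latent):
        if family == "normal":
            total += normal_logpdf(y, eta, scale)
        elif family == "poisson":
            total += poisson_logpmf(y, math.exp(eta))
        elif family == "neg_binomial":
            total += neg_binomial_2_logpmf(y, math.exp(eta), scale)
        else:
            raise ValueError(f"unknown family: {family}")
    return total
```

A validation pass of the sort mentioned above could compare these totals across families on held-out small-count data.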
@bletham I would be interested to contribute somehow to the effort of bringing these count-data likelihoods into the library since this was super important for me at my previous job. What can I do?
@oren0e sorry for the slow reply, and thanks for being willing to contribute!
@bletham sadly I don't have the data I was working on since it was left on my previous company's servers.
I implemented a NB likelihood in #1544. There were significant numerical issues around the hinge function that is required to convert the latent forecast into a positive process rate. Discussion of this is in #1500. As discussed there, I'm not very optimistic about the NB likelihood being broadly useful in Prophet due to these challenges. For the purposes of handling small-count data (especially when we're trying to get a forecast that stays positive), there are some much more robust approaches that are explored in #1668 that I think provide a better direction than a NB likelihood. So in light of the issues in my PR, this effort is deprioritized and probably won't ever make it into the package. Though interested individuals can of course patch in my PR and try it out!
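To give a feel for why a hinge can be awkward numerically, here is a small sketch. It assumes the hinge is a simple clamp of the latent value at a floor, which may not match what #1544 actually does, and it uses a Poisson likelihood for brevity. The helper names are hypothetical.

```python
import math

def hinge(x, floor=0.0):
    """Clamp a latent value so it can be used as a non-negative rate."""
    return max(x, floor)

def poisson_loglik(y, rate):
    """Poisson log-probability of count y; a zero rate only supports y == 0."""
    if rate == 0.0:
        return 0.0 if y == 0 else -math.inf
    return y * math.log(rate) - rate - math.lgamma(y + 1)

def finite_diff(f, x, h=1e-4):
    """Central finite-difference estimate of df/dx."""
    return (f(x + h) - f(x - h)) / (2 * h)

def loglik_of_latent(x):
    # Observed count of 3, with a small positive floor on the rate.
    return poisson_loglik(3, hinge(x, floor=1e-6))

# With a floor of exactly zero, a negative latent value makes a positive
# count impossible, so the log-likelihood is minus infinity.
blocked = poisson_loglik(3, hinge(-1.0))

# With a small positive floor the value is finite, but below the hinge the
# likelihood no longer responds to the latent value, so the slope is zero.
flat = finite_diff(loglik_of_latent, -1.0)

# Above the hinge the slope is informative again.
sloped = finite_diff(loglik_of_latent, 2.0)
```

Either behavior gives an optimizer or sampler little to work with when the latent forecast dips below zero, which is one plausible reading of the difficulties described above.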
There is a simple regression algorithm for counts data, called Poisson regression. This algorithm assumes that every regressor has a multiplicative effect. It's similar to computing the log of the data, except it works even when the data has zeroes.
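As a minimal sketch of the multiplicative behavior, assuming the usual log link and made-up coefficients (nothing here is fitted or taken from Prophet):

```python
import math

def poisson_rate(intercept, coefs, x):
    """Expected count under a log link: exp(intercept + sum(coef * x))."""
    return math.exp(intercept + sum(c * xi for c, xi in zip(coefs, x)))

def poisson_logpmf(y, rate):
    """Log-probability of count y under a Poisson with the given rate."""
    return y * math.log(rate) - rate - math.lgamma(y + 1)

# Hypothetical coefficients, for illustration only.
intercept, coefs = 1.0, [0.5]

base = poisson_rate(intercept, coefs, [0.0])
bumped = poisson_rate(intercept, coefs, [1.0])

# Raising the regressor by one unit multiplies the expected count by exp(coef).
ratio = bumped / base

# A zero count has a finite log-likelihood, whereas math.log(0) would raise,
# which is why log-transforming the data breaks down when zeros are present.
zero_count_loglik = poisson_logpmf(0, base)
```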
It's conceivable that you could replace the Poisson distribution with the more general Negative Binomial distribution. The NB distribution is a generalization of the Poisson that allows the mean to be different from the variance. In contrast, in a Poisson distribution the mean is always the same as the variance.
The main difficulty with just changing the Normal distribution to a Negative Binomial is that it's then necessary to add the constraint that $0 < \mu < \sigma^2$.
Does this seem like a good idea?
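To see where the constraint comes from, here is a small sketch that converts a mean and variance into the classic negative binomial parameters `(n, p)`, taking `p` as the success probability and the count as the number of failures. The function names are hypothetical.

```python
def nb_from_moments(mean, variance):
    """Return (n, p) for a negative binomial with the given mean and variance.

    Only defined when 0 < mean < variance; otherwise p leaves (0, 1) or
    n becomes non-positive.
    """
    if not 0 < mean < variance:
        raise ValueError("negative binomial requires 0 < mean < variance")
    p = mean / variance
    n = mean * mean / (variance - mean)
    return n, p

def nb_moments(n, p):
    """Mean and variance of a negative binomial with parameters (n, p)."""
    mean = n * (1 - p) / p
    return mean, mean / p

n, p = nb_from_moments(4.0, 12.0)   # n = 2.0, p = 1/3
```

Parameterizing the model directly by mean and variance therefore needs the inequality enforced during fitting, whereas a parameterization in terms of `n` (or any positive dispersion) satisfies it automatically.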