Non-Gaussian likelihoods #1500
@bletham I've had #337 on my todo list for a while now, so I can get started on this after #1472 gets merged in. The other idea I had was robust regression, switching the likelihood to a Cauchy or t-distribution to handle extreme values, which could close #229 - although I haven't tried this in prophet yet. Seems there's a reasonable demand for #307 as well. Do you have a rough date in mind for the 0.7 release? I assume all of these would also need to be ported to R. Given the lockdown, I have a bit more time on my hands :)
That's great. My impression is that a significant part of the use-case for #307 (trends that saturate on only one end) is for time series that are known to always be positive, so the NB likelihood (or the Gamma likelihood, as shown in #1492!) may actually be the right model, as opposed to doing it via the trend.

There is some downside to expanding the number of trends / likelihoods supported. From the development point of view, filling up the code with switch statements makes the code harder to read and maintain, especially when switching independently on both trend and likelihood. It can also add a barrier to using the package. For instance, a user may feel like they have to decide between a Gaussian likelihood and a student-t likelihood, perhaps without really knowing what either of those means. And then they may have no idea what dof they should choose, and this adds another parameter that they have to sweep over. There is certainly a lot that can be done with documentation to help with these decisions, but there's also certainly a cost to a proliferation of model options.

To me, the NB likelihood is definitely on the "worth-it" side of that trade-off, especially because the decision of when to use it should be pretty clear from the setting. But as we get further into the tail of options that are useful to a smaller population of time series, I think the ideal approach would be to not have these built in to the package, but rather to make it easy for people to make these sorts of customizations themselves. That way more expert users would have the flexibility they want without muddying the waters for those without statistical modeling expertise. But as I note above, that will be quite an effort.

For v0.7: we don't have a particular date in mind right now for the release, but if we can get the NB likelihood in, we'd definitely push right after that. That'd be a really great improvement. If you wanted to implement it in Py, I'd be happy to do the R translation!
@bletham yeah, that's fine with me - I can start on the implementation of the NB likelihood in Py this weekend. I agree that adding numerous switch statements for trend and likelihood can be difficult to maintain in the future. I suppose for those extra likelihood options there could be a dedicated page in the docs for advanced users to implement them themselves - are you considering having a separate class for users to pass in trend and likelihood definitions? I assume this would cover the initialisation and predictions - but what about the edits with regards to the model fitting in Stan? I'm assuming the option of having everything in Stan (#340) is now not possible due to #501?
Yeah, I don't think it's going to be possible to implement a new trend or likelihood without making changes to both the Stan and Py/R code. You make a good point about documentation. #1466 provides a really clean example of what needs to be done to try a new trend, so perhaps pointing to that PR as a guide to follow is a good first step that would be sufficient for many who want to implement new trends. The commit for the NB likelihood could be similar. (Although #1492 shows that it will be a bit more complicated for some likelihoods - basically, whenever the mean isn't directly one of the parameters.)
Hi @ryankarlos and @bletham,
@palexbg I'm not sure about specific resources, but I usually just have a look at the Stan documentation - https://mc-stan.org/docs/2_23/functions-reference/unbounded-continuous-distributions.html or https://mc-stan.org/docs/2_23/functions-reference/unbounded-discrete-distributions.html for example - or check the Stan forums if I'm having trouble with the implementation.
I just added a first take at a negative binomial likelihood in the nb_likelihood branch, with a diff and example notebook in #1544. It seems to be doing something reasonable, but I'd definitely like to test it on some real problems. I used a logistic hinge function to ensure that the mean being passed into the negative binomial function is always positive, though I'm a bit concerned about potential numerical issues from having a steep hinge. The variance also seems to be a bit high. The prior might not be quite right (I basically pulled it out of a hat; it'll need some more tuning). My test dataset also didn't actually use noise from the NB distribution, so it could be because of that misspecification. We'll have to see how it looks on real datasets. If you have any real datasets, please post them!
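As an aside on the "logistic hinge": a common way to keep a quantity positive is a softplus, and a steep hinge is exactly where naive implementations overflow. A minimal sketch of this (the function and its sharpness parameter `k` are illustrative, not Prophet's actual code):

```python
import math

def softplus(x: float, k: float = 1.0) -> float:
    """Smooth hinge: ~0 for x << 0, ~x for x >> 0. Larger k sharpens the corner."""
    # The naive form log(1 + exp(k * x)) / k overflows exp() for large k * x;
    # splitting on the sign of z keeps both tails numerically stable.
    z = k * x
    if z > 0:
        return (z + math.log1p(math.exp(-z))) / k
    return math.log1p(math.exp(z)) / k

print(softplus(5.0))    # close to 5
print(softplus(-5.0))   # small positive value
print(softplus(0.01, k=1000.0))  # sharp hinge: stable here, but naive exp(k*x) would overflow
```

The stable form matters because with a sharp hinge even modest inputs push `exp(k * x)` past floating-point range.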
I have tried the NB likelihood on this data set from Kaggle, which is hourly count data. I took the first 19 days from the training set. As noted before, the variance seems high with NB. Another thing that bothered me a lot is the poor fit to large values; in the NB case some of them are within the uncertainty bounds, but there are still too many of them to be considered outliers, I think. Lastly, I went through #1544 and the move from the Stan parameterization to the scipy parameterization seems OK to me @bletham (as was discussed in #337). The code for the plots (I didn't find a suitable place where I have permissions to upload a notebook here):

```python
import pandas as pd
import matplotlib.pyplot as plt
from fbprophet import Prophet

# read the data
train: pd.DataFrame = pd.read_csv('train.csv')
train['datetime'] = pd.to_datetime(train['datetime'])

# keep the first 19 days
df: pd.DataFrame = train[train['datetime'] < '2011-01-20 00:00:00']
df = df.loc[:, ['datetime', 'count']]
df.rename(columns={'datetime': 'ds', 'count': 'y'}, inplace=True)

# normal prophet (Gaussian likelihood)
df['cap'] = 250
m = Prophet(growth='logistic', seasonality_mode='multiplicative')
m.fit(df)
fcast = m.predict()
fig = m.plot(fcast)
plt.title('Gaussian Likelihood')
plt.show()

# negative-binomial likelihood
m = Prophet(growth='logistic',
            seasonality_mode='multiplicative',
            likelihood='NegBinomial')
m.fit(df)
fcast = m.predict()
fig = m.plot(fcast)
plt.title('NB Likelihood')
plt.show()
```
This is a great example, thanks for coming up with it! I made a tweak to account for the different shape of daily seasonality on working days vs. non-working days:

```python
m = Prophet(growth='logistic', daily_seasonality=False)
m.add_seasonality(name='workingday_daily', period=1, fourier_order=5, condition_name='workingday')
m.add_seasonality(name='notworkingday_daily', period=1, fourier_order=5, condition_name='notworkingday')
```

I'm not totally sure what to make of these. The model is doing a great job at picking up the different seasonality shapes (unimodal on non-working days, bimodal on working days due to rush hours). But this time series has more variance on the working-day peaks than on the non-working-day peaks, and neither likelihood is really able to capture that correctly. With the NB likelihood, the fact that the noise can be skewed leads the model to underestimate the working-day peak while overestimating the noise on non-working days. The Gaussian likelihood also does both of these things, but (probably because the noise distribution is symmetric) not as badly; working-day peaks are (better?) estimated as being quite a bit higher than non-working-day peaks, and there isn't as much noise inflation. I wonder if these characteristics of the noise distribution suggest that the main thing we need is the logistic link function (which forces the whole forecast to be positive), and that it doesn't matter as much whether the likelihood on top of that is Gaussian or NB - or even that Gaussian may be preferable to NB in a dataset such as this.
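For completeness, conditional seasonalities like the ones above expect boolean indicator columns on the dataframe (this Kaggle set already ships a `workingday` column, so the following is just an illustrative sketch of how such columns might be built):

```python
import pandas as pd

# Hypothetical dataframe with a 'ds' timestamp column, as Prophet expects.
df = pd.DataFrame({'ds': pd.date_range('2011-01-01', periods=7, freq='D')})

# Treat Mon-Fri as working days; real usage would also exclude public holidays.
df['workingday'] = df['ds'].dt.weekday < 5
df['notworkingday'] = ~df['workingday']
print(df)
```

Each row must belong to exactly one condition so that exactly one of the two daily seasonalities is active at any time.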
Well, I have stumbled upon another count data set, also involving bikes. It can be found in this link. Unless I am missing something very basic, the performance is bad and the noise inflation is huge. Also, during the fitting process I kept receiving these exceptions (maybe that's the reason for the bad performance):

The model did converge, though. Plots and code below. I still don't get why the Gaussian version can't seem to fit "high points" (here the underfitted points are not even considered high...).

```python
import pandas as pd
import matplotlib.pyplot as plt
from fbprophet import Prophet

# read the data
df: pd.DataFrame = pd.read_csv('Fremont_Bridge_Bicycle_Counter.csv')
df['Date'] = pd.to_datetime(df['Date'])

# fill NAs with something quick
df.fillna(0, inplace=True)

# pick one of the series
df = df.loc[:, ['Date', 'Fremont Bridge West Sidewalk']]
df.rename(columns={'Date': 'ds', 'Fremont Bridge West Sidewalk': 'y'}, inplace=True)

## fit on all data ##
# Gaussian
df['cap'] = 1000
m = Prophet(growth='logistic', seasonality_mode='multiplicative', daily_seasonality=True,
            weekly_seasonality=True, yearly_seasonality=True)
m.fit(df)
fcast = m.predict()
fig = m.plot(fcast)
plt.title('Gaussian Likelihood')
plt.show()

# Negative Binomial
m = Prophet(growth='logistic', seasonality_mode='multiplicative', daily_seasonality=True,
            weekly_seasonality=True, yearly_seasonality=True, likelihood='NegBinomial')
m.fit(df)
fcast = m.predict()
fig = m.plot(fcast)
plt.title('Negative-Binomial Likelihood')
plt.show()

## fit for 2 years only ##
# note: the date column was renamed to 'ds' above
df = df[df['ds'] < '2015-01-01 00:00:00']
df['cap'] = 800

# Gaussian
m = Prophet(growth='logistic', seasonality_mode='multiplicative', daily_seasonality=True,
            weekly_seasonality=True, yearly_seasonality=True)
m.fit(df)
fcast = m.predict()
fig = m.plot(fcast)
plt.title('Gaussian Likelihood, 2 Years')
plt.show()

# Negative Binomial
m = Prophet(growth='logistic', seasonality_mode='multiplicative', daily_seasonality=True,
            weekly_seasonality=True, yearly_seasonality=True, likelihood='NegBinomial')
m.fit(df)
fcast = m.predict()
fig = m.plot(fcast)
plt.title('Negative-Binomial Likelihood, 2 Years')
plt.show()
```
@oren0e thanks for sharing this - yeah, the numerical issues of the hinge function are pretty significant on this problem. I've been doing some testing of various approaches, and in #1668 I just posted an approach of adjusting the future trend simulation to keep it positive. I tried it on this dataset and it gives this:

The model fit isn't great because we have here a case of daily seasonality-within-seasonality (there is yearly seasonality in the magnitude of the daily seasonality), but the future uncertainty at least seems reasonable. (Aggregating counts by day makes the model much more appropriate, though then there isn't really any challenge keeping it positive.)
@bletham can you explain what you mean by "daily seasonality-within-seasonality"? I'm not sure I fully understand the concept.
Basically the multiplicative model is

y(t) = trend(t) * (1 + yearly(t) + daily(t))

where each of the seasonalities is stationary; so, if we factor out the yearly seasonality, it is

y(t) = trend(t) * (1 + yearly(t)) + trend(t) * daily(t)

So, yearly seasonality adds to the baseline level, while the daily term enters as trend(t) * daily(t), whose magnitude does not depend on the time of year.

What we see in this time series is a little different. There is a spike every day. That spike is daily seasonality; it starts off low in the morning, peaks during the day, and goes back to being low overnight for the next day. By the model structure, the size of that spike can only scale with the trend, but in this data the size of the daily spike itself varies with the time of year - that is, there is yearly seasonality in the magnitude of the daily seasonality.
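To make the distinction concrete, here is a small synthetic sketch (all constants invented) contrasting a stationary multiplicative daily seasonality with a daily amplitude that is itself modulated by a yearly cycle:

```python
import numpy as np

t = np.arange(0.0, 365.0, 1.0 / 24.0)         # one year of hourly timestamps, in days
trend = np.full_like(t, 100.0)                # flat trend, to isolate the seasonal terms
daily = np.cos(2 * np.pi * t)                 # daily cycle (period 1 day)
yearly = 0.5 * np.cos(2 * np.pi * t / 365.0)  # yearly cycle (period 365 days)

# Stationary multiplicative model: daily amplitude is fixed relative to the trend.
y_mult = trend * (1 + yearly + 0.2 * daily)

# Seasonality-within-seasonality: the yearly cycle also modulates the daily amplitude.
y_nested = trend * (1 + yearly + (0.2 + 0.15 * yearly) * daily)

day_jan, day_jul = slice(0, 24), slice(182 * 24, 183 * 24)
print(np.ptp(y_mult[day_jan]), np.ptp(y_mult[day_jul]))      # about the same daily swing
print(np.ptp(y_nested[day_jan]), np.ptp(y_nested[day_jul]))  # much larger swing in January here
```

With a flat trend the multiplicative model's daily swing is constant all year, while the nested series' swing rises and falls with the yearly cycle; that second pattern is what the multiplicative structure cannot express.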
I wrote up a version of the negative binomial likelihood and fit it to the data in #1500 (comment). I couldn't avoid the readability issues that come with having switch statements, but by using a log link function I seem to have avoided some of the numerical issues. This properly accounts for heteroskedasticity in count data, and the credible intervals no longer lie on non-integers. I also added a function to sample time series from the prior distribution, to check prior distributions of functionals and catch numerical issues that might come from the prior parameters. Could this be useful as a PR? Code at https://github.com/blakemoya/prophet/tree/nb
Hi! For those who might find it useful, I've added support for Prophet-like models with Negative Binomial and Gamma likelihoods in the Prophetverse library. The implementation uses a link function similar to a ReLU.
I'm creating this issue as a place to discuss supporting non-Gaussian likelihoods.
There's been a fair amount of discussion on this for Poisson / Negative Binomial likelihoods (#337). This comes up in problems with small count data. I think a Negative Binomial likelihood would be a very high-value add to the package.
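For context on why this helps with count data: under Stan's neg_binomial_2 parameterization the variance is mu + mu^2/phi, so unlike the Poisson (variance = mean) it can absorb overdispersion. A quick illustrative check of that identity:

```python
def nb2_variance(mu: float, phi: float) -> float:
    """Variance of Stan's neg_binomial_2(mu, phi); Poisson is recovered as phi -> infinity."""
    return mu + mu ** 2 / phi

print(nb2_variance(10.0, 2.0))  # 60.0: heavily overdispersed vs the Poisson's 10
print(nb2_variance(10.0, 1e9))  # essentially 10: approaches the Poisson variance
```

Small phi means heavy overdispersion, which is exactly the regime where a Gaussian (or Poisson) likelihood understates the noise in small-count series.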
#1492 provides a working example of a Gamma likelihood. As discussed there, the application is for positive and positively-skewed time series.
I see two main ways of getting this functionality in place.
Handling it like trend
Selecting different likelihoods is related to the current functionality of selecting different trends. In the current release there are two trends: piecewise linear or piecewise logistic. #1466 just landed and adds a third (flat). In the Stan code we pass in an indicator for which trend to use:
prophet/python/stan/unix/prophet.stan, line 98 (commit 16e632a)
and then compute the trend using the appropriate function:
prophet/python/stan/unix/prophet.stan, lines 118-124 (commit 16e632a)
Then, anywhere in the Py code where the trend is computed, we similarly use a switch to select the correct function. Here is an example, and #1466 is a great example of all of the places where the trend function is used:
prophet/python/fbprophet/forecaster.py, lines 1315-1323 (commit 16e632a)
We could do a similar thing with the likelihood. For Gaussian and Negative Binomial this would be fairly clean. There could be a switch on the likelihood in the Stan code here:
prophet/python/stan/unix/prophet.stan, line 136 (commit 16e632a)
and then in the Py code, the likelihood doesn't play a huge role. In fact the only place in the Py code where the Gaussian assumption shows up is when estimating uncertainty intervals here:
prophet/python/fbprophet/forecaster.py, line 1466 (commit 16e632a)
For NB, that would just have to be replaced with simulating from the NB distribution.
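As a sketch of that replacement (assuming Stan's neg_binomial_2 parameterization), simulating requires converting (mu, phi) into numpy's (n, p) parameterization via n = phi and p = phi / (phi + mu):

```python
import numpy as np

def sample_neg_binomial_2(mu, phi, size, rng):
    """Draw from Stan's neg_binomial_2(mu, phi) using numpy's (n, p) parameterization."""
    # With n = phi and p = phi / (phi + mu), numpy's negative binomial has
    # mean n(1-p)/p = mu and variance n(1-p)/p^2 = mu + mu^2/phi, matching Stan.
    return rng.negative_binomial(n=phi, p=phi / (phi + mu), size=size)

rng = np.random.default_rng(0)
draws = sample_neg_binomial_2(mu=10.0, phi=2.0, size=200_000, rng=rng)
print(draws.mean())  # close to 10
print(draws.var())   # close to 60 (= mu + mu^2 / phi)
```

The function and names here are only illustrative; the point is the parameter conversion, which would be needed wherever the Gaussian sampling is currently done.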
However, there are three main issues with this approach:
1. Different likelihoods may have different parameters. Gaussian has the noise variance, Poisson has no parameter, and Negative Binomial has the dispersion parameter phi (https://mc-stan.org/docs/2_20/functions-reference/nbalt.html); #1492 (Extend model to other distributions) shows the differences for Gamma. Trying to override the existing Gaussian noise parameter,
prophet/python/stan/unix/prophet.stan, line 132 (commit 16e632a)
may not be reasonable due to changes in valid ranges / reasonable priors.
2. The posterior mean might require distribution-specific computation. With the Gaussian likelihood we have something like
y ~ Normal(trend(t) + seasonality(t) + regressors(t), sigma)
so we can compute yhat (the posterior mean) directly as trend(t) + seasonality(t) + regressors(t). The same would be true for the Poisson or Negative Binomial likelihoods. But it isn't true for the Gamma likelihood, where the mean is alpha/beta. I wonder if the Gamma distribution could be reparameterized in such a way as to get around this, but that would probably require implementing the reparameterized distribution in Stan, so it'd be nice to have a cleaner solution that doesn't require that.
3. Trying out custom link functions would require making changes in several places throughout the code. The actual amount of code would probably be relatively small, but it would take a lot of effort for someone to be sure that they'd gotten all of the right places. #1492 (Extend model to other distributions) is along the lines of what would be required, as is #1466 (add implementation for constant trend in Python).
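To make the Gamma point concrete: with shape alpha and rate beta the mean is alpha/beta, so targeting a given mean mu means choosing beta = alpha/mu. A quick numpy sanity check of that mapping (a sketch, not the proposed Stan change):

```python
import numpy as np

def gamma_with_mean(mu, alpha, size, rng):
    """Draw Gamma samples with mean mu by choosing rate beta = alpha / mu."""
    beta = alpha / mu                    # rate; numpy's gamma takes scale = 1 / rate
    return rng.gamma(shape=alpha, scale=1.0 / beta, size=size)

rng = np.random.default_rng(0)
draws = gamma_with_mean(mu=5.0, alpha=3.0, size=200_000, rng=rng)
print(draws.mean())  # close to 5
```

This is why the mean isn't "directly one of the parameters": yhat has to be computed as alpha/beta from the posterior draws rather than read off a single parameter.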
Creating a Model class
I think the alternative would be to add a whole bunch more structure to how the model is defined and make it so the available modeling options (primarily trend and likelihood) could be customized just by implementing a new model class (let's call it Model here). The Model class would define the trend and likelihood. All of the interaction in the Prophet class with either the trend or the link function would be handled by calling this class in a way that has a uniform API for all trends/likelihoods. This would handle #705 at the same time as allowing for custom likelihoods. For instance, switch statements like this:
prophet/python/fbprophet/forecaster.py, lines 1315-1323 (commit 16e632a)
would be replaced by something like
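Purely as a hypothetical sketch of that idea (none of these class or method names exist in Prophet), the switch could become a method call on an object that users can subclass:

```python
import numpy as np

class TrendModel:
    """Hypothetical base class: subclasses define the trend function."""
    def trend(self, t, params):
        raise NotImplementedError

class FlatTrend(TrendModel):
    def trend(self, t, params):
        # A flat trend is just the offset parameter everywhere.
        return np.full_like(t, params['m'])

class LinearTrend(TrendModel):
    def trend(self, t, params):
        # Offset plus slope; changepoints omitted for brevity.
        return params['m'] + params['k'] * t

# Instead of `if self.growth == 'linear': ... elif ...`, the forecaster would
# call self.model.trend(t, params) uniformly, whatever the user passed in.
t = np.linspace(0.0, 1.0, 5)
print(FlatTrend().trend(t, {'m': 2.0}))
print(LinearTrend().trend(t, {'m': 2.0, 'k': 1.0}))
```

The same pattern would apply to the likelihood side (a method for the noise model and one for simulating predictive samples), so a new trend/likelihood pair is one new class rather than edits scattered across the forecaster.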
Code like
prophet/python/fbprophet/forecaster.py, lines 1216-1219 (commit 16e632a)
might be replaced by something like
The main advantage of this is that it would enable trying out custom trends or link functions by implementing code in a single place. But there are some challenges too that I haven't thought through fully enough yet, such as how a custom Model would expose a trend component that can be plotted.
What to do next
I'd be interested to hear people's thoughts on how to make the model likelihood and/or trend easily customizable.
My most immediate interest in changing the likelihood is to add a Negative Binomial likelihood. I think this could be done pretty easily with the switch approach, and the code change required would be even less than #1466. Since this is one of the issues that comes up the most frequently I'm inclined to take the most immediate path on this and just get it done that way. It could also be a nice test case that ensures there aren't any other considerations from swapping out the likelihood that I've missed.
If we want to extend to any additional likelihoods, then I think making some type of an abstraction to produce a clean interface between Prophet and the likelihood would be necessary. The results in #1492 for the Gamma likelihood are quite compelling, and I think it makes a perfect example of what needs to be supported by such an abstraction.