Non-Gaussian likelihoods #1500

Open
bletham opened this issue May 17, 2020 · 15 comments

bletham (Contributor) commented May 17, 2020

I'm creating this issue as a place to discuss supporting non-Gaussian likelihoods.

There's been a fair amount of discussion on this for Poisson / Negative Binomial likelihoods (#337). This comes up in problems with small count data. I think a Negative Binomial likelihood would be a very high-value add to the package.

#1492 provides a working example of a Gamma likelihood. As discussed there, the application is for positive and positively-skewed time series.

I see two main ways of getting this functionality in place.

Handling it like the trend
Selecting different likelihoods is related to the current functionality of selecting different trends. In the current release there are two trends: piecewise linear and piecewise logistic. #1466 just landed and adds a third (flat). In the Stan code we pass in an indicator for which trend to use:

int trend_indicator; // 0 for linear, 1 for logistic, 2 for flat

and then compute the trend using the appropriate function:
if (trend_indicator == 0) {
  trend = linear_trend(k, m, delta, t, A, t_change);
} else if (trend_indicator == 1) {
  trend = logistic_trend(k, m, delta, t, cap, A, t_change, S);
} else if (trend_indicator == 2) {
  trend = flat_trend(m, T);
}

Then, anywhere in the Py code where the trend is computed, we similarly use a switch to select the correct function. Here is one example, and #1466 shows all of the places where the trend function is used:
if self.growth == 'linear':
    trend = self.piecewise_linear(t, deltas, k, m, self.changepoints_t)
elif self.growth == 'logistic':
    cap = df['cap_scaled']
    trend = self.piecewise_logistic(
        t, cap, deltas, k, m, self.changepoints_t)
elif self.growth == 'flat':
    # constant trend
    trend = self.flat_trend(t, m)

We could do a similar thing with the likelihood. For Gaussian and Negative Binomial this would be fairly clean. There could be a switch on the likelihood in the Stan code, at the point where the observation model is specified.
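As a rough sketch (using a hypothetical likelihood_indicator data field and Stan's built-in neg_binomial_2 parameterization; this is not code from the repo), that switch might look like:

if (likelihood_indicator == 0) {
  y ~ normal(mu, sigma_obs);
} else if (likelihood_indicator == 1) {
  // neg_binomial_2 is mean/dispersion parameterized; y would need to be
  // declared as an integer array for this branch.
  y ~ neg_binomial_2(mu, phi);
}

where mu stands for the usual trend + seasonality + regressors predictor.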


In the Py code, meanwhile, the likelihood doesn't play a huge role. In fact, the only place where the Gaussian assumption shows up is when estimating uncertainty intervals here:
noise = np.random.normal(0, sigma, df.shape[0]) * self.y_scale

For NB, that would just have to be replaced with simulating from the NB distribution.
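A minimal sketch of that replacement, converting Stan's neg_binomial_2 (mean mu, dispersion phi) to NumPy's (n, p) convention (a hypothetical helper, just to illustrate):

import numpy as np

def sample_nb(mu, phi, size):
    # Stan's neg_binomial_2(mu, phi) corresponds to NumPy's n = phi,
    # p = phi / (phi + mu), giving E[y] = mu and Var[y] = mu + mu**2 / phi.
    # Unlike the Gaussian case, this draws the observation itself rather
    # than additive noise around yhat.
    p = phi / (phi + mu)
    return np.random.negative_binomial(phi, p, size=size)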

However, there are three main issues with this approach:

  • Different likelihoods may have different parameters. Gaussian has the noise variance. Poisson has no extra parameter. Negative Binomial has the dispersion parameter phi (https://mc-stan.org/docs/2_20/functions-reference/nbalt.html). #1492 shows the differences for Gamma. Trying to override the existing Gaussian noise parameter,

    sigma_obs ~ normal(0, 0.5);

    may not be reasonable due to changes in valid ranges / reasonable priors.

  • The posterior mean might require distribution-specific computation. With the Gaussian likelihood we have something like y ~ Normal(trend(t) + seasonality(t) + regressors(t), sigma), so we can compute yhat (the posterior mean) directly as trend(t) + seasonality(t) + regressors(t). The same would be true for the Poisson or Negative Binomial likelihoods. But it isn't true for the Gamma likelihood, where the mean is alpha/beta. I wonder if the Gamma distribution could be reparameterized in such a way to get around this (see the sketch after this list), but that would probably require implementing the reparameterized distribution in Stan, so it'd be nice to have a cleaner solution that doesn't require that.

  • Trying out custom link functions would require making changes in several places throughout the code. The actual amount of code would probably be relatively small, but it would take a lot of effort for someone to be sure they'd gotten all of the right places. #1492 is along the lines of what would be required, as is #1466.
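On the second bullet: one candidate reparameterization (a sketch, not code from #1492) is to fix the Gamma rate at alpha / mu, since Gamma(alpha, beta) has mean alpha / beta. Stan's built-in gamma distribution can be called this way directly, without implementing a new distribution:

// mu is the usual positive-valued predictor; alpha is a scalar shape parameter
y ~ gamma(alpha, alpha ./ mu);  // E[y] = mu, Var[y] = mu^2 / alpha
// (older Stan versions may need rep_vector(alpha, T) ./ mu)

Of course this still requires keeping mu positive, which is a separate problem.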

Creating a Model class

I think the alternative would be to add much more structure to how the model is defined, so that the available modeling options (primarily trend and likelihood) could be customized just by implementing a new model class (let's call it Model here). The Model class would define the trend and the likelihood. All of the interaction in the Prophet class with either the trend or the link function would be handled by calling this class through an API that is uniform across trends/likelihoods. This would handle #705 at the same time as allowing custom likelihoods. For instance, switch statements like this:

if self.growth == 'linear':
    trend = self.piecewise_linear(t, deltas, k, m, self.changepoints_t)
elif self.growth == 'logistic':
    cap = df['cap_scaled']
    trend = self.piecewise_logistic(
        t, cap, deltas, k, m, self.changepoints_t)
elif self.growth == 'flat':
    # constant trend
    trend = self.flat_trend(t, m)

would be replaced by something like

trend = self.model.evaluate_trend(df, self.params, self.changepoints_t)

Code like

df2['yhat'] = (
    df2['trend'] * (1 + df2['multiplicative_terms'])
    + df2['additive_terms']
)

might be replaced by something like

df2['yhat'] = self.model.evaluate_yhat(df2['trend'] * (1 + df2['multiplicative_terms']) + df2['additive_terms'])
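To make the idea concrete, here is a minimal sketch of what such a class might look like (a hypothetical API for discussion, not a settled design):

import numpy as np
import pandas as pd

class Model:
    # Bundles a trend and a likelihood behind a uniform API (hypothetical).

    def evaluate_trend(self, df: pd.DataFrame, params: dict,
                       changepoints_t: np.ndarray) -> np.ndarray:
        # Compute trend(t) for the rows of df from the fitted parameters.
        raise NotImplementedError

    def evaluate_yhat(self, mu: np.ndarray) -> np.ndarray:
        # Map the linear predictor to the posterior mean of y (identity for
        # Gaussian/Poisson/NB; distribution-specific for, e.g., Gamma).
        raise NotImplementedError

    def sample_predictive_noise(self, mu: np.ndarray, params: dict) -> np.ndarray:
        # Draw simulated observations for uncertainty intervals.
        raise NotImplementedError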

The main advantage of this is that it would enable trying out custom trends or link functions by implementing code in a single place. But there are some challenges too that I haven't fully thought through yet:

  • It will be a significant effort to scope out this class and figure out where to draw the line between what stays in the main Prophet class and what is relegated to the Model class and thus made customizable. Computing the regressor feature matrix seems like it would stay in the Prophet class, as would handling future trend uncertainty (which means the piecewise trend formulation would be required for any trend).
  • The definitions of model components may depend on the likelihood. The Gamma likelihood in #1492 is a good example of this: there is now a trend for the alpha parameter and a separate trend for the beta parameter. This would have major implications throughout the stack, including for things like plotting, where we assume there is always a single trend component that can be plotted.

What to do next
I'd be interested to hear people's thoughts on how to make the model likelihood and/or trend easily customizable.

My most immediate interest in changing the likelihood is to add a Negative Binomial likelihood. I think this could be done pretty easily with the switch approach, with an even smaller code change than #1466. Since this is one of the most frequently raised issues, I'm inclined to take the most immediate path and just get it done that way. It would also be a nice test case for surfacing any considerations of swapping out the likelihood that I've missed.

If we want to extend to any additional likelihoods, then I think making some type of abstraction to produce a clean interface between Prophet and the likelihood would be necessary. The results in #1492 for the Gamma likelihood are quite compelling, and I think they make a perfect example of what needs to be supported by such an abstraction.

ryankarlos (Contributor) commented May 17, 2020

@bletham I've had #337 on my to-do list for a while now, so I can get started on this after #1472 gets merged in. The other idea I had was robust regression, switching the likelihood to a Cauchy or Student-t distribution to handle extreme values, which could close #229, although I haven't tried this in Prophet yet. There seems to be reasonable demand for #307 as well. Do you have a rough date in mind for the 0.7 release? I'm assuming all of these would also need to be ported to R. Given the lockdown, I have a bit more time on my hands :)

bletham (Contributor, Author) commented May 18, 2020

That's great. My impression is that a significant part of the use case for #307 (trends that saturate on only one end) is time series that are known to always be positive, and so the NB likelihood (or Gamma likelihood, as shown in #1492!) may actually be the right model, as opposed to handling it via the trend.

There is also some downside to expanding the number of trends/likelihoods supported. From the development point of view, filling the code with switch statements makes it harder to read and maintain, especially when switching independently on both trend and likelihood. It can also add a barrier to using the package. For instance, a user may feel like they have to decide between a Gaussian likelihood and a Student-t likelihood, perhaps without really knowing what either of those means. And then they may have no idea what degrees of freedom to choose for the latter, which adds another parameter they have to sweep over.

There is certainly a lot that documentation can do to help with these decisions, but there is also a real cost to a proliferation of model options. To me, the NB likelihood is definitely on the "worth it" side of that trade-off, especially because the decision of when to use it should be pretty clear from the setting. But as we get further into the tail of options that are useful to a smaller population of time series, I think the ideal approach is not to build these into the package, but rather to make it easy for people to make these sorts of customizations themselves. That way more expert users would have the flexibility they want without muddying the waters for those without statistical modeling expertise. But as I note above, that will be quite an effort.

For v0.7: we don't have a particular date in mind right now for the release, but if we can get the NB likelihood in we'd definitely push a release right after that. That'd be a really great improvement. If you want to implement it in Py, I'd be happy to do the R translation!

ryankarlos (Contributor) commented May 19, 2020

@bletham yeah, that's fine with me; I can start on the NB likelihood implementation in Py this weekend. I agree that adding numerous switch statements for trend and likelihood could become difficult to maintain. I suppose for those extra likelihood options there could be a dedicated page in the docs for advanced users to implement them themselves. Are you considering a separate class for users to pass in trend and likelihood definitions? I assume this would cover initialisation and predictions, but what about the edits with regard to the model fitting in Stan? I'm assuming the option of having everything in Stan (#340) is no longer possible due to #501?

bletham (Contributor, Author) commented May 19, 2020

Yeah, I don't think it's going to be possible to implement a new trend or likelihood without making changes to both Stan and Py/R.

You make a good point about documentation. #1466 provides a really clean example of what needs to be done to try a new trend, so perhaps pointing to that PR as a guide is a good first step that would be sufficient for many people who want to implement new trends. The commit for the NB likelihood could serve similarly. (Although #1492 shows that some likelihoods will be a bit more complicated; basically, it will be more complicated whenever the mean isn't directly one of the distribution's parameters.)

palexbg commented May 20, 2020

Hi @ryankarlos and @bletham,
I have also been wanting to investigate and contribute to this extension for some time; I am happy to see that it is being picked up!
Would either of you have a suggestion on where to start (any previous work done specifically on this), or any paper(s) that could be used for sanity checks?

ryankarlos (Contributor) commented May 25, 2020

@palexbg I'm not sure about specific resources, but I usually just have a look at the Stan documentation (e.g. https://mc-stan.org/docs/2_23/functions-reference/unbounded-continuous-distributions.html or https://mc-stan.org/docs/2_23/functions-reference/unbounded-discrete-distributions.html), or at the Stan forums if I'm having trouble with the implementation.
For ideas about other non-Gaussian likelihoods and use cases, I sometimes find this useful: https://docs.pymc.io/nb_examples/index.html.

bletham (Contributor, Author) commented Jun 18, 2020

I just added a first take at a negative binomial likelihood in the nb_likelihood branch, with a diff and example notebook in #1544. It seems to be doing something reasonable, but I'd definitely like to test it on some real problems.

I used a logistic hinge function to ensure that the mean being passed into the negative binomial function is always positive. I'm a bit concerned about potential numerical issues from having a steep hinge (exp(100 * y_mean)); it worked fine on my test dataset, but we'll have to see how it behaves more broadly.
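Schematically, the hinge is something like the following (a stand-alone sketch, not the branch code):

import numpy as np

def soft_hinge(x, k=100.0):
    # log(1 + exp(k * x)) / k approaches max(0, x) as k grows, and is smooth
    # and strictly positive. np.logaddexp keeps this stable here, but a naive
    # log(1 + np.exp(k * x)) overflows once k * x exceeds ~700, which is the
    # kind of numerical issue mentioned above.
    return np.logaddexp(0.0, k * x) / k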

The variance also seems to be a bit high. The prior might not be quite right (I basically pulled it out of a hat; it'll need some more tuning). My test dataset also didn't use noise actually drawn from an NB distribution, so it could be due to that misspecification. We'll have to see how it looks on real datasets.

If you have any real datasets, please post them!!

[screenshot: NB likelihood forecast on the test dataset]

oren0e commented Jul 19, 2020

I have tried the NB likelihood on this data set from Kaggle. It is hourly count data; I took the first 19 days from the training set.
I used the same basic code from Ben's notebook (#1544).
The plots:
[plots: Gaussian likelihood fit; NB likelihood fit]

As noted before, the variance seems high with NB. Another thing that bothered me a lot is the poor fit to large values; in the NB case some of them are within the uncertainty bounds, but I think there are still too many of them to be dismissed as outliers.

Lastly, I did go through #1544, and the move from the Stan parameterization to the scipy parameterization seems OK to me @bletham (as was discussed in #337).

The code for the plots (I didn't find a suitable place here where I have permission to upload a notebook):

import pandas as pd
import matplotlib.pyplot as plt
from fbprophet import Prophet

# read the data
train: pd.DataFrame = pd.read_csv('train.csv')
train['datetime'] = pd.to_datetime(train['datetime'])

# leave the first 19 days
df: pd.DataFrame = train[train['datetime'] < '2011-01-20 00:00:00']
df = df.loc[:, ['datetime', 'count']]
df.rename(columns={'datetime': 'ds', 'count': 'y'}, inplace=True)

# Propheting
# normal prophet
df['cap'] = 250
m = Prophet(growth='logistic', seasonality_mode='multiplicative')
m.fit(df)
fcast = m.predict()
fig = m.plot(fcast)
plt.title('Gaussian Likelihood')
plt.show()

# negative-binomial likelihood
m = Prophet(growth='logistic',
            seasonality_mode='multiplicative',
            likelihood='NegBinomial')
m.fit(df)
fcast = m.predict()
fig = m.plot(fcast)
plt.title('NB Likelihood')
plt.show()

bletham (Contributor, Author) commented Jul 20, 2020

This is a great example, thanks for coming up with it!

I made a tweak to account for the different shape of daily seasonality on working vs. non-working days:

m = Prophet(growth='logistic', daily_seasonality=False)
m.add_seasonality(name='workingday_daily', period=1, fourier_order=5, condition_name='workingday')
m.add_seasonality(name='notworkingday_daily', period=1, fourier_order=5, condition_name='notworkingday')
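(The conditional seasonalities need boolean condition columns in the dataframe; assuming the Kaggle data's 0/1 workingday field is carried through, something like the following, and the same columns need to be present in any dataframe passed to predict:)

df['workingday'] = df['workingday'] == 1
df['notworkingday'] = ~df['workingday']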

And that produces these fits:
[plots: Gaussian and NB fits with conditional daily seasonalities]

I'm not totally sure what to make of these. The model does a great job picking up the different seasonality shapes (unimodal on non-working days, bimodal on working days due to rush hours). But this time series has more variance at the working-day peaks than at the non-working-day peaks, and neither likelihood really captures that. With the NB likelihood, the fact that the noise can be skewed leads the model to underestimate the working-day peak while overestimating the noise on non-working days. The Gaussian likelihood also does both of these things, but (probably because its noise distribution is symmetric) not as badly: working-day peaks are estimated as being quite a bit higher than non-working-day peaks, and there isn't as much noise inflation.

I wonder if these characteristics of the noise distribution suggest that the main thing we need is the logistic link function (that is, forcing the whole forecast to be positive), and that it doesn't matter as much whether the likelihood on top of that is Gaussian or NB; Gaussian may even be preferable to NB on a dataset such as this.

oren0e commented Aug 19, 2020

Well, I stumbled upon another count data set, also involving bikes; it can be found at this link.
I tried the NB likelihood on this data, first on all of it, then on just 2 years.

Unless I am missing something very basic, the performance is bad and the noise inflation is huge. Also, during the fitting process I kept receiving these exceptions (maybe that's the reason for the bad performance):

Exception: neg_binomial_2_lpmf: Location parameter[1] is 0, but must be > 0!  (in 'unknown file name' at line 145)
Exception: neg_binomial_2_lpmf: Location parameter[3] is 0, but must be > 0!  (in 'unknown file name' at line 145)
Exception: neg_binomial_2_lpmf: Location parameter[5] is 0, but must be > 0!  (in 'unknown file name' at line 145)
Exception: neg_binomial_2_lpmf: Location parameter[149] is 0, but must be > 0!  (in 'unknown file name' at line 145)
Exception: neg_binomial_2_lpmf: Location parameter[2551] is 0, but must be > 0!  (in 'unknown file name' at line 145)

The model did converge, though. Plots and code below. I still don't get why the Gaussian version can't seem to fit the "high points" (here the underfitted points aren't even particularly high...).
[plots: Gaussian and NB fits, on the full data and on the 2-year subset]

import pandas as pd
import matplotlib.pyplot as plt
from fbprophet import Prophet

# read the data
df: pd.DataFrame = pd.read_csv('Fremont_Bridge_Bicycle_Counter.csv')
df['Date'] = pd.to_datetime(df['Date'])

# fill NAs with something quick
df.fillna(0, inplace=True)

# pick one of the series
df = df.loc[:, ['Date', 'Fremont Bridge West Sidewalk']]
df.rename(columns={'Date': 'ds', 'Fremont Bridge West Sidewalk': 'y'}, inplace=True)

## fit on all data ##
# Gaussian
df['cap'] = 1000

m = Prophet(growth='logistic', seasonality_mode='multiplicative', daily_seasonality=True, weekly_seasonality=True,
            yearly_seasonality=True)
m.fit(df)
fcast = m.predict()
fig = m.plot(fcast)
plt.title('Gaussian Likelihood')
plt.show()

# Negative Binomial
m = Prophet(growth='logistic', seasonality_mode='multiplicative', daily_seasonality=True, weekly_seasonality=True,
            yearly_seasonality=True, likelihood='NegBinomial')
m.fit(df)
fcast = m.predict()
fig = m.plot(fcast)
plt.title('Negative-Binomial Likelihood')
plt.show()


## fit for 2 years only ##
df = df[df['ds'] < '2015-01-01 00:00:00']  # note: 'Date' was renamed to 'ds' above
df['cap'] = 800

# Gaussian
m = Prophet(growth='logistic', seasonality_mode='multiplicative', daily_seasonality=True, weekly_seasonality=True,
            yearly_seasonality=True)
m.fit(df)
fcast = m.predict()
fig = m.plot(fcast)
plt.title('Gaussian Likelihood, 2 Years')
plt.show()

# Negative Binomial
m = Prophet(growth='logistic', seasonality_mode='multiplicative', daily_seasonality=True, weekly_seasonality=True,
            yearly_seasonality=True, likelihood='NegBinomial')
m.fit(df)
fcast = m.predict()
fig = m.plot(fcast)
plt.title('Negative-Binomial Likelihood, 2 Years')
plt.show()

bletham (Contributor, Author) commented Sep 15, 2020

@oren0e thanks for sharing this. Yeah, the numerical issues of the hinge function are pretty significant on this problem. I've been testing various approaches, and in #1668 I just posted an approach that adjusts the future trend simulation to keep it positive. On this dataset it gives this:
[plot: forecast with the adjusted future trend simulation]

The model fit isn't great because we have here a case of daily seasonality-within-seasonality (there is yearly seasonality in the magnitude of daily seasonality), but the future uncertainty at least seems reasonable.

(Aggregating counts by day makes the model much more appropriate, though then there isn't really any challenge in keeping things positive.)
[plot: forecast on counts aggregated by day]

oren0e commented Sep 22, 2020

@bletham can you explain what you mean by

The model fit isn't great because we have here a case of daily seasonality-within-seasonality (there is yearly seasonality in the magnitude of daily seasonality)

I'm not sure I fully understand the concept of seasonality-within-seasonality.

bletham (Contributor, Author) commented Sep 24, 2020

Basically the multiplicative model is

y(t) = trend(t) * (1 + daily(t) + weekly(t) + yearly(t))

where each of the seasonalities is stationary; so, daily(t) repeats itself exactly every day.

If we factor out the yearly seasonality, it is

y(t) = trend(t) * (1 + daily(t) + weekly(t)) + trend(t) * yearly(t)

So, yearly seasonality adds to y(t) by an amount proportional to the current trend(t). Because yearly(t) is smooth, this will capture broad fluctuations in y throughout the year.

What we see in this time series is a little different. There is a spike every day; that spike is daily seasonality. It starts off low in the morning, peaks during the day, and goes back down overnight. By the model structure, daily(t) must be a fixed % of trend(t), meaning the height of that spike can depend only on trend(t). What we see instead is that the height of the daily spike depends on the day of year: it is high in the summer and low in the winter. Thus the amount of daily seasonality is not stationary as assumed by the model; rather, there is a yearly seasonal effect in the daily seasonality. In the model, yearly seasonality applies a fixed offset throughout the entire day; it can't change the magnitude of the daily seasonality, and so you get a bad fit as a result.
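A tiny simulation makes the mismatch concrete (hypothetical numbers, just to illustrate the structure):

import numpy as np

t = np.arange(0, 2 * 365, 1 / 24)              # two years of hourly data, in days
trend = 100 + 0.01 * t
daily = 0.3 * np.sin(2 * np.pi * t)            # repeats exactly every day
yearly = 0.2 * np.sin(2 * np.pi * t / 365.25)  # smooth over the year

# What the multiplicative model can express: the daily spike is a fixed % of
# trend, and yearly seasonality only shifts the whole day up or down.
y_model = trend * (1 + daily + yearly)

# What this series actually does: the daily amplitude itself varies by season.
y_actual = trend * (1 + (1 + yearly) * daily)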

blakemoya commented
I wrote up a version of the negative binomial likelihood and fit it to the data in #1500 (comment). I couldn't avoid the readability issues that come with the switch statements, but by using a log link function I seem to have avoided some of the numerical issues.
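(For reference, Stan has a built-in log-link variant, neg_binomial_2_log, which takes the mean on the log scale so positivity holds by construction; a sketch of the general shape, not necessarily the exact code in the branch:

// eta is the linear predictor on the log scale
y ~ neg_binomial_2_log(eta, phi);  // implies mean = exp(eta) > 0

)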

[plot: NB likelihood fit with log link]

This properly accounts for heteroskedasticity in count data, and the credible intervals don't lie on non-integers. I also added a function to sample time series from the prior distribution, to check prior distributions of functionals and catch numerical issues that might come from the prior parameters.

Could this be useful as a PR? Code at https://github.com/blakemoya/prophet/tree/nb

felipeangelimvieira commented
Hi! For those who might find it useful, I've added support for Prophet-like models with Negative Binomial and Gamma likelihoods in the Prophetverse library. This implementation uses a link function similar to a ReLU:

import jax.numpy as jnp

def _to_positive(x, threshold):
    # exponential below the threshold, identity above; continuous at x == threshold
    return jnp.where(x < threshold, jnp.exp(x - threshold) * threshold, x)
