Using boxcox transformation #647

momonala · 2018-08-14T10:19:07Z

Hi, thanks for the great forecasting tool. Though this is not directly a part of the forecasting code, I saw an issue regarding applying boxcox and transformations, so maybe this is an ok forum to ask my question.

I am currently having some trouble configuring the boxcox transformation to normalize data. When applying boxcox, I get a better loss (mean absolute error) than without it. However, even after applying the inverse boxcox transformation, I get values in the seasonality decomposition that do not seem to be correct. Specifically, the sum trend+yearly+weekly does not add up to yhat. Below is an example of not applying boxcox:

And the result of (yhat-trend-yearly-weekly).plot() is a flat line at zero. Perfect, as expected. However, when applying boxcox, predicting a forecast, inverting, and then plotting, I get this. Notice how the trend, yearly, and weekly components do not add up to yhat.

I can show this by plotting (yhat-trend-yearly-weekly).plot() in blue. I also overlaid an orange plot of yearly*5000 to show that the residual still follows the yearly seasonality.

I wrapped prophet in my own class, but here is the pseudocode of what I am doing:

import numpy as np
from fbprophet import Prophet  # the package is named 'prophet' from release 1.0 on
from scipy.stats import boxcox

class Forecast:
    def _inv_box(self, y):
        # Note: the trailing "- 1" matches the inverse of boxcox1p; the inverse
        # of scipy.stats.boxcox is the same expression without it.
        if self._lambda == 0:
            return np.exp(y) - 1
        else:
            return np.exp(np.log(self._lambda * y + 1) / self._lambda) - 1

    def run(self):
        # self.df is a data frame with columns y and ds
        self.df['y'], self._lambda = boxcox(self.df['y'])

        # fit and predict in the transformed space
        self.model = Prophet(daily_seasonality=True, weekly_seasonality=True, yearly_seasonality=True)
        self.model.fit(self.df)
        future_data = self.model.make_future_dataframe(periods=100)
        self.df_forecast = self.model.predict(future_data)

        # invert boxcox, column by column
        self.df['y'] = self.df['y'].apply(self._inv_box)
        cols = ['yhat', 'yhat_lower', 'yhat_upper',
                'trend', 'trend_lower', 'trend_upper']
        if self.yearly:
            cols += ['yearly', 'yearly_lower', 'yearly_upper']
        if self.weekly:
            cols += ['weekly', 'weekly_lower', 'weekly_upper']
        if self.daily:
            cols += ['daily', 'daily_lower', 'daily_upper']
        self.df_forecast[cols] = self.df_forecast[cols].apply(self._inv_box)

        # restore the untransformed history for plotting
        self.model.history['y'] = self.df['y']

Any insight is appreciated. Thanks.

bletham · 2018-08-14T21:49:21Z

If I understand this code and description correctly, then you are applying the inverse transformation to each component separately. The components will sum appropriately in the transformed space, but not in the untransformed space.

Suppose w and y are seasonalities in the transformed space, and f is the total estimate in the transformed space. We then have

f = w + y

But

BoxCoxInv(f) = BoxCoxInv(w + y) != BoxCoxInv(w) + BoxCoxInv(y)

in general. No strictly convex or strictly concave transformation can give equality in that last step for all w and y.

So the overall estimate yhat would be meaningful to look at after it has been inverse transformed, but the transformation induces a different model for how the components combine.
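A small numeric check of the point above, as a sketch rather than anything taken from the original code. The lambda value and the two component arrays are made up for illustration, and inv_boxcox is a hand-written helper for the inverse of the standard Box-Cox transform (x**lmbda - 1) / lmbda.

```python
import numpy as np

def inv_boxcox(y, lmbda):
    """Inverse of the Box-Cox transform (x**lmbda - 1) / lmbda, or log(x) when lmbda == 0."""
    y = np.asarray(y, dtype=float)
    if lmbda == 0:
        return np.exp(y)
    return np.power(lmbda * y + 1.0, 1.0 / lmbda)

lmbda = 0.5                    # illustrative value, not taken from the thread
w = np.array([0.2, 0.5, 1.0])  # made-up component in the transformed space
y = np.array([0.1, 0.3, 0.4])  # made-up component in the transformed space
f = w + y                      # components add up in the transformed space

total = inv_boxcox(f, lmbda)                         # inverse of the sum
parts = inv_boxcox(w, lmbda) + inv_boxcox(y, lmbda)  # sum of the inverses
gap = total - parts                                  # nonzero: additivity is lost
print(gap)
```

With lmbda = 0.5 the inverse is a quadratic, so the gap works out to 0.5*w*y - 1, which depends on the components themselves. That is consistent with the residual in the question still showing a seasonal pattern.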

If you want interpretability of the components post-transform and are trying to handle the skew in the noise, you might consider just using a log transform. If you fit your data in the log-transformed space and apply the inverse transform (exp), then

exp(f) = exp(w + y) = exp(w) * exp(y)

That is, the inverse transform here converts additive seasonality to multiplicative seasonality. So you can still inverse transform each component independently, and then just interpret it as multiplicative instead of additive.
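A minimal sketch of that interpretation, using made-up arrays in place of the trend, weekly, and yearly columns of a forecast fitted on log(y). The values are hypothetical; only the arithmetic is the point.

```python
import numpy as np

# Hypothetical additive components in log space.
trend = np.array([2.0, 2.1, 2.2])
weekly = np.array([0.05, -0.02, 0.01])
yearly = np.array([-0.10, 0.00, 0.15])
yhat_log = trend + weekly + yearly  # additive in the transformed space

# Inverse transform each piece independently.
yhat = np.exp(yhat_log)
trend_orig = np.exp(trend)       # level, in units of the original data
weekly_factor = np.exp(weekly)   # multiplicative factor; 1.0 means no effect
yearly_factor = np.exp(yearly)   # multiplicative factor; 1.0 means no effect

# The back-transformed pieces recombine by multiplication, not addition.
recombined = trend_orig * weekly_factor * yearly_factor
print(np.allclose(yhat, recombined))
```

So the check in the original space becomes yhat / (trend * weekly * yearly) being flat at one, rather than yhat - trend - weekly - yearly being flat at zero. Note that the log transform requires strictly positive y.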

momonala · 2018-08-15T08:54:20Z

Thanks for the quick response. OK, this explanation makes a lot of sense: the components do not keep their additive properties when converted back from the transformed space. The log/exp transformation worked great, thanks for the tip.

momonala closed this as completed Aug 15, 2018

bletham mentioned this issue Mar 13, 2019

How to explain Y axis values of plot_components for yearly, monthy, weekly. #876

Closed

bletham mentioned this issue May 1, 2019

Would you be interested at adding preprocessing steps? #944

Closed

mitchelloharawild mentioned this issue May 14, 2019

fable interface to the prophet model #966

Closed

bletham mentioned this issue Jan 29, 2020

When we use prophet for fault prediction, negative numbers appear, but negative numbers are not reasonable. #1234

Closed

bletham mentioned this issue Mar 5, 2020

Extra regressors: should they be made stationary before adding to the linear part #1375

Closed

bletham mentioned this issue Aug 24, 2020

Can the plot_components graph be reproduced from "model.predict" results ( is it the same )? #1635

Closed

bletham mentioned this issue Sep 15, 2020

Strategies for positive predictions #1668

Open
