model seasonality grows too much #1673

Closed
justethomas opened this issue Sep 15, 2020 · 2 comments


justethomas commented Sep 15, 2020

Hi everyone,

I seem to have an overfitting problem with my model: the prediction grows too much over time. You can see the Prophet prediction in blue, compared with a TBATS prediction in green and the series in red.

image

I don't get this kind of result in other tests where the training sets are more stable over time. Here's a snippet of my code.

def hyperparameter_tuning(df: pd.DataFrame) -> pd.DataFrame:
    all_params = [dict(zip(PARAM_GRID.keys(), v)) for v in itertools.product(*PARAM_GRID.values())]

    mse = []

    for params in all_params:

        m = Prophet(**params).fit(df)
        df_cv = cross_validation(
            m,
            initial = '{} hours'.format(len(train_p)),
            horizon = '147 hours',
        )

        df_p = performance_metrics(df_cv, rolling_window = 1)
        mse.append(df_p['mse'].values[0])

    tuning_results = pd.DataFrame(all_params)
    tuning_results['mse'] = mse
    tuning_results.sort_values('mse', inplace = True)

    return tuning_results

def prophet_forecast(m, train: pd.DataFrame, test: pd.DataFrame, changepoint_prior_scale: float,
                     seasonality_prior_scale: float) -> pd.DataFrame:
    m.weekly_seasonality = args.weekly_weight
    m.daily_seasonality = args.daily_weight
    m.seasonality_mode = 'additive'
    # Prior scales chosen by hyperparameter_tuning (plain floats, one per attribute)
    m.changepoint_prior_scale = changepoint_prior_scale
    m.seasonality_prior_scale = seasonality_prior_scale
    m.growth = 'linear'

    m.stan_backend.logger = None
    m.fit(train)

    future = m.make_future_dataframe(periods = len(test), freq = 'H')

    forecast = m.predict(future)

    return forecast.tail(len(test))

params = hyperparameter_tuning(df)

m = Prophet()

prophet_prediction = prophet_forecast(
            m,
            train_p,
            test_p,
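            # .iloc[0] picks the first row after sorting by mse (lowest error);
            # [0] would look up index label 0, i.e. the first grid entry.
            params.changepoint_prior_scale.iloc[0],
            params.seasonality_prior_scale.iloc[0])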
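
PARAM_GRID is not shown in the snippet above. A minimal sketch, assuming it is a dict that maps Prophet parameter names to lists of candidate values (the values below are made up for illustration), shows how the list comprehension in hyperparameter_tuning expands it into one dict per combination:

```python
import itertools

# Hypothetical grid; the real PARAM_GRID is not shown in the post.
PARAM_GRID = {
    "changepoint_prior_scale": [0.001, 0.05, 0.5],
    "seasonality_prior_scale": [0.1, 10.0],
}

# One dict per combination, built the same way as all_params above.
all_params = [
    dict(zip(PARAM_GRID.keys(), values))
    for values in itertools.product(*PARAM_GRID.values())
]

print(len(all_params))  # number of models that will be fit and cross-validated
print(all_params[0])    # first combination tried
```

Each combination triggers a full fit plus cross-validation, so the grid size directly sets the tuning cost.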

The hyperparameter_tuning process improves my already good predictions, but doesn't help at all with very bad predictions such as this one.

My next idea was to use the cap feature of the 'logistic' growth mode, but this feels a bit like cheating, since in theory I don't have any limit (the data is website visits).

Do you have any ideas about which parameters I should tune?

PS: My training set is the log of my real values.

@bletham
Contributor

bletham commented Sep 15, 2020

The Prophet model assumes stationary seasonality. In this case the magnitude of the daily seasonality is clearly fluctuating, and generally increasing over time. The only way Prophet can capture that is with multiplicative seasonality on a fluctuating trend that is generally increasing with time. With log-transformed data, the additive seasonality you are using is equivalent to multiplicative seasonality, so that is basically what is happening here. In the future prediction, the trend (and thus the magnitude of the daily seasonality) is increasing in a really unreasonable way. I suspect this is because of the log transform. The exp() inverse transform is a bit unstable, and I've seen it amplify small trend changes into unreasonably large ones, much along the lines of what is happening here.
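
The two effects described here can be illustrated with plain arithmetic. This is only a sketch with made-up numbers (the log-scale trend levels, the 0.3 seasonal offset, and the two slopes are assumptions; the 147-hour horizon is the one used in the cross-validation call above), not output from Prophet:

```python
import math


def seasonal_swing(log_trend: float, log_offset: float) -> float:
    """Seasonal swing on the original scale for an additive offset on the log scale."""
    return math.exp(log_trend + log_offset) - math.exp(log_trend)


def forecast_ratio(slope_a: float, slope_b: float, horizon: int) -> float:
    """Ratio of two back-transformed forecasts whose log-scale trends differ only in slope."""
    return math.exp(slope_b * horizon) / math.exp(slope_a * horizon)


# The same additive offset on the log scale becomes a larger and larger
# absolute swing as the log-scale trend rises: it acts multiplicatively.
for log_trend in (5.0, 6.0, 7.0):
    print(log_trend, seasonal_swing(log_trend, 0.3))

# A small difference in log-scale slope compounds over the forecast horizon
# once exp() is applied.
print(forecast_ratio(0.010, 0.012, 147))
```

Raising the log-scale trend by 1 multiplies the seasonal swing by e, and a slope difference of 0.002 per hour already separates the two back-transformed forecasts by roughly a third after 147 hours.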

I'm guessing that you're using the log transform to get positive predictions. I've been doing some analysis of different strategies for that lately (one of which is a log transform) and just posted about it in #1668. There, I posted a new strategy which I implemented in a ProphetPos class that I think could do a lot better on this time series. Is there any chance you could post the data for this time series so I could try it out?

@justethomas
Author

I see. I ran it again without the log transformation, and the results make more sense.

image

Regarding the dataset, unfortunately I'm not allowed to share it, as it is corporate data.

Thank you for your help!

bletham closed this as completed Nov 6, 2020