Getting negative prediction even all training data are greater than or equal to 0. #524

Closed
stephensheng opened this issue May 17, 2018 · 10 comments

@stephensheng

Hello,

Thanks for this great tool!

I recently ran into this issue: even though I fed in only non-negative data, the prediction contains some negative values. The input file is here:
example1.txt

Prediction:
output_without_log
Trend and seasonality:
output2_without_log

Cheers,
Stephen

@vhpietil

Prophet assumes the same daily component for every day of the week. In your data, the daily pattern differs greatly between weekdays and weekends. That's why the fit is poor and you get negative predictions.

@stephensheng
Author

stephensheng commented May 17, 2018

Thank you, but how can I solve this problem? Should I disable weekly seasonality? Or how can I remove the effect of weekends? @vhpietil

@vhpietil

See #434 for a lengthy discussion of the same problem.

Are you interested in predicting all days? If you are not interested in weekends, for example, you can remove them from the training data and forecast weekdays better.

@stephensheng
Author

stephensheng commented May 17, 2018

Yes, I want to remove weekends, and I tried disabling weekly seasonality, which improves the prediction to some extent. But is it the case that Prophet cannot fit a curve that changes rapidly every day?
output_bandwidth_4weeks_20minutes_remove_weekend_bigger_than_10

As in the graph above: I don't want to treat that part of the data as outliers, and I would like the fit to capture it.

thank you for your help

@bletham
Contributor

bletham commented May 21, 2018

+1 for what @vhpietil says, this is the same issue from #434 and his suggestion to separate out weekends and weekdays is a great option (and did indeed produce a much better forecast).

Here you separated out weekdays to get predictions just for them; let's call that forecast1. You could then separate out weekends and get predictions just for them; let's call that forecast2. Then you can combine the two into a single dataframe that pulls weekdays from forecast1 and weekends from forecast2. The Prophet plot would work if given this merged dataframe, and would plot the data and forecasts from both models (you could use the plot method on either model). Component plots would not work.
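
The two-model approach described above could be sketched roughly as follows. This is a minimal sketch, not code from this thread: the helper names (`split_by_weekend`, `combine_forecasts`, `fit_and_forecast`) are made up for illustration, the `prophet` import assumes a recent release (older releases ship as `fbprophet`), and disabling weekly seasonality simply follows what was tried earlier in the thread.

```python
import pandas as pd


def split_by_weekend(df):
    """Split a frame with a 'ds' column into (weekday rows, weekend rows)."""
    is_weekend = pd.to_datetime(df["ds"]).dt.dayofweek >= 5  # 5 = Saturday, 6 = Sunday
    return df[~is_weekend], df[is_weekend]


def combine_forecasts(forecast1, forecast2):
    """Take weekday rows from forecast1 and weekend rows from forecast2."""
    weekdays, _ = split_by_weekend(forecast1)
    _, weekends = split_by_weekend(forecast2)
    combined = pd.concat([weekdays, weekends])
    return combined.sort_values("ds").reset_index(drop=True)


def fit_and_forecast(df, future):
    """Fit one model per subset, predict on the same `future` frame, and merge."""
    from prophet import Prophet  # assumption: recent package name

    weekday_df, weekend_df = split_by_weekend(df)
    m1 = Prophet(weekly_seasonality=False)  # assumption: setting tried in the thread
    m1.fit(weekday_df)
    m2 = Prophet(weekly_seasonality=False)
    m2.fit(weekend_df)
    forecast1 = m1.predict(future)
    forecast2 = m2.predict(future)
    return m1, m2, combine_forecasts(forecast1, forecast2)
```

With the merged frame in hand, something like `m1.plot(combined)` should draw the combined forecast, in line with the remark above that the plot method of either model can be used.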

Alternatively, you can have just one model that makes predictions for all data by creating extra regressors for the weekday daily seasonality. This is a lot more effort, but @vhpietil gives a working example in #434.

The Prophet model treats weekly seasonality and daily seasonality as being entirely separate components, and does not have a concept of daily-seasonality-that-depends-on-day-of-week. This is a recurring issue and so definitely the procedure from #434 needs to be made easier and put in the documentation.
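
For readers on later Prophet releases: the documentation there describes conditional seasonalities, where `add_seasonality` takes a `condition_name` pointing at a boolean column. The sketch below is one possible way to express a daily pattern that differs between weekdays and weekends with that feature. It is not the procedure from #434; the column names, seasonality names, and `fourier_order=4` are assumptions chosen for illustration.

```python
import pandas as pd


def add_weekday_flags(df):
    """Return a copy of df with boolean 'is_weekday' / 'is_weekend' columns."""
    out = df.copy()
    dow = pd.to_datetime(out["ds"]).dt.dayofweek  # Monday = 0 ... Sunday = 6
    out["is_weekday"] = dow < 5
    out["is_weekend"] = dow >= 5
    return out


def build_conditional_model():
    """One model with separate daily seasonalities for weekdays and weekends."""
    from prophet import Prophet  # assumption: recent package name

    m = Prophet(daily_seasonality=False)  # replace the single shared daily component
    m.add_seasonality(name="daily_weekday", period=1, fourier_order=4,
                      condition_name="is_weekday")
    m.add_seasonality(name="daily_weekend", period=1, fourier_order=4,
                      condition_name="is_weekend")
    return m
```

The flag columns have to be present both in the training frame and in the future frame passed to `predict`, so `add_weekday_flags` would be applied to each.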

@bletham
Contributor

bletham commented Jun 7, 2018

I'm going to go ahead and close this, and consolidate the general issue of having seasonalities that depend on other factors in #538, so follow along there for updates.

@Lal4Tech

A simple trick is to take the natural log of 'y' after adding 1 (so that the log is never taken of zero), then invert the transform on the forecast.
import numpy as np
df['y'] = df['y'] + 1
df['y'] = np.log(df['y'])
# fit the model and get the forecast here
forecast['yhat'] = np.exp(forecast['yhat']) - 1  # the same can be done for the other forecast columns
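
A slightly more defensive version of the same round trip might look like the sketch below. The helper names are hypothetical, and the column list assumes the usual `yhat`, `yhat_lower`, `yhat_upper` forecast columns. Note that `exp(x) - 1` is bounded below by -1 rather than 0, so a back-transformed value can still fall slightly below zero when the log-scale prediction is negative; the optional clip is an added assumption, not part of the original trick.

```python
import numpy as np
import pandas as pd

FORECAST_COLS = ("yhat", "yhat_lower", "yhat_upper")


def to_log_scale(df):
    """Return a copy of df with y replaced by log(1 + y)."""
    out = df.copy()
    out["y"] = np.log1p(out["y"])
    return out


def to_original_scale(forecast, cols=FORECAST_COLS, clip_at_zero=True):
    """Invert log(1 + y) on the given forecast columns, exactly once."""
    out = forecast.copy()
    cols = list(cols)
    out[cols] = np.expm1(out[cols])
    if clip_at_zero:
        # expm1 can return values in (-1, 0) for negative log-scale predictions
        out[cols] = out[cols].clip(lower=0)
    return out
```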

@hlreicha

Hi, I tried your trick of taking the log and getting a forecast, but it gives me predictions that are way above my data points.
how

My code looks like this:
p = Prophet(weekly_seasonality=True)
#rename dataframe columns
df = df.rename(columns={df.columns[0]: "ds", df.columns[1]: "y"})
#set dataframe ds to timeseries
df['ds'] = pd.to_datetime(df['ds'], utc=True)
df['ds']= df.ds.dt.date
df.index.freq = 'D'
#adding one and taking a log
df['y'] = df['y'] + 1
df['y'] = np.log(df['y'])
#get forecast
p.fit(df) # df is a pandas.DataFrame with 'y' and 'ds' columns
future = p.make_future_dataframe(periods=365)
time_pred = p.predict(future)
#convert back
time_pred = np.exp(time_pred)-1
time_pred[time_pred.columns[1:]] = np.exp(time_pred[time_pred.columns[1:]])-1
p.plot(time_pred)
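
One thing that stands out in the snippet above is that the back-transform appears twice: once on the whole frame and once more on the columns after the first. Whether that fully explains the inflated forecast is not confirmed in the thread, but a quick numeric check (with made-up values) shows how much a second `exp(...) - 1` inflates numbers that are already back on the original scale.

```python
import numpy as np

# Made-up values standing in for observations on the original scale.
y = np.array([1.0, 3.0, 5.0])

y_log = np.log(y + 1)        # forward transform, as in the trick above
once = np.exp(y_log) - 1     # inverse applied once: recovers y
twice = np.exp(once) - 1     # inverse applied a second time: grows exponentially

print(once)   # close to [1, 3, 5]
print(twice)  # roughly [1.7, 19.1, 147.4]
```

Applying the inverse exactly once, and only to the numeric forecast columns rather than the whole frame, avoids this.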

@cod-r

cod-r commented Aug 30, 2020

@HarilalOP that trick works, thank you!

Is there a way to return my values back to the original when using m.plot()? I'm talking about the black dots from the graph, they are keeping the log values when plotting.
image

@john-karp

A simpler trick for handling daily patterns that change over the course of the week is to disable explicit daily seasonality and increase the resolution of weekly seasonality so that it can capture daily patterns, e.g. weekly_seasonality=28, daily_seasonality=False. I think this should actually work a bit better than lumping all weekdays together and all weekend days together because, at least in the data I've modeled, user behavior differs between Monday and Friday, for example, and likewise between Saturday and Sunday.
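
To see why a higher-order weekly seasonality can stand in for a daily one, here is a small sketch of Fourier features of the kind Prophet's documentation describes for seasonalities. The functions `fourier_features` and `shortest_cycle_hours` are illustrative helpers written for this example, not Prophet's own code. With a 7-day period and order 28, the fastest harmonic completes a cycle every 7/28 of a day, i.e. 6 hours, so within-day shape can be represented separately for each day of the week.

```python
import numpy as np


def fourier_features(t_days, period, order):
    """Sine/cosine pairs for harmonics 1..order of the given period (in days)."""
    t = np.asarray(t_days, dtype=float)
    cols = []
    for k in range(1, order + 1):
        angle = 2.0 * np.pi * k * t / period
        cols.append(np.sin(angle))
        cols.append(np.cos(angle))
    return np.column_stack(cols)


def shortest_cycle_hours(period_days, order):
    """Length in hours of the fastest harmonic for a given period and order."""
    return 24.0 * period_days / order


def build_model():
    """The configuration suggested above; an integer sets the Fourier order."""
    from prophet import Prophet  # assumption: recent package name

    return Prophet(weekly_seasonality=28, daily_seasonality=False)
```

The trade-off is more parameters (56 weekly features instead of a handful), so the fit may need more history to avoid overfitting.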
