
Where to pass regressor using add_regressor function? #709

Closed
SeanFLynch opened this issue Oct 18, 2018 · 3 comments

@SeanFLynch

commented Oct 18, 2018

Hi,

I'm having great difficulty using the add_regressor function. I have a dataframe that looks like this:

```
ds          region   y       ad_spend
2016-03-01  UK       1797.0  516.64
2016-03-01  DE       0.0     0.00
2016-03-01  FR       0.0     0.00
2016-03-01  NL       0.0     0.00
2016-03-01  BE - FR  0.0     0.00
2016-03-01  BE - NL  0.0     0.00
2016-03-02  UK       2696.0  523.91
```
2016-03-02 UK 2696.0 523.91

The ad_spend column also has values for future dates. When I run this:

```python
REGIONS = ['UK', 'DE', 'FR', 'NL']
results = []
for region in REGIONS:
    subdf = rev.loc[rev['region'] == region]  # rev: the dataframe shown above
    m = Prophet(holidays=uksales)  # uksales: a holidays dataframe
    m.add_regressor('ad_spend')
    m.fit(subdf)
    result = m.predict(m.make_future_dataframe(periods = 10))
    result = result.assign(region=region)
    results.append(result)
forecast = pd.concat(results)
```

I keep getting this error:

```
ValueError                                Traceback (most recent call last)
in ()
      6     m.add_regressor('ad_spend')
      7     m.fit(subdf)
----> 8     result = m.predict(m.make_future_dataframe(periods = 10))
      9     result = result.assign(region=region)
     10     results.append(result)

/Users/seanlynch/anaconda2/lib/python2.7/site-packages/fbprophet/forecaster.pyc in predict(self, df)
   1036         if df.shape[0] == 0:
   1037             raise ValueError('Dataframe has no rows.')
-> 1038         df = self.setup_dataframe(df.copy())
   1039
   1040         df['trend'] = self.predict_trend(df)

/Users/seanlynch/anaconda2/lib/python2.7/site-packages/fbprophet/forecaster.pyc in setup_dataframe(self, df, initialize_scales)
    249             if name not in df:
    250                 raise ValueError(
--> 251                     'Regressor "{}" missing from dataframe'.format(name))
    252
    253         df = df.sort_values('ds')

ValueError: Regressor "ad_spend" missing from dataframe
```

I'm unclear on where I'm supposed to put the ad_spend column. Any help would be much appreciated.

@bletham

Contributor

commented Oct 18, 2018

It's in the line

```python
result = m.predict(m.make_future_dataframe(periods = 10))
```

You need values for the extra regressor both in the past (subdf, which is passed to fit) and in the future (the dataframe passed to predict). A typical workflow would be:

```python
future = m.make_future_dataframe(periods = 10)
future['ad_spend'] = ...
result = m.predict(future)
```

where you would have to decide how to get your future values for ad_spend.
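One way to fill that column, sketched under assumptions: the planned ad spend for each date is already stored next to the history (as in the `rev` frame from the question), so it can be joined onto the future frame by date. `make_future_dataframe` returns a frame with only a `ds` column, which is why `predict()` cannot find `ad_spend`; to keep the sketch runnable without Prophet installed, an equivalent frame is built with `pd.date_range`. The `planned` frame and its numbers are hypothetical.

```python
import pandas as pd

# Hypothetical data for one region: y is known only for the first 3 days,
# while ad_spend is planned further ahead (as described in the question).
planned = pd.DataFrame({
    "ds": pd.date_range("2016-03-01", periods=5, freq="D"),
    "y": [1797.0, 2696.0, 2500.0, None, None],
    "ad_spend": [516.64, 523.91, 510.00, 530.00, 540.00],
})
history = planned.dropna(subset=["y"])

# Stand-in for m.make_future_dataframe(periods=2): history dates plus 2 future
# days, with only a 'ds' column.
future = pd.DataFrame({
    "ds": pd.date_range(history["ds"].min(), periods=len(history) + 2, freq="D")
})

# Attach the regressor value for every date in the future frame.
future = future.merge(planned[["ds", "ad_spend"]], on="ds", how="left")

# A missing value here would mean ad_spend is unknown for a date to predict.
assert future["ad_spend"].notna().all()
print(future)
```

With a real fitted model, the resulting frame is what would be passed to `m.predict(future)`.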

The error messaging can probably be improved here to make it more clear that it is predict() that requires the extra regressor to be specified.
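Until the message is improved, a pre-flight check can make the failure point explicit. The sketch below is a hypothetical helper, `check_regressors`, which is not part of Prophet; it only assumes the regressor names you registered with `add_regressor` and a frame with a `ds` column.

```python
import pandas as pd


def check_regressors(df, regressor_names):
    """Hypothetical check: raise a descriptive error if a frame about to be
    passed to predict() lacks a regressor column or has gaps in it."""
    for name in regressor_names:
        if name not in df:
            raise ValueError(
                f'Regressor "{name}" missing from the dataframe passed to '
                "predict(); add it as a column covering every date in 'ds'."
            )
        missing = df.loc[df[name].isna(), "ds"]
        if not missing.empty:
            raise ValueError(
                f'Regressor "{name}" has no value for {len(missing)} date(s), '
                f"first: {missing.iloc[0].date()}"
            )


future = pd.DataFrame({"ds": pd.date_range("2016-03-01", periods=3, freq="D")})
try:
    check_regressors(future, ["ad_spend"])
except ValueError as err:
    print(err)  # names predict() as the call that needs the column
```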

@SeanFLynch

Author

commented Oct 19, 2018

@bletham Thanks for that. I'm still not completely sure what I'm supposed to do here. Do I pass the historic ad spend data in the same dataframe as the 'y' and 'ds' columns and then pass the predicted future ad_spend in a separate dataframe? Or do I pass all of the ad_spend, historic and predicted future, in a separate dataframe?

@bletham

Contributor

commented Oct 23, 2018

The first one is correct; you pass the historic ad spend to fit in the same dataframe as y and ds. Then on predict, you pass in a new dataframe that has the future dates (and possibly past dates for which you want predictions) and also includes the ad spend as a column there. Fit and predict get separate dataframes, and each must have ad spend as a column that covers all of the dates in that dataframe.
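Applied to the data in the question, a minimal pandas sketch of that split might look like the following, assuming one region's rows hold `y` only for past dates and `ad_spend` for past and future dates. The sample rows are hypothetical; inside the per-region loop, the same two selections would replace the single `subdf` and the bare `make_future_dataframe` call.

```python
import pandas as pd

# Hypothetical combined frame for one region: ad_spend covers past and future
# dates, y is known only for the past.
rev = pd.DataFrame({
    "ds": pd.date_range("2016-03-01", periods=5, freq="D"),
    "region": "UK",
    "y": [1797.0, 2696.0, 2500.0, None, None],
    "ad_spend": [516.64, 523.91, 510.00, 530.00, 540.00],
})

# Frame for fit(): historic rows only, with ds, y and the regressor.
fit_df = rev.loc[rev["y"].notna(), ["ds", "y", "ad_spend"]]

# Frame for predict(): every date you want a prediction for (past and future),
# with the regressor as a column; y is not needed here.
predict_df = rev[["ds", "ad_spend"]]

print(fit_df)
print(predict_df)
# With Prophet installed, this would continue roughly as:
#   m = Prophet(); m.add_regressor('ad_spend'); m.fit(fit_df); m.predict(predict_df)
```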

@bletham bletham closed this Dec 20, 2018
