Recursive Strategy Bug #4

Open
ThiagoCM opened this issue Apr 14, 2021 · 2 comments

Comments

@ThiagoCM

Hello.
First, I would like to thank you for this great example of applying the Direct and Recursive strategies to N-step-ahead forecasting. I was looking at the recursive strategy and have a question about its implementation, where I think there is a bug.

If you take a look at the picture above (you can find the math in this article), the recursive strategy is basically the 1-step-ahead direct strategy with "feedback": the value predicted at each iteration is appended to the target array.
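As a rough sketch of that feedback in symbols (the notation is an assumption made here for illustration, not taken from the article): let $f$ be the one-step-ahead model and suppose it uses the $d$ most recent values as lags. The tutorial itself uses non-consecutive lags such as 1, 8 and 25, but the idea is the same.

```latex
\begin{aligned}
\hat{y}_{t+1} &= f\left(y_t,\; y_{t-1},\; \dots,\; y_{t-d+1}\right) \\
\hat{y}_{t+2} &= f\left(\hat{y}_{t+1},\; y_t,\; \dots,\; y_{t-d+2}\right) \\
\hat{y}_{t+3} &= f\left(\hat{y}_{t+2},\; \hat{y}_{t+1},\; y_t,\; \dots,\; y_{t-d+3}\right)
\end{aligned}
```

Each forecast takes the place of the observation that is not yet available, so the lag used for step $h$ should be the forecast made at step $h-1$.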

In this piece of code:

new_point = fcasted_values[-1] if len(fcasted_values) > 0 else 0.0
target = target.append(pd.Series(index=[date], data=new_point))

you are inserting 0.0 as the first prediction (N=1) of the recursive strategy instead of the actual predicted value. This affects the lags used in the feature matrix, since the 0.0 stays in the target and reappears as an incorrect lag value in later prediction steps.

Below you can see the target and feature values for 3 iterations after inserting 0.0 as the first prediction.

Iteration 1

Features
                     hour  weekday  dayofyear  ...      lag_1      lag_8     lag_25
2020-01-12 20:00:00    20        6         12  ...  65.919495  80.427320  72.718000
2020-01-12 21:00:00    21        6         12  ...  34.952133  57.917430  33.341960
2020-01-12 22:00:00    22        6         12  ...  33.911217  56.941563  33.081734
2020-01-12 23:00:00    23        6         12  ...  33.244377  56.193405  33.683514
2020-01-13 00:00:00     0        0         13  ...  33.390755  53.786278  33.244377

Target
2020-01-12 20:00:00    34.952133
2020-01-12 21:00:00    33.911217
2020-01-12 22:00:00    33.244377
2020-01-12 23:00:00    33.390755
2020-01-13 00:00:00     0.000000

Iteration 2

Features
                     hour  weekday  dayofyear  ...      lag_1      lag_8     lag_25
2020-01-12 21:00:00    21        6         12  ...  34.952133  57.917430  33.341960
2020-01-12 22:00:00    22        6         12  ...  33.911217  56.941563  33.081734
2020-01-12 23:00:00    23        6         12  ...  33.244377  56.193405  33.683514
2020-01-13 00:00:00     0        0         13  ...  33.390755  53.786278  33.244377
2020-01-13 01:00:00     1        0         13  ...   0.000000  59.202316  33.407020

Target
2020-01-12 21:00:00    33.911217
2020-01-12 22:00:00    33.244377
2020-01-12 23:00:00    33.390755
2020-01-13 00:00:00     0.000000
2020-01-13 01:00:00    34.342800

Iteration 3

Features
                     hour  weekday  dayofyear  ...      lag_1      lag_8     lag_25
2020-01-12 22:00:00    22        6         12  ...  33.911217  56.941563  33.081734
2020-01-12 23:00:00    23        6         12  ...  33.244377  56.193405  33.683514
2020-01-13 00:00:00     0        0         13  ...  33.390755  53.786278  33.244377
2020-01-13 01:00:00     1        0         13  ...   0.000000  59.202316  33.407020
2020-01-13 02:00:00     2        0         13  ...  34.342800  68.944670  32.057076

Target
2020-01-12 22:00:00    33.244377
2020-01-12 23:00:00    33.390755
2020-01-13 00:00:00     0.000000
2020-01-13 01:00:00    34.342800
2020-01-13 02:00:00     2.395295
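The pattern in these three iterations can be reproduced on a toy series. The following is a minimal sketch, not the tutorial's code: plain lists stand in for the pandas Series, `persistence_model` is a hypothetical stand-in for the trained model, and lag_1 of the last row is assumed to be the value one position before it, as the tables above suggest.

```python
def persistence_model(lag_1):
    """Hypothetical stand-in for the trained model: predicts the lag_1 value."""
    return lag_1


def buggy_recursive_forecast(history, n_steps):
    """Mimic the loop from the issue: append a placeholder, then predict."""
    target = list(history)
    fcasted_values = []
    for _ in range(n_steps):
        # Same expression as in the issue: 0.0 on the first pass, and on
        # later passes the forecast that belongs to the PREVIOUS date.
        new_point = fcasted_values[-1] if len(fcasted_values) > 0 else 0.0
        target.append(new_point)  # row for the date being forecast
        lag_1 = target[-2]        # lag_1 feature of that last row
        fcasted_values.append(persistence_model(lag_1))
    return target, fcasted_values


target, fcasted_values = buggy_recursive_forecast([10.0, 11.0, 12.0], n_steps=3)
print(target)          # the 0.0 placeholder stays in the target
print(fcasted_values)  # the second forecast is built from lag_1 = 0.0
```

With a persistence model, every forecast should equal the last observed value (12.0). The second forecast drops to 0.0 because its lag_1 reads the placeholder, in the same way that the low value at 2020-01-13 02:00:00 follows a lag_1 of 0.000000 in the table above.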

Also, I didn't understand why the recursive strategy uses the trained model (returned from either the linear_model or xgboost_model function) instead of the 1-step-ahead model (which is used in the Direct Strategy).

Does this make sense, or have I misunderstood something?

@JamesLarkinWhite

JamesLarkinWhite commented Apr 22, 2022

I just found this tutorial and had the same thought while reading the implementation of the recursive forecast.

What I wrote before seems like nonsense to me now...

I guess you would have to make a prediction before entering the loop and, for the first prediction (N=1), append the last value of the resulting array instead of 0.0.

At least, I think I have seen this in a few entries for the M4 competition.

Edit: I tried to implement this idea. The two variables initial_features and initial_prediction are not really needed, but I thought they might help explain my general idea. It would be really nice if somebody could give me feedback on whether this is a viable solution:

def forecast_multi_recursive_fix(y, model, lags, n_steps=FCAST_STEPS, step="1H"):

	"""Multi-step recursive forecasting using the input time 
	series data and a pre-trained machine learning model
	
	Parameters
	----------
	y: pd.Series holding the input time-series to forecast
	model: an already trained machine learning model implementing the scikit-learn interface
	lags: list of lags used for training the model
	n_steps: number of time periods in the forecasting horizon
	step: forecasting time period given as Pandas time series frequencies
	
	Returns
	-------
	fcast_values: pd.Series with forecasted values indexed by forecast horizon dates 
	"""

	def create_recursive_features(target, lags):
		rec_target = target.copy()
		# forecast: create ts features
		ts_features = create_ts_features(rec_target)
		# forecast: create lag features
		if len(lags) > 0:
			lags_features = create_lag_features(rec_target, lags=lags)
			rec_features = ts_features.join(lags_features, how="outer").dropna()
		else:
			rec_features = ts_features

		return rec_features


	# get the dates to forecast
	last_date = y.index[-1] + pd.Timedelta(hours=1)
	fcast_range = pd.date_range(last_date, periods=n_steps, freq=step)

	fcasted_values = []
	target = y.copy()

	# initial Prediction for first step:
	initial_features = create_recursive_features(target, lags)
	initial_prediction = model.predict(initial_features)  # take value from original target array

	for date in fcast_range:

		new_point = fcasted_values[-1] if len(fcasted_values) > 0 else initial_prediction[-1]

		# Series.append was removed in pandas 2.0; pd.concat does the same job
		target = pd.concat([target, pd.Series(index=[date], data=new_point)])
		# forecast: create recursive features
		features = create_recursive_features(target,lags)

		# forecast: Predict
		predictions = model.predict(features)
		# forecast: append predictions to fcasted_values List
		fcasted_values.append(predictions[-1])


	return pd.Series(index=fcast_range, data=fcasted_values)
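One thing that may be worth checking in this version: if lag_k is simply the target shifted by k rows (as the tables in the first comment suggest), then `initial_prediction[-1]` is an in-sample prediction for the last observed date, and each forecast is still appended one date later than the date it belongs to. A possible alternative, shown here as a minimal sketch under that assumption, is to append a placeholder for the new date, predict, and then overwrite the placeholder with the prediction. Plain lists stand in for the pandas Series, and `forecast_recursive_overwrite` and `linear_trend_model` are hypothetical names used only for illustration.

```python
def forecast_recursive_overwrite(history, predict_last_row, lags, n_steps):
    """Recursive forecast that writes each prediction back to its own date.

    history: list of observed values, oldest first
    predict_last_row: callable mapping a list of lag values to one forecast
    lags: list of positive integer lags, e.g. [1, 2]
    """
    target = list(history)
    forecasts = []
    for _ in range(n_steps):
        target.append(float("nan"))  # placeholder row for the new date
        # lag_k of the last row is the value k positions before it,
        # so the placeholder itself is never used as a feature here
        features = [target[-1 - lag] for lag in lags]
        y_hat = predict_last_row(features)
        target[-1] = y_hat  # overwrite so later lags see the forecast
        forecasts.append(y_hat)
    return forecasts


def linear_trend_model(features):
    """Hypothetical stand-in model: extrapolates the last difference."""
    lag_1, lag_2 = features
    return 2 * lag_1 - lag_2


forecasts = forecast_recursive_overwrite(
    [1.0, 2.0, 3.0], linear_trend_model, lags=[1, 2], n_steps=3
)
print(forecasts)
```

On this toy series the forecasts continue the trend (4.0, 5.0, 6.0), because each step's lag_1 is the forecast from the step before. In the pandas version, the equivalent change would be assigning the prediction to `target[date]` after calling `model.predict`.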

@JamesLarkinWhite

It would be nice if you could review this issue.
