ForecastX with mix of future known and future unknown predictors #4378
-
I've just started using `ForecastX`. Let's consider the official example in the documentation. Suppose that we are in 1960 and want to predict the next 2 years, and somehow the future values of some predictors are known while others (`ARMED` and `POP` below) are not. I tried the following, but it failed.

Code:

```python
from sktime.datasets import load_longley
from sktime.forecasting.arima import ARIMA
from sktime.forecasting.base import ForecastingHorizon
from sktime.forecasting.compose import ForecastX
from sktime.forecasting.var import VAR

y, X = load_longley()
y_tr = y[:"1960"]
y_ts = y["1960":]
X_tr = X[:"1960"]
X_ts = X["1960":]

fh = ForecastingHorizon([1, 2, 3])
pipe = ForecastX(
    forecaster_X=VAR(),
    forecaster_y=ARIMA(),
    columns=["ARMED", "POP"],
)
pipe = pipe.fit(y_tr, X=X_tr, fh=fh)
y_pred = pipe.predict(X=X_ts.drop(columns=["ARMED", "POP"]), fh=fh)
```

I would appreciate some guidance on this problem. Thank you.
Replies: 4 comments 9 replies
-
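The problem in your example seems to be the funny indexing of `pandas` when you use a colon with dates. Unlike the very same indexing with integers (!), it produces overlapping folds.

The chain of cause/effect for the problem:

1. `X_tr`, `X_ts` both end up containing 1960
2. `fh=[1,2,3]` therefore requests 1961, 1962, 1963 (starting from period 0 = 1960). I think you probably intended to ask it to predict 1960, 1961, 1962.
3. you pass exogenous data (`X_ts`) only for 1960, 1961, 1962
4. in `predict`, you end up with missing data in 1963 (from the X-forecast and the X-known, 1963 is present in X-forecast but missing in X-known)
5. `ARIMA` can…

If you want to avoid that, make sure to subset so that the past/future data do not overlap. This code works:

```python
from sktime.datasets import load_longley
from sktime.forecasting.arima import ARIMA
from sktime.forecasting.base import ForecastingHorizon
from sktime.forecasting.compose import ForecastX
from sktime.forecasting.var import VAR

y, X = load_longley()
# end the training fold at 1959 so that it does not overlap with the test fold
y_tr = y[:"1959"]
y_ts = y["1960":]
X_tr = X[:"1959"]
X_ts = X["1960":]

# relative to the last training period (1959), this requests 1960, 1961, 1962
fh = ForecastingHorizon([1, 2, 3])
pipe = ForecastX(
    forecaster_X=VAR(),
    forecaster_y=ARIMA(),
    columns=["ARMED", "POP"],
)
pipe = pipe.fit(y_tr, X=X_tr, fh=fh)
y_pred = pipe.predict(X=X_ts.drop(columns=["ARMED", "POP"]), fh=fh)
```

Side note: you could also avoid the "drop" in `predict`.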
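A minimal sketch of why the original split overlaps, using `pandas` only. The series below is a hypothetical stand-in for the Longley data (a yearly `PeriodIndex` with placeholder values), so it illustrates the slicing behaviour rather than reproducing the actual failure.

```python
import pandas as pd

# Hypothetical annual series standing in for the Longley data
# (assumption: yearly PeriodIndex, 1957-1962, placeholder values).
idx = pd.period_range("1957", "1962", freq="Y")
s = pd.Series(range(len(idx)), index=idx)

# Label-based slicing includes BOTH endpoints, so 1960 lands in both folds.
train_overlapping = s.loc[:"1960"]
test = s.loc["1960":]
shared = train_overlapping.index.intersection(test.index)
print(list(shared.astype(str)))

# Ending the training fold one year earlier removes the overlap.
train = s.loc[:"1959"]
print(len(train.index.intersection(test.index)))

# A relative horizon counts steps from the last training period,
# so the two splits request different years for the same fh.
steps = (1, 2, 3)
horizon_overlapping = [str(train_overlapping.index[-1] + h) for h in steps]
horizon_fixed = [str(train.index[-1] + h) for h in steps]
print(horizon_overlapping)
print(horizon_fixed)
```

With the overlapping split, the last training period is 1960, so the horizon runs one year past the data available in the test fold.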
-
Interesting question: what should/could an informative error message have been here? And, when should it have been raised?
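One possible shape for such a check, as a sketch only: compare the periods requested by the relative horizon against the index of the `X` passed at predict time, and fail before any inner forecaster runs. The helper `check_X_covers_fh` is hypothetical and not part of sktime; it assumes a `PeriodIndex` and an integer, relative `fh`.

```python
import pandas as pd


def check_X_covers_fh(X, cutoff, fh_steps):
    """Hypothetical helper (not an sktime function): raise early if X
    lacks rows for any period requested by the relative horizon."""
    requested = pd.PeriodIndex([cutoff + h for h in fh_steps])
    missing = requested.difference(X.index)
    if len(missing) > 0:
        raise ValueError(
            f"X has no rows for {list(missing.astype(str))}: "
            f"fh={list(fh_steps)} relative to cutoff {cutoff} requests "
            f"{list(requested.astype(str))}, but X covers "
            f"{X.index.min()} to {X.index.max()}."
        )


# Placeholder future data covering 1960-1962 (values are made up).
X_future = pd.DataFrame(
    {"GNP": [1.0, 2.0, 3.0]},
    index=pd.period_range("1960", "1962", freq="Y"),
)

# Cutoff 1960 (overlapping split): 1963 is requested but not supplied.
try:
    check_X_covers_fh(X_future, pd.Period("1960", freq="Y"), [1, 2, 3])
    message = None
except ValueError as err:
    message = str(err)
print(message)

# Cutoff 1959 (non-overlapping split): every requested year is present.
check_X_covers_fh(X_future, pd.Period("1959", freq="Y"), [1, 2, 3])
```

Raising at predict time, before the missing row reaches the y-forecaster, would point at the index mismatch rather than at a downstream missing-value failure.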
-
Another observation concerns sktime/sktime/forecasting/compose/_pipeline.py, line 1425 in 19500a7. Any reason for this? This leads to very unexpected results: the call below requests `alpha=[0.3, 0.7]`, but the returned columns are 0.05 and 0.95.

```python
>>> pipe.predict_quantiles(X=X_ts, fh=fh, alpha=[0.3, 0.7])
         Quantiles
              0.05          0.95
1960  69583.430473  70587.653223
1961  69569.814972  70576.864647
1962  72161.834476  73168.900067
```
-
I've been following this discussion and found the examples and solutions very insightful. However, I am encountering a specific challenge in this direction. It would be great if it could be addressed by `ForecastX`.

In my current project, I have a dataset where the availability of exogenous variables varies: over the forecast period, some are completely unknown and some are only partly known.

Here’s a simplified version of my problem using the `load_longley` dataset:

```python
import numpy as np
import pandas as pd
from sktime.datasets import load_longley

# Load the dataset
y, X = load_longley()

# Split the data to simulate a future prediction scenario
# (copy the test fold so the assignments below do not act on a view of X)
X_tr, X_ts = X.loc[:"1959"], X.loc["1960":].copy()

# Introducing missing data for the example
X_ts.loc[:, "UNEMP"] = np.nan  # Completely unknown (for 1960, 1961, 1962)
X_ts.loc["1962", "ARMED"] = np.nan  # Partly unknown (for 1962)
X_ts.loc["1961":"1962", "POP"] = np.nan  # Partly unknown (for 1961 and 1962)

print(X_ts)
```

Could you suggest how to best approach this scenario with `ForecastX`? I'd appreciate any insights. Thank you in advance!
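As a starting point, the availability pattern can be made explicit before choosing a strategy. The sketch below uses plain `pandas` on a hypothetical frame that mirrors the missingness above (placeholder values, not the Longley data) and sorts columns into fully known, partly unknown, and completely unknown. Whether `ForecastX` can make use of the partly known values is not established here; under that assumption, the conservative option would be to list every column that is not fully known in `columns`.

```python
import numpy as np
import pandas as pd

# Placeholder future frame mirroring the pattern above (values are made up).
idx = pd.period_range("1960", "1962", freq="Y")
X_future = pd.DataFrame(
    {
        "GNP": [1.0, 2.0, 3.0],             # fully known
        "UNEMP": [np.nan, np.nan, np.nan],  # completely unknown
        "ARMED": [1.0, 2.0, np.nan],        # unknown for 1962
        "POP": [1.0, np.nan, np.nan],       # unknown for 1961 and 1962
    },
    index=idx,
)

# Count missing values per column and classify each column.
n_missing = X_future.isna().sum()
fully_known = n_missing[n_missing == 0].index.tolist()
fully_unknown = n_missing[n_missing == len(X_future)].index.tolist()
partly_unknown = [
    c for c in X_future.columns if c not in fully_known + fully_unknown
]

# Last period with an observed value, per partly known column.
last_known = {c: str(X_future[c].last_valid_index()) for c in partly_unknown}

print(fully_known, partly_unknown, fully_unknown)
print(last_known)
```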