`ForecasterAutoreg` fails to fit when index does not start from 0 #576

yarnabrina · 2023-10-23T09:40:41Z

MCVE

import numpy
import pandas

numpy.random.seed(seed=0)

data = pandas.DataFrame(
    numpy.random.random(size=3 * 20).reshape((20, 3)),
    index=numpy.arange(3, 3 + 20),  # commenting this line will make it work
    columns=["y", "x1", "x2"],
)

from sklearn.linear_model import LinearRegression
regressor = LinearRegression()

from skforecast.ForecasterAutoreg import ForecasterAutoreg
forecaster = ForecasterAutoreg(regressor, 1)

forecaster.fit(data.iloc[:, 0], exog=data.iloc[:, 1:])

Error

ValueError: Input X contains NaN.
LinearRegression does not accept missing values encoded as NaN natively. For supervised learning, you might want to consider sklearn.ensemble.HistGradientBoostingClassifier and Regressor which accept missing values encoded as NaNs natively. Alternatively, it is possible to preprocess the data, for instance by using an imputer transformer in a pipeline or drop samples with missing values. See https://scikit-learn.org/stable/modules/impute.html You can find a list of all estimators that handle NaN values at the following page: https://scikit-learn.org/stable/modules/impute.html#estimators-that-handle-nan-values

Expectation

I expected the fit to succeed, since the endogenous and exogenous variables passed in still share the same index.

(Found as part of sktime/sktime#5447)

JavierEscobarOrtiz · 2023-10-24T07:40:10Z

Hello @yarnabrina

Thank you for opening the issue!

We have identified the root of the problem. To avoid this error, I recommend building the index with pd.RangeIndex instead of np.arange(). That lets you use an index that doesn't start at 0:

import numpy as np
import pandas as pd

np.random.seed(seed=0)

data = pd.DataFrame(
    np.random.random(size=3 * 20).reshape((20, 3)),
    index=pd.RangeIndex(3, 3 + 20), # Changed to pd.RangeIndex()
    columns=["y", "x1", "x2"],
)

from sklearn.linear_model import LinearRegression
regressor = LinearRegression()

from skforecast.ForecasterAutoreg import ForecasterAutoreg
forecaster = ForecasterAutoreg(regressor, 1)

forecaster.fit(data.iloc[:, 0], exog=data.iloc[:, 1:])
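When the data already exists with an integer index built from np.arange, the same workaround can be applied after the fact by swapping the index for an equivalent pd.RangeIndex. The sketch below assumes the index is made of consecutive integers with step 1; `to_range_index` is a hypothetical helper written for illustration and is not part of skforecast or pandas.

```python
import numpy as np
import pandas as pd


def to_range_index(obj):
    """Return a copy of `obj` whose consecutive integer index is a pd.RangeIndex.

    Hypothetical helper for illustration; not part of skforecast or pandas.
    """
    idx = obj.index
    if isinstance(idx, pd.RangeIndex):
        return obj.copy()

    values = idx.to_numpy()
    # Only integer indexes can be expressed as a RangeIndex.
    if len(values) == 0 or not pd.api.types.is_integer_dtype(values):
        raise ValueError("index must be a non-empty integer index")
    # Assumption: the index increases by exactly 1 at every step.
    if len(values) > 1 and not (np.diff(values) == 1).all():
        raise ValueError("index must be consecutive integers with step 1")

    out = obj.copy()
    out.index = pd.RangeIndex(start=int(values[0]), stop=int(values[-1]) + 1)
    return out


rng = np.random.default_rng(0)
data = pd.DataFrame(
    rng.random((20, 3)),
    index=np.arange(3, 3 + 20),  # plain integer Index, not a RangeIndex
    columns=["y", "x1", "x2"],
)

fixed = to_range_index(data)
print(type(data.index).__name__, "->", type(fixed.index).__name__)
```

The labels themselves are unchanged (still 3 through 22); only the index type differs, so `fixed.iloc[:, 0]` and `fixed.iloc[:, 1:]` can be passed to `forecaster.fit` exactly as in the snippet above.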

We will have a look 😄

Thank you!

JavierEscobarOrtiz · 2023-11-09T17:08:55Z

Hi @yarnabrina

This bug is fixed in version 0.11.0. That version has not been released on PyPI yet, but you can test it by installing skforecast from GitHub:

pip install git+https://github.com/JoaquinAmatRodrigo/skforecast@0.11.x

Hope it helps!

JavierEscobarOrtiz · 2023-11-18T19:25:47Z

Skforecast 0.11.0 has been released on PyPI.
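To check whether an environment already has a release containing the fix, the installed version can be compared against 0.11. This is a minimal sketch using only the standard library; the helper names are hypothetical, and the comparison looks only at the major and minor numbers, so a pre-release build of 0.11 would also count as fixed.

```python
import re
from importlib import metadata

# Per this thread, the fix shipped in skforecast 0.11.0.
FIXED_IN = (0, 11)


def release_tuple(version):
    """Extract (major, minor) from a version string such as '0.11.0'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def has_index_fix(version):
    """True if `version` is 0.11 or later (hypothetical helper)."""
    return release_tuple(version) >= FIXED_IN


def installed_skforecast_has_fix():
    """Return True/False for the installed skforecast, or None if it is absent."""
    try:
        return has_index_fix(metadata.version("skforecast"))
    except metadata.PackageNotFoundError:
        return None


print(installed_skforecast_has_fix())
```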

JavierEscobarOrtiz added the bug label Oct 24, 2023

JavierEscobarOrtiz linked a pull request Nov 9, 2023 that will close this issue: Feature fix bugs #585 (Merged)

JavierEscobarOrtiz closed this as completed Nov 18, 2023
