ForecasterAutoreg fails to fit when index does not start from 0 #576

Closed
yarnabrina opened this issue Oct 23, 2023 · 3 comments · Fixed by #585
Labels
bug Something isn't working

Comments

@yarnabrina

MCVE

```python
import numpy
import pandas

numpy.random.seed(seed=0)

data = pandas.DataFrame(
    numpy.random.random(size=3 * 20).reshape((20, 3)),
    index=numpy.arange(3, 3 + 20),  # commenting this line will make it work
    columns=["y", "x1", "x2"],
)

from sklearn.linear_model import LinearRegression
regressor = LinearRegression()

from skforecast.ForecasterAutoreg import ForecasterAutoreg
forecaster = ForecasterAutoreg(regressor, 1)

forecaster.fit(data.iloc[:, 0], exog=data.iloc[:, 1:])
```

Error

```text
ValueError: Input X contains NaN.
LinearRegression does not accept missing values encoded as NaN natively. For supervised learning, you might want to consider sklearn.ensemble.HistGradientBoostingClassifier and Regressor which accept missing values encoded as NaNs natively. Alternatively, it is possible to preprocess the data, for instance by using an imputer transformer in a pipeline or drop samples with missing values. See https://scikit-learn.org/stable/modules/impute.html You can find a list of all estimators that handle NaN values at the following page: https://scikit-learn.org/stable/modules/impute.html#estimators-that-handle-nan-values
```

Expectation

The expectation was that fitting would succeed, since the endogenous and exogenous variables passed to fit still share the same index.

(Found as part of sktime/sktime#5447)
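The thread does not state the root cause. One plausible mechanism, and it is only an assumption about skforecast's internals, is that the target ends up with a 0-based index while the exogenous data keeps its original labels (3 to 22). Because pandas joins along columns by index label rather than by position, the rows then no longer line up and NaN appears in the training matrix. The sketch below reproduces that effect with plain pandas; `nan_rows_after_join` is a hypothetical helper, not a skforecast function.

```python
import numpy as np
import pandas as pd

# Same data as the MCVE: an integer index that starts at 3, not 0
np.random.seed(seed=0)
data = pd.DataFrame(
    np.random.random(size=3 * 20).reshape((20, 3)),
    index=np.arange(3, 3 + 20),
    columns=["y", "x1", "x2"],
)
y = data["y"]
exog = data[["x1", "x2"]]


def nan_rows_after_join(y: pd.Series, exog: pd.DataFrame, lags: int = 1) -> int:
    """Hypothetical helper: build one lag column from `y`, join it column-wise
    with `exog`, and count rows containing NaN.
    pd.concat(axis=1) aligns on index labels, not on positions."""
    lag = y.shift(lags).rename(f"lag_{lags}").iloc[lags:]  # drop rows with no lag
    X = pd.concat([lag, exog.iloc[lags:]], axis=1)
    return int(X.isna().any(axis=1).sum())


# Both inputs keep the labels 3..22, so the rows line up.
print(nan_rows_after_join(y, exog))  # 0

# Assumed failure mode: only the target is re-indexed to 0..19.
print(nan_rows_after_join(y.reset_index(drop=True), exog))  # 6
```

In the second call the labels 1 to 3 exist only on the lag side and 20 to 22 only on the exog side, so those six rows are padded with NaN, which is the kind of input that makes `LinearRegression` raise the error above.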

@JavierEscobarOrtiz
Collaborator

JavierEscobarOrtiz commented Oct 24, 2023

Hello @yarnabrina

Thank you for opening the issue!

We have identified the root of the problem. To avoid this error, I recommend using a pd.RangeIndex object instead of np.arange(). This lets you use an index that doesn't start at 0.

```python
import numpy as np
import pandas as pd

np.random.seed(seed=0)

data = pd.DataFrame(
    np.random.random(size=3 * 20).reshape((20, 3)),
    index=pd.RangeIndex(3, 3 + 20), # Changed to pd.RangeIndex()
    columns=["y", "x1", "x2"],
)

from sklearn.linear_model import LinearRegression
regressor = LinearRegression()

from skforecast.ForecasterAutoreg import ForecasterAutoreg
forecaster = ForecasterAutoreg(regressor, 1)

forecaster.fit(data.iloc[:, 0], exog=data.iloc[:, 1:])
```
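If the data already arrives with an integer index such as `np.arange(3, 23)`, one way to apply this workaround without rebuilding the DataFrame is to swap in an equivalent `pd.RangeIndex`. The helper below is a hypothetical sketch rather than a skforecast API; it assumes the index is contiguous with step 1 and raises `ValueError` otherwise.

```python
import numpy as np
import pandas as pd


def to_range_index(obj):
    """Hypothetical helper (not part of skforecast): return a copy of a Series
    or DataFrame whose contiguous, step-1 integer index is replaced by an
    equivalent pd.RangeIndex, keeping the original start label."""
    idx = obj.index
    if isinstance(idx, pd.RangeIndex):
        return obj.copy()  # already a RangeIndex, nothing to change
    if len(idx) == 0 or not pd.api.types.is_integer_dtype(idx.dtype):
        raise ValueError("expected a non-empty integer index")
    start = int(idx[0])
    # Only a gap-free index that increases by one maps onto a RangeIndex
    if not np.array_equal(idx.to_numpy(), np.arange(start, start + len(idx))):
        raise ValueError("index is not contiguous with step 1")
    out = obj.copy()
    out.index = pd.RangeIndex(start, start + len(idx))
    return out


np.random.seed(seed=0)
data = pd.DataFrame(
    np.random.random(size=3 * 20).reshape((20, 3)),
    index=np.arange(3, 3 + 20),  # same index as the MCVE
    columns=["y", "x1", "x2"],
)
fixed = to_range_index(data)
print(type(fixed.index).__name__, fixed.index.start, fixed.index.stop)  # RangeIndex 3 23
```

The converted frame can then be used exactly as in the snippet above, for example `forecaster.fit(fixed.iloc[:, 0], exog=fixed.iloc[:, 1:])`.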

We will have a look 😄

Thank you!

@JavierEscobarOrtiz added the bug label on Oct 24, 2023
@JavierEscobarOrtiz linked a pull request on Nov 9, 2023 that will close this issue
@JavierEscobarOrtiz
Collaborator

Hi @yarnabrina

This bug is fixed in version 0.11.0. That version is not yet released on PyPI, but you can test it by installing skforecast from GitHub:

```shell
pip install git+https://github.com/JoaquinAmatRodrigo/skforecast@0.11.x
```

Hope it helps!

@JavierEscobarOrtiz
Collaborator

Skforecast 0.11.0 has been released on PyPI.
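To confirm that an environment already includes the fix, you can compare the installed skforecast version with 0.11.0, the release named in this thread. The sketch below uses `importlib.metadata` from the standard library; `has_index_fix` is a hypothetical helper, and comparing only the numeric major.minor.patch part (ignoring suffixes such as `rc1` or `.dev0`) is a simplifying assumption.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def has_index_fix(installed: str) -> bool:
    """Hypothetical helper: True if a skforecast version string is 0.11.0 or
    later. Only the leading numeric major.minor[.patch] part is compared, so
    pre-release or dev suffixes are ignored (a simplifying assumption)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", installed)
    if match is None:
        raise ValueError(f"unrecognised version string: {installed!r}")
    major, minor, patch = (int(g or 0) for g in match.groups())
    return (major, minor, patch) >= (0, 11, 0)


try:
    print(has_index_fix(version("skforecast")))
except PackageNotFoundError:
    print("skforecast is not installed")
```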
