time series forecasting on data with missing values #297

hadisotudeh · 2021-11-20T14:30:40Z

It seems the example provided in the documentation does not support data with some missing values.

I expected the example to handle missing values, but it does not seem to:

hadisotudeh · 2021-11-21T08:11:50Z

I filled in the missing rows, but I still get the same error. The time column uses the format '2021-11-06 23:59:00' and increases minute by minute.
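For reference, one way to fill gaps in a minute-level series is to reindex onto a complete minute grid and interpolate. This is a minimal sketch with pandas, not something FLAML does for you; the column names `ds` and `y` and the toy data are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Toy minute-level series with two timestamps deliberately removed.
# The column names "ds" and "y" are illustrative, not required by any library.
full_index = pd.date_range("2021-11-06 00:00", periods=10, freq="min")
df = pd.DataFrame({"ds": full_index, "y": np.arange(10, dtype=float)})
df = df.drop(index=[3, 7]).reset_index(drop=True)

# Rebuild the complete minute grid between the first and last timestamp.
grid = pd.date_range(df["ds"].min(), df["ds"].max(), freq="min")
missing = grid.difference(pd.DatetimeIndex(df["ds"]))
print(f"{len(missing)} timestamps are missing")

# Reindex onto the grid (new rows get NaN), then interpolate linearly in time.
filled = (
    df.set_index("ds")
    .reindex(grid)
    .interpolate(method="time")
    .rename_axis("ds")
    .reset_index()
)
print(filled)
```

Linear interpolation is only one choice; forward fill or a seasonal estimate may suit some data better.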

hadisotudeh · 2021-11-21T08:23:59Z

Adding this line fixed the issue:

"estimator_list": ["prophet"]

It seems the error is caused by the other estimators.

int-chaos · 2021-11-21T21:47:23Z

This is because arima and sarimax do not support missing values, but prophet does.

import numpy as np
from flaml import AutoML

X_train = np.arange('2021-11-06', '2021-11-07', dtype='datetime64[m]')
y_train = np.random.random(size=len(X_train))

print(X_train)

automl = AutoML()
automl.fit(
    X_train=X_train[:1380],  # a single column of timestamp
    y_train=y_train[:1380],  # value for each timestamp
    period=60,  # time horizon to forecast, e.g., 60 minutes
    task="ts_forecast",
    time_budget=5,  # time budget in seconds
    estimator_list=["arima", "sarimax"],
    log_file_name="test_minutes.log",
)
print(automl.predict(X_train[1380:]))  # forecast the 60 held-out minutes

I just tested this for arima and sarimax and it works.

sonichi · 2021-11-21T22:01:00Z

@int-chaos DataTransformer.fit_transform() is supposed to fill the missing values. Is it not doing so for time series data?

int-chaos · 2021-11-21T22:09:55Z

No, it is not. In DataTransformer.fit_transform(), the timestamp column is popped out and inserted back only after all the other transformations have run, so the transformer cannot tell that timestamps are missing from the series. It only fills in missing values for the exogenous variables.

That is something I can work on implementing in DataTransformer.fit_transform().

int-chaos · 2021-11-22T14:28:30Z

@hadisotudeh Does it work after you filled in the missing timestamps? If not, please check that you have the latest version of flaml: a previous version shuffled the data, which caused this problem. If the problem persists, please send me the dataset and I will test it on my end.

hadisotudeh · 2021-11-22T15:11:15Z

@int-chaos First, I had missing values. I filled them in and tried again, but it did not work. Later, I found that my data also had duplicate rows, as well as rows with the same timestamp but different values in the output column. "prophet" worked without throwing any exception, while the ARIMA-family libraries threw a vague exception.

After I also fixed the duplicates, the model.fit part worked.
It seems the steps above would be good to implement as part of the pipeline, with meaningful messages as output.

I expected automl to tell me that my data has missing values or duplicate rows (whether or not it handles them).
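The kind of check described here can be sketched outside FLAML with pandas. The helper name `check_time_index`, the column names, and the choice to average conflicting duplicates are all assumptions for illustration; none of this is FLAML's actual behavior.

```python
import pandas as pd


def check_time_index(df, time_col="ds", freq="min"):
    """Return (missing_timestamps, duplicated_timestamps) for a regular series.

    Hypothetical helper for illustration; not part of FLAML.
    """
    ts = pd.to_datetime(df[time_col])
    grid = pd.date_range(ts.min(), ts.max(), freq=freq)
    missing = grid.difference(pd.DatetimeIndex(ts.unique()))
    duplicated = ts[ts.duplicated(keep=False)].drop_duplicates()
    return missing, duplicated


# Toy data: 23:57 and 23:58 are absent, 23:56 appears twice with different values.
df = pd.DataFrame({
    "ds": pd.to_datetime([
        "2021-11-06 23:55:00", "2021-11-06 23:56:00",
        "2021-11-06 23:56:00", "2021-11-06 23:59:00",
    ]),
    "y": [1.0, 2.0, 4.0, 5.0],
})

missing, duplicated = check_time_index(df)
if len(missing):
    print(f"{len(missing)} missing timestamps, first: {missing[0]}")
if len(duplicated):
    print(f"{len(duplicated)} timestamps appear more than once")

# One possible way to resolve conflicting duplicates: average them.
deduped = df.groupby("ds", as_index=False)["y"].mean()
```

Running a check like this before fit makes the cause visible up front, instead of surfacing later as an unclear exception from the ARIMA-family estimators.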

P.S. Is there an option in the forecasting model to restrict it to positive predictions only?

int-chaos · 2021-11-22T15:39:29Z

Yes, duplicate rows will lead to errors for the ARIMA family libraries and I agree with you that the exceptions are unclear.

Thank you for your suggestions on data handling and warnings/exceptions. I will work on implementing those.

To my knowledge, there is no built-in functionality in the forecasting models to limit them to positive predictions only. See facebook/prophet#1668 for more information. TL;DR: clipping the predictions manually with np.clip(predictions, 0, None) is the best option, but clipping is not currently supported in FLAML.
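A minimal sketch of manual clipping as a post-processing step. The `predictions` array is a made-up stand-in for whatever the fitted model returns.

```python
import numpy as np

# Stand-in for model output; the values are made up for illustration.
predictions = np.array([-0.3, 0.0, 1.2, -2.0, 0.7])

# Lower bound of 0, no upper bound. Passing None for the upper bound
# explicitly keeps the call valid across NumPy versions.
clipped = np.clip(predictions, 0, None)
print(clipped)
```

Note that clipping yields non-negative values: negative forecasts become exactly 0 rather than strictly positive numbers.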

sonichi · 2021-11-22T18:06:52Z

One way to ensure positive predictions is to log-transform the labels before fit, and exp-transform the predictions afterwards. If the labels are distributed in a wide range this is worth trying.
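The transform itself can be sketched with NumPy alone. The labels and the log-space predictions below are synthetic stand-ins; in practice `y_log` would be passed to fit and `pred_log` would come from predict.

```python
import numpy as np

# Synthetic, strictly positive labels spanning a wide range (illustrative only).
rng = np.random.default_rng(0)
y_train = rng.lognormal(mean=2.0, sigma=1.0, size=100)

# Train on log(y) instead of y. np.log requires y > 0; np.log1p with
# np.expm1 is a common substitute when zeros are present.
y_log = np.log(y_train)

# Stand-in for a forecaster's output in log space (hypothetical values,
# which may be negative).
pred_log = np.array([-1.5, 0.0, 2.3])

# Map back to the original scale; exp of any real number is > 0.
pred = np.exp(pred_log)
print(pred)
```

Because the model fits in log space, its errors become relative rather than absolute on the original scale, which is why this tends to help when the labels span a wide range.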

sonichi assigned int-chaos Nov 20, 2021

int-chaos mentioned this issue Dec 26, 2021

Time series forecasting with sklearn regressors #362

Merged

sonichi linked a pull request Jan 6, 2022 that will close this issue

Time series forecasting with sklearn regressors #362

Merged

sonichi closed this as completed in #362 Jan 7, 2022
