Add Timeseries forecasting for column-major data, and introduce Timeseries output feature #3212
Conversation
for more information, see https://pre-commit.ci
```python
next_series[feature.column] = pd.Series(preds[key].iloc[-1])

next_preds = pd.DataFrame(next_series)
dataset = pd.concat([dataset, next_preds], axis=0).reset_index(drop=True)
```
This approach is totally fine, but obviously we are preprocessing over and over again datapoints we have already processed, so there's something to be optimized.
Also from a model perspective, I believe HF does something smart for caching activations for the next steps of text generation, which is kind of similar here potentially, so we can look into that too.
Nice, yeah I think we should definitely think about improvements here. But it definitely depends on how far back / forward we want to forecast. If we're looking at +/- 100 samples this should be fine.
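The loop being discussed can be sketched end to end. This is a simplified, single-column illustration rather than Ludwig's actual implementation: `forecast_autoregressive` and `predict_fn` are hypothetical stand-ins for the model's predict call, and the whole (growing) dataset is re-predicted at every step, which is exactly the redundant preprocessing noted above.

```python
import pandas as pd

def forecast_autoregressive(dataset, predict_fn, column, horizon):
    """Sketch of the autoregressive forecast loop: each step re-runs
    prediction on the full (growing) dataset, appends the newest
    prediction as a new row, and repeats. `predict_fn` is a hypothetical
    stand-in for the model's predict call; it returns a DataFrame of
    predictions aligned with the input rows."""
    dataset = dataset.copy()
    for _ in range(horizon):
        preds = predict_fn(dataset)
        # Keep only the last prediction and append it as the next observation.
        next_preds = pd.DataFrame({column: pd.Series([preds[column].iloc[-1]])})
        dataset = pd.concat([dataset, next_preds], axis=0).reset_index(drop=True)
    return dataset
```

A caching scheme along the lines of the HF suggestion would avoid re-running `predict_fn` over rows whose activations are already known.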
```diff
@@ -146,7 +148,7 @@ def input_shape(self):
         return self.dense.input_shape

     def forward(self, inputs, **kwargs):
-        values = self.activation(self.dense(inputs))
+        values = self.activation(self.dense(inputs)) * self.multiplier
```
I actually forgot we already had the activation! that was probably for doing softmax / sigmoid for vector features :)
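The diff above scales the activated decoder output by a constant. A minimal numpy sketch of the idea (this `Projector` class and its fields are illustrative, not Ludwig's actual decoder): a bounded activation like tanh can be rescaled by the multiplier to cover the target's range.

```python
import numpy as np

class Projector:
    """Illustrative sketch (not Ludwig's class) of a projector decoder
    that applies `activation(dense(x)) * multiplier`."""
    def __init__(self, weight, bias, activation, multiplier=1.0):
        self.weight = weight          # shape [in_dim, out_dim]
        self.bias = bias              # shape [out_dim]
        self.activation = activation  # e.g. np.tanh, or identity
        self.multiplier = multiplier  # rescales the activation's output range

    def forward(self, inputs):
        dense = inputs @ self.weight + self.bias
        # e.g. tanh outputs lie in (-1, 1); multiplier=100.0 stretches
        # them to (-100, 100) for temperature-like targets.
        return self.activation(dense) * self.multiplier
```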
ludwig/encoders/sequence_encoders.py
```python
while len(input_sequence.shape) < 3:
    input_sequence = input_sequence.unsqueeze(-1)
hidden = self.reduce_sequence(input_sequence)

# Output may have [batch_size, s, 1] shape, so ensure it is [batch_size, s]
hidden = hidden.reshape((batch_size, -1))
```
not sure what we decided for sequence and text (probably to remove the passthrough encoder), but this may not work in some scenarios (happy to elaborate)
Hmm, seems so from the tests. Our docs are misleading, then. What do you think is a good workaround in this case?
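The shape normalization under discussion can be illustrated standalone. A numpy sketch, with `reduce_fn` as a hypothetical stand-in for `self.reduce_sequence`: the input is padded up to rank 3 (`[batch, seq, hidden]`), reduced over the sequence, then flattened back to `[batch, -1]` in case the reduction leaves a trailing singleton dimension.

```python
import numpy as np

def normalize_sequence_output(input_sequence, reduce_fn):
    """Sketch of the shape handling above. `reduce_fn` stands in for
    self.reduce_sequence and maps [batch, seq, hidden] to a reduced
    tensor that may still carry a trailing dimension of size 1."""
    batch_size = input_sequence.shape[0]
    # Pad missing trailing dims so the input is [batch, seq, hidden].
    while input_sequence.ndim < 3:
        input_sequence = np.expand_dims(input_sequence, axis=-1)
    hidden = reduce_fn(input_sequence)
    # Output may be [batch_size, s, 1]; ensure it is [batch_size, s].
    return hidden.reshape((batch_size, -1))
```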
I'm trying to use the example provided in examples/forecasting, which uses 'timeseries' as the output feature type. However, I encounter a ConfigValidationError indicating that 'timeseries' is not a supported output feature type. Here is the error message:

```
ludwig.error.ConfigValidationError: Output feature Seattle_next uses an invalid/unsupported output type 'timeseries'. Supported output features: ['binary', 'category', 'number', 'set', 'vector', 'sequence', 'text'].
```

I cloned the Ludwig repository and installed it in a Python 3.8 environment. As this is one of the provided examples, I expect it to run without any configuration errors. The 'timeseries' output feature should be supported, according to the example.
I'd appreciate guidance on how to resolve this issue. Could you confirm whether 'timeseries' is a supported output feature in the current Ludwig version? If it is, what am I doing wrong? Thank you for your assistance.
For anyone else experiencing the same issue in the future, the solution was to clone the current repository state (v0.8dev) instead of the current release.
Glad you got it working @MihailMiller! v0.8 should be released in the next couple of weeks with this feature officially supported.
Follow-up to the work done in #1131, but for PyTorch instead of TensorFlow.
This PR adds the following:

- `preprocessing.window_size` to form the row-major timeseries input.
- Timeseries output feature, used when `preprocessing.horizon` is set.
- `LudwigModel.forecast` and a `ludwig forecast` command that allows forecasting with a trained model.
- `passthrough` and `dense` encoders with timeseries inputs of constant length.
- `huber` loss as an alternative to MAE / MSE (default for timeseries output).
- `mean_absolute_percentage_error` (MAPE) loss and metric.
- `multiplier` to apply in the `projector` decoder to the output after activation.

Example:

```
ludwig train --config config.yaml --dataset temperature.csv
ludwig forecast -n 10 --model_path results/experiment_run/model --dataset temperature.csv
```
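A hypothetical `config.yaml` consistent with the features listed above. This is a sketch only: the feature names and the exact placement of `window_size`, `horizon`, `multiplier`, and the `huber` loss are assumptions and may differ from the actual `examples/forecasting` config.

```yaml
# Hypothetical sketch, not copied from the repository.
input_features:
  - name: temperature
    type: timeseries
    preprocessing:
      window_size: 10   # rows of history folded into each row-major input
output_features:
  - name: temperature_next
    type: timeseries
    preprocessing:
      horizon: 5        # steps ahead to predict; enables the timeseries output
    decoder:
      type: projector
      multiplier: 100.0 # rescales the activated output
    loss:
      type: huber
```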