
Add Timeseries forecasting for column-major data, and introduce Timeseries output feature #3212

Merged
@tgaddair merged 37 commits into master from the forecasting branch on Mar 14, 2023

Conversation

@tgaddair (Collaborator) commented Mar 6, 2023

Follow-up to the work done in #1131, but for PyTorch instead of TensorFlow.

This PR adds the following:

  • Support for windowing over column-major timeseries data. For timeseries input features, this lets users provide the timeseries column as ordinary numbers; Ludwig then looks back over a sliding window of preprocessing.window_size rows to form the row-major timeseries input.
  • Support for a timeseries output feature, with similar support for column-major forecasting when preprocessing.horizon is set.
  • A LudwigModel.forecast API and a ludwig forecast command that allow forecasting with a trained model.
  • Support for passthrough and dense encoders with timeseries inputs of constant length.
  • huber loss as an alternative to MAE / MSE (the default for timeseries outputs).
  • mean_absolute_percentage_error (MAPE) loss and metric.
  • An optional multiplier applied to the projector decoder output after the activation.

Example:
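A minimal sketch of how the new options might fit together, assuming a univariate temperature series with hypothetical column names; the exact config keys, encoder defaults, and the signature of `forecast()` shown here are assumptions based on the bullet list above:

```python
from ludwig.api import LudwigModel

# Hypothetical config: "temperature" is a plain numeric column (column-major data).
# A sliding window of the last 20 rows forms each row-major timeseries input, and
# the model is asked to forecast the next 5 values (the horizon).
config = {
    "input_features": [
        {
            "name": "temperature",
            "type": "timeseries",
            "preprocessing": {"window_size": 20},
        }
    ],
    "output_features": [
        {
            "name": "temperature_next",
            "type": "timeseries",
            "preprocessing": {"horizon": 5},
            "loss": {"type": "huber"},  # alternative to MAE / MSE, per the list above
        }
    ],
}

model = LudwigModel(config)
model.train(dataset="temperature.csv")  # hypothetical dataset path

# Forecast with the trained model; the exact arguments of forecast() are an assumption.
forecast_df = model.forecast(dataset="temperature.csv")
```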

@tgaddair changed the title from "Forecasting" to "Add Timeseries forecasting for column-major data, and introduce Timeseries output feature" on Mar 6, 2023
github-actions bot commented Mar 6, 2023

Unit Test Results

6 files ±0 · 6 suites ±0 · 7h 55m 16s ⏱️ +22m 50s
4,087 tests +56 · 4,044 passed ✔️ +56 · 43 skipped 💤 +1 · 0 failed −1
12,231 runs +145 · 12,099 passed ✔️ +140 · 132 skipped 💤 +6 · 0 failed −1

Results for commit 4adb4ba. Comparison against base commit aa49636.

♻️ This comment has been updated with latest results.

@tgaddair requested a review from w4nderlust on March 7, 2023 21:25
@tgaddair marked this pull request as ready for review on March 7, 2023 21:26
next_series[feature.column] = pd.Series(preds[key].iloc[-1])

next_preds = pd.DataFrame(next_series)
dataset = pd.concat([dataset, next_preds], axis=0).reset_index(drop=True)
Collaborator
This approach is totally fine, but obviously we are preprocessing datapoints we have already processed over and over again, so there's something to be optimized.

Also, from a model perspective, I believe HF does something smart with caching activations for the next steps of text generation, which is potentially similar here, so we can look into that too.

Collaborator Author

Nice, yeah I think we should definitely think about improvements here. But it definitely depends on how far back / forward we want to forecast. If we're looking at +/- 100 samples this should be fine.
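For anyone skimming the thread, a rough standalone sketch of the pattern in the excerpt above, where `predict_fn` and the single-column frame are hypothetical stand-ins for Ludwig's preprocessing and predict step:

```python
import pandas as pd

def naive_forecast(dataset: pd.DataFrame, predict_fn, column: str, steps: int) -> pd.DataFrame:
    """One-step-ahead loop: predict, append the last prediction, repeat.

    Each iteration re-processes the full (growing) frame, which is the
    redundancy discussed above.
    """
    for _ in range(steps):
        preds = predict_fn(dataset)  # placeholder for model prediction over the frame
        next_row = pd.DataFrame({column: [preds[column].iloc[-1]]})
        dataset = pd.concat([dataset, next_row], axis=0).reset_index(drop=True)
    return dataset
```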

@@ -146,7 +148,7 @@ def input_shape(self):
return self.dense.input_shape

def forward(self, inputs, **kwargs):
-        values = self.activation(self.dense(inputs))
+        values = self.activation(self.dense(inputs)) * self.multiplier
Collaborator
I actually forgot we already had the activation! That was probably for doing softmax / sigmoid for vector features :)
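To make the role of the multiplier concrete, a minimal illustrative module (not Ludwig's actual Projector class; the Tanh activation and sizes are assumptions):

```python
import torch
import torch.nn as nn

class SimpleProjector(nn.Module):
    """Simplified projector-style decoder: dense layer -> activation -> optional scale."""

    def __init__(self, input_size: int, output_size: int, multiplier: float = 1.0):
        super().__init__()
        self.dense = nn.Linear(input_size, output_size)
        self.activation = nn.Tanh()   # bounded activation, e.g. for normalized targets
        self.multiplier = multiplier  # rescales the bounded output to the target range

    def forward(self, inputs: torch.Tensor) -> torch.Tensor:
        return self.activation(self.dense(inputs)) * self.multiplier

# Targets roughly in [-100, 100]:
decoder = SimpleProjector(input_size=32, output_size=5, multiplier=100.0)
out = decoder(torch.randn(8, 32))  # shape [8, 5]
```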

while len(input_sequence.shape) < 3:
input_sequence = input_sequence.unsqueeze(-1)
hidden = self.reduce_sequence(input_sequence)

# Output may have [batch_size, s, 1] shape, so ensure it is [batch_size, s]
hidden = hidden.reshape((batch_size, -1))
Collaborator
Not sure what we decided for sequence and text (probably to remove the passthrough encoder), but this may not work in some scenarios (happy to elaborate).

Collaborator Author
Hmm, seems so from the tests. Our docs are misleading, then. What do you think is a good workaround in this case?
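For reference, a standalone illustration of the shape handling in the snippet above, using a sum reduction as a stand-in for `reduce_sequence`:

```python
import torch

def reduce_to_2d(input_sequence: torch.Tensor) -> torch.Tensor:
    """Pad to [batch_size, seq_len, hidden], reduce over seq_len, return a 2D tensor."""
    batch_size = input_sequence.shape[0]
    while len(input_sequence.shape) < 3:
        input_sequence = input_sequence.unsqueeze(-1)
    hidden = input_sequence.sum(dim=1)       # stand-in for Ludwig's reduce_sequence
    return hidden.reshape((batch_size, -1))  # flattens a possible trailing singleton dim

x = torch.randn(8, 20)        # constant-length timeseries: batch of 8, window of 20
print(reduce_to_2d(x).shape)  # torch.Size([8, 1])
```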

@tgaddair merged commit 25f31db into master on Mar 14, 2023
@tgaddair deleted the forecasting branch on March 14, 2023 17:15
@MihailMiller

I'm trying to use the example provided in examples/forecasting which uses 'timeseries' as the output feature type. However, I encounter a ConfigValidationError indicating that 'timeseries' is not a supported output feature type. Here is the error message:

ludwig.error.ConfigValidationError: Output feature Seattle_next uses an invalid/unsupported output type 'timeseries'. Supported output features: ['binary', 'category', 'number', 'set', 'vector', 'sequence', 'text'].

I cloned the Ludwig repository and installed it in a Python 3.8 environment.
I navigated to examples/forecasting and ran the example using the provided command: "ludwig train --config config.yaml --dataset temperature.csv"

As this is one of the provided examples, I expect it to run without any configuration errors. The 'timeseries' output feature should be supported, according to the example.

Ludwig version: v0.7.4
Python version: 3.8

I'd appreciate guidance on how to resolve this issue. Could you confirm whether 'timeseries' is a supported output feature in the current Ludwig version? If it is, what am I doing wrong?

Thank you for your assistance.

@MihailMiller commented May 25, 2023

For anyone else experiencing the same issue in the future, the solution was to clone and install the current repository state (v0.8dev) instead of installing the current release.

@tgaddair (Collaborator Author)

Glad you got it working @MihailMiller! v0.8 should be released in the next couple of weeks with this feature officially supported.
