[FEATURE REQUEST] Can Merlion handle multi-series datasets? #52

Open
tszumowski opened this issue Dec 22, 2021 · 5 comments

@tszumowski

This is more of a question, but I didn't see a tag for it. Does Merlion support modeling multi-series datasets? I understand from the README that it supports multivariate models; I was curious whether it also supports multi-series. For example, consider this OJ Sales Dataset, which contains weekly sales of orange juice over 121 weeks. There are 3,991 stores and three brands of orange juice per store, so 11,973 models can be trained.
I understand one can train independent models for each of the stores. However, I was interested in knowing if Merlion can take in data from multiple stores to learn correlations between them.

Another example of a multi-series dataset can be found in this article.

@tszumowski
Author

I saw in the Merlion paper on page 14 it mentions the Int_MF dataset which has:

  • 21 time series
  • 22 variables

But the dataset is marked internal.

That sounds like an example of multi-series I'm interested in. Was Merlion run on that? If so, how was it configured?

@aadyotb
Contributor

aadyotb commented Jan 10, 2022

Hi, thanks for your question @tszumowski. Merlion is already capable of supporting multivariate time series datasets for forecasting and anomaly detection. For forecasting, I suggest you check out ts_datasets.forecast.SeattleTrail. For anomaly detection, I suggest you check out ts_datasets.anomaly.MSL.

aadyotb closed this as completed Jan 10, 2022
@tszumowski
Author

@aadyotb thank you for the reply. However, what you referenced is multivariate, not multi-series. Some also call it multi-instance or multi-segment. Multi-series means the scenario contains multiple time series. You wish to forecast every series, but a single underlying model may apply to all of them. See my referenced scenario, where one attempts to predict sales for many stores. In that case, there may be insufficient information to forecast sales for a single store in a silo, but by aggregating across all stores (time series) the model can learn to forecast from features applicable to all stores.
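
To make the distinction concrete, here is a minimal sketch of pooled (multi-series) training in plain Python. It does not use Merlion; the store names, the sales values, and the helpers `lagged_pairs`, `fit_linear`, and `forecast_next` are all hypothetical and exist only for illustration. One linear one-step-ahead model is fit on lagged pairs pooled from every store, so a store with too little history to fit its own model can still be forecast.

```python
# Hypothetical toy data: weekly sales per store (values invented for illustration).
series_by_store = {
    "store_a": [10.0, 12.0, 14.0, 16.0, 18.0],
    "store_b": [20.0, 22.0, 24.0],
    "store_c": [5.0, 7.0],  # too short to fit a model on its own
}


def lagged_pairs(values):
    """Return (previous, next) pairs for one-step-ahead forecasting."""
    return list(zip(values[:-1], values[1:]))


def fit_linear(pairs):
    """Ordinary least squares for next = slope * previous + intercept."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    if var_x == 0:
        # A single pair (or constant input) cannot identify a slope.
        return 0.0, mean_y
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = cov_xy / var_x
    return slope, mean_y - slope * mean_x


# Multi-series training: pool the lagged pairs from every store into one fit.
pooled_pairs = [
    pair for values in series_by_store.values() for pair in lagged_pairs(values)
]
slope, intercept = fit_linear(pooled_pairs)


def forecast_next(values):
    """Apply the shared model to the last observation of one series."""
    return slope * values[-1] + intercept


pooled_forecasts = {
    store: forecast_next(values) for store, values in series_by_store.items()
}

# Per-series baseline for the shortest store: one pair, so no slope can be learned.
solo_slope, solo_intercept = fit_linear(lagged_pairs(series_by_store["store_c"]))
```

In this toy data every store grows by the same amount each week, so the pooled fit recovers that shared trend and applies it to `store_c`, whereas the per-series fit for `store_c` falls back to a constant.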

@aadyotb
Contributor

aadyotb commented Jan 11, 2022

Ah, thanks for the clarification. In the paper, we actually train a separate model for each time series. In this case, the data loader is iterable, as in "for time_series, metadata in loader: ...". We currently don't support multi-series data as you describe it, though we may look into it in the future. Re-opening this issue due to the earlier misunderstanding.
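
The per-series setup described above can be sketched roughly as follows. This is a self-contained illustration, not Merlion code: `ToyLoader`, `MeanForecaster`, and the `name` and `train_size` metadata keys are hypothetical stand-ins, and the real loaders in `ts_datasets` return richer time series and metadata objects. Only the iteration pattern (one `(time_series, metadata)` pair per series, one model per series) is taken from the comment.

```python
from statistics import mean


class ToyLoader:
    """Hypothetical stand-in for a loader that yields (time_series, metadata)."""

    def __init__(self, items):
        self._items = items

    def __iter__(self):
        return iter(self._items)


class MeanForecaster:
    """Hypothetical baseline model: forecasts the mean of its training data."""

    def train(self, values):
        self.level = mean(values)

    def forecast(self, horizon):
        return [self.level] * horizon


# Invented data; metadata here is a plain dict, which is an assumption.
loader = ToyLoader([
    ([1.0, 2.0, 3.0, 4.0], {"name": "series_0", "train_size": 3}),
    ([10.0, 10.0, 12.0, 14.0], {"name": "series_1", "train_size": 2}),
])

models = {}
errors = {}
for time_series, metadata in loader:
    # Split each series independently using its own metadata.
    split = metadata["train_size"]
    train, test = time_series[:split], time_series[split:]

    # Train a separate model per series; nothing is shared across series.
    model = MeanForecaster()
    model.train(train)
    predictions = model.forecast(len(test))

    models[metadata["name"]] = model
    # Mean absolute error on the held-out tail of this series.
    errors[metadata["name"]] = mean(abs(p - t) for p, t in zip(predictions, test))
```

Each model only ever sees its own series, which is the limitation the multi-series request is about.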

@aadyotb
Contributor

aadyotb commented Feb 10, 2022

@tszumowski, @isenilov has added an initial version of multi-series training for the DAGMM model in #65. Does this roughly match your expectation? As we consider adding a more general version of this feature to our roadmap, your feedback would be welcome. Thanks!
