
Add incremental training option to DAGMM model #65

Merged: 9 commits, Feb 4, 2022

Conversation

@isenilov (Contributor) commented Jan 27, 2022

Currently, the DAGMM model object is created from scratch every time the train method is called:

self.dagmm = self._build_model(X.shape[1]).to(self.device)

This makes it impossible to:

  • update the model by training it on a new dataset
  • train the model on several time series with the same timestamps and column names

The proposed change adds an option to perform incremental training by passing a corresponding dictionary to the train method, which disables model re-creation. The change does not affect the existing API, so the behavior stays the same if no train_config is passed.
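For illustration, here is a minimal sketch of the kind of guard this change introduces; the "incremental" key name and the surrounding method body are assumptions for the example, not the exact PR code:

def _train(self, X, train_config=None):
    train_config = {} if train_config is None else train_config
    # only rebuild the network from scratch when we are not continuing
    # from an already trained model
    if getattr(self, "dagmm", None) is None or not train_config.get("incremental", False):
        self.dagmm = self._build_model(X.shape[1]).to(self.device)
    # ... continue with the usual optimization loop on X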

The same change can also be applied to some of the other models.

@aadyotb, I would love to hear your opinion.

@aadyotb (Contributor) commented Jan 28, 2022

Thanks for the contribution. I'm hesitant to merge this as is because it's a pretty ad-hoc way of handling multi-series settings (a feature requested by #52). The main downsides are:

  1. It requires incremental calls to train() and doesn't provide scope for a single model to train on multiple time series in a single go. I can envision settings where batch training across time series may be called for.
  2. We would need to add a lot of boilerplate to individual models in order to support the current implementation.

If this is a direction you're interested in pursuing, I think a better approach would be to design a base class / mixin which implements common features of multi-series training (incremental training would be a good default option). Then, any model which supports such behavior can simply inherit from this base class. If a model has specific batch training which handles all time series together, it can override the default incremental multi-series training. I'm open to further discussion on this topic though.

As an aside: the docs job is currently failing because the docstring for train_config is not formatted as proper ReST. You'd probably want to create a bulleted list for this, with the syntax described here.
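For reference, a ReST bulleted list inside a docstring looks something like the following; the parameter descriptions here are illustrative, not the final wording:

def train(self, train_data, train_config=None):
    """
    Train the model.

    :param train_config: optional dict of additional training options, for example:

        - ``incremental``: if ``True``, keep the existing model and continue training on the new data
        - ``n_epochs``: number of passes over the training data
    """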

@isenilov (Contributor, Author) commented Jan 29, 2022

Thank you for the detailed comment!

> it's a pretty ad-hoc way of handling multi-series

Agree :) That is why it is still a draft created for further discussion.

What you propose definitely makes sense. I guess changing DetectorBase and adding a new method like train_incremental is not optimal, as it would require changing all the models that inherit from it.

Even though I am not a fan of multiple inheritance, this seems like a good use case for it.
So, getting into the details, I think you are proposing something like the following model to support incremental training:

# base.py
from abc import abstractmethod
from typing import Iterable, Union

class IncrementalTrainingMixin:
    @abstractmethod
    def train(self, train_data: Union[TimeSeries, Iterable[TimeSeries]], **kwargs):
        # the rest of the arguments are the same as in `DetectorBase`
        pass

# dagmm.py
class DAGMM(IncrementalTrainingMixin, DetectorBase):
    # the MRO means the `train` method from `IncrementalTrainingMixin`
    # overrides the one from `DetectorBase`
    def train(self, train_data: Union[TimeSeries, Iterable[TimeSeries]], **kwargs):
        if isinstance(train_data, TimeSeries):
            # run the existing training method on a single series
            self._train(X=train_data)
        else:
            # `isinstance` cannot check a subscripted generic like Iterable[TimeSeries],
            # so treat any non-TimeSeries input as an iterable of series and
            # incrementally train the model as proposed in the PR
            for data in train_data:
                self._train(X=data, incremental=True)

What do you think?
If this is an acceptable direction, two further questions:

  1. Should we override the train method or just add something like train_multiple to IncrementalTrainingMixin?
  2. Data shuffling has to be handled outside the train method in this implementation. Maybe we should handle it inside, with a shuffle: bool argument? It will be tricky to handle if an iterator/generator is passed, though (see the short sketch below).
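A quick, purely generic Python illustration of why a generator is awkward here (the series are placeholder strings, not real TimeSeries objects):

import random

# purely illustrative: pretend each string is a TimeSeries
train_data_iterable = (ts for ts in ["series_a", "series_b", "series_c"])

# shuffling needs a materialized sequence, so a lazy iterator/generator has to be
# drained first, which defeats the purpose of lazy loading
series_list = list(train_data_iterable)
random.shuffle(series_list)
print(series_list)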

@aadyotb (Contributor) commented Jan 29, 2022

This direction is more or less what I had in mind. I think it makes the most sense to add a new train_multiple method, to avoid adding a lot of boilerplate code such as the if/else blocks. Additionally, it's a clear way to signal which models do or don't support the feature.

For data shuffling, I think this can be quite model-specific. But some common behaviors (across both anomaly detection and forecasting) include:

  • training on one time series at a time and possibly shuffling the order
  • training for multiple epochs (i.e. multiple passes over the training data)
  • creating a dataset of (lookback, lookahead) slices from each time series and shuffling them (the file models/forecast/seq_ar_common.py contains helper code for this and is used for tree models; however, we may wish to revisit this behavior in a second PR and abstract it out a little more; a rough sketch of the idea follows this list)
  • any others you can think of?
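To make the slicing bullet concrete, here is a generic sketch of turning one series into (lookback, lookahead) training pairs; it illustrates the idea only and is not the seq_ar_common.py implementation:

import numpy as np

def make_slices(values, lookback, lookahead):
    # turn one series into (lookback, lookahead) training pairs
    X, y = [], []
    for i in range(len(values) - lookback - lookahead + 1):
        X.append(values[i:i + lookback])
        y.append(values[i + lookback:i + lookback + lookahead])
    return np.array(X), np.array(y)

# a 10-point series with lookback=3 and lookahead=2 yields 6 pairs;
# pairs from multiple series can then be pooled and shuffled together
X, y = make_slices(np.arange(10), lookback=3, lookahead=2)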

Of course, some of these may be complicated if we are using a lazy iterator of time series. So maybe we could start by assuming we have a List[TimeSeries] and then introduce a custom lazy loader at a later time, if the need arises.

I suspect that the cleanest implementation will be to include params shuffle and n_epochs in a train config dict.

What do you think?

@isenilov (Contributor, Author) commented Jan 31, 2022

Thanks @aadyotb for the comment! This makes sense, let me draft something tomorrow.

@isenilov (Contributor, Author) commented Feb 1, 2022

I added an initial (draft) implementation of multiple-series training.
It has several limitations/TODOs/points for discussion:

  • It is implemented only for the DAGMM model, via a chain of abstract mixins, MultipleTimeseriesDetectorMixin and MultipleTimeseriesModelMixin. Please advise whether this conforms to the Merlion architecture or whether part of the logic can be shared further up the inheritance chain.
  • The model is no longer re-created in the train method if it already exists.
  • Very basic shuffling implementation.
  • post_rule_train_config is just propagated to the train method as is and runs the same number of times. Not sure this is expected behavior.
  • n_epochs and shuffle have default values of 1 and False in the train_multiple method; there might be a better place to define them, and better default values (see the sketch after this list).
  • No unit tests for the method yet (TODO).
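For context, a minimal sketch of what such a train_multiple could look like, pieced together from the snippets quoted in the review below; the exact loop structure and label handling are assumptions for illustration, not the PR's code:

import random
from typing import List

class MultipleTimeseriesDetectorMixin:
    # meant to be mixed into a detector that already defines `train`
    def train_multiple(self, multiple_train_data: List["TimeSeries"], anomaly_labels=None,
                       train_config=None, post_rule_train_config=None):
        train_config = {} if train_config is None else train_config
        if anomaly_labels is None:
            anomaly_labels = [None] * len(multiple_train_data)
        n_epochs = train_config.pop("n_epochs", 1)
        shuffle = train_config.pop("shuffle", False)
        train_scores_list = []
        for _ in range(n_epochs):
            if shuffle:
                # very basic shuffling: reorder the series (and their labels) each epoch
                order = list(range(len(multiple_train_data)))
                random.shuffle(order)
                multiple_train_data = [multiple_train_data[i] for i in order]
                anomaly_labels = [anomaly_labels[i] for i in order]
            for data, labels in zip(multiple_train_data, anomaly_labels):
                train_scores_list.append(
                    self.train(data, anomaly_labels=labels, train_config=train_config,
                               post_rule_train_config=post_rule_train_config)
                )
        return train_scores_list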

PS: not sure why the docs build fails...

@aadyotb (Contributor) commented Feb 2, 2022

> PS: not sure why the docs build fails...

You should run `pip install -r docs/requirements.txt` and `sphinx-build -b html docs/source docs/build/html` from the root directory of Merlion. The docs CI is configured to fail if there are any warnings, so you should figure out where the warning is coming from and fix it accordingly.

@aadyotb (Contributor) left a comment

Thanks for the contribution! This looks mostly good to me, barring a couple of small issues I noted inline. Happy to approve once these are fixed. Also, see my comment above on how to diagnose why the docs build is failing.

On the lines:

else:
    anomaly_labels = [None] * len(train_data)
n_epochs = train_config.pop("n_epochs", 1)
shuffle = train_config.pop("shuffle", False)
@aadyotb (Contributor):

Maybe you can make this shuffle = train_config.pop("shuffle", n_epochs > 1)?

@isenilov (Contributor, Author):

Makes sense. I will also note in the docstring that shuffling is turned on by default when n_epochs > 1.

On the lines:

train_config=train_config, post_rule_train_config=post_rule_train_config
)
)
return train_scores_list
@aadyotb (Contributor):

There is a potential issue here, where the post-rule (calibrator and threshold) is trained individually on each time series. I think this is fine for the time being, but can you add a #FIXME comment here saying that the post-rule needs to be re-trained on the train_scores from all the models?

@isenilov (Contributor, Author) commented Feb 3, 2022:

Yes, I was not sure how this incremental training would affect the post_rule...

> on the train_scores from all the models

Do you mean from all the epochs?

@aadyotb (Contributor):

Yes, from all the epochs.

@isenilov (Contributor, Author) commented Feb 3, 2022

Thank you for the feedback! I will address the remarks and add a unit test for the method as well to make it complete.
UPD: in case this gets merged, when do you think the next version with these changes will be released?

@isenilov marked this pull request as ready for review on February 3, 2022 at 13:02.
On the lines (old signature vs. new signature):

- self, train_data: List[TimeSeries], anomaly_labels: List[TimeSeries] = None,
-     train_config=None, post_rule_train_config=None
+ self, multiple_train_data: List[TimeSeries], anomaly_labels: List[TimeSeries] = None,
+     train_config=dict(), post_rule_train_config=None
@aadyotb (Contributor):

Can you change the default value of train_config back to None? See here for why train_config=dict() can be a problem. The preferred pattern would be train_config = {} if train_config is None else train_config. Looks good to me once this is addressed.
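For anyone unfamiliar with the pitfall being referenced, a generic Python illustration (not Merlion code) of why a mutable default argument is risky:

def bad(train_config={}):
    train_config["seen"] = True      # mutates the single dict shared across all calls
    return train_config

def good(train_config=None):
    train_config = {} if train_config is None else train_config
    train_config["seen"] = True      # mutates a fresh dict (or the caller's own)
    return train_config

assert bad() is bad()                # same dict object on every call
assert good() is not good()          # a new dict each time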

@isenilov (Contributor, Author):

😅 Spending time with other languages makes me forget some of the Python pitfalls.
I agree that it is safer to avoid this altogether, even if no mutation actually happens.
I changed the default back to None and added the extra if.

@aadyotb (Contributor) commented Feb 3, 2022

> UPD: in case this gets merged, when do you think the next version with these changes will be released?

I can push out v1.1.2 once this is merged, along with one other PR in the works by me.

@aadyotb (Contributor) commented Mar 3, 2022

@isenilov v1.1.2 is now out with this feature.
