Preprocessing non-contiguous segments #171

sarahmish · 2021-02-02T20:57:15Z

Currently, most pipelines share the same preprocessing primitives, applied in the following order:

1. mlprimitives.custom.timeseries_preprocessing.time_segments_aggregate: makes the signal equi-spaced based on the specified interval.
2. sklearn.impute.SimpleImputer: imputes missing values.
3. sklearn.preprocessing.MinMaxScaler: normalizes the data to a specified range.
4. mlprimitives.custom.timeseries_preprocessing.rolling_window_sequences: creates multiple training window examples based on the window_size and step_size.

However, we do not always want to make the signal equi-spaced; sometimes we would rather retain the gaps within the signal. For this task, two main steps are needed:

1. Normalize the data first to maintain the specified range.
2. Create segments based on the suggested max_gap, then apply primitives 1, 2 and 4 above to each segment, and concatenate the results.

The sequence of preprocessing primitives would then be:

"sklearn.preprocessing.MinMaxScaler",
"orion.primitives.timeseries_preprocessing.segment", # suggested
"mlprimitives.custom.timeseries_preprocessing.time_segments_aggregate",
"sklearn.impute.SimpleImputer",
"mlprimitives.custom.timeseries_preprocessing.rolling_window_sequences",
"orion.primitives.timeseries_preprocessing.concatenate" # suggested
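
To make the suggested segment step more concrete, here is a minimal sketch of how splitting on max_gap could look. The function name `segment_by_max_gap`, the `time_column` parameter, and the `timestamp`/`value` column names are assumptions for illustration; this is not an existing Orion primitive.

```python
import pandas as pd


def segment_by_max_gap(df, max_gap, time_column="timestamp"):
    """Split df into contiguous segments wherever the time gap exceeds max_gap.

    Hypothetical helper for illustration, not an existing Orion primitive.
    """
    df = df.sort_values(time_column).reset_index(drop=True)
    # A new segment starts at every row that is more than max_gap after the previous row.
    new_segment = df[time_column].diff() > max_gap
    # The cumulative sum turns the boolean markers into a segment id per row.
    segment_id = new_segment.cumsum()
    return [part.reset_index(drop=True) for _, part in df.groupby(segment_id)]


data = pd.DataFrame({
    "timestamp": [0, 10, 20, 100, 110, 300],
    "value": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
})
segments = segment_by_max_gap(data, max_gap=10)
for segment in segments:
    print(segment["timestamp"].tolist())
```

Each returned segment could then go through the aggregate, impute and rolling window steps on its own before the results are concatenated.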


kb1ooo · 2021-10-26T14:14:47Z

I don't see any activity here, but I'm wondering if this may have been addressed since Feb?

sarahmish · 2021-10-29T19:17:32Z

Hi @kb1ooo! It's still in the works.

kb1ooo · 2021-10-29T19:32:59Z

@sarahmish thanks. Is there some work on it checked into a branch?

sarahmish · 2021-10-30T18:40:10Z

There isn't an active branch for this. The primary change for this feature is in the rolling_window_sequences primitive, which currently slices based on indexes. To make this change, we need to introduce slicing by timestamps and use a max_gap parameter to indicate the maximum gap allowed between one element and the next.
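
As a rough illustration of that idea, the sketch below builds rolling windows from index positions but drops any window that spans a timestamp gap larger than max_gap. The function `gap_aware_windows` and its signature are assumptions made for this example, not the actual mlprimitives implementation, and it only builds input windows (no target values).

```python
import numpy as np


def gap_aware_windows(X, index, window_size, step_size, max_gap):
    """Build rolling windows, skipping any window that spans a gap > max_gap.

    Hypothetical sketch; not the signature of the mlprimitives primitive.
    """
    X = np.asarray(X)
    index = np.asarray(index)
    # too_wide[i] is True when the time between element i and i + 1 exceeds max_gap.
    too_wide = np.diff(index) > max_gap
    windows, starts = [], []
    for start in range(0, len(X) - window_size + 1, step_size):
        end = start + window_size
        # A window over elements start..end-1 contains the gaps start..end-2.
        if not too_wide[start:end - 1].any():
            windows.append(X[start:end])
            starts.append(index[start])
    if not windows:
        return np.empty((0, window_size) + X.shape[1:]), np.empty(0, dtype=index.dtype)
    return np.stack(windows), np.asarray(starts)


timestamps = np.array([0, 10, 20, 30, 100, 110, 120])
values = np.arange(7, dtype=float)
windows, starts = gap_aware_windows(
    values, timestamps, window_size=3, step_size=1, max_gap=10
)
print(starts)   # start timestamp of every window that was kept
print(windows)
```

In this example the jump from 30 to 100 is wider than max_gap, so the two windows that would straddle it are discarded and the rest are kept.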

kb1ooo · 2021-11-01T22:11:38Z

@sarahmish ok, right. Is there a simpler intermediate version where the data is pre-segmented (i.e. the segmentation logic is not delegated to orion but is the responsibility of the caller), and you would pass the data as, say, a list of dataframes instead of one dataframe? Then just iterate through the list, applying the same pipeline, and concatenate the rolling_window_sequences outputs.
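
A minimal sketch of that intermediate version, assuming the caller already holds a list of dataframes with a `value` column: windows are built inside each segment, so none of them crosses a gap, and are then stacked into one array. The helper `windows_from_segments` and the column names are hypothetical and only stand in for the real pipeline steps.

```python
import numpy as np
import pandas as pd


def windows_from_segments(segments, window_size, step_size=1, value_column="value"):
    """Build rolling windows inside each pre-split segment, then stack them.

    Hypothetical helper: `segments` is a caller-provided list of DataFrames.
    """
    collected = []
    for segment in segments:
        values = segment[value_column].to_numpy()
        # Segments shorter than window_size contribute no windows.
        for start in range(0, len(values) - window_size + 1, step_size):
            collected.append(values[start:start + window_size])
    if not collected:
        return np.empty((0, window_size))
    return np.stack(collected)


segments = [
    pd.DataFrame({"timestamp": [0, 10, 20, 30], "value": [1.0, 2.0, 3.0, 4.0]}),
    pd.DataFrame({"timestamp": [100, 110], "value": [5.0, 6.0]}),
    pd.DataFrame({"timestamp": [300, 310, 320], "value": [7.0, 8.0, 9.0]}),
]
X = windows_from_segments(segments, window_size=3)
print(X.shape)
```

Stacking all windows into a single array before training means the model would see every segment within the same fit call, rather than one segment per call.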

sarahmish · 2021-11-08T00:19:14Z

@kb1ooo that's definitely possible. Mechanically, you can just iterate over the dataframes, calling orion.fit on each one, as a simple workaround. My only concern is that you will be training the ML model on epochs with different batches each time. I don't know how that will affect the learning of the underlying model.

sarahmish added the "new feature" label Feb 2, 2021

sarahmish mentioned this issue Feb 2, 2021

Is there pre-processing of data required? #162

Closed

sarahmish mentioned this issue Mar 21, 2022

Training on multiple different length samples for Anomaly detection #279

Closed
