
[ENH] time series classification: shorthands for common sktime pipelines including TsfreshClassifier #1063

Closed
3 of 5 tasks
fkiraly opened this issue Jun 27, 2021 · 14 comments · Fixed by #1721
Assignees
Labels
feature request New feature or request good first issue Good for newcomers implementing algorithms Implementing algorithms, estimators, objects native to sktime interfacing algorithms Interfacing existing algorithms/estimators from third party packages module:classification classification module: time series classification

Comments

@fkiraly
Collaborator

fkiraly commented Jun 27, 2021

One of our most requested/used time series classifiers is the pipeline of tsfresh feature extraction and then an sklearn classifier.

  • There should be a shorthand TsfreshClassifier for that.

We should create a module with simple "feature extraction based" TSC strategies like this, e.g., feature_extr_based.

Other shorthands that would be great to have:

  • TabularizerClassifier, using the tabularizer; a bonus feature would be support for its potential "good first issue" extension still to be created, the binner: [ENH] Implement TimeBinner transformer, regular binning with aggregation for irregular time series #242
  • MatrixProfileClassifier using the matrix profile
  • SignatureClassifier - using signature - this already exists in @jambo6's work, so just needs to be moved into the same "TSC type" sub-folder
  • SevenNumberClassifier - using the seven-number-summary of the series (quartiles, mean, variance). A nice feature would be the ability to specify which of these to use, or more "sample summaries" like kurtosis or other percentiles; this should not use any sequential features, only features where the order does not matter.

All of these will have parameters: estimator - a sklearn classifier - plus any parameters that come from the feature extractor, without the additional nesting level that you would get from Pipeline.
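As a sketch of the nesting point (names here are illustrative, not sktime's actual API): composing an explicit sklearn Pipeline forces step-prefixed parameter names such as `clf__n_estimators`, which a dedicated shorthand class would expose directly as top-level constructor arguments.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer


def summary_features(X):
    # Order-invariant per-series features: mean and variance over time.
    return np.column_stack([X.mean(axis=1), X.var(axis=1)])


# Explicit composition: the classifier's parameters are only reachable
# via nested names like "clf__n_estimators" - the extra nesting level
# a shorthand class would flatten away.
pipe = Pipeline([
    ("features", FunctionTransformer(summary_features)),
    ("clf", RandomForestClassifier(n_estimators=500)),
])
pipe.set_params(clf__n_estimators=100)
```

A shorthand like the proposed TsfreshClassifier would instead take `estimator` and the extractor's parameters side by side in its own `__init__`.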

@fkiraly fkiraly added feature request New feature or request good first issue Good for newcomers implementing algorithms Implementing algorithms, estimators, objects native to sktime interfacing algorithms Interfacing existing algorithms/estimators from third party packages module:classification classification module: time series classification labels Jun 27, 2021
@fkiraly fkiraly changed the title time series classification: shorthands for common pipelines time series classification: shorthands for common pipelines including TsfreshClassifier Jun 27, 2021
@fkiraly fkiraly changed the title time series classification: shorthands for common pipelines including TsfreshClassifier time series classification: shorthands for common sktime pipelines including TsfreshClassifier Jun 27, 2021
@TonyBagnall
Contributor

Sure, although this is a little completist. The reason for having RocketClassifier etc. is that there are published results with a specific set-up which we want to reproduce. The tsfresh classifier is not at all competitive according to Markus's experiments. MPClassifier is kind of OK (according to my own paper!), and I look forward to testing out the SignatureClassifier in the near future. We call the SevenNumberClassifier the summary stats classifier, but that is not informative enough. If there is not a published version to copy, what would we default the classifier to? Random forest with 500 trees, I guess, although CAWPE would be good from our perspective.

@TonyBagnall
Contributor

oh, and a version of the SevenNumberClassifier is the first default classifier I ever tried on TSC in 2012 :)

@fkiraly
Collaborator Author

fkiraly commented Jun 27, 2021

If there is not a published version to copy, what would we default the classifier to? Random Forest with 500 trees I guess, although CAWPE would be good from our perspective.

Yes, I'd either use sklearn RandomForestClassifier with default settings as the estimator default, or set no default, so the user has to choose a classifier.

TSfresh classifier is not at all competitive according to Markus's experiments.

I'd invoke here "relevance justifies inclusion" rather than "performance justifies inclusion".
Some of these classifiers are highly relevant as baselines in your and other historic benchmarks - therefore very relevant in the context of those studies. Sometimes these even give great out-of-the-box performance, and would be useful if a "quick" and "simple" solution is required.
The tsfresh one is one of the most requested ones - for instance, the most frequent search term on this GitHub page seems to be tsfresh.

@fkiraly
Collaborator Author

fkiraly commented Jun 27, 2021

PS: "according to Markus' experiments" (that are nowhere published?) is not a scientific reference.

@fkiraly
Collaborator Author

fkiraly commented Jun 27, 2021

pinging @jambo6 since I mis-spelt his name above.

@TonyBagnall
Contributor

This is not a scientific forum, I was talking in anecdote. I'll take any bets you want on tsfresh+random forest against anything close to sota. I'll run the experiment next week now you have said that :) But yes, all this is fine by me; I have, after all, championed static classifiers over pure composition, and having both routes is best.

@TonyBagnall
Contributor

one question is where to put them. I try to group by the core nature of the transformation involved to provide a basic, inevitably flawed, taxonomy

@fkiraly
Collaborator Author

fkiraly commented Jun 27, 2021

I'll take any bets you want on tsfresh+random forest against anything close to sota.

I believe you - tsfresh creates a lot of garbage features... but we still should have it on offer!

I have, after all, championed static classifiers over pure composition, and having both routes is best

indeed, so let's be consistent! 😃

one question is where to put them. I try to group by the core nature of the transformation involved to provide a basic, inevitably flawed, taxonomy

I'd suggest a folder feature_extraction_based, that would be in line with your taxonomy?
It would be everything that follows the "simplistic" formula [ts feature extractor]->[classifier].
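A minimal sketch of that formula, assuming a hypothetical order-invariant summary extractor and sklearn conventions (the class name and parameters are illustrative, not sktime's actual implementation):

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.ensemble import RandomForestClassifier


class SummaryClassifier(BaseEstimator, ClassifierMixin):
    """Hypothetical shorthand for [ts feature extractor]->[classifier].

    Extractor parameters (here, `quantiles`) sit at the top level
    alongside `estimator`, with no Pipeline-style nesting.
    """

    def __init__(self, estimator=None, quantiles=(0.25, 0.5, 0.75)):
        self.estimator = estimator
        self.quantiles = quantiles

    def _extract(self, X):
        # X: (n_instances, n_timepoints) -> order-invariant summary features
        feats = [X.mean(axis=1), X.var(axis=1)]
        feats += [np.quantile(X, q, axis=1) for q in self.quantiles]
        return np.column_stack(feats)

    def fit(self, X, y):
        # Default discussed above: random forest with 500 trees.
        self.estimator_ = (
            self.estimator
            if self.estimator is not None
            else RandomForestClassifier(n_estimators=500)
        )
        self.estimator_.fit(self._extract(X), y)
        return self

    def predict(self, X):
        return self.estimator_.predict(self._extract(X))
```

Any classifier in the folder would follow this two-step shape, differing only in the extractor.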

@TonyBagnall
Contributor

I'm not sure about that, since they are all based on some form of feature extraction except for distance based, although distances as features is itself a valid and published approach. I'll have a think about this.

@fkiraly
Collaborator Author

fkiraly commented Jun 27, 2021

I'm not sure about that, since they are all based on some form of feature extraction except for distance based, although distances as features is itself a valid and published approach. I'll have a think about this.

Yes, but not all TSC are of the simple form above; they typically also do something else.
What I specifically mean: everything that goes into the simple pipeline structure [transformer][classifier] goes in that class, and nothing else. That would exclude distance-based approaches, and the more complex approaches in the other folders.

@MatthewMiddlehurst
Contributor

Going to assign myself to this if no one else wants to take it, seems like a good series of small tasks. Can put the Catch22Classifier with them.

@MatthewMiddlehurst MatthewMiddlehurst self-assigned this Jul 14, 2021
@TonyBagnall
Contributor

What I specifically mean, everything that goes into the simple pipeline structure [transformer][classifier] goes in that class, and nothing else. That would exclude distance based approaches, and the more complex approaches in the other folders.

Yeah, OK, that makes sense. We could have methods to create "standard" configurations rather than a class for each. So thinking:

  • Signature
  • Catch22
  • TSFresh

but maybe also include:

  • STC (although contracting might complicate things)
  • MrSEQL

@TonyBagnall
Contributor

@MatthewMiddlehurst did you complete this? We should write it up as an arxiv/workshop paper

@TonyBagnall TonyBagnall self-assigned this Oct 16, 2021
@MatthewMiddlehurst
Contributor

@TonyBagnall Waiting on #1329, really. After the summary classifier is done and the package is refactored, we can close this, I think.

@TonyBagnall TonyBagnall changed the title time series classification: shorthands for common sktime pipelines including TsfreshClassifier [ENH] time series classification: shorthands for common sktime pipelines including TsfreshClassifier Oct 20, 2021