
[ENH] time series classification: shorthands for common sktime pipelines including TsfreshClassifier #1063

Closed
3 of 5 tasks
fkiraly opened this issue Jun 27, 2021 · 14 comments · Fixed by #1721
Assignees
Labels
feature request New feature or request good first issue Good for newcomers implementing algorithms Implementing algorithms, estimators, objects native to sktime interfacing algorithms Interfacing existing algorithms/estimators from third party packages module:classification classification module: time series classification

Comments

@fkiraly
Collaborator

fkiraly commented Jun 27, 2021

One of our most requested/used time series classifiers is the pipeline of tsfresh feature extraction and then an sklearn classifier.

  • There should be a shorthand TsfreshClassifier for that.

We should create a module with simple "feature extraction based" TSC strategies like this, e.g., feature_extr_based.

Other shorthands that would be great to have:

  • TabularizerClassifier, using the tabularizer; a bonus feature would be support for its potential "good first issue" extension still to be created, the binner: [ENH] Implement TimeBinner transformer, regular binning with aggregation for irregular time series #242
  • MatrixProfileClassifier using the matrix profile
  • SignatureClassifier - using signature - this already exists in @jambo6's work, so just needs to be moved into the same "TSC type" sub-folder
  • SevenNumberClassifier - using the seven-number-summary of the series (quartiles, mean, variance). A nice feature would be the ability to specify which of these to use, or more "sample summaries" like kurtosis or other percentiles; this should not use any sequential features, only features where the order does not matter.

All of these will have parameters: estimator - a sklearn classifier - plus any parameters that come from the feature extractor, without the additional nesting level that you would get from Pipeline.
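As a sketch of the nesting point (names here are illustrative, not sktime's actual API): composing an explicit sklearn Pipeline forces step-prefixed parameter names such as `clf__n_estimators`, which a dedicated shorthand class would expose directly as top-level constructor arguments.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer


def summary_features(X):
    # Order-invariant per-series features: mean and variance over time.
    return np.column_stack([X.mean(axis=1), X.var(axis=1)])


# Explicit composition: the classifier's parameters are only reachable
# via nested names like "clf__n_estimators" - the extra nesting level
# a shorthand class would flatten away.
pipe = Pipeline([
    ("features", FunctionTransformer(summary_features)),
    ("clf", RandomForestClassifier(n_estimators=500)),
])
pipe.set_params(clf__n_estimators=100)
```

A shorthand like the proposed TsfreshClassifier would instead take `estimator` and the extractor's parameters side by side in its own `__init__`.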

@fkiraly fkiraly added feature request New feature or request good first issue Good for newcomers implementing algorithms Implementing algorithms, estimators, objects native to sktime interfacing algorithms Interfacing existing algorithms/estimators from third party packages module:classification classification module: time series classification labels Jun 27, 2021
@fkiraly fkiraly changed the title time series classification: shorthands for common pipelines time series classification: shorthands for common pipelines including TsfreshClassifier Jun 27, 2021
@fkiraly fkiraly changed the title time series classification: shorthands for common pipelines including TsfreshClassifier time series classification: shorthands for common sktime pipelines including TsfreshClassifier Jun 27, 2021
@TonyBagnall
Contributor

Sure, although this is a little completist. The reason for having RocketClassifier etc. is that there are published results with a specific set-up which we want to reproduce. The tsfresh classifier is not at all competitive according to Markus's experiments. MPClassifier is kind of OK (according to my own paper!), and I look forward to testing out the SignatureClassifier in the near future. We call the SevenNumberClassifier the summary stats classifier, but that is not informative enough. If there is not a published version to copy, what would we default the classifier to? Random forest with 500 trees, I guess, although CAWPE would be good from our perspective.

@TonyBagnall
Contributor

oh, and a version of the SevenNumberClassifier is the first default classifier I ever tried on TSC in 2012 :)

@fkiraly
Collaborator Author

fkiraly commented Jun 27, 2021

If there is not a published version to copy, what would we default the classifier to? Random Forest with 500 trees I guess, although CAWPE would be good from our perspective.

Yes, I'd either use sklearn RandomForestClassifier with default settings as the estimator default, or set no default, so the user has to choose a classifier.

TSfresh classifier is not at all competitive according to Markus's experiments.

I'd invoke here "relevance justifies inclusion" rather than "performance justifies inclusion".
Some of these classifiers are highly relevant as baselines in your and other historic benchmarks - therefore very relevant in the context of those studies. Sometimes these even give great out-of-the-box performance, and would be useful if a "quick" and "simple" solution is required.
The tsfresh one is one of the most requested ones - for instance, the most frequent search term on this GitHub page seems to be tsfresh.

@fkiraly
Collaborator Author

fkiraly commented Jun 27, 2021

PS: "according to Markus' experiments" (that are nowhere published?) is not a scientific reference.

@fkiraly
Collaborator Author

fkiraly commented Jun 27, 2021

pinging @jambo6 since I mis-spelt his name above.

@TonyBagnall
Contributor

This is not a scientific forum, I was talking in anecdote. I'll take any bets you want on tsfresh+random forest against anything close to sota. I'll run the experiment next week now you have said that :) But yes, all this is fine by me; I have, after all, championed static classifiers over pure composition, and having both routes is best.

@TonyBagnall
Contributor

one question is where to put them. I try to group by the core nature of the transformation involved to provide a basic, inevitably flawed, taxonomy

@fkiraly
Collaborator Author

fkiraly commented Jun 27, 2021

I'll take any bets you want on tsfresh+random forest against anything close to sota.

I believe you - tsfresh creates a lot of garbage features... but we still should have it on offer!

I have, after all, championed static classifiers over pure composition, and having both routes is best

indeed, so let's be consistent! 😃

one question is where to put them. I try to group by the core nature of the transformation involved to provide a basic, inevitably flawed, taxonomy

I'd suggest a folder feature_extraction_based, that would be in line with your taxonomy?
It would be everything that follows the "simplistic" formula [ts feature extractor]->[classifier].
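A minimal sketch of that formula, assuming a hypothetical order-invariant summary extractor and sklearn conventions (the class name and parameters are illustrative, not sktime's actual implementation):

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.ensemble import RandomForestClassifier


class SummaryClassifier(BaseEstimator, ClassifierMixin):
    """Hypothetical shorthand for [ts feature extractor]->[classifier].

    Extractor parameters (here, `quantiles`) sit at the top level
    alongside `estimator`, with no Pipeline-style nesting.
    """

    def __init__(self, estimator=None, quantiles=(0.25, 0.5, 0.75)):
        self.estimator = estimator
        self.quantiles = quantiles

    def _extract(self, X):
        # X: (n_instances, n_timepoints) -> order-invariant summary features
        feats = [X.mean(axis=1), X.var(axis=1)]
        feats += [np.quantile(X, q, axis=1) for q in self.quantiles]
        return np.column_stack(feats)

    def fit(self, X, y):
        # Default discussed above: random forest with 500 trees.
        self.estimator_ = (
            self.estimator
            if self.estimator is not None
            else RandomForestClassifier(n_estimators=500)
        )
        self.estimator_.fit(self._extract(X), y)
        return self

    def predict(self, X):
        return self.estimator_.predict(self._extract(X))
```

Any classifier in the folder would follow this two-step shape, differing only in the extractor.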

@TonyBagnall
Contributor

I'm not sure about that, since they are all based on some form of feature extraction except for distance based, although distances as features is itself a valid and published approach. I'll have a think about this.

@fkiraly
Collaborator Author

fkiraly commented Jun 27, 2021

I'm not sure about that, since they are all based on some form of feature extraction except for distance based, although distances as features is itself a valid and published approach. I'll have a think about this.

Yes, but not all TSC are of the simple form above; they typically also do something else.
What I specifically mean: everything that goes into the simple pipeline structure [transformer][classifier] goes in that class, and nothing else. That would exclude distance-based approaches, and the more complex approaches in the other folders.

@MatthewMiddlehurst
Contributor

Going to assign myself to this if no one else wants to take it, seems like a good series of small tasks. Can put the Catch22Classifier with them.

@MatthewMiddlehurst MatthewMiddlehurst self-assigned this Jul 14, 2021
@TonyBagnall
Contributor

What I specifically mean, everything that goes into the simple pipeline structure [transformer][classifier] goes in that class, and nothing else. That would exclude distance based approaches, and the more complex approaches in the other folders.

Yeah, OK, that makes sense. We could have methods to create "standard" configurations rather than a class for each. So thinking:

  • Signature
  • Catch22
  • TSFresh

but maybe also include:

  • STC (although contracting might complicate things)
  • MrSEQL

@TonyBagnall
Contributor

@MatthewMiddlehurst did you complete this? We should write it up as an arxiv/workshop paper

@TonyBagnall TonyBagnall self-assigned this Oct 16, 2021
@MatthewMiddlehurst
Contributor

@TonyBagnall Waiting on #1329, really. After the summary classifier is done and the package is refactored, we can close this, I think.

@TonyBagnall TonyBagnall changed the title time series classification: shorthands for common sktime pipelines including TsfreshClassifier [ENH] time series classification: shorthands for common sktime pipelines including TsfreshClassifier Oct 20, 2021