Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] Catch22 transformer update and Catch22Wrapper for pycatch22 #3431

Merged
merged 33 commits into from Nov 14, 2022

Conversation

MatthewMiddlehurst
Copy link
Contributor

@MatthewMiddlehurst MatthewMiddlehurst commented Sep 15, 2022

Updates the current Catch22 transformer and adds a Catch22Wrapper for the pycatch22 package (#2254).

Adds a features argument to select which Catch22 features to transform, deprecates the old transform_single_feature method. Improves docs and adds switches to a DataFrame input to allow for unequal length series.

The wrapper is what we used prior to the current implementation, but it would yield different results depending on the OS used. The package has been updated since its use, so I imagine the outputs will be different currently.

@MatthewMiddlehurst MatthewMiddlehurst added module:transformations transformations module: time series transformation, feature extraction, pre-/post-processing enhancement Adding new functionality labels Sep 15, 2022
Copy link
Collaborator

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know it's still a draft, but some comments:

  • afaik Catch22 transforms a single series into features and does not look at other series in a panel. So it should be series-to-primitives. I would also avoid loop over instances internally, unless the computation becomes faster that way. Still, I would avoid using numpy3D internally, since that unnecessarily restricts the transformer to equal length series.
  • if you introduce a features parameter that can be str, the set of valid str should be listed.

@MatthewMiddlehurst
Copy link
Contributor Author

I prefer the panel for multithreading, I imagine the overhead from swapping between formats using the transformer base class is pretty miniscule. Good point with unequal length, I didn't consider that but it should be a relatively easy change to DataFrame.

Will add the features to the doc, the names are very long which is a bit of a pain.

@fkiraly
Copy link
Collaborator

fkiraly commented Oct 1, 2022

to prevent the failures, ensure:

  • the soft dependency is added to sktime
  • the soft dependency is registered/isolated in the estimator

see instructions here: https://www.sktime.org/en/stable/developer_guide/dependencies.html

@MatthewMiddlehurst MatthewMiddlehurst changed the title Catch22 Update [ENH] Catch22 transformer update and Catch22Wrapper for pycatch22 Oct 4, 2022
@MatthewMiddlehurst MatthewMiddlehurst marked this pull request as ready for review October 4, 2022 17:47
Copy link
Collaborator

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please ensure all the tests pass

fkiraly
fkiraly previously requested changes Oct 8, 2022
Copy link
Collaborator

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Is it correct that we are changing the signature and logic in the Catch22 transformer?

May I request some changes then:

  • I think it should be possible to allow int and list of int for features - did we not allow this in the past?
  • mean and variance (as in catch-24) should also be on the list
  • if we change the parameter signature of the old transformer, there needs to be deprecation
  • internally, I would use either numpy or pd-multiindex. pandas has deprecated functionality around handling of nested_univ so it is a risk to implement in that mtype.
  • if there is no efficient way to loop over instances, implement the transformer for a single series instead and rely on broadcasting functionality from the base class

@MatthewMiddlehurst
Copy link
Contributor Author

if we change the parameter signature of the old transformer, there needs to be deprecation

do you mean n_jobs?

@fkiraly
Copy link
Collaborator

fkiraly commented Oct 8, 2022

if we change the parameter signature of the old transformer, there needs to be deprecation

do you mean n_jobs?

any - I don't know what was changed since the old parameters were not well documented

@MatthewMiddlehurst
Copy link
Contributor Author

MatthewMiddlehurst commented Oct 8, 2022

the only parameter change is n_jobs -1 to 1. features is new (see __init__). I can deprecate -1 to use the old functionality for now?

I think it should be possible to allow int and list of int for features - did we not allow this in the past?

mean and variance (as in catch-24) should also be on the list

internally, I would use either numpy or pd-multiindex. pandas has deprecated functionality around handling of nested_univ so it is a risk to implement in that mtype.

Can look at these. Would have catch24 as a parameter rather than in the default features. Don't think I have seen/used pd-multiindex before.

@fkiraly
Copy link
Collaborator

fkiraly commented Oct 8, 2022

Can look at these. Would rather have catch24 as a parameter rather than in the default features. Dont think I have seen/used pd-multiindex before

Use datatypes.get_examples to see an example.

the only parameter change is n_jobs -1 to 1. features is new (see init). I can deprecate -1 to use the old functionality for now?

I think that should be deprecated, since it is a change of default.

@MatthewMiddlehurst
Copy link
Contributor Author

I have deprecated the n_jobs value and included the new parameter options. Changing the internal data type can wait for another PR I think, it is listed as a bonus in the issue anyway.

Matthew Middlehurst added 2 commits October 26, 2022 16:10
…into catch22

� Conflicts:
�	sktime/classification/tests/_classification_test_reproduction.py
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Adding new functionality module:transformations transformations module: time series transformation, feature extraction, pre-/post-processing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants