New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ENH] Catch22 transformer update and Catch22Wrapper for pycatch22 #3431
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know it's still a draft, but some comments:
- afaik Catch22 transforms a single series into features and does not look at other series in a panel. So it should be series-to-primitives. I would also avoid loop over instances internally, unless the computation becomes faster that way. Still, I would avoid using
numpy3D
internally, since that unnecessarily restricts the transformer to equal length series. - if you introduce a
features
parameter that can bestr
, the set of validstr
should be listed.
I prefer the panel for multithreading, I imagine the overhead from swapping between formats using the transformer base class is pretty miniscule. Good point with unequal length, I didn't consider that but it should be a relatively easy change to DataFrame. Will add the features to the doc, the names are very long which is a bit of a pain. |
to prevent the failures, ensure:
see instructions here: https://www.sktime.org/en/stable/developer_guide/dependencies.html |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please ensure all the tests pass
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Is it correct that we are changing the signature and logic in the Catch22
transformer?
May I request some changes then:
- I think it should be possible to allow
int
andlist
ofint
forfeatures
- did we not allow this in the past? - mean and variance (as in catch-24) should also be on the list
- if we change the parameter signature of the old transformer, there needs to be deprecation
- internally, I would use either numpy or
pd-multiindex
.pandas
has deprecated functionality around handling ofnested_univ
so it is a risk to implement in that mtype. - if there is no efficient way to loop over instances, implement the transformer for a single series instead and rely on broadcasting functionality from the base class
do you mean n_jobs? |
any - I don't know what was changed since the old parameters were not well documented |
the only parameter change is n_jobs -1 to 1. features is new (see __init__). I can deprecate -1 to use the old functionality for now?
Can look at these. Would have catch24 as a parameter rather than in the default features. Don't think I have seen/used pd-multiindex before. |
Use
I think that should be deprecated, since it is a change of default. |
I have deprecated the n_jobs value and included the new parameter options. Changing the internal data type can wait for another PR I think, it is listed as a bonus in the issue anyway. |
…into catch22 � Conflicts: � sktime/classification/tests/_classification_test_reproduction.py
Updates the current Catch22 transformer and adds a Catch22Wrapper for the pycatch22 package (#2254).
Adds a
features
argument to select which Catch22 features to transform, deprecates the oldtransform_single_feature
method. Improves docs and adds switches to a DataFrame input to allow for unequal length series.The wrapper is what we used prior to the current implementation, but it would yield different results depending on the OS used. The package has been updated since its use, so I imagine the outputs will be different currently.