Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] MrSEQL classifier revisited - with dependency isolation #4296

Closed
fkiraly opened this issue Mar 6, 2023 · 17 comments · Fixed by #5178
Closed

[ENH] MrSEQL classifier revisited - with dependency isolation #4296

fkiraly opened this issue Mar 6, 2023 · 17 comments · Fixed by #5178
Labels
enhancement Adding new functionality feature request New feature or request interfacing algorithms Interfacing existing algorithms/estimators from third party packages module:classification classification module: time series classification

Comments

@fkiraly
Copy link
Collaborator

fkiraly commented Mar 6, 2023

From the discord, there is a request to include the MrSEQL classifier again, by @Laprama.

For context, MrSEQL was originally removed due to its dependencies on cython which resulted in a significant increase in installation complexity and reduction of compatibility, of sktime, see here: #1974

However, now, the skbase/python_dependencies tag mechanism allows dependencies to be fully isolated.

So, a clean way to proceed by including MrSEQL back into sktime could be:

  • create a MrSEQL python package out of the existing code, e.g., https://github.com/lnthach/Mr-SEQL
  • make that package a soft dependency of sktime and integrate it using the extension template. Handle the MrSEQL package as an estimator specific dependency

FYI @heerme, @lnthach

For completeness, also a request from discord, I've included in this discussion post the piece of documentation I could find in 0.8.1 of the old MrSEQL classifier (no longer in sktime):
#4295

@fkiraly fkiraly added feature request New feature or request module:classification classification module: time series classification enhancement Adding new functionality labels Mar 6, 2023
@fkiraly
Copy link
Collaborator Author

fkiraly commented Mar 7, 2023

I think there might also have been a problem with the license. It would appear that Mr-SEQL is copyleft, and sktime is permissive, i.e., neither copyleft nor copyright.

Publishing Mr-SEQL as part of sktime would be a license violation, since permissive is more permissive than copyleft. So, license-wise, the only way seems to be anyway to have it as a dependency with an interface adapter (as suggested above), rather than a copy-paste of code or similar.

@lnthach
Copy link
Contributor

lnthach commented Mar 7, 2023

Hi @fkiraly

Thank you for letting us know. Is there anything I can help with the process ? FYI we are also maintaining a python wrapper for MrSEQL here: https://github.com/mlgig/mrseql

Re the license issue, we think we maybe able to change our license to accommodate sktime but we need to check. We will comeback to you later re this issue.

@fkiraly
Copy link
Collaborator Author

fkiraly commented Mar 7, 2023

Ah, @lnthach, thanks a lot, I was not aware of that repo!

  • re license: the license in the repo is MIT, so permissive, so that is no problem - btw, another license would not be a problem either, if it's a dependency (rather than copy-pasted code), i.e., option 1 or 2 below.
  • the quickest way from that repo would be: we upgrade MrSEQLClassifier in mrseql.pyx to the post-0.10.0 classifier interface on sktime.

The final location can be either, the main options that I see and you can pick from:

  1. all in the mrseql package. "clean" way to do this: turn mrseql into a python package, ensure it's on pypi. Import check_estimator from sktime and run it on the estimator as part of the CI, this ensures compatibility as sktime versions upgrade. Sub-variant: additionally, sktime imports the class and it is registered in the estimator list and documentation (but is not otherwise tested or maintained in sktime).
  2. logic in the mrseql package, sktime provides an interface. For this: turn mrseql into a python package, ensure it's on pypi. All logic (incl C/cython) remains in mrseql, but the interface class is in sktime, and you can become its owner. It imports mrseql which becomes a soft dependency, i.e., sktime users who try to instantiate the algorithm will be asked to install mrseql if not present. Sub-variant: you can also have the class in mrseql, or import it there (you will need to import sktime anyway).
  3. mrseql becomes part of sktime - we can do this only if the algorithm no longer depends on the C source (because of the hassle in installation it would cause), so would require rewrite.

My opinion on pros/cons:

Option 1 puts more maintenance burden on mrseql (the package); in 2., the interfaces would be maintained with all other classifiers by sktime in case upgrades happen etc. In option 1 variant and option 2, the algorithm gets some exposure as it is registered with the sktime estimator list and API documentation which makes it more visible to searches (e.g,. via the all_estimators utility).
Option 3 would require a rewrite that may adversely impact the runtime efficiency of tthe algorithm, so might not be an option for you.

The precise location of subroutines and tests may have to be discussed in either setting.

@fkiraly
Copy link
Collaborator Author

fkiraly commented Mar 7, 2023

FYI, happy to help you turning the newer mrseql package into a package published on pypi.org, and/or setting up continuous integration and testing (required for options 1 and 2). If you can make one of the meet-ups on discord (4pm UTC every Friday), one of the developers should be able to help!

@fkiraly
Copy link
Collaborator Author

fkiraly commented Mar 7, 2023

Maybe a silly question, @lnthach: were you informed that MrSEQL got removed from sktime back when it was removed?

When we handled the deprecation a while ago, we already had the algorithm ownership guidelines in place, so you should have at least gotten a heads-up or a GitHub ping since you come up as an owner when I look into the code in version 0.8.1.

The issue was around cython, and there was a longer period where algorithms were reworked, or the dependency was isolated - this could have been done along the lines I describe above for MrSEQL.

If you did not get a heads-up, I have to sincerely apologize.
As said, we're happy to help you with the package creation, dependency handling and sktime interface creation.

@fkiraly
Copy link
Collaborator Author

fkiraly commented Mar 15, 2023

@lnthach, @heerme, here's how it would look like:
#4337
I've written an interface for MrSQM.

You can take this PR as a template for MrSEQL if you want.

There are some interesting questions on how we handle testing - currently we do not have cython in the sktime CI framework, so testing would have to happen in mrsqm?

@lnthach
Copy link
Contributor

lnthach commented Mar 21, 2023

Maybe a silly question, @lnthach: were you informed that MrSEQL got removed from sktime back when it was removed?

When we handled the deprecation a while ago, we already had the algorithm ownership guidelines in place, so you should have at least gotten a heads-up or a GitHub ping since you come up as an owner when I look into the code in version 0.8.1.

The issue was around cython, and there was a longer period where algorithms were reworked, or the dependency was isolated - this could have been done along the lines I describe above for MrSEQL.

If you did not get a heads-up, I have to sincerely apologize. As said, we're happy to help you with the package creation, dependency handling and sktime interface creation.

Hi @fkiraly,
sorry for the late response.
We did get a heads-up so no worries. We understand the issue so we are very happy that it is possible to have MrSEQL back in sktime.

@lnthach
Copy link
Contributor

lnthach commented Mar 21, 2023

@lnthach, @heerme, here's how it would look like: #4337 I've written an interface for MrSQM.

You can take this PR as a template for MrSEQL if you want.

There are some interesting questions on how we handle testing - currently we do not have cython in the sktime CI framework, so testing would have to happen in mrsqm?

Thanks a lot. If I understand correctly, I need to:

  1. Publish MrSEQL on pypi.
  2. Write an sktime interface for MrSEQL (using MrSQM sktime template).

Re testing, I think so. Is there any standard test for sktime estimator or it's entirely up to me ?

@fkiraly
Copy link
Collaborator Author

fkiraly commented Mar 21, 2023

Ah, we just crossed over, I wrote this issue mlgig/mrsqm#7 without seeing your answer here first!

Is there any standard test for sktime estimator or it's entirely up to me ?

Yes, as in the issue mlgig/mrsqm#7 in mrsqm, the standard test is

from sktime.utils.estimator_checks import check_estimator

check_estimator(my_estimator, raise_exceptions=True)

Note: this requires that the estimator inherits from an sktime base class, in the cases of MrSEQL and MrSQM it would be BaseClassifier.

Also, the template from #4337 ensures it is also tested on the sktime side. I still would recommend to test it inside the mrsqm package using your own CI though.

@fkiraly
Copy link
Collaborator Author

fkiraly commented Mar 21, 2023

Thanks a lot. If I understand correctly, I need to:

  1. Publish MrSEQL on pypi.
  2. Write an sktime interface for MrSEQL (using MrSQM sktime template).

Correct!

To be more precise, it would be one of multiple ways to do it, but this is probably the easiest, given that #4337 works without too much hassle.

There are two (small interface compatibility) issues that also need to be solved, see mlgig/mrsqm#7 - they would probably also impact MrSEQL.

@fkiraly fkiraly added the interfacing algorithms Interfacing existing algorithms/estimators from third party packages label Mar 21, 2023
fkiraly added a commit that referenced this issue Mar 25, 2023
This PR adds a direct interface fo the MrSQM algorithm by @lnthach and @heerme, based on the `mrsqm` package on `pypi`. (fixes #4338)

This serves as a proof-of-concept for how we could interface estimators with cython dependencies, such as MrSEQL, also see #4296

Related to the feature importance question in #4336

Also adds a separate CI element which tests only the C dependent estimators, `test-cython-estimators` - this will allow addition of future cython estimators in external dependencies, leaving cython to the external dependency and isolating it from `sktime`.
@fkiraly
Copy link
Collaborator Author

fkiraly commented Mar 25, 2023

I wrote an extra guide for cython based estimators, hope that is useful.
#4388

fkiraly added a commit that referenced this issue Apr 2, 2023
This adds a small paragraph to the estimator contribution guide to cover cython based estimators.

Based on the example #4337, and a guide-like summary of the discussion in #4296.

FYI @heerme, @lnthach.
@Laprama
Copy link

Laprama commented May 16, 2023

Hi all,

I have been using MrSEQL from the sktime-0.8.1 package. I have found what seems like a subtle issue with the fit function.
The fit function does not fit correctly to new data as expected , in addition the classifier retains a memory of features & coefficients from previous fitting to data.

I think this is due to the seql_clf sub class of MrSEQL appending to self.features and self.coefficients attributes during fitting instead of creating new self.features and self.coefficients attributes.

I have outlined in the attached 'Fit Function MrSEQL pdf' document where exactly in the code I think the problem is.

I have also created a notebook (MrSEQL Behavior Example.ipynb inside zip) showing the impact of this problem with two different scenarios. This notebook runs with the sktime-0.8.1 package and shows another algorithm (ROCKET) behaving as expected.

Fit function MrSEQL.pdf
MrSEQL Behavior Example.zip

It would be good to get your thoughts on this and the fix is quite simple if you agree with what I have stated above?

Thanks,

Amarpal

@fkiraly
Copy link
Collaborator Author

fkiraly commented May 16, 2023

Hello @Laprama, it would be easy to add back MrSEQL - and fix the bug - but someone needs to maintain a separate python package, similar to MrSQM.

Probably the simplest option would be to make a PR to https://github.com/lnthach/Mr-SEQL and create a package out of it, for that one could follow the pattern for MrSQM https://github.com/mlgig/mrsqm or use pyproject etc.

Ultimately, that would require @lnthach and @heerme to assist or advise (or at least approve PR) as it is their GitHub repo.

@fkiraly
Copy link
Collaborator Author

fkiraly commented Aug 28, 2023

Update: mrseql is now on pypi, it should be easy to take MrSQM as a template and add MrSEQL as well now.
See discussion in mlgig/mrsqm#7 (comment)

@fkiraly
Copy link
Collaborator Author

fkiraly commented Aug 29, 2023

Trying a naive replication of the MrSQM recipe here: #5178

@fkiraly
Copy link
Collaborator Author

fkiraly commented Aug 29, 2023

FYI @Laprama, the issue you were encountering should be solved, as the sktime base interface enforces clean refitting upon calling fit.

@Laprama
Copy link

Laprama commented Aug 31, 2023

That makes sense, thanks

fkiraly added a commit that referenced this issue Sep 17, 2023
Fixes #4296, adds back `MrSEQL` to `sktime`.

See discussion there regarding temporary removal and complications in
soft dependency isolation which now should all be resolved.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Adding new functionality feature request New feature or request interfacing algorithms Interfacing existing algorithms/estimators from third party packages module:classification classification module: time series classification
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants