Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] Multioutput capability for all time series classifiers and regressors, broadcasting and tag #5408

Merged
merged 76 commits into from Dec 25, 2023

Conversation

Vasudeva-bit
Copy link
Contributor

@Vasudeva-bit Vasudeva-bit commented Oct 12, 2023

Reference Issues/PRs

Related to #5182, #5286

Depends on #5651, which should be merged first.

What does this implement/fix? Explain your changes.

Adds broadcasting support to let classifiers and regressors handle multioutput data.

What should a reviewer concentrate their feedback on?

Kindly give feedback on whether I am going in right direction.

PR checklist

For all contributions
  • The PR title starts with either [ENH], [MNT], [DOC], or [BUG]. [BUG] - bugfix, [MNT] - CI, test framework, [ENH] - adding or improving code, [DOC] - writing or improving documentation or docstrings.

@fkiraly fkiraly changed the title [ENH] Broadcasting support to handle multioutput data #5182 [ENH] Broadcasting support to handle multioutput y in time series classifiers and regressors #5182 Oct 12, 2023
@fkiraly fkiraly added module:classification classification module: time series classification module:regression regression module: time series regression enhancement Adding new functionality labels Oct 12, 2023
@Vasudeva-bit
Copy link
Contributor Author

It has been a while since I last worked on this issue, I would like to get a quick clarification before starting again.
Kindly help me know which of the below methods is better:

  1. As discussed here i.e., extend required classifiers and regressors for multioutput y.

I suppose setting false as a default has only be done in the base class.

Only classes need to be adapted that support multitarget prediction natively.

  1. Otherwise, modify only classification and regression base classes similar to sktime/transformations/base.py i.e., base class handles multioutput based on the tag "capability:multioutput" assigned in estimator (classifier/regressor) classes.
    @fkiraly

@fkiraly
Copy link
Collaborator

fkiraly commented Oct 29, 2023

Yes - I would recommend to start with no.2 to get the framework ready, and only then add multioutput support for some classifiers, e.g., the neural networks. That way, no.1 will also be a test case for the framework extensions in no.2.

@Vasudeva-bit
Copy link
Contributor Author

Kindly review the code and help me with the following:

  1. If the format looks good, I shall extend regression base class as well.
  2. I need to set the tag "capability:multioutput" to True to all the estimators that support it. Kindly mention estimators that you know can support multioutput. I will try to figure the rest. Now, only CNNClassifier's tag is set to True.

add multioutput support for some classifiers, e.g., the neural networks

Once I am done with the above two, I shall continue with this. @fkiraly

Copy link
Collaborator

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!
This looks like it works almost.

There's something upset with the tags, can you check the tests?

Also, non-blocking, do you think we can avoid introducing an "instance" copy of every method, without making the code less DRY? I don't see it, but it might be worth thinking briefly about this.

@Vasudeva-bit
Copy link
Contributor Author

There's something upset with the tags, can you check the tests?

Also, non-blocking, do you think we can avoid introducing an "instance" copy of every method, without making the code less DRY? I don't see it, but it might be worth thinking briefly about this.

I shall look into these.

@fkiraly
Copy link
Collaborator

fkiraly commented Dec 25, 2023

I think they break. I will check that again.

Well, if they break, they should not get the multioutput tag until they are upgraded.

@fkiraly fkiraly changed the title [ENH] Broadcasting support to handle multioutput y in time series classifiers and regressors #5182 [ENH] Multioutput capability for all time series classifiers and regressors, broadcasting and tag Dec 25, 2023
fkiraly
fkiraly previously approved these changes Dec 25, 2023
Copy link
Collaborator

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All updated now. I removed the tags from classifiers, we can do that separately.

@fkiraly fkiraly merged commit 5c6581e into sktime:main Dec 25, 2023
39 checks passed
fkiraly added a commit that referenced this pull request Dec 25, 2023
This PR relaxes the `pandas` based `Table` mtype-s to also allow object
`dtype`-s.

The reason for this is to allow programmatic support of type checking of
`y` in time series classification and regression, and since the current
implicitly defined interface supported object (e.g., string) `dtype`-s
for classification outputs.

Also required for the multioutput extension in
#5408.
fkiraly added a commit that referenced this pull request Dec 26, 2023
…ssion (#5662)

Separating refactor and cleanup of #5408 into a new PR.

This PR introduces an intemediate base class for panel tasks, including
classification and regression.

The intermediate class contains code that is otherwise duplicated across
the descendant classes `BaseClassifier` and `BaseRegressor`.

The multiplication has become especially apparent in adding multioutput
functionality to classifiers and regressor.

Depends on #5408 which should be
merged first.
@Vasudeva-bit Vasudeva-bit deleted the broadcast branch January 9, 2024 13:19
fkiraly added a commit that referenced this pull request Jan 14, 2024
…eClassifier` (#5654)

This PR removes an instance of problematic private coupling of
`IndividualBOSS` classifier to the boilerplate layer of
`BaseClassifier`, which could make `IndividualBOSS` and dependents break
in boilerplate refactor such as #5408.

The `_shorten_bags` logic internally made a "fitted clone" of `self`,
with explicit copying of attributes, that assumed specifics of the
private structure of `BaseClassifier` such as "names of all private
attributes" that were hard-coded in `IndividualBOSS`.

This coupling has been removed to the largest extent by instead using
`deepcopy` and changing only the attributes that need to change.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Adding new functionality module:classification classification module: time series classification module:regression regression module: time series regression
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants