Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] k-nearest neighbors regressor: support for non-brute algorithms and non-precomputed mode to improve memory efficiency #6217

Merged
merged 3 commits into from Apr 3, 2024

Conversation

Z-Fran
Copy link
Contributor

@Z-Fran Z-Fran commented Mar 26, 2024

Reference Issues/PRs

#3429 (comment)

What does this implement/fix? Explain your changes.

adds test parameter sets for KNeighborsTimeSeriesRegressor;

adds support for non-brute algorithms and non-precomputed mode, mirroring #5937

For all contributions
  • I've added myself to the list of contributors with any new badges I've earned :-)
    How to: add yourself to the all-contributors file in the sktime root directory (not the CONTRIBUTORS.md). Common badges: code - fixing a bug, or adding code logic. doc - writing or improving documentation or docstrings. bug - reporting or diagnosing a bug (get this plus code if you also fixed the bug in the PR).maintenance - CI, test framework, release.
    See here for full badge reference
  • Optionally, for added estimators: I've added myself and possibly to the maintainers tag - do this if you want to become the owner or maintainer of an estimator you added.
    See here for further details on the algorithm maintainer role.
  • The PR title starts with either [ENH], [MNT], [DOC], or [BUG]. [BUG] - bugfix, [MNT] - CI, test framework, [ENH] - adding or improving code, [DOC] - writing or improving documentation or docstrings.

Copy link
Collaborator

@yarnabrina yarnabrina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution. I think your changes are fine on their own, but you probably discovered some previously unknown bugs that's causing failures in CI.

@fkiraly can you please take a look? This seems to be coming from this line:

@Z-Fran
Copy link
Contributor Author

Z-Fran commented Mar 27, 2024

Hi @yarnabrina @fkiraly. It seems like sklearn doesn't support all tree algorithms to accept a precomputed distance matrix. (mentioned there)
And I find that #5937 fixes the same bug of KNeighborsTimeSeriesClassifier. Can I fix KNeighborsTimeSeriesRegressor just like it?

BTW, could you please tell me why the parameter weights is not used in KNeighborsTimeSeriesRegressor but used in KNeighborsTimeSeriesClassifier (added in #1998)

@fkiraly fkiraly added module:regression regression module: time series regression enhancement Adding new functionality labels Mar 30, 2024
Copy link
Collaborator

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Z-Fran, thanks!

Well-spotted, indeed we find bugs in a couple places when we add new test parameters.
But you've already identified and fixed it, nice!

I'm happy to merge this PR, although it introduces duplication in the kneighbors estimators.

Generally, if you end up copying multiple lines of code, that is an "antipattern", violating the DRY ("don't repeat yourself") principle.
Simlar code also exists in other places, in relation to the distances, so it might be good to move it into a single location, e.g., in a new module distances._adapters.sklearn.

I'd consider that a task for a separate PR though - so you can take a stab at it, or we should track it as an issue for someone else to pick up ("resolve duplication in...")

More generally, there's some open work around the distances, which could be a nice longer project if you are interested.

@Z-Fran Z-Fran requested a review from yarnabrina April 1, 2024 15:36
@Z-Fran
Copy link
Contributor Author

Z-Fran commented Apr 3, 2024

@Z-Fran, thanks!

Well-spotted, indeed we find bugs in a couple places when we add new test parameters. But you've already identified and fixed it, nice!

I'm happy to merge this PR, although it introduces duplication in the kneighbors estimators.

Generally, if you end up copying multiple lines of code, that is an "antipattern", violating the DRY ("don't repeat yourself") principle. Simlar code also exists in other places, in relation to the distances, so it might be good to move it into a single location, e.g., in a new module distances._adapters.sklearn.

I'd consider that a task for a separate PR though - so you can take a stab at it, or we should track it as an issue for someone else to pick up ("resolve duplication in...")

More generally, there's some open work around the distances, which could be a nice longer project if you are interested.

@fkiraly I'm sorry for not getting back to you sooner. I will resolve the duplication for a separate PR after merging this PR. I will also track some open work around the distances in my free time.

@fkiraly fkiraly changed the title [ENH] add test parameter sets for KNeighborsTimeSeriesRegressor [ENH] k-nearest neighbors regressor: support for non-brute algorithms and non-precomputed mode to improve memory efficiency Apr 3, 2024
@fkiraly fkiraly merged commit b5b5f4c into sktime:main Apr 3, 2024
54 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Adding new functionality module:regression regression module: time series regression
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants