repaired KNNClassifier cleaning behaviour, tested with a simple script #1756

Merged
smastelini merged 8 commits into online-ml:main from jsvobo:main
Mar 27, 2026

Conversation

@jsvobo
Contributor

@jsvobo jsvobo commented Feb 19, 2026

This PR resolves issue #1755.
The fix is simple: the cleanup now goes to the SWINN inner buffer, iterates over it, and renews self.classes with the labels that are still relevant.
Script used to test:
The path handling at the beginning is there so that the import picks up river with my updated code.

import numpy as np
import sys
from pathlib import Path

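# Put the patched checkout first on sys.path so that `river` is imported from
# the bugfix branch rather than from any installed release.
repo_root = Path(__file__).resolve().parent / "river_bugfix_knn"
if repo_root.as_posix() not in sys.path:
    sys.path.insert(0, repo_root.as_posix())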

from river.neighbors import knn_classifier as neighbors

rng = np.random.default_rng()
X = rng.random((10, 4))
y = np.arange(10)

swinn_small_window = neighbors.SWINN(maxlen=3)
model = neighbors.KNNClassifier(n_neighbors=5, engine=swinn_small_window, cleanup_every=1)

for features, label in zip(X, y):
    sample = {f"x{i}": value for i, value in enumerate(features)}
    print("before cleanup: ", model.classes)
    model.learn_one(sample, int(label))
    print("after cleanup: ", model.classes)

This code should produce:

before cleanup:  set()
after cleanup:  {0}
before cleanup:  {0}
after cleanup:  {0, 1}
before cleanup:  {0, 1}
after cleanup:  {0, 1, 2}
before cleanup:  {0, 1, 2}
after cleanup:  {1, 2, 3}
before cleanup:  {1, 2, 3}
after cleanup:  {2, 3, 4}
before cleanup:  {2, 3, 4}
after cleanup:  {3, 4, 5}
before cleanup:  {3, 4, 5}
after cleanup:  {4, 5, 6}
before cleanup:  {4, 5, 6}
after cleanup:  {5, 6, 7}
before cleanup:  {5, 6, 7}
after cleanup:  {8, 6, 7}
before cleanup:  {8, 6, 7}
after cleanup:  {8, 9, 7}

@smastelini
Member

I left a comment regarding the fix and the additional changes needed. It is also worth noting that SWINN is designed to deal with larger windows of data: it gains speed at the cost of reduced accuracy. According to the paper, SWINN starts to become more efficient than the exhaustive lazy approach when data windows are larger than 500.

I know the provided example is meant to be only for illustration, but I leave this comment for posterity!

@jsvobo
Contributor Author

jsvobo commented Feb 19, 2026

Hello, thanks for the comment.

It is my understanding that this cleanup is there to reset self.classes so that it stays relevant, since the classes might have changed or drifted.

The window size of 3 was only there to show that it works now and doesn't fail when training with cleanup_every != 0. The default cleanup_every is 0, so for most people the function isn't triggered at all. I dug into the legacy code and, IMO, it wouldn't have worked even at the start, right? self.window was never exposed by the classifier, only by the inner class handling the queries (call it engine, SWINN, or neighbors). Is this observation correct?

I don't see the review, though. If you have other comments on how this should be handled, please point me to them.
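
For readers following along, the failure mode described in this comment can be reproduced without river. The sketch below is a toy, not river's code: `ToyEngine`, `BuggyToyClassifier`, and the `_n_seen` counter are hypothetical names, and the condition that triggers the cleanup is an assumption. It only illustrates why an attribute that lives on the engine, but is read from the classifier, stays hidden as long as cleanup_every is 0.

```python
from collections import deque


class ToyEngine:
    """Hypothetical stand-in for a search engine; it owns the sliding window."""

    def __init__(self, maxlen):
        self.window = deque(maxlen=maxlen)

    def append(self, item):
        self.window.append(item)


class BuggyToyClassifier:
    """Hypothetical classifier that mirrors the shape of the reported bug."""

    def __init__(self, engine, cleanup_every=0):
        self.engine = engine
        self.cleanup_every = cleanup_every
        self.classes = set()
        self._n_seen = 0

    def clean_up_classes(self):
        # Bug: `window` is an attribute of the engine, not of the classifier.
        self.classes = {y for _, y in self.window if y is not None}

    def learn_one(self, x, y):
        self.classes.add(y)
        self.engine.append((x, y))
        self._n_seen += 1
        # Assumed trigger: clean up every `cleanup_every` samples, never if 0.
        if self.cleanup_every > 0 and self._n_seen % self.cleanup_every == 0:
            self.clean_up_classes()


# With the default cleanup_every=0 the broken method is never reached.
quiet = BuggyToyClassifier(ToyEngine(maxlen=3))
quiet.learn_one({"x0": 0.1}, 0)

# With cleanup_every=1 the very first sample hits the missing attribute.
loud = BuggyToyClassifier(ToyEngine(maxlen=3), cleanup_every=1)
try:
    loud.learn_one({"x0": 0.1}, 0)
    error = None
except AttributeError as exc:
    error = exc

print(type(error).__name__)
```

Because the default configuration never calls the cleanup, a bug of this shape can go unnoticed until someone sets cleanup_every explicitly.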

Member

@smastelini smastelini left a comment

My bad. I forgot to actually send the review. Thanks for pointing that out!

Also, I totally get your point about the toy MWE :D

Comment thread river/neighbors/knn_classifier.py Outdated

"""
self.classes = {x for x in self.window if x[0][1] is not None}
self.classes = {
Member

hey! Nice catch :D

This fix should work when using the SWINN approximate nearest neighbors backbone. However, there is also a lazy approach, and other engines might be proposed in the future. My suggestion: create a new method, common to all search engines, that returns the existing classes.

Member

It is worth remembering to find a good naming scheme, because of regression tasks.

@jsvobo
Contributor Author

jsvobo commented Feb 25, 2026

@smastelini please check the new version. It's in the base, and I don't make a distinction between classifier and regressor (when they use the engine, they both have access to the functions for class cleanup).

I looked, and the regressor does not have a function that calls the class cleanup, so it should not be called there unless specifically intended. For the sake of clarity, I called it refresh_targets, since it should work with any (x, y) item as a vertex, even if y is a number for the regressor.

Again, tested with the same script. Cheers, and keep up the good work. I use this library for its easy-to-use, online, extendable models.
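
As a rough illustration of the design discussed in this thread, the sketch below shows one way an engine-agnostic target refresh could look. It is not the merged river code: `ToyBaseEngine`, `ToyLazyEngine`, `ToyGraphEngine`, `ToyClassifier`, and the cleanup trigger are all hypothetical, and only the method name refresh_targets comes from the comment above. The point is that the classifier asks the engine for the targets it still stores, without knowing how the engine keeps them.

```python
from abc import ABC, abstractmethod
from collections import deque


class ToyBaseEngine(ABC):
    """Hypothetical common interface shared by all search engines."""

    @abstractmethod
    def append(self, item):
        """Store one (x, y) item."""

    @abstractmethod
    def refresh_targets(self):
        """Return the set of targets still stored in the engine."""


class ToyLazyEngine(ToyBaseEngine):
    """Keeps the most recent items in a bounded deque."""

    def __init__(self, window_size):
        self.window = deque(maxlen=window_size)

    def append(self, item):
        self.window.append(item)

    def refresh_targets(self):
        return {y for _, y in self.window if y is not None}


class ToyGraphEngine(ToyBaseEngine):
    """Keeps items as vertices in a list and evicts the oldest by hand."""

    def __init__(self, maxlen):
        self.maxlen = maxlen
        self._vertices = []

    def append(self, item):
        self._vertices.append({"item": item, "edges": []})
        if len(self._vertices) > self.maxlen:
            self._vertices.pop(0)

    def refresh_targets(self):
        targets = (vertex["item"][1] for vertex in self._vertices)
        return {y for y in targets if y is not None}


class ToyClassifier:
    """Hypothetical classifier that delegates the cleanup to its engine."""

    def __init__(self, engine, cleanup_every=0):
        self.engine = engine
        self.cleanup_every = cleanup_every
        self.classes = set()
        self._n_seen = 0

    def learn_one(self, x, y):
        self.classes.add(y)
        self.engine.append((x, y))
        self._n_seen += 1
        # Assumed trigger: refresh every `cleanup_every` samples, never if 0.
        if self.cleanup_every > 0 and self._n_seen % self.cleanup_every == 0:
            self.classes = self.engine.refresh_targets()


# Feed labels 0..9 through both engines with a window of 3 items.
history = {}
for name, engine in {"lazy": ToyLazyEngine(3), "graph": ToyGraphEngine(3)}.items():
    model = ToyClassifier(engine, cleanup_every=1)
    snapshots = []
    for label in range(10):
        model.learn_one({"x0": float(label)}, label)
        snapshots.append(set(model.classes))
    history[name] = snapshots

print(history["lazy"][-1] == history["graph"][-1])
```

With a window of 3 and labels 0 to 9, both toy engines end up tracking only the last three labels, which is the same pattern as the expected output of the test script in the PR description.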

@jsvobo jsvobo requested a review from smastelini February 25, 2026 21:13
@smastelini
Member

Hey @jsvobo, sorry for my delay! I intend to check your code this week.

Comment thread river/neighbors/lazy.py Outdated
Comment thread river/neighbors/lazy.py Outdated
Co-authored-by: Saulo Martiello Mastelini <saulomastelini@gmail.com>
@smastelini
Member

Thanks for the changes, @jsvobo! And sorry for my delay :D

Comment thread river/neighbors/ann/swinn.py
Comment thread river/neighbors/lazy.py
Comment thread river/neighbors/base.py
format

Co-authored-by: Saulo Martiello Mastelini <saulomastelini@gmail.com>
@MaxHalford
Member

Please add an entry to unreleased.md 🙏

@jsvobo
Contributor Author

jsvobo commented Mar 26, 2026

Added the unreleased.md entry. There is still something off with the linters, but the tests pass.

@smastelini
Member

@jsvobo, can you run the pre-commit actions locally to see if the error is fixed?

@jsvobo
Contributor Author

jsvobo commented Mar 27, 2026

@smastelini I ran pre-commit and pulled the latest commits with the unit test changes.

@smastelini
Member

Cool beans! Let's merge :)

Thanks, @jsvobo

@smastelini smastelini merged commit 86da54a into online-ml:main Mar 27, 2026
1 check passed

Development

Successfully merging this pull request may close these issues.

KNNClassifier clean_up_classes calls self.window which does not exist.

3 participants