repaired KNNClassifier cleaning behaviour, tested with a simple script #1756
repaired KNNClassifier cleaning behaviour, tested with a simple script #1756smastelini merged 8 commits intoonline-ml:mainfrom
Conversation
|
I left a comment regarding the fix and additional needed changes. Also worth noting that SWINN is designed to deal with larger windows of data. It gains speed at the cost of reduced accuracy. Accordingly to the paper, SWINN starts to become more efficient than the exhaustive lazy approach when data windows are larger than 500. I know the provided example is meant to be only for illustration, but I leave this comment for posterity! |
|
Hello, thanks for the comment. It is my understanding, that this cleanup is there to reset self.classes to be more relevant, maybe since the classes might have changed/drifted. This windows size 3 was here just to show that it works now and doesn't fail on calling train with cleanup_every!=0 . Default cleanup every is 0, so for most people, the function isn't trigerred at all. I dug into the legacy code to see, that it IMO wouldnt work even at the start, right? since self.window was never exposed by the classifier, only the inner class handling the queries (call it engine, swinn or neighbors .. ) Is this correct observation? I dont see the review though, if you have other comments to how this should be handled, please point me to the comments. |
smastelini
left a comment
There was a problem hiding this comment.
my bad. I forgot to actually send the review, thanks for pointing that out!
Also, I totally get your point about the toy MWE :D
|
|
||
| """ | ||
| self.classes = {x for x in self.window if x[0][1] is not None} | ||
| self.classes = { |
There was a problem hiding this comment.
hey! Nice catch :D
This fix should work when using the SWINN approximate nearest neighbors backbones. However, there is also a Lazy approach and other engines might be proposed in the future. My suggestion: create a new method common to all search engines that return the existing classes.
There was a problem hiding this comment.
Is worth remembering to find a good naming scheme because of regression tasks
…pement their own versions
…pement their own versions
|
@smastelini pls check the new version. its in the base, and I dont make distinction between classifier and regressor (when they use the engine, they both have the access to the functions for class cleanup) . I looked, and regressor does not have the function to call the class cleanup, so the function should not be called there unless specifically intended. In service of clarity, i called it refresh_targets, since it should work with any (x,y) item as vertex, even if y is a number for the regressor. again, tested with the same script. cheers, keep doing the good work, I use this library for easy to use, online extendable models. |
|
Hey @jsvobo, sorry for my delay! I intend to check your code this week. |
Co-authored-by: Saulo Martiello Mastelini <saulomastelini@gmail.com>
|
Thanks for the changes, @jsvobo! And sorry for my delay :D |
format Co-authored-by: Saulo Martiello Mastelini <saulomastelini@gmail.com>
|
Please add an entry to |
|
Added unreleased, still there is something with the linters, but tests pass |
|
@jsvobo, can you run the pre-commit actions locally to see if the error is fixed? |
|
@smastelini ran pre-commit, pulled the latest commits with unit test changes. |
|
Cool beans! Let's merge :) Thanks, @jsvobo |
This PR is linked to issue #1755 (resolves #1755)
Simple fix, now we go to the SWINN inner buffer to iterate and renew self.classes with relevant labels.
Script used to test:
The fun stuff at the beginning was to rewrite river logic with my updated code.
This code should produce: