[patch:lib] Fix major kNN label scoring bug #187
Merged
Fixes #186 and generally improves the code in the `KNNClassifier`.
Split `_find_nearest` into two functions:
- `_find_k_nearest`: Finds the label and weighting/score of the k nearest neighbors with `numpy.argpartition`.
- `_find_max_labels`: Finds the labels out of the nearest k which had the highest total score.

A major bug in finding the labels with the highest total score was spotted by @manisci (thanks!). This has now been fixed, and splitting the function into the two described above made it easier to write unit tests for kNN.
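
For readers unfamiliar with the two-step approach, here is a minimal sketch of the idea, not the library's actual code. The function names `find_k_nearest` and `find_max_labels`, their signatures, and the inverse-distance weighting are assumptions made for illustration; only the use of `numpy.argpartition` and the "highest total score" rule come from the description above.

```python
import numpy as np


def find_k_nearest(distances, labels, k):
    """Return the labels and scores of the k nearest neighbors.

    Hypothetical helper: inverse-distance weighting is an assumption,
    not necessarily the weighting used by KNNClassifier.
    """
    distances = np.asarray(distances, dtype=float)
    labels = np.asarray(labels)
    k = min(k, len(distances))
    # argpartition places the indices of the k smallest distances in the
    # first k positions (in no particular order) without a full sort.
    idx = np.argpartition(distances, k - 1)[:k]
    scores = 1.0 / (distances[idx] + 1e-12)
    return labels[idx], scores


def find_max_labels(nearest_labels, nearest_scores):
    """Return every label whose summed score is the highest."""
    totals = {}
    for label, score in zip(nearest_labels, nearest_scores):
        totals[label] = totals.get(label, 0.0) + score
    best = max(totals.values())
    # Ties are kept, so more than one label may be returned.
    return sorted(label for label, total in totals.items() if total == best)
```

Keeping the two steps separate means each one can be unit tested with small hand-built inputs. Note that the winning label is decided by total score per label, not by how many neighbors carry it.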
Release notes for older versions will also be updated to include a warning about this bug.
Disabled progress bars for multi-processed kNN predictions due to rendering issues.
Added a `_predict` function for the logic of making a single prediction, and updated `_chunk_predict` to call `_predict` for each example.
Renamed `_argmax` to `_multi_argmax` for clarity.
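
As a rough illustration of the last two changes, the sketch below shows one way a tie-aware argmax and a per-example prediction loop could look. The names `multi_argmax` and `chunk_predict`, their signatures, and the idea of passing the single-prediction function as an argument are all assumptions for this example and may not match the private functions in the library.

```python
from typing import Callable, Iterable, List, Sequence, TypeVar

X = TypeVar("X")
Y = TypeVar("Y")


def multi_argmax(values: Sequence[float]) -> List[int]:
    """Return every index whose value equals the maximum (ties included)."""
    if not values:
        return []
    best = max(values)
    return [i for i, v in enumerate(values) if v == best]


def chunk_predict(chunk: Iterable[X], predict: Callable[[X], Y]) -> List[Y]:
    """Apply a single-example prediction function to each example in a chunk."""
    return [predict(example) for example in chunk]
```

Isolating the single-prediction logic lets it be tested directly, while the chunk function only has to handle iteration.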