topk includes NaNs when it should always prefer numbers #1215
Comments
I would expect NaNs to be treated as absent in this context, i.e. never included in topk/bottomk. If there aren't enough things, it should return however many there are.
I'd expect NaN to be included if needed - if someone asks for 10 things and there are more than 10 inputs then they should get 10 things. We've had to special case sorting elsewhere due to NaN.
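The two behaviours proposed in these comments differ only when there are fewer real numbers than requested slots. A minimal Python sketch of both, for illustration only: the function names `topk_drop_nan` and `topk_nan_last` are hypothetical, and this is not Prometheus code.

```python
import math


def topk_drop_nan(values, k):
    """Treat NaN as absent: the result may hold fewer than k items."""
    numbers = [v for v in values if not math.isnan(v)]
    return sorted(numbers, reverse=True)[:k]


def topk_nan_last(values, k):
    """Rank NaN below every number: NaN only fills slots left over."""
    # Key sorts real numbers first (False < True), largest value first.
    # NaN entries get a constant second component so they never get compared.
    def key(v):
        return (math.isnan(v), 0.0 if math.isnan(v) else -v)

    return sorted(values, key=key)[:k]


sample = [float("nan"), 0.0, float("nan"), 2.0, 1.0]
print(topk_drop_nan(sample, 4))
print(topk_nan_last(sample, 4))
```

With three numbers and `k = 4`, the first policy returns three items and the second pads the result with one NaN. Both agree whenever at least `k` real numbers are available.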
brian-brazil added the enhancement label Dec 16, 2015
brian-brazil referenced this issue Dec 22, 2015
Merged: Make topk/bottomk prefer returning real numbers over NaN. #1271
brian-brazil closed this in #1271 Dec 22, 2015
lock bot commented Mar 24, 2019
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
lock bot locked and limited conversation to collaborators Mar 24, 2019
aecolley commented Nov 12, 2015
If topk is given a vector of numbers mixed with a few NaNs, it will conditionally include NaNs in the output depending on the ordering of the input vector.
For example, here's an artificial expression which delivers a mixed vector:
Now mixed is a vector with three 0s and seven NaNs. If we use sort to reorder the vector, so that the 0s are at the end, then topk will select NaNs:
Now, it's an interesting question what topk should do when it doesn't have enough numbers but does have NaNs: should it filter out the NaNs or test them as less-than every number? But I'm pretty sure it should never prefer NaNs over numbers.
Even if you don't agree with that, you surely agree that the sort order of the input shouldn't influence the output of topk.
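To illustrate how input order can leak into a top-k result, here is a rough Python sketch. It assumes a bounded min-heap selection that compares values with a plain `<`. That is a guess at the general mechanism, not a copy of Prometheus's implementation, and the names `naive_topk` and `nan_aware_topk` are hypothetical. Every comparison involving NaN is false, so a NaN that enters the heap early is never evicted.

```python
import heapq
import math


def naive_topk(values, k):
    """Bounded min-heap selection using plain '<' (NaN-unaware)."""
    heap = []
    for v in values:
        if len(heap) < k:
            heapq.heappush(heap, v)
        elif heap[0] < v:  # always False when either side is NaN
            heapq.heapreplace(heap, v)
    return heap


def nan_aware_topk(values, k):
    """Same selection, but NaN ranks below every real number."""
    # Key puts real numbers above NaN; NaN itself is never compared.
    def key(v):
        return (not math.isnan(v), 0.0 if math.isnan(v) else v)

    return heapq.nlargest(k, values, key=key)


nan = float("nan")
nan_first = [nan] * 7 + [0.0] * 3
zeros_first = [0.0] * 3 + [nan] * 7

print(naive_topk(nan_first, 3))       # three NaNs
print(naive_topk(zeros_first, 3))     # three zeros
print(nan_aware_topk(nan_first, 3))   # three zeros
print(nan_aware_topk(zeros_first, 3)) # three zeros
```

The naive version returns different results for the same multiset of inputs, while the keyed version gives the same answer regardless of order.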