Problems sorting NaN and undefined values. #3
This also applies to undefined values, which is where I first noticed it (a dataset where around half the values were undefined). This quicksort implementation is unstable, so my previous attempt at an appropriate test case didn't correctly detect the issue.
Hmm, this failing test case didn't actually reproduce the problem, because this quicksort implementation isn't stable. I've pushed a new test that demonstrates the issue, namely that sorting an array with 50% NaNs takes ages (the same happens for undefined values). I think the stack limit in browsers must be much lower than in Node.js.
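For readers wondering why NaN and undefined upset a comparison-based sort at all: every ordered comparison involving them is false, so the sort cannot tell which side of a pivot they belong on. The snippet below is only a minimal sketch of that behaviour in plain JavaScript; `isComparable` is a hypothetical helper name, not something taken from this library.

```javascript
// Sketch only: shows why NaN and undefined cannot be ordered.
// `isComparable` is a hypothetical helper, not part of the library.
function isComparable(x) {
  // x <= x is false only for values that do not order against
  // themselves; undefined is converted to NaN before comparing.
  return x <= x;
}

const samples = [3, NaN, 1, undefined, 2];

// Values a comparison sort can place consistently.
const comparable = samples.filter(isComparable);

// Values for which both `a < b` and `b < a` are always false.
const incomparable = samples.filter(x => !isComparable(x));
```

Because neither `NaN < pivot` nor `pivot < NaN` holds, a partitioning step that relies on those comparisons has no consistent place to put such values.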
Unfortunately, this slows things down a bit due to the additional pass, but perhaps it can be optimised later.
Okay, here's a fix! The issue is occurring because I also wondered if caching coerced values would be useful e.g. to avoid calling valueOf twice for every comparison for dates, but perhaps smart JS VMs already optimise for this.
Out of curiosity, why do you want to sort NaN values?
My test data had a field that was
Major +1 to this.
Thanks for this addition, I really need it in my code. However, I think there is still an error when NaNs make up half or more of the items in your data. I think this illustrates it (copying @iros's jsfiddle)
Conflicts: tesseract.min.js
This fixes dimension.filterExact(…) for incomparable values, since it uses bisect.right. Incomparable values are assumed to be at the end of the array, as implemented in sort(…). The existing bisect.left does not need modifying.
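The idea in this commit message can be sketched as follows: keep the incomparable values in a tail segment and bisect only over the comparable prefix. This is an illustrative sketch under assumptions, not the library's actual code; `isComparable`, `sortIncomparableLast`, and the four-argument `bisectRight` are names chosen here, and the numeric comparator assumes numeric data.

```javascript
// Sketch only: hypothetical helpers, assuming numeric values.
function isComparable(x) {
  return x <= x; // false for NaN and undefined
}

// Sort comparable values ascending and append incomparable ones last.
function sortIncomparableLast(values) {
  const head = values.filter(isComparable).sort((a, b) => a - b);
  const tail = values.filter(x => !isComparable(x));
  return { sorted: head.concat(tail), comparableCount: head.length };
}

// Standard right bisection over a[lo..hi): returns the insertion
// point after any existing entries equal to x.
function bisectRight(a, x, lo, hi) {
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (x < a[mid]) hi = mid;
    else lo = mid + 1;
  }
  return lo;
}

const { sorted, comparableCount } =
  sortIncomparableLast([3, NaN, 1, 2, undefined, 2]);

// Bounded search: only the comparable prefix is inspected.
const bounded = bisectRight(sorted, 3, 0, comparableCount);

// Unbounded search: `3 < NaN` is false, so the search walks past
// the incomparable tail to the end of the array.
const unbounded = bisectRight(sorted, 3, 0, sorted.length);
```

Here `bounded` is 4 (just after the last `3`), while `unbounded` is 6, which is why a right bisection has to know where the incomparable tail starts.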
@john-guerra Thank you for noticing this; I’ve fixed the issue and included a test case for bisect.right, which was the underlying cause. I’ve also merged with the latest master, so that jsfiddle no longer works due to the name change (tesseract→crossfilter).
@jasondavies It seems that the problem persists if the number of NaNs is big enough, check this case: Am I missing something? Thanks for looking at this. It would be great to get this fixed, as I need it desperately for my code, so I offer my help in whatever way I can.
@john-guerra Fixed.
@jasondavies you sir have just won a big thank you note in my PhD dissertation! hehehe Thanks a lot!
Previously, a group key of NaN or undefined would result in that value going to the last non-NaN group.
Awesome. :)
Great, thank you so much! Hope to see this make its way into master soon!
Folded into #58.
this still seems to happen when using
It was a bug in my code that made this callback return undefined, but the stack overflow error is definitely not very helpful in finding out what's going on.
Returning nothing causes the function to implicitly return
Sorry if I wasn't clear, I have code in that callback that sometimes While the callback should avoid returning undefined, it would be good if
Hi guys, I am a great fan of crossfilter and have been using it in my app. I hit this exact same issue, but I don't see the above fix in 1.3.12. Is it not merged, or am I missing something?
Some workarounds for incomparable values were attempted in versions v1.1.1-v1.1.3, but they were reverted as they caused too many issues: see the v1.2.0 release notes. Better to simply avoid using mixed types or incomparable values (undefined or NaN).
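One way to follow that advice is to normalise values before they ever reach the sort, for example by wrapping the accessor passed to a dimension. The wrapper below is a hedged sketch: `numericAccessor` and the choice of `-Infinity` as a fallback are assumptions made for illustration, not part of crossfilter.

```javascript
// Sketch only: `numericAccessor` is a hypothetical helper.
// It wraps an accessor so the result is always a real number;
// anything else (NaN, undefined, null, strings) maps to `fallback`.
function numericAccessor(accessor, fallback) {
  return function (d) {
    const v = accessor(d);
    // v === v is false only for NaN.
    return typeof v === "number" && v === v ? v : fallback;
  };
}

const records = [
  { value: 4 },
  { value: NaN },
  {},              // value is undefined
  { value: "7" },  // wrong type
  { value: 0 }
];

// Missing or malformed values collect at the low end of the order.
const getValue = numericAccessor(d => d.value, -Infinity);
const keys = records.map(getValue);

// Assumed usage with crossfilter (not executed here):
// const dimension = crossfilter(records).dimension(getValue);
```

Whether a sentinel such as `-Infinity` is appropriate depends on the data; filtering out such records before loading them is an equally reasonable choice.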
…t handling NaN values. See square/crossfilter#3
Prompted by issue square#3
As requested from issue square#3
I've included a simple failing test case for NaN. However, it seems it can be more severe as my real dataset causes the call stack to overflow with "Uncaught RangeError: Maximum call stack size exceeded". I believe this only occurs when using the impressive dual-pivot quicksort i.e. arrays larger than the threshold of 32.
This may be of use. I'll have a closer look when I get time.