Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problems sorting NaN and undefined values. #3

Closed
wants to merge 11 commits into from
Closed

Problems sorting NaN and undefined values. #3

wants to merge 11 commits into from

Conversation

jasondavies
Copy link
Collaborator

I've included a simple failing test case for NaN. However, it seems it can be more severe as my real dataset causes the call stack to overflow with "Uncaught RangeError: Maximum call stack size exceeded". I believe this only occurs when using the impressive dual-pivot quicksort i.e. arrays larger than the threshold of 32.

This may be of use. I'll have a closer look when I get time.

This also applies to undefined values, which is where I first noticed it
(a dataset where around half the values were undefined). This quicksort
implementation is unstable, so my previous attempt at an appropriate
test case didn't correctly detect the issue.
@jasondavies
Copy link
Collaborator Author

Hmm, this failing test case didn't actually reproduce the problem, because this quicksort implementation isn't stable.

I've pushed a new test that demonstrates the issue; namely taking ages to sort an array with 50% NaNs (does the same thing for undefined values). I think the stack limit in browsers must be much lower than in Node.js.

Unfortunately, this slows things down a bit due to the additional pass,
but perhaps it can be optimised later.
@jasondavies
Copy link
Collaborator Author

Okay, here's a fix! The issue is occurring because !(NaN <= NaN || NaN >= NaN) (likewise for undefined). Native sort moves these values to the end of the array, so for consistency it's simple enough to make a single pass and move such values to the end. Unfortunately it affects performance a bit, but perhaps it can be optimised by effectively performing this check on-the-fly elsewhere in the code.

I also wondered if caching coerced values would be useful e.g. to avoid calling valueOf twice for every comparison for dates, but perhaps smart JS VMs already optimise for this.

@eventualbuddha
Copy link

Out of curiosity, why do you want to sort NaN values?

@jasondavies
Copy link
Collaborator Author

My test data had a field that was undefined for a large number of cases, which has the same comparative qualities as NaN. It's simple enough to work around but I can imagine it happening a lot in practice as real-world data doesn't always have consistent fields.

@iros
Copy link

iros commented Jun 24, 2012

Major +1 to this.
Here is an example of how the filtering breaks if there are NaNs in the data to begin with:
http://jsfiddle.net/iros/wc8ba/3/

@john-guerra
Copy link
Contributor

Thanks for this addition I really need it on my code, however I think there is still an error when you have half your numer of items or more NaNs in your data. I think this illustrates it (copying @iros jsfiddle)

http://jsfiddle.net/wc8ba/53/

Conflicts:
	tesseract.min.js
This fixes dimension.filterExact(…) for incomparable values, since it
uses bisect.right.  Incomparable values are assumed to be at the end of
the array, as implemented in sort(…).  The existing bisect.left does not
need modifying.
@jasondavies
Copy link
Collaborator Author

@john-guerra Thank you for noticing this; I’ve fixed the issue and included a test case for bisect.right, which was the underlying cause. I’ve also merged with the latest master, so that jsfiddle no longer works due to the name change (tesseract→crossfilter).

@john-guerra
Copy link
Contributor

@jasondavies It seems that the problem persists if the number of NaNs is big enough, check this case:

http://jsfiddle.net/wc8ba/94/

Am I missing something?

Thanks for looking at this, it would be great to get this fixed, I need it desperately for my code, so I offer my help in whatever I can

@jasondavies
Copy link
Collaborator Author

@john-guerra Fixed.

@john-guerra
Copy link
Contributor

@jasondavies you sir have just won a big thank you note in my PhD dissertation! hehehe Thanks a lot!

Previously, a group key of NaN or undefined would result in that value
going to the last non-NaN group.
@jasondavies
Copy link
Collaborator Author

Awesome. :)

@jeroen
Copy link

jeroen commented Feb 24, 2013

Great, thank you so much! Hope to see this make its way into the master soon!

@jasondavies
Copy link
Collaborator Author

Folded into #58.

@jasondavies jasondavies deleted the sort-nan branch February 28, 2013 22:48
@Mortimerp9
Copy link

this still seems to happen when using

crossfilter.dimension(function callback(pt) {
    //return undefined
});

It was a bug in my code that made this callback return undefined, but the stack overflow error is definitely not very helpful in finding out what's going on.

@TomNeyland
Copy link

@Mortimerp9

crossfilter.dimension(function callback(pt) {
});

Returning nothing causes the function to implicitly return undefined

@Mortimerp9
Copy link

Sorry if I wasn't clear, I have code in that callback that sometimes
returns undefined, this makes the quicksort call go in a infinite recursion
that blows the stack.

While the callback should avoid returning undefined, it would be good if
crossfilter guarded against it and threw a better error /warning as
debugging the stack overflow deep in quicksort is not the most
straightforward thing.
On Apr 21, 2015 4:14 PM, "Tom Neyland" notifications@github.com wrote:

@Mortimerp9 https://github.com/Mortimerp9

crossfilter.dimension(function callback(pt) {
});

Returning nothing causes the function to implicitly return 'undefined'


Reply to this email directly or view it on GitHub
#3 (comment).

esjewett added a commit to esjewett/crossfilter that referenced this pull request Jul 3, 2015
@manojgn
Copy link

manojgn commented Mar 8, 2016

Hi guys, I am a great fan of crossfilter and have been using this in my app. I hit this exact same issue but I dont see the above fix in 1.3.12. Is it not merged or am I missing something ?

@jasondavies
Copy link
Collaborator Author

Some workarounds for incomparable values were attempted in versions v1.1.1-v1.1.3 but they were reverted as they caused too many issues: see the v1.2.0 release notes. Better to simply avoid using mixed types or incomparable values (undefined or NaN).

marc-marquez added a commit to marc-marquez/football-analyst-heroku that referenced this pull request Aug 19, 2018
jdar pushed a commit to jdar/crossfilter that referenced this pull request Nov 19, 2019
jdar pushed a commit to jdar/crossfilter that referenced this pull request Nov 19, 2019
As requested from issue square#3
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants