Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cast strings to integers so as to sort as expected #42

Closed
wants to merge 1 commit into from

Conversation

harshavmb
Copy link

@harshavmb harshavmb commented Mar 1, 2022

Hi,

Thanks for this code. I could sort billions of lines. However, I identified a tiny issue while sorting integers of variable length like below..

1000000000 is smaller than 100000001
100000010 is smaller than 10000001
100000020 is smaller than 10000002
and so on.,

This is because of r1.compareTo(r2) method call in defaultcomparator.
When they are compared as strings, 1000000000 is indeed smaller than 100000001. But it's incorrect when it is an integer.

This PR checks whether input contain all numerical digits & sort them as integers.

Looking forward to hear from you.

@lemire
Copy link
Owner

lemire commented Mar 1, 2022

I do not think we want to do it this way by default. Why don’t you pass your own comparator?

@harshavmb
Copy link
Author

harshavmb commented Mar 2, 2022

I do not think we want to do it this way by default. Why don’t you pass your own comparator?

Hmm., Could you please share some documentation or few tips how to do that?

I'm not having this dependency in my code, using this jar as-is for some adhoc sorts. Have your say!

@harshavmb
Copy link
Author

harshavmb commented Mar 2, 2022

I do not think we want to do it this way by default. Why don’t you pass your own comparator?

Hmm., Could you please share some documentation or few tips how to do that?

I'm not having this dependency in my code, using this jar as-is for some adhoc sorts. Have your say!

Having checked documentation again, I see I need to write some code extending this to pass custom comparator.
I would love to use this jar as-is for simple string, integer comparisons. Complex data may need obviously a custom comparator objects. WDYT?

@lemire
Copy link
Owner

lemire commented Mar 2, 2022

A custom comparator can be provided using effectively a single line of code. Providing custom comparators is standard in Java.

It is possible that this comparator you wrote suits your needs well, but I do not think it is a reasonable default. For one thing, it requires repeatedly classifying the strings using a regular expression, a potentially expensive operation. The string-based comparator we provide is a sane default, I believe.

@lemire lemire closed this Mar 2, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants