Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improvement to the Natural Order Sort #3276

Merged
merged 11 commits into from
Feb 16, 2022

Conversation

jdunkerley
Copy link
Member

@jdunkerley jdunkerley commented Feb 14, 2022

Pull Request Description

Moving away from RegEx based comparing of 2 texts to using break iterators to progress piecemeal.

  • Also added an initial Faker class to the test library allowing construction of mock string.

Important Notes

Significant performance gains.

Old New
7372.4 409.5

Test results for 10,000 values being sorted in an vector.
Average over 10 runs
Running on an r5a.xlarge EC2

Checklist

Please include the following checklist in your PR:

  • The documentation has been updated if necessary.
  • All code conforms to the Scala, Java, and Rust style guides.
  • All documentation and configuration conforms to the markdown and YAML style guides.
  • All code has been tested where possible.

@jdunkerley jdunkerley marked this pull request as ready for review February 14, 2022 19:48
Copy link
Member

@wdanilo wdanilo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The performance improvements are fantastic!

Copy link
Member

@radeusgd radeusgd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good but I have a few comments.

The main function is quite complex - it is expected to some extent because the performed operation is not trivial. But I'm thinking if we could simplify it a bit - one idea would be to separate the lazy splitting of text into texts/numbers into a separate construct (either with a stateful iterator or ideally, in a bit more Enso-way, with a lazy list) - but the only reasonable way to do this that I see would not be efficient on pairs like 10...0 (a very long number) vs a (just a letter) - because it would still parse the very long number unnecessarily.
So I think your approach is indeed best, in terms of performance (because it looks at the minimum number of characters to resolve the comparison).
I think it may be worth to add comments with a few words of explanation of the intent behind the helper functions (get_number and order). Explaining in more detail what the arguments represent and what the functions return could make it easier to understand what is going on here.

jdunkerley and others added 5 commits February 16, 2022 09:08
Data generator for benchmarking
Benchmark script
Restore missing ToDo
@jdunkerley jdunkerley enabled auto-merge (squash) February 16, 2022 13:00
@jdunkerley jdunkerley merged commit 68b85de into develop Feb 16, 2022
@jdunkerley jdunkerley deleted the wip/jd/natural-order-181176589 branch February 16, 2022 17:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants