Arrays.sort #3919

Closed
sri opened this Issue May 24, 2016 · 13 comments

Comments

Projects
None yet
4 participants
@sri
Contributor

sri commented May 24, 2016

Is there a reason that JRuby's RubyArray uses it own hand-rolled sort (in Qsort.java) instead of using Java's builtin Arrays.sort?

I'm looking at these files:

The reason I'm asking: I encountered a bug in a piece of code. A class was overriding the <=> method, but due to a typo, it was doing this comparision:

def <=>(other); self.a <=> other.b; end

instead of

def <=>(other); self.a <=> other.a; end.

Obviously, that is a bug in the application. I didn't notice the bug because a == b most of the time. However, recently, there were cases where a != b and that's when the bug manifisted. And JRuby's sort got stuck in an infinite loop.

When I converted JRuby to use Arrays.sort instead of Qsort and tried it with the bad data, it threw an exception: "Comparison method violates its general contract!".

Also, there are other benefits of using Arrays.sort:
https://github.com/openjdk-mirror/jdk7u-jdk/blob/master/src/share/classes/java/util/TimSort.java#L29

Is there interest in converting RubyArray to use Arrays.sort?

Thanks!
-Sriram

@sri

This comment has been minimized.

Show comment
Hide comment
@sri

sri May 25, 2016

Contributor

Ah, I see according to this

Improved sort for JRUBY-2198: Array#sort is slower than MRI
4c38d86

looks like, Arrays.sort was pulled out because it was slow. But that was in 2008. Looks like Arrays.sort was rewritten in 2009 to use TimSort:

openjdk-mirror/jdk7u-jdk@cdf72de

Is there an easy way to benchmark the current Qsort.java with the current Arrays.sort?

Contributor

sri commented May 25, 2016

Ah, I see according to this

Improved sort for JRUBY-2198: Array#sort is slower than MRI
4c38d86

looks like, Arrays.sort was pulled out because it was slow. But that was in 2008. Looks like Arrays.sort was rewritten in 2009 to use TimSort:

openjdk-mirror/jdk7u-jdk@cdf72de

Is there an easy way to benchmark the current Qsort.java with the current Arrays.sort?

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius May 25, 2016

Member

@sri Yes, we introduced our own sort both for performance and to match MRI behaviorally.

It would definitely be worth examining whether the rewritten JDK-provided Arrays.sort is now as fast or faster, since we always like deleting code. Are you interested in working on a test patch and benchmarking it?

Member

headius commented May 25, 2016

@sri Yes, we introduced our own sort both for performance and to match MRI behaviorally.

It would definitely be worth examining whether the rewritten JDK-provided Arrays.sort is now as fast or faster, since we always like deleting code. Are you interested in working on a test patch and benchmarking it?

@sri

This comment has been minimized.

Show comment
Hide comment
@sri

sri May 25, 2016

Contributor

@headius Yes, I am totally interested. The deleting part seems easy enough: delete Qsort.java and replace calls to it with Arrays.sort. I can get started on that.

Is there a benchmarking data that I can use to benchmark the sorts? Or am I coming up with my own?

Contributor

sri commented May 25, 2016

@headius Yes, I am totally interested. The deleting part seems easy enough: delete Qsort.java and replace calls to it with Arrays.sort. I can get started on that.

Is there a benchmarking data that I can use to benchmark the sorts? Or am I coming up with my own?

@sri

This comment has been minimized.

Show comment
Hide comment
@sri

sri May 27, 2016

Contributor

@headius Here is my first attempt at a sort benchmark: https://github.com/sri/jruby-sort-benchmark.

They are pretty comparable. The major difference in the total timing is Java's native sort is that it doesn't do much work when the array is sorted or mostly sorted. I recall that was TimSort's benefits.

Btw, these benchmarks were run inside of a Ubuntu VM on my Macbook laptop.

Here is another run on my machine:

JRuby's QSort
Iterations: 5000000
JRUBY_VERSION: 9.1.3.0-SNAPSHOT
RUBY_VERSION: 2.3.0
Rehearsal -------------------------------------------------
sorted_ints     0.050000   0.010000   0.060000 (  0.029992)
random_ints     2.150000   0.010000   2.160000 (  1.918142)
random_floats   4.850000   0.000000   4.850000 (  4.485699)
random_objs    12.870000   0.010000  12.880000 ( 12.627390)
--------------------------------------- total: 19.950000sec

                    user     system      total        real
sorted_ints     0.030000   0.000000   0.030000 (  0.028930)
random_ints     2.310000   0.000000   2.310000 (  2.299806)
random_floats   5.270000   0.000000   5.270000 (  5.253322)
random_objs    19.450000   0.000000  19.450000 ( 16.056569)
==================================================================
Java's native Arrays.sort
Iterations: 5000000
JRUBY_VERSION: 9.1.3.0-SNAPSHOT
RUBY_VERSION: 2.3.0
Rehearsal -------------------------------------------------
sorted_ints     0.040000   0.000000   0.040000 (  0.030032)
random_ints     3.060000   0.020000   3.080000 (  2.494412)
random_floats   4.290000   0.000000   4.290000 (  4.047059)
random_objs    10.930000   0.000000  10.930000 ( 10.746148)
--------------------------------------- total: 18.340000sec

                    user     system      total        real
sorted_ints     0.020000   0.000000   0.020000 (  0.021787)
random_ints     6.640000   0.020000   6.660000 (  4.840657)
random_floats   4.410000   0.010000   4.420000 (  4.413161)
random_objs    11.040000   0.010000  11.050000 ( 11.017320)

Looking at the overall timing, Arrays.sort seems a bit faster, but not sure I understand this part on with Arrays.sort does pretty badly on the random_ints part:

sorted_ints     0.020000   0.000000   0.020000 (  0.021787)
random_ints     6.640000   0.020000   6.660000 (  4.840657)
random_floats   4.410000   0.010000   4.420000 (  4.413161)
random_objs    11.040000   0.010000  11.050000 ( 11.017320)

Any suggestions on what I should do next?

Thanks!

Contributor

sri commented May 27, 2016

@headius Here is my first attempt at a sort benchmark: https://github.com/sri/jruby-sort-benchmark.

They are pretty comparable. The major difference in the total timing is Java's native sort is that it doesn't do much work when the array is sorted or mostly sorted. I recall that was TimSort's benefits.

Btw, these benchmarks were run inside of a Ubuntu VM on my Macbook laptop.

Here is another run on my machine:

JRuby's QSort
Iterations: 5000000
JRUBY_VERSION: 9.1.3.0-SNAPSHOT
RUBY_VERSION: 2.3.0
Rehearsal -------------------------------------------------
sorted_ints     0.050000   0.010000   0.060000 (  0.029992)
random_ints     2.150000   0.010000   2.160000 (  1.918142)
random_floats   4.850000   0.000000   4.850000 (  4.485699)
random_objs    12.870000   0.010000  12.880000 ( 12.627390)
--------------------------------------- total: 19.950000sec

                    user     system      total        real
sorted_ints     0.030000   0.000000   0.030000 (  0.028930)
random_ints     2.310000   0.000000   2.310000 (  2.299806)
random_floats   5.270000   0.000000   5.270000 (  5.253322)
random_objs    19.450000   0.000000  19.450000 ( 16.056569)
==================================================================
Java's native Arrays.sort
Iterations: 5000000
JRUBY_VERSION: 9.1.3.0-SNAPSHOT
RUBY_VERSION: 2.3.0
Rehearsal -------------------------------------------------
sorted_ints     0.040000   0.000000   0.040000 (  0.030032)
random_ints     3.060000   0.020000   3.080000 (  2.494412)
random_floats   4.290000   0.000000   4.290000 (  4.047059)
random_objs    10.930000   0.000000  10.930000 ( 10.746148)
--------------------------------------- total: 18.340000sec

                    user     system      total        real
sorted_ints     0.020000   0.000000   0.020000 (  0.021787)
random_ints     6.640000   0.020000   6.660000 (  4.840657)
random_floats   4.410000   0.010000   4.420000 (  4.413161)
random_objs    11.040000   0.010000  11.050000 ( 11.017320)

Looking at the overall timing, Arrays.sort seems a bit faster, but not sure I understand this part on with Arrays.sort does pretty badly on the random_ints part:

sorted_ints     0.020000   0.000000   0.020000 (  0.021787)
random_ints     6.640000   0.020000   6.660000 (  4.840657)
random_floats   4.410000   0.010000   4.420000 (  4.413161)
random_objs    11.040000   0.010000  11.050000 ( 11.017320)

Any suggestions on what I should do next?

Thanks!

@kares

This comment has been minimized.

Show comment
Hide comment
@kares

kares May 27, 2016

Member

najs work ... you should also include Java version as the numbers might be different on Java 7 vs. 8

Member

kares commented May 27, 2016

najs work ... you should also include Java version as the numbers might be different on Java 7 vs. 8

@enebo

This comment has been minimized.

Show comment
Hide comment
@enebo

enebo May 27, 2016

Member

@sri I think too you may want to add some <16 element data in here as well. Our impl did insertion sort for smaller lists.

I expect this new improved code in Java will be as good as us. I think this impl of Sort happened way back in the 1.4 days when it was trivial to micro opt anything to perform better than what they shipped.

Member

enebo commented May 27, 2016

@sri I think too you may want to add some <16 element data in here as well. Our impl did insertion sort for smaller lists.

I expect this new improved code in Java will be as good as us. I think this impl of Sort happened way back in the 1.4 days when it was trivial to micro opt anything to perform better than what they shipped.

@sri

This comment has been minimized.

Show comment
Hide comment
@sri

sri May 27, 2016

Contributor

@enebo OK added that (sri/jruby-sort-benchmark@031e618#diff-40e9e3b314e488cae33dd0064545543eR19) and updated the Readme with the latest report: https://github.com/sri/jruby-sort-benchmark/blob/master/README.md. Seems comparable. I varies a little bit each time I run it. I wonder if there is a better way to capture the results.

Contributor

sri commented May 27, 2016

@enebo OK added that (sri/jruby-sort-benchmark@031e618#diff-40e9e3b314e488cae33dd0064545543eR19) and updated the Readme with the latest report: https://github.com/sri/jruby-sort-benchmark/blob/master/README.md. Seems comparable. I varies a little bit each time I run it. I wonder if there is a better way to capture the results.

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius May 27, 2016

Member

Probably would get better results if you ran the whole benchmark in a loop, so the JVM can settle in and JIT/GC things properly.

Thanks for looking into this! If the numbers look mostly the same after all our benchmarking, I have no problem making this move. No code is good code :-)

Member

headius commented May 27, 2016

Probably would get better results if you ran the whole benchmark in a loop, so the JVM can settle in and JIT/GC things properly.

Thanks for looking into this! If the numbers look mostly the same after all our benchmarking, I have no problem making this move. No code is good code :-)

@sri

This comment has been minimized.

Show comment
Hide comment
@sri

sri May 28, 2016

Contributor

@headius I've update the Readme (https://github.com/sri/jruby-sort-benchmark/blob/master/README.md) as you suggested (https://github.com/sri/jruby-sort-benchmark/blob/master/sort_benchmark.rb#L31).

It seems to be all over the place. Maybe I should run the benkmarks on the same data? I can try that next. Currently, it generates random data for Qsort and Arrays.sort.

Also, you mentioned about matching MRI behavior -- how can I test that out?

Contributor

sri commented May 28, 2016

@headius I've update the Readme (https://github.com/sri/jruby-sort-benchmark/blob/master/README.md) as you suggested (https://github.com/sri/jruby-sort-benchmark/blob/master/sort_benchmark.rb#L31).

It seems to be all over the place. Maybe I should run the benkmarks on the same data? I can try that next. Currently, it generates random data for Qsort and Arrays.sort.

Also, you mentioned about matching MRI behavior -- how can I test that out?

@sri

This comment has been minimized.

Show comment
Hide comment
@sri

sri May 29, 2016

Contributor

OK I modified the script to load the same data for both the sorts (see the Readme).
The results look the same:

  • QSort is slightly better with random_ints (2.43s vs 2.75s) and random_floats (4.10s vs 4.69s).
  • Arrays.sort is slight better with random_objs: (10.79s vs 12.47s).

This is from the 5th run:

5th run of Qsort

                    user     system      total        real
sorted_ints     0.030000   0.000000   0.030000 (  0.026600)
random_ints     2.410000   0.000000   2.410000 (  2.411838)
random_floats   3.940000   0.000000   3.940000 (  3.941184)
random_objs    12.470000   0.000000  12.470000 ( 12.478023)
small_arrays    0.150000   0.000000   0.150000 (  0.148153)
Rehearsal -------------------------------------------------
sorted_ints     0.020000   0.000000   0.020000 (  0.023694)
random_ints     2.440000   0.000000   2.440000 (  2.434369)
random_floats   4.100000   0.000000   4.100000 (  4.103881)
random_objs    12.470000   0.000000  12.470000 ( 12.470829)
small_arrays    0.140000   0.000000   0.140000 (  0.149050)
--------------------------------------- total: 19.170000sec

5th run of Arrays.sort

                    user     system      total        real
sorted_ints     0.030000   0.000000   0.030000 (  0.022867)
random_ints     2.710000   0.000000   2.710000 (  2.719829)
random_floats   4.700000   0.010000   4.710000 (  4.711551)
random_objs    10.820000   0.000000  10.820000 ( 10.837966)
small_arrays    0.150000   0.000000   0.150000 (  0.150290)
Rehearsal -------------------------------------------------
sorted_ints     0.030000   0.000000   0.030000 (  0.025033)
random_ints     2.750000   0.000000   2.750000 (  2.756032)
random_floats   4.680000   0.010000   4.690000 (  4.691306)
random_objs    10.790000   0.000000  10.790000 ( 10.795925)
small_arrays    0.150000   0.000000   0.150000 (  0.147779)
--------------------------------------- total: 18.410000sec

It'll be great if someone else can run and confirm the results!

Contributor

sri commented May 29, 2016

OK I modified the script to load the same data for both the sorts (see the Readme).
The results look the same:

  • QSort is slightly better with random_ints (2.43s vs 2.75s) and random_floats (4.10s vs 4.69s).
  • Arrays.sort is slight better with random_objs: (10.79s vs 12.47s).

This is from the 5th run:

5th run of Qsort

                    user     system      total        real
sorted_ints     0.030000   0.000000   0.030000 (  0.026600)
random_ints     2.410000   0.000000   2.410000 (  2.411838)
random_floats   3.940000   0.000000   3.940000 (  3.941184)
random_objs    12.470000   0.000000  12.470000 ( 12.478023)
small_arrays    0.150000   0.000000   0.150000 (  0.148153)
Rehearsal -------------------------------------------------
sorted_ints     0.020000   0.000000   0.020000 (  0.023694)
random_ints     2.440000   0.000000   2.440000 (  2.434369)
random_floats   4.100000   0.000000   4.100000 (  4.103881)
random_objs    12.470000   0.000000  12.470000 ( 12.470829)
small_arrays    0.140000   0.000000   0.140000 (  0.149050)
--------------------------------------- total: 19.170000sec

5th run of Arrays.sort

                    user     system      total        real
sorted_ints     0.030000   0.000000   0.030000 (  0.022867)
random_ints     2.710000   0.000000   2.710000 (  2.719829)
random_floats   4.700000   0.010000   4.710000 (  4.711551)
random_objs    10.820000   0.000000  10.820000 ( 10.837966)
small_arrays    0.150000   0.000000   0.150000 (  0.150290)
Rehearsal -------------------------------------------------
sorted_ints     0.030000   0.000000   0.030000 (  0.025033)
random_ints     2.750000   0.000000   2.750000 (  2.756032)
random_floats   4.680000   0.010000   4.690000 (  4.691306)
random_objs    10.790000   0.000000  10.790000 ( 10.795925)
small_arrays    0.150000   0.000000   0.150000 (  0.147779)
--------------------------------------- total: 18.410000sec

It'll be great if someone else can run and confirm the results!

@sri

This comment has been minimized.

Show comment
Hide comment
@sri

sri Jun 1, 2016

Contributor

Oh, btw, I had to add this option to the JVM to get it to run without memory errors: -J-Xmx5000M (for both Arrays.sort and Qsort)

Contributor

sri commented Jun 1, 2016

Oh, btw, I had to add this option to the JVM to get it to run without memory errors: -J-Xmx5000M (for both Arrays.sort and Qsort)

sri added a commit to sri/jruby that referenced this issue Jun 9, 2016

Use Java's Arrays.sort
Java's Arrays.sort more robust and is faster is certain cases.

See jruby#3919 and
https://github.com/sri/jruby-sort-benchmark.
@enebo

This comment has been minimized.

Show comment
Hide comment
@enebo

enebo Jul 27, 2016

Member

Fixed by PR. closing.

Member

enebo commented Jul 27, 2016

Fixed by PR. closing.

@enebo enebo closed this Jul 27, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment