adriaanm
Contributor

Removed the code based on Sun JDK sorts and implemented new (basic) sorts from
scratch.  Sorting now defers to Java's `Arrays.sort` whenever practical.  The behavior of
`scala.util.Sorting` should be unchanged, but the documentation now
specifies when the Java methods are used (as they're typically very fast).
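
To illustrate what "deferring to Java" can look like, here is a minimal sketch. It is not the code in this patch: `DeferToJava`, `sortRefs`, `sortInts`, and `byI` are hypothetical names used only for illustration. It relies on the fact that `scala.math.Ordering` extends `java.util.Comparator`, so an `Ordering` can be handed straight to `java.util.Arrays.sort`.

```scala
import java.util.Arrays
import scala.math.Ordering

// Hypothetical helper object; an illustration, not the patch's implementation.
object DeferToJava {
  final case class N(i: Int, j: Int)

  // Compare on the i field only, as the benchmark's case class does.
  val byI: Ordering[N] = Ordering.by[N, Int](_.i)

  // Reference arrays: an Ordering is a java.util.Comparator, so it can be
  // passed to Arrays.sort directly. This overload is documented as stable.
  def sortRefs(xs: Array[N]): Unit = Arrays.sort(xs, byI)

  // Primitive arrays with the natural ordering: the primitive overload
  // sorts in place without boxing.
  def sortInts(xs: Array[Int]): Unit = Arrays.sort(xs)
}
```

A primitive array with a custom ordering has no direct Java counterpart, which is the case where boxing (or a hand-written sort) comes in.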

A JUnit test is provided.

Performance is important for sorts.  Every case measured is faster with this patch,
though it could be better yet, as described below.

Below are sort times (in microseconds, SEM < 5%) for arrays of small case
classes that compare on an int field (quickSort), or int arrays that use a
custom ordering (stableSort).  Arrays have 1024 elements except in the
big (64k) and tiny (16) columns.  Note: "degenerate" means there are only
16 possible values, so there are lots of ties.
All times are with fresh data (no re-using cache from run to run).

Results:

```
                        random  sorted  reverse  degenerate  big:64k  tiny:16
Old Sorting.quickSort     234     181      178         103    25,700      1.4
New Sorting.quickSort     170      27      115          74    18,600      0.8

Old Sorting.stableSort    321     234      236         282    32,600      2.1
New Sorting.stableSort    239      16      194         194    25,100      1.2

java.util.Arrays.sort     124       4        8         105    13,500      0.8
java.util.Arrays.sort|Box 126      15       13         112    13,200      0.9
```

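The new versions are uniformly faster than the old ones, but in most cases still slower than Java sorting (the exceptions in the table are quickSort on degenerate data, which beats `Arrays.sort`, and on tiny arrays, where the two tie).  `scala.util.Sorting` has use cases that don't map easily into Java unless everything is pre-boxed, but the overhead of pre-boxing is minimal compared to the sort.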
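
As a quick arithmetic check on the table, the ratios can be computed directly. This is only a sketch: the numbers are copied from the results above, while `SortTimings` and its member names are hypothetical. Pairing quickSort with the plain `Arrays.sort` row and stableSort with the boxed row is an assumption based on how the benchmark code is organized.

```scala
// Hypothetical container for the table above; values are in microseconds.
object SortTimings {
  // Column order: random, sorted, reverse, degenerate, big:64k, tiny:16
  val oldQuick  = Vector(234.0, 181.0, 178.0, 103.0, 25700.0, 1.4)
  val newQuick  = Vector(170.0,  27.0, 115.0,  74.0, 18600.0, 0.8)
  val oldStable = Vector(321.0, 234.0, 236.0, 282.0, 32600.0, 2.1)
  val newStable = Vector(239.0,  16.0, 194.0, 194.0, 25100.0, 1.2)
  val javaSort  = Vector(124.0,   4.0,   8.0, 105.0, 13500.0, 0.8)
  val javaBoxed = Vector(126.0,  15.0,  13.0, 112.0, 13200.0, 0.9)

  // Element-wise ratio of two rows.
  def ratios(num: Vector[Double], den: Vector[Double]): Vector[Double] =
    num.zip(den).map { case (n, d) => n / d }

  // Old time / new time: values above 1 mean the new sort is faster.
  val quickSpeedup: Vector[Double]  = ratios(oldQuick, newQuick)
  val stableSpeedup: Vector[Double] = ratios(oldStable, newStable)

  // New time / Java time: values above 1 mean Java is still faster.
  val quickVsJava: Vector[Double]  = ratios(newQuick, javaSort)
  val stableVsJava: Vector[Double] = ratios(newStable, javaBoxed)
}
```

Every entry of `quickSpeedup` and `stableSpeedup` is above 1. The largest gains are on already-sorted input (181/27 and 234/16).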

A snapshot of some of my benchmarking code is below.

(Yes, lots of repeating myself--it's dangerous not to when trying to get
somewhat accurate benchmarks.)

```scala
import java.util.Arrays
import java.util.Comparator
import math.Ordering
import util.Sorting
import reflect.ClassTag

val th = ichi.bench.Thyme.warmed()

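// Small case class; the sorts compare on i only.
case class N(i: Int, j: Int) {}
// Each pool holds many pre-built inputs; a benchmark cycles through its pool
// so that every run sorts fresh data.
// a: random keys
val a = Array.fill(1024)( Array.tabulate(1024)(i => N(util.Random.nextInt, i)) )
var ai = 0
// b: already sorted
val b = Array.fill(1024)( Array.tabulate(1024)(i => N(i, i)) )
var bi = 0
// c: reverse sorted
val c = Array.fill(1024)( Array.tabulate(1024)(i => N(1024-i, i)) )
var ci = 0
// d: degenerate (only 16 distinct keys, so many ties)
val d = Array.fill(1024)( Array.tabulate(1024)(i => N(util.Random.nextInt(16), i)) )
var di = 0
// e: big (64k-element arrays)
val e = Array.fill(16)( Array.tabulate(65536)(i => N(util.Random.nextInt, i)) )
var ei = 0
// f: tiny (16-element arrays)
val f = Array.fill(65535)( Array.tabulate(16)(i => N(util.Random.nextInt, i)) )
var fi = 0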

val o = new Ordering[N]{ def compare(a: N, b: N) = if (a.i < b.i) -1 else if (a.i > b.i) 1 else 0 }
for (s <- Seq("one", "two", "three")) {
  println(s)
  th.pbench{ val x = a(ai).clone; ai = (ai+1)%a.length; Sorting.quickSort(x)(o); x(x.length/3) }
  th.pbench{ val x = b(bi).clone; bi = (bi+1)%b.length; Sorting.quickSort(x)(o); x(x.length/3) }
  th.pbench{ val x = c(ci).clone; ci = (ci+1)%c.length; Sorting.quickSort(x)(o); x(x.length/3) }
  th.pbench{ val x = d(di).clone; di = (di+1)%d.length; Sorting.quickSort(x)(o); x(x.length/3) }
  th.pbench{ val x = e(ei).clone; ei = (ei+1)%e.length; Sorting.quickSort(x)(o); x(x.length/3) }
  th.pbench{ val x = f(fi).clone; fi = (fi+1)%f.length; Sorting.quickSort(x)(o); x(x.length/3) }
}

def ix(ns: Array[N]) = {
  val is = new Array[Int](ns.length)
  var i = 0
  while (i < ns.length) {
    is(i) = ns(i).i
    i += 1
  }
  is
}
val p = new Ordering[Int]{ def compare(a: Int, b: Int) = if (a > b) 1 else if (a < b) -1 else 0 }
for (s <- Seq("one", "two", "three")) {
  println(s)
  val tag: ClassTag[Int] = implicitly[ClassTag[Int]]
  th.pbench{ val x = ix(a(ai)); ai = (ai+1)%a.length; Sorting.stableSort(x)(tag, p); x(x.length/3) }
  th.pbench{ val x = ix(b(bi)); bi = (bi+1)%b.length; Sorting.stableSort(x)(tag, p); x(x.length/3) }
  th.pbench{ val x = ix(c(ci)); ci = (ci+1)%c.length; Sorting.stableSort(x)(tag, p); x(x.length/3) }
  th.pbench{ val x = ix(d(di)); di = (di+1)%d.length; Sorting.stableSort(x)(tag, p); x(x.length/3) }
  th.pbench{ val x = ix(e(ei)); ei = (ei+1)%e.length; Sorting.stableSort(x)(tag, p); x(x.length/3) }
  th.pbench{ val x = ix(f(fi)); fi = (fi+1)%f.length; Sorting.stableSort(x)(tag, p); x(x.length/3) }
}

for (s <- Seq("one", "two", "three")) {
  println(s)
  th.pbench{ val x = a(ai).clone; ai = (ai+1)%a.length; Arrays.sort(x, o); x(x.length/3) }
  th.pbench{ val x = b(bi).clone; bi = (bi+1)%b.length; Arrays.sort(x, o); x(x.length/3) }
  th.pbench{ val x = c(ci).clone; ci = (ci+1)%c.length; Arrays.sort(x, o); x(x.length/3) }
  th.pbench{ val x = d(di).clone; di = (di+1)%d.length; Arrays.sort(x, o); x(x.length/3) }
  th.pbench{ val x = e(ei).clone; ei = (ei+1)%e.length; Arrays.sort(x, o); x(x.length/3) }
  th.pbench{ val x = f(fi).clone; fi = (fi+1)%f.length; Arrays.sort(x, o); x(x.length/3) }
}

def bx(is: Array[Int]): Array[java.lang.Integer] = {
  val Is = new Array[java.lang.Integer](is.length)
  var i = 0
  while (i < is.length) {
    Is(i) = java.lang.Integer.valueOf(is(i))
    i += 1
  }
  Is
}
def xb(Is: Array[java.lang.Integer]): Array[Int] = {
  val is = new Array[Int](Is.length)
  var i = 0
  while (i < is.length) {
    is(i) = Is(i).intValue
    i += 1
  }
  is
}
// Comparator on boxed Integers, delegating to the Int ordering p.
val q = new Comparator[java.lang.Integer]{
  def compare(a: java.lang.Integer, b: java.lang.Integer) = p.compare(a.intValue, b.intValue)
}
for (s <- Seq("one", "two", "three")) {
  println(s)
  th.pbench{ val x = bx(ix(a(ai))); ai = (ai+1)%a.length; Arrays.sort(x, q); xb(x)(x.length/3) }
  th.pbench{ val x = bx(ix(b(bi))); bi = (bi+1)%b.length; Arrays.sort(x, q); xb(x)(x.length/3) }
  th.pbench{ val x = bx(ix(c(ci))); ci = (ci+1)%c.length; Arrays.sort(x, q); xb(x)(x.length/3) }
  th.pbench{ val x = bx(ix(d(di))); di = (di+1)%d.length; Arrays.sort(x, q); xb(x)(x.length/3) }
  th.pbench{ val x = bx(ix(e(ei))); ei = (ei+1)%e.length; Arrays.sort(x, q); xb(x)(x.length/3) }
  th.pbench{ val x = bx(ix(f(fi))); fi = (fi+1)%f.length; Arrays.sort(x, q); xb(x)(x.length/3) }
}
```
@adriaanm
Contributor Author

Backport of #4534

adriaanm added a commit that referenced this pull request Jun 16, 2015
Clean implementation of sorts for scala.util.Sorting.
@adriaanm adriaanm merged commit 40b28c4 into scala:2.10.x Jun 16, 2015
@adriaanm adriaanm deleted the mcpasterton-2.10 branch August 5, 2015 01:29
@adriaanm adriaanm added this to the 2.10.6 milestone Sep 18, 2015
@adriaanm adriaanm changed the title from "Clean implementation of sorts for scala.util.Sorting." to "[Backport] Clean implementation of sorts for scala.util.Sorting." Sep 18, 2015