New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clean implementation of sorts for scala.util.Sorting. #4534

Merged
merged 1 commit into from Jun 16, 2015

Conversation

Projects
None yet
6 participants
@Ichoran
Contributor

Ichoran commented May 31, 2015

Removed code based on Sun JDK sorts and implemented new (basic) sorts from
scratch. Deferred to Java Arrays.sort whenever practical. Behavior of
scala.util.Sorting should be unchanged, but changed documentation to
specify when the Java methods are being used (as they're typically very fast).

A JUnit test is provided.

Performance is important for sorts. Everything is better with this patch,
though it could be better yet, as described below.

Below are sort times (in microseconds, SEM < 5%) for various 1024-element
arrays of small case classes that compare on an int field (quickSort), or
int arrays that use custom ordering (stableSort). Note: "degenerate"
means there are only 16 values possible, so there are lots of ties.
Times are all with fresh data (no re-using cache from run to run).

Results:

                        random  sorted  reverse  degenerate  big:64k  tiny:16
Old Sorting.quickSort     234     181      178         103    25,700      1.4
New Sorting.quickSort     170      27      115          74    18,600      0.8

Old Sorting.stableSort    321     234      236         282    32,600      2.1
New Sorting.stableSort    239      16      194         194    25,100      1.2

java.util.Arrays.sort     124       4        8         105    13,500      0.8
java.util.Arrays.sort|Box 126      15       13         112    13,200      0.9

The new versions are uniformly faster, but uniformly slower than Java sorting. scala.util.Sorting has use cases that don't map easily in to Java unless everything is pre-boxed, but the overhead of pre-boxing is minimal compared to the sort.

A snapshot of some of my benchmarking code is below.

(Yes, lots of repeating myself--it's dangerous not to when trying to get
somewhat accurate benchmarks.)

import java.util.Arrays
import java.util.Comparator
import math.Ordering
import util.Sorting
import reflect.ClassTag

val th = ichi.bench.Thyme.warmed()

case class N(i: Int, j: Int) {}
val a = Array.fill(1024)( Array.tabulate(1024)(i => N(util.Random.nextInt, i)) )
var ai = 0
val b = Array.fill(1024)( Array.tabulate(1024)(i => N(i, i)) )
var bi = 0
val c = Array.fill(1024)( Array.tabulate(1024)(i => N(1024-i, i)) )
var ci = 0
val d = Array.fill(1024)( Array.tabulate(1024)(i => N(util.Random.nextInt(16), i)) )
var di = 0
val e = Array.fill(16)( Array.tabulate(65536)(i => N(util.Random.nextInt, i)) )
var ei = 0
val f = Array.fill(65535)( Array.tabulate(16)(i => N(util.Random.nextInt, i)) )
var fi = 0

val o = new Ordering[N]{ def compare(a: N, b: N) = if (a.i < b.i) -1 else if (a.i > b.i) 1 else 0 }
for (s <- Seq("one", "two", "three")) {
  println(s)
  th.pbench{ val x = a(ai).clone; ai = (ai+1)%a.length; Sorting.quickSort(x)(o); x(x.length/3) }
  th.pbench{ val x = b(bi).clone; bi = (bi+1)%b.length; Sorting.quickSort(x)(o); x(x.length/3) }
  th.pbench{ val x = c(ci).clone; ci = (ci+1)%c.length; Sorting.quickSort(x)(o); x(x.length/3) }
  th.pbench{ val x = d(di).clone; di = (di+1)%d.length; Sorting.quickSort(x)(o); x(x.length/3) }
  th.pbench{ val x = e(ei).clone; ei = (ei+1)%e.length; Sorting.quickSort(x)(o); x(x.length/3) }
  th.pbench{ val x = f(fi).clone; fi = (fi+1)%f.length; Sorting.quickSort(x)(o); x(x.length/3) }
}

def ix(ns: Array[N]) = {
  val is = new Array[Int](ns.length)
  var i = 0
  while (i < ns.length) {
    is(i) = ns(i).i
    i += 1
  }
  is
}
val p = new Ordering[Int]{ def compare(a: Int, b: Int) = if (a > b) 1 else if (a < b) -1 else 0 }
for (s <- Seq("one", "two", "three")) {
  println(s)
  val tag: ClassTag[Int] = implicitly[ClassTag[Int]]
  th.pbench{ val x = ix(a(ai)); ai = (ai+1)%a.length; Sorting.stableSort(x)(tag, p); x(x.length/3) }
  th.pbench{ val x = ix(b(bi)); bi = (bi+1)%b.length; Sorting.stableSort(x)(tag, p); x(x.length/3) }
  th.pbench{ val x = ix(c(ci)); ci = (ci+1)%c.length; Sorting.stableSort(x)(tag, p); x(x.length/3) }
  th.pbench{ val x = ix(d(di)); di = (di+1)%d.length; Sorting.stableSort(x)(tag, p); x(x.length/3) }
  th.pbench{ val x = ix(e(ei)); ei = (ei+1)%e.length; Sorting.stableSort(x)(tag, p); x(x.length/3) }
  th.pbench{ val x = ix(f(fi)); fi = (fi+1)%f.length; Sorting.stableSort(x)(tag, p); x(x.length/3) }
}

for (s <- Seq("one", "two", "three")) {
  println(s)
  th.pbench{ val x = a(ai).clone; ai = (ai+1)%a.length; Arrays.sort(x, o); x(x.length/3) }
  th.pbench{ val x = b(bi).clone; bi = (bi+1)%b.length; Arrays.sort(x, o); x(x.length/3) }
  th.pbench{ val x = c(ci).clone; ci = (ci+1)%c.length; Arrays.sort(x, o); x(x.length/3) }
  th.pbench{ val x = d(di).clone; di = (di+1)%d.length; Arrays.sort(x, o); x(x.length/3) }
  th.pbench{ val x = e(ei).clone; ei = (ei+1)%e.length; Arrays.sort(x, o); x(x.length/3) }
  th.pbench{ val x = f(fi).clone; fi = (fi+1)%f.length; Arrays.sort(x, o); x(x.length/3) }
}

def bx(is: Array[Int]): Array[java.lang.Integer] = {
  val Is = new Array[java.lang.Integer](is.length)
  var i = 0
  while (i < is.length) {
    Is(i) = java.lang.Integer.valueOf(is(i))
    i += 1
  }
  Is
}
def xb(Is: Array[java.lang.Integer]): Array[Int] = {
  val is = new Array[Int](Is.length)
  var i = 0
  while (i < is.length) {
    is(i) = Is(i).intValue
    i += 1
  }
  is
}
val q = new Comparator[java.lang.Integer]{
  def compare(a: java.lang.Integer, b: java.lang.Integer) = o.compare(a.intValue, b.intValue)
}
for (s <- Seq("one", "two", "three")) {
  println(s)
  val tag: ClassTag[Int] = implicitly[ClassTag[Int]]
  th.pbench{ val x = bx(ix(a(ai))); ai = (ai+1)%a.length; Arrays.sort(x, q); xb(x)(x.length/3) }
  th.pbench{ val x = bx(ix(b(bi))); bi = (bi+1)%b.length; Arrays.sort(x, q); xb(x)(x.length/3) }
  th.pbench{ val x = bx(ix(c(ci))); ci = (ci+1)%c.length; Arrays.sort(x, q); xb(x)(x.length/3) }
  th.pbench{ val x = bx(ix(d(di))); di = (di+1)%d.length; Arrays.sort(x, q); xb(x)(x.length/3) }
  th.pbench{ val x = bx(ix(e(ei))); ei = (ei+1)%e.length; Arrays.sort(x, q); xb(x)(x.length/3) }
  th.pbench{ val x = bx(ix(f(fi))); fi = (fi+1)%f.length; Arrays.sort(x, q); xb(x)(x.length/3) }
}

@scala-jenkins scala-jenkins added this to the 2.11.7 milestone May 31, 2015

@adriaanm

This comment has been minimized.

Show comment
Hide comment
@adriaanm

adriaanm Jun 1, 2015

Member

Excellent!

Looks like most of the binary incompatibilities are because the private specialized methods are being made protected:

        // specialized members have to be overridable.
        if (m.isPrivate)
          m.resetFlag(PRIVATE).setFlag(PROTECTED)

It sounds like we should be able to restrict this to members that are actually overridable? (if (m.isOverridableMember)) /cc @retronym, @lrytz, @VladUreche

EDIT: well, something inspired by isOverridableMember, since it's not exactly what we need here (maybe m.isPrivate && !((m hasFlag FINAL) || m.owner.isEffectivelyFinal)?)

Member

adriaanm commented Jun 1, 2015

Excellent!

Looks like most of the binary incompatibilities are because the private specialized methods are being made protected:

        // specialized members have to be overridable.
        if (m.isPrivate)
          m.resetFlag(PRIVATE).setFlag(PROTECTED)

It sounds like we should be able to restrict this to members that are actually overridable? (if (m.isOverridableMember)) /cc @retronym, @lrytz, @VladUreche

EDIT: well, something inspired by isOverridableMember, since it's not exactly what we need here (maybe m.isPrivate && !((m hasFlag FINAL) || m.owner.isEffectivelyFinal)?)

@retronym

This comment has been minimized.

Show comment
Hide comment
@retronym

retronym Jun 1, 2015

Member

@adriaanm IIUC, the specialized subclass itself needs to override the formerly-private, ala: https://gist.github.com/retronym/69d44047de3ded4fae57

Member

retronym commented Jun 1, 2015

@adriaanm IIUC, the specialized subclass itself needs to override the formerly-private, ala: https://gist.github.com/retronym/69d44047de3ded4fae57

@Ichoran

This comment has been minimized.

Show comment
Hide comment
@Ichoran

Ichoran Jun 1, 2015

Contributor

@adriaanm @retronym - Can I just mark everything as whitelisted since it's not part of the public API? That's the usual strategy isn't it?

Contributor

Ichoran commented Jun 1, 2015

@adriaanm @retronym - Can I just mark everything as whitelisted since it's not part of the public API? That's the usual strategy isn't it?

@adriaanm

This comment has been minimized.

Show comment
Hide comment
@adriaanm

adriaanm Jun 1, 2015

Member

Yep, I think it's safe to whitelist these synthetics. They're in an object. I still think we could keep the specialized members private since there can't be a subclass for the module class of which they are a member.

Member

adriaanm commented Jun 1, 2015

Yep, I think it's safe to whitelist these synthetics. They're in an object. I still think we could keep the specialized members private since there can't be a subclass for the module class of which they are a member.

@adriaanm

This comment has been minimized.

Show comment
Hide comment
@adriaanm

adriaanm Jun 1, 2015

Member

I guess we could go from 13 whitelisted to 12 if we skipped the default argument for mergeSort, but I don't think it's worth changing.

Member

adriaanm commented Jun 1, 2015

I guess we could go from 13 whitelisted to 12 if we skipped the default argument for mergeSort, but I don't think it's worth changing.

@Ichoran

This comment has been minimized.

Show comment
Hide comment
@Ichoran

Ichoran Jun 1, 2015

Contributor

@adriaanm - If we change the compiler to be more private here, it will probably cause other nominal binary compatibility issues as things that were visible disappear. It's something to do for 2.12, I guess; I agree that there's no reason for them to be protected.

Contributor

Ichoran commented Jun 1, 2015

@adriaanm - If we change the compiler to be more private here, it will probably cause other nominal binary compatibility issues as things that were visible disappear. It's something to do for 2.12, I guess; I agree that there's no reason for them to be protected.

@adriaanm

This comment has been minimized.

Show comment
Hide comment
@adriaanm

adriaanm Jun 1, 2015

Member

@Ichoran, agreed -- we can't do this in 2.11, but would be small win for binary compatibility in objects in 2.12.

Member

adriaanm commented Jun 1, 2015

@Ichoran, agreed -- we can't do this in 2.11, but would be small win for binary compatibility in objects in 2.12.

Clean implementation of sorts for scala.util.Sorting.
Removed code based on Sun JDK sorts and implemented new (basic) sorts from
scratch.  Deferred to Java Arrays.sort whenever practical.  Behavior of
`scala.util.Sorting` should be unchanged, but changed documentation to
specify when the Java methods are being used (as they're typically very fast).

A JUnit test is provided.

Performance is important for sorts.  Everything is better with this patch,
though it could be better yet, as described below.

Below are sort times (in microseconds, SEM < 5%) for various 1024-element
arrays of small case classes that compare on an int field (quickSort), or
int arrays that use custom ordering (stableSort).  Note: "degenerate"
means there are only 16 values possible, so there are lots of ties.
Times are all with fresh data (no re-using cache from run to run).

Results:

```
                        random  sorted  reverse  degenerate  big:64k  tiny:16
Old Sorting.quickSort     234     181      178         103    25,700      1.4
New Sorting.quickSort     170      27      115          74    18,600      0.8

Old Sorting.stableSort    321     234      236         282    32,600      2.1
New Sorting.stableSort    239      16      194         194    25,100      1.2

java.util.Arrays.sort     124       4        8         105    13,500      0.8
java.util.Arrays.sort|Box 126      15       13         112    13,200      0.9
```

The new versions are uniformly faster, but uniformly slower than Java sorting.  scala.util.Sorting has use cases that don't map easily in to Java unless everything is pre-boxed, but the overhead of pre-boxing is minimal compared to the sort.

A snapshot of some of my benchmarking code is below.

(Yes, lots of repeating myself--it's dangerous not to when trying to get
somewhat accurate benchmarks.)

```
import java.util.Arrays
import java.util.Comparator
import math.Ordering
import util.Sorting
import reflect.ClassTag

val th = ichi.bench.Thyme.warmed()

case class N(i: Int, j: Int) {}
val a = Array.fill(1024)( Array.tabulate(1024)(i => N(util.Random.nextInt, i)) )
var ai = 0
val b = Array.fill(1024)( Array.tabulate(1024)(i => N(i, i)) )
var bi = 0
val c = Array.fill(1024)( Array.tabulate(1024)(i => N(1024-i, i)) )
var ci = 0
val d = Array.fill(1024)( Array.tabulate(1024)(i => N(util.Random.nextInt(16), i)) )
var di = 0
val e = Array.fill(16)( Array.tabulate(65536)(i => N(util.Random.nextInt, i)) )
var ei = 0
val f = Array.fill(65535)( Array.tabulate(16)(i => N(util.Random.nextInt, i)) )
var fi = 0

val o = new Ordering[N]{ def compare(a: N, b: N) = if (a.i < b.i) -1 else if (a.i > b.i) 1 else 0 }
for (s <- Seq("one", "two", "three")) {
  println(s)
  th.pbench{ val x = a(ai).clone; ai = (ai+1)%a.length; Sorting.quickSort(x)(o); x(x.length/3) }
  th.pbench{ val x = b(bi).clone; bi = (bi+1)%b.length; Sorting.quickSort(x)(o); x(x.length/3) }
  th.pbench{ val x = c(ci).clone; ci = (ci+1)%c.length; Sorting.quickSort(x)(o); x(x.length/3) }
  th.pbench{ val x = d(di).clone; di = (di+1)%d.length; Sorting.quickSort(x)(o); x(x.length/3) }
  th.pbench{ val x = e(ei).clone; ei = (ei+1)%e.length; Sorting.quickSort(x)(o); x(x.length/3) }
  th.pbench{ val x = f(fi).clone; fi = (fi+1)%f.length; Sorting.quickSort(x)(o); x(x.length/3) }
}

def ix(ns: Array[N]) = {
  val is = new Array[Int](ns.length)
  var i = 0
  while (i < ns.length) {
    is(i) = ns(i).i
    i += 1
  }
  is
}
val p = new Ordering[Int]{ def compare(a: Int, b: Int) = if (a > b) 1 else if (a < b) -1 else 0 }
for (s <- Seq("one", "two", "three")) {
  println(s)
  val tag: ClassTag[Int] = implicitly[ClassTag[Int]]
  th.pbench{ val x = ix(a(ai)); ai = (ai+1)%a.length; Sorting.stableSort(x)(tag, p); x(x.length/3) }
  th.pbench{ val x = ix(b(bi)); bi = (bi+1)%b.length; Sorting.stableSort(x)(tag, p); x(x.length/3) }
  th.pbench{ val x = ix(c(ci)); ci = (ci+1)%c.length; Sorting.stableSort(x)(tag, p); x(x.length/3) }
  th.pbench{ val x = ix(d(di)); di = (di+1)%d.length; Sorting.stableSort(x)(tag, p); x(x.length/3) }
  th.pbench{ val x = ix(e(ei)); ei = (ei+1)%e.length; Sorting.stableSort(x)(tag, p); x(x.length/3) }
  th.pbench{ val x = ix(f(fi)); fi = (fi+1)%f.length; Sorting.stableSort(x)(tag, p); x(x.length/3) }
}

for (s <- Seq("one", "two", "three")) {
  println(s)
  th.pbench{ val x = a(ai).clone; ai = (ai+1)%a.length; Arrays.sort(x, o); x(x.length/3) }
  th.pbench{ val x = b(bi).clone; bi = (bi+1)%b.length; Arrays.sort(x, o); x(x.length/3) }
  th.pbench{ val x = c(ci).clone; ci = (ci+1)%c.length; Arrays.sort(x, o); x(x.length/3) }
  th.pbench{ val x = d(di).clone; di = (di+1)%d.length; Arrays.sort(x, o); x(x.length/3) }
  th.pbench{ val x = e(ei).clone; ei = (ei+1)%e.length; Arrays.sort(x, o); x(x.length/3) }
  th.pbench{ val x = f(fi).clone; fi = (fi+1)%f.length; Arrays.sort(x, o); x(x.length/3) }
}

def bx(is: Array[Int]): Array[java.lang.Integer] = {
  val Is = new Array[java.lang.Integer](is.length)
  var i = 0
  while (i < is.length) {
    Is(i) = java.lang.Integer.valueOf(is(i))
    i += 1
  }
  Is
}
def xb(Is: Array[java.lang.Integer]): Array[Int] = {
  val is = new Array[Int](Is.length)
  var i = 0
  while (i < is.length) {
    is(i) = Is(i).intValue
    i += 1
  }
  is
}
val q = new Comparator[java.lang.Integer]{
  def compare(a: java.lang.Integer, b: java.lang.Integer) = o.compare(a.intValue, b.intValue)
}
for (s <- Seq("one", "two", "three")) {
  println(s)
  val tag: ClassTag[Int] = implicitly[ClassTag[Int]]
  th.pbench{ val x = bx(ix(a(ai))); ai = (ai+1)%a.length; Arrays.sort(x, q); xb(x)(x.length/3) }
  th.pbench{ val x = bx(ix(b(bi))); bi = (bi+1)%b.length; Arrays.sort(x, q); xb(x)(x.length/3) }
  th.pbench{ val x = bx(ix(c(ci))); ci = (ci+1)%c.length; Arrays.sort(x, q); xb(x)(x.length/3) }
  th.pbench{ val x = bx(ix(d(di))); di = (di+1)%d.length; Arrays.sort(x, q); xb(x)(x.length/3) }
  th.pbench{ val x = bx(ix(e(ei))); ei = (ei+1)%e.length; Arrays.sort(x, q); xb(x)(x.length/3) }
  th.pbench{ val x = bx(ix(f(fi))); fi = (fi+1)%f.length; Arrays.sort(x, q); xb(x)(x.length/3) }
}
```
private def booleanSort(a: Array[Boolean]): Unit = {
var i = 0
var n = 0
while (i < a.length) {

This comment has been minimized.

@Arneball

Arneball Jun 2, 2015

Contributor

Wouldn't it be beneficial to pull out a.length in a local variable so that we don't need to do arraylength in every iteration?

@Arneball

Arneball Jun 2, 2015

Contributor

Wouldn't it be beneficial to pull out a.length in a local variable so that we don't need to do arraylength in every iteration?

This comment has been minimized.

@Ichoran

Ichoran Jun 3, 2015

Contributor

Not really. The JVM is very good at optimizing this pattern. It knows it's allowed, as array.length is immutable.

@Ichoran

Ichoran Jun 3, 2015

Contributor

Not really. The JVM is very good at optimizing this pattern. It knows it's allowed, as array.length is immutable.

@adriaanm

This comment has been minimized.

Show comment
Hide comment
@adriaanm

adriaanm Jun 16, 2015

Member

LGTM -- @Ichoran, you should probably sign the CLA online at some point :-)

Member

adriaanm commented Jun 16, 2015

LGTM -- @Ichoran, you should probably sign the CLA online at some point :-)

adriaanm added a commit that referenced this pull request Jun 16, 2015

Merge pull request #4534 from Ichoran/sorting-reimpl
Clean implementation of sorts for scala.util.Sorting.

@adriaanm adriaanm merged commit 43a56fb into scala:2.11.x Jun 16, 2015

4 of 5 checks passed

cla @Ichoran, please sign the Scala CLA by clicking on 'Details' -->
Details
integrate-ide [997] SUCCESS. Took 88 min.
Details
validate-main [1329] SUCCESS. Took 105 min.
Details
validate-publish-core [1269] SUCCESS. Took 11 min.
Details
validate-test [824] SUCCESS. Took 50 min.
Details
@paulp

This comment has been minimized.

Show comment
Hide comment
@paulp

paulp Jun 19, 2015

Contributor

Since I left this comment on an "outdated diff", once more for general consumption:

array pattern must respect array invariance

In which you (adriaan) yourself state in the category of things to be done:

implement invariant matching for arrays (including updating the analyses)

Contributor

paulp commented Jun 19, 2015

Since I left this comment on an "outdated diff", once more for general consumption:

array pattern must respect array invariance

In which you (adriaan) yourself state in the category of things to be done:

implement invariant matching for arrays (including updating the analyses)

@paulp

This comment has been minimized.

Show comment
Hide comment
@paulp

paulp Jun 19, 2015

Contributor

That's a good candidate ticket to be abandoned when you declare jira bankruptcy. Correctness deteriorates over time.

Contributor

paulp commented Jun 19, 2015

That's a good candidate ticket to be abandoned when you declare jira bankruptcy. Correctness deteriorates over time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment