Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not rely on `ArrayIndexOOBE` to throw `IndexOOBE` in `Vector`. #8828

Closed
wants to merge 1 commit into from

Conversation

@sjrd
Copy link
Member

sjrd commented Mar 26, 2020

This is necessary for Scala.js, where ArrayIndexOutOfBoundsExceptions are Undefined Behavior, and cannot be reliably caught.

I have not checked the consequences in terms of performance for the JVM. We may have to tweak whether we should inline checkIOOB, or only the test part of it but not the throwing, etc. Even the best check may be worse than the try..catch on the JVM.


This combines with #8827 to fix scala/bug#11920. This PR may have more JVM impact than #8827.

This is necessary for Scala.js, where
`ArrayIndexOutOfBoundsException`s are Undefined Behavior, and
cannot be reliably caught.

I have not checked the consequences in terms of performance for the
JVM. We may have to tweak whether we should inline `checkIOOB`, or
only the test part of it but not the throwing, etc. Even the best
check may be worse than the `try..catch` on the JVM.
@szeiger

This comment has been minimized.

Copy link
Contributor

szeiger commented Mar 26, 2020

Any check will be worse (because the exception-catching version is free). I expect we need to inline the check but not the throwing for best performance and the effect on apply will be in the lower single-digit percents. I'll run some benchmarks for this and a possible further optimization that I discussed with @retronym on the JVM.

@szeiger

This comment has been minimized.

Copy link
Contributor

szeiger commented Mar 26, 2020

First results:

2.13.x baseline:

[info] Benchmark                           (size)  Mode  Cnt        Score        Error  Units
[info] VectorBenchmark2.nvApplySequential       1  avgt   20  6016713.593 ± 123096.822  ns/op
[info] VectorBenchmark2.nvApplySequential      10  avgt   20  6209563.860 ± 121489.124  ns/op
[info] VectorBenchmark2.nvApplySequential     100  avgt   20  7114981.710 ±  84128.685  ns/op
[info] VectorBenchmark2.nvApplySequential    1000  avgt   20  7174637.194 ±  30391.504  ns/op
[info] VectorBenchmark2.nvApplySequential   10000  avgt   20  7849462.676 ±  61830.879  ns/op
[info] VectorBenchmark2.nvApplySequential   50000  avgt   20  8016510.942 ±  11357.268  ns/op

This PR:

[info] Benchmark                           (size)  Mode  Cnt        Score        Error  Units
[info] VectorBenchmark2.nvApplySequential       1  avgt   20  6350917.256 ± 178890.863  ns/op
[info] VectorBenchmark2.nvApplySequential      10  avgt   20  6436052.663 ± 145455.158  ns/op
[info] VectorBenchmark2.nvApplySequential     100  avgt   20  7549449.738 ±  95150.794  ns/op
[info] VectorBenchmark2.nvApplySequential    1000  avgt   20  7574626.497 ±  52598.701  ns/op
[info] VectorBenchmark2.nvApplySequential   10000  avgt   20  8224562.928 ±  28936.503  ns/op
[info] VectorBenchmark2.nvApplySequential   50000  avgt   20  8379022.801 ±   4142.129  ns/op

Changing back from throwIOOB to ioob with external throws, inlining the checks and calling ioob in the inlined methods yields exactly the same performance.

@sjrd

This comment has been minimized.

Copy link
Member Author

sjrd commented Mar 26, 2020

Changing back from throwIOOB to ioob with external throws, inlining the checks and calling ioob in the inlined methods yields exactly the same performance.

The same performance as "This PR", I assume?

Overall a 5% performance hit :(

@szeiger

This comment has been minimized.

Copy link
Contributor

szeiger commented Mar 26, 2020

Yes, same as this PR. The result is pretty much what I expected. Getting HotSpot to elide its own range check looks promising in Vector1:

  @inline def apply(index: Int): A = {
    if(index >= 0 && index < prefix1.length)
      prefix1(index).asInstanceOf[A]
    else throw ioob(index)
  }

Still worse than the current version but better than the fancy range check in this PR (which may be faster on its own but doesn't allow HotSpot to recognize that the index is valid):

[info] Benchmark                           (size)  Mode  Cnt        Score        Error  Units
[info] VectorBenchmark2.nvApplySequential       1  avgt   20  6226291.418 ± 123640.947  ns/op
[info] VectorBenchmark2.nvApplySequential      10  avgt   20  6335493.456 ±  98708.392  ns/op

Currently running more benchmarks to try the masking trick (https://mail.openjdk.java.net/pipermail/valhalla-dev/2015-August/001269.html) for some of the arrays in Vector2 and Vector3.

@szeiger

This comment has been minimized.

Copy link
Contributor

szeiger commented Mar 26, 2020

I couldn't get the masking to work, it only made things slower. This is the best I managed to do:

[info] Benchmark                           (size)  Mode  Cnt        Score        Error  Units
[info] VectorBenchmark2.nvApplySequential       1  avgt   20  6235825.540 ± 130580.064  ns/op
[info] VectorBenchmark2.nvApplySequential      10  avgt   20  6334201.526 ±  97422.230  ns/op
[info] VectorBenchmark2.nvApplySequential     100  avgt   20  7369466.304 ±  58009.546  ns/op
[info] VectorBenchmark2.nvApplySequential    1000  avgt   20  7350811.983 ±  45198.280  ns/op
[info] VectorBenchmark2.nvApplySequential   10000  avgt   20  8015893.183 ±  36771.418  ns/op
[info] VectorBenchmark2.nvApplySequential   50000  avgt   20  8423561.511 ±  10278.333  ns/op
@retronym

This comment has been minimized.

Copy link
Member

retronym commented Mar 26, 2020

@szeiger Could you try the variations with the array field first being assigned to a local in the method? I'm not sure if HotSpot relates this.arrayField.length with a later invocation of this.arrayField[posititiveIStaticallyLessThanLength]. Java enhanced for loops over arrays are desugared to use a local temporary val for this reason.

Although, this suggests, albeit in a slightly different context of loop transforms, that:

A loop-invariant value can be an object field, as long as the compiler can prove the field isn't changed

@viktorklang

This comment has been minimized.

Copy link
Contributor

viktorklang commented Mar 26, 2020

@retronym Good call. Assigning to a local final should make life easier for the optimizer, and I'm curious to see if that is the case here.

@szeiger

This comment has been minimized.

Copy link
Contributor

szeiger commented Mar 27, 2020

It was done with locals. The optimization doesn't apply to the arrays in the fields anyway because they are not power of 2 sizes. It would only work for the nested arrays.

For example, the changes for this in Vector2.apply are:

        val io = index - len1
        val i2 = io >>> BITS
        //val i1 = io & MASK
        //if(i2 != data2.length) data2(i2)(i1)
        if(i2 != data2.length) {
          val d2i2 = data2(i2)
          d2i2(io & (d2i2.length-1))
        }
        else suffix1(i1)
@szeiger

This comment has been minimized.

Copy link
Contributor

szeiger commented Mar 27, 2020

And in case you were wondering about Vector1 which has the if(i >= 0 && i < a.length) check that HotSpot recognizes: it works fine with the final field, using a local variable does not make any difference in the performance (but switching to the unrecognized unsigned comparison does!)

@szeiger

This comment has been minimized.

Copy link
Contributor

szeiger commented Mar 31, 2020

Closing in favor of #8829

@szeiger szeiger closed this Mar 31, 2020
@sjrd sjrd deleted the sjrd:fix-ioob-vector-scalajs branch Mar 31, 2020
@SethTisue SethTisue removed this from the 2.13.2 milestone Mar 31, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

6 participants
You can’t perform that action at this time.