Skip to content

Scala perf optimizations#23

Merged
ornicar merged 4 commits intolichess-org:masterfrom
isaacl:jmhPerf
Nov 1, 2025
Merged

Scala perf optimizations#23
ornicar merged 4 commits intolichess-org:masterfrom
isaacl:jmhPerf

Conversation

@isaacl
Copy link
Copy Markdown
Member

@isaacl isaacl commented Oct 30, 2025

Major improvement for Clock encode, some small speed up for move encoding as well.

  • Remove Array.tabulate and .foreach in hot paths as they require Autoboxing.
  • Adjustments based on javap source inspection and JMH benchmarks.
                Java    Master   This
Huffman.Encode  8.57ms  8.68ms   8.27ms
Huffman.Decode  7.37ms  5.64ms   5.62ms
Clock.Encode    185ns   312ns    192ns
Clock.Decode    206ns   231ns    220ns

@isaacl isaacl marked this pull request as draft October 30, 2025 18:25
- Remove Array.tabulate and .foreach in hot paths
  as they require Autoboxing.
- Adjustments based on javap source inspection and
  JMH benchmarks.

```text
                Java    Master   This
Huffman.Encode  8.57ms  8.68ms   8.27ms
Huffman.Decode  7.37ms  5.64ms   5.62ms
Clock.Encode    185ns   312ns    192ns
Clock.Decode    206ns   231ns    220ns
```
@isaacl isaacl marked this pull request as ready for review October 30, 2025 19:28
object Encoder:

def encode(centis: Array[Int], startTime: Int): Array[Byte] =
if centis.isEmpty then return Array.emptyByteArray
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm all for optimizing hot paths, but that's not one. It's invoked only once per game. So I think we should use the idiomatic isEmpty, which btw is supposed to be inlined:

@`inline` def isEmpty: Boolean = xs.length == 0

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once per ply

var i = 0
while i < len do
writeSigned(values(i), writer)
i += 1
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a hot path, it runs once per game. I don't think we should be re-implementing scala's foreach where it can't possibly make a difference.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once per ply -- and yes, it does make a difference in the OverallEncodingTest.testEncode bench. (260->190ns/op)

@ornicar
Copy link
Copy Markdown
Contributor

ornicar commented Oct 31, 2025

b733ef4 sbt 'benchmarks/jmh:run -i 5 -wi 3 -f1 -t1 org.lichess.compression.benchmark.*'

[info] Benchmark                       Mode  Cnt     Score    Error  Units
[info] BitOpsTest.testRead             avgt    5   106.057 ±  0.401  ns/op
[info] HuffmanPgnBench.decode          avgt    5  4176.123 ± 40.581  us/op
[info] HuffmanPgnBench.encode          avgt    5  7065.805 ± 28.124  us/op
[info] LinearEstimateTest.testEncode   avgt    5    32.803 ±  0.167  ns/op
[info] LowBitTruncTest.testEncode      avgt    5     3.645 ±  0.011  ns/op
[info] OverallEncodingTest.testDecode  avgt    5   129.727 ±  1.060  ns/op
[info] OverallEncodingTest.testEncode  avgt    5   159.204 ±  3.736  ns/op
[info] VarIntEncodingTest.testDecode   avgt    5    83.967 ±  0.284  ns/op
[info] VarIntEncodingTest.testEncode   avgt    5   125.071 ±  7.903  ns/op

@ornicar
Copy link
Copy Markdown
Contributor

ornicar commented Oct 31, 2025

ffe3e19 sbt 'benchmarks/jmh:run -i 5 -wi 3 -f1 -t1 org.lichess.compression.benchmark.OverallEncodingTest'

[info] Benchmark                       Mode  Cnt    Score   Error  Units
[info] OverallEncodingTest.testDecode  avgt    5  130.638 ± 3.088  ns/op
[info] OverallEncodingTest.testEncode  avgt    5  157.204 ± 2.064  ns/op

as expected within statistical error

@ornicar
Copy link
Copy Markdown
Contributor

ornicar commented Oct 31, 2025

Also I'm rather confused because I just tried benchmarking master 877666f and got this:

877666f sbt 'benchmarks/jmh:run -i 5 -wi 3 -f1 -t1 org.lichess.compression.benchmark.OverallEncodingTest'

[info] Benchmark                       Mode  Cnt    Score   Error  Units
[info] OverallEncodingTest.testDecode  avgt    5  128.926 ± 2.996  ns/op
[info] OverallEncodingTest.testEncode  avgt    5  141.193 ± 1.757  ns/op

Showing similar/better perf than this PR.

Using openjdk 21.0.9 / Linux 6.17.2.

I'm sure this PR should be better, I just don't understand how it can't be measured on my machine.

@isaacl
Copy link
Copy Markdown
Member Author

isaacl commented Oct 31, 2025

That's odd -- I get consistently better benchmarks on my machine, most of the changes here I benchmarked individually, with the exception of a couple foreach array replacements that I did in one go.

I did see a separate odd quirk in jmh, where Huffman.encode sometimes runs with 8.5ms/run and sometimes with 5.6ms/run and restarting the run seems to flip a coin of which one it's going to be.

- remove @inline annotations which do nothing
- better line spacing for BitMask field
According to my JMH benches, these do matter.
idk

object LinearEstimator:

@inline def encode(dest: Array[Int], startTime: Int): Unit =
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since scala3 we have an inline keyword that guarantees inlining https://docs.scala-lang.org/scala3/guides/macros/inline.html#inline-methods

@ornicar ornicar merged commit 9472bdb into lichess-org:master Nov 1, 2025
1 check passed
@isaacl
Copy link
Copy Markdown
Member Author

isaacl commented Nov 6, 2025

@ornicar will investigate. Definitely not expected.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants