Scala perf optimizations by isaacl · Pull Request #23 · lichess-org/compression

isaacl · 2025-10-30T18:20:16Z

Major improvement for Clock encode, plus a small speedup for move encoding.

- Remove Array.tabulate and .foreach in hot paths, as they require autoboxing.
- Adjustments are based on javap inspection of the compiled bytecode and JMH benchmarks.

                Java    Master   This
Huffman.Encode  8.57ms  8.68ms   8.27ms
Huffman.Decode  7.37ms  5.64ms   5.62ms
Clock.Encode    185ns   312ns    192ns
Clock.Decode    206ns   231ns    220ns
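
For illustration, the kind of rewrite described above might look like the following. This is only a sketch: squaresTabulate, squaresLoop, and sumLoop are made-up names, not code from this PR, and whether boxing actually occurs depends on the Scala version and on what the JIT manages to eliminate.

```scala
// Hypothetical example, not taken from the PR.

// Collection-style version: the lambda is passed through generic library code.
def squaresTabulate(n: Int): Array[Int] =
  Array.tabulate(n)(i => i * i)

// Hand-written version: a primitive array filled by a plain while loop.
def squaresLoop(n: Int): Array[Int] =
  val out = new Array[Int](n)
  var i   = 0
  while i < n do
    out(i) = i * i
    i += 1
  out

// Hand-written replacement for xs.foreach(x => sum += x).
def sumLoop(xs: Array[Int]): Int =
  var sum = 0
  var i   = 0
  while i < xs.length do
    sum += xs(i)
    i += 1
  sum
```

Both squares functions return the same array; only the loop shape differs, which is what the javap and JMH comparison is meant to expose.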

ornicar · 2025-10-31T07:23:04Z

 object Encoder:

  def encode(centis: Array[Int], startTime: Int): Array[Byte] =
-    if centis.isEmpty then return Array.emptyByteArray


I'm all for optimizing hot paths, but that's not one. It's invoked only once per game. So I think we should use the idiomatic isEmpty, which btw is supposed to be inlined:

@`inline` def isEmpty: Boolean = xs.length == 0

isaacl · 2025-10-31

Once per ply.

ornicar · 2025-10-31T07:31:07Z

+    var i   = 0
+    while i < len do
+      writeSigned(values(i), writer)
+      i += 1


This is not a hot path, it runs once per game. I don't think we should be re-implementing scala's foreach where it can't possibly make a difference.

isaacl · 2025-10-31

Once per ply, and yes, it does make a difference in the OverallEncodingTest.testEncode bench (260 -> 190 ns/op).

ornicar · 2025-10-31T07:32:48Z

b733ef4 sbt 'benchmarks/jmh:run -i 5 -wi 3 -f1 -t1 org.lichess.compression.benchmark.*'

[info] Benchmark                       Mode  Cnt     Score    Error  Units
[info] BitOpsTest.testRead             avgt    5   106.057 ±  0.401  ns/op
[info] HuffmanPgnBench.decode          avgt    5  4176.123 ± 40.581  us/op
[info] HuffmanPgnBench.encode          avgt    5  7065.805 ± 28.124  us/op
[info] LinearEstimateTest.testEncode   avgt    5    32.803 ±  0.167  ns/op
[info] LowBitTruncTest.testEncode      avgt    5     3.645 ±  0.011  ns/op
[info] OverallEncodingTest.testDecode  avgt    5   129.727 ±  1.060  ns/op
[info] OverallEncodingTest.testEncode  avgt    5   159.204 ±  3.736  ns/op
[info] VarIntEncodingTest.testDecode   avgt    5    83.967 ±  0.284  ns/op
[info] VarIntEncodingTest.testEncode   avgt    5   125.071 ±  7.903  ns/op

ornicar · 2025-10-31T07:56:41Z

ffe3e19 sbt 'benchmarks/jmh:run -i 5 -wi 3 -f1 -t1 org.lichess.compression.benchmark.OverallEncodingTest'

[info] Benchmark                       Mode  Cnt    Score   Error  Units
[info] OverallEncodingTest.testDecode  avgt    5  130.638 ± 3.088  ns/op
[info] OverallEncodingTest.testEncode  avgt    5  157.204 ± 2.064  ns/op

As expected, the difference from b733ef4 is within statistical error.

ornicar · 2025-10-31T08:07:01Z

Also I'm rather confused because I just tried benchmarking master 877666f and got this:

877666f sbt 'benchmarks/jmh:run -i 5 -wi 3 -f1 -t1 org.lichess.compression.benchmark.OverallEncodingTest'

[info] Benchmark                       Mode  Cnt    Score   Error  Units
[info] OverallEncodingTest.testDecode  avgt    5  128.926 ± 2.996  ns/op
[info] OverallEncodingTest.testEncode  avgt    5  141.193 ± 1.757  ns/op

That shows similar or better performance than this PR.

Using openjdk 21.0.9 / Linux 6.17.2.

I'm sure this PR should be better; I just don't understand why that can't be measured on my machine.
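
One rough way to read "within statistical error" is to check whether the reported Score ± Error intervals overlap. The sketch below applies that check to the testEncode numbers quoted in this thread. The Result case class and the overlaps helper are illustrative names, and interval overlap is only a heuristic, not a formal significance test.

```scala
// Illustrative helper; scores and errors are copied from the JMH output quoted above (ns/op).
final case class Result(label: String, score: Double, error: Double):
  def low: Double  = score - error
  def high: Double = score + error

// Two results are treated as indistinguishable when their intervals overlap.
def overlaps(a: Result, b: Result): Boolean =
  a.low <= b.high && b.low <= a.high

val prFirst  = Result("b733ef4 testEncode", 159.204, 3.736)
val prSecond = Result("ffe3e19 testEncode", 157.204, 2.064)
val master   = Result("877666f testEncode", 141.193, 1.757)

val prRunsAgree   = overlaps(prFirst, prSecond) // true: [155.468, 162.940] vs [155.140, 159.268]
val masterDiffers = !overlaps(prSecond, master) // true: master tops out at 142.950
```

On these figures the two PR commits are indistinguishable, while master measures faster on this machine by more than the reported error.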

isaacl · 2025-10-31T17:01:07Z

That's odd. I get consistently better benchmarks on my machine. I benchmarked most of the changes here individually, with the exception of a couple of foreach array replacements that I did in one go.

I did see a separate odd quirk in JMH: Huffman.encode sometimes runs at 8.5 ms/run and sometimes at 5.6 ms/run, and restarting the run seems to flip a coin on which one it's going to be.

ornicar · 2025-11-01T07:22:11Z


 object LinearEstimator:

-  @inline def encode(dest: Array[Int], startTime: Int): Unit =


Since Scala 3 we have an inline keyword that guarantees inlining: https://docs.scala-lang.org/scala3/guides/macros/inline.html#inline-methods
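
A minimal sketch of the Scala 3 inline modifier mentioned above. The Zigzag object and its encode method are hypothetical and do not come from this repository.

```scala
// Hypothetical example; names are not from this repository.
object Zigzag:
  // With the Scala 3 `inline` modifier, the compiler expands this body
  // at every call site instead of emitting a method call.
  inline def encode(n: Int): Int = (n << 1) ^ (n >> 31)

// After compilation this call site contains the expanded expression.
val encodedMinusOne: Int = Zigzag.encode(-1) // 1
```

The modifier is part of the language rather than an annotation, so the expansion happens at compile time and does not rely on JIT heuristics.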

ornicar · 2025-11-06T07:49:23Z

https://monitor.lichess.ovh/d/nAXOnXrWz/game-pgn-encoder?orgId=1&from=now-7d&to=now&timezone=utc&var-instance=lila@ocean&var-field=mean&refresh=5m&viewPanel=panel-2

isaacl · 2025-11-06T16:23:58Z

@ornicar will investigate. Definitely not expected.

isaacl marked this pull request as draft October 30, 2025 18:25

isaacl force-pushed the jmhPerf branch from c8ce4d0 to b733ef4 October 30, 2025 19:28

isaacl marked this pull request as ready for review October 30, 2025 19:28

ornicar reviewed Oct 31, 2025

idiomatic scala in functions called once per game

ffe3e19

isaacl added 2 commits October 31, 2025 13:06

Cosmetic fixes

44968e6

- remove @inline annotations which do nothing
- better line spacing for BitMask field

Add back optimizations

859a846

According to my JMH benches, these do matter. idk

ornicar reviewed Nov 1, 2025

ornicar merged commit 9472bdb into lichess-org:master Nov 1, 2025
1 check passed

ornicar merged 4 commits into lichess-org:master from isaacl:jmhPerf
