
Abnormally slow loop (25x) under OCaml 5 / macOS / arm64 #13262

Open
@fpottier


Hello,

I am using macOS Ventura 13.6.7 with an Apple M2 Max processor.

A loop that writes values into an integer array is about 20x slower with OCaml 5 than with OCaml 4.

Using Array.set versus Array.unsafe_set does not make much difference.

If the loop is manually unrolled (5 times) then this surprising slowness disappears.
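For readers who do not want to clone the repository, the three loop variants described above can be sketched roughly as follows. This is an illustrative reconstruction, not the code from `array_set_loop`: the function names, the array size `n`, and the values written are all assumptions.

```ocaml
(* Hypothetical sketch of the loop variants; the actual benchmark in
   the array_set_loop repository may differ in names, sizes and values. *)

let n = 1_000_000 (* assumed array size *)

(* Plain loop: one bounds-checked write per iteration. *)
let fill_safe (a : int array) : unit =
  for i = 0 to Array.length a - 1 do
    Array.set a i i
  done

(* Same loop, without bounds checks. *)
let fill_unsafe (a : int array) : unit =
  for i = 0 to Array.length a - 1 do
    Array.unsafe_set a i i
  done

(* Manually unrolled 5 times: five writes per loop back-edge. *)
let fill_unrolled (a : int array) : unit =
  let len = Array.length a in
  let i = ref 0 in
  while !i + 5 <= len do
    let j = !i in
    Array.set a j j;
    Array.set a (j + 1) (j + 1);
    Array.set a (j + 2) (j + 2);
    Array.set a (j + 3) (j + 3);
    Array.set a (j + 4) (j + 4);
    i := j + 5
  done;
  (* Handle the remaining 0 to 4 elements one at a time. *)
  while !i < len do
    Array.set a !i !i;
    incr i
  done

let () =
  let a = Array.make n 0 in
  let b = Array.make n 0 in
  let c = Array.make n 0 in
  fill_safe a;
  fill_unsafe b;
  fill_unrolled c;
  (* All three variants must produce the same array contents. *)
  assert (a = b && a = c)
```

All three functions write the same contents; they differ only in whether bounds checks are performed and in how many writes happen per trip around the loop.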

To reproduce the problem:

```
git clone git@github.com:fpottier/array_set_loop.git
cd array_set_loop
make test
```

On OCaml 4.14.2, I get the following (normal) results:

```
main $ make test
Benchmark 1: _build/default/main.exe
  Time (mean ± σ):      10.6 ms ±   0.1 ms    [User: 9.8 ms, System: 0.6 ms]
  Range (min … max):    10.2 ms …  10.9 ms    281 runs

Benchmark 2: _build/default/main.exe --unsafe
  Time (mean ± σ):       8.4 ms ±   0.1 ms    [User: 7.6 ms, System: 0.6 ms]
  Range (min … max):     8.1 ms …   9.1 ms    357 runs

Benchmark 3: _build/default/main.exe --unrolled
  Time (mean ± σ):       9.1 ms ±   0.1 ms    [User: 8.3 ms, System: 0.6 ms]
  Range (min … max):     8.9 ms …   9.3 ms    328 runs

Benchmark 4: _build/default/main.exe --unsafe --unrolled
  Time (mean ± σ):       7.7 ms ±   0.1 ms    [User: 6.9 ms, System: 0.6 ms]
  Range (min … max):     7.5 ms …   8.2 ms    381 runs

Summary
  _build/default/main.exe --unsafe --unrolled ran
    1.09 ± 0.02 times faster than _build/default/main.exe --unsafe
    1.18 ± 0.02 times faster than _build/default/main.exe --unrolled
    1.38 ± 0.02 times faster than _build/default/main.exe
```

On OCaml 5.x, I get the following abnormal results:

```
main $ make test
Benchmark 1: _build/default/main.exe
  Time (mean ± σ):     187.3 ms ±  10.9 ms    [User: 184.2 ms, System: 2.8 ms]
  Range (min … max):   155.3 ms … 206.4 ms    30 runs

Benchmark 2: _build/default/main.exe --unsafe
  Time (mean ± σ):     191.2 ms ±  11.8 ms    [User: 188.1 ms, System: 2.8 ms]
  Range (min … max):   156.4 ms … 210.7 ms    30 runs

Benchmark 3: _build/default/main.exe --unrolled
  Time (mean ± σ):       7.6 ms ±   0.1 ms    [User: 6.8 ms, System: 0.6 ms]
  Range (min … max):     7.4 ms …   8.0 ms    391 runs

Benchmark 4: _build/default/main.exe --unsafe --unrolled
  Time (mean ± σ):       7.6 ms ±   0.1 ms    [User: 6.7 ms, System: 0.6 ms]
  Range (min … max):     7.4 ms …   7.9 ms    388 runs

Summary
  _build/default/main.exe --unsafe --unrolled ran
    1.01 ± 0.02 times faster than _build/default/main.exe --unrolled
   24.70 ± 1.47 times faster than _build/default/main.exe
   25.22 ± 1.58 times faster than _build/default/main.exe --unsafe
```

I have tried OCaml 5.0, 5.1, and 5.2, and get similar results.

In the unrolled loop, the safe point (_caml_call_gc) is executed only once every 5 iterations. So, my (uninformed) guess is that somehow, if _caml_call_gc is called too often, it becomes very slow (?).

Labels: Performance, memory-model
