Building Infer on OCaml 5.4.x for benchmarking #2013

TL;DR: I'm trying to benchmark Infer to help investigate performance regressions across OCaml versions (https://github.com/ocaml/ocaml/issues/14047), and I would appreciate help getting Infer to build on OCaml 5.4.0 and above. Some preliminary improvements are noted below as motivation.

### Steps to reproduce
1. Build Infer release v1.2.0 from source (targets OCaml 4.14.x).
2. Build Infer on OCaml 5.3.0.
3. Build Infer on OCaml 5.4.0 using a custom switch: `./build-infer.sh --no-opam-lock --user-opam-switch`.
4. Use an OpenSSL release (1.0.2d or 1.1.1g) as the analysis target, as mentioned in the original issue.

The instructions in the previously reported issue (https://github.com/ocaml/ocaml/issues/14047) mostly work, with minor adjustments for stricter package versioning (edit the lockfile) and missing packages (use the opam repository archive). I've written up some tips in case they are useful: https://gist.github.com/curche/d88e1e317507d392877295815989a537

### Expected behavior
Infer builds successfully on all three OCaml versions.

### Actual behavior
- Large observable performance regressions between the first two versions, as noted in the issue above, which I'm hoping to help investigate.
- Infer fails to build on OCaml 5.4.x. Future official releases in that series are hoped to include runtime improvements that could be interesting for performance and memory reasons.

### Other details

Release notes for OCaml 5.4.0 - https://ocaml.org/releases/5.4.0

#### Runtime results
Side note: I wasn't sure whether each `infer analyze` run needs to be preceded by its own `infer capture` step, or whether the `infer-out` output directory that capture produces can be reused. Assuming one capture can serve multiple analyze runs, I collected average execution times.
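Under that assumption, the timing runs for one build can be scripted roughly as follows. This is only a sketch: the helper name `analyze_cmd` and the `results.md` output file are made up for illustration, the directory name is taken from the prompts in the results below, and it presumes `infer-out` from an earlier capture is still valid for every run.

```sh
#!/bin/sh
# Sketch: time both analyze modes of one Infer build in a single hyperfine run.
# Assumption: ./infer-5_3 already contains an infer-out from a prior capture,
# and reusing it across analyze runs is valid.

# Print the analyze command for one build directory ($1) and one mode flag ($2).
# (analyze_cmd is a hypothetical helper, not part of Infer or hyperfine.)
analyze_cmd() {
  printf "cd %s && ./bin/infer analyze --no-progress-bar --no-report %s -j 28" "$1" "$2"
}

# Only benchmark when hyperfine is installed and the build directory exists.
if command -v hyperfine >/dev/null 2>&1 && [ -d ./infer-5_3 ]; then
  hyperfine --runs 10 --export-markdown results.md \
    -n '5.3.0 --no-multicore' "$(analyze_cmd ./infer-5_3 --no-multicore)" \
    -n '5.3.0 --multicore'    "$(analyze_cmd ./infer-5_3 --multicore)"
fi
```

Running both modes in one hyperfine invocation keeps the run count and reporting identical for the two, and `--export-markdown` writes a table that can be pasted into an issue.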

The v1.2.0 release build of Infer targets OCaml 4.14.0 and gives the following results:

```sh
./infer-v1_2_0-4_14$ hyperfine './bin/infer analyze --no-progress-bar --no-report -j 28'
--
Benchmark : ./bin/infer analyze --no-progress-bar --no-report -j 28
Time (mean ± σ):     68.673 s ±  0.875 s    [User: 1235.389 s, System: 72.549 s]
Range (min … max):   67.382 s … 70.089 s    10 runs
```

On 5.3.0 we have:
```sh
./infer-5_3$ hyperfine './bin/infer analyze --no-progress-bar --no-report --no-multicore -j 28'
--
Benchmark : ./bin/infer analyze --no-progress-bar --no-report --no-multicore -j 28
Time (mean ± σ):     78.146 s ±  9.614 s    [User: 1336.197 s, System: 52.114 s]
Range (min … max):   63.521 s … 91.277 s    10 runs
 
./infer-5_3$ hyperfine './bin/infer analyze --no-progress-bar --no-report --multicore -j 28'
Benchmark : ./bin/infer analyze --no-progress-bar --no-report --multicore -j 28
Time (mean ± σ):     139.033 s ± 10.394 s    [User: 2216.346 s, System: 79.341 s]
Range (min … max):   127.028 s … 155.504 s    10 runs
```

One avenue for improving runtime performance is a better compaction algorithm in the GC. (Disclaimer: this is the work of collaborators I've been in touch with, not mine; I'm only collecting benchmark results to record any improvements or changes.) The work initially targets 5.4.0 (or the trunk branch of ocaml/ocaml), but there is a [backport](https://github.com/sadiqj/ocaml/tree/new_compactor_53) for 5.3.0, on which I can currently build Infer.

Running Infer built with that modified compiler gives:
```sh
./infer-5_3-compactor$ hyperfine './bin/infer analyze --no-progress-bar --no-report --multicore -j 28'
--
Benchmark : ./bin/infer analyze --no-progress-bar --no-report --multicore -j 28
Time (mean ± σ):     126.810 s ±  7.564 s    [User: 2227.268 s, System: 69.826 s]
Range (min … max):   116.794 s … 140.186 s    10 runs
```

With `--multicore`, the mean drops from ~139s to ~126s, a difference of around 9% (although `--no-multicore` gets somewhat slower; see the table below). Note that this uses the default runtime parameters. More rigorous tests would at least need different test cases (e.g. openssl 1.0.2d and 1.1.1g), different machines, and varying numbers of jobs and heap sizes. Still, this looks promising enough to keep trying to get Infer building on OCaml 5.4.x.
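The ~9% figure can be checked directly from the two hyperfine means reported above. A minimal sketch using awk (the helper name `pct_change` is mine, not part of any tool):

```sh
#!/bin/sh
# Relative improvement between two mean run times, in percent.
# $1 = mean before the change, $2 = mean after the change (both in seconds).
pct_change() {
  awk -v before="$1" -v after="$2" \
    'BEGIN { printf "%.1f\n", (before - after) / before * 100 }'
}

# 5.3.0 vs 5.3.0 with the compactor backport, both with --multicore.
pct_change 139.033 126.810   # prints 8.8
```

A positive result means the second configuration is faster. Given standard deviations of 7 to 10 s on these runs, differences of a few percent should be read with caution.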

Using runtime_events_tools, aka [olly](https://opam.ocaml.org/packages/runtime_events_tools/), one can record a trace in Fuchsia trace format and view it in [perfetto](https://ui.perfetto.dev/), e.g. `olly trace --format=fuchsia infer.trace.ftf './infer/bin/infer analyze --no-report --multicore -j 28'`.
Based on the traces, the average duration of a compaction run goes down from >2s to <1.4s. However, the total number of compactions during analyze stayed about the same (~111), which seems like a lot in general. One possible cause is the number of manually triggered compactions. However, looking at Infer's DomainPool and ProcessPool across versions, `do_compaction_if_needed` was already present before, so what is happening needs further investigation.

For completeness, I tried commenting out `do_compaction_if_needed` in [DomainPool](https://github.com/facebook/infer/blob/e79ce34a3c888d1bed55e4374fedbbe5d1d1980f/infer/src/base/DomainPool.ml#L172) and ProcessPool and noticed a difference:

```sh 
./infer-5_3-dp$ hyperfine './bin/infer analyze --no-progress-bar --no-report --multicore -j 28'
Benchmark : ./bin/infer analyze --no-progress-bar --no-report --multicore -j 28
Time (mean ± σ):     102.304 s ± 10.037 s    [User: 1536.140 s, System: 62.340 s]
Range (min … max):   86.880 s … 119.525 s    10 runs
```

Furthermore, applying that change on top of the new compactor changes gives:
```sh 
./infer-5_3-c-dp$ hyperfine './bin/infer analyze --no-progress-bar --no-report --multicore -j 28' 
Benchmark : ./bin/infer analyze --no-progress-bar --no-report --multicore -j 28
Time (mean ± σ):     98.749 s ±  9.660 s    [User: 1509.209 s, System: 59.148 s]
Range (min … max):   86.864 s … 115.321 s    10 runs
```

Aside: the `do_compaction_if_needed` mentioned above adds corresponding Domain.loop and waiting periods, which can flood the ring buffer and may help explain the lost events and crashes in the observability tool (previously reported in https://github.com/tarides/runtime_events_tools/issues/63). Patching it out does result in few or zero lost events when running olly, and leads to more reliable gc-stats results. This is still a valid reason to improve olly with better, faster ring buffer reads.

To summarize, here is a table of the approximate run times:

| infer analyze flags ↓ / OCaml version → | 4.14.x | 5.3.0 | 5.3.0+c | 5.3.0-dp | 5.3.0+c-dp | 5.4.x | 5.4.x+c |
|--|--|--|--|--|--|--|--|
| --no-multicore | 69s | 78s | 84s | 76s | 80s | FAIL | FAIL |
| --multicore | N/A | 139s | 126s | 102s | 98s | FAIL | FAIL |

(`+c` includes compactor changes, `-dp` comments out manual compaction)

#### Notable issues on getting infer building on 5.4.x
One option is to stick to 7d504cc505 and patch it to work with updated dependencies. However, getting the latest HEAD commit building on 5.4.x is preferable, so that recent changes are accounted for. Issues I've run into are:
- Certain dependencies have updates with breaking changes (in my testing I have seen errors involving ppxlib and containers, to name a few).
- When building from HEAD of the main branch or other recent commits, building the clang plugin fails. It looks like the LLVM versions differ between the [prepare_clang_src.sh](https://github.com/facebook/infer/blob/a1862e485520dee8bd02a85a756bc0526f120960/facebook-clang-plugins/clang/src/prepare_clang_src.sh#L20) script and the custom `local-llvm` opam package (there was a recent change from LLVM20 to LLVM19).

Since Infer has been useful for observing noticeable performance changes, I think it would be a great project for continuing to improve the OCaml runtime for 5.4.x as we try out case studies based on OCaml software in production (and, in turn, for helping bring actual multicore advantages to Infer). If you can point me to benchmark suites, or to brief examples covering other aspects of Infer with similar performance and memory bottlenecks, please add them to this issue or comment on the gist above. This would greatly help broaden coverage of both the runtime and Infer.
