TL;DR I'm trying to benchmark Infer to address performance regressions across multiple OCaml versions (ocaml/ocaml#14047) and would appreciate help building Infer on OCaml 5.4.0 and above. Some preliminary improvements are noted below as motivation.
Steps to reproduce
- Build infer release v1.2.0 from source (targets ocaml 4.14.x)
- Build infer on OCaml 5.3.0.
- Build infer on OCaml 5.4.0 using a custom switch, built with:
./build-infer.sh --no-opam-lock --user-opam-switch
- Use an openssl release for analysis (openssl 1.0.2d or 1.1.1g) as mentioned in the original issue.
The instructions in the previously reported issue (see ocaml/ocaml#14047) mostly work, with minor adjustments for stricter package versioning (edit the lockfile) and missing packages (use the opam repository archive). I've noted down some tips in case they are useful: https://gist.github.com/curche/d88e1e317507d392877295815989a537
Expected behavior
Successfully build infer on three different versions.
Actual behavior
- Large observable performance regressions between the first two versions, as noted in the issue above, which I'm hoping to help investigate.
- Unable to build Infer on OCaml 5.4.x (future official releases of which may add runtime improvements that are interesting for performance and memory reasons).
Other details
Release notes for OCaml 5.4.0 - https://ocaml.org/releases/5.4.0
Runtime results
Sidenote: I wasn't sure whether each infer-analyze needs to be preceded by an infer-capture step or whether the output directory infer-out that infer-capture produces can be reused. Assuming that one capture can serve multiple analyze steps, I collected some average execution times.
The v1.2.0 release build of Infer targets OCaml 4.14.0 and has the following results
./infer-v1_2_0-4_14$ hyperfine './bin/infer analyze --no-progress-bar --no-report -j 28'
--
Benchmark : ./bin/infer analyze --no-progress-bar --no-report -j 28
Time (mean ± σ): 68.673 s ± 0.875 s [User: 1235.389 s, System: 72.549 s]
Range (min … max): 67.382 s … 70.089 s 10 runs
On 5.3.0 we have
./infer-5_3$ hyperfine './bin/infer analyze --no-progress-bar --no-report --no-multicore -j 28'
--
Benchmark : ./bin/infer analyze --no-progress-bar --no-report --no-multicore -j 28
Time (mean ± σ): 78.146 s ± 9.614 s [User: 1336.197 s, System: 52.114 s]
Range (min … max): 63.521 s … 91.277 s 10 runs
./infer-5_3$ hyperfine './bin/infer analyze --no-progress-bar --no-report --multicore -j 28'
Benchmark : ./bin/infer analyze --no-progress-bar --no-report --multicore -j 28
Time (mean ± σ): 139.033 s ± 10.394 s [User: 2216.346 s, System: 79.341 s]
Range (min … max): 127.028 s … 155.504 s 10 runs
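For anyone wanting to repeat these runs, here is a minimal Python sketch of how they could be scripted so the means and standard deviations end up in machine-readable form. It relies on hyperfine's `--runs` and `--export-json` options; the helper names and the JSON file name are hypothetical and not part of Infer or hyperfine. Passing `multicore=None` omits the flag, matching the v1.2.0 command above.

```python
import json
import shlex
from pathlib import Path


def hyperfine_argv(infer_bin, jobs, multicore, json_out, runs=10):
    """Build a hyperfine command line for one `infer analyze` configuration.

    multicore: True -> --multicore, False -> --no-multicore, None -> no flag.
    """
    analyze = [infer_bin, "analyze", "--no-progress-bar", "--no-report"]
    if multicore is not None:
        analyze.append("--multicore" if multicore else "--no-multicore")
    analyze += ["-j", str(jobs)]
    # hyperfine takes the benchmarked command as a single shell string.
    return ["hyperfine", "--runs", str(runs),
            "--export-json", json_out, shlex.join(analyze)]


def summarize(json_path):
    """Return (command, mean seconds, stddev seconds) per benchmark in an export."""
    data = json.loads(Path(json_path).read_text())
    return [(r["command"], r["mean"], r["stddev"]) for r in data["results"]]


# Print the command rather than running it; run it yourself from the Infer checkout.
print(shlex.join(hyperfine_argv("./bin/infer", 28, True, "infer-5_3-multicore.json")))
```

Once hyperfine has written the JSON file, `summarize("infer-5_3-multicore.json")` gives the numbers needed for a comparison across builds without copying them by hand.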
One avenue for addressing runtime performance is improving the compaction algorithm in the GC (disclaimer: not by me but by other collaborators I've been in touch with; I'm just collecting benchmark results to note any improvements or changes). While the work initially targets 5.4.0 (or the trunk branch of ocaml/ocaml), a backport is available for 5.3.0, on which I can currently build Infer.
Running Infer built using said modified compiler, we have
./infer-5_3-compactor$ hyperfine './bin/infer analyze --no-progress-bar --no-report --multicore -j 28'
--
Benchmark : ./bin/infer analyze --no-progress-bar --no-report --multicore -j 28
Time (mean ± σ): 126.810 s ± 7.564 s [User: 2227.268 s, System: 69.826 s]
Range (min … max): 116.794 s … 140.186 s 10 runs
From ~139s to ~126s, a difference of around 9% (although we lose some with --no-multicore). Note that this is with default runtime parameters. More rigorous tests would at least need different test cases (e.g., openssl 1.0.2d and 1.1.1g), different machines, and varying numbers of jobs and heap sizes. However, I feel this looks promising enough to keep trying to get Infer building on OCaml 5.4.x (also see the table below).
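Since the standard deviations are large relative to the gap, it may help to check the ~9% figure against the run-to-run noise. The sketch below uses only the means, standard deviations, and run counts reported by hyperfine above; the Welch t-statistic is my choice of a rough sanity check, not something from the original measurements, and it assumes the ten runs are independent.

```python
import math


def relative_change_pct(baseline, candidate):
    """Signed change of candidate relative to baseline, in percent."""
    return (candidate - baseline) / baseline * 100.0


def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Welch's t-statistic for two samples with unequal variances."""
    return (mean_a - mean_b) / math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)


# (mean s, stddev s, runs) for --multicore -j 28, taken from the hyperfine output.
stock_5_3 = (139.033, 10.394, 10)
compactor_5_3 = (126.810, 7.564, 10)

change = relative_change_pct(stock_5_3[0], compactor_5_3[0])
t_stat = welch_t(*stock_5_3, *compactor_5_3)
print(f"relative change: {change:.1f}%  Welch t: {t_stat:.2f}")
```

This gives a change of a little under 9% and a t-statistic of roughly 3, which hints that the gap is larger than the noise, though ten runs on one machine is still a small sample.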
Now, using runtime_events_tools (aka olly), one can get a trace in the Fuchsia trace format, which can be viewed in Perfetto (e.g., olly trace --format=fuchsia infer.trace.ftf './infer/bin/infer analyze --no-report --multicore -j 28').
Based on the trace results, the average compaction run drops from >2s to <1.4s. However, the total number of compactions during analyze stayed roughly the same (~111). There seems to be a large number of compactions in general. One possible cause is the number of manually triggered compactions. However, looking at Infer's DomainPool and ProcessPool across versions, do_compaction_if_needed was present before as well, so what is happening needs further investigation.
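To put the trace numbers in perspective, here is a back-of-the-envelope calculation using only the figures quoted above. Because the source gives bounds (">2s" and "<1.4s"), the result is a lower bound on the cumulative reduction, and it assumes the averages are per compaction event in the trace.

```python
compactions = 111        # approximate count observed in both traces
avg_before_s = 2.0       # reported as ">2 s", so this is a lower bound
avg_after_s = 1.4        # reported as "<1.4 s", so this is an upper bound

# Cumulative time spent in compaction events, summed over the whole trace.
cumulative_before_s = compactions * avg_before_s   # at least this much
cumulative_after_s = compactions * avg_after_s     # at most this much
saved_lower_bound_s = cumulative_before_s - cumulative_after_s

print(f"before: >{cumulative_before_s:.0f} s, after: <{cumulative_after_s:.0f} s, "
      f"saved: >{saved_lower_bound_s:.0f} s (summed over events)")
```

Both cumulative figures exceed the wall-clock means measured with hyperfine, so the summed event time is probably not directly comparable to wall-clock time (for example, if events are recorded per domain and overlap, or if tracing itself slows the run).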
Just for completeness, I tried commenting out do_compaction_if_needed in DomainPool & ProcessPool and noticed some difference:
./infer-5_3-dp$ hyperfine './bin/infer analyze --no-progress-bar --no-report --multicore -j 28'
Benchmark : ./bin/infer analyze --no-progress-bar --no-report --multicore -j 28
Time (mean ± σ): 102.304 s ± 10.037 s [User: 1536.140 s, System: 62.340 s]
Range (min … max): 86.880 s … 119.525 s 10 runs
Furthermore, applying that on top of the new compactor changes, we have
./infer-5_3-c-dp$ hyperfine './bin/infer analyze --no-progress-bar --no-report --multicore -j 28'
Benchmark : ./bin/infer analyze --no-progress-bar --no-report --multicore -j 28
Time (mean ± σ): 98.749 s ± 9.660 s [User: 1509.209 s, System: 59.148 s]
Range (min … max): 86.864 s … 115.321 s 10 runs
Aside: The do_compaction_if_needed mentioned above adds corresponding Domain.loop & waiting periods, which can flood the ring buffer and may help explain the related lost events and crashes in the observability tool (previously reported in tarides/runtime_events_tools#63). Patching it out does in fact result in fewer or zero lost events when running olly and leads to more reliable gc-stats results. This is still a valid reason to improve olly for better, faster ring buffer reads.
To summarize, here's a table of different runtime values
| infer analyze flags ↓ / OCaml version → | 4.14.x | 5.3.0 | 5.3.0+c | 5.3.0-dp | 5.3.0+c-dp | 5.4.x | 5.4.x+c |
|---|---|---|---|---|---|---|---|
| --no-multicore | 69s | 78s | 84s | 76s | 80s | FAIL | FAIL |
| --multicore | N/A | 139s | 126s | 102s | 98s | FAIL | FAIL |
(+c includes compactor changes, -dp comments out manual compaction)
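The table can also be read as percentage changes against stock 5.3.0. The sketch below just redoes that arithmetic on the rounded values from the table, so the percentages inherit the rounding and the run-to-run noise of the underlying measurements.

```python
# Rounded mean runtimes in seconds, copied from the table (5.3.0 variants only).
runtimes_s = {
    "--no-multicore": {"5.3.0": 78, "5.3.0+c": 84, "5.3.0-dp": 76, "5.3.0+c-dp": 80},
    "--multicore": {"5.3.0": 139, "5.3.0+c": 126, "5.3.0-dp": 102, "5.3.0+c-dp": 98},
}


def changes_vs_baseline(row, baseline="5.3.0"):
    """Percent change of each variant relative to the baseline column."""
    base = row[baseline]
    return {
        variant: round((seconds - base) / base * 100.0, 1)
        for variant, seconds in row.items()
        if variant != baseline
    }


for flag, row in runtimes_s.items():
    print(flag, changes_vs_baseline(row))
```

Negative numbers are improvements. On these figures, removing the manual compaction accounts for a larger share of the --multicore improvement than the compactor changes alone.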
Notable issues on getting infer building on 5.4.x
One option is to stick to 7d504cc and patch it to work with updated dependencies. Alternatively, getting the latest HEAD commit building on 5.4.x is preferable so that recent changes can be accounted for. Issues I've run into are:
- certain dependencies have updates with breaking changes (in my testing I observed errors involving ppxlib and containers, to name a few)
- when building from HEAD of the main branch or other recent commits, building the clang plugin fails. It looks like the LLVM versions differ between the prepare-clang.sh script and the custom local-llvm opam package (there was a recent change from LLVM20 to LLVM19)
Since Infer has been useful for observing noticeable performance changes, I think it would be a great project for continuing to improve the OCaml runtime for 5.4.x as we try out case studies based on OCaml software in production (and, in turn, help bring actual multicore advantages to Infer). If there are any benchmark suites you can point me to, or brief examples covering different aspects of Infer with similar performance and memory bottlenecks, please feel free to add them to this issue or comment on my GitHub gist above. This would greatly help cover more aspects of both the runtime and Infer.