The latency of Merlin queries depends on many different factors, such as:
- The buffer it's run on as a whole; in particular, its size and typing complexity.
- The location inside the buffer at which it's run.
- The dependency graph of the buffer.
- Whether a PPX is applied, and which one.
- Merlin's cache state at the moment the query is run.
- Which Merlin query is run.
So for meaningful benchmark results, we need to run Merlin on a big variety of input samples. We've written `merl-an` to generate such an input sample set in a random but deterministic way. It has a `merl-an benchmark` command, which persists the telemetry part of the Merlin response in the format expected by `current-bench`.
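As a rough illustration (not `merl-an`'s actual code), the sketch below builds a payload in the `name`/`results`/`metrics` JSON layout that `current-bench` consumes; the function name, benchmark names, and metric names are all hypothetical, and it assumes the `yojson` library is available.

```ocaml
(* Hypothetical sketch: turn one Merlin query timing into a current-bench
   result entry and print the full payload as JSON. *)
let result_of_timing ~query ~file ~ms : Yojson.Safe.t =
  `Assoc
    [ ("name", `String (query ^ " on " ^ file));
      ( "metrics",
        `List
          [ `Assoc
              [ ("name", `String "latency");
                ("value", `Float ms);
                ("units", `String "ms");
              ];
          ] );
    ]

let () =
  let payload =
    `Assoc
      [ ("name", `String "merlin");
        ( "results",
          `List
            [ result_of_timing ~query:"type-enclosing" ~file:"store.ml" ~ms:3.2 ] );
      ]
  in
  print_endline (Yojson.Safe.to_string payload)
```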
The next steps to get a Merlin benchmark CI up and running are:
- Finish the PoC for a `current-bench` CI on Merlin using `merl-an`. We're currently blocked on this by a `current-bench` issue. Done: see the PoC graphs.
- Improve the separation into different benchmarks (in `merl-an`): I think that, with the current `merl-an` output, `current-bench` will create a separate graph for each file that's being benchmarked. That doesn't scale. Instead: one graph per cache workflow and per query, or similar (see the grouping sketch after this list).
- Improve the Docker set-up: The whole benchmark set-up, such as installing `merl-an` and fetching the code base on which we run Merlin, should be done inside the container, etc.
- Filter out spikes (in `merl-an`): Non-reproducible latency spikes (i.e. timings that exceed the expected timing by more than a factor of 10) mess up the scale of the `current-bench` graphs (see the filtering sketch after this list).
- Add a cold-cache workflow to the benchmarks: The reason why the numbers look so good at the moment is that both the cmi-cache and the typer cache are fully warmed on all queries. Additionally, it would be interesting to have benchmarks for when the caches are cold.
- Improve the output UX: When some samples call for attention, we'll want to know which location and query they correspond to.
- Lock the versions of the dependencies of the project on which we run Merlin: Currently, we use Irmin as the code base to run the benchmarks on. We install Irmin's dependencies via `opam` without locking their versions. If a dependency splits or merges modules, or increases the size of a module, the cmi- and cmt-files will vary. That adds Merlin-independent noise to the benchmarks. To avoid that, we could vendor a fixed version of each dependency.
- Find a more significant project input base. For now, we only use Irmin as a code base to run the benchmarks on.
- Our CI will be very resource-heavy. We'll need to decide when to run the benchmarks. `current-bench` supports running the benchmarks only "on demand" (i.e. when tagging the PR with a certain flag).
- Possibly: It might also be interesting to track the number of latency spikes.
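For the benchmark-separation item, a minimal sketch of what grouping could look like, assuming per-sample data of the form (cache workflow, query, latency). This is not `merl-an` code and all names are made up; the point is only that results get bucketed per (workflow, query) pair rather than per file, so `current-bench` draws one graph per pair.

```ocaml
(* Hypothetical sketch: bucket timings by (cache workflow, query). *)
module Key = struct
  type t = { workflow : string; query : string }
  let compare = Stdlib.compare
end

module KeyMap = Map.Make (Key)

(* [samples] is a list of (workflow, query, latency_in_ms) triples. *)
let group_by_workflow_and_query samples =
  List.fold_left
    (fun acc (workflow, query, ms) ->
      let key = { Key.workflow; query } in
      let previous = Option.value ~default:[] (KeyMap.find_opt key acc) in
      KeyMap.add key (ms :: previous) acc)
    KeyMap.empty samples
```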
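For the spike-filtering item, one possible approach (a sketch under my own assumptions, not `merl-an`'s implementation) is to drop samples that exceed the median of their group by more than the chosen factor:

```ocaml
(* Hypothetical sketch: filter out non-reproducible latency spikes. *)
let median = function
  | [] -> invalid_arg "median: empty list"
  | xs ->
      let sorted = List.sort compare xs in
      List.nth sorted (List.length sorted / 2)

let filter_spikes ?(factor = 10.) = function
  | [] -> []
  | timings ->
      let m = median timings in
      List.filter (fun t -> t <= factor *. m) timings
```

For example, `filter_spikes [3.1; 2.9; 48.0; 3.0]` keeps the three samples around 3 ms and drops the 48 ms outlier.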