-
Notifications
You must be signed in to change notification settings - Fork 40
V9 Optimization Log
Part of the Performance section - see Benchmarks for the overview, methodology and headline results.
Tracks the effect of each allocation/throughput optimization on the v9 benchmark suite, applied one change at a time. Every change is validated against the full unit + integration test suites before its numbers are recorded here.
- Machine: AMD Ryzen 9 3950X, Windows 10 (22H2), .NET 8.0.27 host. Absolute numbers are machine-specific - only the deltas within this page are meaningful.
-
Regenerate:
dotnet run -c Release --project benchmarks/Serilog.Sinks.Grafana.Loki.Benchmarks.V9for v9 alone, ordotnet fsi build.fsx -- --target Benchmarkfor all three suites. -
Allocatedis exact (deterministic regardless of the timing job) - treat it as the primary signal. -
Meanat 1 000 events is noisy (large StdDev - a fixed per-flush overhead dominates). The 10 000-event rows are the reliable throughput signal. - Target to beat - YetAnother 4.0.5: Simple·10k = 4.63 MB / 20.67 ms; Exception·10k = 70.09 MB / 126.77 ms (see Benchmarks).
Ten changes (#3, #7, #2, #5, #6, #1, #8, #9, #10) applied one at a time, each validated against the full test suite. End-to-end sink, 10 000 events, vs baseline and vs YetAnother:
| Workload | Baseline | Final | Δ baseline | YetAnother | v9 vs YA |
|---|---|---|---|---|---|
| Simple - Allocated | 19.48 MB | 1.36 MB | −93% | 4.63 MB | 3.4× less |
| Simple - Mean | 26.60 ms | 17.66 ms | −34% | 20.67 ms | 1.17× faster |
| Exception - Allocated | 133.57 MB | 66.21 MB | −50% | 70.09 MB | 1.06× less |
| Exception - Mean | 188.42 ms | 121.03 ms | −36% | 126.77 ms | 1.05× faster |
v9 now beats YetAnother on both workloads, on both allocation and throughput. Biggest wins:
the IEqualityComparer grouping (#1, removed the per-event F# Map + structural-hash boxing) and
caching ex.StackTrace (#10, halved exception allocation). All output behavior preserved
(WireFormatTests + Loki roundtrip integration tests unchanged), bar one deliberate refinement:
DateTime/DateTimeOffset body values now render as ISO 8601 / invariant (#7).
| Step | Change | Simple·1k | Simple·10k | Exc·1k | Exc·10k |
|---|---|---|---|---|---|
| - | Baseline (1ca0904) |
1.94 | 19.48 | 13.33 | 133.57 |
| 1 | #3 timestamp via Utf8Formatter
|
1.91 | 19.03 | 13.29 | 133.12 |
| 2 | #7 typed Guid/DateTime/DTO writes | 1.79 | 18.11 | 13.30 | 133.12 |
| 3 | #2 message via RenderMessage(TextWriter)
|
1.69 | 16.86 | 13.23 | 132.21 |
| 4 | #5 Events list→array, in-place sort |
1.17 | 11.44 | 12.68 | 126.79 |
| 5 | #6 metadata written direct (no Map) |
1.11 | 11.21 | 12.67 | 126.57 |
| 6 | #1 comparer grouping (no per-event Map) |
0.26 | 2.51 | 11.79 | 117.86 |
| 7 | #8 reuse body Utf8JsonWriter
|
0.14 | 1.36 | 11.68 | 116.72 |
| 8 | #9 hoist formatter dispatch (CPU only) | 0.14 | 1.36 | 11.68 | 116.72 |
| 9 | #10 cache StackTrace/Source
|
0.14 | 1.36 | 6.61 | 66.21 |
| - | YetAnother 4.0.5 (target) | 0.47 | 4.63 | 7.00 | 70.09 |
Final vs YetAnother: Simple·10k 1.36 MB vs 4.63 (3.4× less); Exception·10k 66.21 MB vs 70.09 (1.06× less) - v9 now wins both.
| Step | Change | Simple·1k | Simple·10k | Exc·1k | Exc·10k |
|---|---|---|---|---|---|
| - | Baseline (1ca0904) |
7.60 | 26.60 | 25.02 | 188.42 |
| 1 | #3 timestamp via Utf8Formatter
|
6.24 | 26.26 | 23.21 | 188.00 |
| 2 | #7 typed Guid/DateTime/DTO writes | 6.54 | 26.21 | 23.96 | 191.09 |
| 3 | #2 message via RenderMessage(TextWriter)
|
6.92 | 26.82 | 28.37 | 191.96 |
| 4 | #5 Events list→array, in-place sort |
7.54 | 22.97 | 27.64 | 185.83 |
| 5 | #6 metadata written direct (no Map) |
6.38 | 22.95 | 23.29 | 185.15 |
| 6 | #1 comparer grouping (no per-event Map) |
5.52 | 18.41 | 27.74 | 181.35 |
| 7 | #8 reuse body Utf8JsonWriter
|
5.44 | 18.37 | 50.50† | 182.73 |
| 8 | #9 hoist formatter dispatch (CPU only) | 5.40 | 18.60 | 19.75 | 178.08 |
| 9 | #10 cache StackTrace/Source
|
5.35 | 17.66 | 33.52 | 121.03 |
| - | YetAnother 4.0.5 (target) | 6.65 | 20.67 | 34.95 | 126.77 |
† Exc·1k mean is a noise outlier (that case has very high StdDev across all runs); use the Exc·10k column.
| Step | Change | Simple alloc | Simple mean | Exc alloc | Exc mean |
|---|---|---|---|---|---|
| - | Baseline (1ca0904) |
1.84 MB | 1.200 ms | 13.65 MB | 12.778 ms |
| 1 | #3 timestamp via Utf8Formatter
|
1.84 MB | 1.156 ms | 13.65 MB | 12.551 ms |
| 2 | #7 typed Guid/DateTime/DTO writes | 1.75 MB | 1.152 ms | 13.65 MB | 13.235 ms |
| 3 | #2 message via RenderMessage(TextWriter)
|
1.71 MB | 1.283 ms | 13.65 MB | 13.066 ms |
| 4 | #5 Events list→array, in-place sort |
1.71 MB | 1.263 ms | 13.65 MB | 13.220 ms |
| 5 | #6 metadata written direct (no Map) |
1.71 MB | 1.238 ms | 13.65 MB | 13.078 ms |
| 6 | #1 comparer grouping (no per-event Map) |
1.71 MB | 1.245 ms | 13.65 MB | 12.731 ms |
| 7 | #8 reuse body Utf8JsonWriter
|
1.71 MB | 1.250 ms | 13.65 MB | 12.773 ms |
| 8 | #9 hoist formatter dispatch | 1.71 MB | 1.245 ms | 13.65 MB | 12.9 ms |
| 9 | #10 cache StackTrace/Source
|
1.71 MB | 1.239 ms | 8.60 MB | 7.162 ms |
-
#3
Utf8Formattertimestamp - replacestring(ns)withUtf8Formatter.TryFormatinto a stack buffer. -
#7 Typed
WriteStringValuefor Guid/DateTime/DateTimeOffset inwriteValue. -
#2 Render
MessageviaRenderMessage(TextWriter)into a reused UTF-8 buffer (drop theStringWriter/string). -
#5
LokiStream.Events: F#list→ .NET list/array, in-place sort (dropSeq.sortBy+List.ofSeqcons cells). -
#4/#6 Kill per-event
Array.foldclosures + the per-event metadataMap(write metadata directly). -
#1 Group via
IEqualityComparer<LogEvent>; build the label set once per stream not per event (drop the per-eventMap+ structural-hash enumerators). -
#8 Reuse
Utf8JsonWriterinstances viaReset()(body + envelope). - #9 Hoist the per-event formatter type-check out of the batch loop (micro).
BenchmarkDotNet v0.15.8, AMD Ryzen 9 3950X, .NET 8.0.27, fake in-process transport.
End-to-end sink (SinkBenchmarks.Push)
| EventCount | Payload | Mean | Allocated |
|---|---|---|---|
| 1000 | Exception | 25.019 ms | 13.33 MB |
| 1000 | Simple | 7.599 ms | 1.94 MB |
| 10000 | Exception | 188.420 ms | 133.57 MB |
| 10000 | Simple | 26.604 ms | 19.48 MB |
Per-event formatter (FormatterBenchmarks, 1 000 events)
| Method | Mean | Allocated |
|---|---|---|
| Format_Simple | 1.200 ms | 1.84 MB |
| Format_Exception | 12.778 ms | 13.65 MB |
Tests: 98/98 unit · 11/11 integration ✅
Serialization.fs: replaced WriteStringValue(string (toUnixNanoseconds ts)) (a per-event
System.String from the F# string operator) with Utf8Formatter.TryFormat into a stack
buffer (NativePtr.stackalloc<byte> 20, hoisted once per batch out of the event loop) written
via WriteStringValue(ReadOnlySpan<byte>). Zero allocation, culture-invariant digits.
| EventCount · Payload | Alloc | Δ vs base | Mean | Δ |
|---|---|---|---|---|
| Simple · 10000 | 19.03 MB | −0.45 MB (−2.3%) | 26.26 ms | −0.34 ms |
| Simple · 1000 | 1.91 MB | −0.03 MB | 6.24 ms | (noisy) |
| Exception · 10000 | 133.12 MB | −0.45 MB | 188.00 ms | ~flat |
| Exception · 1000 | 13.29 MB | −0.04 MB | 23.21 ms | (noisy) |
~45 B/event saved (the timestamp string), as predicted. Formatter group unchanged (the timestamp is emitted by the serializer, not the formatter) - confirms the groups isolate cleanly.
Tests: 98/98 unit · 11/11 integration ✅
LokiJsonTextFormatter.writeValue: added :? Guid, :? DateTime, :? DateTimeOffset
cases using the typed Utf8JsonWriter overloads instead of the v.ToString() fallback.
Guid output is byte-identical (D format). Behavior refinement: DateTime/DateTimeOffset
now emit ISO 8601 / invariant (round-trippable) instead of the previous culture-dependent
ToString() - consistent with how renderLabelValue already renders labels. 3 new
WireFormatTests lock this (Guid exact; DateTime/DTO round-trip).
| EventCount · Payload | Alloc | Δ vs base | cumulative Δ |
|---|---|---|---|
| Simple · 10000 | 18.11 MB | −0.92 MB | −1.37 MB (−7.0%) |
| Simple · 1000 | 1.79 MB | −0.12 MB | −0.15 MB |
| Exception · 10000 | 133.12 MB | 0 (no Guid/DateTime props) | −0.45 MB |
~92 B/event = the CorrelationId Guid string in the Simple workload. Formatter Simple
1.84 → 1.75 MB confirms it's formatter-side. Exception flat, as expected.
Tests: 101/101 unit (+3) · 11/11 integration ✅
Utf8TextWriter gained WrittenSpan + Clear(). LokiJsonTextFormatter.FormatToBuffer now
takes a caller-owned Utf8TextWriter, renders the message into it via
LogEvent.RenderMessage(TextWriter, InvariantCulture), and emits the bytes as the JSON Message
value - dropping the per-event StringWriter + StringBuilder + backing char[] + final
String. The sink owns one reusable messageBuffer/messageWriter (serial use → safe);
the public Format passes a throwaway pair (stays thread-safe).
| EventCount · Payload | Alloc | Δ vs prev | cumulative Δ |
|---|---|---|---|
| Simple · 10000 | 16.86 MB | −1.25 MB | −2.62 MB (−13.4%) |
| Simple · 1000 | 1.69 MB | −0.10 MB | −0.25 MB |
| Exception · 10000 | 132.21 MB | −0.91 MB | −1.36 MB |
~125 B/event (Simple) - smaller than first estimated; Serilog's RenderMessage(string) wrapper
is lighter than assumed and the residual cost is the per-token int.ToString Serilog does
internally (unchanged, not worth chasing). The formatter group barely moves: its public path
is dominated by the body Encoding.UTF8.GetString it still performs.
Tests: 101/101 unit · 11/11 integration ✅
Grouping.fs: Events: LogEvent list → LogEvent[]; evs |> Seq.sortBy (fun e -> e.Timestamp) |> List.ofSeq → Seq.toArray + Array.sortInPlaceWith byTimestamp, where
byTimestamp a b = a.Timestamp.CompareTo b.Timestamp.
| EventCount · Payload | Alloc | Δ vs prev | Mean | Δ |
|---|---|---|---|---|
| Simple · 10000 | 11.44 MB | −5.42 MB | 22.97 ms | −3.85 ms (−14%) |
| Simple · 1000 | 1.17 MB | −0.52 MB | 7.54 ms | (noisy) |
| Exception · 10000 | 126.79 MB | −5.42 MB | 185.83 ms | −6.1 ms |
Far bigger than the cons-cell estimate, and the reason matters: the old
Seq.sortBy (fun e -> e.Timestamp) sorted through F#'s generic structural comparison, which
boxes the DateTimeOffset key on every comparison - O(n log n) boxes per stream (~8 MB of
garbage at 10 000 events). The strongly-typed DateTimeOffset.CompareTo is boxing-free, so the
whole pile vanished - and the boxing removal is why Mean dropped 14% too. The F# list's
per-event cons cell was only a minor part. Formatter group unchanged (sort is serializer-side).
Cumulative Simple·10k: 19.48 → 11.44 MB (−41%), 26.60 → 22.97 ms (−14%).
Tests: 101/101 unit · 11/11 integration ✅
LokiSink gained writeMetadata: Utf8JsonWriter -> LogEvent -> unit (lazy object open, pre-encoded
TraceId/SpanId names); Serialization.serialize calls it inline instead of materializing a
Map<string,string> per event. buildStructuredMetadata deleted from Labels.fs.
| EventCount · Payload | Alloc | Δ vs prev |
|---|---|---|
| Simple · 10000 | 11.21 MB | −0.23 MB |
| Exception · 10000 | 126.57 MB | −0.22 MB |
~23 B/event - the leftover per-event Array.fold closure from the empty-metadata default path.
(The bench configures no metadata, so the bigger saving - dropping the per-event Map build when
metadata is enabled - isn't reflected here.) Formatter unchanged (metadata is serializer-side).
Tests: 101/101 unit · 11/11 integration ✅
Grouping.fs: new LabelEqualityComparer (hashes Level + effective label-property values off
the event via System.HashCode, boxing-free). groupIntoStreams now keys a
Dictionary<LogEvent, ResizeArray<LogEvent>>(comparer) and builds each stream's LabelSet once
from its head via labelOf, instead of Seq.groupBy over a freshly-built Map per event. The
sink precomputes effectiveLabelProps (label property names minus reserved keys) for the comparer.
buildLabelSet/LabelSet/LabelsTests are unchanged; GroupingTests rewritten to drive the
comparer (+3 comparer tests).
| EventCount · Payload | Alloc | Δ vs prev | Mean | Δ |
|---|---|---|---|---|
| Simple · 10000 | 2.51 MB | −8.70 MB | 18.41 ms | −4.54 ms (−20%) |
| Simple · 1000 | 0.26 MB | −0.85 MB | 5.52 ms | −0.86 ms |
| Exception · 10000 | 117.86 MB | −8.71 MB | 181.35 ms | −3.8 ms |
| Exception · 1000 | 11.79 MB | −0.88 MB | (noisy) |
~870 B/event removed on Simple - far more than estimated, because the old Seq.groupBy keyed
on an F# Map paid (a) the per-event Map.add tree nodes and (b) Map's structural
GetHashCode/Equals, which enumerate the tree via heap MapTreeEnumerators on every event.
All gone. The −20% on Mean is the structural hashing disappearing.
Milestone: v9 now beats YetAnother on the Simple workload - 2.51 MB vs 4.63 MB (1.8× less
allocation) and 18.41 ms vs 20.67 ms (~11% faster). Cumulative Simple·10k: 19.48 → 2.51 MB
(−87%), 26.60 → 18.41 ms (−31%). Exception still trails YetAnother (117.9 vs 70.1 MB) -
dominated by per-event ex.StackTrace materialization (both pay it); see notes below.
Tests: 104/104 unit (+3) · 11/11 integration ✅
After step 9 the perf changes + the structured-metadata commit went through /code-review and
/simplify (9 specialist agents). Applied: comment corrections; metadata property keys
precomputed + pre-encoded once at construction and de-duplicated (Array.distinct, fixing a
duplicate-JSON-key regression vs the committed feature); a LabelEqualityComparer.ForLabels
factory + shared isReservedLabelKey predicate + null guard; a SerializationBuffers bundle
(serialize 9→6 params); and 4 new tests (multi-event body correctness, multi-property comparer,
non-string + duplicate metadata). Re-benchmarked to confirm no regression:
| EventCount · Payload | Alloc (post) | vs step 9 | Mean (post) | vs step 9 |
|---|---|---|---|---|
| Simple · 10000 | 1.36 MB | identical | 18.45 ms | +0.8 ms (noise) |
| Simple · 1000 | 0.14 MB | identical | 5.39 ms | ~flat |
| Exception · 10000 | 66.21 MB | identical (±0.003%) | 121.82 ms | +0.8 ms (noise) |
| Exception · 1000 | 6.62 MB | ~flat | 33.56 ms | ~flat |
Byte-identical allocation, time within noise - the changes touch construction, param bundling, comments, and the (bench-inactive) metadata path, not the measured hot path. Final standings vs YetAnother unchanged: Simple 1.36 MB vs 4.63 (3.4×); Exception 66.2 MB vs 70.1 - v9 wins both.
Tests: 108/108 unit (+4) · 11/11 integration ✅ (all green on net8.0/net9.0/net10.0)
FormatToBuffer now takes a caller-owned Utf8JsonWriter (Reset() at start, Flush() at end)
instead of new-ing one per event. The sink owns one bodyJsonWriter (over bodyBuffer, reused
across events); the public Format passes a throwaway. (The envelope writer stays per-batch - one
allocation, negligible.)
| EventCount · Payload | Alloc | Δ vs prev |
|---|---|---|
| Simple · 10000 | 1.36 MB | −1.17 MB |
| Simple · 1000 | 0.14 MB | −0.12 MB |
| Exception · 10000 | 116.72 MB | −1.14 MB |
~117 B/event - the Utf8JsonWriter object, as estimated. Mean is flat (Gen0 was already 0 on
Simple, so this wasn't GC-bound). Formatter group unchanged - its public path intentionally still
uses throwaway writers. v9 Simple·10k is now 1.36 MB vs YetAnother 4.63 MB - 3.4× less.
Tests: 104/104 unit · 11/11 integration ✅
Serialization.serialize now resolves the built-in-vs-custom formatter once (a typed reference,
null ⇒ custom), replacing the per-event isinst + GetType() match with a single reference check.
writeBody folded inline.
Allocations identical to step 7 (1.36 MB / 116.72 MB) - this is a CPU-only change. Mean
moves within noise. Kept for code quality / removing per-event work; not an allocation win.
Tests: 104/104 unit · 11/11 integration ✅
Not in the original list - found while inspecting the Exception path. LokiExceptionFormatter.Write
read ex.StackTrace twice (String.IsNullOrEmpty ex.StackTrace then WriteString(…, ex.StackTrace))
and ex.Source twice. Exception.StackTrace rebuilds the full stack-trace string on every access,
so each exception (and every nested inner exception) materialized its stack trace twice. Now read once
into a local. YetAnother reads it once - this was the bulk of the remaining Exception gap.
| EventCount · Payload | Alloc | Δ vs prev | Mean | Δ |
|---|---|---|---|---|
| Exception · 10000 | 66.21 MB | −50.51 MB (−43%) | 121.03 ms | −57 ms (−32%) |
| Exception · 1000 | 6.61 MB | −5.07 MB | (noisy) | |
| Simple (both) | unchanged | - | unchanged | - |
The per-event formatter group shows it cleanly: Format_Exception 13.65 → 8.60 MB, 12.77 → 7.16 ms. v9 now also beats YetAnother on Exception (66.21 MB / 121 ms vs 70.09 MB / 127 ms).
Tests: 104/104 unit · 11/11 integration ✅