Reduce V2 serializer allocations by ~23% via fast path #583
Add a context-free serialization path for the common case where no extension hooks, conditionals, default values, formatters, or Proc extractors are configured.

Previously, every call to `serialize` allocated a `Context::Field` and a `Context::Parent` struct unconditionally, even when nothing in the serialization loop ever read them. Together these accounted for ~22% of all object allocations and caused V2 to trigger 2× more GC runs than V1 under the same workload.

Changes:
- Add `Extractors::Property.extract_simple` to extract field values directly from an object/hash without a `Context::Field`
- Precompute `@needs_field_ctx` at blueprint load time in `find_used_hooks!` (requires `finalize_fields!` to run first, hence the reorder in `initialize`)
- Branch to `serialize_fast` when `@needs_field_ctx` is false, skipping the `Context::Field` allocation entirely
- Make `Context::Parent` lazy in both paths: created at most once per `serialize` call, only when an association field is actually encountered

Result on 500 widgets × 50 iterations (30 fields, 10 object associations, 5 collection associations):
- Allocations: −23% (3.3M → 2.56M objects)
- GC runs: −26% (85 → 63)
- `Context::Field`: eliminated from hot path (75k → 0 samples)
- `Context::Parent`: eliminated from hot path (75k → ~10 samples)

V2 now allocates fewer objects than V1 for the common case.

Made-with: Cursor
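For readers who want the shape of the change without opening the diff, here is a minimal sketch of the branching described above. It is not the code from this PR: `TinyBlueprint`, `Field`, `FieldContext`, and `ParentContext` are hypothetical stand-ins, and "needs a context" is reduced to "some field has a formatter". The one-context-per-call allocation in the full path is also an assumption made for illustration.

```ruby
# Simplified, hypothetical model of the fast-path idea. None of these classes
# are the library's real ones; they only echo the names used in the description.
FieldContext  = Struct.new(:object, :field_name) # stand-in for Context::Field
ParentContext = Struct.new(:object)              # stand-in for Context::Parent
Field = Struct.new(:name, :formatter, :blueprint, keyword_init: true)

class TinyBlueprint
  def initialize(fields)
    @fields = fields.freeze
    # Decided once at definition time, not on every serialize call.
    @needs_field_ctx = @fields.any?(&:formatter)
  end

  def serialize(object, _parent = nil)
    @needs_field_ctx ? serialize_full(object) : serialize_fast(object)
  end

  private

  # Fast path: no FieldContext is ever allocated.
  def serialize_fast(object)
    parent = nil
    @fields.each_with_object({}) do |field, out|
      value = extract_simple(object, field.name)
      if field.blueprint && !value.nil?
        parent ||= ParentContext.new(object) # lazy, at most once per call
        value = field.blueprint.serialize(value, parent)
      end
      out[field.name] = value
    end
  end

  # Full path: a FieldContext is built because a formatter may read it.
  def serialize_full(object)
    ctx = FieldContext.new(object, nil) # assumption: one allocation per call
    parent = nil
    @fields.each_with_object({}) do |field, out|
      ctx.field_name = field.name
      value = extract_simple(object, field.name)
      value = field.formatter.call(value, ctx) if field.formatter
      if field.blueprint && !value.nil?
        parent ||= ParentContext.new(object) # lazy here as well
        value = field.blueprint.serialize(value, parent)
      end
      out[field.name] = value
    end
  end

  # Reads a value straight from a Hash or a plain object.
  def extract_simple(object, name)
    object.is_a?(Hash) ? object[name] : object.public_send(name)
  end
end

owner  = TinyBlueprint.new([Field.new(name: :name)])
widget = TinyBlueprint.new([Field.new(name: :id), Field.new(name: :owner, blueprint: owner)])
widget.serialize({ id: 1, owner: { name: "Ann" } })
```

In this sketch a blueprint with only plain fields never allocates a `FieldContext`, and a `ParentContext` appears only once an association with a non-nil value is reached.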
Thanks @scottmyron! I'm still studying it but there's definitely something here. I do wonder if we could keep a single

Also, something we've been talking about is better automation around speed/memory/etc checks to prevent future regressions. (V1 saw huge perf degradation over time b/c nothing was checking it.) Would you be able to push up your memory perf code so we could eventually incorporate it into whatever system we eventually come up with? Automatically seeing "this PR adds a bunch of allocations" would be invaluable IMO.
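As a rough illustration of what an automated allocation check could look like, the sketch below counts allocations with Ruby's built-in `GC.stat(:total_allocated_objects)` counter and fails when a block exceeds a budget. This is not the memory perf code from this PR; `allocations_during` and `assert_allocation_budget` are hypothetical helper names, and any budget value would need to be chosen from real measurements.

```ruby
# Returns how many objects were allocated while the block ran.
# GC.stat(:total_allocated_objects) is a cumulative counter, so GC runs
# during the block do not reduce the result.
def allocations_during
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

# Hypothetical budget check: raises if the block allocates more than `max`
# objects, otherwise returns the measured count.
def assert_allocation_budget(label, max:)
  count = allocations_during { yield }
  raise "#{label}: #{count} allocations exceeds budget of #{max}" if count > max
  count
end

# Example: print the allocation count for a small workload.
count = allocations_during { 1_000.times { { id: 1, name: "widget" } } }
puts "allocated #{count} objects"
```

A check like this could run in CI against a fixed workload, so a PR that adds allocations to the hot path fails loudly instead of slipping through.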
Going to merge this and riff on it a bit. Thanks!
Apologies for the delay... I'll get another PR going soon with the memory profiling turned into a GitHub Action (hopefully).