Reduce V2 serializer allocations by ~23% via fast path#583

Merged
jhollinger merged 2 commits into procore-oss:jh/release-2.0-faster from
scottmyron:sm/release-2.0-faster-tweaks
Apr 28, 2026

Conversation

@scottmyron

Add a context-free serialization path for the common case where no extension hooks, conditionals, default values, formatters, or Proc extractors are configured.

Previously, every call to `serialize` allocated a `Context::Field` and a `Context::Parent` struct unconditionally — even when nothing in the serialization loop ever read them. Together these accounted for ~22% of all object allocations and caused V2 to trigger 2× more GC runs than V1 under the same workload.
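To make the shape of the problem concrete, here is a minimal sketch of the old behavior. The Struct names below are stand-ins that only mirror the PR description; they are not Blueprinter's actual classes:

```ruby
# Illustrative sketch: both context structs are built up front on every
# serialize call, whether or not any hook ever reads them.
FieldCtx  = Struct.new(:blueprint, :field, :object)
ParentCtx = Struct.new(:blueprint, :object)

def serialize_old(object, fields)
  field_ctx  = FieldCtx.new(self, nil, object)  # allocated unconditionally
  parent_ctx = ParentCtx.new(self, object)      # allocated unconditionally
  fields.each_with_object({}) do |name, out|
    field_ctx.field = name
    out[name] = object[name]  # in the common case, the contexts are never read
  end
end

serialize_old({ id: 1, name: "widget" }, [:id, :name])
```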

Changes:

  • Add `Extractors::Property.extract_simple` to extract field values directly from an object/hash without a `Context::Field`
  • Precompute `@needs_field_ctx` at blueprint load time in `find_used_hooks!` (requires `finalize_fields!` to run first, hence the reorder in `initialize`)
  • Branch to `serialize_fast` when `@needs_field_ctx` is false, skipping the `Context::Field` allocation entirely
  • Make `Context::Parent` lazy in both paths — created at most once per `serialize` call, only when an association field is actually encountered
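The changes above can be sketched roughly as follows. Only the names `extract_simple`, `@needs_field_ctx`, `serialize_fast`, `finalize_fields!`, and `find_used_hooks!` come from the PR description; the class layout and slow path are simplified placeholders, and the lazy `Context::Parent` handling is omitted for brevity:

```ruby
module Extractors
  module Property
    # Extract a value directly from a hash or object, no Context::Field needed.
    def self.extract_simple(object, name)
      object.is_a?(Hash) ? object[name] : object.public_send(name)
    end
  end
end

class MiniBlueprint
  Field = Struct.new(:name, :options)

  def initialize(fields)
    @fields = fields
    finalize_fields!   # must run before find_used_hooks! (hence the reorder)
    find_used_hooks!
  end

  def serialize(object)
    @needs_field_ctx ? serialize_slow(object) : serialize_fast(object)
  end

  private

  def finalize_fields!
    @fields.freeze
  end

  # Precomputed once at blueprint load time: does any field configure
  # a conditional, formatter, default, etc.?
  def find_used_hooks!
    @needs_field_ctx = @fields.any? { |f| !f.options.empty? }
  end

  def serialize_fast(object)
    @fields.each_with_object({}) do |field, out|
      out[field.name] = Extractors::Property.extract_simple(object, field.name)
    end
  end

  def serialize_slow(object)
    serialize_fast(object) # stand-in for the full context-building path
  end
end

bp = MiniBlueprint.new([MiniBlueprint::Field.new(:id, {}),
                        MiniBlueprint::Field.new(:name, {})])
p bp.serialize(id: 1, name: "widget")
```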

Result on 500 widgets × 50 iterations (30 fields, 10 object associations, 5 collection associations):

  • Allocations: −23% (3.3M → 2.56M objects)
  • GC runs: −26% (85 → 63)
  • Context::Field: eliminated from hot path (75k → 0 samples)
  • Context::Parent: eliminated from hot path (75k → ~10 samples)

V2 now allocates fewer objects than V1 for the common case.

Made-with: Cursor

Checklist:

  • I have updated the necessary documentation
  • I have signed off all my commits as required by DCO
  • My build is green

@scottmyron scottmyron requested review from a team and ritikesh as code owners April 24, 2026 19:41
@jhollinger
Contributor

Thanks @scottmyron! I'm still studying it but there's definitely something here. I do wonder if we could keep a single serialize method and conditionally create the Field Context (like you're doing with Parent Context)? Could ease maintenance and testing. Just a thought - YMMV.
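For illustration, one possible shape of that suggestion is a single loop where the field context is created lazily, on first need, mirroring the lazy Parent Context. All names here are hypothetical, not the gem's API:

```ruby
# Sketch: one serialize path, with the field context allocated at most once
# per call and only when a field actually needs it.
LazyFieldCtx = Struct.new(:object, :field)

def serialize_single_path(object, fields, needs_field_ctx)
  ctx = nil
  fields.each_with_object({}) do |name, out|
    if needs_field_ctx
      ctx ||= LazyFieldCtx.new(object, nil)  # lazy: skipped on the fast case
      ctx.field = name
    end
    out[name] = object[name]
  end
end

serialize_single_path({ id: 1 }, [:id], false)  # no context allocated
```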

Also, something we've been talking about is better automation around speed/memory/etc checks to prevent future regressions. (V1 saw huge perf degradation over time b/c nothing was checking it.) Would you be able to push up your memory perf code so we could eventually incorporate it into whatever system we eventually come up with? Automatically seeing "this PR adds a bunch of allocations" would be invaluable IMO.
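As a sketch of what such a check could measure (this is not the PR's profiling code; it just uses CRuby's built-in `GC.stat` allocation counter), a CI job could fail when a serializer's per-run allocation count regresses past a threshold:

```ruby
# Count the objects allocated by a block using CRuby's cumulative
# allocation counter. GC is disabled so a collection mid-block
# doesn't skew the numbers.
def allocations_for
  GC.disable
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
ensure
  GC.enable
end

heavy = allocations_for { 1_000.times.map { |i| [i, i.to_s] } }  # arrays + strings
light = allocations_for { 1_000.times { |i| i + 1 } }            # almost nothing
puts "heavy=#{heavy} light=#{light}"
```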

@jhollinger
Contributor

Going to merge this and riff on it a bit. Thanks!

@jhollinger jhollinger merged commit 30abb0f into procore-oss:jh/release-2.0-faster Apr 28, 2026
1 check failed
jhollinger added a commit that referenced this pull request Apr 28, 2026
Reduce V2 serializer allocations by ~23% via fast path
jhollinger added a commit that referenced this pull request Apr 28, 2026
Reduce V2 serializer allocations by ~23% via fast path
@scottmyron
Author

Apologies for the delay... I'll get another PR going soon with the memory profiling turned into a GitHub Action (hopefully).
