Deduplicate types in .elmi binary serialization #94

Closed
CharlonTank wants to merge 1 commit into lamdera:lamdera-next from CharlonTank:perf/elmi-type-dedup

@CharlonTank CharlonTank commented Apr 16, 2026

Summary

  • Intern all Can.Type subtrees into a deduplication pool before serializing .elmi files
  • Each unique type is stored once; references use Word32 indices
  • Backward compatible: old-format .elmi files are detected via magic byte and deserialized via the original path
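A minimal sketch of the interning idea, under stated assumptions: `Type` is a toy stand-in for `Can.Type`, and `Pool`, `intern`, `children`, and `demo` are hypothetical names, not the compiler's. Each distinct subtree gets a `Word32` index the first time it is seen, and later occurrences reuse that index.

```haskell
import qualified Data.Map.Strict as Map
import Data.Word (Word32)

-- Toy stand-in for Can.Type (assumption: the real type has more constructors).
data Type
  = TVar String
  | TLambda Type Type
  | TRecord [(String, Type)]
  deriving (Eq, Ord, Show)

-- Maps each distinct subtree to the index it was first assigned.
type Pool = Map.Map Type Word32

-- Direct subtrees of a type.
children :: Type -> [Type]
children (TVar _) = []
children (TLambda a b) = [a, b]
children (TRecord fields) = map snd fields

-- Intern a type and all of its subtrees; subtrees are interned first.
intern :: Type -> Pool -> (Word32, Pool)
intern t pool0 =
  case Map.lookup t pool1 of
    Just ix -> (ix, pool1)
    Nothing ->
      let ix = fromIntegral (Map.size pool1)
      in (ix, Map.insert t ix pool1)
  where
    pool1 = foldr (\c p -> snd (intern c p)) pool0 (children t)

-- Two exports that share one record alias.
demo :: ([Word32], Pool)
demo = foldl step ([], Map.empty) exports
  where
    model = TRecord [("count", TVar "Int"), ("name", TVar "String")]
    exports = [TLambda model model, TLambda (TVar "Int") model]
    step (acc, p) t = let (ix, p') = intern t p in (acc ++ [ix], p')

main :: IO ()
main = do
  print (fst demo)             -- pool index of each export
  print (Map.size (snd demo))  -- number of distinct subtrees stored
```

In this sketch the shared record is stored once even though it appears three times across the two exports. Note that the pool is keyed on whole subtrees, so every lookup compares a full subtree against the keys it visits.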

Problem

When exported values share large monomorphic type aliases (records with many fields that transitively expand into other large records), the same type expansion is serialized inline for every export.

For a module that exposes many helpers all taking or returning a record alias with 100+ transitive fields, the same fully-expanded form is repeated dozens of times in the .elmi, producing files in the hundreds of MB and slowing the cold build correspondingly.

Results

Measured on a project with 391 modules and a few large model record aliases (Lamdera-style FrontendModel/BackendModel patterns with Effect.Test setups):

| Metric | Before | After | Improvement |
|---|---|---|---|
| Largest .elmi (20 exports of helpers over big aliases) | 227 MB | 151 KB | ~1500x smaller |
| Second-largest .elmi | 27 MB | 150 KB | ~180x smaller |
| Cold build (full test suite) | 188s | 150s | -20% |
| Type-check time of bottleneck module | 178s | 109s | -39% |

Most of the saving comes from killing redundant expansion of the same TAlias (Filled ...) subtrees across many exports. The dedup is purely a serialization-format change; the in-memory Interface after deserialization is identical to the original.

Test plan

  • Cold build of large project succeeds
  • Warm build (no changes) completes instantly
  • Incremental build (touch one file) works correctly
  • Project's elm-test-rs suite passes
  • Round-trip: re-encoding a deserialized .elmi produces an identical file size
  • Old-format .elmi / artifacts.dat files from previous compilers are read correctly via the fallback path
  • Multiple test/scenario-* projects compile
  • App build (Frontend + Backend) unaffected on warm cache (~0.15s)

The Binary instance for Interface now interns all Can.Type subtrees
into a pool before serialization. Each unique type is stored once,
and references use Word32 indices. This eliminates massive redundancy
when exported values share large type aliases (FrontendModel,
BackendModel, Effect.Test types).

On a real project with 20 exports referencing types with 100+ field
records, .elmi dropped from 227 MB to 151 KB (1500x reduction).
Cold build time for tests dropped from 188s to 150s.

The new format is signaled by a 0x00 magic first byte. Old-format
.elmi files are detected and deserialized via the original path,
so cached artifacts from older compilers are rebuilt transparently.
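
The format check described above might look roughly like the following sketch. `Format`, `newFormatMagic`, and `detectFormat` are hypothetical names, and the sketch assumes that an old-format file never starts with a 0x00 byte, which is what makes a one-byte check sufficient.

```haskell
import qualified Data.ByteString as BS
import Data.Word (Word8)

-- Which decoding path a file should take.
data Format = Deduplicated | Legacy
  deriving (Eq, Show)

-- First byte that marks the new format (0x00, as stated in the commit message).
newFormatMagic :: Word8
newFormatMagic = 0x00

-- Inspect only the first byte; an empty file has no recognizable format.
detectFormat :: BS.ByteString -> Maybe Format
detectFormat bytes =
  case BS.uncons bytes of
    Nothing -> Nothing
    Just (firstByte, _)
      | firstByte == newFormatMagic -> Just Deduplicated
      | otherwise -> Just Legacy

main :: IO ()
main = do
  print (detectFormat (BS.pack [0x00, 0x2A]))  -- new format
  print (detectFormat (BS.pack [0x07, 0x2A]))  -- falls back to the original path
  print (detectFormat BS.empty)
```

A real decoder would dispatch on the result, handing `Legacy` input to the original deserializer unchanged.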

@CharlonTank

Superseded by #96, which fixes the perf regression of this approach. The Map-based intern pool in this PR slowed cold builds by ~16% because every HashMap.lookup hashed the full Can.Type subtree it was looking up, a self-defeating O(N²) walk. PR #96 replaces it with a Shape-based bottom-up intern, in which children are already Word32 IDs by the time they enter the lookup key. It keeps the same .elmi size reduction (30x) and adds two orthogonal wins (solver memoization + RTS nursery bump) for a combined -61% cold build time. Closing this in favor of #96.
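
To illustrate the difference, here is a minimal sketch of a shape-based bottom-up intern. It is not the code from #96: `Type` is a toy stand-in for `Can.Type`, the names `Shape`, `internShape`, `intern`, and `demo` are hypothetical, and it uses an ordered `Data.Map` from `containers` rather than a hash map to stay within GHC's bundled libraries. The point is that a lookup key holds one constructor plus child IDs, never a whole subtree.

```haskell
import qualified Data.Map.Strict as Map
import Data.Word (Word32)

-- Toy stand-in for Can.Type (assumption: the real type has more constructors).
data Type
  = TVar String
  | TLambda Type Type
  | TRecord [(String, Type)]

-- One level of structure; children are already-interned Word32 IDs.
data Shape
  = SVar String
  | SLambda Word32 Word32
  | SRecord [(String, Word32)]
  deriving (Eq, Ord, Show)

type Pool = Map.Map Shape Word32

-- Look up a shallow shape, assigning the next free index if it is new.
internShape :: Shape -> Pool -> (Word32, Pool)
internShape s pool =
  case Map.lookup s pool of
    Just ix -> (ix, pool)
    Nothing ->
      let ix = fromIntegral (Map.size pool)
      in (ix, Map.insert s ix pool)

-- Intern children first, then build the parent's key from their IDs.
intern :: Type -> Pool -> (Word32, Pool)
intern (TVar name) p = internShape (SVar name) p
intern (TLambda a b) p0 =
  let (ia, p1) = intern a p0
      (ib, p2) = intern b p1
  in internShape (SLambda ia ib) p2
intern (TRecord fields) p0 =
  let step (acc, p) (name, t) =
        let (ix, p') = intern t p in ((name, ix) : acc, p')
      (revFields, p1) = foldl step ([], p0) fields
  in internShape (SRecord (reverse revFields)) p1

-- Two exports that share one record alias.
demo :: ([Word32], Pool)
demo = foldl step ([], Map.empty) exports
  where
    model = TRecord [("count", TVar "Int"), ("name", TVar "String")]
    exports = [TLambda model model, TLambda (TVar "Int") model]
    step (acc, p) t = let (ix, p') = intern t p in (acc ++ [ix], p')

main :: IO ()
main = do
  print (fst demo)             -- pool index of each export
  print (Map.size (snd demo))  -- number of distinct shapes stored
```

Because each key is shallow, the cost of a lookup depends on the node's own width rather than on the size of the subtree beneath it.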

@CharlonTank CharlonTank deleted the perf/elmi-type-dedup branch April 25, 2026 08:56