Deduplicate types in .elmi binary serialization by CharlonTank · Pull Request #94 · lamdera/compiler

CharlonTank · 2026-04-16T14:42:51Z

Summary

Intern all Can.Type subtrees into a deduplication pool before serializing .elmi files
Each unique type is stored once; references use Word32 indices
Backward compatible: old-format .elmi files are detected via magic byte and deserialized via the original path
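
A minimal sketch of the interning idea, assuming a toy `Type` in place of the real `Can.Type` (which has more constructors). The names `Pool`, `intern`, and `children` are hypothetical and not taken from the PR. `Data.Map` compares keys by `Ord` rather than hashing them, but the key is still a whole subtree, as in this PR's approach:

```haskell
import qualified Data.Map.Strict as Map
import Data.Word (Word32)

-- Toy stand-in for Can.Type (assumption: the real type is richer).
data Type
  = TVar String
  | TLambda Type Type
  | TRecord [(String, Type)]
  deriving (Eq, Ord, Show)

-- Pool of unique subtrees; each one is assigned a Word32 index once.
data Pool = Pool
  { poolIds :: Map.Map Type Word32
  , poolNext :: Word32
  }

emptyPool :: Pool
emptyPool = Pool Map.empty 0

-- Direct children of a type node.
children :: Type -> [Type]
children ty = case ty of
  TVar _ -> []
  TLambda a b -> [a, b]
  TRecord fields -> map snd fields

-- Intern a type and all of its subtrees, children before parents.
-- Returns the index of the root and the updated pool.
intern :: Type -> Pool -> (Word32, Pool)
intern ty pool0 =
  let pool1 = foldr (\child p -> snd (intern child p)) pool0 (children ty)
  in case Map.lookup ty (poolIds pool1) of
       Just ix -> (ix, pool1)
       Nothing ->
         let ix = poolNext pool1
         in (ix, Pool (Map.insert ty ix (poolIds pool1)) (ix + 1))
```

Interning a function type that mentions the same record twice adds the record to the pool only once; the second mention resolves to the existing index.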

Problem

When exported values share large monomorphic type aliases (records with many fields that transitively expand into other large records), the same type expansion is serialized inline for every export.

For a module that exposes many helpers all taking or returning a record alias with 100+ transitive fields, the same fully-expanded form is repeated dozens of times in the .elmi, producing files in the hundreds of MB and slowing the cold build correspondingly.
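
To make the redundancy concrete, here is a small sketch that counts how many type nodes get written when every export is expanded inline, versus how many distinct subtrees exist. The toy `Type`, the 100-field `bigModel`, and the 20 `exports` are illustrative assumptions, not data from the measured project:

```haskell
import qualified Data.Set as Set

-- Toy stand-in for Can.Type (assumption: the real type is richer).
data Type
  = TVar String
  | TLambda Type Type
  | TRecord [(String, Type)]
  deriving (Eq, Ord, Show)

children :: Type -> [Type]
children ty = case ty of
  TVar _ -> []
  TLambda a b -> [a, b]
  TRecord fields -> map snd fields

-- Nodes written when each export serializes its full type inline.
inlineNodes :: [Type] -> Int
inlineNodes = sum . map size
  where
    size ty = 1 + sum (map size (children ty))

-- Distinct subtrees across all exports (what a dedup pool would hold).
uniqueNodes :: [Type] -> Int
uniqueNodes = Set.size . foldr collect Set.empty
  where
    collect ty seen
      | Set.member ty seen = seen
      | otherwise = foldr collect (Set.insert ty seen) (children ty)

-- A record alias with 100 fields, shared by every export.
bigModel :: Type
bigModel = TRecord [("field" ++ show i, TVar "Int") | i <- [1 .. 100 :: Int]]

-- Twenty helpers that all take the big record.
exports :: [Type]
exports = [TLambda bigModel (TVar ("Result" ++ show i)) | i <- [1 .. 20 :: Int]]
```

With these toy inputs, `inlineNodes exports` is 2060 while `uniqueNodes exports` is 42. Real aliases nest records inside records, so the gap grows much faster than in this flat example.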

Results

Measured on a project with 391 modules and a few large model record aliases (Lamdera-style FrontendModel/BackendModel patterns with Effect.Test setups):

Metric	Before	After	Improvement
Largest `.elmi` (20 exports of helpers over big aliases)	227 MB	151 KB	~1500x smaller
Second-largest `.elmi`	27 MB	150 KB	~180x smaller
Cold build (full test suite)	188s	150s	-20%
Type-check time of bottleneck module	178s	109s	-39%

Most of the saving comes from killing redundant expansion of the same TAlias (Filled ...) subtrees across many exports. The dedup is purely a serialization-format change; the in-memory Interface after deserialization is identical to the original.
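
The claim that the in-memory result is unchanged can be illustrated with a sketch of the decoding side: a pool of entries whose children are `Word32` indices is expanded back into ordinary type trees. `Entry`, `rebuild`, and the toy `Type` are hypothetical names, and the sketch assumes entries are stored children-first, so that every index refers to an earlier entry:

```haskell
import qualified Data.Map.Strict as Map
import Data.Word (Word32)

-- Toy stand-in for Can.Type (assumption: the real type is richer).
data Type
  = TVar String
  | TLambda Type Type
  | TRecord [(String, Type)]
  deriving (Eq, Show)

-- One pool entry: a single level of structure with indexed children.
data Entry
  = EVar String
  | ELambda Word32 Word32
  | ERecord [(String, Word32)]
  deriving (Eq, Show)

-- Expand a pool back into full types. Entry i gets index i.
-- Assumes children precede parents; Map.! fails on a malformed pool.
rebuild :: [Entry] -> Map.Map Word32 Type
rebuild entries = foldl step Map.empty (zip [0 ..] entries)
  where
    step done (ix, entry) = Map.insert ix (expand done entry) done
    expand done entry = case entry of
      EVar name -> TVar name
      ELambda a b -> TLambda (done Map.! a) (done Map.! b)
      ERecord fields -> TRecord [(name, done Map.! ix) | (name, ix) <- fields]
```

The rebuilt values are plain `Type` trees, so code that consumes them cannot tell which format they were read from. A production decoder would report a malformed pool as a decode error instead of failing in `Map.!`.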

Test plan

Cold build of large project succeeds
Warm build (no changes) completes instantly
Incremental build (touch one file) works correctly
Project's elm-test-rs suite passes
Round-trip: re-encoding a deserialized .elmi produces an identical file size
Old-format .elmi / artifacts.dat files from previous compilers are read correctly via the fallback path
Multiple test/scenario-* projects compile
App build (Frontend + Backend) unaffected on warm cache (~0.15s)

The Binary instance for Interface now interns all Can.Type subtrees into a pool before serialization. Each unique type is stored once, and references use Word32 indices. This eliminates massive redundancy when exported values share large type aliases (FrontendModel, BackendModel, Effect.Test types). On a real project with 20 exports referencing types with 100+ field records, .elmi dropped from 227 MB to 151 KB (1500x reduction). Cold build time for tests dropped from 188s to 150s. The new format is signaled by a 0x00 magic first byte. Old-format .elmi files are detected and deserialized via the original path, so cached artifacts from older compilers are rebuilt transparently.
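
The format switch described above can be sketched as a check on the first byte. `Format`, `newFormatMagic`, and `detectFormat` are hypothetical names, and the sketch assumes that an old-format file never begins with 0x00; treating empty input as legacy is an arbitrary choice for the example:

```haskell
import qualified Data.ByteString as BS
import Data.Word (Word8)

-- Which decoder should handle the file.
data Format = Deduplicated | Legacy
  deriving (Eq, Show)

-- Magic first byte for the new format, as stated in the commit message.
newFormatMagic :: Word8
newFormatMagic = 0x00

-- Pick a decoder from the first byte of the file contents.
-- Assumption: legacy files never start with the magic byte.
detectFormat :: BS.ByteString -> Format
detectFormat bytes =
  case BS.uncons bytes of
    Just (b, _) | b == newFormatMagic -> Deduplicated
    _ -> Legacy
```

A caller would dispatch on the result: strip the magic byte and run the pooled decoder for `Deduplicated`, or hand the untouched bytes to the original decoder for `Legacy`.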

CharlonTank · 2026-04-17T04:55:50Z

Superseded by #96, which fixes the perf regression of this approach. The Map-based intern pool in this PR slowed cold builds by ~16% because every HashMap.lookup hashed the full Can.Type subtree it was looking up, which amounts to a self-defeating O(N²) walk. PR #96 replaces it with a Shape-based bottom-up intern (children are already Word32 IDs by the time they enter the lookup key), keeping the same .elmi size reduction (30x) and adding two orthogonal wins (solver memoization + RTS nursery bump) for a combined -61% cold build time. Closing this in favor of #96.
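
A rough sketch of what a Shape-based bottom-up intern could look like, assuming a toy `Type` in place of `Can.Type`. The `Shape`, `Pool`, `internShape`, and `intern` definitions are illustrative guesses, not code from #96. The point is that the lookup key holds only one level of structure, because the children have already been replaced by their `Word32` IDs:

```haskell
import qualified Data.Map.Strict as Map
import Data.Word (Word32)

-- Toy stand-in for Can.Type (assumption: the real type is richer).
data Type
  = TVar String
  | TLambda Type Type
  | TRecord [(String, Type)]
  deriving (Eq, Show)

-- One level of structure; children are pool IDs, not subtrees.
data Shape
  = SVar String
  | SLambda Word32 Word32
  | SRecord [(String, Word32)]
  deriving (Eq, Ord, Show)

data Pool = Pool
  { poolIds :: Map.Map Shape Word32
  , poolNext :: Word32
  }

emptyPool :: Pool
emptyPool = Pool Map.empty 0

-- Look up a shallow key; allocate a fresh ID if it is new.
internShape :: Shape -> Pool -> (Word32, Pool)
internShape shape pool =
  case Map.lookup shape (poolIds pool) of
    Just ix -> (ix, pool)
    Nothing ->
      let ix = poolNext pool
      in (ix, Pool (Map.insert shape ix (poolIds pool)) (ix + 1))

-- Intern children first, then build the parent key from their IDs.
intern :: Type -> Pool -> (Word32, Pool)
intern ty pool0 = case ty of
  TVar name -> internShape (SVar name) pool0
  TLambda a b ->
    let (ia, pool1) = intern a pool0
        (ib, pool2) = intern b pool1
    in internShape (SLambda ia ib) pool2
  TRecord fields ->
    let step (name, fieldTy) (acc, p) =
          let (ix, p') = intern fieldTy p
          in ((name, ix) : acc, p')
        (fieldIds, pool1) = foldr step ([], pool0) fields
    in internShape (SRecord fieldIds) pool1
```

Each lookup now compares a small key whose size depends on the node's own arity rather than on the size of everything beneath it. The solver memoization and RTS nursery changes mentioned above are separate and not shown here.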

CharlonTank marked this pull request as draft April 17, 2026 01:22

CharlonTank mentioned this pull request Apr 17, 2026

perf: 60% faster cold builds via type dedup, solver memoization, and a larger GC nursery #96

Open

CharlonTank closed this Apr 17, 2026

CharlonTank deleted the perf/elmi-type-dedup branch April 25, 2026 08:56

CharlonTank wants to merge 1 commit into lamdera:lamdera-next from CharlonTank:perf/elmi-type-dedup