Deduplicate types in .elmi binary serialization #94
Closed
CharlonTank wants to merge 1 commit into lamdera:lamdera-next
The Binary instance for Interface now interns all Can.Type subtrees into a pool before serialization. Each unique type is stored once, and references use Word32 indices. This eliminates massive redundancy when exported values share large type aliases (FrontendModel, BackendModel, Effect.Test types). On a real project with 20 exports referencing types with 100+ field records, .elmi dropped from 227 MB to 151 KB (1500x reduction). Cold build time for tests dropped from 188s to 150s. The new format is signaled by a 0x00 magic first byte. Old-format .elmi files are detected and deserialized via the original path, so cached artifacts from older compilers are rebuilt transparently.
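The interning step described above can be illustrated with a small sketch. It is written in Python for brevity and is not the compiler's Haskell code; the `Type` and `Pool` names are hypothetical stand-ins, and a plain Python integer stands in for a Word32 index.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical stand-in for a type tree: a constructor name plus child types.
@dataclass(frozen=True)
class Type:
    name: str
    children: Tuple["Type", ...] = ()

Entry = Tuple[str, Tuple[int, ...]]

class Pool:
    """Stores each distinct subtree once; children are referenced by index."""

    def __init__(self) -> None:
        self.entries: List[Entry] = []
        self._index: Dict[Entry, int] = {}

    def intern(self, t: Type) -> int:
        # Intern children first, so an entry only ever holds indices.
        key = (t.name, tuple(self.intern(c) for c in t.children))
        idx = self._index.get(key)
        if idx is None:
            idx = len(self.entries)
            self.entries.append(key)
            self._index[key] = idx
        return idx

    def rebuild(self, idx: int) -> Type:
        # Inverse of intern: expand indices back into a full tree.
        name, kids = self.entries[idx]
        return Type(name, tuple(self.rebuild(k) for k in kids))

# A shared "model" type referenced by two exported function types.
model = Type("Record", (Type("Int"), Type("String"), Type("List", (Type("Int"),))))
exports = [Type("Fn", (model, model)), Type("Fn", (model, Type("Int")))]

pool = Pool()
roots = [pool.intern(e) for e in exports]
```

In this toy case the shared `model` subtree is stored once in `pool.entries`, and each export is reduced to a root index that `rebuild` can expand back into the original tree.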
Superseded by #96, which fixes the perf regression of this approach. The Map-based intern pool in this PR slowed cold builds by ~16% because every |
Summary
- Intern all Can.Type subtrees into a deduplication pool before serializing .elmi files
- Each unique type is stored once, and references use Word32 indices
- Old-format .elmi files are detected via magic byte and deserialized via the original path

Problem
When exported values share large monomorphic type aliases (records with many fields that transitively expand into other large records), the same type expansion is serialized inline for every export.
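To make the redundancy concrete, here is a toy comparison in Python (not the compiler's code). Types are modelled as nested tuples, and the shapes and counts are invented for illustration; they are not the measurements reported in this PR.

```python
# Types as nested tuples: (constructor_name, child, child, ...).

def node_count(t):
    """Nodes written when a type tree is serialized fully inline."""
    return 1 + sum(node_count(c) for c in t[1:])

def unique_subtrees(types):
    """Distinct subtrees across all types, i.e. pool entries after dedup."""
    seen = set()

    def visit(t):
        if t in seen:
            return
        seen.add(t)
        for child in t[1:]:
            visit(child)

    for t in types:
        visit(t)
    return len(seen)

# Hypothetical big record alias: 100 fields, each holding a small inner record.
inner = ("Record", ("Int",), ("String",), ("Bool",))
big = ("Record",) + tuple(("Field%d" % i, inner) for i in range(100))

# 20 exported helpers that all take the big alias and return something distinct.
exports = [("Fn", big, ("Ret%d" % i,)) for i in range(20)]

inline_nodes = sum(node_count(e) for e in exports)
pooled_nodes = unique_subtrees(exports)
```

Here `inline_nodes` grows with the number of exports times the size of the alias, while `pooled_nodes` grows only with the number of distinct subtrees.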
For a module that exposes many helpers all taking or returning a record alias with 100+ transitive fields, the same fully-expanded form is repeated dozens of times in the .elmi, producing files in the hundreds of MB and slowing the cold build correspondingly.

Results
Measured on a project with 391 modules and a few large model record aliases (Lamdera-style FrontendModel/BackendModel patterns with Effect.Test setups):

- .elmi (20 exports of helpers over big aliases): 227 MB before, 151 KB after

Most of the saving comes from killing redundant expansion of the same TAlias (Filled ...) subtrees across many exports. The dedup is purely a serialization-format change; the in-memory Interface after deserialization is identical to the original.

Test plan
- elm-test-rs suite passes
- .elmi produces an identical file size
- .elmi / artifacts.dat files from previous compilers are read correctly via the fallback path
- test/scenario-* projects compile
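The fallback-path check can be pictured with a toy round trip in Python. Both byte layouts below are hypothetical and much simpler than the real .elmi format. The sketch also assumes a legacy payload never begins with a 0x00 byte, which is what lets the first byte act as the format marker.

```python
import struct

MAGIC = 0x00  # first byte that signals the new format

def encode_legacy(names):
    # Hypothetical old layout: newline-joined UTF-8 names, repeated inline.
    return "\n".join(names).encode("utf-8")

def encode_pooled(names):
    # Hypothetical new layout: magic byte, pool of unique names,
    # then one big-endian 32-bit index per reference.
    pool = sorted(set(names))
    index = {name: i for i, name in enumerate(pool)}
    blob = "\n".join(pool).encode("utf-8")
    header = bytes([MAGIC]) + struct.pack(">I", len(blob))
    refs = b"".join(struct.pack(">I", index[name]) for name in names)
    return header + blob + refs

def decode(data):
    if data[:1] != bytes([MAGIC]):
        # Fallback: no magic byte, so read the legacy layout.
        return data.decode("utf-8").split("\n")
    (blob_len,) = struct.unpack_from(">I", data, 1)
    pool = data[5:5 + blob_len].decode("utf-8").split("\n")
    refs = data[5 + blob_len:]
    return [pool[i] for (i,) in struct.iter_unpack(">I", refs)]

names = ["Model", "Msg", "Model", "Model"]
from_legacy = decode(encode_legacy(names))
from_pooled = decode(encode_pooled(names))
```

Both paths yield the same in-memory value, which is the property the test plan checks for files written by previous compilers.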