reduce space required by idna data tables #291
Merged
+13,612
−8,354
The `Range` type defined by uts46 takes up 20 bytes on a 32-bit platform and 32 bytes on a 64-bit platform. This bloat comes from `Mapping` using a full word-width discriminant for its type and holding native string slices, which are unnecessarily large. Using native string slices in this context also requires many load-time relocations on some platforms when idna is used in a static library, another unnecessary cost.

Instead of this scheme, we can define our own string slice type that indexes into a generated table. This string slice type lets us use a smaller discriminant type for `Mapping`, and therefore reduces the size of `Range` to 16 bytes on both 32-bit and 64-bit platforms. For x86-64 Linux, this change reduces total idna size by ~130K, a significant savings.

Changing the script to use unicode escapes makes things slightly more readable, enables better eyeball comparisons of the generated tables to the original mapping table, and makes syntax highlighting in emacs somewhat faster.
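To make the size argument concrete, here is a minimal sketch of the two layouts described above. It is an illustration under stated assumptions, not the code from this PR: the names `NativeMapping`, `NativeRange`, `TableSlice`, `CompactMapping`, `CompactRange`, `range_sizes`, the `u16` field widths, the reduced set of variants, and the tiny `STRING_TABLE` are all hypothetical stand-ins for the generated data.

```rust
use std::mem::size_of;

// Hypothetical stand-in for the generated string table; the real table is
// produced by a script and is much larger.
static STRING_TABLE: &str = "ssabc";

// Layout before the change: mapped variants carry a native string slice
// (pointer + length), so the enum also needs a word-aligned discriminant.
#[allow(dead_code)]
enum NativeMapping {
    Valid,
    Ignored,
    Mapped(&'static str),
    Deviation(&'static str),
    Disallowed,
}

#[allow(dead_code)]
struct NativeRange {
    from: char,
    to: char,
    mapping: NativeMapping,
}

// Custom slice type: an offset and length into STRING_TABLE instead of a
// pointer. The u16 widths are an assumption made for this sketch.
#[allow(dead_code)]
#[derive(Clone, Copy)]
struct TableSlice {
    byte_start: u16,
    byte_len: u16,
}

#[allow(dead_code)]
impl TableSlice {
    // Turn the (offset, length) pair back into a string slice.
    fn resolve(self) -> &'static str {
        let start = self.byte_start as usize;
        &STRING_TABLE[start..start + self.byte_len as usize]
    }
}

// Layout after the change: the payload is 4 bytes with 2-byte alignment,
// so the discriminant no longer has to be padded out to a full word.
#[allow(dead_code)]
enum CompactMapping {
    Valid,
    Ignored,
    Mapped(TableSlice),
    Deviation(TableSlice),
    Disallowed,
}

#[allow(dead_code)]
struct CompactRange {
    from: char,
    to: char,
    mapping: CompactMapping,
}

// Returns (size of the native-slice range, size of the compact range) in bytes.
#[allow(dead_code)]
fn range_sizes() -> (usize, usize) {
    (size_of::<NativeRange>(), size_of::<CompactRange>())
}
```

Calling `range_sizes()` reports both sizes for the current target, and `TableSlice { byte_start: 0, byte_len: 2 }.resolve()` yields the first two bytes of the table. Because the compact entries contain no pointers, a static array of them needs no load-time relocations, which is the second cost the description mentions.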
Looks good, thanks! @bors-servo r+
bors-servo added a commit that referenced this pull request on Apr 3, 2017
reduce space required by idna data tables

The `Range` type defined by uts46 takes up 20 bytes on a 32-bit platform and 32 bytes on a 64-bit platform. This bloat comes from `Mapping` using a full word-width discriminant for its type and holding native string slices, which are unnecessarily large. Using native string slices in this context also requires many load-time relocations on some platforms when idna is used in a static library, another unnecessary cost.

Instead of this scheme, we can define our own string slice type that indexes into a generated table. This string slice type lets us use a smaller discriminant type for `Mapping`, and therefore reduces the size of `Range` to 16 bytes on a 32-bit platform (I think) and 16 bytes on a 64-bit platform. For x86-64 Linux, this change reduces total idna size by ~130K, a significant savings, and makes more of idna read-only.

Further reductions are possible by splitting the table into smaller variants for the basic multilingual plane vs. everything else; I will leave that reduction for a different PR.
froydnj commented Mar 23, 2017 (edited by larsbergstrom)