Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reduce space required by idna data tables #291

Merged
merged 1 commit into from Apr 3, 2017

Conversation

@froydnj
Copy link
Contributor

froydnj commented Mar 23, 2017

The Range type defined by uts46 takes up 20 bytes on a 32-bit platform and 32 bytes on a 64-bit platform. This bloat is due to Mapping using a full word-width discriminant for its type and native string slices, which are unnecessarily large. The use of native string slices in this context also require many load-time relocations on some platforms when idna is used in a static library, another unnecessary cost.

Instead of this scheme, we can define our own string slice type that indexes into a generated table. This string slice type enables us to use a smaller discriminant type for Mapping, and therefore reduces the
size of Range to 16 bytes on a 32-bit platform (I think) and 16 bytes on a 64-bit platform. For x86-64 Linux, this change reduces total idna size by ~130K, a significant savings, and makes more of idna read-only.

Further reductions are possible by splitting the table into smaller variants for the basic multilingual plane vs. everything else; I will leave that reduction for a different PR.


This change is Reviewable

The `Range` type defined by uts46 takes up 20 bytes on a 32-bit platform
and 32 bytes on a 64-bit platform.  This bloat is due to `Mapping` using
a full word-width discriminant for its type and native string slices,
which are unnecessarily large.  The use of native string slices in this
context also require many load-time relocations on some platforms when
idna is used in a static library, another unnecessary cost.

Instead of this scheme, we can define our own string slice type that
indexes into a generated table.  This string slice type enables us to
use a smaller discriminant type for `Mapping`, and therefore reduces the
size of `Range` to 16 bytes on a 32-bit platform and 16 bytes on a
64-bit platform.  For x86-64 Linux, this change reduces total idna size
by ~130K, a significant savings.

Changing the script to use unicode escapes makes things slightly more
readable, enables better eyeball comparisons of the generated tables to
the original mapping table, and makes syntax highlighting in emacs
somewhat faster.
@froydnj froydnj force-pushed the froydnj:uts-data-slimming branch from 3ef26e1 to 12a92d7 Mar 24, 2017
@SimonSapin
Copy link
Member

SimonSapin commented Apr 3, 2017

Looks good, thanks!

@bors-servo r+

@bors-servo
Copy link
Contributor

bors-servo commented Apr 3, 2017

📌 Commit 12a92d7 has been approved by SimonSapin

@bors-servo
Copy link
Contributor

bors-servo commented Apr 3, 2017

Testing commit 12a92d7 with merge 8c9b2b6...

bors-servo added a commit that referenced this pull request Apr 3, 2017
reduce space required by idna data tables

The `Range` type defined by uts46 takes up 20 bytes on a 32-bit platform and 32 bytes on a 64-bit platform.  This bloat is due to `Mapping` using a full word-width discriminant for its type and native string slices, which are unnecessarily large.  The use of native string slices in this context also require many load-time relocations on some platforms when idna is used in a static library, another unnecessary cost.

Instead of this scheme, we can define our own string slice type that indexes into a generated table.  This string slice type enables us to use a smaller discriminant type for `Mapping`, and therefore reduces the
size of `Range` to 16 bytes on a 32-bit platform (I think) and 16 bytes on a 64-bit platform.  For x86-64 Linux, this change reduces total idna size by ~130K, a significant savings, and makes more of idna read-only.

Further reductions are possible by splitting the table into smaller variants for the basic multilingual plane vs. everything else; I will leave that reduction for a different PR.

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/rust-url/291)
<!-- Reviewable:end -->
@bors-servo
Copy link
Contributor

bors-servo commented Apr 3, 2017

☀️ Test successful - status-travis
Approved by: SimonSapin
Pushing 8c9b2b6 to master...

@bors-servo bors-servo merged commit 12a92d7 into servo:master Apr 3, 2017
3 checks passed
3 checks passed
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
homu Test successful
Details
@froydnj froydnj deleted the froydnj:uts-data-slimming branch Jun 18, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

3 participants
You can’t perform that action at this time.