Commit
Showing 1 changed file with 26 additions and 0 deletions.
@@ -0,0 +1,26 @@
# Notes on generation of UCD data

The data in this crate was generated from Unicode 12.0. To re-spin:

Fetch the UCD data and unpack.

```
curl -LO https://www.unicode.org/Public/zipped/12.0.0/UCD.zip
mkdir ucd
cd ucd
unzip ../UCD.zip
# Return to the crate root, where gen_tables.py is run from
cd ..
```

Update the list of scripts known to HarfBuzz. We derived the list from [harfbuzz-sys/src/lib.rs](https://github.com/servo/rust-harfbuzz/blob/master/harfbuzz-sys/src/lib.rs) by hand in a text editor and pasted the result into gen_tables.py as the `hb_scripts` variable. Note that four scripts are present in Unicode 12.0 but not in harfbuzz_sys 0.3.1 ('Elymaic', 'Nandinagari', 'Nyiakeng_Puachue_Hmong', 'Wancho'). Consider updating the script to parse the Rust source file directly (though this would mean another download).
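
A minimal sketch of the parsing idea suggested above, in Python to match gen_tables.py. It assumes the script constants appear in lib.rs as declarations of the form `pub const HB_SCRIPT_ARABIC: hb_script_t = 1098015074;`, which should be checked against the actual file. `parse_hb_scripts` and `tag_from_value` are hypothetical names and are not part of gen_tables.py.

```python
import re

# Assumed declaration shape (verify against the real lib.rs):
#   pub const HB_SCRIPT_ARABIC: hb_script_t = 1098015074;
_SCRIPT_CONST = re.compile(
    r"pub\s+const\s+HB_SCRIPT_([A-Z0-9_]+)\s*:\s*hb_script_t\s*=\s*(\d+)\s*;"
)


def tag_from_value(value):
    """Decode a 32-bit tag value into its four ASCII characters."""
    return value.to_bytes(4, "big").decode("ascii")


def parse_hb_scripts(rust_source):
    """Map each HB_SCRIPT_* constant suffix to its decoded four-letter tag."""
    return {
        name: tag_from_value(int(value))
        for name, value in _SCRIPT_CONST.findall(rust_source)
    }
```

The decoded values are ISO 15924 tags, so one option would be to match them against the short script aliases in the UCD's PropertyValueAliases.txt instead of maintaining the name list by hand.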

Run gen_tables.py. Note also that when running on Windows, you'll probably want to strip the CR from the CRLF line endings.

```
python gen_tables.py ucd > src/tables.rs
cargo fmt
```
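
For the Windows note above, one way to strip the CR characters is a small post-processing step. This is a sketch only: it assumes the CRLF endings in question are in the generated file, and `strip_cr` and `normalize_file` are hypothetical helpers, not part of gen_tables.py.

```python
def strip_cr(data):
    """Convert CRLF line endings to LF in a bytes object."""
    return data.replace(b"\r\n", b"\n")


def normalize_file(path):
    """Rewrite the file at `path` in place with LF line endings."""
    # Binary mode avoids any newline translation by Python itself.
    with open(path, "rb") as f:
        data = f.read()
    with open(path, "wb") as f:
        f.write(strip_cr(data))
```

Calling `normalize_file` on the generated file before `cargo fmt` would leave LF-only output. Lone CR characters that are not part of a CRLF pair are left untouched.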

We considered using [ucd-generate], but it did not produce the data we needed in the form we needed. For future work, consider migrating to that tool. Also consider trie lookups rather than binary searches; one reason we chose binary search is that the data is relatively compact.

[ucd-generate]: https://github.com/BurntSushi/ucd-generate
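
To illustrate the binary-search approach mentioned above, here is a minimal Python sketch of a range-table lookup. The table is a small illustrative sample and not the generated data; the layout (sorted, non-overlapping `(first, last, script)` ranges) and the name `lookup_script` are assumptions made for this example and may differ from what the crate actually stores.

```python
import bisect

# Illustrative sample only: (first, last, script) ranges sorted by `first`,
# with no overlaps. The real table would be generated from the UCD.
SCRIPT_RANGES = [
    (0x0041, 0x005A, "Latin"),
    (0x0061, 0x007A, "Latin"),
    (0x0370, 0x0373, "Greek"),
    (0x0400, 0x0481, "Cyrillic"),
]
_STARTS = [first for first, _, _ in SCRIPT_RANGES]


def lookup_script(cp, default="Unknown"):
    """Binary-search the range table for the script of code point `cp`."""
    # Index of the last range whose start is <= cp, or -1 if there is none.
    i = bisect.bisect_right(_STARTS, cp) - 1
    if i >= 0:
        _, last, script = SCRIPT_RANGES[i]
        if cp <= last:
            return script
    return default
```

Each range costs one table entry and a lookup takes a logarithmic number of comparisons, which is where the compactness comes from. A trie would typically trade a larger table for fewer steps per lookup.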