Teach data generation to parse and publish NameAliases.txt #68
mathiasbynens merged 4 commits into node-unicode:main
mathiasbynens
left a comment
Thanks for the patch! I left a few comments — PTAL.
data/resources.js
Outdated
'script-extensions': 'https://unicode.org/Public/7.0.0/ucd/ScriptExtensions.txt',
'blocks': 'https://unicode.org/Public/7.0.0/ucd/Blocks.txt',
'properties': 'https://unicode.org/Public/7.0.0/ucd/PropList.txt',
'aliases': 'https://unicode.org/Public/7.0.0/ucd/NameAliases.txt',
Let’s go with `name-aliases` instead of the less precise `aliases`.
index.js
Outdated
parsers.parseEmoji = require('./scripts/parse-emoji.js');
parsers.parseEmojiSequences = require('./scripts/parse-emoji-sequences.js');
parsers.parseNames = require('./scripts/parse-names.js');
parsers.parseAliases = require('./scripts/parse-aliases.js');
Similarly, `parseNameAliases`.
index.js
Outdated
extend(dirMap, utils.writeFiles({
  'version': version,
  'map': parsers.parseAliases(version),
  'type': 'Aliases',
Should this go under the Names directory? Feels like it belongs there.
Pushed an update. Here is the directory structure as reflected in the changed README:
diff --git a/README.md b/README.md
index 373e3dbadd85..c657677a7efe 100644
--- a/README.md
+++ b/README.md
@@ -57,6 +57,20 @@ const openingBrackets = require('@unicode/unicode-14.0.0/Bidi_Paired_Bracket_Typ
Other than categories, data on Unicode properties, blocks, scripts, and script extensions is available too (for recent versions of the Unicode standard). Here’s the full list of the available data for v14.0.0:
```js
+// `Names`:
+
+require('@unicode/unicode-14.0.0/Names/index.js'); // array of canonical names
+
+require('@unicode/unicode-14.0.0/Names/Abbreviation/index.js'); // lookup map from codepoint to aliases
+
+require('@unicode/unicode-14.0.0/Names/Alternate/index.js'); // lookup map from codepoint to aliases
+
+require('@unicode/unicode-14.0.0/Names/Control/index.js'); // lookup map from codepoint to aliases
+
+require('@unicode/unicode-14.0.0/Names/Correction/index.js'); // lookup map from codepoint to aliases
+
+require('@unicode/unicode-14.0.0/Names/Figment/index.js'); // lookup map from codepoint to aliases
+
// `Binary_Property`:
require('@unicode/unicode-14.0.0/Binary_Property/ASCII/code-points.js');
If you want something different, let me know. I'm going with what's simple and hopefully least invasive. It's hard to know exactly how this affects the "output" library: the output is compressed, so a diff is effectively a comparison of huge base64 strings that change substantially with each `npm test` run. That said, no files appear to be lost, and the new files are all as indicated in the snippet above, so I think it's probably okay. If there's a better way to validate the change, please let me know.
Thank you for the feedback and for this excellent library!
Hello @mathiasbynens, is there anything I can do to make this change more acceptable for merging?
There’s still an unresolved comment here: #68 (comment)
f55f0f9 to 5ef9c90
@mathiasbynens Thanks for the reminder. I think it's addressed. Please note that #70 should be merged first; without it the build is broken. Let me know if this directory structure is okay, or if you prefer an intermediate
diff --git a/README.md b/README.md
index 0efc52f2343b..480a7449a401 100644
--- a/README.md
+++ b/README.md
@@ -57,6 +57,16 @@ const openingBrackets = require('@unicode/unicode-15.0.0/Bidi_Paired_Bracket_Typ
Other than categories, data on Unicode properties, blocks, scripts, and script extensions is available too (for recent versions of the Unicode standard). Here’s the full list of the available data for v15.0.0:
```js
+// `Names`:
+
+require('@unicode/unicode-15.0.0/Names/index.js'); // array of canonical names
+require('@unicode/unicode-15.0.0/Names/Abbreviation/index.js'); // lookup map from codepoint to aliases
+require('@unicode/unicode-15.0.0/Names/Alternate/index.js'); // lookup map from codepoint to aliases
+require('@unicode/unicode-15.0.0/Names/Control/index.js'); // lookup map from codepoint to aliases
+require('@unicode/unicode-15.0.0/Names/Correction/index.js'); // lookup map from codepoint to aliases
+require('@unicode/unicode-15.0.0/Names/Figment/index.js'); // lookup map from codepoint to aliases
+
+
// `Binary_Property`:
require('@unicode/unicode-15.0.0/Binary_Property/ASCII/code-points.js');
5ef9c90 to fcb5610
Current modules allow users to query the official published Unicode "Name" for each code point. In some cases, more detailed or corrected information can be found using the NameAliases.txt file.
Implement a parser that reads the formal Unicode name aliases and publishes them as part of the generated modules, for Unicode versions in which aliases are available.
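
For readers unfamiliar with the file format, here is a minimal sketch of what parsing NameAliases.txt can look like. It is not the parser added in this PR: the function name `parseNameAliasesText`, the returned Map-of-Maps shape, and the inline sample are assumptions made for illustration. It relies only on the documented line format `code point;alias;type`, with `#` starting a comment, and capitalizes the type so that it matches the `Names/<Type>/` directory names listed in the README diff above.

```js
// Hypothetical helper for illustration; not the script added in this PR.
// Parses NameAliases.txt-style text into Map(type -> Map(codePoint -> [aliases])).
function parseNameAliasesText(source) {
  const byType = new Map();
  for (const rawLine of source.split(/\r?\n/)) {
    // Drop comments and surrounding whitespace; skip empty lines.
    const line = rawLine.replace(/#.*$/, '').trim();
    if (line === '') continue;
    const fields = line.split(';').map((field) => field.trim());
    if (fields.length !== 3 || fields.some((field) => field === '')) continue;
    const [hex, alias, type] = fields;
    const codePoint = parseInt(hex, 16);
    // 'control' -> 'Control', mirroring the directory names in the README diff.
    const typeName = type[0].toUpperCase() + type.slice(1);
    if (!byType.has(typeName)) byType.set(typeName, new Map());
    const aliasesByCodePoint = byType.get(typeName);
    if (!aliasesByCodePoint.has(codePoint)) aliasesByCodePoint.set(codePoint, []);
    aliasesByCodePoint.get(codePoint).push(alias);
  }
  return byType;
}

// A small sample in the NameAliases.txt line format.
const sample = [
  '# code point;alias;type',
  '0000;NULL;control',
  '0000;NUL;abbreviation',
  '000A;LINE FEED;control',
  '000A;NEW LINE;control',
  '000A;END OF LINE;control',
  '000A;LF;abbreviation',
  '01A2;LATIN CAPITAL LETTER GHA;correction',
].join('\n');

const parsed = parseNameAliasesText(sample);
// Logs the three control aliases recorded for U+000A.
console.log(parsed.get('Control').get(0x000A));
```

A code point can carry several aliases of the same type, which is why each entry in the sketch maps to an array and not to a single string.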