Teach data generation to parse and publish NameAliases.txt by jpassaro · Pull Request #68 · node-unicode/node-unicode-data

jpassaro · 2022-03-04T23:53:03Z

Current modules allow users to query the official published Unicode "Name" for each code point. In some cases, more detailed or corrected information can be found using the NameAliases.txt file.

Implement a parser that reads the formal Unicode aliases and makes them part of the published modules, for Unicode versions in which aliases are available.
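A minimal sketch of what such a parser has to do, assuming only the documented NameAliases.txt line format `CODEPOINT;ALIAS;TYPE` (for example `0000;NULL;control`) with `#` comments. The function name `parseNameAliasesText` is hypothetical and is not the PR's actual `scripts/parse-aliases.js`. It takes the file's text rather than fetching it.

```javascript
'use strict';

// Hypothetical sketch: parse the text of NameAliases.txt into a Map from
// code point (number) to an array of { alias, type } records, in file order.
// Each data line has the form `CODEPOINT;ALIAS;TYPE`, e.g. `0000;NULL;control`.
function parseNameAliasesText(text) {
	const aliases = new Map();
	for (const rawLine of text.split('\n')) {
		// Drop comments and surrounding whitespace (including '\r'); skip blanks.
		const line = rawLine.replace(/#.*/, '').trim();
		if (line === '') {
			continue;
		}
		const [hex, alias, type] = line.split(';').map((field) => field.trim());
		const codePoint = parseInt(hex, 16);
		if (!aliases.has(codePoint)) {
			aliases.set(codePoint, []);
		}
		aliases.get(codePoint).push({ alias, type });
	}
	return aliases;
}

// A few sample lines in the NameAliases.txt format.
const sample = [
	'# NameAliases sample',
	'0000;NULL;control',
	'0000;NUL;abbreviation',
	'01A2;LATIN CAPITAL LETTER GHA;correction',
	'',
].join('\n');

const parsed = parseNameAliasesText(sample);
console.log(parsed.get(0x0000));
```

A code point can carry several aliases of different types, which is why each entry is an array rather than a single string.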

mathiasbynens

Thanks for the patch! I left a few comments — PTAL.

mathiasbynens · 2022-03-07T11:30:35Z

data/resources.js

 		'script-extensions': 'https://unicode.org/Public/7.0.0/ucd/ScriptExtensions.txt',
 		'blocks': 'https://unicode.org/Public/7.0.0/ucd/Blocks.txt',
 		'properties': 'https://unicode.org/Public/7.0.0/ucd/PropList.txt',
+		'aliases': 'https://unicode.org/Public/7.0.0/ucd/NameAliases.txt',


Let’s go with `name-aliases` instead of the less precise `aliases`.

mathiasbynens · 2022-03-07T11:30:47Z

index.js

 parsers.parseEmoji = require('./scripts/parse-emoji.js');
 parsers.parseEmojiSequences = require('./scripts/parse-emoji-sequences.js');
 parsers.parseNames = require('./scripts/parse-names.js');
+parsers.parseAliases = require('./scripts/parse-aliases.js');


Similarly, `parseNameAliases`.

mathiasbynens · 2022-03-07T11:32:06Z

index.js

+	extend(dirMap, utils.writeFiles({
+		'version': version,
+		'map': parsers.parseAliases(version),
+		'type': 'Aliases',


Should this go under the Names directory? Feels like it belongs there.
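Placing the data under `Names` amounts to splitting the parsed records by alias type, one lookup map per type (NameAliases.txt defines five: `abbreviation`, `alternate`, `control`, `correction`, `figment`). The sketch below assumes a hypothetical input shape of `Map<number, Array<{ alias, type }>>` and a `Names/<Type>` naming scheme; `groupAliasesByType` is illustrative and not the repo's `utils.writeFiles` logic.

```javascript
'use strict';

// Hypothetical sketch: split parsed aliases into one lookup map per alias
// type, keyed by a `Names/<Type>` directory name.
// Input:  Map<number, Array<{ alias, type }>>
// Output: Map<string, Map<number, string[]>>, e.g. 'Names/Control' -> Map
function groupAliasesByType(aliases) {
	const byDirectory = new Map();
	for (const [codePoint, records] of aliases) {
		for (const { alias, type } of records) {
			// 'control' -> 'Control', 'abbreviation' -> 'Abbreviation', etc.
			const dir = `Names/${type.charAt(0).toUpperCase()}${type.slice(1)}`;
			if (!byDirectory.has(dir)) {
				byDirectory.set(dir, new Map());
			}
			const map = byDirectory.get(dir);
			if (!map.has(codePoint)) {
				map.set(codePoint, []);
			}
			map.get(codePoint).push(alias);
		}
	}
	return byDirectory;
}

// A few sample records (not the complete alias lists for these code points).
const example = new Map([
	[0x0000, [{ alias: 'NULL', type: 'control' }, { alias: 'NUL', type: 'abbreviation' }]],
	[0xFEFF, [{ alias: 'BYTE ORDER MARK', type: 'alternate' }, { alias: 'BOM', type: 'abbreviation' }]],
]);
const grouped = groupAliasesByType(example);
console.log([...grouped.keys()]);
// → [ 'Names/Control', 'Names/Abbreviation', 'Names/Alternate' ]
```

Keeping one map per type lets consumers load only the alias kind they need, for example just the corrections.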

Pushed an update. Here is the directory structure as reflected in the changed README:

diff --git a/README.md b/README.md
index 373e3dbadd85..c657677a7efe 100644
--- a/README.md
+++ b/README.md
@@ -57,6 +57,20 @@ const openingBrackets = require('@unicode/unicode-14.0.0/Bidi_Paired_Bracket_Typ
 Other than categories, data on Unicode properties, blocks, scripts, and script extensions is available too (for recent versions of the Unicode standard). Here’s the full list of the available data for v14.0.0:
 
 ```js
+// `Names`:
+
+require('@unicode/unicode-14.0.0/Names/index.js'); // array of canonical names
+
+require('@unicode/unicode-14.0.0/Names/Abbreviation/index.js'); // lookup map from codepoint to aliases
+
+require('@unicode/unicode-14.0.0/Names/Alternate/index.js'); // lookup map from codepoint to aliases
+
+require('@unicode/unicode-14.0.0/Names/Control/index.js'); // lookup map from codepoint to aliases
+
+require('@unicode/unicode-14.0.0/Names/Correction/index.js'); // lookup map from codepoint to aliases
+
+require('@unicode/unicode-14.0.0/Names/Figment/index.js'); // lookup map from codepoint to aliases
+
 // `Binary_Property`:
 
 require('@unicode/unicode-14.0.0/Binary_Property/ASCII/code-points.js');

If you want something different, let me know. I'm working with what's simple and hopefully least invasive. It's a bit hard to know exactly how this affects the "output" library: it outputs compressed data, so reviewing it effectively means diffing a huge base64 string that changes profoundly with each `npm test` run. That said, no files appear to be lost and the new files are all as indicated in the snippet above, so I think it's probably okay. If there's a better way to validate the change, please let me know.

Thank you for the feedback and for this excellent library!
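One way to sidestep the base64 churn is to compare only the *names* of the generated files before and after the change, not their contents. This is a minimal sketch assuming you keep a copy of the generated output from `main` next to the new output; the helpers `listFiles` and `diffFileLists` and any directory names are hypothetical and not part of this repo.

```javascript
'use strict';

const fs = require('fs');
const path = require('path');

// Hypothetical helper: list every file under `root` as a sorted array of
// paths relative to `root` (with '/' separators for stable comparisons).
function listFiles(root) {
	const results = [];
	const walk = (dir) => {
		for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
			const full = path.join(dir, entry.name);
			if (entry.isDirectory()) {
				walk(full);
			} else {
				results.push(path.relative(root, full).split(path.sep).join('/'));
			}
		}
	};
	walk(root);
	return results.sort();
}

// Compare two snapshots of a generated output tree by file name only.
function diffFileLists(before, after) {
	const beforeSet = new Set(before);
	const afterSet = new Set(after);
	return {
		removed: before.filter((file) => !afterSet.has(file)),
		added: after.filter((file) => !beforeSet.has(file)),
	};
}
```

Usage would look like `diffFileLists(listFiles(beforeDir), listFiles(afterDir))`: an empty `removed` array confirms nothing was lost, and `added` should list exactly the new `Names/...` files. It says nothing about whether file contents are correct, so it complements rather than replaces reviewing the parser.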

jpassaro · 2022-07-25T22:35:52Z

Hello @mathiasbynens, is there anything I can do to get this change ready to merge?

mathiasbynens · 2022-07-26T08:22:20Z

There’s still an unresolved comment here: #68 (comment)

jpassaro · 2022-07-27T21:07:33Z

@mathiasbynens, thanks for the reminder. I think it's addressed. Please note that #70 should be merged first; without it, the build is broken.

Let me know if this directory structure is okay, or if you prefer an intermediate `Aliases` or `NameAliases` directory.

diff --git a/README.md b/README.md
index 0efc52f2343b..480a7449a401 100644
--- a/README.md
+++ b/README.md
@@ -57,6 +57,16 @@ const openingBrackets = require('@unicode/unicode-15.0.0/Bidi_Paired_Bracket_Typ
 Other than categories, data on Unicode properties, blocks, scripts, and script extensions is available too (for recent versions of the Unicode standard). Here’s the full list of the available data for v15.0.0:
 
 ```js
+// `Names`:
+
+require('@unicode/unicode-15.0.0/Names/index.js'); // array of canonical names
+require('@unicode/unicode-15.0.0/Names/Abbreviation/index.js'); // lookup map from codepoint to aliases
+require('@unicode/unicode-15.0.0/Names/Alternate/index.js'); // lookup map from codepoint to aliases
+require('@unicode/unicode-15.0.0/Names/Control/index.js'); // lookup map from codepoint to aliases
+require('@unicode/unicode-15.0.0/Names/Correction/index.js'); // lookup map from codepoint to aliases
+require('@unicode/unicode-15.0.0/Names/Figment/index.js'); // lookup map from codepoint to aliases
+
+
 // `Binary_Property`:
 
 require('@unicode/unicode-15.0.0/Binary_Property/ASCII/code-points.js');
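As an illustration of why the `Correction` data is useful: a consumer can prefer a correction alias over a flawed canonical name, e.g. U+01A2, whose canonical name LATIN CAPITAL LETTER OI has the correction alias LATIN CAPITAL LETTER GHA. This is a sketch only. It assumes, hypothetically, that both lookups are plain `Map`s keyed by numeric code point; the actual exported shape of the `Names` modules is not confirmed here, and `preferredName` is an illustrative name.

```javascript
'use strict';

// Hypothetical sketch: pick the name to display for a code point, preferring
// a `correction` alias over the original canonical name. Assumed shapes:
//   canonicalNames: Map<number, string>
//   corrections:    Map<number, string[]>
function preferredName(codePoint, canonicalNames, corrections) {
	const fixes = corrections.get(codePoint);
	if (fixes && fixes.length > 0) {
		// This sketch simply takes the first listed correction.
		return fixes[0];
	}
	// Falls back to the canonical name (undefined if there is none).
	return canonicalNames.get(codePoint);
}

const canonicalNames = new Map([
	[0x01A2, 'LATIN CAPITAL LETTER OI'],
	[0x0041, 'LATIN CAPITAL LETTER A'],
]);
const corrections = new Map([
	[0x01A2, ['LATIN CAPITAL LETTER GHA']],
]);

console.log(preferredName(0x01A2, canonicalNames, corrections)); // → LATIN CAPITAL LETTER GHA
```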

scripts/utils.js

templates/README.md

Issue: #68

mathiasbynens reviewed Mar 7, 2022

jpassaro force-pushed the add-name-aliases branch from f55f0f9 to 5ef9c90 July 27, 2022 21:02

add support for NameAliases file

fcb5610

jpassaro force-pushed the add-name-aliases branch from 5ef9c90 to fcb5610 July 28, 2022 15:02

mathiasbynens reviewed Jul 29, 2022

scripts/utils.js (outdated, resolved)

Update scripts/utils.js

fce5408

mathiasbynens reviewed Jul 29, 2022

scripts/utils.js (outdated, resolved)

Update scripts/utils.js

f99ba85

mathiasbynens reviewed Jul 29, 2022

templates/README.md (outdated, resolved)

Update templates/README.md

786a1e7

mathiasbynens approved these changes Jul 29, 2022

mathiasbynens merged commit 13ebc14 into node-unicode:main Jul 29, 2022

mathiasbynens pushed a commit that referenced this pull request Jul 29, 2022

Add support for NameAliases

3480f5c

Issue: #68

mathiasbynens merged 4 commits into node-unicode:main from jpassaro:add-name-aliases
2 participants