Merge data files before lookup by movermeyer · Pull Request #98 · ruby-i18n/ruby-cldr

movermeyer · 2021-12-07T15:50:46Z

What are you trying to accomplish?

Fixes #96.

Some parts (ldml, ldmlBCP47 and supplementalData) of CLDR data require that you merge all the files with the same root element before doing lookups.

Ref: https://www.unicode.org/reports/tr35/tr35.html#XML_Format

That way, CLDR can move data between files, or split files, for purposes of organization without affecting the actual values.

However, ruby-cldr was hard-coding the paths where particular data should be read from.
This led to data no longer being exported when it was moved in the upstream CLDR.
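
To illustrate the difference, here is a minimal sketch that discovers files by their root element instead of by a fixed path. It is not the code in this PR: it uses REXML rather than Nokogiri so that it runs without extra gems, and the helper name `paths_by_root` and the sample file names are made up for the example.

```ruby
require "rexml/document"
require "tmpdir"

# Hypothetical helper: group every XML file under `dir` by the name of its
# root element, so callers never need to know which file holds which data.
def paths_by_root(dir)
  Dir.glob(File.join(dir, "**", "*.xml")).sort.group_by do |path|
    REXML::Document.new(File.read(path)).root.name
  end
end

Dir.mktmpdir do |dir|
  # Made-up sample files; two of them share the <supplementalData> root.
  File.write(File.join(dir, "supplementalData.xml"), "<supplementalData><a/></supplementalData>")
  File.write(File.join(dir, "units.xml"), "<supplementalData><b/></supplementalData>")
  File.write(File.join(dir, "en.xml"), "<ldml><c/></ldml>")

  paths_by_root(dir).each do |root_name, paths|
    puts "#{root_name}: #{paths.map { |p| File.basename(p) }.join(', ')}"
  end
end
```

With a lookup like this, moving an element from one supplemental file to another upstream would not change which group it ends up in.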

What approach did you choose and why?

I changed Cldr::Export::Data::Base#doc to return a doc that has all of the related XML files merged into one Nokogiri document.

Merging all of these files is expensive, so I use a class variable as a cache. This might not be the absolute ideal way to do things from a patterns perspective, but it saves us from having to do a much more intense refactor. I like it well enough. 🤷
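
For readers unfamiliar with the pattern, a class-variable cache could look roughly like the sketch below. The class name `MergedDocSource`, the `merge_count` counter, and the string standing in for a merged document are all hypothetical; the real `Base#doc` builds a Nokogiri document.

```ruby
# Hypothetical stand-in for the exporter base class; not the actual ruby-cldr code.
class MergedDocSource
  # Class-level cache shared by every instance, keyed by locale
  # (nil is used here as the key for supplemental data).
  @@doc_cache = {}

  class << self
    attr_accessor :merge_count
  end
  self.merge_count = 0

  def initialize(locale = nil)
    @locale = locale
  end

  # Runs the expensive merge only the first time a given locale is requested.
  def doc
    @@doc_cache[@locale] ||= expensive_merge
  end

  private

  # Placeholder for the real merge; counts calls so the caching is observable.
  def expensive_merge
    self.class.merge_count += 1
    "merged document for #{@locale || 'supplemental'}"
  end
end

MergedDocSource.new(:en).doc
MergedDocSource.new(:en).doc
puts MergedDocSource.merge_count
```

One thing to keep in mind with `@@` variables is that they are shared across the whole inheritance hierarchy, so every subclass sees the same cache.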

I use @locale as a flag for whether or not the data is supplemental or locale dependent. This will not work if we ever want to use this for the other types of data in CLDR (keyboard, platform, or ldmlBCP47). Perhaps a future refactor would pull this out into subclasses of Base instead (Supplemental and LanguageDependent?)? I have this idea that lib/cldr/export.rb could be radically simpler if it didn't have to ask is_shared_component? and Transforms was less of a special snowflake. But that's a much larger refactor than what I'm attempting right now in this PR. Again, good enough for now. 🤷

What should reviewers focus on?

What do you think about the changes?

The impact of these changes

Fixes #96.

Testing

You could run thor cldr:export for master and this branch, then run diff -r to compare the outputs:

git checkout master
thor cldr:export --target=./data_from_master

git checkout movermeyer:movermeyer/merge_before_lookup
thor cldr:export --target=./data_from_pr

diff -r data_from_master data_from_pr

Which will show 3 classes of diffs:

Only in data_from_master/en-001: rbnf.yml. See comment.
The data/transforms/* files all change slightly since they no longer have a @doc instance variable. This should not be important to any consumer.
Only in data_from_pr: variables.yml. This is the result of "Variables stopped being output" (#96) getting fixed.

movermeyer · 2021-12-07T16:35:13Z

+          grouping_nodes = select("rbnf/rulesetGrouping")
+          return {} if grouping_nodes.empty?


I made this change since it didn't make sense to check for the existence of path anymore.

This had the side effect of cleaning up an edge-case bug:

common/rbnf/en_001.xml exists, but is empty. So this code was returning [] instead of the expected {}. Now that it has been fixed to return {}, en-001/rbnf.yml is no longer output since we don't output empty hashes.

movermeyer · 2021-12-07T18:18:05Z

+            # and the <identity> elements are not important to us / make no sense when combined together.
+            return Nokogiri::XML('') if paths_to_merge.empty?
+
+            rest = paths_to_merge[1..paths_to_merge.size - 1]


I can't just use paths_to_merge[1..] since that's not available in Ruby 2.3, and we haven't officially dropped support for the old versions of Ruby.
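
For reference, a few spellings of "everything after the first element" that do not need the endless range (added in Ruby 2.6) are sketched below. The method name `tail_variants` is only for illustration.

```ruby
# Ways to take the tail of an array that avoid the endless range `[1..]`.
def tail_variants(paths)
  _first, *rest = paths
  {
    explicit_end: paths[1..paths.size - 1], # the form used in this PR
    negative_end: paths[1..-1],
    drop: paths.drop(1),
    splat: rest,
  }
end

tail_variants(["a.xml", "b.xml", "c.xml"]).each do |name, tail|
  puts "#{name}: #{tail.inspect}"
end
```

The variants only differ on an empty array: the two range forms return `nil`, while `drop(1)` and the splat return `[]`. That does not matter here because of the early return when `paths_to_merge` is empty.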

Korri · 2021-12-10T10:17:37Z

+            rest.inject(Nokogiri::XML(File.read(paths_to_merge.first))) do |result, path|
+              next_doc = Nokogiri::XML(File.read(path))
+
+              next_doc.root.children.each do |child|
+                result.root.add_child(child)
+              end
+
+              result
+            end


I feel like there is a cleaner/clearer way to do this part, but after playing around briefly I couldn't figure out anything better 🤷‍♂️
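
For anyone who wants to experiment with the merge step in isolation, here is a self-contained sketch of the same idea. It is an assumption-laden stand-in rather than a proposed replacement: it uses REXML instead of Nokogiri, takes XML strings instead of file paths, and copies children instead of moving them. The helper name `merge_documents` is hypothetical.

```ruby
require "rexml/document"

# Hypothetical helper: merge XML strings that share a root element into one
# document by appending the children of each later root to the first root.
def merge_documents(xml_strings)
  return REXML::Document.new if xml_strings.empty?

  docs = xml_strings.map { |xml| REXML::Document.new(xml) }
  merged = docs.first

  docs.drop(1).each do |doc|
    # Snapshot the children with to_a, then copy each one into the merged root.
    doc.root.elements.to_a.each do |child|
      merged.root.add_element(child.deep_clone)
    end
  end

  merged
end

merged = merge_documents([
  "<supplementalData><a/></supplementalData>",
  "<supplementalData><b x='1'/><c/></supplementalData>",
])
puts merged.root.elements.to_a.map(&:name).inspect
```

Lookups then run against `merged` as if all of the data had come from a single file.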

Some parts (`ldml`, `ldmlBCP47` and `supplementalData`) of CLDR data require that you merge all the files with the same root element before doing lookups. This gives you a mechanism to look up the paths you need to merge together based on the root element. Ref: https://www.unicode.org/reports/tr35/tr35.html#XML_Format

movermeyer force-pushed the movermeyer/merge_before_lookup branch 4 times, most recently from 53747d7 to ecb1dd3 Compare December 7, 2021 18:16

movermeyer marked this pull request as ready for review December 7, 2021 18:18

movermeyer mentioned this pull request Dec 8, 2021

Export is missing some subdivisions #95

Open

movermeyer requested a review from Korri December 9, 2021 17:46

Korri approved these changes Dec 10, 2021

movermeyer added 2 commits December 13, 2021 11:54

Merge the related paths before doing lookups

d71b940

movermeyer force-pushed the movermeyer/merge_before_lookup branch from ecb1dd3 to d71b940 Compare December 13, 2021 16:55

movermeyer merged commit d1bdfef into ruby-i18n:main Dec 13, 2021

movermeyer merged 2 commits into ruby-i18n:main from movermeyer:movermeyer/merge_before_lookup
