Merge data files before lookup#98
Conversation
| grouping_nodes = select("rbnf/rulesetGrouping") | ||
| return {} if grouping_nodes.empty? |
There was a problem hiding this comment.
I made this change since it didn't make sense to check for the existence of path anymore.
This had a side-effect of cleaning up a edge-case bug:
common/rbnf/en_001.xml exists, but is empty. So this code was returning [] instead of the expected {}. Now that it has been fixed to return {}, en-001/rbnf.yml is no longer output since we don't output empty hashes.
53747d7 to
ecb1dd3
Compare
| # and the <identity> elements are not important to us / make no sense when combined together. | ||
| return Nokogiri::XML('') if paths_to_merge.empty? | ||
|
|
||
| rest = paths_to_merge[1..paths_to_merge.size - 1] |
There was a problem hiding this comment.
I can't just use paths_to_merge[1..] since that's not available in Ruby 2.3, and we haven't officially dropped support for the old versions of Ruby.
| rest.inject(Nokogiri::XML(File.read(paths_to_merge.first))) do |result, path| | ||
| next_doc = Nokogiri::XML(File.read(path)) | ||
|
|
||
| next_doc.root.children.each do |child| | ||
| result.root.add_child(child) | ||
| end | ||
|
|
||
| result | ||
| end |
There was a problem hiding this comment.
I feel like there is a cleaner/clearer way to do this part, but quickly playing around couldn't quickly figure out something better 🤷♂️
Some parts (`ldml`, `ldmlBCP47` amd `supplementalData`) of CLDR data require that you merge all the files with the same root element before doing lookups. This gives you a mechanism to lookup the paths you need to merge together based on the root element. Ref: https://www.unicode.org/reports/tr35/tr35.html#XML_Format
ecb1dd3 to
d71b940
Compare
What are you trying to accomplish?
Fixes #96.
Some parts (
ldml,ldmlBCP47amdsupplementalData) of CLDR data require that you merge all the files with the same root element before doing lookups.Ref: https://www.unicode.org/reports/tr35/tr35.html#XML_Format
That way, CLDR can move data between files, or split files, for purposes of organization without affecting the actual values.
However,
ruby-cldrwas hard-coding the paths where particular data should be read from.This lead to data being no longer exported when it was moved in the upstream CLDR.
What approach did you choose and why?
I changed
Cldr::Export::Data::Base#docto return a doc that has all of the related XML files merged into one Nokogiri document.Merging all of these files is expensive, so I use a class variable as a cache. This might not be the absolute ideal way to do things from a patterns perspective, but it saves us from having to do a much more intense refactor. I like it well enough. 🤷
I use
@localeas a flag for whether or not the data is supplemental or locale dependent. This will not work if we ever want to use this for the other types of data in CLDR (keyboard,platform, orldmlBCP47). Perhaps a future refactor would pull this out into subclasses ofBaseinstead (SupplementalandLanguageDependent?)? I have this idea thatlib/cldr/export.rbcould be radically simpler if it didn't have to askis_shared_component?andTransformswas less of a special snowflake. But that's a much larger refactor than what I'm attempting right now in this PR. Again, good enough for now. 🤷What should reviewers focus on?
What do you think about the changes?
The impact of these changes
Fixes #96.
Testing
You could run
thor cldr:exportformasterand this branch, then rundiff -rto compare the outputs:Which will show 3 classes of diffs:
Only in data_from_master/en-001: rbnf.yml. See comment.data/transforms/*files all change slightly since they no longer have a@docinstance variable. This should not be important to any consumer.Only in data_after: variables.yml, this is the result of Variables stopped being output #96 getting fixed.