Character map duplicated replacement issues and rule order problem #17

@ronaldtse

Each part of the input should probably be consumed by at most one pattern replacement. For example, this system test fails:

  59) Interscript maps/un-jpn-Hrkt-Latn-hepburn.yaml system test for {"source"=>"とうきょう", "expected"=>"tôkyô"}
      Failure/Error: expect(result).to eq(test["expected"])
      
        expected: "tôkyô"
             got: "tôukyoょu"
      
        (compared using ==)
      # ./spec/interscript_spec.rb:16:in `block (5 levels) in <top (required)>'

This is a problem. とうきょう is only supposed to match these rules:

    "とう": "tô"
    "きょう": "kyô"

But the result shows it matching the following rules instead (we should have a debugging method that shows which rules matched):

    "とう": "tô"
    "う": "u"
    "きょ": "kyo"
    # No rule for "ょ"
    "う": "u"

We must exclude source characters from further matching once they have been replaced, and we must ensure that longer patterns are tried before shorter ones.
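
A minimal sketch of one way to get both properties, assuming the rules are available as a plain Ruby hash. The names `RULES`, `transliterate` and `trace_matches` are hypothetical and not part of Interscript's API, and the table holds only the rules quoted in this issue. Sorting the keys longest first makes the alternation prefer "きょう" over "きょ", and a single `gsub` pass never rescans text it has already consumed. `trace_matches` is a rough take on the debugging method mentioned above.

```ruby
# Hypothetical rule table: only the rules quoted in this issue.
RULES = {
  "とう" => "tô",
  "う" => "u",
  "きょ" => "kyo",
  "きょう" => "kyô",
}.freeze

# Build one alternation with the longest source patterns first, so that
# at any position the longest applicable rule wins.
def rule_pattern(rules)
  Regexp.union(rules.keys.sort_by { |key| -key.length })
end

# Single left-to-right pass: each source character is consumed by at most
# one rule; characters without a rule are left unchanged.
def transliterate(text, rules)
  text.gsub(rule_pattern(rules), rules)
end

# Debugging aid: list the [source, replacement] pairs in the order they fire.
def trace_matches(text, rules)
  text.scan(rule_pattern(rules)).map { |source| [source, rules[source]] }
end

puts transliterate("とうきょう", RULES)         # tôkyô
p trace_matches("とうきょう", RULES)            # [["とう", "tô"], ["きょう", "kyô"]]
```

This relies on `String#gsub` accepting a hash as its replacement argument and on regexp alternation trying alternatives in the order given; a real fix would still need to respect whatever rule ordering the map files define.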

Labels: bug (Something isn't working)
