Investigate Interscript maps for ISO 24229 spelling system codes #687

@ronaldtse

Description

ISO 24229 was the basis for Interscript system codes.

As Interscript has progressed, we have learned that the original system code pattern doesn't quite work:
i.e. the pattern "{authority code}-{lang}-{source script}-{target-script}-{id}"
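
To make the limitation concrete, here is a minimal Python sketch that splits a code following the pattern above into its five fields. The `SystemCode` type, the `parse_system_code` helper, and the sample code string are illustrative assumptions, not part of Interscript's API.

```python
from typing import NamedTuple


class SystemCode(NamedTuple):
    """Fields of the original pattern (names here are illustrative)."""
    authority: str
    lang: str
    source_script: str
    target_script: str
    id: str


def parse_system_code(code: str) -> SystemCode:
    # Split on the first four hyphens only, so an id that itself
    # contains hyphens stays in one piece.
    parts = code.split("-", 4)
    if len(parts) != 5:
        raise ValueError(f"expected 5 hyphen-separated fields, got {len(parts)}: {code!r}")
    return SystemCode(*parts)


# Illustrative code string in the shape of the pattern.
parsed = parse_system_code("bgnpcgn-ukr-Cyrl-Latn-2019")
print(parsed.lang, parsed.target_script)
```

Note that the tuple has exactly one `lang` slot and identifies the output only by script code, which is why multi-language systems and differing Latn character sets cannot be expressed.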

We now know that some systems span multiple languages, and that others share a script code (e.g. Latn) but use different character sets (e.g. different diacritics).

So we need to introduce a new "spelling system code" for all languages and transliteration systems.

  • For example, Estonian underwent a spelling reform, so words are spelled differently before and after it.
  • One example in Interscript: Chinese language in Chinese characters (Hans) can be transliterated into the "pinyin" system, but Mongolian language in Mongolian characters can also be transliterated into the "pinyin" system. The expected pronunciation of the "pinyin" is the same for both languages. Both systems therefore produce the same "expected readable spelling", so the output of both should be labeled as the same "pinyin" system.
  • Another example is the ODNI systems -- they are all supposed to be read according to English pronunciation.

We need to:

  • come up with the new spelling system codes
  • and tag every Interscript system with the input and output

The input and output spelling systems can be identified by their defined character sets: in an Interscript map, the rule keys give the input character set and the rule values give the output character set.
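
As a rough sketch of that idea in Python: assuming a map's rules have already been loaded into a plain `dict` of key string to value string (the real map file layout is not modeled here, and `character_sets` is a hypothetical helper), the two character sets could be collected like this. NFC normalization is an added assumption so that precomposed and decomposed diacritics compare equal.

```python
import unicodedata


def character_sets(rules: dict[str, str]) -> tuple[frozenset[str], frozenset[str]]:
    """Return (input characters, output characters) for one set of rules.

    `rules` is assumed to be a flat key -> value mapping; this is a
    simplification, not Interscript's actual map structure.
    """
    def chars(strings):
        # Normalize to NFC, then collect the individual characters.
        return frozenset(
            ch for s in strings for ch in unicodedata.normalize("NFC", s)
        )

    return chars(rules.keys()), chars(rules.values())


# Toy rules for illustration only.
rules = {"а": "a", "б": "b", "ш": "sh", "щ": "shch"}
input_set, output_set = character_sets(rules)
print(sorted(input_set))
print(sorted(output_set))
```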

This task covers both of the items above.

First, we need a script that extracts the input and output character sets of each Interscript system.

  • i.e. read all the maps, compare their character sets to see which are equal and which differ, decide whether the corresponding spelling systems are the same, and then assign a unique code to each unique spelling system.
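
A first pass at the comparison step could look like the following Python sketch. It treats two character sets as the same spelling system only when they are exactly equal, which is a simplifying assumption; deciding whether near-identical sets are the same system would still need review. The `assign_spelling_codes` helper, the `S001`-style placeholder codes, and the toy system names are all hypothetical.

```python
CharSet = frozenset[str]


def assign_spelling_codes(
    systems: dict[str, tuple[CharSet, CharSet]],
) -> tuple[dict[CharSet, str], dict[str, tuple[str, str]]]:
    """Give each distinct character set a placeholder code and tag every
    system with its (input code, output code)."""
    codes: dict[CharSet, str] = {}
    tags: dict[str, tuple[str, str]] = {}

    def code_for(charset: CharSet) -> str:
        # Reuse the code if this exact character set was seen before.
        if charset not in codes:
            codes[charset] = f"S{len(codes) + 1:03d}"
        return codes[charset]

    # Sort by name so the assigned codes are reproducible between runs.
    for name in sorted(systems):
        input_set, output_set = systems[name]
        tags[name] = (code_for(input_set), code_for(output_set))
    return codes, tags


# Toy data: two systems with different inputs but the same output set.
latin = frozenset("abc")
systems = {
    "sys-a": (frozenset("一二三"), latin),
    "sys-b": (frozenset("ᠠᠡᠢ"), latin),
}
codes, tags = assign_spelling_codes(systems)
print(tags)
```

With this toy data, both systems end up with the same output code, mirroring the "pinyin" case described earlier.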

Metadata

Labels

enhancement: New feature or request
