Lao #71

Open
scossu opened this issue Nov 8, 2023 · 2 comments
Collaborator

scossu commented Nov 8, 2023

Add support for Lao: https://www.loc.gov/catdir/cpso/romanization/lao.pdf

@scossu scossu added help wanted Extra attention is needed script labels Nov 8, 2023
@scossu scossu self-assigned this Nov 8, 2023

andjc commented Dec 7, 2023

Discrepancies between interpretations of the 1997 and 2012 Lao romanisation tables may need to be resolved. There is divergent practice that would affect mappings. See a draft note.

The Lao -> Latin mapping is a sieve: a lot of data is lost, which makes a Latin -> Lao mapping problematic at the least. Lao, Thai and various other mappings would be better served by developing machine learning models for them. In the absence of an ML model, the next best approach would be to base the assigned Latin -> Lao mapping on character frequencies derived from analysis of a Lao corpus. It will also require Lao syllable boundary identification to distinguish syllable-initial and syllable-final consonants.
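
A minimal sketch of the frequency-based fallback described above, assuming a plain-text Lao corpus is available. The candidate table, the function names and the toy corpus are all hypothetical; the candidate pairs are illustrative only and have not been checked against either the 1997 or the 2012 table.

```python
from collections import Counter

# Illustrative candidates only (assumption): each romanised consonant lists
# Lao letters that could share it. Not verified against either LC table.
CANDIDATES = {
    "kh": ["ຂ", "ຄ"],
    "s": ["ສ", "ຊ"],
}


def char_frequencies(corpus):
    """Count code points in the Lao block (U+0E80..U+0EFF) across the corpus."""
    return Counter(
        ch for line in corpus for ch in line if "\u0e80" <= ch <= "\u0eff"
    )


def rank_candidates(roman, freq):
    """Return the Lao candidates for `roman`, most frequent in the corpus first."""
    return sorted(CANDIDATES.get(roman, []), key=lambda ch: -freq[ch])


# Toy corpus standing in for a real Lao text collection (assumption).
corpus = ["ຄົນ ຄຳ ຂອງ", "ສາມ ຊື່ ຊ້າງ"]
freq = char_frequencies(corpus)

print(rank_candidates("kh", freq))
print(rank_candidates("s", freq))
```

This ranks whole-corpus letter counts only. It does not yet separate syllable-initial from syllable-final positions, which would need the syllable boundary step mentioned above.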

Alternatively, it may be easier to map complete syllables from Lao -> Latin. Romanised syllables to Lao are a one-to-many mapping.

An example of many-to-one syllable mappings. But the situation becomes more complex with syllable final consonants.
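
To make the one-to-many problem concrete, here is a small sketch that inverts a Lao -> Latin syllable table. The entries are hypothetical examples chosen for illustration, assuming that paired consonants share a romanisation and that tone marks are not romanised; they are not copied from the LC table, and `invert` is an illustrative helper rather than part of any existing tool.

```python
from collections import defaultdict

# Hypothetical Lao -> Latin syllable table (assumption, for illustration only).
LAO_TO_LATIN = {
    "ຂາ": "khā",
    "ຄາ": "khā",
    "ຂ້າ": "khā",
    "ສີ": "sī",
    "ຊີ": "sī",
}


def invert(table):
    """Group Lao syllables by their romanised form."""
    reverse = defaultdict(list)
    for lao, latin in table.items():
        reverse[latin].append(lao)
    return dict(reverse)


LATIN_TO_LAO = invert(LAO_TO_LATIN)

# Each romanised syllable now points at every Lao syllable that produces it.
for latin, lao_forms in LATIN_TO_LAO.items():
    print(latin, len(lao_forms), lao_forms)
```

The forward table is a plain lookup, while every key in the reverse table needs some way of choosing among its candidates, such as corpus frequency or surrounding context.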

Collaborator Author

scossu commented Dec 7, 2023

We are exploring the use of Aksharamukha embedded in Scriptshifter for some East Asian and Southeast Asian languages. Currently there is experimental support via Aksharamukha for Bengali, Burmese, Devanagari, Gurmukhi, Japanese (Katakana + Hiragana, but slated for removal because it is not accurate enough), Tamil (+ Brahmi + extended), Thai and Tibetan.

Lao (in two versions) seems to be supported in Aksharamukha but we haven't tested it yet. If you were able to confirm the accuracy of Lao transliteration I could very easily add support for that in Scriptshifter.

Roman to Script (R2S) transliteration support has a lower priority than Script to Roman (S2R) at the moment. So if R2S transliteration is not reliable we can disable it on a script-by-script basis.

@scossu scossu added this to the Phase 3 milestone Feb 26, 2024