-
-
Notifications
You must be signed in to change notification settings - Fork 746
Description
Is your feature request related to a problem? Please describe.
In some cases, text can contain ligature characters that are not provided in a braille table. Alternatively, a speech synthesizer kan really struggle with these.
An example is the ligature ij (ij), in dutch as in ijsbeer (polar bear). The Dutch version of ESPeak is unable to pronounce this word correctly.
An exactly opposite example is á, which is composed of two characters, namely the letter a and the modifier ́.
Describe the solution you'd like
For both speech and braille, i propose adding the ability to enable unicode normalization with the NFKC algoritm (Normalization Form Compatibility Composition). This algorithm ensures that most ligatures are properly decomposed before passing them to the synthesizer while composing characters like á, which is much more common than á.
Note that while this sounds utterly complex, it is basically adding one line of code:
processed = unicodedata.normalize("NFKC", original)