Compile liblouis with 32 bit widechars #9544
Link to issue number:
Summary of the issue:
Liblouis is currently compiled with 2-byte (16-bit) wide characters to process braille. This is pretty annoying when displaying emoji, as most of them are Unicode code points above U+FFFF that do not fit in 16 bits. For example,
More importantly, using a 2-byte encoding with Python 3 is liable to break things in a major way. The braille module uses brailleToRawPos and rawToBraillePos to map braille characters to real characters. In Python 2, unicode strings are stored internally with a two-byte encoding, so a character above U+FFFF takes two indexes or offsets in a string. In Python 3, one index/offset corresponds to one code point. Liblouis's 2-byte wide characters played pretty nicely with Python 2 unicode strings, but with 16-bit wide characters on Python 3, the rawToBraillePos and brailleToRawPos mappings no longer match, as liblouis reads
Description of how this pull request fixes the issue:
This compiles liblouis with 32-bit wide characters instead of 16-bit ones. As a result, only one replacement pattern is printed for a character above U+FFFF instead of two, and the brailleToRawPos and rawToBraillePos mappings should be correct, as both Python 3 and liblouis with UCS-4 assume that every character takes exactly one offset in a string.
This PR is pretty theoretical. Testing can be performed as soon as #9543 is merged, so I will mark this as a draft until that's the case.
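The offset mismatch described above can be illustrated with a small Python 3 sketch. It does not call liblouis or NVDA; the helper names `utf16_units` and `utf16_offset_to_index` are hypothetical and only show how offsets counted in 16-bit units drift away from Python 3 string indexes once a character above U+FFFF appears.

```python
def utf16_units(text: str) -> int:
    """Count the 16-bit code units needed to encode text as UTF-16."""
    return len(text.encode("utf-16-le")) // 2


def utf16_offset_to_index(text: str, unit_offset: int) -> int:
    """Map an offset counted in UTF-16 code units to a Python 3 string index.

    Hypothetical helper for illustration only, not part of NVDA or liblouis.
    """
    units = 0
    for index, char in enumerate(text):
        # Characters above U+FFFF need a surrogate pair, i.e. two code units.
        width = 2 if ord(char) > 0xFFFF else 1
        if unit_offset < units + width:
            return index
        units += width
    return len(text)


sample = "a\U0001F600b"  # "a", an emoji above U+FFFF, "b"

# Python 3 indexes by code point, a 16-bit library counts code units.
assert len(sample) == 3
assert utf16_units(sample) == 4

# "b" sits at index 2 in Python 3 but at offset 3 in 16-bit units.
assert sample.index("b") == 2
assert utf16_offset_to_index(sample, 3) == 2
```

With 32-bit wide characters, both sides count one offset per code point, so no such conversion should be needed.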
Known issues with pull request:
None known as of yet
Change log entry:
This isn't fully correct due to the definition (e.g. "undefined 0") in some braille tables. Undefined Unicode characters can also be displayed just as ⠀ (dot 0). Please read the HUC Braille Tables documentation for further details.
The one and only thing I have to know here is whether I have to change all yhhhhh definitions to zhhhhhhhh definitions in the HUC Braille Tables. These replacements can be done quite quickly compared to the whole process of creating the HUC Braille Tables. Once NVDA fully supports UTF-32 characters, I will have to update the HUC Braille Tables documentation as well, because it still refers to NVDA 2019.1.
Personally I really want to see the UTF-32 support in NVDA, because by using the HUC Braille Tables the amount of necessary braille characters for an undefined Unicode character between U+10000 and U+10FFFF is reduced from 16 to 3 8-dot braille characters. That would be great.
PS: In less than four hours I'll be on the train to SightCity 2019, where I'm going to inform some people about the existence of the HUC Braille Tables. You can find A7 handouts ((cc) by-sa, in EN and DE) regarding the HUC Braille Tables here on my website.
These are out of scope for this pr. This pr aims at fixing braille issues introduced when switching to Python 3, nothing more than that.
Thanks, I will fix this entry.