Simplified 8-dot worldwide unified mapping for the hexadecimal value of Unicode characters #688

Open
DrSooom opened this issue Jan 24, 2019 · 31 comments
Labels
enhancement An enhancement in the functionality (not a bug fix or a table improvement) idea Just an idea, there is no clear consensus yet that this is the way to go needs test A YAML test is needed (and should be committed) to explain the bug or expected behavior of a table

Comments

@DrSooom

DrSooom commented Jan 24, 2019

Introduction:

Several months ago I read issue #489 here, and I also opened nvaccess/nvda#8702 on September 1, 2018. Over the last few weeks I have been thinking about how to reduce the number of braille characters needed to display an emoji. I found a possible "solution" that already included ⣑ (U+28D1, dots 1578) as an announcing/introducing prefix character. But I had to give up because I couldn't figure out a solution for 6-dot as well. I planned to use almost the same braille dots for 8-dot and for 6-dot, like ⣑⡤⣺, which should be equal to ⠿⠤⠳⠽. But in the end there were too many issues with "blank" cells when converting from 8-dot to 6-dot. That's why the following solution is designed only for 8-dot and not for 6-dot.
[Update 2019-01-25 09:48 CET] See issue #689 for the 6-dot solution. [/Update]
After I had opened #685 here on January 12, 2019, I also asked the head of the Brailleschriftkomitee der Deutschsprachigen Länder how Unicode characters should be implemented in the German 8-dot braille table. They had known about this problem for many years, but had never found a suitable solution. As he asked me for such a solution, I sent him a draft of the following solution via e-mail on January 24, 2019, at 06:43 CET (this morning). My brainstorming wasn't finished at that time. But a few hours later, after even more research, I think I have now found the final solution for displaying the first 196608 Unicode characters with only three braille characters each. So here it is. And no, it's a completely different idea from the one described in #489.

Definition:

Prefix braille characters:

  • The prefix character must stand in front of every single Unicode character. To avoid confusion, grouping of two or more unseparated Unicode characters isn't allowed.
  • ⣥ = characters between U+0000 and U+FFFF
    U+28E5, dots 13678; Defines the first 65536 Unicode characters.
    The prefix character is a combination of the letters u and c.
  • ⣭ = characters between U+10000 and U+1FFFF
    U+28ED, dots 134678; Defines the second 65536 Unicode characters.
    The prefix character is a combination of the letters u and c and the digit 1.
  • ⣽ = characters between U+20000 and U+2FFFF
    U+28FD, dots 1345678; Defines the third 65536 Unicode characters.
    The prefix character is a combination of the letters u and c and the digit 2.
  • ⣵ = characters between U+30000 and U+10FFFF
    U+28F5, dots 135678; Defines the other 917504 Unicode characters.
    The prefix character is a combination of the letters u, e and c.
    Here three braille characters are needed after the prefix to define a Unicode character correctly.
    U+30000 will be changed to U+030000 before converting into braille.
    At the moment only 337 characters are defined in these blocks.

Converting hexadecimal values into braille:

  • 0 = ⠚, 1 = ⠁, 2 = ⠃, 3 = ⠉, 4 = ⠙, 5 = ⠑, 6 = ⠋, 7 = ⠛
  • 8 = ⠓, 9 = ⠊, A = ⠈, B = ⠘, C = ⠒, D = ⠂, E = ⠐, F = ⠀

Combining hexadecimal values:

  • 00 = ⣺, 01 = ⠞, 10 = ⣡, FE = ⢀, FF = ⠀

Examples:

  • Digit Zero = 0 = U+0030 = '\x0030' = ⣥⣺⣩
    But: Two Digit Zero = 00 = U+0030U+0030 = '\x0030''\x0030' = ⣥⣺⣩⣥⣺⣩
    Reducing it to ⣥⣺⣩⣺⣩ isn't allowed.
  • Music Sharp Sign = ♯ = U+266F = '\x266f' = ⣥⡧⠋
  • Braille Pattern Dots-12 = ⠃ = U+2803 = '\x2803' = ⣥⣇⠾
  • Musical Symbol G Clef = 𝄞 = U+1D11E = '\xd834''\xdd1e' = ⣭⠆⢁
    Obsolete mappings: ⣥⡁⠅⠐ and ⣥⣆⢭⡂⢁ (These were just part of my brainstorming.)
  • Grinning Face = 😀 = U+1F600 = '\xd83d''\xde00' = ⣭⡤⣺
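
The definition and examples above can be reproduced with a short sketch. This is not part of the proposal and not Liblouis code; the names `huc8`, `pair_to_cell` and `LOWER_HALF` are made up for illustration. It assumes, as the combined values 00, 01, 10 and FE suggest, that the first hex digit of a pair occupies dots 1, 2, 4, 5 and the second digit reuses the same pattern shifted to dots 3, 7, 6, 8. It also assumes that characters from U+30000 upwards get the prefix plus three cells (six hex digits).

```python
# Dots (1, 2, 4, 5 only) for each hex digit, copied from the digit list above.
HEX_DOTS = {
    "0": (2, 4, 5), "1": (1,), "2": (1, 2), "3": (1, 4),
    "4": (1, 4, 5), "5": (1, 5), "6": (1, 2, 4), "7": (1, 2, 4, 5),
    "8": (1, 2, 5), "9": (2, 4), "A": (4,), "B": (4, 5),
    "C": (2, 5), "D": (2,), "E": (5,), "F": (),
}

# Assumption inferred from the combined examples (00, 01, 10, FE): the second
# hex digit of a pair uses the same pattern, moved to the lower half of the cell.
LOWER_HALF = {1: 3, 2: 7, 4: 6, 5: 8}

# Prefix cells from the definition: U+28E5, U+28ED, U+28FD for the first three
# blocks of 65536 characters, U+28F5 for everything from U+30000 upwards.
PLANE_PREFIX = {0: "\u28E5", 1: "\u28ED", 2: "\u28FD"}
OTHER_PREFIX = "\u28F5"


def dots_to_bits(dots):
    """Unicode braille patterns store dot n in bit (n - 1)."""
    return sum(1 << (dot - 1) for dot in dots)


def pair_to_cell(high, low):
    """Combine two hex digits (uppercase strings) into one 8-dot cell."""
    upper = dots_to_bits(HEX_DOTS[high])
    lower = dots_to_bits(LOWER_HALF[dot] for dot in HEX_DOTS[low])
    return chr(0x2800 + (upper | lower))


def huc8(code_point):
    """Return the prefix cell followed by two (or three) value cells."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    plane = code_point >> 16
    if plane in PLANE_PREFIX:
        prefix, digits = PLANE_PREFIX[plane], f"{code_point & 0xFFFF:04X}"
    else:
        # U+30000 is padded to 030000 before converting, as described above.
        prefix, digits = OTHER_PREFIX, f"{code_point:06X}"
    cells = (pair_to_cell(digits[i], digits[i + 1]) for i in range(0, len(digits), 2))
    return prefix + "".join(cells)
```

For instance, `huc8(0x1F600)` returns ⣭⡤⣺, matching the Grinning Face example, and `huc8(0x266F)` returns ⣥⡧⠋.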

Technical solution:

Please read nvaccess/nvda#8702 first, where I already explained how to reduce the number of cells used on a braille display for undefined characters. The two apostrophes, the backslash and the small x are now replaced with ⣥, ⣭, ⣽ or ⣵, but only for undefined characters – if the user wants this. As this saves 13 braille cells for a single undefined Unicode character like the emoji shown above, I suggest making this the default, but always with the option to show '\x0000' and so on, as this could still be helpful for others.
Sadly, as I'm not very familiar with programming, I can't suggest how this could be implemented in Liblouis or in other software such as screen readers and braille printer software. As my solution is nothing more than a replacement, I guess the conversion shouldn't be a big problem. Creating a huge braille table with all 1114112 Unicode characters as a second-priority table, which doesn't overwrite characters already defined in the primary braille table, makes absolutely no sense in my opinion.
Every thought and suggestion from the community is highly welcome. Maybe I have overlooked something.

Additional sources:

@LeonarddeR
Member

I'm pretty sure @dkager has an opinion :)

@egli
Member

egli commented Jan 24, 2019

I'm not a braille reader, so I cannot judge. I understand the need to make display of emojis short. However your system sounds pretty complicated. How does this solution fare with respect to:

  • explaining it to a student who is trying to learn this
  • reading it (iiuc this is an 8-dot system)

@DrSooom
Author

DrSooom commented Jan 24, 2019

@egli: It doesn't change the behaviour of all current existing braille tables – as long as the four mentioned prefix braille characters aren't already used. Furthermore this reduction will not only have effects on emojis.

And there is a huge benefit regarding upcoming Unicode characters: They don't have to be defined separately, because they can already be read with the system described above. Okay, that's not really new, and depending on the selected braille table they can already be read right now. My system only saves space on a braille display, nothing more. But this is needed; the fewer cells I have to read on a braille display, the faster I can recognize the character behind them.

And please don't compare my above described solution with "classic" braille tables, because its functionality is completely different – and easy to understand. You only need to know the Unicode hexadecimal code for those characters you really need. And if you don't know the character behind such a shrunken braille combination, it is really easy to convert the braille dots back into the hexadecimal values. You only need to know in which order they are written in braille. That's all.
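
To illustrate the reverse direction, here is a minimal decoding sketch. It is not an existing Liblouis or NVDA function; `decode_huc8` and its helpers are hypothetical names. It assumes the layout suggested by the combined examples in the issue description (first hex digit in dots 1, 2, 4, 5; second digit in dots 3, 7, 6, 8) and that the ⣵ prefix is followed by three cells.

```python
# Bit pattern of the upper half (dots 1, 2, 4, 5) for the hex digits 0..F,
# taken from the digit list in the issue description
# (dot 1 = 0x01, dot 2 = 0x02, dot 4 = 0x08, dot 5 = 0x10).
DIGIT_BITS = [0x1A, 0x01, 0x03, 0x09, 0x19, 0x11, 0x0B, 0x1B,
              0x13, 0x0A, 0x08, 0x18, 0x12, 0x02, 0x10, 0x00]
BITS_TO_DIGIT = {bits: "0123456789ABCDEF"[i] for i, bits in enumerate(DIGIT_BITS)}

PLANE_PREFIX = {"\u28E5": 0, "\u28ED": 1, "\u28FD": 2}
OTHER_PREFIX = "\u28F5"


def cell_to_hex(cell):
    """Split one 8-dot cell into its two hex digits."""
    value = ord(cell) - 0x2800
    if not 0 <= value <= 0xFF:
        raise ValueError("not a braille pattern")
    upper = value & 0x1B
    # Assumed layout: move dots 3, 7, 6, 8 back to the positions of dots 1, 2, 4, 5.
    lower = ((value & 0x04) >> 2) | ((value & 0x40) >> 5) \
        | ((value & 0x20) >> 2) | ((value & 0x80) >> 3)
    return BITS_TO_DIGIT[upper] + BITS_TO_DIGIT[lower]


def decode_huc8(cells):
    """Turn a prefix cell plus value cells back into a code point."""
    prefix, body = cells[0], cells[1:]
    digits = "".join(cell_to_hex(cell) for cell in body)
    if prefix in PLANE_PREFIX and len(body) == 2:
        return (PLANE_PREFIX[prefix] << 16) | int(digits, 16)
    if prefix == OTHER_PREFIX and len(body) == 3:
        return int(digits, 16)
    raise ValueError("unknown prefix or wrong number of cells")
```

With this sketch, `hex(decode_huc8("⣭⡤⣺"))` gives `0x1f600`, the value that can then be looked up by name.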

@DrSooom DrSooom changed the title Simplified 8-dot worldwide unified mapping for Unicode characters Simplified 8-dot worldwide unified mapping for the hexadecimal value of Unicode characters Jan 24, 2019
@DrSooom
Author

DrSooom commented Jan 24, 2019

I recently updated the issue title for clarification. My system doesn't replace any characters which are already defined in a braille table.

@dkager
Contributor

dkager commented Jan 27, 2019

The question that comes up for me is how one would know what e.g. U+2840 refers to. In other words, your code shortens the text displayed in braille, but leaves out a mapping from a number to an emoji or other Unicode character.

@DrSooom
Author

DrSooom commented Jan 28, 2019

@dkager: The relation between the hexadecimal value of a Unicode character and its name can be found with a quick web search. By the way, it is easier to find the name of the character behind the dots this way than with classic 8-dot tables like "de-de-comp8.ctb", where in most cases only the hexadecimal values and the dots are noted down. So if you can read the hexadecimal values directly, which is the only way to make these tables (see also #689) usable for all languages in the same way, you save one step. And if you are offline, you can also take a look at the outdated, unused table "unicodedefs.cti" to find the name of a Unicode character.

And don't forget that my solution has one more "big benefit" compared with other tables: There aren't multiple meanings for one combination of three, four or five characters (see also #689). One combination always corresponds to a single Unicode character. So you don't have to learn a lot of new rules for these new tables, which will only be used for undefined Unicode characters.

@DrSooom
Author

DrSooom commented Jan 28, 2019

I created a test file for 6-dot and for 8-dot to demonstrate what my idea would look like with emoticons (U+1F600 to U+1F64F, UTF-16 encoding). You can include these text files in another table and test them with NVDA.

Little hint: It would be very nice if the Liblouis manual said something about the file encoding (UTF-8 without BOM) and the type of line ending (LF). The latter was clear to me, but the BOM killed NVDA again and again (no braille output after restarting). I kept thinking I had made a mistake in the include syntax. But no, after approximately one hour (or maybe just half an hour) I found the issue. Well, it doesn't matter now. 😉
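
A quick pre-flight check for the two pitfalls mentioned here (BOM and line endings) could look like the following sketch. It is a hypothetical helper, not a Liblouis tool, and it checks only what this comment describes: UTF-8 without BOM and LF-only line endings.

```python
import codecs


def check_table_bytes(data):
    """Return a list of problems found in the raw bytes of a table file."""
    problems = []
    if data.startswith(codecs.BOM_UTF8):
        problems.append("file starts with a UTF-8 BOM")
    if b"\r" in data:
        problems.append("file contains CR characters (expected LF-only line endings)")
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("file is not valid UTF-8")
    return problems
```

To use it, read the table in binary mode, e.g. `check_table_bytes(open(path, "rb").read())`, where `path` is the table file to be included; an empty list means none of the three problems was found.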

[Update 2019-01-28 17:41 CET] The 6-dot table was added. [/Update]

@bertfrees bertfrees added the needs test A YAML test is needed (and should be committed) to explain the bug or expected behavior of a table label Jan 28, 2019
@bertfrees
Member

You might get more feedback if you post this idea on the mailing list.

It's an interesting idea. But you have to help me understand the use case. It's clear that if you use this code for all text, you get an increase in the number of braille cells needed. So to make sense, you can only use it for uncommon characters. Is this what you have in mind? I'm not a braille reader, but to me it would seem more logical to encode uncommon characters with a method that is maybe longer, but easier to memorize.

Don't get me wrong, I see how your method is better than Liblouis' default method for undefined characters. However, the improvement is only marginal assuming it is used for uncommon characters only. The biggest issue with the current method, and also your method, is that it takes time to look up the Unicode numbers (unless you can memorize the numbers).

@DrSooom
Author

DrSooom commented Jan 29, 2019

@bertfrees: Please read all my previous comments on this issue first. I think it is absolutely clear that "⣥⣺⢽⣥⣺⡟⣥⣺⠵⣥⣺⠋⣥⣺⠋⣥⣺⠋⣥⣺⡋" instead of "DrSooom" makes no sense. Furthermore, searching the web for the hexadecimal value of a Unicode character is even faster than searching for the braille character ⣼ (dots 345678) – or for a combination of braille characters, which is often used in 6-dot tables. I guess it's quite hard to find the rules for a specific table if they are written in a different language than the table itself (e.g. full Japanese documentation for German Grade 2).

But the "marginal improvement" of my solution is saving five cells on a braille display, which is quite a lot of space. With NVDA 2018.x only five emoticons (U+1F600 to U+1F64F) can be shown at the same time on an 80-cell braille display (because of UTF-16/surrogate splitting, see nvaccess/nvda#9044 for more details). With my solution only 30 cells (and maybe later on only 15 cells) are required to present exactly the same values. In other words: You can read more at the same time without having to scroll and scroll and scroll the braille display all the time. And in the end this means that you can recognize the hexadecimal values much faster, because you only have to read three instead of eight cells.
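
The cell counts in this comment can be checked with a few lines of arithmetic. This sketch assumes each UTF-16 code unit is currently shown as an eight-character escape such as '\xd83d', and that the proposed mapping needs three cells per unit (or per character once surrogate pairs are no longer split).

```python
DISPLAY_CELLS = 80
ESCAPE_CELLS = len("'\\xd83d'")   # apostrophe, backslash, x, four digits, apostrophe
HUC8_CELLS = 3                    # prefix cell plus two value cells

emoji = "\U0001F600"
# Number of UTF-16 code units (2 for any character above U+FFFF).
utf16_units = len(emoji.encode("utf-16-le")) // 2

current_cells = utf16_units * ESCAPE_CELLS        # cells per emoticon today
emoticons_per_line = DISPLAY_CELLS // current_cells
huc8_split_cells = emoticons_per_line * utf16_units * HUC8_CELLS
huc8_direct_cells = emoticons_per_line * HUC8_CELLS

print(current_cells, emoticons_per_line, huc8_split_cells, huc8_direct_cells)
# prints: 16 5 30 15
```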

@egli
Member

egli commented Jan 29, 2019

I'm not a braille reader so I'm probably not qualified to say anything, but this strikes a chord, so I have to voice my concern.

The thing about braille IMHO is that we have to make it accessible to people. Contractions might make it smaller on a display or on paper. But they do not make it more accessible. In fact they achieve the opposite: They make braille harder to produce, harder to learn and I would claim also harder to read.

So personally while I see the space issue with emoticons, I am not into very intricate schemes to contract the braille to squeeze out a few braille cells at the cost of teaching this scheme to people.

Could you not instead just show the textual representation of an emoticon, such as the CLDR Short Name, for example? I would expect this to make much more sense to an unsuspecting user.

Seems to be much better than complicated untangling of a contraction scheme of a hex number which would then have to be googled to finally find what it really means.

To get back to your proposed solution: I'm not opposed to include such a table. I'm just not sure we should make it the standard.

@egli egli added the idea Just an idea, there is no clear consensus yet that this is the way to go label Jan 29, 2019
@bertfrees
Member

@DrSooom I have read all your previous comments, but not everybody will make the effort to do so, so I wanted to give you an opportunity to summarize what the use cases are that you have in mind. It is essential to understand this because every problem has a solution that works best for that specific case. Also it may be that a certain solution is intrinsically better, but that the effort it takes to develop a new braille code and to get it adopted outweighs the advantages.

I'm still not sure whether this is theoretical talking, or whether there is a concrete use case:

  • Is this mainly to cater for the characters not defined (yet) in the German braille code?
  • Are you specifically thinking about emoji or do you have other symbols in mind that you come across occasionally?
  • Are you also looking for a way to read texts written in foreign alphabets for instance?
  • Is this an attempt to develop a new braille code that applies worldwide?
  • Are you just inviting us to join you in some brainstorming?

For the emoji use case, I was also thinking along the lines of what Christian suggested, i.e. a textual representation. I'm not saying that this would be a better solution, just trying to prove my point that different use cases may have different solutions.

I'd like to suggest sharing this idea on the mailing list. The mailing list is where the people are that know more about braille and the development of braille codes. GitHub is more where the technical people are. Technically I don't see anything wrong with your method, that is, it indeed does a good job of minimizing the number of braille cells needed, it can be decoded relatively easily (with the help of the web) and it is unambiguous. These are essentially the things that make up a good braille code. However, even if it is technically sound, there may be practical considerations that we technical people are forgetting.

In any case, I don't think it is our job to implement something that is not supported by a community or official entity, so the first step is to sell it to them. It's probably good to start with the BSKDL, and who knows, maybe more will follow?

@DrSooom
Author

DrSooom commented Jan 29, 2019

Is this mainly to cater for the characters not defined (yet) in the German braille code?

No, it's for all tables, where it is possible to include them – always as an optional, shorter mapping for undefined Unicode characters.
@egli: I didn't intend to completely replace the current mapping for undefined characters with my shorter mapping. It should always be an option – even for backwards compatibility. The end user should decide via a simple checkbox how undefined Unicode characters are displayed. That would be the best solution for both. And it is still possible to combine it with nvaccess/nvda#8702.

Are you specifically thinking about emoji or do you have other symbols in mind that you come across occasionally?

All Unicode characters from U+0000 to U+10FFFF. The emoticons were just for testing and demonstrating because I only had to define 81 characters. Maybe I will extend my 8-dot test table to all 2048 surrogates (U+D800 to U+DFFF), because the current mishmash in this Unicode area on my system is beginning to get on my nerves. 😉

Are you also looking for a way to read texts written in foreign alphabets for instance?

Yes and no. If you want to learn a new language, would you also want to learn a new braille table? Sometimes it isn't necessary and sometimes you have to. It depends on the language combination. Reading English and French with the German 8-dot table isn't a problem, but try Japanese.

Is this an attempt to develop a new braille code that applies worldwide?

Yes.

Are you just inviting us to join you in some brainstorming?

Yep, in the end I just want to provide a better solution than '\xhhhh' and so on. The BSKDL has already been informed about this issue and #689. If desired I could give a talk at the SightCity-Forum 2020 in Frankfurt/Main (Germany) about this. The call for papers for the SightCity-Forum 2019 has already ended. If desired I could still try to ask for a 30-minute talk in 2019, but the chances are slim.

And regarding the emojis: Their names can be misunderstood if you are chatting in different languages (the system is set to German, but the conversation is in French). So if their names are translated into the different languages, you also have to learn them together with the other language's table. Read this article for more details.

That's why, in my opinion, one braille combination per Unicode character fits better here. In older board software like phpBB or PunBB, :flower: was replaced with a flower graphic, just like emoticons. That's how graphics can be described in the user's own language. This can also be implemented as an option, but it would be a completely different topic and has nothing to do with this issue – nor with Liblouis itself. And in the end you can replace the current mapping with my mapping, followed by the replacement that is part of the end-user software. So if you want, you can have all three solutions at the same time. The more options, the better. (Okay, okay, I'm a power user. 😀)

@bertfrees
Member

OK, thanks. This confirms more or less what I thought.

Forgive me for saying, but it all sounds rather theoretical at this point. I don't mean anything negative by this. I just didn't hear many concrete use cases. Real use cases (such as the musical symbols ♭, ♮ and ♯) would generally benefit from proper support in braille codes I think. So the main practical use of your proposed code would be for symbols that are either too uncommon to support in the native braille code, or as a transitional measure for more common symbols until they have a proper representation.

All Unicode characters from U+0000 to U+10FFFF.

But how many of these do you actually encounter in the wild and have no braille representation yet?

Regarding learning new languages, I would really like to hear some stories of experiences of blind people that learned a new language but couldn't use their own braille system and in which case learning the new braille system was not an effective way to do it.

By the way, it is funny that you mention Japanese because especially for braille systems like Japanese and Chinese, which are mainly phonetic based, your method seems very ineffective. It was probably just a bad example though :)

Is this an attempt to develop a new braille code that applies worldwide?

Yes.

I think that is very ambitious, but you have my support.

I don't quite get what you are saying about emojis. Can't we assume that the system language is set to the language that corresponds with your braille system of choice?

If desired I could give a talk at the SightCity-Forum 2020 in Frankfurt/Main about this.

Ah, cool. I've never been there but maybe I should go once.

@DrSooom
Author

DrSooom commented Jan 30, 2019

So the main practical use of your proposed code would be for symbols that are either too uncommon to support in the native braille code, or as a transitional measure for more common symbols until they have a proper representation.

Not only for more common symbols, for all Unicode characters at once.

But how many of these do you actually encounter in the wild and have no braille representation yet?

Thousands upon thousands of characters. I have quite a lot of C-, Canto-, Mando- and J-Pop and hundreds of soundtracks from Japan on my hard drive. Displaying their titles needs quite a lot of space on a braille display. And setting dot 0 for all undefined characters, as in "fr-bfu-comp8.utb", isn't a solution for me either, because I want to select a title without using TTS. I must be able to identify the titles on the braille display alone. So shortening them all suits me best.
Based on my research I would say that the characters between U+0000 and U+2FFFF are the most important for now. U+30000 to U+10FFFF can also be added later, if we really want to use a "classic" table for it, as I did with my demonstration table. It isn't difficult to create such tables, it only takes a while – even in Notepad++ using the replacement feature. But I will do that because it's a nice challenge for me.

By the way, it is funny that you mention Japanese because especially for braille systems like Japanese and Chinese, which are mainly phonetic based, your method seems very ineffective. It was probably just a bad example though :)

Well, too little knowledge on my side. 😉 But also read my previous comment regarding music titles written in Mandarin, Cantonese or Japanese. We can also use the 1071 Egyptian Hieroglyphs (U+13000 to U+1342F) as an additional example. I don't think that creating a braille table only for these characters in the different languages makes any sense. So shortening the hexadecimal value from 9 to 3 cells would be the only suitable way here – always as an option for the end user. In UTF-16 you will only need 6 instead of 16 cells for presenting the same information – fully independent of the currently used language.

I don't quite get what you are saying about emojis. Can't we assume that the system language is set to the language that corresponds with your braille system of choice?

If you want to replace them with a descriptive text like "Kissing Face with Smiling Eyes", then no – especially if you want to print them too. How emojis and all other Unicode characters are displayed depends on the OS and on the (smartphone) application. The Unicode code point makes it easier to transfer text between different OSes and applications. If there are no replacements (graphical or text-based) in the end application, the user should be able to read the hexadecimal value of the Unicode character itself.
I still think that it isn't the job of Liblouis to describe emojis or other Unicode characters, because that's on a completely different level. And if there were a description, the description itself would have to be translated into all other languages and also defined in all other braille tables. That would really be too much work. And I never intended to do this with my mappings. But I'm not saying that it shouldn't be done – as an optional feature in the end application.

Well, and I'm going to visit the SightCity 2019 – my 7th time by the way. So I'm already able to talk there with some companies and organisations around the globe about this issue here and #689.

@bertfrees
Member

bertfrees commented Jan 30, 2019

Not only for more common symbols, for all Unicode characters at once.

I understand. But I'm talking about practical use. To matter in practice the characters need to be common enough, or you need to encounter a lot of them at once.

Your example of Chinese/Japanese track names was a very enlightening one for me. Thanks. Even if you can't understand Chinese, it is perfectly possible to recognize the track names based on the dot patterns, or in my case, by recognizing the sequence of glyphs without knowing the meaning. So in this case it doesn't really make sense to learn the Chinese braille system, just like it doesn't really make sense to learn how the tracks are pronounced. And of course when you encounter some Chinese text, it is not an isolated symbol, as is the case with an emoticon for instance, so you will indeed get the "scroll and scroll and scroll" issue if the code isn't short enough.

@DrSooom
Author

DrSooom commented Jan 30, 2019

As my mappings depend only on the hexadecimal values, how common a character is doesn't matter at all. In the end I just define all 65536 Unicode characters (U+0000 to U+FFFF), which would be enough for UTF-16. For UTF-32 the next two blocks of 65536 Unicode characters are required (for the upcoming ten years). That's why I asked for a better solution for displaying them in my introductory comment.

But with the table solution I still have to figure out what happens if a character is defined twice. If Liblouis (or NVDA) always chooses the first or always the last occurrence, everything is fine. Otherwise we have a problem.

Update 2019-01-30 21:40 CET:

The first occurrence seems to be the decisive one. So adding my test tables with include huc8-try-u+d800-u+dfff.txt at the end of a regular Liblouis table should have no effect on existing definitions. Note that I only performed a quick test with "de-de-comp8.ctb" and with NVDA 2018.1. Other tables weren't tested.

@DrSooom
Author

DrSooom commented Jan 30, 2019

After approximately one and a half hours the 8-dot test file for all 2048 surrogates (U+D800 to U+DFFF) was created – and of course successfully tested with NVDA 2018.1, which uses UTF-16. That went significantly faster than I thought. 😀

@egli
Member

egli commented Jan 31, 2019

Hi @DrSooom , can you turn this into a pull request?

@DrSooom
Author

DrSooom commented Jan 31, 2019

@egli: Not yet, because I'm not finished with #688 and #689 at all. The 8-dot braille table for the surrogates is still a demonstration – even if it's already working as expected. Including it into another table isn't difficult (for advanced users).

@Adriani90

@dkager to come back to your comment, I think if there were a speech dictionary for those undefined characters, it would be quite easy for users to find out which character it is. You just need to listen to the voice. But in this case we should also look closely at performance, especially in big documents with many undefined Unicode characters.
@DrSooom I agree with you in this matter. Especially for a new user it doesn't matter whether U+13000 or its new braille pattern is displayed. The user would have to do some research on the character anyway if the meaning is important for a special purpose. But I guess for the real use cases people would just be OK with saving some braille cells.
If these patterns see the light of day, then we also really need a glossary where users and developers can find the corresponding hexadecimal code or the description for the character. I guess this is quite a lot of work, but I think there are already multiple sources where people can find them. We just have to merge them and add the braille patterns.

@DrSooom
Author

DrSooom commented Jan 31, 2019

@Adriani90: Such a file already exists on your hard drive – it's called "unicodedefs.cti". It's part of Liblouis, but outdated, as I already mentioned here. There is also a significant mistake in this file: "\x1D11E" (and so on) must be replaced with "\y1D11E". But in the end that's a different issue and has nothing to do with #688 and #689. And no, I'm not going to add the names of the Unicode characters to my tables as well, because they are already big enough (~1.6 MiB for the first 65536 characters in the 8-dot table; the 6-dot one is even bigger).

btw: In the meantime I found the note regarding "UTF-8 without BOM" in this wiki article (chapter "How do I edit the YAML files?").

@DrSooom
Author

DrSooom commented Feb 7, 2019

I'm going to create all 8- and 6-dot tables for UTF-16 and UTF-32. I will add a new comment here, after I've finished everything.

@DrSooom
Author

DrSooom commented Mar 1, 2019

As these days every single good idea needs its own website, I just created one for the Hexadecimal Unicode Characters Braille Tables. And here is the official announcement.
TL;DR: HUC8 Braille Tables and the English documentation are finished and the HUC6 Braille Tables, parts of the German documentation and the offline version of the documentation are still in progress.

@egli and @bertfrees: If you want to have the HUC8 Braille Tables already in Liblouis 3.9, feel free to copy the 19 files from the 7z archive into the "tables" folder in Liblouis, because I'm not going to open a PR regarding this in the upcoming days.

@bertfrees
Member

Nice! Thank you. Whether we will include this in 3.9 depends on how much time we'll have over the weekend. I'm not very hopeful about myself.

@egli
Member

egli commented Aug 14, 2019

As mentioned in #730, the tables should either be distributed independently (outside of liblouis) or a new mode should be implemented that emits HUC Braille for unknown Unicode characters.

@egli egli closed this as completed Aug 14, 2019
@DrSooom
Author

DrSooom commented Aug 16, 2019

@egli: Could you explain to me why you also closed issues #688 and #689 right now, as PR #730 showed only one way to fix this? Please read the section "Technical solution" in the description of issue #688 as well as this comment in issue #689 and nvaccess/nvda#8702, which describe what the goal should be. Therefore I ask you to re-open issues #688 and #689, as neither is fixed yet and both are going to be fixed by another PR in the future.

@egli
Member

egli commented Aug 16, 2019

Well, OK, I might have been a bit over-zealous. I thought #688 and #689 were also about adding tables.

@egli egli reopened this Aug 16, 2019
@egli egli added the enhancement An enhancement in the functionality (not a bug fix or a table improvement) label Aug 16, 2019
@DrSooom
Author

DrSooom commented Aug 16, 2019

Thanks. Could you still remove the label "needs test" for this issue here? Then it would be identical to issue #689.

PS: This comment in issue #664 was a little bit bad. Now I know how to improve the HUC Braille Tables regarding UTF-16 surrogate pairs. But before I'm going to do this, I have to finalize the FAQ section in the documentation first.

@bertfrees
Member

Could you still remove the label "needs test" for this issue here?

@DrSooom Hmm, it doesn't look like there is a test yet. A (YAML) test is going to be needed regardless of how it is eventually implemented, with a table or a special mode or whatever.

I've added the label to #689 as well.

This was referenced Nov 18, 2019
@aaclause
Contributor

As mentioned in #730, the tables should either be distributed independently (outside of liblouis) or a new mode should be implemented that emits HUC Braille for unknown Unicode characters.

For info, I challenged myself to make a POC of this mode (in Python for now, probably in C later). Available here -> https://github.com/Andre9642/HUC-braille-converter
It's probably not the best approach/algorithm, but it seems to work. Works for HUC8 and HUC6.

@bertfrees
Member

This issue is still open because we're keeping the possibility open to implement HUC via a new mode or a new opcode or other new feature, anything that doesn't result in a huge table.

To improve the chance that this gets picked up, it would be nice to have some kind of YAML test that explains the requirement in a simple way.
