
Inconsistent precedence of character definition opcodes when capsletter is in use #384

Closed
rimas-kudelis opened this issue Aug 24, 2017 · 15 comments
Labels
bug Bug in the code (not in a table)

Comments

@rimas-kudelis
Contributor

rimas-kudelis commented Aug 24, 2017

I have a table like this:

$ cat a.tbl 
uplow Aa 1

uppercase B 12
uppercase B 1
lowercase b 12
lowercase b 1

uplow Cc 14

uplow Dd 145
uplow Dd 14

uplow Ee 15

uplow Ff 124
uplow Ff 15,15

uplow Gg 1245

uplow Hh 125,125
uplow Hh 1245,1245

#sign U 136
#capsletter 136

With this table, as long as the capsletter opcode is not in use, the precedence of conflicting definitions of the same character appears to be top-to-bottom, that is, the topmost definition wins:

$ echo "Aa Bb Cc Dd Ee Ff Gg Hh" | lou_translate -f a.tbl 
aa BB cc dd ee ff gg hh

However, if I uncomment the two capsletter lines, a funny thing starts happening: for uppercase letters, this precedence rule gets reversed and the bottommost rule becomes the winning one:

$ echo "Aa Bb Cc Dd Ee Ff Gg Hh" | lou_translate -f a.tbl 
Uaa UaB Ucc Ucd Uee Uef Ugg Ugh

Am I wrong, or is this a bug?
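The two behaviours described above can be made concrete with a minimal Python sketch. It assumes a simplified model in which each character ends up with exactly one dot pattern. The names `definitions`, `resolve` and the `upper_last_wins` flag are hypothetical illustrations, not liblouis functions. The parser only understands the `uplow`/`uppercase`/`lowercase` lines from a.tbl, reading an `uplow` operand of the form `X,Y` as X for the capital letter and Y for the small letter.

```python
# Toy model of the precedence behaviours reported in this issue.
# NOT liblouis code: a hypothetical resolver for the lines used in a.tbl only.

TABLE = """\
uplow Aa 1
uppercase B 12
uppercase B 1
lowercase b 12
lowercase b 1
uplow Cc 14
uplow Dd 145
uplow Dd 14
uplow Ee 15
uplow Ff 124
uplow Ff 15,15
uplow Gg 1245
uplow Hh 125,125
uplow Hh 1245,1245
"""


def definitions(table):
    """Yield (char, dots) pairs in file order."""
    for line in table.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and commented-out lines
        opcode, chars, dots = fields[0], fields[1], fields[2]
        if opcode == "uplow":
            # "X,Y" gives separate dots for upper/lower; a single pattern is shared
            up_dots, _, low_dots = dots.partition(",")
            yield chars[0], up_dots
            yield chars[1], low_dots or up_dots
        elif opcode in ("uppercase", "lowercase"):
            yield chars, dots


def resolve(table, upper_last_wins=False):
    """Map each character to its dots.

    By default the first (topmost) definition wins. With upper_last_wins=True,
    capital letters take the last (bottommost) definition instead, mimicking
    what was observed once capsletter is enabled.
    """
    result = {}
    for char, dots in definitions(table):
        if char not in result or (upper_last_wins and char.isupper()):
            result[char] = dots
    return result


expected = resolve(TABLE)
observed = resolve(TABLE, upper_last_wins=True)
print([c for c in "ABCDEFGH" if expected[c] != expected[c.lower()]])  # []
print([c for c in "ABCDEFGH" if observed[c] != observed[c.lower()]])  # ['B', 'D', 'F', 'H']
```

Under the "observed" rule, the letters with conflicting definitions (B, D, F, H) are exactly the ones whose capital and small forms diverge in the capsletter output above.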

@bertfrees
Member

Not sure what exactly is happening here. That needs some investigation.

Whether or not it is a "bug" is hard to say because the documentation doesn't mention this specific case, and the behavior has probably always been like this, so it's hard to find out whether it is intentional.

If it turns out the current behavior has no clear purpose and is really too confusing, we can change it (and document the new behavior).

It is kind of understandable though that this is currently not documented. Defining the same character twice with different dot patterns is really something you aren't expected to do. Why would you ever want to do this in real life?

@rimas-kudelis
Contributor Author

rimas-kudelis commented Aug 25, 2017

The real-life use case is including a generalized table file and overriding some of its definitions from within the including table.

In my particular case, I want to create fallbackAccentedLatinLetterDef6Dots.uti and fallbackAccentedLatinLetterDef8Dots.uti files (better name suggestions are welcome) which will map all accented Latin letters and ligatures to their non-accented equivalents, like this:

noback uplow     \x00c5\x00e5 1        Åå LATIN CAPITAL LETTER A WITH RING ABOVE,LATIN SMALL LETTER A WITH RING ABOVE
noback uplow     \x00c6\x00e6 1-15     Ææ LATIN CAPITAL LETTER AE,LATIN SMALL LETTER AE

There is a huge number of characters like this in Unicode, and having such a generic table would provide at least some degree of support for all of them without cluttering the language table with hundreds of barely relevant definitions. But there has to be a guaranteed way to override such definitions from within the including table, e.g. to define characters that exist in the national alphabet.

In fact, I just checked and this is exactly what the Latvian table already does. For historical reasons, Latvians use a slightly different mapping of three Latin letters, and here's the relevant excerpt from Lv-Lv-g1.utb:

# define the dot combinations that are different from the default.
# placed before the include to take precedence.
uplow Uu 34                      letter U *** Different from other langs ***
uplow Vv 2456                    letter V *** Different from other langs ***
uplow Zz 345                     letter Z *** Different from other langs ***
include latinLetterDef6Dots.uti

And it doesn't work as expected:

$ echo 'Tt Uu Vv Zz' | lou_translate -f lv.tbl 
,tt ,u/ ,vw ,z>
$ echo 'Tt Uu Vv Zz' | lou_translate -f unicode.dis,lv.tbl
⠠⠞⠞ ⠠⠥⠌ ⠠⠧⠺ ⠠⠵⠜

Note how capital letters U, V and Z get different dot mappings from lowercase ones, which is clearly unintended and unexpected.
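A minimal Python sketch of why the Latvian layout should work, and how the reported behaviour breaks it. It assumes the include is expanded in place, so the override lines simply come first in file order. `OVERRIDES`, `GENERIC`, `resolve` and `case_mismatches` are hypothetical names. `GENERIC` holds the standard Latin dots for T, U, V and Z as a stand-in for the relevant part of latinLetterDef6Dots.uti, not its actual contents. The printed results are traced by hand against this toy model, not produced by liblouis.

```python
# Hypothetical model of the "overrides before include" pattern in Lv-Lv-g1.utb.

OVERRIDES = [("U", "34"), ("u", "34"), ("V", "2456"), ("v", "2456"),
             ("Z", "345"), ("z", "345")]

# Stand-in for the included generic table: standard Latin letter dots.
GENERIC = [("T", "2345"), ("t", "2345"), ("U", "136"), ("u", "136"),
           ("V", "1236"), ("v", "1236"), ("Z", "1356"), ("z", "1356")]


def resolve(defs, upper_last_wins=False):
    """First definition wins, unless capitals are made to take the last one."""
    result = {}
    for char, dots in defs:
        if char not in result or (upper_last_wins and char.isupper()):
            result[char] = dots
    return result


def case_mismatches(mapping):
    """Return capital letters whose dots differ from their small counterparts."""
    return sorted(c for c in mapping
                  if c.isupper() and mapping[c] != mapping.get(c.lower()))


expanded = OVERRIDES + GENERIC  # include expanded after the overrides
print(case_mismatches(resolve(expanded)))                        # []
print(case_mismatches(resolve(expanded, upper_last_wins=True)))  # ['U', 'V', 'Z']
```

With capitals resolved bottom-up, U, V and Z pick up the generic 136, 1236 and 1356, which matches the ⠥, ⠧ and ⠵ cells in the output above.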

@bertfrees
Member

Thanks. OK I guess this is a valid use case indeed. In general I think including another table and overriding parts of it is not a good idea, but if it is done in a controlled way, like in your example where the included table has a very specific function, it is OK.

So the next step is to find out what exactly happens in the code and document it. Don't hesitate if you want to help with this.

Also, if you are sure that the Latvian table does not work correctly, could you please add some tests?

@rimas-kudelis
Contributor Author

rimas-kudelis commented Aug 31, 2017

@bertfrees I don't really speak or write Latvian, but I guess I could prepare a few simplistic tests, like I did for Lithuanian.

@bertfrees
Member

Oh, didn't realize that. Anyway, a few simplistic tests would already be great. Thanks!

rimas-kudelis added a commit to rimas-kudelis/liblouis that referenced this issue Sep 13, 2017
…ions to it.

The tests and corrections are based on the information supplied in the World Braille Usage report, third edition.
The second test currently fails due to liblouis#384.
@rimas-kudelis
Contributor Author

@bertfrees as you can see, I've made a couple of pull requests with failing tests related to this issue. Hope this can progress.

@rimas-kudelis
Contributor Author

Another observation: the dots reported by lou_trace are different from the ones present in the output:

$ lou_trace -f unicode.dis,lv.tbl
Aa Uu Vv
⠠⠁⠁ ⠠⠥⠌ ⠠⠧⠺
1.	uppercase	A	1
2.	lowercase	a	1
3.	space	 	0
4.	uppercase	U	34
5.	lowercase	u	34
6.	space	 	0
7.	uppercase	V	2456
8.	lowercase	v	2456

@bertfrees
Member

Brilliant, thank you!

@bertfrees bertfrees added the bug Bug in the code (not in a table) label Sep 14, 2017
egli added a commit to rimas-kudelis/liblouis that referenced this issue Sep 15, 2017
bertfrees pushed a commit that referenced this issue Oct 6, 2017
…ions to it.

The tests and corrections are based on the information supplied in the World Braille Usage report, third edition.
The second test currently fails due to #384.
@rimas-kudelis rimas-kudelis changed the title Inconsistent precedency of character definition opcodes when capsletter is in use Inconsistent precedence of character definition opcodes when capsletter is in use Nov 21, 2017
@bertfrees
Member

See 6bf242e

@rimas-kudelis
Contributor Author

@bertfrees that patch probably works around the problem for the Latvian case, but doesn't really fix the underlying issue. I suggest at least mentioning this issue in a comment in that table.
Also, this is still a bug, isn't it? Will you maybe have time to look at what exactly is happening here and why?

@bertfrees
Member

Yes, it's just a workaround. OK, I'll add a reference to this issue.

I don't have time now, but maybe after the release...

@bertfrees bertfrees added the needs test A YAML test is needed (and should be committed) to explain the bug or expected behavior of a table label Jun 21, 2019
@bertfrees
Member

I've added the label "needs test" because there needs to be a YAML file whose purpose is to give an overview of the various precedence rules. Initially it should simply document the current behavior including bugs (and obvious bugs can be fixed of course), but it will also provide us a way to look at the whole picture and find inconsistencies and more subtle issues, and define a new expected behavior based on this.

bertfrees added a commit to Ronan555/liblouis that referenced this issue Aug 3, 2019
This has to do with issue liblouis#384.
@bertfrees bertfrees changed the title Inconsistent precedence of character definition opcodes when capsletter is in use Inconsistent precedence of character definition opcodes when capsletter is in use Aug 14, 2019
bertfrees added a commit to Ronan555/liblouis that referenced this issue Aug 22, 2019
This has to do with issue liblouis#384.
bertfrees added a commit to Ronan555/liblouis that referenced this issue Aug 27, 2019
This has to do with issue liblouis#384.
@bertfrees bertfrees added help wanted Maintainers want help because they don't have the knowledge or the time, or for another reason prio:low Low priority - minor issue, might never be fixed (but a reminder is kept) labels Feb 17, 2020
@bertfrees
Member

bertfrees commented Feb 17, 2020

I've created the start of this "precedence.yaml" test in 80f8d5a, and also merged issue-384.yaml into it.

@bertfrees bertfrees removed the needs test A YAML test is needed (and should be committed) to explain the bug or expected behavior of a table label Feb 17, 2020
bertfrees added a commit to Futyn-Maker/liblouis that referenced this issue Aug 24, 2022
bertfrees added a commit to Futyn-Maker/liblouis that referenced this issue Aug 29, 2022
@jrbowden
Contributor

I have the same use case: I need to override parts of the English tables, for example for special treatment of accented letters.

bertfrees added a commit to danghoaiphuc/liblouis that referenced this issue Mar 4, 2023
@bertfrees bertfrees self-assigned this Jun 5, 2023
@bertfrees bertfrees added this to the 3.27 milestone Jun 5, 2023
@bertfrees bertfrees modified the milestones: 3.27, 3.28 Aug 24, 2023
@bertfrees bertfrees removed help wanted Maintainers want help because they don't have the knowledge or the time, or for another reason prio:low Low priority - minor issue, might never be fixed (but a reminder is kept) labels Nov 24, 2023
@bertfrees
Member

Will be fixed by #1481

@bertfrees bertfrees removed their assignment Dec 1, 2023
3 participants