Fix bug in matching of context rules that begin with uppercase letters
This fixes #1461.
bertfrees committed Nov 28, 2023
1 parent 8a0e5e4 commit 34986dd
Showing 2 changed files with 87 additions and 17 deletions.
42 changes: 42 additions & 0 deletions liblouis/compileTranslationTable.c
@@ -4461,6 +4461,48 @@ finalizeTable(TranslationTableHeader *table) {
 			characterOffset = character->next;
 		}
 	}
+	// Rearrange rules in `forRules' so that when iterating over candidate rules in
+	// for_selectRule(), both case-sensitive and case-insensitive rules are contained
+	// within the same ordered list. We do the rearrangement by iterating over all
+	// case-sensitive rules and if needed move them to another bucket. This may slow down
+	// the compilation of tables with a lot of context rules, but the good news is that
+	// translation speed is not affected.
+	for (unsigned long int i = 0; i < HASHNUM; i++) {
+		TranslationTableOffset *p = &table->forRules[i];
+		while (*p) {
+			TranslationTableRule *rule = (TranslationTableRule *)&table->ruleArea[*p];
+			// For now only move the rules that we know are case-sensitive, namely
+			// `context' rules. (Note that there may be other case-sensitive rules that
+			// we're currently not aware of.) We don't move case insensitive rules because
+			// the user can/should define them using all lowercases.
+			if (rule->opcode == CTO_Context) {
+				unsigned long int hash = _lou_stringHash(&rule->charsdots[0], 1, table);
+				// no need to do anything if the first two characters are not uppercase
+				// letters
+				if (hash != i) {
+					// compute new position
+					TranslationTableOffset *insert_at = &table->forRules[hash];
+					while (*insert_at) {
+						TranslationTableRule *r =
+								(TranslationTableRule *)&table->ruleArea[*insert_at];
+						if (rule->charslen > r->charslen)
+							break;
+						else if (rule->charslen == r->charslen && r->opcode == CTO_Always)
+							break;
+						insert_at = &r->charsnext;
+					}
+					// remove rule from current list and insert it at the correct position
+					// in the new list
+					TranslationTableOffset next = rule->charsnext;
+					rule->charsnext = *insert_at;
+					*insert_at = *p;
+					*p = next;
+					continue;
+				}
+			}
+			p = &rule->charsnext;
+		}
+	}
 	table->finalized = 1;
 	return 1;
 }
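
The relocation step above can be illustrated outside liblouis with a small, self-contained sketch. Everything in it is hypothetical and simplified for illustration: `Rule`, `Table`, `hash2`, `NBUCKETS` and the helper functions are not liblouis names, rules are plain C strings, and the ordering rule is reduced to "longer rules first" (the real code also breaks ties against `always` rules). The sketch assumes the underlying problem is that a case-sensitive rule is filed under a hash of its literal first two characters, while lookup hashes the case-folded input.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define NBUCKETS 7  /* hypothetical stand-in for HASHNUM */
#define MAXRULES 16 /* hypothetical capacity of the toy rule area */

/* Toy rule. Index 0 means "no rule", like a zero TranslationTableOffset. */
typedef struct {
	const char *chars;  /* characters the rule matches */
	int case_sensitive; /* 1 for context-like rules */
	unsigned next;      /* next rule in the same bucket, 0 = end of list */
} Rule;

typedef struct {
	Rule rules[MAXRULES]; /* slot 0 is unused */
	unsigned count;
	unsigned buckets[NBUCKETS];
} Table;

/* Hypothetical hash over the first two characters, optionally case-folded. */
static unsigned hash2(const char *s, int fold) {
	unsigned a = (unsigned char)s[0], b = (unsigned char)s[1];
	if (fold) {
		a = (unsigned)tolower((int)a);
		b = (unsigned)tolower((int)b);
	}
	return (a * 31u + b) % NBUCKETS;
}

/* Link rule idx into the list at head, keeping longer rules first. */
static void insert_sorted(Table *t, unsigned *head, unsigned idx) {
	size_t len = strlen(t->rules[idx].chars);
	unsigned *at = head;
	while (*at && strlen(t->rules[*at].chars) >= len) at = &t->rules[*at].next;
	t->rules[idx].next = *at;
	*at = idx;
}

/* Case-sensitive rules are filed under their literal (unfolded) hash. */
static unsigned add_rule(Table *t, const char *chars, int case_sensitive) {
	if (t->count + 1 >= MAXRULES) return 0;
	unsigned idx = ++t->count;
	t->rules[idx] = (Rule){ chars, case_sensitive, 0 };
	insert_sorted(t, &t->buckets[hash2(chars, !case_sensitive)], idx);
	return idx;
}

/* Move each case-sensitive rule to the bucket of its case-folded hash. */
static void relocate_case_sensitive(Table *t) {
	for (unsigned i = 0; i < NBUCKETS; i++) {
		unsigned *p = &t->buckets[i];
		while (*p) {
			Rule *rule = &t->rules[*p];
			if (rule->case_sensitive) {
				unsigned h = hash2(rule->chars, 1);
				if (h != i) {
					unsigned idx = *p;
					*p = rule->next; /* unlink from the old bucket */
					insert_sorted(t, &t->buckets[h], idx);
					continue; /* *p already points at the next rule */
				}
			}
			p = &rule->next;
		}
	}
}

/* Case-insensitive rules are assumed to be written in lowercase. */
static int matches(const Rule *r, const char *input) {
	for (size_t k = 0; r->chars[k]; k++) {
		unsigned char c = (unsigned char)input[k];
		if (c == '\0') return 0;
		if (!r->case_sensitive) c = (unsigned char)tolower(c);
		if (c != (unsigned char)r->chars[k]) return 0;
	}
	return 1;
}

/* Lookup always uses the case-folded hash of the input. */
static unsigned lookup(const Table *t, const char *input) {
	for (unsigned i = t->buckets[hash2(input, 1)]; i; i = t->rules[i].next)
		if (matches(&t->rules[i], input)) return i;
	return 0;
}
```

With a zero-initialized `Table`, adding a case-insensitive rule `"go"` and a case-sensitive rule `"Goal"` should leave `"Goal"` unreachable for the input `"Goal crazy"` until `relocate_case_sensitive()` runs. After relocation, both rules share one bucket and the longer rule is tried first. Walking the list through a pointer to the link field (`unsigned *p`) is what lets the unlink and relink happen without special-casing the head of a bucket, which mirrors the `TranslationTableOffset *p` idiom in the patch.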
62 changes: 45 additions & 17 deletions tests/yaml/case-sensitivity.yaml
@@ -1,6 +1,7 @@
-# when a table does not define any capital marks, rules should be case sensitive
-# see second part of issue #498: https://github.com/liblouis/liblouis/issues/498#issuecomment-498358354
 display: tables/unicode.dis
+
+# when a table does not define any capital marks, rules are case sensitive
+# test adapted from https://github.com/liblouis/liblouis/issues/498
 table: |
   space \s 0
   uppercase A 17
@@ -19,21 +20,18 @@ table: |
   lowercase r 1235
   lowercase y 13456
   lowercase z 1356
-  noback context "goal"$s"crazy" @2347-1367-147-147-2
-  noback context "goal"$s"Crazy" @2347-1367-147-147-23
-  noback context "Goal"$s"crazy" @2347-1367-147-147-25
-  noback context "Goal"$s"Crazy" @2347-1367-147-147-256
+  always goal\scrazy 2347-1367-147-147-2
+  always goal\sCrazy 2347-1367-147-147-23
+  always Goal\scrazy 2347-1367-147-147-25
+  always Goal\sCrazy 2347-1367-147-147-256
 tests:
-  - [goal crazy, ⡎⡥⡉⡉⠂] # first context rule matches
-  - [goal Crazy, ⡎⡥⡉⡉⠆] # second context rule matches
-  - [Goal crazy, ⡎⡥⡉⡉⠒] # third context rule matches
-  - [Goal Crazy, ⡎⡥⡉⡉⠲] # fourth context rule matches
+  - [goal crazy, ⡎⡥⡉⡉⠂] # first always rule matches
+  - [goal Crazy, ⡎⡥⡉⡉⠆, xfail: uppercase C in the middle cancels rule]
+  - [Goal crazy, ⡎⡥⡉⡉⠒] # third always rule matches
+  - [Goal Crazy, ⡎⡥⡉⡉⠲, xfail: uppercase C in the middle cancels rule]
 
-# when a table uses "base uppercase" but does not define any capital
-# marks, the behavior is not really consistent when it comes to case
-# sensitivity: context rules are case sensitive with regard to
-# uppercase letters within the rule, but when the rule starts with an
-# uppercase letter, it never matches at all.
+# context rules are always case sensitive
+# test adapted from https://github.com/liblouis/liblouis/issues/498
 table: |
   space \s 0
   lowercase a 1
@@ -59,5 +57,35 @@ table: |
 tests:
   - [goal crazy, ⡎⡥⡉⡉⠂] # first context rule matches
   - [goal Crazy, ⡎⡥⡉⡉⠆] # second context rule matches
-  - [Goal crazy, ⠛⠕⠁⠇⠀⠉⠗⠁⠵⠽] # no context rule matches
-  - [Goal Crazy, ⠛⠕⠁⠇⠀⠉⠗⠁⠵⠽] # no context rule matches
+  - [Goal crazy, ⡎⡥⡉⡉⠒] # third context rule matches
+  - [Goal Crazy, ⡎⡥⡉⡉⠲] # fourth context rule matches
+
+# test taken from https://github.com/liblouis/liblouis/issues/1461
+table: |
+  include tables/latinLetterDef6Dots.uti
+  include tables/spaces.uti
+  capsletter 6
+  begcapsword 6-6
+  lencapsphrase 3
+  begcapsphrase 6-6-6
+  endcapsphrase after 6-3
+  always ar 345
+  noback context _$l["ound"] @46-145a
+  noback context _$l["OUND"] @46-145a
+  noback pass2 [@6-46-145a] @6-1256-1345-145
+  noback pass2 [@6-6-46-145a] @6-6-1256-1345-145
+  noback pass2 [@6-3-46-145a] @6-3-1256-1345-145
+  noback pass2 @46-145a @46-145
+tests:
+  - - "around"
+    - "⠜⠨⠙"
+  - - "Around"
+    - "⠠⠜⠨⠙"
+  - - "arOund"
+    - "⠜⠠⠕⠥⠝⠙"
+  - - "AROUND"
+    - "⠠⠠⠜⠨⠙"
+  - - "MOVE AROUND"
+    - "⠠⠠⠍⠕⠧⠑⠀⠠⠠⠜⠨⠙"
+  - - "GO MOVE AROUND"
+    - "⠠⠠⠠⠛⠕⠀⠍⠕⠧⠑⠀⠜⠨⠙⠠⠄"
