Numericnocontchars are not preceded by nocontractsign when in a begcaps/endcaps block #631

Closed
BueVest opened this issue Aug 31, 2018 · 7 comments · Fixed by #845
Labels
documentation Change in the user manual or wiki

Comments

@BueVest
Contributor

BueVest commented Aug 31, 2018

This issue is a spin-off of the discussion in #611 and the tests in issue-400.yaml.

If the first letter after a number (with no intervening space) is a numericnocontchar, a nocontractsign is always needed. Otherwise the letter will be interpreted as a digit by the reader and by the back-translation routine. However, this currently does not happen if the construct is part of a begcaps/endcaps block.
Once a caps block has been started, it is presumably not canceled by anything other than an endcaps indicator (as opposed to capsword). So, inserting a nocontractsign should not pose any problem to interpreting the caps block correctly.
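The rule in the first paragraph can be modelled as a simple scan over the input text. This is a toy sketch under stated assumptions, not liblouis source code: `nocontract_positions`, `insert_marker` and the `<nocont>` placeholder are hypothetical, and the default character set only mirrors a table line such as `numericnocontchars abcdefghij`.

```python
# Toy model of the forward-translation rule; NOT liblouis source code.
# Hypothetical stand-in for a table line like: numericnocontchars abcdefghij
NUMERIC_NOCONT_CHARS = frozenset("abcdefghij")


def nocontract_positions(text: str, chars: frozenset = NUMERIC_NOCONT_CHARS) -> list:
    """Return the indices of characters that directly follow a digit
    (no intervening space) and belong to `chars`; a nocontractsign
    would have to be inserted before each of them."""
    return [
        i
        for i in range(1, len(text))
        if text[i - 1].isdigit() and text[i] in chars
    ]


def insert_marker(text: str, marker: str = "<nocont>") -> str:
    """Insert a placeholder marker before every position found above."""
    positions = set(nocontract_positions(text))
    out = []
    for i, ch in enumerate(text):
        if i in positions:
            out.append(marker)
        out.append(ch)
    return "".join(out)


print(nocontract_positions("abc123abc"))   # [6]
print(insert_marker("abc123abc"))          # abc123<nocont>abc
print(nocontract_positions("abc123 abc"))  # [] (a space intervenes)
```

Because the membership test compares text characters, `nocontract_positions("ABC123ABC")` returns `[]` with this default set, i.e. no sign is requested in the situation covered by the issue-400.yaml test.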

@bertfrees bertfrees added the back-translation Anything related to backward translation label Sep 4, 2018
@BueVest
Contributor Author

BueVest commented Sep 5, 2018

Sorry, @bertfrees, but this happens during forward translation, not during back-translation.
I will see if I can make any sense of the code.

@bertfrees bertfrees removed the back-translation Anything related to backward translation label Sep 6, 2018
@bertfrees
Member

bertfrees commented Sep 26, 2018

To do:

  • Reference this issue from the issue-400.yaml test:
      # number does not cancel a block in capitals
      - - "ABC123ABC"
        - ",abc#abc;abc."
        - {xfail: missing nocontractsign after number}

@bertfrees bertfrees added the bug Bug in the code (not in a table) label Sep 26, 2018
BueVest added a commit to BueVest/liblouis that referenced this issue Sep 30, 2018
@BueVest
Contributor Author

BueVest commented Oct 2, 2018

The test in question seems to fail for a simple reason: a, b and c are members of numericnocontchars, while A, B and C are not. We have a similar situation in the capsletter tests in line 29 ff, but here, we apparently don't want an extra nocontractsign because we already have the capsletter sign to separate digits from letters.

I thought there was a genuine bug, but it appears that the problem lies rather in the defined behaviour. With capsletter and capsword, capital letters after digits are automatically marked with one of the two indicators, but if the number appears in the middle of a capsphrase, there is no indicator to separate the digits from the capital letters.

I suppose this is mostly a UEB problem, if it is a problem. So, perhaps we should hear what the UEB people have to say about it.

In Danish, all letters (cap and small) are numericnocontchars, which I currently address through context lines. And we don't have the concept of capsphrase in Danish Braille, only capsletter and capsword.

So the test in question is xfail for a good reason, namely the problematic combination of capsphrase and numericnocontchars.

I am still willing to help address it, but what should we do? Remove the test? Change the table to include
numericnocontchars abcdefghijABCDEFGHIJ?

@bertfrees
Copy link
Member

No, I think we should keep the test. Maybe it's not a bug, but then at least we document a possible pitfall. The documentation should perhaps also state clearly that numericnocontchars is case-sensitive.

numericnocontchars abcdefghijABCDEFGHIJ might work. But you might end up with a nocontractsign followed by a capsign in some situations, so you'd have to remove one of them in a second pass.

You could also argue that it should work out of the box: that numericnocontchars should compare dots rather than text characters. I think this would make the most sense.
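The difference between comparing text characters and comparing dots can be sketched as follows. This is hypothetical and not how liblouis is implemented: it assumes six-dot literary braille, where the digits 1 to 0 reuse the dot patterns of a to j, and it assumes capital letters share the dots of their lowercase forms (the capital indicator being a separate sign). The function names are illustrative only.

```python
# Hypothetical sketch, NOT liblouis code: contrast a text-based
# numericnocontchars check with the dot-based check suggested above.
from typing import Optional

# Dot patterns of a-j in six-dot literary braille; digits 1-0 reuse them.
LETTER_DOTS = {
    "a": "1", "b": "12", "c": "14", "d": "145", "e": "15",
    "f": "124", "g": "1245", "h": "125", "i": "24", "j": "245",
}
DIGIT_DOTS = frozenset(LETTER_DOTS.values())
# Text-based setting, as in a table line: numericnocontchars abcdefghij
NUMERIC_NOCONT_CHARS = frozenset("abcdefghij")


def dots_of(ch: str) -> Optional[str]:
    """Dots for a letter; assumes A-J use the same dots as a-j."""
    return LETTER_DOTS.get(ch.lower())


def needs_sign_by_text(prev: str, nxt: str) -> bool:
    """Case-sensitive comparison of text characters."""
    return prev.isdigit() and nxt in NUMERIC_NOCONT_CHARS


def needs_sign_by_dots(prev: str, nxt: str) -> bool:
    """Does the letter look like a digit in braille after a number?"""
    return prev.isdigit() and dots_of(nxt) in DIGIT_DOTS


for letter in ("a", "A", "k"):
    print(letter, needs_sign_by_text("3", letter), needs_sign_by_dots("3", letter))
# a True True
# A False True
# k False False
```

Under these assumptions the dot-based check catches `A` after a digit without listing `ABCDEFGHIJ` separately; the assumed dot patterns and capital handling would of course have to come from the actual table.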

An alternative solution could be to not treat digits as capital letters by default (and use capsmodechars 123456789 to override this), so that the capsign would have to be repeated, but this of course would need to match the braille standard of the language in question.

@BueVest
Copy link
Contributor Author

BueVest commented Oct 22, 2018 via email

@bertfrees
Member

Sure, that would be great, thanks! I'd maybe also add a test to show how to work around the issue (with numericnocontchars abcdefghijABCDEFGHIJ e.g.), and maybe one specific to UEB (in tests/braille-specs/ueb-issue-x.yaml, and create a new issue).

@bertfrees
Member

It doesn't seem to be an issue in UEB though. "ABC123ABC" translates to ",,abc#abc,,abc"...

@egli egli added documentation Change in the user manual or wiki and removed bug Bug in the code (not in a table) labels Nov 12, 2018
@BueVest BueVest mentioned this issue Sep 7, 2019
@bertfrees bertfrees added this to the 3.12 milestone Sep 7, 2019
@egli egli closed this as completed in #845 Sep 9, 2019

3 participants