Christian Egli edited this page Jun 17, 2016 · 1 revision

This page is outdated. For information on the new UEB opcodes see https://github.com/liblouis/liblouis/wiki/Emphasis-Opcodes

Table of Contents

New opcodes

  • firstwordcaps
  • lastwordbeforecaps
  • lastwordaftercaps
  • lencapsphrase
  • firstlettercaps
  • lastlettercaps
  • singlelettercaps
  • capsword
  • capswordstop
  • firstwordital
  • lastworditalbefore
  • lastworditalafter
  • firstletterital
  • lastletterital
  • singleletterital
  • italword
  • italwordstop
  • lenitalphrase
  • firstwordbold
  • lastwordboldbefore
  • lastwordboldafter
  • firstletterbold
  • lastletterbold
  • singleletterbold
  • boldword
  • boldwordstop
  • lenboldphrase
  • firstwordunder
  • lastwordunderbefore
  • lastwordunderafter
  • firstletterunder
  • lastletterunder
  • singleletterunder
  • underword
  • underwordstop
  • lenunderphrase
  • singleletterscript
  • scriptword
  • scriptwordstop
  • firstletterscript
  • lastletterscript
  • firstwordscript
  • lastwordscriptbefore
  • lastwordscriptafter
  • lenscriptphrase
  • singlelettertrans1
  • trans1word
  • trans1wordstop
  • firstlettertrans1
  • lastlettertrans1
  • firstwordtrans1
  • lastwordtrans1before
  • lastwordtrans1after
  • lentrans1phrase
  • singlelettertrans2
  • trans2word
  • trans2wordstop
  • firstlettertrans2
  • lastlettertrans2
  • firstwordtrans2
  • lastwordtrans2before
  • lastwordtrans2after
  • lentrans2phrase
  • singlelettertrans3
  • trans3word
  • trans3wordstop
  • firstlettertrans3
  • lastlettertrans3
  • firstwordtrans3
  • lastwordtrans3before
  • lastwordtrans3after
  • lentrans3phrase
  • singlelettertrans4
  • trans4word
  • trans4wordstop
  • firstlettertrans4
  • lastlettertrans4
  • firstwordtrans4
  • lastwordtrans4before
  • lastwordtrans4after
  • lentrans4phrase
  • singlelettertrans5
  • trans5word
  • trans5wordstop
  • firstlettertrans5
  • lastlettertrans5
  • firstwordtrans5
  • lastwordtrans5before
  • lastwordtrans5after
  • lentrans5phrase
  • singlelettertransnote
  • transnoteword
  • transnotewordstop
  • firstlettertransnote
  • lastlettertransnote
  • firstwordtransnote
  • lastwordtransnotebefore
  • lastwordtransnoteafter
  • lentransnotephrase
  • seqdelimiter
  • seqbeforechars
  • seqafterchars
  • seqafterpattern
  • nocontractsign
  • numericmodechars
  • numericnocontchars
  • capsmodechars

New Attributes

  • word_reset
  • passage_break

New/changed opcodes and attributes

Capital/emphasis indicators

emph = {caps, ital, bold, under, script, trans1, trans2, trans3, trans4, trans5}

  • firstword{emph}
  • lastwordbefore{emph}
  • lastwordafter{emph}
  • len{emph}phrase
  • firstletter{emph}
  • lastletter{emph}
  • singleletter{emph}
  • {emph}word
  • {emph}wordstop

The first step in choosing indicators is to find and mark the beginning and end of each consecutive sequence of a given emphasis. For capitalization, a sequence starts at an uppercase letter and continues, even past spaces, until a lowercase letter is found. If no space was encountered, the end is marked at the lowercase letter. If a space was encountered, the end is marked at the last space after the non-space sequence that contained an uppercase letter.
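The capitalization marking described above can be sketched in Python. This is an illustration only, not liblouis code: the function name `mark_caps_sequences` is hypothetical, and spans are half-open `(begin, end)` index pairs. Two details the text leaves open are filled in as assumptions: a sequence whose final run contains an uppercase letter and reaches the end of the text ends there, and "the last space" is taken to be the first space following the last run that contained an uppercase letter.

```python
def mark_caps_sequences(text):
    """Return half-open (begin, end) spans of capitalized sequences.

    Hypothetical sketch of the marking step, not the liblouis implementation.
    """
    spans = []
    i, n = 0, len(text)
    while i < n:
        if not text[i].isupper():
            i += 1
            continue
        begin = i
        seen_space = False
        run_has_upper = False    # current non-space run contains an uppercase letter
        last_good_space = None   # space that follows a run containing an uppercase letter
        j = i
        # Continue, even past spaces, until a lowercase letter is found.
        while j < n and not text[j].islower():
            ch = text[j]
            if ch == " ":
                seen_space = True
                if run_has_upper:
                    last_good_space = j
                run_has_upper = False
            elif ch.isupper():
                run_has_upper = True
            j += 1
        if j == n and run_has_upper:
            end = n                  # assumption: sequence runs to the end of the text
        elif seen_space:
            end = last_good_space    # end at the space after the last uppercase run
        else:
            end = j                  # end at the lowercase letter
        spans.append((begin, end))
        i = max(end, begin + 1)      # resume scanning after the marked sequence
    return spans
```

Under these assumptions, `mark_caps_sequences("HELLO WORLD foo")` gives `[(0, 11)]`, while `mark_caps_sequences("HELLO World")` gives `[(0, 5), (6, 7)]`: the first sequence ends at the space, and the `W` starts a new one that ends at the following lowercase letter.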

If {emph}word is defined, each character sequence between spaces is checked for a beginning or ending mark from the previous step. A sequence that contains one is marked to use {emph}word and, if defined, {emph}wordstop. A sequence that lies entirely between the previous markings is also marked as a whole word. If singleletter{emph} is defined as well, any marked sequence of length one is set to use singleletter{emph}.

If {emph}word is defined and len{emph}phrase is not zero, then any run of consecutive whole-word markings that is at least len{emph}phrase long and does not contain a passage_break is marked to use the firstletter{emph} and lastletter{emph} symbols. If lastletter{emph} is not defined and lastwordbefore{emph} or lastwordafter{emph} is defined, the run is set to use those instead.
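The phrase rule can be illustrated with a small Python sketch. Nothing here is liblouis API: `find_phrases`, `word_flags`, and `breaks_before` are hypothetical names. The sketch assumes that words have already been marked as whole words (one boolean per word) and that a passage break is modelled as a set of word indices before which a break occurs, so that a run is split at each break.

```python
def find_phrases(word_flags, len_phrase, breaks_before=frozenset()):
    """Return inclusive (first, last) word-index pairs that qualify as phrases.

    word_flags[i] is True when word i is marked as a whole emphasized word.
    breaks_before holds word indices that are preceded by a passage break
    (an assumption about how breaks are represented).
    """
    phrases = []
    if len_phrase <= 0:          # a zero length disables phrase handling
        return phrases
    start = None
    # A trailing False acts as a sentinel that closes the final run.
    for idx, flagged in enumerate(list(word_flags) + [False]):
        if start is not None and (not flagged or idx in breaks_before):
            if idx - start >= len_phrase:
                phrases.append((start, idx - 1))
            start = None
        if flagged and start is None:
            start = idx
    return phrases
```

With `len_phrase` set to 3, the flags `[True, True, True, False, True]` produce `[(0, 2)]`; the lone trailing word stays below the threshold and keeps its word-level marking.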

If {emph}word is not defined and singleletter{emph} is defined then any marked sequence of length one is set to use the singleletter{emph}.

If {emph}word and singleletter{emph} are not defined, then the markings are set to use firstletter{emph} and lastletter{emph} accordingly if they are defined.

Capitalization has one additional check: a non-letter character that is not defined in capsmodechars, or a character with word_reset set in the typebuf, causes a word reset.
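As a rough illustration of the word-reset check, the following Python sketch returns the positions that would trigger a reset. The function name `caps_word_resets` and the per-character boolean list standing in for the word_reset bit of the typebuf are assumptions made for this example, not liblouis data structures.

```python
def caps_word_resets(text, caps_mode_chars, word_reset_flags=None):
    """Return the indices in text that cause a capitalization word reset.

    caps_mode_chars: characters declared with capsmodechars.
    word_reset_flags: one boolean per character, standing in for the
    word_reset bit in the typebuf (hypothetical representation).
    """
    if word_reset_flags is None:
        word_reset_flags = [False] * len(text)
    resets = []
    for i, ch in enumerate(text):
        # word_reset applies regardless of the character's own attributes.
        if word_reset_flags[i] or (not ch.isalpha() and ch not in caps_mode_chars):
            resets.append(i)
    return resets
```

For instance, with `-` listed in capsmodechars, `"ABC-DEF"` has no reset positions, whereas `"ABC/DEF"` resets at index 3.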

Sequence Delimiters

These opcodes define what counts as a word or letter sequence. When "seqdelimiter" is used, the opcodes "word" and "contraction" use these definitions to determine whether they apply. If "seqdelimiter" is not used, the original implementation is used.

seqdelimiter [chars]

Sets a character attribute on all the listed characters such that they designate a valid beginning and ending of a letter sequence, which is used to determine when a letter sequence is "standing alone". Spaces do not need to be listed, as they are automatically delimiters. For example, in UEB (section 2.6.1, page 15), any hyphen or dash counts as a delimiter. This opcode can be used several times, but the characters must have already been defined.

seqbeforechars [chars]

Sets a character attribute on all listed characters such that they may appear between a beginning sequence delimiter and the letter sequence itself. For example, in UEB (2.6.2, page 15), opening parenthesis and opening quotations and such are allowed. This opcode can be used several times, but the characters must have already been defined.

seqafterchars [chars]

Same as seqbeforechars except the characters can be between an ending sequence delimiter and the letter sequence itself. For example, in UEB (2.6.3, page 16), closing parenthesis and closing quotations and such are allowed.

seqafterpattern [string]

Specifies that a specific string of characters can be between an ending sequence delimiter and the letter sequence itself. For example, in UEB (section 2.6.4, page 18), 'd, 's, 'll, 've, etc. can follow a letter sequence provided the overall sequence is "standing alone". This opcode can be used multiple times, once per pattern.
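How the four sequence opcodes could combine into a "standing alone" test is sketched below in Python. This is a simplified model under stated assumptions, not the liblouis algorithm: `stands_alone` and its parameters are hypothetical, the start and end of the text are treated as delimiters, at most one after-pattern is consumed, and the pattern is expected directly after the letters and before any seqafterchars characters.

```python
def stands_alone(text, start, end, delimiters="", before_chars="",
                 after_chars="", after_patterns=()):
    """Check whether the letter sequence text[start:end] is standing alone.

    delimiters     ~ characters declared with seqdelimiter
    before_chars   ~ characters declared with seqbeforechars
    after_chars    ~ characters declared with seqafterchars
    after_patterns ~ strings declared with seqafterpattern
    """
    # Walk left over characters allowed between the delimiter and the letters.
    i = start
    while i > 0 and text[i - 1] in before_chars:
        i -= 1
    left_ok = i == 0 or text[i - 1] == " " or text[i - 1] in delimiters

    # Walk right: an optional pattern, then any allowed trailing characters.
    j = end
    for pattern in after_patterns:
        if text.startswith(pattern, j):
            j += len(pattern)
            break
    while j < len(text) and text[j] in after_chars:
        j += 1
    right_ok = j == len(text) or text[j] == " " or text[j] in delimiters

    return left_ok and right_ok
```

With `before_chars="("`, `after_chars=")"`, and `after_patterns=("'s",)`, the sequence `can` in `"(can's) do"` (indices 1 to 4) is standing alone. In `"well-known"`, `well` is standing alone only if the hyphen is listed in `delimiters`.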

Misc

numericmodechars [chars]

Sets a character attribute on all the listed characters such that numeric mode starts when one of them is followed by a digit. If the character is followed by another character set by numericmodechars, checking continues until a digit or a character that is not set by numericmodechars is found.
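The lookahead described for numericmodechars can be sketched in a few lines of Python. The function `starts_numeric_mode` is a hypothetical helper written for this page, and it only covers the case where the current character is one of the listed characters; digits are limited to ASCII 0-9 as an assumption.

```python
DIGITS = "0123456789"

def starts_numeric_mode(text, pos, numeric_mode_chars):
    """Return True if the numericmodechars character at text[pos] starts numeric mode."""
    if pos >= len(text) or text[pos] not in numeric_mode_chars:
        return False
    i = pos + 1
    # Skip over any further characters set by numericmodechars.
    while i < len(text) and text[i] in numeric_mode_chars:
        i += 1
    # Numeric mode starts only if the lookahead ends on a digit.
    return i < len(text) and text[i] in DIGITS
```

Assuming `.` and `,` are listed, `starts_numeric_mode(".5", 0, ".,")` and `starts_numeric_mode("..5", 0, ".,")` are both true, while `starts_numeric_mode(".a", 0, ".,")` is false.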

numericnocontchars [chars]

Sets a character attribute on all the listed characters such that they will have the nocontractsign set when numeric mode ends on them. This covers the case where a letter a-j follows a digit and must be preceded by the nocontractsign (section 6.5.2, page 62).

nocontractsign [dots]

This symbol, if defined, is output when a "contraction" rule has been reached. It is also output when numeric mode ends on a character set by numericnocontchars.
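The interaction between numericnocontchars and nocontractsign can be illustrated with a deliberately simplified Python sketch. It treats numeric mode as a plain run of ASCII digits, which is an assumption (the real mode also involves numericmodechars), and `nocontract_positions` is a hypothetical name.

```python
DIGITS = "0123456789"

def nocontract_positions(text, numeric_nocont_chars="abcdefghij"):
    """Return indices where the nocontractsign would be needed.

    A position qualifies when the previous character is a digit and the
    current character is set by numericnocontchars.
    """
    return [
        i
        for i in range(1, len(text))
        if text[i - 1] in DIGITS and text[i] in numeric_nocont_chars
    ]
```

In this model `"3b"` needs the sign at index 1, while `"3k"` needs none because `k` is not in the a-j range.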

Attributes

word_reset

When this bit is set on a character, that character will act as a word reset regardless of its attributes. This is only checked for capitalization.

passage_break

This bit is checked as passages are being resolved. Passages are not allowed to cross over any character with this bit set.