UEB Documentation

This page is outdated. For information on the new UEB opcodes see https://github.com/liblouis/liblouis/wiki/Emphasis-Opcodes

Table of Contents

New opcodes
New Attributes
New/changed opcodes and attributes

New opcodes

firstwordcaps
lastwordbeforecaps
lastwordaftercaps
lencapsphrase
firstlettercaps
lastlettercaps
singlelettercaps
capsword
capswordstop
firstwordital
lastworditalbefore
lastworditalafter
firstletterital
lastletterital
singleletterital
italword
italwordstop
lenitalphrase
firstwordbold
lastwordboldbefore
lastwordboldafter
firstletterbold
lastletterbold
singleletterbold
boldword
boldwordstop
lenboldphrase
firstwordunder
lastwordunderbefore
lastwordunderafter
firstletterunder
lastletterunder
singleletterunder
underword
underwordstop
lenunderphrase
singleletterscript
scriptword
scriptwordstop
firstletterscript
lastletterscript
firstwordscript
lastwordscriptbefore
lastwordscriptafter
lenscriptphrase
singlelettertrans1
trans1word
trans1wordstop
firstlettertrans1
lastlettertrans1
firstwordtrans1
lastwordtrans1before
lastwordtrans1after
lentrans1phrase
singlelettertrans2
trans2word
trans2wordstop
firstlettertrans2
lastlettertrans2
firstwordtrans2
lastwordtrans2before
lastwordtrans2after
lentrans2phrase
singlelettertrans3
trans3word
trans3wordstop
firstlettertrans3
lastlettertrans3
firstwordtrans3
lastwordtrans3before
lastwordtrans3after
lentrans3phrase
singlelettertrans4
trans4word
trans4wordstop
firstlettertrans4
lastlettertrans4
firstwordtrans4
lastwordtrans4before
lastwordtrans4after
lentrans4phrase
singlelettertrans5
trans5word
trans5wordstop
firstlettertrans5
lastlettertrans5
firstwordtrans5
lastwordtrans5before
lastwordtrans5after
lentrans5phrase
singlelettertransnote
transnoteword
transnotewordstop
firstlettertransnote
lastlettertransnote
firstwordtransnote
lastwordtransnotebefore
lastwordtransnoteafter
lentransnotephrase
seqdelimiter
seqbeforechars
seqafterchars
seqafterpattern
nocontractsign
numericmodechars
numericnocontchars
capsmodechars

New Attributes

word_reset
passage_break

New/changed opcodes and attributes

Capital/emphasis indicators

emph = {caps, ital, bold, under, script, trans1, trans2, trans3, trans4, trans5}

firstword{emph}
lastwordbefore{emph}
lastwordafter{emph}
len{emph}phrase
firstletter{emph}
lastletter{emph}
singleletter{emph}
{emph}word
{emph}wordstop

The first step in choosing indicators is to find and mark the beginning and end of each consecutive sequence that has the emphasis set. Capitalization is detected from the text itself: a sequence starts at an uppercase letter and continues, even past spaces, until a lowercase letter is found. If no space was encountered, the end is marked at that lowercase letter. If a space was encountered, the end is marked at the last space following a non-space sequence that contained an uppercase letter.
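The capitalization scan can be sketched in Python as follows. This is only one reading of the paragraph above, not the liblouis implementation: the function name `mark_caps_runs` is hypothetical, runs are returned as (start, end) index pairs with the end exclusive, and treating the end of the text as the end of a run is an assumption the text does not state.

```python
def mark_caps_runs(text):
    """Return (start, end) pairs for capitalized runs; end is exclusive."""
    runs = []
    i, n = 0, len(text)
    while i < n:
        if not text[i].isupper():
            i += 1
            continue
        start = i
        seen_space = False
        chunk_has_upper = False   # does the current non-space chunk hold a capital?
        end_space = None          # last space that follows a chunk with a capital
        j = i
        # Continue, even past spaces, until a lowercase letter is found.
        while j < n and not text[j].islower():
            if text[j].isupper():
                chunk_has_upper = True
            elif text[j] == " ":
                seen_space = True
                if chunk_has_upper:
                    end_space = j
                chunk_has_upper = False
            j += 1
        if j == n:
            end = n               # assumption: the text end closes the run
        elif seen_space:
            end = end_space       # last space after a chunk containing a capital
        else:
            end = j               # the lowercase letter itself
        runs.append((start, end))
        i = end
    return runs
```

Under this reading, "Hello" yields one run covering only the "H", while "HELLO WORLD foo" yields one run that ends at the space after "WORLD".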

If {emph}word is defined, each character sequence between spaces is checked for a beginning or ending mark from the previous step. A sequence that contains one is marked to use {emph}word and, if defined, {emph}wordstop. A sequence that lies entirely between the previous markings is also marked as a whole word. In addition, if singleletter{emph} is defined, any marked sequence of length one is set to use singleletter{emph}.

If {emph}word is defined and len{emph}phrase is not zero, then any run of consecutive whole-word markings whose length is greater than or equal to len{emph}phrase, and which does not contain a passage break, is marked to use the firstletter{emph} and lastletter{emph} symbols. If lastletter{emph} is not defined and lastwordbefore{emph} or lastwordafter{emph} is defined, the run is set to use those instead.

If {emph}word is not defined and singleletter{emph} is defined, then any marked sequence of length one is set to use singleletter{emph}.

If {emph}word and singleletter{emph} are not defined, then the markings are set to use firstletter{emph} and lastletter{emph} accordingly if they are defined.
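The fallback order among {emph}word, singleletter{emph}, and firstletter{emph}/lastletter{emph} can be summarized with a small Python sketch. It is illustrative only: the function `choose_indicators` and the role names (opcode names with the {emph} part dropped) are hypothetical, and the phrase rule based on len{emph}phrase is deliberately left out. Where the text above is silent, the sketch returns nothing and says so in a comment.

```python
def choose_indicators(defined, seq_len):
    """Pick indicator roles for one marked sequence.

    `defined` is the set of roles the table defines, for example
    {"word", "wordstop", "singleletter"}; `seq_len` is the sequence length.
    """
    if "word" in defined:
        # A length-one sequence prefers singleletter when it is defined.
        if seq_len == 1 and "singleletter" in defined:
            return ["singleletter"]
        roles = ["word"]
        if "wordstop" in defined:
            roles.append("wordstop")
        return roles
    if "singleletter" in defined:
        if seq_len == 1:
            return ["singleletter"]
        # The text does not say what happens for longer sequences here.
        return []
    # Neither word nor singleletter: fall back to the letter indicators.
    return [r for r in ("firstletter", "lastletter") if r in defined]
```

For instance, a table defining only the word and wordstop roles would mark a four-character sequence with both, while a table defining only the letter roles would use firstletter and lastletter.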

Capitalization has one additional check. A non-letter character that is not defined in capsmodechars, or a word_reset set in the typebuf, causes a word reset.
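As a minimal Python sketch of this check, assuming the capsmodechars set is passed in as a string and the word_reset bit as a boolean (both parameter names and the function name are hypothetical, and the default hyphen is only an example value):

```python
def causes_caps_word_reset(ch, caps_mode_chars="-", word_reset_bit=False):
    """Return True if `ch` resets the capitalized word."""
    if word_reset_bit:
        # word_reset applies regardless of the character's attributes.
        return True
    # A non-letter resets unless it is listed in capsmodechars.
    return not ch.isalpha() and ch not in caps_mode_chars
```

With these example values, a hyphen inside a capitalized word does not reset it, but a digit or a slash does.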

Sequence Delimiters

These opcodes define what counts as a word or letter sequence. When "seqdelimiter" is used, the opcodes "word" and "contraction" use these definitions to determine whether they apply. If "seqdelimiter" is not used, the original implementation is used.

seqdelimiter [chars]

Sets a character attribute on all the listed characters such that they designate a valid beginning and ending of a letter sequence, which is used to determine when a letter sequence is "standing alone". Spaces do not need to be listed, as they are automatically delimiters. For example, in UEB (section 2.6.1, page 15), any hyphen or dash counts as a delimiter. This opcode can be used several times, but the characters must have already been defined.

seqbeforechars [chars]

Sets a character attribute on all listed characters such that they may appear between a beginning sequence delimiter and the letter sequence itself. For example, in UEB (section 2.6.2, page 15), opening parentheses, opening quotation marks, and similar characters are allowed. This opcode can be used several times, but the characters must have already been defined.

seqafterchars [chars]

Same as seqbeforechars, except that the characters may appear between the letter sequence itself and an ending sequence delimiter. For example, in UEB (section 2.6.3, page 16), closing parentheses, closing quotation marks, and similar characters are allowed.

seqafterpattern [string]

Specifies that a specific string of characters may appear between the letter sequence itself and an ending sequence delimiter. For example, in UEB (section 2.6.4, page 18), 'd, 's, 'll, 've, etc. can follow a letter sequence provided the overall sequence is "standing alone". This opcode can be used multiple times, once per pattern.
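How the four sequence opcodes might combine into a "standing alone" test can be sketched in Python. This is an illustration under stated assumptions, not the liblouis algorithm: the function `stands_alone` and its default character sets are hypothetical, the start and end of the text are treated like delimiters, and an after-pattern is assumed to come directly after the letters and before any after-characters.

```python
def stands_alone(text, start, end,
                 delimiters="-",
                 before_chars="(\"",
                 after_chars=")\"",
                 after_patterns=("'s", "'d", "'ll", "'ve")):
    """Check whether text[start:end] is a letter sequence standing alone."""
    # Left side: skip seqbeforechars, then require a delimiter or space.
    i = start
    while i > 0 and text[i - 1] in before_chars:
        i -= 1
    left_ok = i == 0 or text[i - 1].isspace() or text[i - 1] in delimiters

    # Right side: allow one seqafterpattern, then any seqafterchars.
    j = end
    for pattern in after_patterns:
        if text.startswith(pattern, j):
            j += len(pattern)
            break
    while j < len(text) and text[j] in after_chars:
        j += 1
    right_ok = j == len(text) or text[j].isspace() or text[j] in delimiters

    return left_ok and right_ok
```

For example, `stands_alone('("but")', 2, 5)` is true because only before- and after-characters surround the letters, while `stands_alone("butter", 0, 3)` is false because more letters follow.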

Misc

numericmodechars [chars]

Sets a character attribute on all the listed characters such that numeric mode starts when one of them is followed by a digit. If it is followed by another character set by numericmodechars, checking continues until a digit or a non-digit is found.
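The look-ahead described here can be sketched in Python. The function name `starts_numeric_mode` and the example character set are assumptions made for illustration; only ASCII digits are treated as digits.

```python
DIGITS = "0123456789"

def starts_numeric_mode(text, i, numeric_mode_chars=".,"):
    """Return True if the character at index i should start numeric mode."""
    if text[i] not in numeric_mode_chars:
        return False
    j = i + 1
    # Keep checking across further numericmodechars characters.
    while j < len(text) and text[j] in numeric_mode_chars:
        j += 1
    # Numeric mode starts only if the scan stopped on a digit.
    return j < len(text) and text[j] in DIGITS
```

With these example characters, the period in ".5" starts numeric mode, and so does the first period in "..5", but the period in ".a" does not.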

numericnocontchars [chars]

Sets a character attribute on all the listed characters such that they will have the nocontractsign set when numeric mode ends. This is used when a letter a-j follows a digit: it must have the nocontractsign set (section 6.5.2, page 62).

nocontractsign [dots]

This symbol, if defined, is output when a "contraction" rule has been reached. It is also output when numeric mode ends on a character set by numericnocontchars.
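The numeric-mode case can be sketched in Python as follows. This is a deliberately simplified illustration: numeric mode is approximated as "the previous character is an ASCII digit", and the function name `nocontract_positions` and the default letter set a-j are assumptions taken from the example in the numericnocontchars description.

```python
DIGITS = "0123456789"

def nocontract_positions(text, numeric_nocont_chars="abcdefghij"):
    """Return indexes where the nocontractsign would precede a character."""
    positions = []
    for i in range(1, len(text)):
        # Numeric mode ends on a character listed in numericnocontchars.
        if text[i - 1] in DIGITS and text[i] in numeric_nocont_chars:
            positions.append(i)
    return positions
```

In "3a 4k 5", only the "a" is flagged, because "k" is outside the a-j set and the final digit is followed by nothing.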

Attributes

word_reset

When this bit is set on a character, that character will act as a word reset regardless of its attributes. This is only checked for capitalization.

passage_break

This bit is checked as passages are being resolved. Passages are not allowed to cross any character with this bit set: a passage ends at such a character and cannot continue past it.
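One way to picture this is a Python sketch that splits a candidate passage at every position carrying the passage_break bit. The function `split_at_passage_breaks` is hypothetical, ranges are (start, end) pairs with the end exclusive, and excluding the break character itself from both pieces is an assumption.

```python
def split_at_passage_breaks(start, end, break_positions):
    """Split the range [start, end) at each passage_break position."""
    pieces = []
    current = start
    for pos in sorted(break_positions):
        if start <= pos < end:
            if pos > current:
                pieces.append((current, pos))
            # Assumption: the break character belongs to neither piece.
            current = pos + 1
    if current < end:
        pieces.append((current, end))
    return pieces
```

A candidate passage covering indexes 0 to 9 with a break at index 4 would become two pieces, (0, 4) and (5, 10).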
