\input texinfo
@c %**start of header
@setfilename liblouis.info
@documentencoding UTF-8
@include version.texi
@settitle Liblouis User's and Programmer's Manual
@dircategory Misc
@direntry
* Liblouis: (liblouis). A braille translator and back-translator
@end direntry
@finalout
@c Macro definitions
@defindex opcode
@c Opcode.
@macro opcode{name, args}
@opcodeindex \name\
@anchor{\name\ opcode}
@item \name\ \args\
@end macro
@macro opcoderef{name}
@code{\name\} opcode (@pxref{\name\ opcode,\name\,@code{\name\}})
@end macro
@c Opcode.
@macro deprecatedopcode{name, args, replacement}
@opcodeindex \name\
@anchor{\name\ opcode}
@item \name\ \args\
This opcode is deprecated. Use the @opcoderef{\replacement\} instead.
@end macro
@copying
This manual is for liblouis (version @value{VERSION}, @value{UPDATED}),
a Braille Translation and Back-Translation Library derived from the
Linux screen reader @acronym{BRLTTY}.
@vskip 10pt

@noindent
Copyright @copyright{} 1999-2006 by the BRLTTY Team.

@noindent
Copyright @copyright{} 2004-2007 ViewPlus Technologies, Inc.
@uref{www.viewplus.com}.

@noindent
Copyright @copyright{} 2007, 2009 Abilitiessoft, Inc.
@uref{www.abilitiessoft.org}.

@noindent
Copyright @copyright{} 2014, 2016 Swiss Library for the Blind, Visually
Impaired and Print Disabled. @uref{www.sbs.ch}.

@vskip 10pt
@quotation
This file is free software; you can redistribute it and/or modify it
under the terms of the GNU Lesser (or library) General Public License
(LGPL) as published by the Free Software Foundation; either version 3,
or (at your option) any later version.

This file is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Lesser (or Library) General Public License LGPL for more details.

You should have received a copy of the GNU Lesser (or Library) General
Public License (LGPL) along with this program; see the file COPYING.
If not, write to the Free Software Foundation, 51 Franklin Street,
Fifth Floor, Boston, MA 02110-1301, USA.
@end quotation
@end copying
@titlepage
@title Liblouis User's and Programmer's Manual
@subtitle for version @value{VERSION}, @value{UPDATED}
@author by John J. Boyer
@c The following two commands start the copyright page.
@page
@vskip 0pt plus 1filll
@insertcopying
@end titlepage
@c Output the table of contents at the beginning.
@contents
@ifnottex
@node Top
@top Liblouis User's and Programmer's Manual
@insertcopying
@end ifnottex
@menu
* Introduction::
* How to Write Translation Tables::
* Notes on Back-Translation::
* Testing Translation Tables interactively::
* Automated Testing of Translation Tables::
* Programming with liblouis::
* Concept Index::
* Opcode Index::
* Function Index::
* Program Index::
@detailmenu
--- The Detailed Node Listing ---

How to Write Translation Tables

* Overview::
* Hyphenation Tables::
* Character-Definition Opcodes::
* Braille Indicator Opcodes::
* Emphasis Opcodes::
* Special Symbol Opcodes::
* Special Processing Opcodes::
* Translation Opcodes::
* Character-Class Opcodes::
* Swap Opcodes::
* The Context and Multipass Opcodes::
* The correct Opcode::
* Miscellaneous Opcodes::

Emphasis Opcodes

* Emphasis class::
* Contexts::
* Fallback behavior::
* Computer braille::

Contexts

* None::
* Letter::
* Word::
* Phrase::
* Symbol::

Testing Translation Tables interactively

* lou_debug::
* lou_trace::
* lou_checktable::
* lou_allround::
* lou_translate (program)::
* lou_checkhyphens::

Automated Testing of Translation Tables

* YAML Tests::
* Doctests::

Programming with liblouis

* Overview (library)::
* Data structure of liblouis tables::
* lou_version::
* lou_translateString::
* lou_translate::
* lou_backTranslateString::
* lou_backTranslate::
* lou_hyphenate::
* lou_compileString::
* lou_getTypeformForEmphClass::
* lou_dotsToChar::
* lou_charToDots::
* lou_registerLogCallback::
* lou_setLogLevel::
* lou_logFile::
* lou_logPrint::
* lou_logEnd::
* lou_setDataPath::
* lou_getDataPath::
* lou_getTable::
* lou_checkTable::
* lou_readCharFromFile::
* lou_free::
* lou_charSize::
* Python bindings::
@end detailmenu
@end menu
@node Introduction
@chapter Introduction
Liblouis is an open-source braille translator and back-translator
derived from the translation routines in the BRLTTY screen reader for
Linux. It has, however, gone far beyond these routines. It is named in
honor of Louis Braille. On Linux and Mac OS X it is a shared library,
and on Windows it is a DLL. For installation instructions see the
README file. Please report bugs and oddities to the mailing list,
@email{liblouis-liblouisxml@@freelists.org}.

This documentation is derived from Chapter 7 of the BRLTTY manual, but
it has been extensively rewritten to cover new features.
@section Who is this manual for
This manual has two main audiences: people who want to write or
improve a braille translation table, and people who want to use the
braille translation library in their own programs. This manual is
probably not for people who are looking for turnkey braille
translation software.
@section How to read this manual
If you are mostly interested in writing braille translation tables,
focus on @ref{How to Write Translation Tables}. You might also want to
look at @ref{Notes on Back-Translation} if you are interested in
back-translation. Finally, @ref{Testing Translation Tables
interactively} and @ref{Automated Testing of Translation Tables} show
how your braille translation tables can be tested interactively and in
an automated fashion.

If you want to use the braille translation library in your own program,
or you are interested in enhancing the braille translation library
itself, then you will want to look at @ref{Programming with liblouis}.
@node How to Write Translation Tables
@chapter How to Write Translation Tables
For many languages there is already a translation table, so before
creating a new table, look at the existing tables and modify one of
them as needed.

Typically, a braille translation table consists of several parts.
First comes a header, which states what the table is for and gives
license information, followed by includes of any other tables your
table needs. After this come the various translation rules, and lastly
special rules that handle particular situations.

@cindex opcode
A translation rule is composed of at least three parts: the opcode
(translation command), the character(s) and the braille dots. An
opcode is a command you give to a machine or a program to perform
something on your behalf. In liblouis, an opcode tells it which rule
to use when translating characters into braille. The operands can be
thought of as the parameters of the translation rule; they consist of
two parts: the character or word to be translated and the braille
dots.
For example, suppose you always want the word @samp{world} to be
rendered as braille dots @samp{456} followed by the letter @samp{w}.
Then you'd write:
@example
always world 456-2456
@end example
The word @code{always} is an opcode which tells liblouis to always
honor this translation, that is to say when the word @samp{world} (an
operand) is encountered, always show braille dots @samp{456} followed
by the letter @samp{w} (@samp{2456}).
When you write a braille table for any language, we recommend working
from an official standard and having a device or a program with which
you can test your work.
@menu
* Overview::
* Hyphenation Tables::
* Character-Definition Opcodes::
* Braille Indicator Opcodes::
* Emphasis Opcodes::
* Special Symbol Opcodes::
* Special Processing Opcodes::
* Translation Opcodes::
* Character-Class Opcodes::
* Swap Opcodes::
* The Context and Multipass Opcodes::
* The correct Opcode::
* Miscellaneous Opcodes::
@end menu
@node Overview
@section Overview
Many translation (contraction) tables have already been written. They
are included in the distribution in the tables directory and can be
studied as part of the documentation. Some of the more helpful (and
normative) ones are listed in the following table:
@table @file
@item chardefs.cti
Character definitions for U.S. tables
@item compress.ctb
Remove excessive whitespace
@item en-us-g1.ctb
Uncontracted American English
@item en-us-g2.ctb
Contracted or Grade 2 American English
@item en-us-brf.dis
Make liblouis output conform to BRF standard
@item en-us-comp8.ctb
8-dot computer braille for use in coding examples
@item en-us-comp6.ctb
6-dot computer braille
@item nemeth.ctb
Nemeth Code translation for use with liblouisutdml
@item nemeth_edit.ctb
Fixes errors at the boundaries of math and text
@end table
The names used for files containing translation tables are completely
arbitrary. They are not interpreted in any way by the translator.
Contraction tables may be 8-bit ASCII files, UTF-8, 16-bit big-endian
Unicode files or 16-bit little-endian Unicode files. Blank lines are
ignored. Any leading and trailing whitespace (any number of blanks
and/or tabs) is ignored. Lines which begin with a number sign or hatch
mark (@samp{#}) are ignored, i.e. they are comments. If the number
sign is not the first non-blank character in the line, it is treated
as an ordinary character. If the first non-blank character is
less-than (@samp{<}) the line is also treated as a comment. This makes
it possible to mark up tables as XHTML documents. Lines which are not
blank or comments define table entries. The general format of a table
entry is:
@example
opcode operands comments
@end example
Table entries may not be split between lines. The opcode is a mnemonic
that specifies what the entry does. The operands may be character
sequences, braille dot patterns or occasionally something else. They
are described for each opcode (@pxref{Opcode Index}). With some
exceptions, opcodes expect a certain number of operands. Any text on
the line after the last operand is ignored, and may be a comment. A
few opcodes accept a variable number of operands. In this case a
number sign (@samp{#}) begins a comment unless it is preceded by a
backslash (@samp{\}).

Here are some examples of table entries.
@example
# This is a comment.
always world 456-2456 A word and the dot pattern of its contraction
@end example
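The rules above for blank lines, comments and the general entry format
can be illustrated with a short Python sketch. It is not liblouis's own
table parser: the function @code{parse_entry} is hypothetical, it
assumes the caller already knows how many operands the opcode takes,
and it ignores the escaped-@samp{#} rule for opcodes with a variable
number of operands.

@example
def parse_entry(line, n_operands):
    """Split a table line into (opcode, operands, comment).

    Returns None for blank lines and comment lines.  n_operands is
    the number of operands the opcode expects, which this sketch
    takes as given.
    """
    stripped = line.strip()
    # Blank lines, '#' comments and '<' markup lines are ignored.
    if not stripped or stripped[0] in "#<":
        return None
    # Split off the opcode and its operands; the remainder is a comment.
    fields = stripped.split(None, n_operands + 1)
    if len(fields) < n_operands + 1:
        raise ValueError("too few operands: " + repr(line))
    opcode = fields[0]
    operands = fields[1:n_operands + 1]
    comment = fields[n_operands + 1] if len(fields) > n_operands + 1 else ""
    return opcode, operands, comment

print(parse_entry("always world 456-2456 A word and its contraction", 2))
# -> ('always', ['world', '456-2456'], 'A word and its contraction')
@end example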
Most opcodes have both a ``characters'' operand and a ``dots'' operand,
though some have only one and a few have other types.

@cindex characters operand
The characters operand consists of any combination of characters and
escape sequences, preceded and followed by whitespace. Escape
sequences are used to represent difficult characters. They begin with
a backslash (@samp{\}). They are:
@table @kbd
@item \\
backslash
@item \f
form feed
@item \n
new line
@item \r
carriage return
@item \s
blank (space)
@item \t
horizontal tab
@item \v
vertical tab
@item \e
"escape" character (hex 1b, dec 27)
@item \xhhhh
4-digit hexadecimal value of a character
@end table
If liblouis has been compiled for 32-bit Unicode, the following
escape sequences are also recognized.
@table @kbd
@item \yhhhhh
5-digit (20 bit) character
@item \zhhhhhhhh
Full 32-bit value.
@end table
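To illustrate the escape sequences listed above, the following Python
sketch expands them in a characters operand. The function
@code{decode_characters} is hypothetical and does not show how liblouis
itself reads tables. It accepts @code{\y} and @code{\z}
unconditionally, whereas liblouis recognizes them only when compiled
for 32-bit Unicode. Python's @code{chr} also rejects values above hex
10FFFF.

@example
# Single-letter escapes and the characters they stand for.
SIMPLE_ESCAPES = dict([
    ("\\", "\\"), ("f", "\f"), ("n", "\n"), ("r", "\r"),
    ("s", " "), ("t", "\t"), ("v", "\v"), ("e", "\x1b"),
])
# Number of hex digits that follow \x, \y and \z.
HEX_ESCAPES = dict([("x", 4), ("y", 5), ("z", 8)])

def decode_characters(operand):
    """Expand escape sequences in a characters operand."""
    out = []
    i = 0
    while i < len(operand):
        ch = operand[i]
        # Ordinary character, or a lone backslash at the very end.
        if ch != "\\" or i + 1 == len(operand):
            out.append(ch)
            i += 1
            continue
        code = operand[i + 1]
        if code in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[code])
            i += 2
        elif code in HEX_ESCAPES:
            width = HEX_ESCAPES[code]
            digits = operand[i + 2:i + 2 + width]
            if len(digits) != width:
                raise ValueError("truncated escape in " + repr(operand))
            out.append(chr(int(digits, 16)))
            i += 2 + width
        else:
            raise ValueError("unknown escape \\" + code)
    return "".join(out)
@end example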
@cindex dots operand
The dots operand is a braille dot pattern. The real braille dots, 1
through 8, must be specified with their standard numbers.

@cindex virtual dots
@anchor{virtual dots}
liblouis recognizes @emph{virtual dots}, which are used for special
purposes, such as distinguishing accent marks. There are seven virtual
dots. They are specified by the number 9 and the letters @samp{a}
through @samp{f}.

@cindex multi-cell dot pattern
For a multi-cell dot pattern, the cell specifications must be
separated from one another by a dash (@samp{-}). For example, the
contraction for the English word @samp{lord} (the letter @samp{l}
preceded by dot 5) would be specified as @samp{5-123}. A space may be
specified with the special dot number 0.
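
To make the dot notation concrete, the following Python sketch converts
a dots operand into characters from the Unicode braille patterns block
(U+2800 to U+28FF), where dot @var{n} sets bit @var{n}-1. This mapping
is a Unicode convention assumed here for illustration, not liblouis's
internal representation. Because virtual dots have no Unicode
equivalent, the hypothetical @code{dots_to_unicode} function rejects
them.

@example
# Bit for each real dot in the Unicode braille block (U+2800-U+28FF).
DOT_BITS = dict([("1", 0x01), ("2", 0x02), ("3", 0x04), ("4", 0x08),
                 ("5", 0x10), ("6", 0x20), ("7", 0x40), ("8", 0x80)])
VIRTUAL_DOTS = "9abcdef"

def dots_to_unicode(pattern):
    """Convert a dots operand such as '5-123' to Unicode braille."""
    cells = []
    for cell in pattern.split("-"):
        if cell == "0":
            cells.append(chr(0x2800))  # dot 0: a blank cell (space)
            continue
        if not cell:
            raise ValueError("empty cell in " + repr(pattern))
        bits = 0
        for dot in cell:
            if dot in VIRTUAL_DOTS:
                raise ValueError("virtual dot %s has no Unicode form" % dot)
            if dot not in DOT_BITS:
                raise ValueError("invalid dot %r" % dot)
            bits |= DOT_BITS[dot]
        cells.append(chr(0x2800 + bits))
    return "".join(cells)
@end example
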

An opcode which is helpful in writing translation tables is
@code{include}. Its format is:
@example
include filename
@end example
It reads the file indicated by @code{filename} and incorporates its
entries into the table. Included files can include other files, which
can in turn include other files, and so on. For an example, see which
files are included by the entry @code{include en-us-g1.ctb} in the
table @file{en-us-g2.ctb}. If the included file is not in the same
directory as the main table, use a full path name for @code{filename}.
Tables can also be
specified in a table list, in which the table names are separated by
commas and given as a single table name in calls to the translation
functions.
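
The way includes and table lists combine can be sketched in Python as
follows. The function @code{expand_tables} and its @code{read_lines}
argument (a caller-supplied function returning the lines of a named
table) are hypothetical, since liblouis locates and reads the files
itself. The sketch splices included entries in place. As a design
choice of its own, it also rejects include cycles.

@example
def expand_tables(table_list, read_lines, _active=None):
    """Return the entries of a comma-separated table list, with each
    'include filename' line replaced by the entries of that file.

    read_lines(name) -> list of lines is supplied by the caller.
    """
    active = _active if _active is not None else []
    entries = []
    for name in table_list.split(","):
        name = name.strip()
        # A file that is already being expanded would recurse forever.
        if name in active:
            raise ValueError("include cycle: " + " -> ".join(active + [name]))
        active.append(name)
        for line in read_lines(name):
            words = line.split()
            if len(words) >= 2 and words[0] == "include":
                entries.extend(expand_tables(words[1], read_lines, active))
            else:
                entries.append(line)
        active.pop()
    return entries
@end example
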

The order of the various types of opcodes or table entries is
important. Character-definition opcodes should come first. However, if
the optional @opcoderef{display} is used it should precede
character-definition opcodes. Braille-indicator opcodes should come
next. Translation opcodes should follow. The @opcoderef{context} is a
translation opcode, even though it is considered along with the
multipass opcodes. These latter should follow the translation opcodes.
The @opcoderef{correct} can be used anywhere after the
character-definition opcodes, but it is probably a good idea to group
all @code{correct} opcodes together. The @opcoderef{include} can be
used anywhere, but the order of entries in the combined table must
conform to the order given above. Within each type of opcode, the
order of entries is generally unimportant. Thus the translation
entries can be grouped alphabetically or in any other order that is
convenient. Hyphenation tables may be specified either with an
@code{include} opcode or as part of a table list. They should come after
everything else. Character-definition opcodes are necessary for
hyphenation tables to work.
@node Hyphenation Tables
@section Hyphenation Tables
Hyphenation tables are necessary to make opcodes such as the
@opcoderef{nocross} function properly. There are no opcodes for
hyphenation table entries because these tables have a special format.
Therefore, they cannot be specified as part of an ordinary table.
Rather, they must be included using the @opcoderef{include} or as part
of a table list. The liblouis hyphenation algorithm was adopted from the
one used by OpenOffice. Note that hyphenation tables must follow
character definitions and should preferably come last. For an example
of a hyphenation table, see @file{hyph_en_US.dic}.
@node Character-Definition Opcodes
@section Character-Definition Opcodes
These opcodes are needed to define attributes such as digit,
punctuation, letter, etc. for all characters and their dot patterns.
liblouis has no built-in character definitions, but such definitions
are essential to the operation of the @opcoderef{context}, the
@opcoderef{correct}, the multipass opcodes and the back-translator. If
the dot pattern is a single cell, it is used to define the mapping
between dot patterns and characters, unless a @opcoderef{display} for
that character-dot-pattern pair has been used previously. If only a
single-cell dot pattern has been given for a character, that dot
pattern is defined with the character's own attributes. If more than
one cell is given and some of them have not previously been defined as
single cells, the undefined cells are entered into the dots table with
the space attribute. This is done for backward compatibility with
old tables, but it may cause problems with the above opcodes or
back-translation. For this reason, every single-cell dot pattern
should be defined before it is used in a multi-cell character
representation. The best way to do this is to use the 8-dot computer
braille representation for the particular braille code. If a character
or dot pattern used in any rule, except those with the @code{display}
opcode, the @opcoderef{repeated} or the @opcoderef{replace}, is not
defined by one of the character-definition opcodes, liblouis will give
an error message and refuse to continue until the problem is fixed. If
the translator or back-translator encounters an undefined character in
its input it produces a succinct error indication in its output, and
the character is treated as a space.
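
The advice to define every single-cell dot pattern before using it in a
multi-cell representation can be checked mechanically. This Python
sketch uses a hypothetical @code{undefined_cells} function that takes
@code{(opcode, character, dots)} tuples in table order. It reports the
cells that would otherwise be entered into the dots table with the
space attribute. It ignores the @code{display} opcode and the
comma-separated operands of @code{uplow}.

@example
def undefined_cells(entries):
    """Return (character, cell) pairs for cells that appear in a
    multi-cell dot pattern before being defined as a single cell.

    entries: (opcode, character, dots) tuples in table order.
    """
    defined = set()
    problems = []
    for _opcode, character, dots in entries:
        cells = dots.split("-")
        if len(cells) == 1:
            # A single-cell pattern defines that cell.
            defined.add(cells[0])
            continue
        for cell in cells:
            if cell not in defined:
                problems.append((character, cell))
                # From now on the cell exists, but only as a space.
                defined.add(cell)
    return problems
@end example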
@table @code
@opcode{space, character dots}
Defines a character as a space and also defines the dot pattern as
such. For example:
@example
space \s 0 \s is the escape sequence for blank; 0 means no dots.
@end example
@opcode{punctuation, character dots}
Associates a punctuation mark in the particular language with a
braille representation and defines the character and dot pattern as
punctuation. For example:
@example
punctuation . 46 dot pattern for period in NAB computer braille
@end example
@opcode{digit, character dots}
Associates a digit with a dot pattern and defines the character as a
digit. For example:
@example
digit 0 356 NAB computer braille
@end example
@opcode{uplow, characters dots [@comma{}dots]}
The characters operand must be a pair of letters, of which the first
is uppercase and the second lowercase. The first dots suboperand
indicates the dot pattern for the upper-case letter. It may have more
than one cell. The second dots suboperand must be separated from the
first by a comma and is optional, as indicated by the square brackets.
If present, it indicates the dot pattern for the lower-case letter. It
may also have more than one cell. If the second dots suboperand is not
present the first is used for the lower-case letter as well as the
upper-case letter. This opcode is needed because not all languages
follow a consistent pattern in assigning Unicode codes to upper and
lower case letters. It should be used even for languages that do. The
distinction is important in the forward translator. For example:
@example
uplow Aa 17,1
@end example
@opcode{grouping, name characters dots @comma{}dots}
This opcode is used to indicate pairs of grouping symbols used in
processing mathematical expressions. These symbols are usually
generated by the MathML interpreter in liblouisutdml. They are used in
multipass opcodes. The name operand must contain only letters, but
they may be upper- or lower-case. The characters operand must contain
exactly two Unicode characters. The dots operand must contain exactly
two braille cells, separated by a comma. Note that grouping dot
patterns also need to be declared with the @opcoderef{exactdots}. The
characters may need to be declared with the @opcoderef{math}.
@example
grouping mrow \x0001\x0002 1e,2e
grouping mfrac \x0003\x0004 3e,4e
@end example
@opcode{letter, character dots}
Associates a letter in the language with a braille representation and
defines the character as a letter. This is intended for letters which
are neither uppercase nor lowercase.
@opcode{lowercase, character dots}
Associates a character with a dot pattern and defines the character as
a lowercase letter. Both the character and the dot pattern have the
attributes lowercase and letter.
@opcode{uppercase, character dots}
Associates a character with a dot pattern and defines the character as
an uppercase letter. Both the character and the dot pattern have the
attributes uppercase and letter. @code{lowercase} and @code{uppercase}
should be used when a letter has only one case. Otherwise use the
@opcoderef{uplow}.
@opcode{litdigit, digit dots}
Associates a digit with the dot pattern which should be used to
represent it in literary texts. For example:
@example
litdigit 0 245
litdigit 1 1
@end example
@opcode{sign, character dots}
Associates a character with a dot pattern and defines both as a sign.
This opcode should be used for things like at sign (@samp{@@}),
percent (@samp{%}), dollar sign (@samp{$}), etc. Do not use it to
define ordinary punctuation such as period and comma. For example:
@example
sign % 4-25-1234 literary percent sign
@end example
@opcode{math, character dots}
Associates a character and a dot pattern and defines them as a
mathematical symbol. It should be used for less than (@samp{<}),
greater than (@samp{>}), equals (@samp{=}), plus (@samp{+}), etc. For
example:
@example
math + 346 plus
@end example
@end table
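The way @code{uplow} splits its operands can be sketched in Python.
The hypothetical @code{expand_uplow} function shows only how the
characters and dots operands yield an uppercase and a lowercase
definition. It does not model the case pairing that @code{uplow}
records, and it does not handle escape sequences in the characters
operand.

@example
def expand_uplow(characters, dots):
    """Split an uplow entry into uppercase and lowercase definitions."""
    if len(characters) != 2:
        raise ValueError("uplow needs exactly two characters")
    upper, lower = characters
    upper_dots, sep, lower_dots = dots.partition(",")
    if not sep:
        # No second dots suboperand: the lowercase letter reuses the first.
        lower_dots = upper_dots
    return [("uppercase", upper, upper_dots),
            ("lowercase", lower, lower_dots)]

print(expand_uplow("Aa", "17,1"))
# -> [('uppercase', 'A', '17'), ('lowercase', 'a', '1')]
@end example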
@node Braille Indicator Opcodes
@section Braille Indicator Opcodes
Braille indicators are dot patterns which are inserted into the
braille text to indicate such things as capitalization, italic type,
computer braille, etc. The opcodes which define them are followed only
by a dot pattern, which may be one or more cells.
@table @code
@opcode{capsign, dots}
The dot pattern which indicates capitalization of a single letter. In
English, this is dot 6. For example:
@example
capsign 6
@end example
@opcode{begcaps, dots}
The dot pattern which begins a block of capital letters. For example:
@example
begcaps 6-6
@end example
@opcode{endcaps, dots}
The dot pattern which ends a block of capital letters within a word.
For example:
@example
endcaps 6-3
@end example
@opcode{letsign, dots}
This indicator is needed in Grade 2 to show that a single letter is
not a contraction. It is also used when an abbreviation happens to be
a sequence of letters that is the same as a contraction. For example:
@example
letsign 56
@end example
@opcode{noletsign, letters}
The letters in the operand will not be preceded by a letter sign.
More than one @code{noletsign} opcode can be used. This is equivalent
to a single entry containing all the letters. In addition, if a single
letter, such as @samp{a} in English, is defined as a @code{word}
(@pxref{word opcode,word,@code{word}}) or @code{largesign}
(@pxref{largesign opcode,largesign,@code{largesign}}), it will be
treated as though it had also been specified in a @code{noletsign}
entry.
@opcode{noletsignbefore, characters}
If any of the characters precedes a single letter without a space, a
letter sign is not used. By default the characters apostrophe
(@samp{'}) and period (@samp{.}) have this property. Use of a
@code{noletsignbefore} entry cancels the defaults. If more than one
@code{noletsignbefore} entry is used, the characters in all entries
are combined.
@opcode{noletsignafter, characters}
If any of the characters follows a single letter without a space, a
letter sign is not used. By default the characters apostrophe
(@samp{'}) and period (@samp{.}) have this property. Use of a
@code{noletsignafter} entry cancels the defaults. If more than one
@code{noletsignafter} entry is used, the characters in all entries are
combined.
@opcode{nocontractsign, dots}
The dots in this opcode are used to indicate a letter or a sequence of
letters that are not a contraction, e.g. @samp{CD}. The opcode is
similar to the @opcoderef{letsign}.
@c FIXME: In what way is the nocontractsign opcode different from the
@c letsign opcode, apart from apparently being a more focused version of
@c letsign?
@opcode{numsign, dots}
The translator inserts this indicator before numbers made up of digits
defined with the @opcoderef{litdigit} to show that they are a number
and not letters or some other symbols. For example:
@example
numsign 3456
@end example
@end table
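The interplay of @code{numsign} and @code{litdigit} can be illustrated
with the following Python sketch. It assumes the @code{numsign 3456}
entry shown above and uses the patterns of the letters @samp{a} through
@samp{j} as literary digits, consistent with the @code{litdigit 0 245}
and @code{litdigit 1 1} examples given earlier. For illustration it
also assumes that a space (dot 0) ends a number, so the next run of
digits gets its own number sign. The function @code{number_to_dots} is
hypothetical, and the real translator takes far more context into
account.

@example
# Literary digits: the patterns of the letters a-j (1 = a, ..., 0 = j).
LITDIGITS = dict([("1", "1"), ("2", "12"), ("3", "14"), ("4", "145"),
                  ("5", "15"), ("6", "124"), ("7", "1245"),
                  ("8", "125"), ("9", "24"), ("0", "245")])
NUMSIGN = "3456"

def number_to_dots(text, litdigits=LITDIGITS, numsign=NUMSIGN):
    """Render digits and spaces as a dash-separated dots string."""
    cells = []
    in_number = False
    for ch in text:
        if ch in litdigits:
            if not in_number:
                # Start of a run of digits: insert the number sign once.
                cells.append(numsign)
                in_number = True
            cells.append(litdigits[ch])
        elif ch == " ":
            # Assumption: a blank cell (dot 0) ends the number.
            cells.append("0")
            in_number = False
        else:
            raise ValueError("unsupported character %r" % ch)
    return "-".join(cells)

print(number_to_dots("10"))  # 3456-1-245
@end example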
@node Emphasis Opcodes
@section Emphasis Opcodes
In many braille systems, emphasis such as bold, italics or underline is
indicated using special dot patterns that mark its start and often
also its end. For some languages these braille indicators differ
depending on the context, i.e. there is a separate indicator for an
emphasized word and another one for an emphasized phrase. To
accommodate all these usage scenarios, liblouis provides a number of
opcodes for various contexts.

At the same time, some braille systems use different indicators for
different kinds of emphasis while others know only one kind of
emphasis. For that reason liblouis doesn't hard-code any emphasis;
instead, the table author defines which kinds of emphasis exist for a
specific language using the @opcoderef{emphclass}.
@menu
* Emphasis class::
* Contexts::
* Fallback behavior::
* Computer braille::
@end menu
@node Emphasis class
@subsection Emphasis class
The @code{emphclass} opcode defines the classes of emphasis that are
relevant for a particular language. An emphasis class has to be
declared for every kind of emphasis that needs special indicators.
@table @code
@opcode{emphclass, <emphasis class>}
Defines an emphasis class to be used later in other emphasis-related
opcodes in the table.
@example
emphclass italic
emphclass underline
emphclass bold
emphclass transnote
@end example
@end table
@node Contexts
@subsection Contexts
To understand the emphasis-handling capabilities of liblouis, we have
to look at the different contexts that are supported.
@menu
* None::
* Letter::
* Word::
* Phrase::
* Symbol::
@end menu
@node None
@subsubsection None
Some languages have no concept of contexts. Emphasis is always handled
the same way regardless of context: there is simply an indicator for
the beginning of emphasis and another one for its end.
@table @code
@opcode{begemph, <emphasis class> <dot pattern>}
Braille dot pattern to indicate the beginning of emphasis.
@example
begemph italic 46-3
@end example
@opcode{endemph, <emphasis class> <dot pattern>}
Braille dot pattern to indicate the end of emphasis.
@example
endemph italic 46-36
@end example
@end table
@node Letter
@subsubsection Letter
Some languages have special indicators for single letter emphasis.
@table @code
@opcode{emphletter, <emphasis class> <dot pattern>}
Braille dot pattern to indicate that the next character is emphasized.
@example
emphletter italic 46-25
@end example
@end table
@node Word
@subsubsection Word
Many languages have special indicators for emphasized words. Usually
they start at the beginning of the word and end implicitly, i.e.
without a closing indicator at the end of the word. There are also use
cases where the emphasis starts in the middle of the word and an
explicit closing indicator is required.
@table @code
@opcode{begemphword, <emphasis class> <dot pattern>}
Braille dot pattern to indicate the beginning of an emphasized word
or the beginning of emphasized characters within a word.
@example
begemphword underline 456-36
@end example
@opcode{endemphword, <emphasis class> <dot pattern>}
Generally, emphasis with word context ends when the word ends. However,
when an indicator is required to close a word emphasis, this opcode
defines the braille dot pattern that indicates the end of the word
emphasis.
@example
endemphword transnote 6-3
@end example
If emphasis ends in the middle of a word, the braille dot pattern
defined in this opcode is also used.
@end table
@node Phrase
@subsubsection Phrase
Many languages have a concept of a phrase where the emphasis is valid
for a number of words. The beginning of the phrase is indicated with a
braille dot pattern, and a closing indicator is put before or after the
last word of the phrase. To define how many words are considered a
phrase in your language, use the @opcoderef{lenemphphrase}.
@table @code
@opcode{begemphphrase, <emphasis class> <dot pattern>}
Braille dot pattern to indicate the beginning of a phrase.
@example
begemphphrase bold 456-46-46
@end example
@opcode{endemphphrase before, <emphasis class> before <dot pattern>}
Braille dot pattern to indicate the end of a phrase. The closing indicator
will be placed before the last word of the phrase.
@example
endemphphrase bold before 456-46
@end example
@opcode{endemphphrase after, <emphasis class> after <dot pattern>}
Braille dot pattern to indicate the end of a phrase. The closing
indicator will be placed after the last word of the phrase. If both
@code{endemphphrase <emphasis class> before} and @code{endemphphrase
<emphasis class> after} are defined an error will be signaled.
@example
endemphphrase underline after 6-3
@end example
@opcode{lenemphphrase, <emphasis class> <number>}
Defines how many words are required before a sequence of words is
considered a phrase.
@example
lenemphphrase underline 3
@end example
@end table
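How @code{lenemphphrase} decides between phrase and word indication
can be sketched in Python. The hypothetical @code{emphasize_words}
function takes one run of emphasized words and returns the words
interleaved with indicator dot patterns. Using per-word indicators for
runs too short to be a phrase, and omitting them when no word indicator
is defined, are assumptions of this sketch (@pxref{Fallback behavior}).

@example
def emphasize_words(words, len_phrase, beg_phrase, end_phrase,
                    end_before, beg_word=None):
    """Interleave one run of emphasized words with indicator patterns.

    end_before selects 'endemphphrase ... before' (True) or
    'endemphphrase ... after' (False); beg_word is the begemphword
    pattern, or None if the table defines none.
    """
    if words and len(words) >= len_phrase:
        # Long enough to count as a phrase: one opening indicator...
        tokens = [beg_phrase] + words[:-1]
        # ...and one closing indicator before or after the last word.
        if end_before:
            tokens += [end_phrase, words[-1]]
        else:
            tokens += [words[-1], end_phrase]
        return tokens
    # Too short for a phrase: per-word indicators, if any are defined.
    tokens = []
    for word in words:
        if beg_word is not None:
            tokens.append(beg_word)
        tokens.append(word)
    return tokens

print(emphasize_words(["one", "two", "three"], 3, "456-46-46", "456-46", True))
# -> ['456-46-46', 'one', 'two', '456-46', 'three']
@end example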
@node Symbol
@subsubsection Symbol
UEB has the concept of symbols that need special indication. When the
translator detects an emphasis sequence that must be indicated
according to the rules for a symbol, it uses the dots defined with the
@opcoderef{emphletter}. To indicate the end of the symbol, it uses the
dots defined with the @opcoderef{endemphword}.
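As an illustration, the UEB bold symbol indicator (dots 45-23) and the
UEB bold terminator (dots 45-3) could be defined as follows. This is a
sketch rather than an excerpt from a shipped table, and it assumes the
class is declared with @code{emphclass}.
@example
emphclass bold
emphletter bold 45-23
endemphword bold 45-3
@end example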
@node Fallback behavior
@subsection Fallback behavior
Many braille systems handle emphasis either with no context or with a
combination of the letter, word and phrase contexts. So if a table
defines any opcodes for the letter, word or phrase contexts, liblouis
signals an error for opcodes that define emphasis with no context. In
other words, unlike previous versions of liblouis, there is no
fallback behavior.
As a consequence, emphasis is indicated for a context only when the
table defines it. For example, if a table defines a braille dot pattern
for phrases but not for words, liblouis will not indicate emphasis on
words that are not part of a phrase.
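As a sketch of this behavior, consider a table whose only entries for
the @code{bold} class are the following phrase definitions. The dot
patterns come from the phrase examples above; the length of 3 is an
assumption. Because no word or letter opcodes are defined for the
class, a run of three or more bold words would be marked with the
phrase indicators, while one or two isolated bold words would not be
indicated at all.
@example
emphclass bold
begemphphrase bold 456-46-46
endemphphrase bold before 456-46
lenemphphrase bold 3
@end example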
@node Computer braille
@subsection Computer braille
For computer braille there are only two braille indicators: one for the
beginning and one for the end of a sequence of characters to be
rendered in computer braille. Such a sequence may also have other
emphasis. The
computer braille indicators are applied not only when computer braille
is indicated in the @code{typeform} parameter, but also when a
sequence of characters is determined to be computer braille because it
contains a subsequence defined by the @opcoderef{compbrl}.
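The two indicators are defined with the @code{begcomp} and
@code{endcomp} opcodes. In the sketch below, dots 456-346 and 456-156
are the begin and end indicators of the Computer Braille Code as used
in some English tables; they are given for illustration only. With the
@code{compbrl} entry, a whitespace-delimited block containing
@samp{www} would be rendered in computer braille between these
indicators.
@example
# placed before a sequence rendered in computer braille
begcomp 456-346
# placed after such a sequence
endcomp 456-156
compbrl www
@end example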
@node Special Symbol Opcodes
@section Special Symbol Opcodes
These opcodes define certain symbols, such as the decimal point, which
require special treatment.
@table @code
@opcode{decpoint, character dots}
This opcode defines the decimal point. The character operand must have
only one character. For example, in @file{en-us-g1.ctb} we have:
@example
decpoint . 46
@end example
@opcode{hyphen, character dots}
This opcode defines the hyphen, that is, the character used in
compound words such as @samp{have-nots}. The back-translator uses it
to determine the end of individual words.
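For example, English tables typically define the hyphen with dots 36:
@example
hyphen - 36
@end example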
@end table
@node Special Processing Opcodes
@section Special Processing Opcodes
These opcodes cause special processing to be carried out.
@table @code
@opcode{capsnocont,}
This opcode has no operands. If it is specified, words or parts of
words in all caps are not contracted. This is needed for languages
such as Norwegian.
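Because it takes no operands, the opcode appears on a line by itself:
@example
capsnocont
@end example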
@end table
@node Translation Opcodes
@section Translation Opcodes
These opcodes define the braille representations for character
sequences. Each of them defines an entry within the contraction table.
These entries may be defined in any order except, as noted below, when
they define alternate representations for the same character sequence.
Each of these opcodes specifies a condition under which the
translation is legal, and each also has a characters operand and a
dots operand. The text being translated is processed strictly from
left to right, character by character, with the most eligible entry
for each position being used. If there is more than one eligible entry
for a given position in the text, then the one with the longest
character string is used. If there is more than one eligible entry for
the same character string, then the one defined first is tested for
legality first. (This is the only case in which the order of the
entries makes a difference.)
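The following sketch, using English braille contractions, illustrates
both rules; it is not an excerpt from a shipped table. At the start of
@samp{these}, the entries for @samp{th} and @samp{the} are both
eligible, so the longer @samp{the} (dots 2346) is used. The two entries
for @samp{but} share the same character string, so the @code{word}
entry, which is defined first, is tested for legality first; it
applies only when @samp{but} stands alone as a word, and otherwise the
@code{always} entry is used.
@example
# longest match: "the" is preferred over "th"
always th 1456
always the 2346
# same string: the word entry is tested first
word but 12
always but 12-136-2345
@end example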
The characters operand is a sequence or string of characters preceded
and followed by whitespace. Each character can be entered in the
normal way, or it can be defined as a four-digit hexadecimal number
preceded by @samp{\x}.
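For example, the following entry uses the hexadecimal form for U+00E9
(LATIN SMALL LETTER E WITH ACUTE), which is written with dots 123456 in
French braille; it is equivalent to entering the character itself.
@example
always \x00e9 123456
@end example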
The dots operand defines the braille representation for the characters
operand. It may also be specified as an equals sign (@samp{=}). This
means that the default representation for each character
(@pxref{Character-Definition Opcodes}) within the sequence is to be
used. Note however that the @samp{=} shortcut for dot patterns is
deprecated. Dot patterns should be written out. Otherwise
back-translation may not be correct.
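To illustrate, the two entries below are alternatives that would give
the same forward translation, assuming the letters x, y and z are
defined with their usual dots (1346, 13456 and 1356). The sequence
@samp{xyz} is hypothetical, and only one of the entries would appear in
a real table.
@example
# deprecated: default representation of each character
always xyz =
# preferred: dot pattern written out
always xyz 1346-13456-1356
@end example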
In what follows the word @samp{characters} means a sequence of one or
more consecutive letters between spaces and/or punctuation marks.
@table @code
@opcode{noback, opcode ...}
This is an opcode prefix, that is to say, it modifies the operation of
the opcode that follows it on the same line. @code{noback} specifies
that no back-translation is to be done using this line.
@example
noback always ;\s; 0
@end example
@opcode{nofor, opcode ...}
This is an opcode prefix which modifies the operation of the opcode
following it on the same line. @code{nofor} specifies that forward
translation is not to use the information on this line.
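As a hypothetical example, the following entry maps dots 3 back to an
apostrophe when back-translating and is ignored by forward
translation.
@example
nofor always ' 3
@end example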
@opcode{compbrl, characters}
If the characters are found within a block of text surrounded by
whitespace, the entire block is translated in computer braille:
according to the default braille representations defined by the
@ref{Character-Definition Opcodes} if 8-dot computer braille is
enabled, or according to the dot patterns given in the
@opcoderef{comp6} if 6-dot computer braille is enabled. For example:
@example
compbrl www translate URLs in computer braille
@end example
@opcode{comp6, character dots}
This opcode specifies the translation of characters in 6-dot computer
braille. It is necessary because the translation of a single character
may require more than one cell. The first operand must be a character
with a decimal representation from 0 to 255 inclusive. The second
operand may specify as many cells as necessary. The opcode is somewhat
of a misnomer, since any dots, not just dots 1 through 6, can be