docs: update description of letter groups
Make the letter groups linkable by moving them under a separate heading.
Update the behavior description of applied rules.
Add a note about SetLetterBitsUTF8() for redefining standard letter
groups.
valdisvi committed Sep 16, 2019
1 parent 20096dc commit c60a109
Showing 1 changed file with 12 additions and 6 deletions.
18 changes: 12 additions & 6 deletions docs/dictionary.md
@@ -4,6 +4,7 @@
- [Phoneme names](#phoneme-names)
- [Pronunciation Rules](#pronunciation-rules)
- [Rule Groups](#rule-groups)
- [Letter Groups](#letter-groups)
- [Rules](#rules)
- [Special Characters in \<phoneme string\>](#special-characters-in-phoneme-string)
- [Special Characters in Both \<pre\> and \<post\> ](#special-characters-in-both-pre-and-post)
@@ -103,6 +104,12 @@ The rules are organized in groups, each starting with a `.group` line:
* `.replace`
See section [Character Substitution](#character-substitution).

### Letter Groups

A letter group is a special kind of rule group: it declares letter sequences
that share some common feature in a particular language. It can be used as a
placeholder for prefixes or infixes of words (in pre rules) or for infixes or
postfixes (in post rules).

* `.L<nn>`
Defines a group of letter sequences, any of which can match with `Lnn` in a
pre or post rule (see below). `nn` is a 2-digit decimal number in the range 01
@@ -113,12 +120,11 @@ The rules are organized in groups, each starting with a `.group` line:

There can be up to 200 items in one letter group.

When matching a word, the group containing the most letters is checked first at
the current position in the word (if such a group exists), then progressively
shorter ones, down to the single-letter groups. The highest-scoring rule of the
matching group is used.
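
The longest-first lookup order can be pictured with a minimal Python sketch. It is an illustration only, not the eSpeak NG implementation: the names `find_group` and `best_rule`, the `(score, phonemes)` rule shape, and the sample data are all hypothetical, and the scores are fixed numbers here, whereas real rule scores depend on how well the pre and post context matches.

```python
def find_group(groups, word, pos):
    """Return the name of the longest group that matches word at pos, or None."""
    longest = max((len(name) for name in groups), default=0)
    # Try the longest possible group name first, then shorter ones,
    # down to single-letter groups.
    for length in range(longest, 0, -1):
        candidate = word[pos:pos + length]
        if len(candidate) == length and candidate in groups:
            return candidate
    return None


def best_rule(groups, word, pos):
    """Return the highest-scoring rule of the matching group, or None."""
    name = find_group(groups, word, pos)
    if name is None:
        return None
    # Each rule is a hypothetical (score, phonemes) pair.
    return max(groups[name], key=lambda rule: rule[0])


# Hypothetical sample data: group name -> list of (score, phonemes) rules.
groups = {
    "s": [(1, "s")],
    "sc": [(2, "sk")],
    "sch": [(3, "S"), (1, "sk")],
}
```

With this data, a word starting with `sch` is looked up in the `sch` group before `sc` or `s`. The sketch stops at the first group that exists; it does not model falling back to a shorter group when no rule in the longer group applies.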

A `~` item in a letter group means that the group can also match when there is
no letter at that position in the pre or post rule.

_Example with prerule group:_
@@ -531,5 +537,5 @@ usually have specific meaning for each particular language.
file by calling the `SetLetterBits()` function, usually from the `NewTranslator()` function.
Note that letters should be stored as an array of chars, so multibyte
Unicode letters should be transposed using the `transpose_min` and `transpose_max` parameters
of the particular `Translator` structure, or set using the `SetLetterBitsUTF8()` function.
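
To illustrate why transposition lets multibyte letters fit into an array of chars, here is a minimal Python sketch. It assumes that transposition simply shifts a contiguous range of code points down to small 1-based codes; the function name `transpose` and the range values are hypothetical and are not taken from the eSpeak NG sources.

```python
def transpose(letter, range_min, range_max):
    """Map a code point in [range_min, range_max] to a small 1-based code."""
    code = ord(letter)
    if range_min <= code <= range_max:
        # Assumed behavior: shift the range down so each letter fits in one byte.
        return code - range_min + 1
    return code  # outside the range: left unchanged


# Illustrative range only: U+0430 (Cyrillic small a) to U+0451 (Cyrillic small io).
CYRILLIC_MIN, CYRILLIC_MAX = 0x430, 0x451
```

Every letter in the chosen range then maps to a value well below 256, so it can be stored in a single char.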
