# <span style="color:blue"> B. Specific details for programmers: how it works</span>

## <span style="color:purple">Syllabification</span>

EstNLTK's Vabamorf tool provides functions for word syllabification.
The function `syllabify_word()` accepts the following parameters:

  1. surface word form (string);
  2. (optional) boolean `as_dict`: if `True` (default), each syllable is returned as a dict; if `False`, as a tuple;
  3. (optional) boolean `split_compounds`: if `True`, compound words are heuristically split into subwords before syllabification.
  
For example, let's syllabify the word _"kiireloomuline"_:

In [1]:
from estnltk.vabamorf.morf import syllabify_word
syllabify_word('kiireloomuline')

[{'syllable': 'kii', 'quantity': 2, 'accent': 1},
 {'syllable': 're', 'quantity': 1, 'accent': 0},
 {'syllable': 'loo', 'quantity': 2, 'accent': 1},
 {'syllable': 'mu', 'quantity': 1, 'accent': 0},
 {'syllable': 'li', 'quantity': 1, 'accent': 0},
 {'syllable': 'ne', 'quantity': 1, 'accent': 0}]

The function returns a list of syllables, with the following information about each syllable:
   * 'syllable' -- string corresponding to the syllable;
   * 'quantity' -- degree of quantity: 1, 2 or 3 ( _'välde'_ );
   * 'accent' -- 1 if the syllable is accented ( _'rõhuline silp'_ ), 0 otherwise.
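
As a small illustration of how this structure can be consumed, the sketch below works on the output copied from the example above, so it runs without EstNLTK installed. The helper names `hyphenate` and `accented_syllables` are hypothetical and not part of EstNLTK.

```python
# Output of syllabify_word('kiireloomuline'), copied from the example above.
syllables = [
    {'syllable': 'kii', 'quantity': 2, 'accent': 1},
    {'syllable': 're', 'quantity': 1, 'accent': 0},
    {'syllable': 'loo', 'quantity': 2, 'accent': 1},
    {'syllable': 'mu', 'quantity': 1, 'accent': 0},
    {'syllable': 'li', 'quantity': 1, 'accent': 0},
    {'syllable': 'ne', 'quantity': 1, 'accent': 0},
]


def hyphenate(syllables, sep='-'):
    """Join the syllable strings with a separator (hypothetical helper)."""
    return sep.join(s['syllable'] for s in syllables)


def accented_syllables(syllables):
    """Return the strings of the syllables marked as accented (hypothetical helper)."""
    return [s['syllable'] for s in syllables if s['accent'] == 1]


print(hyphenate(syllables))           # kii-re-loo-mu-li-ne
print(accented_syllables(syllables))  # ['kii', 'loo']
```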


If `as_dict=False`, the function returns each syllable as a tuple `(syllable, quantity, accent)`:

In [2]:
syllabify_word('kiireloomuline', as_dict=False)

[('kii', 2, 1),
 ('re', 1, 0),
 ('loo', 2, 1),
 ('mu', 1, 0),
 ('li', 1, 0),
 ('ne', 1, 0)]
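
The tuple format is convenient for unpacking in a loop. The sketch below uses the output copied from the example above and also shows one way to convert tuples back to the dict format; `tuples_to_dicts` and `KEYS` are hypothetical names, not part of EstNLTK.

```python
# Output of syllabify_word('kiireloomuline', as_dict=False), copied from above.
syllable_tuples = [
    ('kii', 2, 1),
    ('re', 1, 0),
    ('loo', 2, 1),
    ('mu', 1, 0),
    ('li', 1, 0),
    ('ne', 1, 0),
]

# Field order of each tuple, as described in the text.
KEYS = ('syllable', 'quantity', 'accent')


def tuples_to_dicts(syllable_tuples):
    """Convert (syllable, quantity, accent) tuples to dicts (hypothetical helper)."""
    return [dict(zip(KEYS, t)) for t in syllable_tuples]


# Unpack each tuple directly in the loop header.
for syllable, quantity, accent in syllable_tuples:
    marker = '*' if accent == 1 else ' '
    print(f'{marker} {syllable:<4} quantity={quantity}')

print(tuples_to_dicts(syllable_tuples)[0])
```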

#### Syllabification of compound words ( `split_compounds` )

Although the word _'kiireloomuline'_ was syllabified correctly in the previous example, the default settings of `syllabify_word()` do not guarantee the correct syllabification of compound words.
However, if you set `split_compounds=True`, word boundaries inside a compound word are determined heuristically, and each subword of the compound is syllabified separately.
Examples:

In [3]:
syllabify_word('vanaema', split_compounds=True)

[{'syllable': 'va', 'quantity': 1, 'accent': 1},
 {'syllable': 'na', 'quantity': 1, 'accent': 0},
 {'syllable': 'e', 'quantity': 1, 'accent': 1},
 {'syllable': 'ma', 'quantity': 1, 'accent': 0}]

In [4]:
syllabify_word('lasteaialapselegi', split_compounds=True)

[{'syllable': 'las', 'quantity': 2, 'accent': 1},
 {'syllable': 'te', 'quantity': 1, 'accent': 0},
 {'syllable': 'ai', 'quantity': 2, 'accent': 1},
 {'syllable': 'a', 'quantity': 1, 'accent': 0},
 {'syllable': 'lap', 'quantity': 2, 'accent': 1},
 {'syllable': 'se', 'quantity': 1, 'accent': 0},
 {'syllable': 'le', 'quantity': 1, 'accent': 0},
 {'syllable': 'gi', 'quantity': 1, 'accent': 0}]
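
In the outputs above, each accented syllable is followed by a run of unaccented ones. The sketch below groups syllables so that every accented syllable starts a new group, using the output copied from the `'lasteaialapselegi'` example. This is only a way to inspect the result: `group_by_accent` is a hypothetical helper, and the groups it produces are not guaranteed to match the subword boundaries that `syllabify_word()` determines internally.

```python
# Output of syllabify_word('lasteaialapselegi', split_compounds=True), copied from above.
syllables = [
    {'syllable': 'las', 'quantity': 2, 'accent': 1},
    {'syllable': 'te', 'quantity': 1, 'accent': 0},
    {'syllable': 'ai', 'quantity': 2, 'accent': 1},
    {'syllable': 'a', 'quantity': 1, 'accent': 0},
    {'syllable': 'lap', 'quantity': 2, 'accent': 1},
    {'syllable': 'se', 'quantity': 1, 'accent': 0},
    {'syllable': 'le', 'quantity': 1, 'accent': 0},
    {'syllable': 'gi', 'quantity': 1, 'accent': 0},
]


def group_by_accent(syllables, sep='-'):
    """Start a new group at every accented syllable (hypothetical helper)."""
    groups = []
    for s in syllables:
        # Open a new group on an accented syllable, or if no group exists yet.
        if s['accent'] == 1 or not groups:
            groups.append([])
        groups[-1].append(s['syllable'])
    return [sep.join(group) for group in groups]


print(group_by_accent(syllables))  # ['las-te', 'ai-a', 'lap-se-le-gi']
```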

<div class="alert alert-block alert-warning">
<h4><i>Limitations of syllabifying compound words</i></h4>
<br>
If <code>split_compounds==True</code>, the function <code>syllabify_word()</code> applies morphological analysis to the input word and uses information from the word's lemmas to determine subword boundaries inside a compound word.
This works reasonably well if: a) the lemma of the word can be determined unambiguously, and b) the lemma and the input word match by prefix to an extent that covers all the subwords.
If conditions a) and b) are met, the compound word is split into subwords, which improves the quality of syllabification.
Otherwise, the input word is syllabified as it is (without any knowledge of compound word boundaries), which can reduce the quality of syllabification.
</div>