## <span style="color:purple">Syllabification</span>

EstNLTK provides word syllabification functionality with the help of the [Vabamorf library](https://github.com/Filosoft/vabamorf).
The function `syllabify_word()` accepts the following parameters:

  1. surface word form (string);
  2. (optional) `as_dict` -- a boolean specifying whether each syllable is returned as a dict (`True`, the default) or as a tuple (`False`);
  3. (optional) `split_compounds` -- a boolean specifying whether compound words are heuristically split into subwords before syllabification (`True` by default).
  
For example, let's syllabify the word _"kiireloomuline"_:

In [1]:
from estnltk.vabamorf.morf import syllabify_word
syllabify_word('kiireloomuline')

[{'syllable': 'kii', 'quantity': 2, 'accent': 1},
 {'syllable': 're', 'quantity': 1, 'accent': 0},
 {'syllable': 'loo', 'quantity': 2, 'accent': 1},
 {'syllable': 'mu', 'quantity': 1, 'accent': 0},
 {'syllable': 'li', 'quantity': 1, 'accent': 0},
 {'syllable': 'ne', 'quantity': 1, 'accent': 0}]

The function returns a list with one entry per syllable. Each entry contains:
   * 'syllable' -- the string corresponding to the syllable;
   * 'quantity' -- 1, 2 or 3, the degree of quantity ( _'välde'_ );
   * 'accent' -- 1 if the syllable is accented ( _'rõhuline silp'_ ), 0 otherwise.
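
As a minimal sketch of how this structure can be consumed, the snippet below works on the output shown above, pasted in as a literal so that it runs without EstNLTK installed. The helper names `hyphenate` and `accented_syllables` are illustrative only and are not part of EstNLTK.

```python
# Output of syllabify_word('kiireloomuline'), copied from the example above.
syllables = [
    {'syllable': 'kii', 'quantity': 2, 'accent': 1},
    {'syllable': 're', 'quantity': 1, 'accent': 0},
    {'syllable': 'loo', 'quantity': 2, 'accent': 1},
    {'syllable': 'mu', 'quantity': 1, 'accent': 0},
    {'syllable': 'li', 'quantity': 1, 'accent': 0},
    {'syllable': 'ne', 'quantity': 1, 'accent': 0},
]


def hyphenate(syllables, sep='-'):
    """Join the syllable strings into one hyphenated form (hypothetical helper)."""
    return sep.join(s['syllable'] for s in syllables)


def accented_syllables(syllables):
    """Return the strings of all syllables marked as accented (hypothetical helper)."""
    return [s['syllable'] for s in syllables if s['accent'] == 1]


hyphenated = hyphenate(syllables)
accented = accented_syllables(syllables)
print(hyphenated)  # kii-re-loo-mu-li-ne
print(accented)    # ['kii', 'loo']
```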


If `as_dict==False`, the function returns tuples `(syllable, quantity, accent)`:

In [2]:
syllabify_word('kiireloomuline', as_dict=False)

[('kii', 2, 1),
 ('re', 1, 0),
 ('loo', 2, 1),
 ('mu', 1, 0),
 ('li', 1, 0),
 ('ne', 1, 0)]

#### Syllabification of compound words ( `split_compounds` )

Since version 1.6.x, EstNLTK's syllabifier includes a heuristic that improves the syllabification of compound words.
The heuristic checks whether the input word is a compound and, if so, tries to split it into subwords so that each subword is syllabified separately.
Examples:

In [3]:
syllabify_word('vanaema')

[{'syllable': 'va', 'quantity': 1, 'accent': 1},
 {'syllable': 'na', 'quantity': 1, 'accent': 0},
 {'syllable': 'e', 'quantity': 1, 'accent': 1},
 {'syllable': 'ma', 'quantity': 1, 'accent': 0}]

In [4]:
syllabify_word('lasteaialapselegi')

[{'syllable': 'las', 'quantity': 2, 'accent': 1},
 {'syllable': 'te', 'quantity': 1, 'accent': 0},
 {'syllable': 'ai', 'quantity': 2, 'accent': 1},
 {'syllable': 'a', 'quantity': 1, 'accent': 0},
 {'syllable': 'lap', 'quantity': 2, 'accent': 1},
 {'syllable': 'se', 'quantity': 1, 'accent': 0},
 {'syllable': 'le', 'quantity': 1, 'accent': 0},
 {'syllable': 'gi', 'quantity': 1, 'accent': 0}]

You can switch the heuristic off by setting `split_compounds=False`.
This can make syllabification slightly faster, but it reduces the quality of syllabification on compound words:

In [5]:
syllabify_word('vanaema', split_compounds=False)

[{'syllable': 'va', 'quantity': 1, 'accent': 0},
 {'syllable': 'nae', 'quantity': 2, 'accent': 1},
 {'syllable': 'ma', 'quantity': 1, 'accent': 0}]
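
To make the difference between the two settings concrete, the following sketch compares the syllable boundaries of the two _"vanaema"_ outputs shown above. Both outputs are pasted in as literals, so EstNLTK is not needed to run it; `boundary_offsets` is a hypothetical helper, not an EstNLTK function.

```python
from itertools import accumulate

# syllabify_word('vanaema'), copied from the example above (heuristic on).
with_split = [
    {'syllable': 'va', 'quantity': 1, 'accent': 1},
    {'syllable': 'na', 'quantity': 1, 'accent': 0},
    {'syllable': 'e', 'quantity': 1, 'accent': 1},
    {'syllable': 'ma', 'quantity': 1, 'accent': 0},
]

# syllabify_word('vanaema', split_compounds=False), copied from the example above.
without_split = [
    {'syllable': 'va', 'quantity': 1, 'accent': 0},
    {'syllable': 'nae', 'quantity': 2, 'accent': 1},
    {'syllable': 'ma', 'quantity': 1, 'accent': 0},
]


def boundary_offsets(syllables):
    """Character offsets at which one syllable ends and the next begins."""
    lengths = [len(s['syllable']) for s in syllables]
    # Running totals give the end offset of each syllable; the last one is
    # the end of the word, which is not an internal boundary.
    return list(accumulate(lengths))[:-1]


on = boundary_offsets(with_split)      # [2, 4, 5]
off = boundary_offsets(without_split)  # [2, 5]

# Boundaries that disappear when the heuristic is switched off.
lost = sorted(set(on) - set(off))
word = 'vanaema'
for offset in lost:
    print(word[:offset], '|', word[offset:])  # vana | ema
```

The only boundary lost without the heuristic is the one between the subwords _"vana"_ and _"ema"_.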

<div class="alert alert-block alert-warning">
<h4><i>Limitations of syllabifying compound words</i></h4>
<br>
If <code>split_compounds==True</code> (the default setting), then the function <code>syllabify_word()</code> applies morphological analysis to the input word and uses information from the word's lemmas to determine subword boundaries inside a compound word.
This works reasonably well if: a) the lemma of the word can be determined unambiguously, and b) the lemma and the input word share a prefix long enough to cover all the subword boundaries.
If conditions a) and b) are met, then splitting the compound into subwords succeeds, which also improves the quality of syllabification.
Otherwise, the input word is syllabified as it is (without any knowledge of compound word boundaries), and the quality of syllabification can be suboptimal.
</div>

<div class="alert alert-block alert-warning">
<h4><i>Remark: <code>-</code> and <code>/</code> are special symbols for the syllabifier</i></h4>
<br>
The syllabifier treats <code>-</code> and <code>/</code> as special symbols, and they always appear as "standalone syllables" in the output.
You can use these symbols to guide the syllabification process, e.g. when syllabifying foreign proper names:
<pre>
>> syllabify_word('Mc/Donald/s',as_dict=False)
[('Mc', 3, 1),
 ('/', 3, 1),
 ('do', 1, 0),
 ('nald', 3, 1),
 ('/', 3, 1),
 ('s', 3, 1)]
</pre>
If you need an exact syllable count, skip these symbols when counting.
</div>
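
Counting syllables while ignoring the special symbols could look like the sketch below. It uses the `'Mc/Donald/s'` output from the remark above as a literal, so it runs without EstNLTK; `count_syllables` is a hypothetical helper, written to accept both the dict and the tuple output format.

```python
# Symbols that the syllabifier emits as standalone "syllables".
SPECIAL_SYMBOLS = {'-', '/'}

# syllabify_word('Mc/Donald/s', as_dict=False), copied from the remark above.
mcdonalds = [
    ('Mc', 3, 1),
    ('/', 3, 1),
    ('do', 1, 0),
    ('nald', 3, 1),
    ('/', 3, 1),
    ('s', 3, 1),
]


def count_syllables(syllables):
    """Count entries whose syllable string is not a special symbol."""
    count = 0
    for entry in syllables:
        # Dict entries store the string under 'syllable'; tuples store it first.
        text = entry['syllable'] if isinstance(entry, dict) else entry[0]
        if text not in SPECIAL_SYMBOLS:
            count += 1
    return count


total = count_syllables(mcdonalds)
print(len(mcdonalds), total)  # 6 4
```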