
nlp-unibuc/ro-hyphen

ro-hyphen

End of line hyphenation and syllabication in Romanian.

Authors: Liviu P. Dinu, Vlad Niculae (@vene), Octavia-Maria Șulea

License: BSD 3-clause

Usage

```sh
printf 'dinozaur\ntelefon\n' | python make_crfsuite_input.py | crfsuite tag -m models/4grams.C=1.0.nb.model
```

```
1
0
1
0
1
0
1

1
0
1
0
1
2
```
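`crfsuite tag` prints one label per line, with a blank line after each word. A minimal Python sketch for collecting those labels into one integer list per word could look like the following. The function name `parse_crfsuite_tags` is hypothetical and is not part of this repository.

```python
def parse_crfsuite_tags(output: str) -> list[list[int]]:
    """Split raw `crfsuite tag` output into one list of integer tags per word.

    Assumes one label per line and a blank line between words,
    as in the sample output. (Hypothetical helper, not part of ro-hyphen.)
    """
    words: list[list[int]] = []
    current: list[int] = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            # A blank line closes the current word's tag sequence.
            if current:
                words.append(current)
                current = []
        else:
            current.append(int(line))
    if current:  # the last word may not be followed by a blank line
        words.append(current)
    return words


sample = "1\n0\n1\n0\n1\n0\n1\n\n1\n0\n1\n0\n1\n2\n"
print(parse_crfsuite_tags(sample))
# [[1, 0, 1, 0, 1, 0, 1], [1, 0, 1, 0, 1, 2]]
```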

The tag sets are described in the paper and differ between the two available models. With the nb model, each tag is the distance since the last syllable split, so a 0 tag marks a hyphen. With the simple model, a 1 tag marks a hyphen.

The output above should be read as: di-no-za-ur, te-le-fon. If you need to use diacritics, please use a UTF-8 encoded file instead of standard input, to avoid terminal encoding issues.
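To turn a tag sequence back into a hyphenated word, a sketch like the one below could be used. It assumes one tag per gap between adjacent letters. This is inferred from the sample output (seven tags for the eight letters of dinozaur) and is not stated explicitly here. `hyphenate` and its `scheme` parameter are hypothetical names that are not part of this repository.

```python
def hyphenate(word: str, tags: list[int], scheme: str = "nb") -> str:
    """Insert hyphens into `word` from a per-gap tag sequence.

    Assumption: one tag per gap between adjacent characters,
    i.e. len(tags) == len(word) - 1 (inferred from the sample output).
    (Hypothetical helper, not part of ro-hyphen.)
    """
    if scheme == "nb":
        split_tag = 0  # nb model: distance since last split, 0 marks a hyphen
    elif scheme == "simple":
        split_tag = 1  # simple model: 1 marks a hyphen
    else:
        raise ValueError(f"unknown scheme: {scheme!r}")
    if len(tags) != len(word) - 1:
        raise ValueError(f"expected {len(word) - 1} tags, got {len(tags)}")
    pieces = [word[0]]
    # tags[i] describes the gap before word[i + 1].
    for char, tag in zip(word[1:], tags):
        if tag == split_tag:
            pieces.append("-")
        pieces.append(char)
    return "".join(pieces)


print(hyphenate("dinozaur", [1, 0, 1, 0, 1, 0, 1]))  # di-no-za-ur
print(hyphenate("telefon", [1, 0, 1, 0, 1, 2]))      # te-le-fon
# Hand-made binary tags for illustration, not actual simple-model output:
print(hyphenate("dinozaur", [0, 1, 0, 1, 0, 1, 0], scheme="simple"))  # di-no-za-ur
```

With the nb tags from the sample, this reproduces di-no-za-ur and te-le-fon. Only the tag that marks a split differs between the two schemes, so the same loop serves both.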
