Skip to content

mikahama/haracat

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

12 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

hAraCat 🐈

DOI

hAraCat is a Python library for adding diacritics automatically to Medieval Arabic text.

Install 😻

pip3 install haracat
python3 -m haracat.download

Use 🐱

Diacritics can be added as follows

from haracat import diacritics_sentence
diacritics_sentence("الإجاج، مثلثة الأول: الستر.".split(" "))
>> الْإِجاجُ، مُثَلَّثَةَ الْأَوَّلِ: السِّتْرُ.

First the sentence is tokenized before the diacritics are predicted.

Credits

@misc{alnajjar2020automated,
      title={Automated Prediction of Medieval Arabic Diacritics}, 
      author={Khalid Alnajjar and Mika Hämäläinen and Niko Partanen and Jack Rueter},
      year={2020},
      eprint={2010.05269},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Khalid Alnajjar, Mika Hämäläinen, Niko Partanen and Jack Rueter

About

hAraCat 🐈 adds diacritics to Arabic text

Resources

Stars

Watchers

Forks

Sponsor this project

 

Packages

No packages published

Languages