Generate arabic golden standard corpus for morphology and stemming
linuxscout/miknaaz
Miknaaz مكناز

Description

Generate Arabic golden standard corpus for morphology and stemming

Citation

If you cite Miknaaz in academic work, please use the following citation:

Taha Zerrouki, Miknaaz, http://github.com/linuxscout/miknaaz, 2023

or, in BibTeX format:

@misc{zerrouki2018miknaaz,
  title={Miknaaz: Generate arabic golden standard},
  author={Zerrouki, Taha},
  url={http://github.com/linuxscout/miknaaz},
  year={2018}
}

Usage

  • Build word features for building a linguistic corpus

    from miknaaz.corpus_builder import CorpusBuilder
    text = u"إلى البيت"
    lemmer = CorpusBuilder()
    words = lemmer.tokenize(text)
    # print the morphological suggestions for each token
    for word in words:
        result = lemmer.morph_suggestions(word, True)
        print(result)
  • Extract individual features (lemmas, roots, word types, wazns)

    from miknaaz.corpus_builder import CorpusBuilder
    text = u"إلى البيت"
    lemmer = CorpusBuilder()
    words = lemmer.tokenize(text)
    # test get lemmas
    for word in words:
        result = lemmer.get_lemmas(word)
        # the result contains objects
        print(result)
    # test get roots
    for word in words:
        result = lemmer.get_roots(word)
        # the result contains objects
        print(result)
    # test get wordtypes
    for word in words:
        result = lemmer.get_word_type(word)
        # the result contains objects
        print(result)
    # test get wazns
    for word in words:
        result = lemmer.get_wazns(word)
        # the result contains objects
        print(result)
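
The four loops above can also be folded into a single pass that gathers every feature for each token into one row. The following is only a sketch: `collect_features`, `rows_to_tsv` and the `FEATURES` mapping are hypothetical helpers, not part of miknaaz, and the only thing assumed about the builder is that it exposes the methods used in the examples above (`tokenize`, `get_lemmas`, `get_roots`, `get_word_type`, `get_wazns`). Because the return types of those methods are not documented here, each result is simply converted with `str()` when written out.

```python
import csv
import io

# Column name -> builder method name, taken from the usage examples above.
# The mapping itself is a hypothetical convenience, not a miknaaz API.
FEATURES = {
    "lemmas": "get_lemmas",
    "roots": "get_roots",
    "word_type": "get_word_type",
    "wazns": "get_wazns",
}


def collect_features(builder, text):
    """Return one dict per token, holding the raw result of each feature method."""
    rows = []
    for word in builder.tokenize(text):
        row = {"word": word}
        for column, method_name in FEATURES.items():
            # Look the method up by name and call it on the current token.
            row[column] = getattr(builder, method_name)(word)
        rows.append(row)
    return rows


def rows_to_tsv(rows):
    """Serialize the rows as tab-separated text, one token per line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["word", *FEATURES],
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    for row in rows:
        # Result types are not assumed, so every value is stringified.
        writer.writerow({key: str(value) for key, value in row.items()})
    return buffer.getvalue()
```

With miknaaz installed, this would be called as `rows_to_tsv(collect_features(CorpusBuilder(), u"إلى البيت"))`, passing the same `CorpusBuilder` instance used in the examples above.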
