AraStem-corpus

ARASTEM is a corpus for Arabic stemming. It consists of several documents, each containing groups of words that are semantically and morphologically related.

The corpus was built entirely by hand by native Arabic speakers, from texts collected on several Arabic discussion forums. It contains words from Standard Arabic, Dialectal Arabic and Modern Pseudo Arabic.

Contributors:

Ibtissem Abainia
Ahmed Kedaya
Chouaib Fellah
Otman Bordjiba
Reviewed by Taha Zerrouki

Parts

The newly reviewed version is divided into two parts:

Root-oriented data: words are grouped according to their roots
Stem-oriented data: words are grouped according to their lemma

Links

The data was developed to evaluate the ARLStem stemmer.

The ARLStem stemmer is included in the NLTK framework.
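As a rough illustration of how grouped words can be used to evaluate a stemmer, the sketch below counts the fraction of groups whose words all reduce to the same stem. It rests on several assumptions: the in-memory list-of-lists format is not the corpus file layout (which is not described here), `group_consistency` and `toy_stem` are hypothetical names, and the score is a simple stand-in, not necessarily the measure used by the ARLStem authors.

```python
from typing import Callable, Iterable, List


def group_consistency(groups: Iterable[List[str]],
                      stem: Callable[[str], str]) -> float:
    """Return the fraction of non-empty groups whose words share one stem.

    Assumption: each group is a list of related word forms, as in the
    root-oriented or stem-oriented parts of the corpus.
    """
    non_empty = [g for g in groups if g]
    if not non_empty:
        return 0.0
    consistent = sum(1 for g in non_empty if len({stem(w) for w in g}) == 1)
    return consistent / len(non_empty)


def toy_stem(word: str) -> str:
    """Hypothetical stand-in stemmer: strips only the definite article."""
    if word.startswith("ال") and len(word) > 4:
        return word[2:]
    return word


# Hypothetical sample groups, not taken from the corpus.
sample_groups = [
    ["كتاب", "الكتاب"],   # differ only by the definite article
    ["مدرسة", "مدارس"],   # singular / broken plural: the toy stemmer fails here
]

print(group_consistency(sample_groups, toy_stem))  # 0.5
```

Any callable that maps a word to its stem can replace `toy_stem`. With NLTK installed, `nltk.stem.arlstem.ARLSTem().stem` is one such callable.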

Citation

The data was developed to evaluate the ARLStem stemmer. To cite this corpus, use:

@article{abainia2017novel,
  title={A novel robust Arabic light stemmer},
  author={Abainia, Kheireddine and Ouamour, Siham and Sayoud, Halim},
  journal={Journal of Experimental \& Theoretical Artificial Intelligence},
  volume={29},
  number={3},
  pages={557--573},
  year={2017},
  publisher={Taylor \& Francis}
}
