linuxscout/arabicnlptoolslist

Arabic NLP Tools and Resources Lists

An inventory of Arabic NLP tools and resources.

Tools

STEMMING

MORPHOLOGICAL ANALYSIS AND GENERATION

  • Qalsadi: Arabic morphological analyzer library for Python.
  • Buckwalter Arabic Morphological Analyzer (BAMA)
  • Standard Arabic Morphological Analyzer (SAMA, version 3.0 of BAMA)
  • ElixirFM: Functional Arabic Morphology
  • Xerox Arabic Morphological Analysis and Generation
  • (Deprecated) NMSU’s Arabic Morphological Analyzer
  • MAGEAD: Morphological Analysis and Generation for Arabic and its Dialects
  • ~~Almorgeana: Arabic Lexeme-based Morphological Generation and Analysis, distributed as part of the MADA system.~~
  • Alkhalil Morphological Analyzer
  • Araflex

MORPHOLOGICAL DISAMBIGUATION AND POS TAGGING

  • Khoja Arabic Tagger
  • AMIRA: Toolkit for Arabic tokenization, POS tagging and base phrase chunking
  • MADA: Morphological Analysis and Disambiguation for Arabic – a tool for tokenization, lemmatization, diacritization and POS tagging

PARSERS

NAMED ENTITY RECOGNITION

  • ~~Yassine Benajiba’s ANER (Arabic Named Entity Recognition) system~~
  • BBN’s Identifinder (English, Arabic, Chinese)

MACHINE TRANSLATION

TREE EDITING

LEXICOGRAPHY

  • aConCorde: A concordance generation program for Arabic

Verb conjugator

  • Qutrub (source on GitHub)
  • The CJKI Arabic Verb Conjugator (CAVE).
    An interactive Arabic-English verb conjugation application for iOS devices that provides conjugation paradigms for over 1,600 Arabic verbs.
  • AraCon: a verb conjugator for Arabic, implemented as part of a morphological analyser and generator (Java).

Transcription and transliteration

  • Arabic Transcription and Transliteration.
    An overview of some linguistic issues related to transliteration and transcription, with special focus on our Arabic transcription technology.
  • The ARAN and NANA systems automatically transcribe CJK and Latin names to and from Arabic.

Numbers to words

  • Tafqit: Tafqeet, converting numbers to their written form in Arabic (تحويل الأرقام إلى ما يقابلها كتابة باللغة العربية)

Poetry

  • Al-Faraheedy-Project

Resources

Corpora

Monolingual corpora

Multilingual corpora

Dictionaries

Wordlists

  • Comprehensive Word Lists for Arabic (CJKAWORD).
    Comprehensive monolingual word lists for Arabic covering general vocabulary, proper nouns and technical terms. Includes both a lexical database for canonical forms and a full-form lexicon.

  • Arabic Broken Plurals.
    A comprehensive database of broken plurals (unpredictable) in Arabic given in three versions -- voweled, unvoweled, and transcription.
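The relationship between the voweled and unvoweled versions mentioned above can be illustrated with a minimal sketch. It assumes that "unvoweled" means the same string with the Arabic diacritic marks removed (the range U+064B to U+0652 plus the superscript alef U+0670); the function name `strip_diacritics` is hypothetical and is not part of any resource listed here.

```python
# Assumption: "unvoweled" text is the voweled text minus Arabic diacritics.
# U+064B..U+0652 covers tanwin, fatha, damma, kasra, shadda and sukun;
# U+0670 is the superscript (dagger) alef.
def strip_diacritics(text: str) -> str:
    """Return text with Arabic short-vowel and related marks removed."""
    return "".join(
        ch for ch in text
        if not ("\u064B" <= ch <= "\u0652" or ch == "\u0670")
    )


if __name__ == "__main__":
    # "kutub" (books), the broken plural of "kitab", written with two dammas.
    voweled = "\u0643\u064F\u062A\u064F\u0628"
    print(strip_diacritics(voweled))
```

A real lexical database may also normalize other characters (for example hamza forms), which this sketch deliberately leaves untouched.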

ROOT LISTS

GAZETTEERS

  • ANERCorp: a corpus of more than 150,000 words annotated for the NER task.
  • ANERGazet: a collection of three gazetteers: (i) Locations, containing names of continents, countries, cities, etc.; (ii) People, containing names of people collected manually from different Arabic websites; and (iii) Organizations, containing names of organizations such as companies and football teams.
  • FAOTERM: the United Nations’ Food and Agriculture Organization terminology reference for country names (six languages including Arabic)
  • Foreignword.com’s country names in 16 languages including Arabic
  • Geonames.de’s multilingual resource for names of geographical entities (and other things)
  • U.S. Board on Geographic Names (including Arab countries) – uses SATTS Arabic transliteration
  • Database of Arab Names (DAN).
    A comprehensive database covering over 6.5 million Arab names and variants, based on authoritative resources and extensively proofread by a team of Arabic native speaker editors.

  • Database of Arab Names in Arabic (DANA).
    A one-of-a-kind resource of Arab personal names and variants, in the original Arabic script. This database covers several hundred thousand Arabic script variants, along with common spelling mistakes.

  • Database of Arabic Business Names (DABNA).
    Arabic Companies and Organizations. A database of Arabic company and organization names is now under development.

  • Expanded OFAC (XOFAC).
    To address the shortcomings of OFAC's SDN List, CJKI has developed a comprehensive "Expanded OFAC" database of OFAC full name variants, the vast majority of which are not listed in OFAC.

  • Database of Foreign Names in Arabic (DAFNA).
    A database of non-Arab names transcribed to Arabic, including Arabic orthographic variants and common orthographic errors.

  • Dictionary of Arabic Place Names (DAPNA).
    A database of Arabic-English place names including systematic coverage for orthographic variants and common orthographic errors.
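Gazetteers such as the Locations, People and Organizations lists described above are commonly used for dictionary lookup during named entity recognition. The following is a minimal sketch of greedy longest-match lookup, not the method of any tool listed here. The `GAZETTEERS` contents, the labels, and the function name `tag_tokens` are all hypothetical placeholders; no real gazetteer data is reproduced.

```python
# Hypothetical miniature gazetteers, for illustration only.
GAZETTEERS = {
    "LOC": {"القاهرة", "مصر"},
    "ORG": {"الأمم المتحدة"},
}


def tag_tokens(tokens, gazetteers, max_len=3):
    """Greedy longest-match lookup; returns (phrase, label) pairs.

    Tokens that match no gazetteer entry are labeled "O".
    """
    out = []
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            for label, names in gazetteers.items():
                if phrase in names:
                    match = (phrase, label, n)
                    break
            if match:
                break
        if match:
            out.append((match[0], match[1]))
            i += match[2]
        else:
            out.append((tokens[i], "O"))
            i += 1
    return out


if __name__ == "__main__":
    sentence = "زار وفد الأمم المتحدة القاهرة"
    for phrase, label in tag_tokens(sentence.split(), GAZETTEERS):
        print(label, phrase)
```

Exact string matching like this misses spelling variants and inflected forms, which is one reason the name databases above list orthographic variants explicitly.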

Question answering

  • Documents: more than 11,000 Arabic Wikipedia articles in SGML format (the format adopted in CLEF and also the one accepted by the JIRS system).
  • [List of Questions](http://users.dsic.upv.es/~ybenajiba/resources/): a list of 200 questions of different types. The proportion of each question type is the same as the proportion adopted in CLEF.
  • [List of Correct Answers](http://users.dsic.upv.es/~ybenajiba/resources/): a list of correct answers for each question in the List of Questions. This list is very important for automatic evaluation.

Ontologies

SEMANTIC ONTOLOGIES

  • Arabic WordNet
  • Arabic VerbNet: a large-scale verb lexicon that classifies verbs in Arabic using syntactic alternations, inspired by the work of Kipper Schuler (2005) on English VerbNet.
