A-Monolingual-Arabic-Parallel-Corpus-

A7'ta: A Monolingual Arabic Parallel Corpus for Grammar Checking

Collected by Nora Madi
Email: nmadi at ksu dot edu dot sa
Site: https://github.com/iwan-rg

Reference: N. Madi and H. S. Al‐Khalifa, “A7’ta: Data on a Monolingual Arabic Parallel Corpus for Grammar Checking,” Data in Brief, vol. 22, pp. 237–240, 2019.

Resource

The parallel corpus is a collection of Modern Standard Arabic (MSA) sentences (and words) extracted from the book كشاف الأخطاء اللغوية - الصحافة السعودية أنموذجاً (Linguistic Error Detector – Saudi Press as a Sample).

Data Files:

The data files contain erroneous Arabic sentences and their corrected counterparts.

Data Structure:

  1. Text format
  2. UTF-8 encoding

Statistics:

The data contains 300 documents, 445 erroneous sentences and their error-free counterparts, and a total of 3,532 words. Each pair of sentences differs in only one word.
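Because each pair is described as differing in only one word, the changed word can be located with a plain word-level diff. The following is a minimal sketch, not part of the corpus tooling: the function name `differing_words` is hypothetical, whitespace tokenization is an assumption, and the strings in the usage line are Latin placeholders rather than corpus sentences.

```python
from difflib import SequenceMatcher

def differing_words(erroneous, correct):
    """Return (wrong_words, right_words) for every span where the pair differs.

    Assumption: sentences are tokenized by splitting on whitespace.
    """
    a, b = erroneous.split(), correct.split()
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]

# Placeholder tokens, not real corpus data.
print(differing_words("a b c", "a x c"))  # [(['b'], ['x'])]
```

A pair that returns more than one span, or a span longer than one word, may be worth inspecting by hand, since it would not match the one-word description above.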

Folder structure:

  1. There are 8 folders, one for each of the eight main categories in the book.
  2. Within each folder, there is a sub-folder for each sub-category of the main category, if any.
  3. Inside each main folder or sub-folder, there is a folder for each type of error.
  4. Within each error type folder, there are two files: one for the correctly written sentences (الصواب) and another for the erroneous sentences (الخطأ).
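The folder layout above can be walked with a short script. This is a minimal sketch under several assumptions that the README does not confirm: that the two file names contain the words الصواب and الخطأ, that each file holds one sentence per line, and that line N of one file corresponds to line N of the other. The names `load_pairs` and `read_lines` are hypothetical.

```python
import unicodedata
from pathlib import Path

# Assumption: these words appear somewhere in the two file names.
CORRECT_MARK = unicodedata.normalize("NFC", "الصواب")
ERROR_MARK = unicodedata.normalize("NFC", "الخطأ")

def read_lines(path):
    """Read a UTF-8 text file (with or without BOM); return non-empty lines."""
    text = Path(path).read_text(encoding="utf-8-sig")
    return [line.strip() for line in text.splitlines() if line.strip()]

def load_pairs(root):
    """Yield (error_type, erroneous, correct) for every error-type folder.

    Assumption: the two files in a folder are aligned line by line.
    """
    for folder in sorted(p for p in Path(root).rglob("*") if p.is_dir()):
        wrong, right = [], []
        for f in folder.iterdir():
            if not f.is_file():
                continue
            # Normalize so the match also works on file systems
            # that return decomposed Unicode file names.
            name = unicodedata.normalize("NFC", f.name)
            if ERROR_MARK in name:
                wrong.append(f)
            elif CORRECT_MARK in name:
                right.append(f)
        if len(wrong) != 1 or len(right) != 1:
            continue  # a category or sub-category folder, not an error type
        for bad, good in zip(read_lines(wrong[0]), read_lines(right[0])):
            yield folder.name, bad, good
```

Calling `list(load_pairs("path/to/corpus"))` on a local copy would give one tuple per sentence pair, labeled with the name of its error type folder. Here `path/to/corpus` is a placeholder for wherever the repository was downloaded.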
