mohataher/arabic_big_corpus

Arabic Big Corpus

This repository is under development.

Introduction

A text file containing a large Arabic corpus. It is inspired by Peter Norvig's big.txt: where his file covers English, arabic_big.txt is its counterpart for the Arabic language.
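
As an illustration of how a corpus like this might be used, here is a minimal sketch that counts word frequencies. It is not part of this repository. It assumes arabic_big.txt is UTF-8 encoded and sits in the working directory; the names `count_words` and `load_counts` are hypothetical, and the tokenizer is deliberately naive (it keeps runs of basic Arabic letters and drops diacritics, tatweel, digits and punctuation).

```python
import re
from collections import Counter

# Assumption: short-vowel diacritics (U+064B..U+0652) and tatweel (U+0640)
# carry no lexical information for counting, so they are stripped first.
DIACRITICS_AND_TATWEEL = re.compile(r"[\u064B-\u0652\u0640]")

# Runs of basic Arabic letters (U+0621..U+064A); everything else splits words.
ARABIC_WORD = re.compile(r"[\u0621-\u064A]+")


def count_words(text):
    """Return a Counter mapping each Arabic word in `text` to its frequency."""
    cleaned = DIACRITICS_AND_TATWEEL.sub("", text)
    return Counter(ARABIC_WORD.findall(cleaned))


def load_counts(path="arabic_big.txt", encoding="utf-8"):
    """Read the corpus file (path and encoding are assumptions) and count its words."""
    with open(path, encoding=encoding) as f:
        return count_words(f.read())
```

With the file in place, `load_counts().most_common(10)` would list the ten most frequent words. For a real application, a proper Arabic tokenizer and normalizer would likely give better results than this regular expression.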

Sources

Included

Not included yet (work in progress)

  • LABR - Large Scale Arabic Book Reviews Dataset.
  • SaudiNewsNet - a set of Arabic newspaper articles, along with metadata, extracted from various Saudi newspapers.
  • Arabic-Wikipedia-Corpus
  • akec - Arabic Keyphrase Extraction Corpus.
  • El-Haj list - a list compiled by Dr. El-Haj of several academic papers.

More sources to be added.

Contribute

Contributions that enlarge or improve the content are welcome. If you're interested, feel free to send a pull request or get in touch.
