Pre-process Arabic text (remove diacritics, punctuation, and repeated characters) and extract named entities.
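The pre-processing steps named above could be sketched roughly as follows. This is an illustrative sketch, not the code in `NERArabic.py`: the function name `preprocess`, the exact diacritic range, the tatweel removal, and the rule "collapse runs of three or more identical characters to one" are all assumptions made for the example.

```python
import re
import unicodedata

# Assumption: "diacritics" means the Arabic tashkeel marks U+064B..U+0652
# plus the superscript alef U+0670.
DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")

# Assumption: the tatweel (kashida) elongation character is also dropped.
TATWEEL = "\u0640"

# Assumption: a run of 3+ identical characters is collapsed to a single one.
# Runs of exactly two are kept, since doubled letters occur in real words.
REPEATS = re.compile(r"(.)\1{2,}")


def preprocess(text: str) -> str:
    """Strip diacritics, punctuation, and long character runs from text."""
    text = DIACRITICS.sub("", text)
    text = text.replace(TATWEEL, "")
    # Replace every Unicode punctuation character (category P*) with a space,
    # which covers both ASCII and Arabic punctuation marks.
    text = "".join(
        " " if unicodedata.category(ch).startswith("P") else ch for ch in text
    )
    text = REPEATS.sub(r"\1", text)
    # Normalise the whitespace left behind by the removals.
    return " ".join(text.split())
```

Keeping doubled letters intact is a deliberate choice in this sketch; a stricter rule that collapses every repeat would also alter words that legitimately contain the same letter twice in a row.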
- Install dependencies from https://polyglot.readthedocs.io/en/latest/Installation.html
- Download the necessary models:
polyglot download embeddings2.ar ner2.ar
python NERArabic.py [-h] -i INFILE -o OUTFILE

    optional arguments:
      -h, --help            show this help message and exit
      -i INFILE, --infile INFILE
                            input file.
      -o OUTFILE, --outfile OUTFILE
                            out file.

python NERArabic.py -i infile.txt -o outfile.txt
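
For orientation, a script with the command-line interface shown above might be wired up along these lines. This is a hedged sketch, not the actual contents of `NERArabic.py`: the helper names `build_parser`, `extract_entities`, and `main` are hypothetical, the tab-separated output format is an assumption, and the pre-processing step is left out for brevity. The polyglot calls (`Text(..., hint_language_code="ar")` and `.entities`) assume the `embeddings2.ar` and `ner2.ar` models have been downloaded.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the documented flags: -i/--infile and -o/--outfile."""
    parser = argparse.ArgumentParser(prog="NERArabic.py")
    parser.add_argument("-i", "--infile", required=True, help="input file.")
    parser.add_argument("-o", "--outfile", required=True, help="out file.")
    return parser


def extract_entities(text):
    """Return (tag, entity) pairs found by polyglot's Arabic NER model."""
    # Imported lazily so the parser can be used without polyglot installed.
    from polyglot.text import Text

    parsed = Text(text, hint_language_code="ar")
    # Each entity is a chunk of words carrying a tag such as I-PER or I-LOC.
    return [(entity.tag, " ".join(entity)) for entity in parsed.entities]


def main(argv=None) -> None:
    args = build_parser().parse_args(argv)
    with open(args.infile, encoding="utf-8") as src:
        text = src.read()
    # Assumption: one "tag<TAB>entity" pair per output line.
    with open(args.outfile, "w", encoding="utf-8") as dst:
        for tag, entity in extract_entities(text):
            dst.write(f"{tag}\t{entity}\n")
```

Calling `main(["-i", "infile.txt", "-o", "outfile.txt"])` corresponds to the example command above.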