# BERT for Arabic QA


This notebook contains the code for training, evaluating, and demoing BERT on the two datasets we created, Arabic-SQuAD and ARCD, which are described in our WANLP 2019 paper, Neural Arabic Question Answering.

Owner: Hussein Mozannar

## Downloading necessary files


We clone our SOQAL repo, which includes the BERT code under `SOQAL/bert`, and download the pre-trained multilingual cased BERT model.

In [None]:
%tensorflow_version 1.x

TensorFlow 1.x selected.


In [None]:
!git clone https://github.com/husseinmozannar/SOQAL
!wget -cq https://storage.googleapis.com/bert_models/2018_11_23/multi_cased_L-12_H-768_A-12.zip
!unzip -qq multi_cased_L-12_H-768_A-12.zip

Cloning into 'SOQAL'...
remote: Enumerating objects: 143, done.
remote: Total 143 (delta 0), reused 0 (delta 0), pack-reused 143
Receiving objects: 100% (143/143), 6.53 MiB | 13.49 MiB/s, done.
Resolving deltas: 100% (40/40), done.


## Train BERT on Arabic-SQuAD

The following command fine-tunes BERT on the Arabic-SQuAD dataset (a Google machine translation of SQuAD) for 2 epochs and then writes predictions for ARCD.

In [None]:
!python SOQAL/bert/run_squad.py \
  --vocab_file=multi_cased_L-12_H-768_A-12/vocab.txt \
  --bert_config_file=multi_cased_L-12_H-768_A-12/bert_config.json \
  --init_checkpoint=multi_cased_L-12_H-768_A-12/bert_model.ckpt \
  --do_train=True \
  --train_file=SOQAL/data/Arabic-SQuAD.json \
  --do_predict=True \
  --predict_file=SOQAL/data/arcd.json \
  --train_batch_size=12 \
  --learning_rate=3e-5 \
  --num_train_epochs=2 \
  --max_seq_length=384 \
  --doc_stride=128 \
  --do_lower_case=False \
  --output_dir=arabic_squad_train/


Streaming output truncated to the last 5000 lines.
I0713 15:51:48.364101 139905089730432 tpu_estimator.py:2307] global_step/sec: 1.64119
INFO:tensorflow:examples/sec: 19.6943
I0713 15:51:48.364347 139905089730432 tpu_estimator.py:2308] examples/sec: 19.6943
INFO:tensorflow:global_step/sec: 1.64434
I0713 15:51:48.972222 139905089730432 tpu_estimator.py:2307] global_step/sec: 1.64434
INFO:tensorflow:examples/sec: 19.732
I0713 15:51:48.972463 139905089730432 tpu_estimator.py:2308] examples/sec: 19.732
INFO:tensorflow:global_step/sec: 1.64717
I0713 15:51:49.579364 139905089730432 tpu_estimator.py:2307] global_step/sec: 1.64717
INFO:tensorflow:examples/sec: 19.766
I0713 15:51:49.579620 139905089730432 tpu_estimator.py:2308] examples/sec: 19.766
INFO:tensorflow:global_step/sec: 1.64551
I0713 15:51:50.187120 139905089730432 tpu_estimator.py:2307] global_step/sec: 1.64551
INFO:tensorflow:examples/sec: 19.7461
I0713 15:51:50.187369 139905089730432 tpu_estimator.py:2308] examples/s
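
The `--max_seq_length` and `--doc_stride` flags above control how passages that are too long for one input are cut into overlapping windows. The sketch below illustrates that idea in isolation. It is a simplified illustration, not the code in `run_squad.py`; the function name `split_into_windows` and the token counts in the example are hypothetical.

```python
def split_into_windows(num_doc_tokens, max_tokens_for_doc, doc_stride):
    """Return (start, length) pairs for overlapping windows over a passage.

    Hypothetical helper for illustration: each window holds at most
    max_tokens_for_doc tokens and the next window starts doc_stride
    tokens later, so neighbouring windows overlap.
    """
    windows = []
    start = 0
    while start < num_doc_tokens:
        length = min(num_doc_tokens - start, max_tokens_for_doc)
        windows.append((start, length))
        if start + length == num_doc_tokens:
            break  # the last window reached the end of the passage
        start += min(length, doc_stride)
    return windows


# Assumed example: a 500-token passage and a 20-token question.
# With max_seq_length=384, the room left for the passage is taken here to be
# 384 - 20 - 3 (the 3 accounts for the [CLS] and two [SEP] tokens).
max_tokens_for_doc = 384 - 20 - 3
for start, length in split_into_windows(500, max_tokens_for_doc, doc_stride=128):
    print(start, length)
```

A smaller `doc_stride` gives more overlap between windows, at the cost of more windows per passage.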

## Evaluate BERT on ARCD

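Get predictions from BERT on the Arabic Reading Comprehension Dataset (ARCD). With `--do_train=False`, the command skips training and only writes predictions to `arabic_squad_train/`.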

In [None]:
!python SOQAL/bert/run_squad.py \
  --vocab_file=multi_cased_L-12_H-768_A-12/vocab.txt \
  --bert_config_file=multi_cased_L-12_H-768_A-12/bert_config.json \
  --init_checkpoint=multi_cased_L-12_H-768_A-12/bert_model.ckpt \
  --do_train=False \
  --train_file=SOQAL/data/Arabic-SQuAD.json \
  --do_predict=True \
  --predict_file=SOQAL/data/arcd.json \
  --train_batch_size=12 \
  --learning_rate=3e-5 \
  --num_train_epochs=2 \
  --max_seq_length=384 \
  --doc_stride=128 \
  --do_lower_case=False \
  --output_dir=arabic_squad_train/





W0713 16:29:50.679621 140453795653504 module_wrapper.py:139] From SOQAL/bert/run_squad.py:1123: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.


W0713 16:29:50.680025 140453795653504 module_wrapper.py:139] From SOQAL/bert/run_squad.py:1123: The name tf.logging.INFO is deprecated. Please use tf.compat.v1.logging.INFO instead.


W0713 16:29:50.680216 140453795653504 module_wrapper.py:139] From /content/SOQAL/bert/modeling.py:92: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.


W0713 16:29:50.680938 140453795653504 module_wrapper.py:139] From SOQAL/bert/run_squad.py:1129: The name tf.gfile.MakeDirs is deprecated. Please use tf.io.gfile.makedirs instead.

The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://githu
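
Before computing metrics, it can help to look at a few predictions. The sketch below assumes that `predictions.json` is a single JSON object mapping each question id to its predicted answer text, which is how the evaluation code in this notebook reads it. The helper names `load_predictions` and `preview` are hypothetical.

```python
import json


def load_predictions(path):
    """Load a predictions file (assumed format: {question_id: answer_text})."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def preview(predictions, n=3):
    """Return the first n (question_id, answer_text) pairs."""
    return list(predictions.items())[:n]
```

For example, `preview(load_predictions("arabic_squad_train/predictions.json"))` returns the first three id and answer pairs once the prediction run has finished.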

Compute the EM, F1, and SM metrics on ARCD (SM is reported below as `exact_sentence`). The code is based on the official SQuAD 1.1 evaluation script.
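
To make the F1 metric concrete, here is a minimal sketch of token-overlap F1 on whitespace-separated tokens. It deliberately leaves out the answer normalization (lowercasing, punctuation and article removal) that the evaluation cell applies, so its values can differ from the reported ones. The names `token_f1` and `best_over_references` and the example strings are hypothetical.

```python
from collections import Counter


def token_f1(prediction, ground_truth):
    """Token-level F1 between two strings, without any normalization."""
    pred_tokens = prediction.split()
    truth_tokens = ground_truth.split()
    # Multiset intersection: each shared token counts as often as it
    # appears in both strings.
    common = Counter(pred_tokens) & Counter(truth_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)


def best_over_references(metric_fn, prediction, ground_truths):
    """Score a prediction against every reference answer and keep the best."""
    return max(metric_fn(prediction, gt) for gt in ground_truths)


# 2 shared tokens ("old", "city") out of 4 predicted and 4 reference tokens:
# precision = 0.5, recall = 0.5, F1 = 0.5
print(token_f1("in the old city", "old city of Beirut"))
```

Exact match (EM) is the stricter counterpart: it scores 1 only when the normalized prediction equals a normalized reference answer.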

In [None]:
from __future__ import print_function
from collections import Counter
import string
import re
import argparse
import json
import sys
import nltk
import random
nltk.download('punkt')
from random import randint

def arabic_clean_str(text):
    '''
    Normalize an Arabic string. Currently not used in evaluation, but should be in the future.
    '''
    search = ["أ", "إ", "آ", "ة", "_", "-", "/", ".", "،", " و ", " يا ", '"', "ـ", "'", "ى", "\\", '\n', '\t',
              '&quot;', '?', '؟', '!']
    replace = ["ا", "ا", "ا", "ه", " ", " ", "", "", "", " و", " يا", "", "", "", "ي", "", ' ', ' ', ' ', ' ? ', ' ؟ ',
               ' ! ']

    # remove tashkeel
    p_tashkeel = re.compile(r'[\u0617-\u061A\u064B-\u0652]')
    text = re.sub(p_tashkeel, "", text)

    # collapse elongation: runs of a repeated character are cut to two characters
    p_longation = re.compile(r'(.)\1+')
    subst = r"\1\1"
    text = re.sub(p_longation, subst, text)

    text = text.replace('وو', 'و')
    text = text.replace('يي', 'ي')
    text = text.replace('اا', 'ا')

    for i in range(0, len(search)):
        text = text.replace(search[i], replace[i])

    # trim
    text = text.strip()
    return text


def normalize_answer(s):
    """Lower text and remove punctuation, articles and extra whitespace."""

    def remove_articles(text):
        return re.sub(r'\b(a|an|the)\b', ' ', text)
    
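    def remove_articles_ar(text):
        # Raw string avoids an invalid-escape warning for \s. The first
        # alternative can never match (^ only matches at the start of the
        # string), so this effectively replaces every "ال" with a space.
        return re.sub(r'\sال^|ال', ' ', text)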

    def white_space_fix(text):
        return ' '.join(text.split())

    def remove_punc(text):
        exclude = set(string.punctuation)
        return ''.join(ch for ch in text if ch not in exclude)

    def lower(text):
        return text.lower()

    return white_space_fix(remove_articles_ar(remove_articles(remove_punc(lower(s)))))


def f1_score(prediction, ground_truth):
    prediction_tokens = normalize_answer(prediction).split()
    ground_truth_tokens = normalize_answer(ground_truth).split()
    common = Counter(prediction_tokens) & Counter(ground_truth_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0
    precision = 1.0 * num_same / len(prediction_tokens)
    recall = 1.0 * num_same / len(ground_truth_tokens)
    f1 = (2 * precision * recall) / (precision + recall)
    return f1


def exact_match_score(prediction, ground_truth):
    return (normalize_answer(prediction) == normalize_answer(ground_truth))


def metric_max_over_ground_truths(metric_fn, prediction, ground_truths):
    scores_for_ground_truths = []
    for ground_truth in ground_truths:
        score = metric_fn(prediction, ground_truth)
        scores_for_ground_truths.append(score)
    return max(scores_for_ground_truths)


def evaluate(dataset, predictions):
    f1 = exact_match = total = exact_sentence = inclusion = random = 0
    for article in dataset:
        for paragraph in article['paragraphs']:
            for qa in paragraph['qas']:
                total += 1
                if qa['id'] not in predictions:
                    message = 'Unanswered question ' + qa['id'] + \
                              ' will receive score 0.'
                    print(message, file=sys.stderr)
                    continue
                ground_truths = list(map(lambda x: x['text'], qa['answers']))
                prediction = predictions[qa['id']]
                sents = nltk.sent_tokenize(paragraph['context'])
                indx_g = -1
                indx_p = -1
                i = 0
                for sent in sents:
                    if sent.find(ground_truths[0]) != -1:
                        indx_g = i
                    if sent.find(prediction) != -1:
                        indx_p = i
                    i += 1
                test = randint(0,i)
                if test == indx_g:
                    random += 1
                # str.find returns -1 when the substring is absent, so both sides need an explicit check
                if prediction.find(ground_truths[0]) != -1 or ground_truths[0].find(prediction) != -1:
                    inclusion += 1
                if indx_g == indx_p and indx_p != -1:
                    exact_sentence += 1
                exact_match += metric_max_over_ground_truths(
                    exact_match_score, prediction, ground_truths)
                f1 += metric_max_over_ground_truths(
                    f1_score, prediction, ground_truths)
    inclusion = inclusion / total
    random = random / total
    exact_sentence = 100 * exact_sentence / total
    exact_match = 100.0 * exact_match / total
    f1 = 100.0 * f1 / total

    return {'exact_match': exact_match, 'f1': f1, 'exact_sentence': exact_sentence}


with open("SOQAL/data/arcd.json") as dataset_file:
  dataset_json = json.load(dataset_file)
  dataset = dataset_json['data']
with open("arabic_squad_train/predictions.json") as prediction_file:
  predictions = json.load(prediction_file)
print(json.dumps(evaluate(dataset, predictions)))




[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
{"exact_match": 20.21505376344086, "f1": 52.098255315053166, "exact_sentence": 91.11111111111111}
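
The `exact_sentence` score counts a prediction as correct when it falls in the same sentence of the context as the first reference answer. The sketch below isolates that check for a single question. To stay dependency-free it uses a naive regex splitter as a stand-in for `nltk.sent_tokenize`, so sentence boundaries may differ from those in the evaluation cell. All function names and the example text are hypothetical.

```python
import re


def naive_sentences(text):
    """Stand-in for nltk.sent_tokenize (an assumption for this sketch):
    split after '.', '!', '?' or the Arabic question mark."""
    return [s for s in re.split(r'(?<=[.!?؟])\s+', text) if s]


def last_sentence_containing(sentences, span):
    """Index of the last sentence that contains span, or -1 if none does."""
    index = -1
    for i, sent in enumerate(sentences):
        if span in sent:
            index = i
    return index


def sentence_match(context, gold, prediction):
    """True when gold and prediction are located in the same sentence."""
    sents = naive_sentences(context)
    idx_gold = last_sentence_containing(sents, gold)
    idx_pred = last_sentence_containing(sents, prediction)
    return idx_gold == idx_pred and idx_pred != -1


context = "Paris is the capital of France. Berlin is the capital of Germany."
print(sentence_match(context, "Berlin", "capital of Germany"))  # same sentence
print(sentence_match(context, "Berlin", "Paris"))  # different sentences
```

Because only the sentence has to agree, this score is far more lenient than exact match, which is consistent with the gap between the two numbers reported above.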
