# Code by TCE team at Qur'an QA 2023 shared task B

# Installation

I use [rclone](https://rclone.org/) to access my drive without being asked for permission every time.
The code reads a file called colab4, which holds my drive access token. You can replicate this setup on your side, or skip it altogether and download the files manually.

In [None]:
!lscpu
!nvidia-smi
!free -g

In [None]:
!curl https://rclone.org/install.sh | bash > /dev/null 2>&1

In [None]:
import os
os.environ["TOKENIZERS_PARALLELISM"] = "false"
os.environ["WANDB_DISABLED"] = "true"

In [None]:
!rclone

## Clone repo and prepare the datasets

In [None]:
repo_url = "https://github.com/mohammed-elkomy/TCE-QQA2023-TASK-B"
!git clone $repo_url
%cd TCE-QQA2023-TASK-B
!pip install -r requirements.txt

### Download and create datasets

In [None]:
!git pull

!python data_scripts/download_datasets.py
!python data_scripts/generate/generate_faithful_splits.py
!python data_scripts/generate/qrcd_merge_train_dev.py

In [None]:
!md5sum data/* | sort -k 2
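
Where `md5sum` is not available, the same integrity check can be sketched in Python. This is only an illustration: the helper name `md5_by_file` is hypothetical and not part of the repo, and it assumes the generated datasets sit directly inside `data/`.

```python
import hashlib
from pathlib import Path


def md5_by_file(directory):
    """Return {file name: MD5 hex digest} for the regular files in `directory`.

    Hypothetical helper (not part of the repo); files are visited in
    name order, similar to `md5sum data/* | sort -k 2`.
    """
    digests = {}
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():
            digests[path.name] = hashlib.md5(path.read_bytes()).hexdigest()
    return digests


if __name__ == "__main__":
    data_dir = Path("data")  # assumed location of the generated datasets
    if data_dir.is_dir():
        for name, digest in md5_by_file(data_dir).items():
            print(digest, name)
```

Comparing the printed digests between two machines shows whether both ended up with identical dataset files.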

### Download pretrained models
Download these files from drive or Hugging Face:
1. araelectra-base-discriminator-2703-arabic_tydiqa
2. bert-base-arabertv02-440-arabic_tydiqa
3. bert-base-arabic-camelbert-ca-78123-arabic_tydiqa

## Fine-tuning

Make sure to run this notebook in Colab so that the experiment options are shown as an interactive form.

* There are 6 different models to choose from, listed in the `model_name` dropdown below.
* Set the number of models to train; we train 20 models to get average performance.
* Choose the experiment mode:

    1. QQA23_TaskB_qrcd_v1.2 ➡ normal training on the official training data, with validation on the official validation data.
    2. my-faithful-processed ➡ combines the training and validation data to create faithful splits that address leakage (check the paper for more).
    3. QQA23_TaskB_qrcd_v1.2_merged ➡ trains on the combined training and validation data and runs inference on the hidden split (done for the testing phase).

* Choose the loss type, which controls how multi-answer questions are used during training:
    1. first ➡ use only the first answer for multi-answer questions.
    2. MAL ➡ train the model to jointly optimize on **all** answers for multi-answer samples.
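
The difference between the two loss types can be illustrated with a toy span-extraction loss. The sketch below is an assumption and not the repo's implementation: it models `first` as the negative log-likelihood of the first gold span, and models a multi-answer loss as the negative log of the total probability assigned to all gold start (and end) positions. The actual MAL formulation used in `runners/run_qa.py` may differ, and the function names here are hypothetical.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def first_answer_loss(start_logits, end_logits, answers):
    """NLL of the first gold (start, end) span only."""
    start, end = answers[0]
    return -(log_softmax(start_logits)[start] + log_softmax(end_logits)[end]) / 2


def multi_answer_loss(start_logits, end_logits, answers):
    """Assumed multi-answer variant: negative log of the summed probability
    of every gold start position, averaged with the same term for ends."""
    log_p_start = log_softmax(start_logits)
    log_p_end = log_softmax(end_logits)
    starts = {s for s, _ in answers}  # de-duplicate shared positions
    ends = {e for _, e in answers}
    start_ll = math.log(sum(math.exp(log_p_start[s]) for s in starts))
    end_ll = math.log(sum(math.exp(log_p_end[e]) for e in ends))
    return -(start_ll + end_ll) / 2
```

With a single gold answer the two functions return the same value. With several gold answers, the multi-answer variant no longer penalizes probability mass placed on the second or later answers.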

---

**Once training finishes, you will find a dump file saved!**

It is named something like: bert-base-arabertv02-fine-tuned-2e-05-first-827-QQA23_TaskB_qrcd_v1.2_train.zip
This is a bert-base-arabertv02 fine-tuned model with:
1. A learning rate of 2e-05.
2. The first loss type.
3. A random starting seed of 827.
4. QQA23_TaskB_qrcd_v1.2_train as the training data.

This dump file contains all summary results and the model predictions for each sample.
See the **analysis** directory of the repo for more details.
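
The naming scheme above can be unpacked programmatically, for example to tabulate runs by seed or loss type. The following is a minimal sketch that assumes every dump follows the pattern of the example name (model, `-fine-tuned-`, learning rate, loss type, an optional `-linear` marker, seed, training data). The function `parse_dump_name` and the regular expression are hypothetical and not part of the repo.

```python
import re

# Assumed pattern, inferred from the example dump name only.
DUMP_NAME = re.compile(
    r"^(?P<model>.+)-fine-tuned-(?P<lr>\d+(?:\.\d+)?e-?\d+)"
    r"-(?P<loss>first|MAL)(?P<linear>-linear)?"
    r"-(?P<seed>\d+)-(?P<train_data>.+)\.zip$"
)


def parse_dump_name(file_name):
    """Split a dump file name into its experiment settings."""
    match = DUMP_NAME.match(file_name)
    if match is None:
        raise ValueError(f"unrecognized dump file name: {file_name}")
    fields = match.groupdict()
    return {
        "model": fields["model"],
        "learning_rate": float(fields["lr"]),
        "loss_type": fields["loss"],
        "linear_decoder": fields["linear"] is not None,
        "seed": int(fields["seed"]),
        "train_data": fields["train_data"],
    }


if __name__ == "__main__":
    example = (
        "bert-base-arabertv02-fine-tuned-2e-05-first-827-"
        "QQA23_TaskB_qrcd_v1.2_train.zip"
    )
    print(parse_dump_name(example))
```

Names that do not fit the assumed pattern raise a `ValueError` rather than being silently misparsed.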




In [None]:
import os
from random import choice
import glob

model_name = "aubmindlab/araelectra-base-discriminator"  # @param ["bert-base-arabic-camelbert-ca-78123-arabic_tydiqa", "araelectra-base-discriminator-2703-arabic_tydiqa", "bert-base-arabertv02-440-arabic_tydiqa","------", "aubmindlab/bert-base-arabertv02", "CAMeL-Lab/bert-base-arabic-camelbert-ca", "aubmindlab/araelectra-base-discriminator" ]

num_models = 19 # @param {type:"integer"}

experiment_mode = "my-faithful-processed"  # @param ["QQA23_TaskB_qrcd_v1.2", "QQA23_TaskB_qrcd_v1.2_merged", "my-faithful-processed"]
loss_type = "MAL"  # @param ['first', 'MAL']

lr = "2e-5"  # @param ["2e-5","1e-5","5e-6","1e-6"]

pairwise_decoder = True # insensitive parameter (linear vs all-pairs decoding)

for idx in range(num_models):
    out_file = f"{idx}-out.txt"
    err_file = f"{idx}-err.txt"
    if experiment_mode == "my-faithful-processed":
        train_file = choice(glob.glob("data/my-faithful-processed*_train*"))
        train_file = os.path.split(train_file)[-1]
        validation_file = train_file.replace("_train","_dev")
    elif experiment_mode == "QQA23_TaskB_qrcd_v1.2_merged":
        train_file = "QQA23_TaskB_qrcd_v1.2_merged_preprocessed.jsonl"
        validation_file = None
    elif experiment_mode == "QQA23_TaskB_qrcd_v1.2":
        train_file = experiment_mode+ "_train_preprocessed.jsonl"
        validation_file = experiment_mode+ "_dev_preprocessed.jsonl"
    else:
        train_file = experiment_mode+ "_train.jsonl"
        validation_file = experiment_mode+ "_dev.jsonl"

    output_folder = os.path.split(model_name)[-1] + f"-fine-tuned-{float(lr)}" + "-" +loss_type
    if not pairwise_decoder:
        output_folder += "-linear"

    batch_size = 8 if "large" in model_name else 16

    print(train_file)
    print(validation_file)
    train_file = os.path.join("../../data",train_file)

    !git pull
    !rm -rf $output_folder

    if validation_file:
        validation_file = os.path.join("../../data",validation_file)
        !python "runners/run_qa.py" \
            --model_name_or_path  $model_name \
            --dataset "data_scripts/loader_scripts/qrcd_v1_2_dataset_loader.py" \
            --do_train \
            --do_eval \
            --do_predict \
            --per_device_train_batch_size $batch_size \
            --learning_rate $lr \
            --num_train_epochs 50 \
            --max_seq_length 384 \
            --doc_stride 128 \
            --max_answer_length 35 \
            --output_dir $output_folder \
            --overwrite_output_dir  \
            --overwrite_cache \
            --train_file $train_file \
            --validation_file $validation_file \
            --test_file  $validation_file \
            --save_total_limit 2 \
            --save_strategy "epoch" \
            --eval_steps 3 \
            --eval_metric "metrics/QQA23_metric.py" \
            --evaluation_strategy "epoch" \
            --metric_for_best_model 'eval_pAP@10' \
            --load_best_model_at_end  True \
            --greater_is_better True \
            --pairwise_decoder $pairwise_decoder \
            --loss_type  $loss_type  >$out_file 2> $err_file
    else:
        validation_file = "../../data/QQA23_TaskB_qrcd_v1.2_dev_preprocessed.jsonl"
        # competition test phase
        !python "runners/run_qa.py" \
            --model_name_or_path  $model_name \
            --dataset "data_scripts/loader_scripts/qrcd_v1_2_dataset_loader.py" \
            --do_train \
            --do_predict \
            --per_device_train_batch_size $batch_size \
            --learning_rate $lr \
            --num_train_epochs 15 \
            --max_seq_length 384 \
            --doc_stride 128 \
            --max_answer_length 35 \
            --output_dir $output_folder \
            --overwrite_output_dir  \
            --overwrite_cache \
            --train_file $train_file \
            --test_file  $validation_file \
            --save_total_limit 2 \
            --save_strategy "epoch" \
            --eval_metric "metrics/QQA23_metric.py" \
            --disable_early_stopping True \
            --pairwise_decoder $pairwise_decoder \
            --loss_type  $loss_type


# Analysis and ensemble

Each training run saves a dump file, named as described in the Fine-tuning section above, that contains all summary results and the model predictions for each sample.
See the **analysis** directory of the repo for more details.
You can group dump files into folders and then:
1. Run the **performance_analysis.py** script.
2. Run **ensemble_analysis.py** to get ensemble results from the output of **performance_analysis.py**.
3. Run **generate_reports.py** to get aggregated Excel results and kernel density plots for the different runs.
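
Grouping the dump files into folders can be scripted. The sketch below puts all dumps that share a model, learning rate, and loss type into one folder, so that runs differing only in seed end up together. Both the grouping key and the helper names (`group_dumps`, `move_into_folders`) are assumptions made for illustration. The folder layout the analysis scripts actually expect should be checked in the **analysis** directory.

```python
import re
import shutil
from collections import defaultdict
from pathlib import Path

# Assumed name layout: <model>-fine-tuned-<lr>-<loss>[-linear]-<seed>-<data>.zip
CONFIG_PREFIX = re.compile(
    r"^(?P<config>.+-fine-tuned-\d+(?:\.\d+)?e-?\d+-(?:first|MAL)(?:-linear)?)"
    r"-\d+-.+\.zip$"
)


def group_dumps(file_names):
    """Map each configuration prefix to the dump names that share it."""
    groups = defaultdict(list)
    for name in file_names:
        match = CONFIG_PREFIX.match(name)
        key = match.group("config") if match else "unrecognized"
        groups[key].append(name)
    return dict(groups)


def move_into_folders(dump_dir):
    """Move every .zip dump in `dump_dir` into a folder named after its group."""
    dump_dir = Path(dump_dir)
    names = sorted(p.name for p in dump_dir.glob("*.zip"))
    for folder, members in group_dumps(names).items():
        target = dump_dir / folder
        target.mkdir(exist_ok=True)
        for name in members:
            shutil.move(str(dump_dir / name), str(target / name))
```

Calling `move_into_folders` on the directory that holds the `.zip` dumps leaves one folder per configuration, each containing the runs for its different seeds.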


In [None]:
!python analysis/performance_analysis.py

In [None]:
!python analysis/ensemble_analysis.py

In [None]:
!python analysis/generate_reports.py