## LMQG Model Development

<b>Model</b>
<br>T5 Small / BART Base / T5 Base / T5 Large / BART Large
<br>Gold QA (i.e., the model using the human-labeled gold annotations)

<b>QAG approaches</b>
<br>End2End / MultiTask / Pipeline

<b>Dataset</b>
<br>SQuADShifts (Amazon / Wiki / NYT / Reddit)

<b>Experiment</b>
<br>All / Amazon / Wiki / NYT / Reddit

In [None]:
%pip install lmqg
%pip install peft
%pip install easydict
%pip install termcolor
%pip install openpyxl
%pip install ipywidgets
!python -m spacy download en_core_web_sm

In [1]:
import os
import json
import torch
from pprint import pprint
from termcolor import colored
from lmqg import TransformersQG

In [2]:
print(torch.cuda.is_available())
print(torch.cuda.device_count())
print(torch.cuda.get_device_name(torch.cuda.current_device()))

True
2
NVIDIA GeForce RTX 4090


In [3]:
pwd

'/'

In [4]:
cd workspace

/workspace


In [5]:
import torch
torch.cuda.empty_cache()

In [6]:
import torch, gc
gc.collect()
torch.cuda.empty_cache()

In [7]:
import wandb

wandb.login()

Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.
[34m[1mwandb[0m: Currently logged in as: [33mminseok0809[0m. Use [1m`wandb login --relogin`[0m to force relogin


True

## Flan T5 Small Model

### Original Model (SQuAD)

In [None]:
!python -m lm-question-generation.lmqg.grid_searcher_flan_t5_small_squad_qg

In [14]:
wandb.finish()

### LoRA Model (SQuAD)

In [None]:
!python -m lm-question-generation.lmqg.grid_searcher_flan_t5_small_squad_qg_lora_r8_al16

In [None]:
!python -m lm-question-generation.lmqg.grid_searcher_flan_t5_small_squad_qg_lora_r16_al16

In [None]:
!python -m lm-question-generation.lmqg.grid_searcher_flan_t5_small_squad_qg_lora_r32_al16

In [None]:
!python -m lm-question-generation.lmqg.grid_searcher_flan_t5_small_squad_qg_lora_r64_al16

In [None]:
!python -m lm-question-generation.lmqg.grid_searcher_flan_t5_small_squad_qg_lora_r128_al16

In [None]:
!python -m lm-question-generation.lmqg.grid_searcher_flan_t5_small_squad_qg_lora_r8_al32

In [None]:
!python -m lm-question-generation.lmqg.grid_searcher_flan_t5_small_squad_qg_lora_r16_al32

In [None]:
!python -m lm-question-generation.lmqg.grid_searcher_flan_t5_small_squad_qg_lora_r32_al32

In [None]:
!python -m lm-question-generation.lmqg.grid_searcher_flan_t5_small_squad_qg_lora_r64_al32

In [None]:
!python -m lm-question-generation.lmqg.grid_searcher_flan_t5_small_squad_qg_lora_r128_al32

In [14]:
wandb.finish()
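
The ten LoRA cells above differ only in rank (`r`) and alpha (`al`). As a sketch, the same module names can be generated in a loop. This assumes every module follows the naming pattern used above, and it only prints the commands rather than running them.

```python
from itertools import product

ranks = [8, 16, 32, 64, 128]
alphas = [16, 32]

# Prefix shared by the LoRA grid-search modules invoked in the cells above.
module_prefix = "lm-question-generation.lmqg.grid_searcher_flan_t5_small_squad_qg_lora"

# Alpha is the outer loop so the order matches the cells above.
commands = [
    f"python -m {module_prefix}_r{r}_al{alpha}"
    for alpha, r in product(alphas, ranks)
]

for command in commands:
    print(command)
```

To launch the runs from Python instead of printing them, each command could be passed to `subprocess.run(command.split(), check=True)`.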

## Flan T5 Large Model

### Original Model (SQuAD)

In [None]:
!python -m lm-question-generation.lmqg.grid_searcher_flan_t5_large_squad_qg

In [None]:
wandb.finish()

### LoRA Model (SQuAD)

In [8]:
!python -m lm-question-generation.lmqg.grid_searcher_flan_t5_large_squad_qg_lora_r8_al32_training

[34m[1mwandb[0m: Currently logged in as: [33mminseok0809[0m. Use [1m`wandb login --relogin`[0m to force relogin
[34m[1mwandb[0m: wandb version 0.15.9 is available!  To upgrade, please run:
[34m[1mwandb[0m:  $ pip install wandb --upgrade
[34m[1mwandb[0m: Tracking run with wandb version 0.15.8
[34m[1mwandb[0m: Run data is saved locally in [35m[1m/workspace/wandb/run-20230831_084137-r3cvbrl4[0m
[34m[1mwandb[0m: Run [1m`wandb offline`[0m to turn off syncing.
[34m[1mwandb[0m: Syncing run [33mflan_t5_large_squad_qg_lora_r8_al32[0m
[34m[1mwandb[0m: ⭐️ View project at [34m[4mhttps://wandb.ai/minseok0809/lmqg_qg_squad[0m
[34m[1mwandb[0m: 🚀 View run at [34m[4mhttps://wandb.ai/minseok0809/lmqg_qg_squad/runs/r3cvbrl4[0m


trainable model parameters: 2359296
all model parameters: 785454080
percentage of trainable model parameters: 0.30% 



Epoch 1:   9%|██▌                        | 1762/18792 [06:25<1:02:52,  4.51it/s]
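
The trainable-parameter count in the log above is consistent with rank-8 LoRA adapters on the query and value projections of every attention block. The sketch below reproduces it under assumptions that are not stated in this notebook: Flan-T5 Large has 24 encoder and 24 decoder layers with 1024-dimensional attention projections, and the adapters target `q` and `v` only (the usual PEFT default for T5).

```python
# Assumed architecture values for Flan-T5 Large (not taken from this notebook).
d_model = 1024          # input/output width of each q and v projection
encoder_layers = 24     # one self-attention block each
decoder_layers = 24     # one self-attention and one cross-attention block each
rank = 8                # LoRA rank, from the run name "lora_r8_al32"
targets_per_block = 2   # assumed: adapters on q and v only

attention_blocks = encoder_layers + 2 * decoder_layers
# Each adapted projection adds A (rank x d_model) and B (d_model x rank).
params_per_target = rank * (d_model + d_model)
trainable = attention_blocks * targets_per_block * params_per_target

all_params = 785454080  # "all model parameters" from the log above
percentage = 100 * trainable / all_params

print(trainable)
print(f"{percentage:.2f}%")
```

Under these assumptions the count scales linearly with the rank, so the `r1` runs would train one eighth as many adapter parameters per adapted projection as the `r8` runs.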

In [None]:
!python -m lm-question-generation.lmqg.grid_searcher_flan_t5_large_squad_qg_lora_r8_al32_evaluation

In [None]:
wandb.finish()

## Flan T5 XXL Model

### Original Model (SQuAD)

In [None]:
!python -m lm-question-generation.lmqg.grid_searcher_flan_t5_xxl_squad_qg

In [None]:
wandb.finish()

### LoRA Model (SQuAD)

In [None]:
!python -m lm-question-generation.lmqg.grid_searcher_flan_t5_xxl_squad_qg_lora_r8_al32_training 

In [None]:
!python -m lm-question-generation.lmqg.grid_searcher_flan_t5_xxl_squad_qg_lora_r8_al32_evaluation

In [None]:
!python -m lm-question-generation.lmqg.grid_searcher_flan_t5_xxl_squad_qg_lora_r1_al32_training 

In [None]:
!python -m lm-question-generation.lmqg.grid_searcher_flan_t5_xxl_squad_qg_lora_r1_al32_evaluation

In [None]:
wandb.finish()

## T5 Small Grid Search

In [None]:
from lmqg import GridSearcher

trainer = GridSearcher(
    checkpoint_dir='tmp_ckpt_t5_small_0819',
    dataset_path='lmqg/qg_squad',
    model='t5-small',
    epoch=3,
    epoch_partial=1,
    batch=64,
    n_max_config=5,
    gradient_accumulation_steps=[2], 
    lr=[1e-04],
    label_smoothing=[0, 0.15]
)
trainer.run()

Grid Search
<br>gradient_accumulation_steps=[2, 4], 
<br>lr=[1e-04, 5e-04, 1e-03],
<br>label_smoothing=[0, 0.15]

Number of configurations:
<br>gradient_accumulation_steps * lr * label_smoothing
<br>2 * 3 * 2 = 12
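
The count above can be checked by enumerating the grid directly. This is a minimal sketch in plain Python, independent of `lmqg`; the `grid` dict and variable names are illustrative and not part of the library.

```python
from itertools import product

# Search space copied from the list above.
grid = {
    "gradient_accumulation_steps": [2, 4],
    "lr": [1e-04, 5e-04, 1e-03],
    "label_smoothing": [0, 0.15],
}

# Every combination of one value per hyperparameter is one candidate configuration.
configs = [dict(zip(grid, values)) for values in product(*grid.values())]

print(len(configs))
for config in configs[:3]:
    print(config)
```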

In [58]:
from termcolor import colored

with open('tmp_ckpt_t5_small_0818/best_model/config.json', 'r') as f:
    t5_small_best_model_config = json.load(f)

with open('tmp_ckpt_t5_small_0818/best_model/trainer_config.json', 'r') as f:
    t5_small_best_model_hyperparameter_search = json.load(f)

hyperparameters = ['model', 'epoch', 'batch',
                   'lr', 'gradient_accumulation_steps', 'label_smoothing']
grid_search_hyperparameters = ['lr', 'gradient_accumulation_steps', 'label_smoothing']

print(colored("Best Model of T5 Small", attrs=['bold']))
print()
# Show which checkpoint the best model was loaded from.
for key, value in t5_small_best_model_config.items():
    if '_name_or_path' in key:
        print("{} : {}".format(key, value))

# Show the training hyperparameters, flagging those chosen by the grid search.
for key, value in t5_small_best_model_hyperparameter_search.items():
    if any(name in key for name in hyperparameters):
        if any(name in key for name in grid_search_hyperparameters):
            print("{} : {}  [Grid Search Result]".format(key, value))
        else:
            print("{} : {}".format(key, value))

Best Model of T5 Small

_name_or_path : tmp_ckpt/model_cghqta/epoch_5
model : t5-small
epoch : 12
batch : 64
lr : 0.0005  [Grid Search Result]
gradient_accumulation_steps : 4  [Grid Search Result]
label_smoothing : 0.15  [Grid Search Result]


## T5 Small Best Model

### End2end QAG

In [49]:
# lmqg/t5-small-squad-qag
%env CUDA_VISIBLE_DEVICES=0

!python -m lm-question-generation.lmqg.grid_searcher_t5_small_squad_qag_0821 \
#    --checkpoint_dir='tmp_ckpt_t5_small_squad_qag_0821' \
#    --dataset_path='lmqg/qg_squadshifts' \
#    --dataset_name='all' \
#    --model='t5-small' \
#    --epoch=15 \
#    --epoch_partial=5 \
#    --batch=64 \
#    --n_max_config=5 \
#    --gradient_accumulation_steps=[4] \
#    --lr=[5e-04] \
#    --label_smoothing=[0, 0.15] \
#    --language="en"

  1%|▎                                    | 112/12564 [00:00<00:11, 1119.87it/s]Token indices sequence length is longer than the specified maximum sequence length for this model (539 > 512). Running this sequence through the model will result in indexing errors
100%|████████████████████████████████████| 12564/12564 [00:12<00:00, 987.38it/s]
100%|███████████████████████████████████| 18844/18844 [00:09<00:00, 2063.41it/s]
100%|█████████████████████████████████████| 6283/6283 [00:02<00:00, 2109.38it/s]
Some weights of RobertaModel were not initialized from the model checkpoint at roberta-large and are newly initialized: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
calculating scores...
computing bert embedding.
100%|█████████████████████████████████████████| 375/375 [00:05<00:00, 67.28it/s]
computing greedy matching.
100%|████████████████████████████████████████| 

### Multitask QAG

In [50]:
%env CUDA_VISIBLE_DEVICES=0

!python -m lm-question-generation.lmqg.grid_searcher_t5_small_squad_qg_ae_0821 \
#    --checkpoint_dir='tmp_ckpt_t5_small_squad_qg_ae_0821' \
#    --dataset_path='lmqg/qg_squadshifts' \
#    --dataset_name='all' \
#    --model='t5-small' \
#    --epoch=15 \
#    --epoch_partial=5 \
#    --batch=64 \
#    --n_max_config=5 \
#    --gradient_accumulation_steps=[4] \
#    --lr=[5e-04] \
#    --label_smoothing=[0, 0.15] \
#    --language="en"

Downloading (…)okenizer_config.json: 100%|██| 2.36k/2.36k [00:00<00:00, 478kB/s]
Downloading spiece.model: 100%|██████████████| 792k/792k [00:00<00:00, 1.00MB/s]
Downloading (…)/main/tokenizer.json: 100%|█| 2.42M/2.42M [00:00<00:00, 7.62MB/s]
Downloading (…)in/added_tokens.json: 100%|███| 20.0/20.0 [00:00<00:00, 23.1kB/s]
Downloading (…)cial_tokens_map.json: 100%|█████| 123/123 [00:00<00:00, 71.8kB/s]
Downloading (…)lve/main/config.json: 100%|██| 1.54k/1.54k [00:00<00:00, 336kB/s]
Downloading pytorch_model.bin: 100%|█████████| 242M/242M [00:22<00:00, 10.9MB/s]
  1%|▎                                    | 107/12564 [00:00<00:11, 1065.80it/s]Token indices sequence length is longer than the specified maximum sequence length for this model (539 > 512). Running this sequence through the model will result in indexing errors
100%|████████████████████████████████████| 12564/12564 [00:12<00:00, 982.50it/s]
^C
Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/runpy.py", line 194

### Pipeline QAG

In [None]:
%env CUDA_VISIBLE_DEVICES=0

!python -m lm-question-generation.lmqg.grid_searcher_t5_small_squad_qg_0821 \
#    --checkpoint_dir='tmp_ckpt_t5_small_squad_qg_0821' \
#    --dataset_path='lmqg/t5-small-squad-qg' \
#    --dataset_name='all' \
#    --model='t5-small' \
#    --epoch=15 \
#    --epoch_partial=5 \
#    --batch=64 \
#    --n_max_config=5 \
#    --gradient_accumulation_steps=[4] \
#    --lr=[5e-04] \
#    --label_smoothing=[0, 0.15] \
#    --language="en"

In [None]:
%env CUDA_VISIBLE_DEVICES=0

!python -m lm-question-generation.lmqg.grid_searcher_t5_small_squad_ae_0821 \
#    --checkpoint_dir='tmp_ckpt_t5_small_squad_ae_0821' \
#    --dataset_path='lmqg/t5-small-squad-ae' \
#    --dataset_name='all' \
#    --model='t5-small' \
#    --epoch=15 \
#    --epoch_partial=5 \
#    --batch=64 \
#    --n_max_config=5 \
#    --gradient_accumulation_steps=[4] \
#    --lr=[5e-04] \
#    --label_smoothing=[0, 0.15] \
#    --language="en"

### Source

[lmqg/t5-small-squad-qa](https://huggingface.co/lmqg/t5-small-squad-qa)