Thank You & Reproducing Baselines #1

Closed
ednussi opened this issue Apr 14, 2021 · 9 comments
Comments


ednussi commented Apr 14, 2021

Thank you very much for posting this code!
It is extremely helpful in reproducing the results.

I wanted to ask whether you can share details about reproducing the baselines reported in the paper, as we have been having some trouble reproducing those numbers, specifically for the roberta-base vanilla experiment.

See here for more details.
Thanks!

oriram (Owner) commented Apr 14, 2021

Hi @ednussi, thanks for expressing interest in our work!
Can you please share the command line you used?
In addition, did you use our mrqa-few-shot splits?

ednussi (Author) commented Apr 18, 2021

Is there a way to reproduce the RoBERTa & SpanBERT baselines reported in Figures 1 and 4 of your paper using this repo? If so, how would I run it?
[Figure 1 from the paper]

So far, I have tried to reproduce it with a very simplified, standalone implementation (see https://github.com/ednussi/thesis_public, as mentioned above), following your paper.

My setup is as follows:
Data: SQuAD (Rajpurkar et al., 2016)
Number of question-answer-context triplets sampled on a logarithmic scale.
Repeated over 5 random seeds; results reported as averages.

Model: RoBERTa (Liu et al., 2019) (HuggingFace implementation)

Fine-tuning setup:
Solver: Adam, default configuration of the HuggingFace Transformers package (Wolf et al., 2020)

Steps: max(10 epochs, 200 steps)
Batch size: 12
Learning rate: 3e-5 for the first 10% of steps (linear warm-up), followed by linear decay (a sketch follows below).
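
For completeness, here is a minimal sketch of the schedule described above, using HuggingFace's get_linear_schedule_with_warmup. This is not the repo's exact code, and the num_examples / batch_size values are just illustrative:

# Sketch: AdamW at a peak LR of 3e-5, 10% linear warm-up, then linear decay,
# with max(10 epochs, 200 steps) total training steps.
import math
import torch
from transformers import AutoModelForQuestionAnswering, get_linear_schedule_with_warmup

num_examples, batch_size = 256, 12                 # illustrative values
steps_per_epoch = math.ceil(num_examples / batch_size)
total_steps = max(10 * steps_per_epoch, 200)       # max(10 epochs, 200 steps)
warmup_steps = int(0.1 * total_steps)              # warmup_ratio=0.1

model = AutoModelForQuestionAnswering.from_pretrained("roberta-base")
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5, eps=1e-8, weight_decay=0.0)
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=warmup_steps, num_training_steps=total_steps
)
# Inside the training loop, after each batch:
#   loss.backward(); optimizer.step(); scheduler.step(); optimizer.zero_grad()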

As you can see in my repo, my current setup reaches only about 28 average F1 over 256 samples (max/min shown in the shaded areas).

Could you please help me identify whether I have misunderstood something or am missing something in my configuration needed to reproduce the baseline? I have already noticed that my implementation differs from yours in some parts (e.g., the lr_scheduler assignment with AdamW), and I am currently trying to match as many code components as possible.

oriram (Owner) commented Apr 18, 2021

Hi @ednussi,
I do notice some differences (e.g., warm-up, max answer length).
However, in order to reproduce our results, I would recommend using our repo.
See, for example, this script.

ednussi (Author) commented Apr 18, 2021

Thanks, I switched to trying to reproduce with your repo. I cloned it and tried running:
python finetuning/run_mrqa.py --model_type=bert --model_name_or_path=$MODEL --qass_head=False --tokenizer_name=$MODEL --output_dir=$OUTPUT_DIR --train_file="squad/squad-train-seed-42-num-examples-16.jsonl" --predict_file="squad/dev.jsonl" --do_train --do_eval --cache_dir=.cache --max_seq_length=384 --doc_stride=128 --threads=4 --save_steps=50000 --per_gpu_train_batch_size=12 --per_gpu_eval_batch_size=16 --learning_rate=3e-5 --max_answer_length=10 --warmup_ratio=0.1 --min_steps=200 --num_train_epochs=10 --seed=42 --use_cache=False --evaluate_every_epoch=False --overwrite_output_dir

But it fails with:

multiprocessing.pool.RemoteTraceback:

Traceback (most recent call last):
  File "D:\Users\erann\miniconda3\envs\thesis-conda-env\lib\multiprocessing\pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "D:\Users\erann\miniconda3\envs\thesis-conda-env\lib\multiprocessing\pool.py", line 48, in mapstar
    return list(map(*args))
  File "D:\Users\erann\miniconda3\envs\thesis-conda-env\lib\site-packages\transformers\data\processors\squad.py", line 181, in squad_convert_example_to_features
    encoded_dict = tokenizer.encode_plus(  # TODO(thom) update this logic
  File "D:\Users\erann\miniconda3\envs\thesis-conda-env\lib\site-packages\transformers\tokenization_utils_base.py", line 2344, in encode_plus
    return self._encode_plus(
  File "D:\Users\erann\miniconda3\envs\thesis-conda-env\lib\site-packages\transformers\tokenization_utils_fast.py", line 458, in _encode_plus
    batched_output = self._batch_encode_plus(
  File "D:\Users\erann\miniconda3\envs\thesis-conda-env\lib\site-packages\transformers\tokenization_utils_fast.py", line 385, in _batch_encode_plus
    encodings = self._tokenizer.encode_batch(
TypeError: TextInputSequence must be str

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "finetuning/run_mrqa.py", line 985, in <module>
    main()
  File "finetuning/run_mrqa.py", line 902, in main
    train_dataset = load_and_cache_examples(args, tokenizer, evaluate=False, output_examples=False,
  File "finetuning/run_mrqa.py", line 536, in load_and_cache_examples
    features, dataset = squad_convert_examples_to_features(
  File "D:\Users\erann\miniconda3\envs\thesis-conda-env\lib\site-packages\transformers\data\processors\squad.py", line 377, in squad_convert_examples_to_features
    features = list(
  File "D:\Users\erann\miniconda3\envs\thesis-conda-env\lib\site-packages\tqdm\std.py", line 1133, in __iter__
    for obj in iterable:
  File "D:\Users\erann\miniconda3\envs\thesis-conda-env\lib\multiprocessing\pool.py", line 420, in <genexpr>
    return (item for chunk in result for item in chunk)
  File "D:\Users\erann\miniconda3\envs\thesis-conda-env\lib\multiprocessing\pool.py", line 868, in next
    raise value
TypeError: TextInputSequence must be str

Full log preceding the failure:

2021-04-18 13:30:33.485893: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
04/18/2021 13:30:35 - WARNING - __main__ -   Process rank: -1, device: cuda, n_gpu: 1, distributed training: False, 16-bits training: False
Some weights of BertForQuestionAnswering were not initialized from the model checkpoint at SpanBERT/spanbert-base-cased and are newly initialized: ['qa_outputs.weight', 'qa_outputs.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
04/18/2021 13:30:38 - INFO - __main__ -   Training/evaluation parameters Namespace(adam_epsilon=1e-08, cache_dir='.cache', config_name='', data_dir=None, dataset='squad', dataset_format='mrqa', device=device(type='cuda'), disable_segments_embeddings=False, do_eval=True, do_lower_case=False, do_train=True, doc_stride=128, dont_output_nbest=False, eval_all_checkpoints=False, eval_steps=5000, evaluate_during_training=False, evaluate_every_epoch=False, fp16=False, fp16_opt_level='O1', gradient_accumulation_steps=1, initialize_new_qass=True, lang_id=0, learning_rate=3e-05, local_rank=-1, logging_steps=500, max_answer_length=10, max_grad_norm=1.0, max_query_length=64, max_seq_length=384, max_steps=-1, min_steps=200, model_name_or_path='SpanBERT/spanbert-base-cased', model_type='bert', n_best_size=20, n_gpu=1, nbest_calculation=False, no_cuda=False, null_score_diff_threshold=0.0, num_train_epochs=10.0, output_dir='output', overwrite_cache=False, overwrite_output_dir=True, per_gpu_eval_batch_size=16, per_gpu_train_batch_size=12, predict_file='../mrqa-few-shot/squad/dev.jsonl', qass_head=False, save_steps=50000, seed=42, server_ip='', server_port='', threads=4, tokenizer_name='SpanBERT/spanbert-base-cased', train_file='squad/squad-train-seed-42-num-examples-16.jsonl', use_cache=False, verbose_logging=False, version_2_with_negative=False, warmup_ratio=0.1, weight_decay=0.0)
04/18/2021 13:30:38 - INFO - __main__ -   Creating features from dataset file at .
{"header": {"dataset": "SQuAD", "split": "train"}}

100%|████████████████████████████████████████████████████████████████████████████| 16/16 [00:00<00:00, 7973.96it/s]
2021-04-18 13:30:39.922294: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-04-18 13:30:42.852378: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-04-18 13:30:45.768860: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-04-18 13:30:48.681627: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
convert squad examples to features:   0%|                                                   | 0/16 [00:00<?, ?it/s]

oriram (Owner) commented Apr 18, 2021

Are you using our finetuning/requirements.txt file?

ednussi (Author) commented Apr 18, 2021

Good point. Recreating a fresh venv resolved my issue, and the fine-tuning is now running.
Thanks for your help; I look forward to reporting that I was able to reproduce the results.
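
For anyone hitting the same TypeError, a quick sanity check (just a sketch, not something from the repo) is to compare the installed library versions against the ones pinned in finetuning/requirements.txt:

# Print installed versions to compare against finetuning/requirements.txt;
# in my case the error went away once the environment matched the pinned requirements.
import transformers
import tokenizers

print("transformers:", transformers.__version__)
print("tokenizers:", tokenizers.__version__)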

ednussi closed this as completed Apr 18, 2021
ednussi reopened this Apr 21, 2021
ednussi (Author) commented Apr 21, 2021

Just quickly reopening to share that I was able to reproduce the roberta-base results on a single RTX 2080 by running this shell script:

export MODEL="roberta-base"
export OUTPUT_DIR="output"
for i in 64 128 256
do
  for j in 42 43 44 45 46
  do
    echo "Loop $i-$j"
    python run_mrqa.py --model_type=$MODEL --model_name_or_path=$MODEL --qass_head=False --tokenizer_name=$MODEL --output_dir="output$i-$j" --train_file="splinter/squad/squad-train-seed-$j-num-examples-$i.jsonl" --predict_file="splinter/squad/dev.jsonl" --do_train --do_eval --cache_dir=.cache --max_seq_length=384 --doc_stride=128 --threads=4 --save_steps=50000 --per_gpu_train_batch_size=12 --per_gpu_eval_batch_size=12 --learning_rate=3e-5 --max_answer_length=10 --warmup_ratio=0.1 --min_steps=200 --num_train_epochs=10 --seed=$j --use_cache=False --evaluate_every_epoch=False
  done
done
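
In case it is useful, a minimal sketch of averaging the per-seed F1 scores for each training-set size; the eval_results.txt file name and the "f1 = ..." line format are assumptions for illustration, not necessarily what run_mrqa.py writes:

# Average F1 over the 5 seeds for each number of training examples.
import re
from pathlib import Path
from statistics import mean

sizes = [64, 128, 256]
seeds = [42, 43, 44, 45, 46]

for size in sizes:
    f1s = []
    for seed in seeds:
        results_file = Path(f"output{size}-{seed}") / "eval_results.txt"  # hypothetical file name
        if not results_file.exists():
            continue
        match = re.search(r"f1\s*=\s*([\d.]+)", results_file.read_text())
        if match:
            f1s.append(float(match.group(1)))
    if f1s:
        print(f"{size} examples: mean F1 = {mean(f1s):.2f} over {len(f1s)} seeds")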

Thanks for all the help!

ednussi (Author) commented Apr 21, 2021

Lastly, since you have not provided a license, I wanted to kindly ask whether I may use the code in my research.
Specifically, I'd like to use it for academic purposes only, as a base for my M.Sc. CS thesis work at HUJI.
I rarely come across code this clear and of this quality in academia, and it will potentially speed up my research substantially by sparing me from reimplementing submodules by hand.

oriram (Owner) commented Apr 21, 2021

Happy to see you managed to reproduce our results!
I added the MIT license to our repo, and of course you're most welcome to use it :)

ednussi closed this as completed Apr 25, 2021