Thank You & Reproducing Baselines #1

Closed
ednussi opened this issue Apr 14, 2021 · 9 comments
Comments


ednussi commented Apr 14, 2021

Thank you very much for posting this code!
It is extremely helpful in reproducing the results.

I wanted to ask whether you can share details about reproducing the baselines reported in the paper, as we have been having some trouble reproducing those numbers, specifically for the roberta-base vanilla experiment.

See here for more details.
Thanks!

oriram (Owner) commented Apr 14, 2021

Hi @ednussi, thanks for expressing interest in our work!
Can you please share the command line you used?
In addition, did you use our mrqa-few-shot splits?

ednussi (Author) commented Apr 18, 2021

Is there a way to reproduce the RoBERTa & SpanBERT baselines reported in Figures 1 and 4 of your paper using this repo? If so, how would I run it?
[Figure 1 from the paper]

So far, I have tried to reproduce it with a very simplified, standalone implementation (see https://github.com/ednussi/thesis_public, as mentioned above), following your paper.

My setup is as follows:
Data: SQuAD (Rajpurkar et al., 2016)
Number of question-answer-context triplets sampled on a logarithmic scale.
Repeated over 5 random seeds; results reported as averages.

Model: RoBERTa (Liu et al., 2019) (HuggingFace implementation)

Fine-tuning setup:
Solver: Adam, default configuration of the HuggingFace Transformers package (Wolf et al., 2020)

Steps: max(10 epochs, 200 steps)
Batch size: 12
Learning rate: 3e-5 for the first 10% of steps (linear warm-up), followed by linear decay (a sketch follows below).
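
For completeness, here is a minimal sketch of the schedule described above, using HuggingFace's get_linear_schedule_with_warmup. This is not the repo's exact code, and the num_examples / batch_size values are just illustrative:

# Sketch: AdamW at a peak LR of 3e-5, 10% linear warm-up, then linear decay,
# with max(10 epochs, 200 steps) total training steps.
import math
import torch
from transformers import AutoModelForQuestionAnswering, get_linear_schedule_with_warmup

num_examples, batch_size = 256, 12                 # illustrative values
steps_per_epoch = math.ceil(num_examples / batch_size)
total_steps = max(10 * steps_per_epoch, 200)       # max(10 epochs, 200 steps)
warmup_steps = int(0.1 * total_steps)              # warmup_ratio=0.1

model = AutoModelForQuestionAnswering.from_pretrained("roberta-base")
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5, eps=1e-8, weight_decay=0.0)
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=warmup_steps, num_training_steps=total_steps
)
# Inside the training loop, after each batch:
#   loss.backward(); optimizer.step(); scheduler.step(); optimizer.zero_grad()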

As you can see in my repo, my current setup reaches only about 28 average F1 over 256 samples (max/min shown in the shaded areas).

Could you please help me identify whether I have misunderstood something or am missing something in my configuration needed to reproduce the baseline? I have already noticed that my implementation differs from yours in some parts (e.g., the lr_scheduler assignment with AdamW), and I am currently trying to match as many code components as possible.

oriram (Owner) commented Apr 18, 2021

Hi @ednussi,
I do notice some differences (e.g., warm-up, max answer length).
However, in order to reproduce our results, I would recommend using our repo.
See, for example, this script.

ednussi (Author) commented Apr 18, 2021

Thanks, I switched to trying to reproduce with your repo. I cloned it and tried running:
python finetuning/run_mrqa.py --model_type=bert --model_name_or_path=$MODEL --qass_head=False --tokenizer_name=$MODEL --output_dir=$OUTPUT_DIR --train_file="squad/squad-train-seed-42-num-examples-16.jsonl" --predict_file="squad/dev.jsonl" --do_train --do_eval --cache_dir=.cache --max_seq_length=384 --doc_stride=128 --threads=4 --save_steps=50000 --per_gpu_train_batch_size=12 --per_gpu_eval_batch_size=16 --learning_rate=3e-5 --max_answer_length=10 --warmup_ratio=0.1 --min_steps=200 --num_train_epochs=10 --seed=42 --use_cache=False --evaluate_every_epoch=False --overwrite_output_dir

But it fails with:

multiprocessing.pool.RemoteTraceback:

Traceback (most recent call last):
  File "D:\Users\erann\miniconda3\envs\thesis-conda-env\lib\multiprocessing\pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "D:\Users\erann\miniconda3\envs\thesis-conda-env\lib\multiprocessing\pool.py", line 48, in mapstar
    return list(map(*args))
  File "D:\Users\erann\miniconda3\envs\thesis-conda-env\lib\site-packages\transformers\data\processors\squad.py", line 181, in squad_convert_example_to_features
    encoded_dict = tokenizer.encode_plus(  # TODO(thom) update this logic
  File "D:\Users\erann\miniconda3\envs\thesis-conda-env\lib\site-packages\transformers\tokenization_utils_base.py", line 2344, in encode_plus
    return self._encode_plus(
  File "D:\Users\erann\miniconda3\envs\thesis-conda-env\lib\site-packages\transformers\tokenization_utils_fast.py", line 458, in _encode_plus
    batched_output = self._batch_encode_plus(
  File "D:\Users\erann\miniconda3\envs\thesis-conda-env\lib\site-packages\transformers\tokenization_utils_fast.py", line 385, in _batch_encode_plus
    encodings = self._tokenizer.encode_batch(
TypeError: TextInputSequence must be str

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "finetuning/run_mrqa.py", line 985, in <module>
    main()
  File "finetuning/run_mrqa.py", line 902, in main
    train_dataset = load_and_cache_examples(args, tokenizer, evaluate=False, output_examples=False,
  File "finetuning/run_mrqa.py", line 536, in load_and_cache_examples
    features, dataset = squad_convert_examples_to_features(
  File "D:\Users\erann\miniconda3\envs\thesis-conda-env\lib\site-packages\transformers\data\processors\squad.py", line 377, in squad_convert_examples_to_features
    features = list(
  File "D:\Users\erann\miniconda3\envs\thesis-conda-env\lib\site-packages\tqdm\std.py", line 1133, in __iter__
    for obj in iterable:
  File "D:\Users\erann\miniconda3\envs\thesis-conda-env\lib\multiprocessing\pool.py", line 420, in <genexpr>
    return (item for chunk in result for item in chunk)
  File "D:\Users\erann\miniconda3\envs\thesis-conda-env\lib\multiprocessing\pool.py", line 868, in next
    raise value
TypeError: TextInputSequence must be str

Full log preceding the failure:

2021-04-18 13:30:33.485893: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
04/18/2021 13:30:35 - WARNING - __main__ -   Process rank: -1, device: cuda, n_gpu: 1, distributed training: False, 16-bits training: False
Some weights of BertForQuestionAnswering were not initialized from the model checkpoint at SpanBERT/spanbert-base-cased and are newly initialized: ['qa_outputs.weight', 'qa_outputs.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
04/18/2021 13:30:38 - INFO - __main__ -   Training/evaluation parameters Namespace(adam_epsilon=1e-08, cache_dir='.cache', config_name='', data_dir=None, dataset='squad', dataset_format='mrqa', device=device(type='cuda'), disable_segments_embeddings=False, do_eval=True, do_lower_case=False, do_train=True, doc_stride=128, dont_output_nbest=False, eval_all_checkpoints=False, eval_steps=5000, evaluate_during_training=False, evaluate_every_epoch=False, fp16=False, fp16_opt_level='O1', gradient_accumulation_steps=1, initialize_new_qass=True, lang_id=0, learning_rate=3e-05, local_rank=-1, logging_steps=500, max_answer_length=10, max_grad_norm=1.0, max_query_length=64, max_seq_length=384, max_steps=-1, min_steps=200, model_name_or_path='SpanBERT/spanbert-base-cased', model_type='bert', n_best_size=20, n_gpu=1, nbest_calculation=False, no_cuda=False, null_score_diff_threshold=0.0, num_train_epochs=10.0, output_dir='output', overwrite_cache=False, overwrite_output_dir=True, per_gpu_eval_batch_size=16, per_gpu_train_batch_size=12, predict_file='../mrqa-few-shot/squad/dev.jsonl', qass_head=False, save_steps=50000, seed=42, server_ip='', server_port='', threads=4, tokenizer_name='SpanBERT/spanbert-base-cased', train_file='squad/squad-train-seed-42-num-examples-16.jsonl', use_cache=False, verbose_logging=False, version_2_with_negative=False, warmup_ratio=0.1, weight_decay=0.0)
04/18/2021 13:30:38 - INFO - __main__ -   Creating features from dataset file at .
{"header": {"dataset": "SQuAD", "split": "train"}}

100%|████████████████████████████████████████████████████████████████████████████| 16/16 [00:00<00:00, 7973.96it/s]
2021-04-18 13:30:39.922294: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-04-18 13:30:42.852378: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-04-18 13:30:45.768860: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-04-18 13:30:48.681627: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
convert squad examples to features:   0%|                                                   | 0/16 [00:00<?, ?it/s]

oriram (Owner) commented Apr 18, 2021

Are you using our finetuning/requirements.txt file?

ednussi (Author) commented Apr 18, 2021

Good point. Recreating a fresh venv resolved my issue, and the fine-tuning is now running.
Thanks for your help; I look forward to reporting that I was able to reproduce the results.
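
For anyone hitting the same TypeError, a quick sanity check (just a sketch, not something from the repo) is to compare the installed library versions against the ones pinned in finetuning/requirements.txt:

# Print installed versions to compare against finetuning/requirements.txt;
# in my case the error went away once the environment matched the pinned requirements.
import transformers
import tokenizers

print("transformers:", transformers.__version__)
print("tokenizers:", tokenizers.__version__)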

ednussi closed this as completed Apr 18, 2021
ednussi reopened this Apr 21, 2021
ednussi (Author) commented Apr 21, 2021

Just quickly reopening to share that I was able to reproduce the roberta-base results on a single RTX 2080 by running this shell script:

export MODEL="roberta-base"
export OUTPUT_DIR="output"
for i in 64 128 256
do
  for j in 42 43 44 45 46
  do
    echo "Loop $i-$j"
    python run_mrqa.py --model_type=$MODEL --model_name_or_path=$MODEL --qass_head=False --tokenizer_name=$MODEL --output_dir="output$i-$j" --train_file="splinter/squad/squad-train-seed-$j-num-examples-$i.jsonl" --predict_file="splinter/squad/dev.jsonl" --do_train --do_eval --cache_dir=.cache --max_seq_length=384 --doc_stride=128 --threads=4 --save_steps=50000 --per_gpu_train_batch_size=12 --per_gpu_eval_batch_size=12 --learning_rate=3e-5 --max_answer_length=10 --warmup_ratio=0.1 --min_steps=200 --num_train_epochs=10 --seed=$j --use_cache=False --evaluate_every_epoch=False
  done
done
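
In case it is useful, a minimal sketch of averaging the per-seed F1 scores for each training-set size; the eval_results.txt file name and the "f1 = ..." line format are assumptions for illustration, not necessarily what run_mrqa.py writes:

# Average F1 over the 5 seeds for each number of training examples.
import re
from pathlib import Path
from statistics import mean

sizes = [64, 128, 256]
seeds = [42, 43, 44, 45, 46]

for size in sizes:
    f1s = []
    for seed in seeds:
        results_file = Path(f"output{size}-{seed}") / "eval_results.txt"  # hypothetical file name
        if not results_file.exists():
            continue
        match = re.search(r"f1\s*=\s*([\d.]+)", results_file.read_text())
        if match:
            f1s.append(float(match.group(1)))
    if f1s:
        print(f"{size} examples: mean F1 = {mean(f1s):.2f} over {len(f1s)} seeds")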

Thanks for all the help!

ednussi (Author) commented Apr 21, 2021

Lastly, since you have not provided a license, I wanted to kindly ask whether I may use the code in my research.
Specifically, I'd like to use it for academic purposes only, as a base for my M.Sc. CS thesis work at HUJI.
I rarely come across code this clear and of this quality in academia, and it will potentially speed up my research substantially by sparing me from reimplementing submodules by hand.

oriram (Owner) commented Apr 21, 2021

Happy to see you managed to reproduce our results!
I added the MIT license to our repo, and of course you're most welcome to use it :)

ednussi closed this as completed Apr 25, 2021