[examples] Add trainer support for question-answering #4829
Conversation
Codecov Report
@@ Coverage Diff @@
## master #4829 +/- ##
==========================================
+ Coverage 76.84% 77.84% +1.00%
==========================================
Files 141 142 +1
Lines 24685 24768 +83
==========================================
+ Hits 18969 19281 +312
+ Misses 5716 5487 -229
Continue to review full report at Codecov.
Hi @patil-suraj, I think @julien-c can answer questions regarding the Trainer better :-)
    for key in keys:
        inputs[key] = torch.stack([example[key] for example in batch])

    return inputs
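The quoted diff above is the core of the custom collate function under review. Here is a minimal, dependency-free sketch of the same idea; plain Python lists stand in for `torch.Tensor`s (the real code uses `torch.stack`), and the function name is illustrative:

```python
# Sketch of the per-key batching done by the custom data collator.
# The real implementation stacks torch.Tensors; lists stand in here
# so the example runs without torch.

def collate_batch(batch):
    """Group a list of per-example feature dicts into one dict of batched columns."""
    keys = batch[0].keys()
    inputs = {}
    for key in keys:
        # torch version: inputs[key] = torch.stack([example[key] for example in batch])
        inputs[key] = [example[key] for example in batch]
    return inputs

batch = [
    {"input_ids": [1, 2, 3], "attention_mask": [1, 1, 1]},
    {"input_ids": [4, 5, 6], "attention_mask": [1, 1, 0]},
]
collated = collate_batch(batch)
```

Each key in the output maps to the column of values gathered from every example in the batch.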
pretty clean implementation of data collator. cc @sgugger
Yes, quick question though, why does the default not work here?
(I think it's ~identical to the default one so maybe we can merge them and just use the default here)
The default collator needs `List[InputDataClass]`, but I'm returning `List[Dict[str, torch.Tensor]]`, so I was not able to use the default one.
`InputDataClass` is just a type alias for anything. I think the default should work fine for you (but we can change its implementation to use this).
Note that #5015 will change how data collators work a little bit, and a few names, so we'll need a few adjustments here.
> InputDataClass is just a type alias for anything.

Yes, but I think it assumes that it will be a class, as it uses `getattr` and `vars`. How should I proceed?
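The point being made here can be shown in a few lines: `vars()` and `getattr()` work on attribute-style objects such as dataclass instances, but a plain `dict` has no `__dict__`, so `vars()` raises `TypeError` on it. A small illustration (the `Example` class is a made-up stand-in for an `InputDataClass`-style feature class):

```python
# Why a collator built on vars()/getattr() chokes on plain dicts:
# dict instances carry items, not attributes.
from dataclasses import dataclass

@dataclass
class Example:            # hypothetical stand-in for a feature dataclass
    input_ids: int

feature_obj = Example(input_ids=7)
feature_dict = {"input_ids": 7}

works_on_dataclass = vars(feature_obj)           # attribute dict of the instance
attr_value = getattr(feature_obj, "input_ids")   # attribute access works

try:
    vars(feature_dict)                           # dicts have no __dict__
    dict_has_dunder_dict = True
except TypeError:
    dict_has_dunder_dict = False
```

So a collator that only calls `vars(example)` cannot accept `List[Dict[str, torch.Tensor]]` without an extra branch.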
Maybe we can change the default implementation to accept more input types.
Let me know if you want to do it @sgugger or @patil-suraj
@julien-c can we just add a simple type check in the default collator, i.e. if the input is a `dict` we can call `example.get` instead of `getattr`, or maybe convert the `dict` to a simple class?
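A rough sketch of the type check proposed here: branch on `isinstance(example, dict)` and fall back to `vars()` otherwise. The function names are illustrative, not the actual `transformers` implementation:

```python
# Hypothetical sketch of a dict-aware default collator: accept both plain
# dicts and attribute-style examples (e.g. dataclass instances).
from dataclasses import dataclass

def to_feature_dict(example):
    """Normalize one example to a plain dict of features."""
    if isinstance(example, dict):
        return example        # use the mapping interface directly
    return vars(example)      # dataclass / simple object: read its attributes

def default_collate(batch):
    """Batch a list of examples into one dict of feature columns."""
    first = to_feature_dict(batch[0])
    return {
        key: [to_feature_dict(example)[key] for example in batch]
        for key in first
    }

# Works for dict-style examples...
batched_dicts = default_collate([{"x": 1}, {"x": 2}])

# ...and for dataclass-style examples.
@dataclass
class Feature:               # made-up feature class for illustration
    x: int

batched_objs = default_collate([Feature(x=3), Feature(x=4)])
```

The real change landed with backward compatibility for the old `DataCollator` style, as discussed below; this sketch only shows the branching idea.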
I'll add this this morning, with backward compatibility to the old `DataCollator` style.
Looks great!
My main question is, have you reproduced the training results that are documented in https://github.com/huggingface/transformers/blob/master/examples/question-answering/README.md ?
If you have, can you link to a webpage on your favorite experiment tracking service (Weights and Biases, cc @borisdayma :), Tensorboard.dev, etc.) for reference?
Also, you can just replace `run_squad.py` instead of creating a new file, but we can do this at the end, just before merging.
Finally, you can add a checkmark to the Big Table Of Tasks.
    class SquadDataset(Dataset):
        """
        This will be superseded by a framework-agnostic approach
        soon.
        """
Just adding a note/clarification that this will be replaced by the https://github.com/huggingface/nlp library soon-ish, cc @thomwolf
Yes, that will be actually better than this ;)
Just in case you wanted to use Weights & Biases, you should just have to do a …
I didn't train … In the paper the authors mentioned that electra-base achieves 84.5 EM and 90.8 F1. I was able to achieve 85.05 EM and 91.60 F1. Sadly I didn't use wandb; you can find the colab here. It uses the same code, just copy-pasted in colab. But if required I can try to reproduce the documented results.
I can do it tomorrow morning, I currently have a V100 on hand :)
Just a note that I tried … For some reason I don't get any evaluation metric during training (I was expecting …).
@borisdayma yes, there are no start and end positions in the eval dataset, which is why the eval loss is not calculated. I will add that. Were you able to see the training loss?
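The behavior described here is worth spelling out: QA model forwards in `transformers` only compute a loss when `start_positions`/`end_positions` are passed, so an eval set without those labels yields predictions but no `eval_loss`. A toy stand-in (the function and its internals are illustrative only, not the library's code):

```python
# Illustrative sketch of why eval loss goes missing: the forward pass
# only produces a loss when label positions are provided.

def qa_forward(input_ids, start_positions=None, end_positions=None):
    """Toy stand-in for a question-answering model forward pass."""
    start_logits = [0.0] * len(input_ids)   # placeholder predictions
    end_logits = [0.0] * len(input_ids)
    loss = None
    if start_positions is not None and end_positions is not None:
        loss = 0.0   # real models compute a cross-entropy here; dummy value
    return {"loss": loss, "start_logits": start_logits, "end_logits": end_logits}

train_out = qa_forward([1, 2, 3], start_positions=0, end_positions=1)
eval_out = qa_forward([1, 2, 3])   # labels absent, so there is no loss to log
```

With no loss returned during evaluation, the Trainer has nothing to log as an eval metric.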
Hmm, I'm pretty sure the dev-v1.1.json file has the same labels as the training one (start positions). Otherwise we wouldn't have any eval results at all in the readme. No? Pinging @LysandreJik on this :)
Yes, the training loss was logged.
@julien-c In the two … I believe this is because, while the training dataset only has one possible answer per question, the dev and validation datasets both have multiple answers per question (usually different-length spans).
@LysandreJik So I guess we should update the eval dataset to pick one start_position (or the most frequent one) – how do people usually do it with SQuAD eval, do you know @thomwolf? Maybe this can be done in a second PR though. Everyone ok with merging this (renaming …)?
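One way to do what is suggested here, picking the most frequent `answer_start` among a dev question's gold answers, is a simple `collections.Counter` pass. The helper name and the sample data below are made up for illustration; SQuAD dev entries do store answers as a list of `{"text": ..., "answer_start": ...}` dicts:

```python
# Sketch: choose one start position per dev question by majority vote
# over its gold answers.
from collections import Counter

def most_frequent_start(answers):
    """answers: SQuAD-style list of dicts, e.g. {"text": ..., "answer_start": ...}."""
    counts = Counter(a["answer_start"] for a in answers)
    return counts.most_common(1)[0][0]

dev_answers = [
    {"text": "Denver Broncos", "answer_start": 177},
    {"text": "Denver Broncos", "answer_start": 177},
    {"text": "The Denver Broncos", "answer_start": 173},
]
picked = most_frequent_start(dev_answers)
```

Note this is only one convention; official SQuAD evaluation instead scores a prediction against all gold answers and keeps the best match.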
Yes, sounds good!
@patil-suraj Can you resolve the conflicts and switch to the new data collator?
@sgugger Yes, I'll switch to the new data collator.
Hi @sgugger, you can take this over, I'm running short on time ;(
Thanks @sgugger :)
This PR adds trainer support for the question-answering task. Regarding issue #4784.

TODOs:
- `tfds`: not added, because I think it will be soon replaced by `nlp` (here)

@julien-c @patrickvonplaten