
[examples] Add trainer support for question-answering #4829

Merged: 11 commits into huggingface:master on Jul 7, 2020

Conversation

@patil-suraj (Contributor) commented Jun 7, 2020

This PR adds Trainer support for the question-answering task. See issue #4784.

TODOs

  • Add automatic data loading. Right now it requires the user to specify the data directory. Decided not to use tfds because I think it will soon be replaced by nlp.
  • Add evaluation
  • Test all models.

@julien-c @patrickvonplaten

@codecov (bot) commented Jun 7, 2020

Codecov Report

Merging #4829 into master will increase coverage by 1.00%.
The diff coverage is 49.41%.


@@            Coverage Diff             @@
##           master    #4829      +/-   ##
==========================================
+ Coverage   76.84%   77.84%   +1.00%     
==========================================
  Files         141      142       +1     
  Lines       24685    24768      +83     
==========================================
+ Hits        18969    19281     +312     
+ Misses       5716     5487     -229     
| Impacted Files | Coverage Δ |
|---|---|
| src/transformers/data/datasets/squad.py | 47.56% <47.56%> (ø) |
| src/transformers/__init__.py | 99.22% <100.00%> (ø) |
| src/transformers/data/datasets/__init__.py | 100.00% <100.00%> (ø) |
| src/transformers/modeling_tf_openai.py | 20.78% <0.00%> (-74.20%) ⬇️ |
| src/transformers/modeling_tf_bert.py | 73.37% <0.00%> (-25.00%) ⬇️ |
| src/transformers/modeling_openai.py | 79.72% <0.00%> (-1.38%) ⬇️ |
| src/transformers/generation_tf_utils.py | 86.21% <0.00%> (+1.00%) ⬆️ |
| src/transformers/modeling_tf_electra.py | 95.38% <0.00%> (+68.46%) ⬆️ |
| src/transformers/modeling_tf_mobilebert.py | 96.72% <0.00%> (+73.10%) ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@patrickvonplaten (Contributor) commented:

Hi @patil-suraj,

I think @julien-c can answer questions regarding the Trainer better :-)

```python
        # Stack each feature across the batch into a single tensor.
        for key in keys:
            inputs[key] = torch.stack([example[key] for example in batch])

        return inputs
```
Member:
Pretty clean implementation of the data collator. cc @sgugger

Collaborator:

Yes, quick question though, why does the default not work here?

Member:

(I think it's ~identical to the default one, so maybe we can merge them and just use the default here.)

Contributor Author:

The default collator needs List[InputDataClass], but I'm returning List[Dict[str, torch.Tensor]], so I was not able to use the default one.

@sgugger (Collaborator) commented Jun 15, 2020:

InputDataClass is just a type alias for anything. I think the default should work fine for you (but we can change its implementation to use this).
Note that #5015 will change how data collators work a little and rename a few things, so we'll need a few adjustments here.

Contributor Author:

> InputDataClass is just a type alias for anything.

Yes, but I think it assumes the input will be a class, since it uses getattr and vars. How should I proceed?

Member:

Maybe we can change the default implementation to accept more input types.

Member:

Let me know if you want to do it, @sgugger or @patil-suraj.

Contributor Author:

@julien-c can we just add a simple type check in the default collator? I.e., if the input is a dict we can call example.get instead of getattr, or maybe convert the dict to a simple class?
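
(For illustration, a minimal sketch of the kind of type check being proposed here; the function name and details are assumptions, not the implementation that was eventually merged:)

```python
import torch
from typing import Any, Dict, List

def collate_with_type_check(batch: List[Any]) -> Dict[str, torch.Tensor]:
    # Sketch only: read features via dict lookup when examples are dicts,
    # and via vars()/getattr when they are dataclass-style objects
    # (the InputDataClass path the default collator assumed).
    first = batch[0]
    if isinstance(first, dict):
        keys = list(first.keys())
        get = lambda example, key: example[key]
    else:
        keys = list(vars(first).keys())
        get = lambda example, key: getattr(example, key)
    return {
        key: torch.stack([torch.as_tensor(get(example, key)) for example in batch])
        for key in keys
    }
```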

Collaborator:

I'll add this this morning, with backward compatibility with the old DataCollator style.

@julien-c requested a review from @sgugger on June 15, 2020.
@julien-c (Member) left a review:

Looks great!

My main question is: have you reproduced the training results documented in https://github.com/huggingface/transformers/blob/master/examples/question-answering/README.md?

If you have, can you link to a webpage on your favorite experiment tracking service (Weights & Biases (cc @borisdayma), Tensorboard.dev, etc.) for reference?

Also, you can just replace run_squad.py instead of creating a new file, but we can do this at the end, just before merging.

Finally, you can add a checkmark to the Big Table Of Tasks.

```python
class SquadDataset(Dataset):
    """
    This will be superseded by a framework-agnostic approach
    soon.
    """
```
Member:

Just adding a note/clarification that this will be replaced by the https://github.com/huggingface/nlp library soon-ish, cc @thomwolf

Contributor Author:

Yes, that will actually be better than this ;)

@borisdayma (Contributor) commented:

In case you want to use Weights & Biases, you should just have to pip install wandb and it will automatically track everything.

@patil-suraj (Contributor Author) commented:

> My main question is: have you reproduced the training results documented in https://github.com/huggingface/transformers/blob/master/examples/question-answering/README.md?

I didn't train bert-base (I just trained for 1 epoch to check that the implementation was working); instead I used it to train electra-base, and it gave better results than those reported in the paper.

In the paper the authors report that electra-base achieves 84.5 EM and 90.8 F1; I was able to achieve 85.05 EM and 91.60 F1. Sadly I didn't use wandb, but you can find the colab here.

It uses the same code, just copy-pasted into colab. If required, I can try to reproduce the documented results.

@julien-c (Member) commented:

> I didn't train bert-base (I just trained for 1 epoch to check that the implementation was working)

I can do it tomorrow morning, I currently have a V100 on hand :)

@borisdayma (Contributor) commented:

Just a note that I tried `python run_squad_trainer.py --model_name_or_path bert-base-uncased --model_type bert --data_dir squad --output_dir /tmp/debug_squad/ --overwrite_output_dir --do_train --do_eval --evaluate_during_training --logging_steps 100`.

For some reason I don't get any evaluation metrics during training (I was expecting loss or eval_loss).

@patil-suraj (Contributor Author) commented:

> In case you want to use Weights & Biases, you should just have to pip install wandb and it will automatically track everything.

@borisdayma Yes, there are no start and end positions in the eval dataset, which is why the eval loss is not calculated. I will add that. Were you able to see the training loss?
Thanks!

@julien-c (Member) commented Jun 16, 2020:

> There are no start and end positions in the eval dataset, which is why the eval loss is not calculated. I will add that. Were you able to see the training loss?

Hmm, I'm pretty sure the dev-v1.1.json file has the same labels as the training one (start positions). Otherwise we wouldn't have any eval results at all in the readme, no?

Pinging @LysandreJik on this :)

@borisdayma (Contributor) commented:

> There are no start and end positions in the eval dataset, which is why the eval loss is not calculated. I will add that. Were you able to see the training loss?

Yes, the training loss was logged.

@LysandreJik (Member) commented:

@julien-c In the two TensorDatasets created (one for training and one for evaluation), only the training one has the correct start_position and end_position.

I believe this is because, while the training dataset has only one possible answer per question, the dev and validation datasets both have multiple answers per question (usually spans of different lengths).
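
(For context, a made-up sketch of the SQuAD-style answer structure being described: a train question carries a single gold span, while a dev question lists several acceptable spans. Text and offsets are invented.)

```python
# Illustrative records only; the question, text, and offsets are invented.
train_example = {
    "question": "Where is the Eiffel Tower located?",
    "answers": [{"text": "Paris", "answer_start": 30}],  # one gold span
}
dev_example = {
    "question": "Where is the Eiffel Tower located?",
    "answers": [  # several acceptable spans, of different lengths
        {"text": "Paris", "answer_start": 30},
        {"text": "Paris, France", "answer_start": 30},
        {"text": "in Paris", "answer_start": 27},
    ],
}
```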

@julien-c (Member) commented:

@LysandreJik So I guess we should update the eval dataset to pick one start_position (or the most frequent one). How do people usually handle this for SQuAD eval, do you know @thomwolf?

Maybe this can be done in a second PR, though. Is everyone OK with merging this (and renaming run_squad_trainer.py to run_squad.py)?

@LysandreJik (Member) left a review:

Yes, sounds good!

@sgugger (Collaborator) commented Jun 29, 2020:

@patil-suraj Can you resolve the conflicts and switch to the new default_data_collator, now that it should work for your dict inputs?
I can take over if you don't have time, but this is the only thing standing in the way of merging this PR.
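
(For reference, a small self-contained example of what the new default_data_collator accepts; the feature values below are invented, only roughly shaped like what the dataset returns:)

```python
import torch
from transformers import default_data_collator

# A list of feature dicts: tensor values are stacked, plain ints are
# converted to a batched tensor under the same key.
features = [
    {"input_ids": torch.tensor([101, 2054, 102]), "start_positions": 1, "end_positions": 1},
    {"input_ids": torch.tensor([101, 2073, 102]), "start_positions": 0, "end_positions": 2},
]

batch = default_data_collator(features)
print(batch["input_ids"].shape)  # torch.Size([2, 3])
print(batch["start_positions"])  # tensor([1, 0])
```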

@patil-suraj (Contributor Author) commented:

@sgugger Yes, I'll switch to the new data collator.

@patil-suraj (Contributor Author) commented:

Hi @sgugger, you can take this over, I'm running short on time ;(

@sgugger merged commit e49393c into huggingface:master on Jul 7, 2020.
@patil-suraj (Contributor Author) commented:

Thanks @sgugger :)

@julien-c (Member) commented Jul 7, 2020:

@sgugger can you please rename run_squad_trainer.py to run_squad.py? See also #5547.
