[examples] Add trainer support for question-answering #4829
Conversation
Codecov Report
@@ Coverage Diff @@
## master #4829 +/- ##
==========================================
+ Coverage 76.84% 77.84% +1.00%
==========================================
Files 141 142 +1
Lines 24685 24768 +83
==========================================
+ Hits 18969 19281 +312
+ Misses 5716 5487 -229
Continue to review full report at Codecov.
Hi @patil-suraj, I think @julien-c can answer questions regarding the Trainer better :-)
    for key in keys:
        inputs[key] = torch.stack([example[key] for example in batch])

    return inputs
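The quoted diff above is the core of the custom collate function under review. Here is a minimal, dependency-free sketch of the same idea; plain Python lists stand in for `torch.Tensor`s (the real code uses `torch.stack`), and the function name is illustrative:

```python
# Sketch of the per-key batching done by the custom data collator.
# The real implementation stacks torch.Tensors; lists stand in here
# so the example runs without torch.

def collate_batch(batch):
    """Group a list of per-example feature dicts into one dict of batched columns."""
    keys = batch[0].keys()
    inputs = {}
    for key in keys:
        # torch version: inputs[key] = torch.stack([example[key] for example in batch])
        inputs[key] = [example[key] for example in batch]
    return inputs

batch = [
    {"input_ids": [1, 2, 3], "attention_mask": [1, 1, 1]},
    {"input_ids": [4, 5, 6], "attention_mask": [1, 1, 0]},
]
collated = collate_batch(batch)
```

Each key in the output maps to the column of values gathered from every example in the batch.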
pretty clean implementation of data collator. cc @sgugger
Yes, quick question though, why does the default not work here?
(I think it's ~identical to the default one so maybe we can merge them and just use the default here)
The default collator needs `List[InputDataClass]`, but I'm returning `List[Dict[str, torch.Tensor]]`, so I was not able to use the default one.
`InputDataClass` is just a type alias for anything. I think the default should work fine for you (but we can change its implementation to use this).
Note that #5015 will change how data collators work a little bit, and a few names, so we'll need a few adjustments here.
> InputDataClass is just a type alias for anything.

Yes, but I think it assumes that it will be a class, as it uses `getattr` and `vars`. How should I proceed?
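The point being made here can be shown in a few lines: `vars()` and `getattr()` work on attribute-style objects such as dataclass instances, but a plain `dict` has no `__dict__`, so `vars()` raises `TypeError` on it. A small illustration (the `Example` class is a made-up stand-in for an `InputDataClass`-style feature class):

```python
# Why a collator built on vars()/getattr() chokes on plain dicts:
# dict instances carry items, not attributes.
from dataclasses import dataclass

@dataclass
class Example:            # hypothetical stand-in for a feature dataclass
    input_ids: int

feature_obj = Example(input_ids=7)
feature_dict = {"input_ids": 7}

works_on_dataclass = vars(feature_obj)           # attribute dict of the instance
attr_value = getattr(feature_obj, "input_ids")   # attribute access works

try:
    vars(feature_dict)                           # dicts have no __dict__
    dict_has_dunder_dict = True
except TypeError:
    dict_has_dunder_dict = False
```

So a collator that only calls `vars(example)` cannot accept `List[Dict[str, torch.Tensor]]` without an extra branch.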
Maybe we can change the default implementation to accept more input types.
Let me know if you want to do it @sgugger or @patil-suraj
@julien-c can we just add a simple type check in the default collator, i.e. if the input is a `dict` we can call `example.get` instead of `getattr`, or maybe convert the `dict` to a simple class?
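A rough sketch of the type check proposed here: branch on `isinstance(example, dict)` and fall back to `vars()` otherwise. The function names are illustrative, not the actual `transformers` implementation:

```python
# Hypothetical sketch of a dict-aware default collator: accept both plain
# dicts and attribute-style examples (e.g. dataclass instances).
from dataclasses import dataclass

def to_feature_dict(example):
    """Normalize one example to a plain dict of features."""
    if isinstance(example, dict):
        return example        # use the mapping interface directly
    return vars(example)      # dataclass / simple object: read its attributes

def default_collate(batch):
    """Batch a list of examples into one dict of feature columns."""
    first = to_feature_dict(batch[0])
    return {
        key: [to_feature_dict(example)[key] for example in batch]
        for key in first
    }

# Works for dict-style examples...
batched_dicts = default_collate([{"x": 1}, {"x": 2}])

# ...and for dataclass-style examples.
@dataclass
class Feature:               # made-up feature class for illustration
    x: int

batched_objs = default_collate([Feature(x=3), Feature(x=4)])
```

The real change landed with backward compatibility for the old `DataCollator` style, as discussed below; this sketch only shows the branching idea.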
I'll add this this morning, with backward compatibility to the old `DataCollator` style.
Looks great!
My main question is, have you reproduced the training results that are documented in https://github.com/huggingface/transformers/blob/master/examples/question-answering/README.md ?
If you have, can you link to a webpage on your favorite experiment tracking service (Weights and Biases, cc @borisdayma :), Tensorboard.dev, etc.) for reference?
Also, you can just replace `run_squad.py` instead of creating a new file, but we can do this at the end, just before merging.
Finally, you can add a checkmark to the Big Table Of Tasks.
    class SquadDataset(Dataset):
        """
        This will be superseded by a framework-agnostic approach
        soon.
        """
Just adding a note/clarification that this will be replaced by the https://github.com/huggingface/nlp library soon-ish, cc @thomwolf
Yes, that will be actually better than this ;)
Just in case you wanted to use Weights & Biases, you should just have to do a …
I didn't train … In the paper the authors mentioned that electra-base achieves 84.5 EM and 90.8 F1. I was able to achieve 85.05 EM and 91.60 F1. Sadly I didn't use wandb; you can find the colab here. It uses the same code, just copy-pasted in colab. But if required I can try to reproduce the documented results.
I can do it tomorrow morning, I currently have a V100 on hand :)
Just a note that I tried … For some reason I don't get any evaluation metric during training (I was expecting …).
@borisdayma yes, there are no start and end positions in the eval dataset, which is why the eval loss is not calculated. I will add that. Were you able to see the training loss?
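The behavior described here is worth spelling out: QA model forwards in `transformers` only compute a loss when `start_positions`/`end_positions` are passed, so an eval set without those labels yields predictions but no `eval_loss`. A toy stand-in (the function and its internals are illustrative only, not the library's code):

```python
# Illustrative sketch of why eval loss goes missing: the forward pass
# only produces a loss when label positions are provided.

def qa_forward(input_ids, start_positions=None, end_positions=None):
    """Toy stand-in for a question-answering model forward pass."""
    start_logits = [0.0] * len(input_ids)   # placeholder predictions
    end_logits = [0.0] * len(input_ids)
    loss = None
    if start_positions is not None and end_positions is not None:
        loss = 0.0   # real models compute a cross-entropy here; dummy value
    return {"loss": loss, "start_logits": start_logits, "end_logits": end_logits}

train_out = qa_forward([1, 2, 3], start_positions=0, end_positions=1)
eval_out = qa_forward([1, 2, 3])   # labels absent, so there is no loss to log
```

With no loss returned during evaluation, the Trainer has nothing to log as an eval metric.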
Hmm, I'm pretty sure the dev-v1.1.json file has the same labels as the training one (start positions). Otherwise we wouldn't have any eval results at all in the readme. No? Pinging @LysandreJik on this :)
Yes, the training loss was logged.
@julien-c In the two … I believe this is because, while the training dataset only has one possible answer per question, the dev and validation datasets both have multiple answers per question (usually different-length spans).
@LysandreJik So I guess we should update the eval dataset to pick one start_position (or the most frequent one) – how do people usually do it with SQuAD eval, do you know @thomwolf? Maybe this can be done in a second PR though. Everyone ok with merging this (renaming …)?
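One way to do what is suggested here, picking the most frequent `answer_start` among a dev question's gold answers, is a simple `collections.Counter` pass. The helper name and the sample data below are made up for illustration; SQuAD dev entries do store answers as a list of `{"text": ..., "answer_start": ...}` dicts:

```python
# Sketch: choose one start position per dev question by majority vote
# over its gold answers.
from collections import Counter

def most_frequent_start(answers):
    """answers: SQuAD-style list of dicts, e.g. {"text": ..., "answer_start": ...}."""
    counts = Counter(a["answer_start"] for a in answers)
    return counts.most_common(1)[0][0]

dev_answers = [
    {"text": "Denver Broncos", "answer_start": 177},
    {"text": "Denver Broncos", "answer_start": 177},
    {"text": "The Denver Broncos", "answer_start": 173},
]
picked = most_frequent_start(dev_answers)
```

Note this is only one convention; official SQuAD evaluation instead scores a prediction against all gold answers and keeps the best match.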
Yes, sounds good!
@patil-suraj Can you resolve the conflicts and switch to the new data collator?
@sgugger Yes, I'll switch to the new data collator.
Hi @sgugger, you can take this over, I'm running short on time ;(
Thanks @sgugger :)
This PR adds trainer support for the question-answering task. Regarding issue #4784.

TODOs:
- `tfds`: not added, because I think it will be soon replaced by `nlp` (here)

@julien-c @patrickvonplaten