[squad] make examples and dataset accessible from SquadDataset object #6710

Merged
merged 2 commits into from
Aug 25, 2020

Conversation

lazovich
Contributor

To evaluate on the SQuAD dataset with squad_evaluate, the user needs access to both the examples loaded in the dataset and the TensorDataset, which holds values such as unique_id that are used to construct the list of SquadResult objects. This PR surfaces the examples and dataset to the user so that they can access them directly.

For an example of why access to those is needed, see how evaluation is currently done in examples/run_squad.py. The SquadDataset object attempts to wrap up some of this functionality, but without access to examples and dataset, evaluation is not possible.
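
As a rough illustration of why both pieces are needed, here is a minimal sketch that does not use the transformers library at all. Every class and function name in it (ToyExample, ToyFeature, ToyResult, ToySquadDataset, exact_match) is a hypothetical stand-in, and the row layout of `dataset` is an assumption made for the sketch. It only shows the shape of the problem: scoring has to walk the dataset rows to find each feature's unique_id, then go back to the examples for the reference answer.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ToyExample:
    """Hypothetical stand-in for one loaded SQuAD example."""
    qas_id: str
    answer_text: str


@dataclass
class ToyFeature:
    """Hypothetical stand-in for one tokenized feature."""
    unique_id: int
    example_index: int


@dataclass
class ToyResult:
    """Hypothetical stand-in for a per-feature model result."""
    unique_id: int
    predicted_text: str


@dataclass
class ToySquadDataset:
    """Hypothetical wrapper exposing examples, features, and dataset rows."""
    examples: List[ToyExample]
    features: List[ToyFeature]
    dataset: List[Tuple[int]]  # assumed layout: one (feature_index,) row per feature


def exact_match(data: ToySquadDataset, results: List[ToyResult]) -> float:
    """Fraction of dataset rows whose prediction equals the reference answer."""
    by_id = {r.unique_id: r for r in results}
    hits = 0
    for (feature_index,) in data.dataset:
        feature = data.features[feature_index]          # needs the dataset rows
        example = data.examples[feature.example_index]  # needs the examples
        hits += int(by_id[feature.unique_id].predicted_text == example.answer_text)
    return hits / len(data.dataset)


toy = ToySquadDataset(
    examples=[ToyExample("q0", "Paris"), ToyExample("q1", "1998")],
    features=[ToyFeature(1000, 0), ToyFeature(1001, 1)],
    dataset=[(0,), (1,)],
)
predictions = [ToyResult(1000, "Paris"), ToyResult(1001, "1999")]
print(exact_match(toy, predictions))  # one of the two predictions matches
```

If the wrapper hid `examples` or `dataset`, the loop in `exact_match` could not be written from outside the class, which is the situation this PR addresses.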

@codecov

codecov bot commented Aug 25, 2020

Codecov Report

Merging #6710 into master will decrease coverage by 1.27%.
The diff coverage is 0.00%.


@@            Coverage Diff             @@
##           master    #6710      +/-   ##
==========================================
- Coverage   80.06%   78.78%   -1.28%     
==========================================
  Files         156      156              
  Lines       28386    28391       +5     
==========================================
- Hits        22726    22367     -359     
- Misses       5660     6024     +364     
| Impacted Files | Coverage Δ |
| --- | --- |
| src/transformers/data/datasets/squad.py | 44.31% <0.00%> (-2.67%) ⬇️ |
| src/transformers/modeling_tf_xlnet.py | 21.12% <0.00%> (-71.05%) ⬇️ |
| src/transformers/modeling_roberta.py | 77.37% <0.00%> (-19.71%) ⬇️ |
| src/transformers/modeling_tf_utils.py | 85.34% <0.00%> (-1.63%) ⬇️ |
| src/transformers/generation_tf_utils.py | 85.71% <0.00%> (-0.76%) ⬇️ |
| src/transformers/file_utils.py | 82.41% <0.00%> (+0.25%) ⬆️ |
| src/transformers/modeling_utils.py | 88.05% <0.00%> (+0.55%) ⬆️ |
| src/transformers/modeling_t5.py | 83.83% <0.00%> (+6.20%) ⬆️ |
| src/transformers/modeling_tf_roberta.py | 93.22% <0.00%> (+47.80%) ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 0344428...20d0228.

Member

@LysandreJik LysandreJik left a comment

This looks good to me, but it will break existing saved features. I don't think that's too much of an issue, though. What do you think, @sgugger?

Collaborator

@sgugger sgugger left a comment

I think the breaking change should be handled more carefully: cached datasets for SQuAD are used by the run_squad.py script, so a lot of users probably have some, and the code will suddenly fail for them.

src/transformers/data/datasets/squad.py (outdated)
src/transformers/data/datasets/squad.py (outdated)
src/transformers/data/datasets/squad.py
@lazovich
Contributor Author

@sgugger @LysandreJik thanks so much for the comments/suggestions! I have updated the code to include support for the legacy cache format. I had a question on one comment, but if there are any other changes needed please let me know.
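
For readers wondering what "support for the legacy cache format" can look like in practice, the following is only a sketch under stated assumptions, not the code merged in this PR. It uses pickle rather than the library's own serialization, the function names save_cache and load_cache are hypothetical, and it assumes a legacy file holds only the features (either as a bare object or as a dict without the newer keys), while a new file holds features, dataset, and examples together.

```python
import os
import pickle
import tempfile
from typing import Any, Dict


def save_cache(path: str, features: Any, dataset: Any, examples: Any) -> None:
    """Write a cache file in the assumed new format: one dict with three keys."""
    with open(path, "wb") as handle:
        pickle.dump({"features": features, "dataset": dataset, "examples": examples}, handle)


def load_cache(path: str) -> Dict[str, Any]:
    """Load a cache file, tolerating the assumed legacy (features-only) layouts."""
    with open(path, "rb") as handle:
        payload = pickle.load(handle)

    if isinstance(payload, dict):
        # New format, or a legacy dict that lacks the newer keys.
        features = payload["features"]
        dataset = payload.get("dataset")
        examples = payload.get("examples")
    else:
        # Assumed legacy layout: the file holds the features object directly.
        features, dataset, examples = payload, None, None

    return {
        "features": features,
        "dataset": dataset,
        "examples": examples,
        # Callers can warn the user or rebuild the cache when this flag is set.
        "is_legacy": dataset is None or examples is None,
    }


with tempfile.TemporaryDirectory() as tmp:
    legacy_path = os.path.join(tmp, "legacy.pkl")
    new_path = os.path.join(tmp, "new.pkl")

    # Simulate an old cache file that contains only features.
    with open(legacy_path, "wb") as handle:
        pickle.dump(["feature-0", "feature-1"], handle)
    save_cache(new_path, ["feature-0"], [(0,)], ["example-0"])

    legacy = load_cache(legacy_path)
    fresh = load_cache(new_path)

print(legacy["is_legacy"], fresh["is_legacy"])
```

The design choice in this sketch is to keep loading old files rather than fail, and to surface a flag so the caller can decide whether to warn or regenerate the cache.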

@lazovich lazovich requested a review from sgugger August 25, 2020 16:42
Collaborator

@sgugger sgugger left a comment

Thanks for your changes, it's good to go now!

@sgugger sgugger merged commit 7e6397a into huggingface:master Aug 25, 2020
Zigur pushed a commit to Zigur/transformers that referenced this pull request Oct 26, 2020
…huggingface#6710)

* [squad] make examples and dataset accessible from SquadDataset object

* [squad] add support for legacy cache files
fabiocapsouza pushed a commit to fabiocapsouza/transformers that referenced this pull request Nov 15, 2020
…huggingface#6710)

* [squad] make examples and dataset accessible from SquadDataset object

* [squad] add support for legacy cache files
fabiocapsouza added a commit to fabiocapsouza/transformers that referenced this pull request Nov 15, 2020

3 participants