
Switch from return_tuple to return_dict #6138

Merged: 9 commits merged into master from dict_model_output on Jul 30, 2020

Conversation

@sgugger (Collaborator) commented on Jul 29, 2020

This is the first step in the change to model outputs described on the forum.

This PR removes the return_tuple argument and introduces return_dict (which works the other way round): all models now return tuples by default (100% backward compatible) unless you opt in to the new model output types with return_dict=True. The model output class is changed to a dict-like one that should work equally well for TensorFlow.

I have updated all the examples in the docs to instantiate the model with return_dict=True; more doc updates will follow in other PRs. For the tests, I have set return_dict=True in one of the common tests just to make sure it actually works. Step 2 (in a follow-up PR) will be to use it in all tests.

Step 3 will then be to update the TensorFlow models to use this ModelOutput.
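
For illustration, a minimal sketch of the opt-in behavior described above (BertModel is just an example; any model covered by this PR behaves the same way):

import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
inputs = tokenizer("Hello world", return_tensors="pt")

# Default: outputs are a plain tuple, exactly as before this PR
model = BertModel.from_pretrained("bert-base-uncased")
last_hidden = model(**inputs)[0]

# Opt-in: outputs become a dict-like ModelOutput with named fields
model = BertModel.from_pretrained("bert-base-uncased", return_dict=True)
outputs = model(**inputs)
last_hidden = outputs.last_hidden_state      # attribute access
last_hidden = outputs["last_hidden_state"]   # dict-style access works too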

@@ -51,12 +51,6 @@
model. Initializing with a config file does not load the weights associated with the model, only the
configuration.
Check out the :meth:`~transformers.PreTrainedModel.from_pretrained` method to load the model weights.
output_attentions (:obj:`bool`, `optional`, defaults to :obj:`None`):
@sgugger (Collaborator, Author):

This is the documentation of the init, not the forward, so this shouldn't have been added here.

@@ -53,12 +53,6 @@
config (:class:`~transformers.XLMRobertaConfig`): Model configuration class with all the parameters of the
model. Initializing with a config file does not load the weights associated with the model, only the configuration.
Check out the :meth:`~transformers.PreTrainedModel.from_pretrained` method to load the model weights.
output_attentions (:obj:`bool`, `optional`, defaults to :obj:`None`):
@sgugger (Collaborator, Author):

This is the documentation of the init, not the forward, so this shouldn't have been added here.

@@ -661,9 +661,7 @@ def _prepare_inputs(

if self.args.past_index >= 0 and self._past is not None:
inputs["mems"] = self._past
# Our model outputs do not work with DataParallel, so forcing return tuple.
@sgugger (Collaborator, Author):

The new model output works with DataParallel, so no precautions are needed anymore.
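
A quick illustration of what this removal implies (a sketch, not code from this PR): a model returning the new ModelOutput can now be wrapped in DataParallel directly.

import torch
from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", return_dict=True
)
# No need to force tuple outputs before replicating across GPUs anymore
if torch.cuda.device_count() > 1:
    model = torch.nn.DataParallel(model)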

@codecov (bot) commented on Jul 29, 2020

Codecov Report

Merging #6138 into master will increase coverage by 0.99%.
The diff coverage is 71.96%.


@@            Coverage Diff             @@
##           master    #6138      +/-   ##
==========================================
+ Coverage   78.49%   79.48%   +0.99%     
==========================================
  Files         146      146              
  Lines       26335    26441     +106     
==========================================
+ Hits        20671    21017     +346     
+ Misses       5664     5424     -240     
Impacted Files                                 Coverage Δ
src/transformers/__init__.py                   99.24% <ø> (ø)
src/transformers/modeling_auto.py              78.48% <ø> (ø)
src/transformers/modeling_camembert.py         100.00% <ø> (ø)
src/transformers/modeling_encoder_decoder.py   92.20% <ø> (ø)
src/transformers/modeling_xlm_roberta.py       100.00% <ø> (ø)
src/transformers/trainer.py                    40.96% <ø> (-0.04%) ⬇️
src/transformers/training_args_tf.py           47.45% <0.00%> (ø)
src/transformers/trainer_tf.py                 16.54% <8.66%> (+0.06%) ⬆️
src/transformers/modeling_mmbt.py              23.47% <11.11%> (-0.63%) ⬇️
src/transformers/file_utils.py                 80.56% <68.08%> (-1.65%) ⬇️
... and 40 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 8a8ae27...60928b0.

LysandreJik and others added 2 commits on July 29, 2020

Test TF Flaubert + Add {XLM, Flaubert}{TokenClassification, MultipleChoice} models and tests (#5614)

* Test TF Flaubert + Add {XLM, Flaubert}{TokenClassification, MultipleChoice} models and tests

* AutoModels

  Tiny tweaks

* Style

* Final changes before merge

* Re-order for simpler review

* Final fixes

* Addressing @sgugger's comments

* Test MultipleChoice
* Fully rework training/prediction loops

* fix method name

* Fix variable name

* Fix property name

* Fix scope

* Fix method name

* Fix tuple index

* Fix tuple index

* Fix indentation

* Fix variable name

* fix eval before log

* Add drop remainder for test dataset

* Fix step number + fix logging datetime

* fix eval loss value

* use global step instead of step + fix logging at step 0

* Fix logging datetime

* Fix global_step usage

* Fix breaking loop + logging datetime

* Fix step in prediction loop

* Fix step breaking

* Fix train/test loops

* Force TF at least 2.2 for the trainer

* Use assert_cardinality to facilitate the dataset size computation

* Log steps per epoch

* Make tfds compliant with TPU

* Make tfds compliant with TPU

* Use TF dataset enumerate instead of the Python one

* revert previous commit

* Fix data_dir

* Apply style

* rebase on master

* Address Sylvain's comments

* Address Sylvain's and Lysandre comments

* Trigger CI

* Remove unused import

@LysandreJik (Member) left a comment:

LGTM, thanks for taking care of it!

I think we'll have to take care of the XLMForMultipleChoice model, which was added while you were working on this. Sorry about that 😬

@thomwolf (Member) left a comment:

Great!

@sgugger sgugger merged commit 91cb954 into master Jul 30, 2020
@sgugger sgugger deleted the dict_model_output branch July 30, 2020 13:17
@sgugger sgugger mentioned this pull request Jul 30, 2020
@g-karthik commented on Sep 22, 2020

@sgugger thanks very much for this PR!

return_dict seems to work with the from_pretrained() method for models, but what if I didn't want to use from_pretrained() and simply instantiated the model from scratch as follows:

from transformers import GPT2Config, GPT2DoubleHeadsModel

config_class = GPT2Config
model_class = GPT2DoubleHeadsModel
config = config_class.from_pretrained("gpt2")
model = model_class(config)

I still want to be able to use return_dict. How would I go about doing that?

It looks like I could pass return_dict explicitly in the forward() for the from-scratch case. However, I want the forward() call in my code to be consistent across the from-scratch and the from_pretrained() settings, in order to decouple the model instantiation from the actual trainer loop.
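
For concreteness, that call-time override would look like this (a sketch; inputs stands for whatever batch the trainer feeds the model):

# Per-call override, regardless of how the model was instantiated
outputs = model(**inputs, return_dict=True)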

How should this be handled?

Would the solution be something like this:

from transformers import GPT2Config, GPT2DoubleHeadsModel

config_class = GPT2Config
model_class = GPT2DoubleHeadsModel
config = config_class.from_pretrained("gpt2", use_return_dict=True)
model = model_class(config)

I tried this solution, but it didn't work; it gave me the following error:

>>> from transformers import GPT2Config
>>> config = GPT2Config.from_pretrained("gpt2", use_return_dict=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.6/dist-packages/transformers/configuration_utils.py", line 312, in from_pretrained
    return cls.from_dict(config_dict, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/transformers/configuration_utils.py", line 406, in from_dict
    setattr(config, key, value)
AttributeError: can't set attribute

@sgugger (Collaborator, Author) commented on Sep 22, 2020

The right line is:

config = config_class.from_pretrained("gpt2", return_dict=True)

use_return_dict is an internal attribute that combines the return_dict and torchscript settings (torchscript is incompatible with return_dict=True).
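
Putting that together for the from-scratch case in the question (a sketch based on the answer above):

from transformers import GPT2Config, GPT2DoubleHeadsModel

# From scratch: set return_dict on the config before instantiating
config = GPT2Config.from_pretrained("gpt2", return_dict=True)
model = GPT2DoubleHeadsModel(config)

# From pretrained weights: the same flag gives the same forward() behavior
# model = GPT2DoubleHeadsModel.from_pretrained("gpt2", return_dict=True)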
