
Switch from return_tuple to return_dict #6138

Merged: 9 commits merged into master from dict_model_output on Jul 30, 2020

Conversation

@sgugger (Collaborator) commented on Jul 29, 2020

This is the first step in the change to model outputs described on the forum.

This PR removes the return_tuple argument and introduces return_dict (which works the other way round): all models now return tuples by default (100% backward compatible) unless you opt in to the new model output types with return_dict=True. The model output class is changed to a dict-like one that should work equally well for TensorFlow.

I have updated all the examples in the docs to instantiate the model with return_dict=True; more doc updates will follow in other PRs. For the tests, I have set return_dict=True in one of the common tests just to make sure it actually works. Step 2 (in a follow-up PR) will be to use it in all tests.

Step 3 will then be to update the TensorFlow models to use this ModelOutput.
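
For illustration, a minimal sketch of the opt-in behavior described above (BertModel is just an example; any model covered by this PR behaves the same way):

import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
inputs = tokenizer("Hello world", return_tensors="pt")

# Default: outputs are a plain tuple, exactly as before this PR
model = BertModel.from_pretrained("bert-base-uncased")
last_hidden = model(**inputs)[0]

# Opt-in: outputs become a dict-like ModelOutput with named fields
model = BertModel.from_pretrained("bert-base-uncased", return_dict=True)
outputs = model(**inputs)
last_hidden = outputs.last_hidden_state      # attribute access
last_hidden = outputs["last_hidden_state"]   # dict-style access works too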

@@ -51,12 +51,6 @@
model. Initializing with a config file does not load the weights associated with the model, only the
configuration.
Check out the :meth:`~transformers.PreTrainedModel.from_pretrained` method to load the model weights.
output_attentions (:obj:`bool`, `optional`, defaults to :obj:`None`):
@sgugger (Collaborator, Author):

This is the documentation of the init, not the forward, so this shouldn't have been added here.

@@ -53,12 +53,6 @@
config (:class:`~transformers.XLMRobertaConfig`): Model configuration class with all the parameters of the
model. Initializing with a config file does not load the weights associated with the model, only the configuration.
Check out the :meth:`~transformers.PreTrainedModel.from_pretrained` method to load the model weights.
output_attentions (:obj:`bool`, `optional`, defaults to :obj:`None`):
@sgugger (Collaborator, Author):

This is the documentation of the init, not the forward, so this shouldn't have been added here.

@@ -661,9 +661,7 @@ def _prepare_inputs(

if self.args.past_index >= 0 and self._past is not None:
inputs["mems"] = self._past
# Our model outputs do not work with DataParallel, so forcing return tuple.
@sgugger (Collaborator, Author):

The new model output works with DataParallel, so no precautions are needed anymore.
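
A quick illustration of what this removal implies (a sketch, not code from this PR): a model returning the new ModelOutput can now be wrapped in DataParallel directly.

import torch
from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", return_dict=True
)
# No need to force tuple outputs before replicating across GPUs anymore
if torch.cuda.device_count() > 1:
    model = torch.nn.DataParallel(model)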

@codecov (bot) commented on Jul 29, 2020

Codecov Report

Merging #6138 into master will increase coverage by 0.99%.
The diff coverage is 71.96%.


@@            Coverage Diff             @@
##           master    #6138      +/-   ##
==========================================
+ Coverage   78.49%   79.48%   +0.99%     
==========================================
  Files         146      146              
  Lines       26335    26441     +106     
==========================================
+ Hits        20671    21017     +346     
+ Misses       5664     5424     -240     
Impacted Files                                 Coverage Δ
src/transformers/__init__.py                   99.24% <ø> (ø)
src/transformers/modeling_auto.py              78.48% <ø> (ø)
src/transformers/modeling_camembert.py         100.00% <ø> (ø)
src/transformers/modeling_encoder_decoder.py   92.20% <ø> (ø)
src/transformers/modeling_xlm_roberta.py       100.00% <ø> (ø)
src/transformers/trainer.py                    40.96% <ø> (-0.04%) ⬇️
src/transformers/training_args_tf.py           47.45% <0.00%> (ø)
src/transformers/trainer_tf.py                 16.54% <8.66%> (+0.06%) ⬆️
src/transformers/modeling_mmbt.py              23.47% <11.11%> (-0.63%) ⬇️
src/transformers/file_utils.py                 80.56% <68.08%> (-1.65%) ⬇️
... and 40 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 8a8ae27...60928b0.

LysandreJik and others added 2 commits on July 29, 2020

Test TF Flaubert + Add {XLM, Flaubert}{TokenClassification, MultipleChoice} models and tests (#5614)

* Test TF Flaubert + Add {XLM, Flaubert}{TokenClassification, MultipleChoice} models and tests

* AutoModels

  Tiny tweaks

* Style

* Final changes before merge

* Re-order for simpler review

* Final fixes

* Addressing @sgugger's comments

* Test MultipleChoice
* Fully rework training/prediction loops

* fix method name

* Fix variable name

* Fix property name

* Fix scope

* Fix method name

* Fix tuple index

* Fix tuple index

* Fix indentation

* Fix variable name

* fix eval before log

* Add drop remainder for test dataset

* Fix step number + fix logging datetime

* fix eval loss value

* use global step instead of step + fix logging at step 0

* Fix logging datetime

* Fix global_step usage

* Fix breaking loop + logging datetime

* Fix step in prediction loop

* Fix step breaking

* Fix train/test loops

* Force TF at least 2.2 for the trainer

* Use assert_cardinality to facilitate the dataset size computation

* Log steps per epoch

* Make tfds compliant with TPU

* Make tfds compliant with TPU

* Use TF dataset enumerate instead of the Python one

* revert previous commit

* Fix data_dir

* Apply style

* rebase on master

* Address Sylvain's comments

* Address Sylvain's and Lysandre comments

* Trigger CI

* Remove unused import

@LysandreJik (Member) left a comment:

LGTM, thanks for taking care of it!

I think we'll have to take care of the XLMForMultipleChoice model, which was added while you were working on this. Sorry about that 😬

@thomwolf (Member) left a comment:

Great!

@sgugger sgugger merged commit 91cb954 into master Jul 30, 2020
@sgugger sgugger deleted the dict_model_output branch July 30, 2020 13:17
@sgugger sgugger mentioned this pull request Jul 30, 2020
@g-karthik commented on Sep 22, 2020

@sgugger thanks very much for this PR!

return_dict seems to work with the from_pretrained() method for models, but what if I didn't want to use from_pretrained() and simply instantiated the model from scratch as follows:

from transformers import GPT2Config, GPT2DoubleHeadsModel

config_class = GPT2Config
model_class = GPT2DoubleHeadsModel
config = config_class.from_pretrained("gpt2")
model = model_class(config)

I still want to be able to use return_dict. How would I go about doing that?

It looks like I could pass return_dict explicitly in the forward() for the from-scratch case. However, I want the forward() call in my code to be consistent across the from-scratch and the from_pretrained() settings, in order to decouple the model instantiation from the actual trainer loop.
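
For concreteness, that call-time override would look like this (a sketch; inputs stands for whatever batch the trainer feeds the model):

# Per-call override, regardless of how the model was instantiated
outputs = model(**inputs, return_dict=True)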

How should this be handled?

Would the solution be something like this:

from transformers import GPT2Config, GPT2DoubleHeadsModel

config_class = GPT2Config
model_class = GPT2DoubleHeadsModel
config = config_class.from_pretrained("gpt2", use_return_dict=True)
model = model_class(config)

I tried this solution, but it didn't work; it gave me the following error:

>>> from transformers import GPT2Config
>>> config = GPT2Config.from_pretrained("gpt2", use_return_dict=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.6/dist-packages/transformers/configuration_utils.py", line 312, in from_pretrained
    return cls.from_dict(config_dict, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/transformers/configuration_utils.py", line 406, in from_dict
    setattr(config, key, value)
AttributeError: can't set attribute

@sgugger (Collaborator, Author) commented on Sep 22, 2020

The right line is:

config = config_class.from_pretrained("gpt2", return_dict=True)

use_return_dict is an internal attribute that combines the return_dict and torchscript settings (torchscript is incompatible with return_dict=True).
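
Putting that together for the from-scratch case in the question (a sketch based on the answer above):

from transformers import GPT2Config, GPT2DoubleHeadsModel

# From scratch: set return_dict on the config before instantiating
config = GPT2Config.from_pretrained("gpt2", return_dict=True)
model = GPT2DoubleHeadsModel(config)

# From pretrained weights: the same flag gives the same forward() behavior
# model = GPT2DoubleHeadsModel.from_pretrained("gpt2", return_dict=True)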
