Switch from return_tuple to return_dict #6138

Merged · 9 commits · Jul 30, 2020
17 changes: 7 additions & 10 deletions docs/source/quicktour.rst
@@ -230,19 +230,16 @@ final activations of the model.
>>> ## PYTORCH CODE
>>> print(pt_outputs)
SequenceClassifierOutput(loss=None, logits=tensor([[-4.0833, 4.3364],
[ 0.0818, -0.0418]], grad_fn=<AddmmBackward>), hidden_states=None, attentions=None)
(tensor([[-4.0833, 4.3364],
[ 0.0818, -0.0418]], grad_fn=<AddmmBackward>),)
>>> ## TENSORFLOW CODE
>>> print(tf_outputs)
(<tf.Tensor: shape=(2, 2), dtype=float32, numpy=
array([[-4.0832963 , 4.336414 ],
[ 0.08181786, -0.04179301]], dtype=float32)>,)
The model can return more than just the final activations, which is why the PyTorch output is a special class and the
TensorFlow output is a tuple. Here we only asked for the final activations, so we get a tuple with one element on the
TensorFlow side and a :class:`~transformers.modeling_outputs.SequenceClassifierOutput` with just the ``logits`` field
filled on the PyTorch side.

The model can return more than just the final activations, which is why the output is a tuple. Here we only asked for
the final activations, so we get a tuple with one element.
.. note::

All 🤗 Transformers models (PyTorch or TensorFlow) return the activations of the model *before* the final
@@ -254,7 +251,7 @@ Let's apply the SoftMax activation to get predictions.
>>> ## PYTORCH CODE
>>> import torch.nn.functional as F
>>> pt_predictions = F.softmax(pt_outputs.logits, dim=-1)
>>> pt_predictions = F.softmax(pt_outputs[0], dim=-1)
>>> ## TENSORFLOW CODE
>>> import tensorflow as tf
>>> tf_predictions = tf.nn.softmax(tf_outputs[0], axis=-1)
@@ -341,8 +338,8 @@ code is easy to access and tweak if you need to.

In our previous example, the model was called "distilbert-base-uncased-finetuned-sst-2-english", which means it's
using the :doc:`DistilBERT </model_doc/distilbert>` architecture. As
:class:`~transformers.AutoModelForSequenceClassification` (or :class:`~transformers.TFAutoModelForSequenceClassification`
if you are using TensorFlow)` was used, the model automatically created is then a
:class:`~transformers.AutoModelForSequenceClassification` (or :class:`~transformers.TFAutoModelForSequenceClassification`
if you are using TensorFlow) was used, the model automatically created is then a
:class:`~transformers.DistilBertForSequenceClassification`. You can look at its documentation for all details relevant
to that specific model, or browse the source code. This is how you would directly instantiate model and tokenizer
without the auto magic:
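For readers skimming this diff, here is a minimal sketch of the two behaviors the updated quicktour describes: plain tuples by default, and a `SequenceClassifierOutput` when `return_dict=True` is passed. The checkpoint is the one the quicktour already uses; the input sentence is just illustrative.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)
pt_batch = tokenizer(["We are very happy to show you the 🤗 Transformers library."], return_tensors="pt")

# Default after this PR: a plain tuple, accessed by position.
pt_model = AutoModelForSequenceClassification.from_pretrained(model_name)
logits = pt_model(**pt_batch)[0]

# Opt-in named outputs: a SequenceClassifierOutput with a `logits` field.
pt_model = AutoModelForSequenceClassification.from_pretrained(model_name, return_dict=True)
outputs = pt_model(**pt_batch)
logits = outputs.logits
```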
2 changes: 1 addition & 1 deletion docs/source/training.rst
@@ -49,7 +49,7 @@ put it in train mode.
.. code-block:: python
from transformers import BertForSequenceClassification
model = BertForSequenceClassification.from_pretrained('bert-base-uncased')
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', return_dict=True)
model.train()
This is useful because it allows us to make use of the pre-trained BERT
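For context on why `return_dict=True` is added here: it lets the rest of the training loop read outputs by name instead of by index. A rough sketch under that assumption (the batch and labels below are illustrative, not taken from training.rst):

```python
import torch
from transformers import BertForSequenceClassification, BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', return_dict=True)
model.train()

batch = tokenizer(["a positive example", "a negative example"], padding=True, return_tensors="pt")
labels = torch.tensor([1, 0])

outputs = model(**batch, labels=labels)
outputs.loss.backward()  # named field access instead of outputs[0]
```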
2 changes: 1 addition & 1 deletion examples/README.md
@@ -1,7 +1,7 @@
# Examples

Version 2.9 of 🤗 Transformers introduces a new [`Trainer`](https://github.com/huggingface/transformers/blob/master/src/transformers/trainer.py) class for PyTorch, and its equivalent [`TFTrainer`](https://github.com/huggingface/transformers/blob/master/src/transformers/trainer_tf.py) for TF 2.
Running the examples requires PyTorch 1.3.1+ or TensorFlow 2.1+.
Running the examples requires PyTorch 1.3.1+ or TensorFlow 2.2+.

Here is the list of all our examples:
- **grouped by task** (all official examples work for multiple models)
2 changes: 2 additions & 0 deletions examples/multiple-choice/utils_multiple_choice.py
@@ -204,6 +204,8 @@ def gen():
)

def get_dataset(self):
self.dataset = self.dataset.apply(tf.data.experimental.assert_cardinality(len(self.features)))

return self.dataset

def __len__(self):
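The `assert_cardinality` call added to `get_dataset` here (and mirrored in the other TF example datasets touched by this PR) exists because `tf.data.Dataset.from_generator` produces a dataset with unknown cardinality, so `len()` fails on it. A small sketch of the effect, assuming a TensorFlow release that ships `tf.data.experimental.assert_cardinality`:

```python
import tensorflow as tf


def gen():
    for i in range(10):
        yield i


ds = tf.data.Dataset.from_generator(gen, output_types=tf.int32)
print(tf.data.experimental.cardinality(ds).numpy())  # -2, i.e. UNKNOWN_CARDINALITY

ds = ds.apply(tf.data.experimental.assert_cardinality(10))
print(tf.data.experimental.cardinality(ds).numpy())  # 10
print(len(ds))  # 10: len() now works because the cardinality is known
```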
5 changes: 0 additions & 5 deletions examples/question-answering/run_squad.py
@@ -199,9 +199,6 @@ def train(args, train_dataset, model, tokenizer):
{"langs": (torch.ones(batch[0].shape, dtype=torch.int64) * args.lang_id).to(args.device)}
)

if isinstance(model, torch.nn.DataParallel):
inputs["return_tuple"] = True

outputs = model(**inputs)
# model outputs are always tuple in transformers (see doc)
loss = outputs[0]
@@ -316,8 +313,6 @@ def evaluate(args, model, tokenizer, prefix=""):
inputs.update(
{"langs": (torch.ones(batch[0].shape, dtype=torch.int64) * args.lang_id).to(args.device)}
)
if isinstance(model, torch.nn.DataParallel):
inputs["return_tuple"] = True
outputs = model(**inputs)

for i, feature_index in enumerate(feature_indices):
11 changes: 9 additions & 2 deletions examples/question-answering/run_tf_squad.py
@@ -21,6 +21,8 @@
from dataclasses import dataclass, field
from typing import Optional

import tensorflow as tf

from transformers import (
AutoConfig,
AutoTokenizer,
@@ -68,6 +70,7 @@ class DataTrainingArguments:
data_dir: Optional[str] = field(
default=None, metadata={"help": "The input data dir. Should contain the .json files for the SQuAD task."}
)
use_tfds: Optional[bool] = field(default=True, metadata={"help": "If TFDS should be used or not."})
max_seq_length: int = field(
default=128,
metadata={
@@ -170,7 +173,7 @@ def main():
)

# Get datasets
if not data_args.data_dir:
if data_args.use_tfds:
if data_args.version_2_with_negative:
logger.warn("tensorflow_datasets does not handle version 2 of SQuAD. Switch to version 1 automatically")

except ImportError:
raise ImportError("If not data_dir is specified, tensorflow_datasets needs to be installed.")

tfds_examples = tfds.load("squad")
tfds_examples = tfds.load("squad", data_dir=data_args.data_dir)
train_examples = (
SquadV1Processor().get_examples_from_dataset(tfds_examples, evaluate=False)
if training_args.do_train
@@ -209,6 +212,8 @@ def main():
else None
)

train_dataset = train_dataset.apply(tf.data.experimental.assert_cardinality(len(train_examples)))

eval_dataset = (
squad_convert_examples_to_features(
examples=eval_examples,
else None
)

eval_dataset = eval_dataset.apply(tf.data.experimental.assert_cardinality(len(eval_examples)))

# Initialize our Trainer
trainer = TFTrainer(model=model, args=training_args, train_dataset=train_dataset, eval_dataset=eval_dataset,)

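The new `use_tfds` and `data_dir` fields plug into the dataclass-plus-`HfArgumentParser` pattern these TF example scripts already rely on. A rough, self-contained sketch of that pattern (the parser invocation below is the generic pattern, not copied from run_tf_squad.py):

```python
from dataclasses import dataclass, field
from typing import Optional

from transformers import HfArgumentParser


@dataclass
class DataTrainingArguments:
    data_dir: Optional[str] = field(default=None, metadata={"help": "The input data dir."})
    use_tfds: Optional[bool] = field(default=True, metadata={"help": "If TFDS should be used or not."})


parser = HfArgumentParser(DataTrainingArguments)
(data_args,) = parser.parse_args_into_dataclasses(args=["--data_dir", "/tmp/squad"])
print(data_args.use_tfds, data_args.data_dir)  # True /tmp/squad
```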
2 changes: 1 addition & 1 deletion examples/seq2seq/test_seq2seq_examples.py
@@ -144,7 +144,7 @@ def test_distill_checkpointing_with_teacher(self):
evaluate_checkpoint(ckpts[0], dest_dir=Path(tempfile.mkdtemp()))

def test_loss_fn(self):
model = AutoModelForSeq2SeqLM.from_pretrained(BART_TINY)
model = AutoModelForSeq2SeqLM.from_pretrained(BART_TINY, return_dict=True)
input_ids, mask = model.dummy_inputs["input_ids"], model.dummy_inputs["attention_mask"]
target_ids = torch.tensor([[0, 4, 8, 2], [0, 8, 2, 1]], dtype=torch.long, device=model.device)
decoder_input_ids = target_ids[:, :-1].contiguous() # Why this line?
27 changes: 22 additions & 5 deletions examples/text-classification/run_tf_glue.py
@@ -9,6 +9,7 @@
from typing import Dict, Optional

import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds

from transformers import (
@@ -35,7 +36,11 @@ class Split(Enum):


def get_tfds(
task_name: str, tokenizer: PreTrainedTokenizer, max_seq_length: Optional[int] = None, mode: Split = Split.train
task_name: str,
tokenizer: PreTrainedTokenizer,
max_seq_length: Optional[int] = None,
mode: Split = Split.train,
data_dir: str = None,
):
if task_name == "mnli-mm" and mode == Split.dev:
tfds_name = "mnli_mismatched"
@@ -50,9 +55,11 @@
else:
tfds_name = task_name

ds = tfds.load("glue/" + tfds_name, split=mode.value)
ds, info = tfds.load("glue/" + tfds_name, split=mode.value, with_info=True, data_dir=data_dir)
ds = glue_convert_examples_to_features(ds, tokenizer, max_seq_length, task_name)
ds = ds.apply(tf.data.experimental.assert_cardinality(info.splits[mode.value].num_examples))

return glue_convert_examples_to_features(ds, tokenizer, max_seq_length, task_name)
return ds


logger = logging.getLogger(__name__)
Expand All @@ -69,6 +76,7 @@ class GlueDataTrainingArguments:
"""

task_name: str = field(metadata={"help": "The name of the task to train on: " + ", ".join(glue_processors.keys())})
data_dir: Optional[str] = field(default=None, metadata={"help": "The input/output data dir for TFDS."})
max_seq_length: int = field(
default=128,
metadata={
@@ -171,13 +179,22 @@ def main():

# Get datasets
train_dataset = (
get_tfds(task_name=data_args.task_name, tokenizer=tokenizer, max_seq_length=data_args.max_seq_length)
get_tfds(
task_name=data_args.task_name,
tokenizer=tokenizer,
max_seq_length=data_args.max_seq_length,
data_dir=data_args.data_dir,
)
if training_args.do_train
else None
)
eval_dataset = (
get_tfds(
task_name=data_args.task_name, tokenizer=tokenizer, max_seq_length=data_args.max_seq_length, mode=Split.dev
task_name=data_args.task_name,
tokenizer=tokenizer,
max_seq_length=data_args.max_seq_length,
mode=Split.dev,
data_dir=data_args.data_dir,
)
if training_args.do_eval
else None
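Taken together, the `get_tfds` changes above do three things: load with `with_info=True` so the split sizes are known, honor a `data_dir` for the TFDS cache, and re-assert the cardinality after feature conversion (which goes through a generator and loses the length). A rough standalone sketch, with `glue/sst2` and the cache path as illustrative placeholders:

```python
import tensorflow as tf
import tensorflow_datasets as tfds

# with_info=True also returns the DatasetInfo, which records the split sizes.
ds, info = tfds.load("glue/sst2", split="train", with_info=True, data_dir="/tmp/tfds_cache")
num_examples = info.splits["train"].num_examples

# In run_tf_glue.py this is applied after glue_convert_examples_to_features;
# asserting the known size makes len(ds) usable again for the TFTrainer.
ds = ds.apply(tf.data.experimental.assert_cardinality(num_examples))
print(len(ds) == num_examples)  # True
```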
6 changes: 0 additions & 6 deletions examples/token-classification/run_tf_ner.py
@@ -17,7 +17,6 @@

import logging
import os
import warnings
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

@@ -185,11 +184,6 @@ def align_predictions(predictions: np.ndarray, label_ids: np.ndarray) -> Tuple[L

for i in range(batch_size):
for j in range(seq_len):
if label_ids[i, j] == -1:
label_ids[i, j] = -100
warnings.warn(
"Using `-1` to mask the loss for the token is depreciated. Please use `-100` instead."
)
if label_ids[i, j] != -100:
out_label_list[i].append(label_map[label_ids[i][j]])
preds_list[i].append(label_map[preds[i][j]])
4 changes: 3 additions & 1 deletion examples/token-classification/utils_ner.py
@@ -146,7 +146,7 @@ class TFNerDataset:
"""

features: List[InputFeatures]
pad_token_label_id: int = -1
pad_token_label_id: int = -100
# Use cross entropy ignore_index as padding label id so that only
# real label ids contribute to the loss later.

@@ -221,6 +221,8 @@ def gen():
)

def get_dataset(self):
self.dataset = self.dataset.apply(tf.data.experimental.assert_cardinality(len(self.features)))

return self.dataset

def __len__(self):
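The change of `pad_token_label_id` from `-1` to `-100` (and dropping the `-1`-to-`-100` rewrite in `align_predictions`) aligns the TF example with the cross-entropy `ignore_index` convention, so padded tokens never contribute to the loss or the metrics. A small PyTorch illustration of that convention (shapes and label values are made up):

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 9)                 # 4 tokens, 9 NER labels
labels = torch.tensor([3, 5, -100, -100])  # the last two tokens are padding

# F.cross_entropy ignores positions labeled -100 by default (ignore_index=-100),
# so only the two real tokens contribute to the loss.
loss = F.cross_entropy(logits, labels)
print(loss)
```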
3 changes: 3 additions & 0 deletions src/transformers/__init__.py
@@ -278,6 +278,7 @@
XLMForTokenClassification,
XLMForQuestionAnswering,
XLMForQuestionAnsweringSimple,
XLMForMultipleChoice,
XLM_PRETRAINED_MODEL_ARCHIVE_LIST,
)
from .modeling_bart import (
@@ -356,6 +357,8 @@
FlaubertForTokenClassification,
FlaubertForQuestionAnswering,
FlaubertForQuestionAnsweringSimple,
FlaubertForTokenClassification,
FlaubertForMultipleChoice,
FLAUBERT_PRETRAINED_MODEL_ARCHIVE_LIST,
)

15 changes: 8 additions & 7 deletions src/transformers/configuration_utils.py
@@ -49,8 +49,9 @@ class PretrainedConfig(object):
Whether or not the model should returns all attentions.
use_cache (:obj:`bool`, `optional`, defaults to :obj:`True`):
Whether or not the model should return the last key/values attentions (not used by all models).
return_tuple (:obj:`bool`, `optional`, defaults to :obj:`False`):
Whether or not the model should return tuples instead of :obj:`ModelOutput` objects.
return_dict (:obj:`bool`, `optional`, defaults to :obj:`False`):
Whether or not the model should return a :class:`~transformers.file_utils.ModelOutput` instead of a
plain tuple.
is_encoder_decoder (:obj:`bool`, `optional`, defaults to :obj:`False`):
Whether the model is used as an encoder/decoder or not.
is_decoder (:obj:`bool`, `optional`, defaults to :obj:`False`):
@@ -133,7 +134,7 @@ class PretrainedConfig(object):

def __init__(self, **kwargs):
# Attributes with defaults
self.return_tuple = kwargs.pop("return_tuple", False)
self.return_dict = kwargs.pop("return_dict", False)
self.output_hidden_states = kwargs.pop("output_hidden_states", False)
self.output_attentions = kwargs.pop("output_attentions", False)
self.use_cache = kwargs.pop("use_cache", True) # Not used by all models
@@ -194,12 +195,12 @@ def __init__(self, **kwargs):
raise err

@property
def use_return_tuple(self) -> bool:
def use_return_dict(self) -> bool:
"""
:obj:`bool`: Whether or not the model should return a tuple.
:obj:`bool`: Whether or not return :class:`~transformers.file_utils.ModelOutput` instead of tuples.
"""
# If torchscript is set, force return_tuple to avoid jit errors
return self.return_tuple or self.torchscript
# If torchscript is set, force `return_dict=False` to avoid jit errors
return self.return_dict and not self.torchscript

@property
def num_labels(self) -> int:
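A quick sketch of how the renamed flag and the new `use_return_dict` property behave, using `BertConfig` as a stand-in for any `PretrainedConfig` subclass:

```python
from transformers import BertConfig

config = BertConfig(return_dict=True)
print(config.use_return_dict)  # True

# TorchScript cannot trace ModelOutput objects, so torchscript=True forces
# tuple outputs no matter what return_dict says.
config = BertConfig(return_dict=True, torchscript=True)
print(config.use_return_dict)  # False
```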