
Tensorflow improvements #4530

Merged
merged 50 commits into from Jun 4, 2020

Conversation

@jplu (Contributor) commented May 22, 2020

Hello,

Here is a fairly big PR that proposes the following updates:

  • Loss computation is now attached to its respective model class, as in PyTorch.
  • Remove the now-useless mode and loss_name parameters from the TF Trainer.
  • Add missing task models to the different Transformer models.
  • Bugfix in T5 Keras serialization + tests.
  • Add tests for TF Flaubert and XLM-RoBERTa.
  • Bugfix in the TF Trainer for TensorFlow 2.2.
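The first item, attaching loss computation to the model class itself so the Trainer no longer needs mode/loss_name parameters, can be sketched in plain Python (a hypothetical toy, not the actual mixins from this PR, and deliberately free of any TF dependency):

```python
import math

class SequenceClassificationLossSketch:
    """Hypothetical sketch: the loss lives on the model class (as in the
    PyTorch models), instead of being selected inside the TF Trainer via
    mode/loss_name parameters. Class and method names are illustrative."""

    def compute_loss(self, labels, logits):
        # Toy softmax cross-entropy, averaged over the batch.
        losses = []
        for y, row in zip(labels, logits):
            exps = [math.exp(v) for v in row]
            prob = exps[y] / sum(exps)
            losses.append(-math.log(prob))
        return sum(losses) / len(losses)

class ToySequenceClassifier(SequenceClassificationLossSketch):
    def call(self, inputs):
        return [[x, -x] for x in inputs]  # stand-in forward pass
```

With this pattern, a trainer only has to call `model.compute_loss(labels, logits)`; it no longer needs to know which task the model solves.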

Reviews are welcome :)

/cc @julien-c @LysandreJik @thomwolf

@codecov-commenter commented May 22, 2020

Codecov Report

Merging #4530 into master will increase coverage by 0.38%.
The diff coverage is 41.45%.


@@            Coverage Diff             @@
##           master    #4530      +/-   ##
==========================================
+ Coverage   75.63%   76.01%   +0.38%     
==========================================
  Files         128      128              
  Lines       20979    21417     +438     
==========================================
+ Hits        15867    16280     +413     
- Misses       5112     5137      +25     
Impacted Files Coverage Δ
src/transformers/data/processors/squad.py 28.66% <ø> (ø)
src/transformers/training_args_tf.py 51.16% <ø> (-4.16%) ⬇️
src/transformers/trainer_tf.py 18.86% <17.94%> (+0.94%) ⬆️
src/transformers/modeling_tf_xlm.py 76.10% <27.47%> (-14.30%) ⬇️
src/transformers/modeling_tf_xlnet.py 80.53% <27.50%> (-9.80%) ⬇️
src/transformers/modeling_tf_distilbert.py 82.88% <32.00%> (-12.24%) ⬇️
src/transformers/modeling_tf_roberta.py 74.74% <34.21%> (-25.26%) ⬇️
src/transformers/modeling_tf_electra.py 91.17% <38.70%> (-7.89%) ⬇️
src/transformers/modeling_tf_albert.py 75.39% <45.45%> (-3.30%) ⬇️
src/transformers/modeling_tf_utils.py 87.20% <50.00%> (-1.60%) ⬇️
... and 22 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d976ef2...5b456e2.

@jplu (Contributor, Author) commented May 22, 2020

Some commits are missing... I think it is due to the high error rate from GitHub.

@jplu marked this pull request as draft May 25, 2020 18:41
if not isinstance(config, PretrainedConfig):
    config = AutoConfig.from_pretrained(pretrained_model_name_or_path, **kwargs)

for config_class, model_class in TF_MODEL_WITH_LM_HEAD_MAPPING.items():
-    if isinstance(config, config_class):
+    # Not using isinstance() here so as not to take inheritance into account
+    if config_class == type(config):
Member: in PyTorch the different configs are sorted so that you never get a child class before its parent (precisely to prevent this), but this is a reasonable solution too.
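The exact-type check in the diff above can be illustrated with minimal stand-in classes (hypothetical toys, not the real transformers configs, although in the real library RobertaConfig does subclass BertConfig):

```python
# Minimal stand-ins for the config hierarchy under discussion.
class PretrainedConfig:
    pass

class BertConfig(PretrainedConfig):
    pass

class RobertaConfig(BertConfig):  # child config inherits from the parent's
    pass

config = RobertaConfig()

# isinstance() also matches the parent class, so iterating a mapping could
# wrongly pick the BERT model class for a RoBERTa config...
assert isinstance(config, BertConfig)

# ...while the exact type check matches only the RoBERTa entry.
assert type(config) is RobertaConfig
assert type(config) is not BertConfig
```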

return loss_fn(labels, reduced_logits)


class TFSequenceClassificationAndMultipleChoiceLoss:
Member: should we split into two different classes?

Contributor Author: My thought was: since it is the exact same loss computation, why not merge the two names; but your proposal might indeed be more insightful.

Member: Maybe just alias one to the other or do a trivial sub-class.

Contributor Author: Very good point, I will do the update.

Contributor Author: Should be ok now.

@patrickvonplaten self-requested a review May 27, 2020 15:16
@jplu marked this pull request as ready for review May 28, 2020 15:17
@LysandreJik (Member) left a comment

This is great work; I love the added flexibility of the API and how similar to our PyTorch models' API it can be. I like the coding style.

I find this a bit different from the PyTorch API in that:

  1. it uses Mixins, and I'm okay with them as I think the readability is still good, while it does add a (maybe unnecessary?) layer of abstraction. It does greatly improve code sharing across models though, which is welcome.
  2. Loss isn't computed when passing labels, but by directly calling model.compute_loss(x, y)

I'm not opposed to the first point, but a bit more to the second point. Is there something that prevents using labels in TensorFlow as we do it in PyTorch? As we're aiming at API compatibility, I think this is something we should get right.

Comment on lines -1039 to -1042
-    print("isdict(1)")
-    input_ids = inputs.get("input_ids")
-    print(input_ids)
Member: Nice catch!

@jplu (Contributor, Author) commented May 29, 2020

Thanks @LysandreJik for your constructive comments!

For the second point, before answering, just to be sure: do you mean it would be more convenient for the call(...) methods of the TF task models to return the same tuple (loss), logits, (hidden_states), (attentions) as the forward(...) methods of the PT task models?

@LysandreJik (Member)

Yes, that's what I mean. I think having this to be the same as the PyTorch API would make sense. It wouldn't be a breaking change either, as it would require the labels to be passed to the model.

I think doing this could still leverage Mixins, by calling a self._compute_loss or self.compute_loss if we want to expose this method as well. I have no strong opinion on that last item.

@jplu (Contributor, Author) commented May 29, 2020

Ok, that indeed makes sense and I don't think it is a problem to do it that way; I will work on this today to see if there is any issue that would prevent us from doing it.

@julien-c (Member) commented Jun 1, 2020

I agree with @LysandreJik's 2nd point – maybe we can even take advantage of this to implement named tuples for TF models output, like @thomwolf and @patrickvonplaten intend to do for PyTorch (as it's going to be a breaking change in TF models anyways, maybe we can do this at the same time?)

@jplu (Contributor, Author) commented Jun 1, 2020

Since my last commit, the TF models now return the loss, like the PT ones, if labels are given.

About the named tuples: it looks like a good idea indeed, but I think we should implement it in another PR so we can release it at the same time as for PT. No?
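The output convention jplu describes, prepending the loss to the output tuple only when labels are passed, can be sketched with a hypothetical toy model (plain Python, illustrative names only; the real models build logits with a Transformer forward pass):

```python
class ToyTaskModel:
    """Hypothetical sketch of the convention adopted in this PR: call(...)
    returns (loss, logits, ...) when labels are passed and (logits, ...)
    otherwise, mirroring forward(...) in the PyTorch task models."""

    def compute_loss(self, labels, logits):
        # Mean squared error stand-in for the real task loss.
        return sum((p - y) ** 2 for p, y in zip(logits, labels)) / len(labels)

    def call(self, inputs, labels=None):
        logits = [2.0 * x for x in inputs]  # stand-in forward pass
        outputs = (logits,)
        if labels is not None:
            # Prepend the loss, as the PT models do.
            outputs = (self.compute_loss(labels, logits),) + outputs
        return outputs

model = ToyTaskModel()
(logits,) = model.call([1.0, 2.0])                        # no labels: no loss
loss, logits = model.call([1.0, 2.0], labels=[2.0, 4.0])  # labels: loss first
```

Because the loss only appears when labels are supplied, existing callers that never pass labels are unaffected, which is why this is not a breaking change.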

@julien-c (Member) commented Jun 1, 2020

About the named tuples [...] we should implement this in another PR in order to release this in same time than for PT. No?

Yes, makes sense!

@jplu (Contributor, Author) commented Jun 1, 2020

Ok, looks good to me. I have tested the new models with different examples that use the trainer and they all work; tests look to be ok as well, except the quality one that I don't know how to fix 😄

@@ -25,7 +25,8 @@
 from .configuration_t5 import T5Config
 from .file_utils import DUMMY_INPUTS, DUMMY_MASK, add_start_docstrings, add_start_docstrings_to_callable
-from .modeling_tf_utils import TFPreTrainedModel, TFSharedEmbeddings, shape_list
+from .modeling_tf_utils import TFPreTrainedModel, TFSharedEmbeddings, keras_serializable, shape_list
Contributor: Thanks for the changes here! Looks good to me.

@@ -734,7 +734,7 @@ def call(self, inputs, **kwargs):
         return outputs


-class TFTransfoXLLMHead(tf.keras.layers.Layer):
+class TFTransfoXLWithLMHeadModel(tf.keras.layers.Layer):
Contributor: This is a breaking change, no? Maybe we want to add an alias for TFTransfoXLLMHead for backward compatibility @LysandreJik

Contributor Author: My bad, I have just pushed a commit to rename it.

     )

-    return optimizer
+    return optimizer, lr_schedule
Contributor: This is also a breaking change - we should document it well so that users know.

Contributor Author: Done!

Member: Indeed, this is not backwards compatible. @jplu, do you expect this method to currently be used by users outside of the Trainer? Would this breaking change impact those users?

@jplu (Contributor, Author) commented Jun 3, 2020

Honestly no, I am not expecting users to use this outside of the TF Trainer. Also, the TF Trainer has been updated to use this new return format, like the PT one, including the examples.

Member: Right, I thought so too!

@@ -3,9 +3,9 @@
from dataclasses import dataclass, field
from typing import Optional

import matplotlib.pyplot as plt
Contributor: I had problems with isort on this file as well :D I think you might just want to revert this change manually to fix isort. It seems like you also have the wrong isort version... quite annoying, this isort bug.

Contributor Author: Done! And thanks for the hint 😄

@patrickvonplaten (Contributor) commented Jun 3, 2020

A more general question regarding training in TensorFlow (I'm not super familiar with TF 2.0 training, so I'm asking primarily to learn a bit :-) ):
I remember that before TF 2.0 was out, most people used Keras to train a model with model.fit(x_train, y_train) => is this still the case? Or are people more and more switching to the TF 2.0 training style shown here: https://www.tensorflow.org/tutorials/quickstart/advanced, which basically consists of using optimizer.apply_gradients(zip(gradients, model.trainable_variables))? This is also what we do in the TF trainer, right?

Was it possible and recommended to train transformer models with Keras' model.fit() before the TF Trainer, and is it still possible now?

@jplu (Contributor, Author) commented Jun 3, 2020

This is a good question! Short answer: yes, it is still possible, but without any gradient accumulation; that is mostly why the trainer uses TensorFlow's advanced training loop.

I'm currently preparing a follow-up PR that will integrate the new Model.train_step feature added in TF 2.2. Basically this update lets you create your own train step, and hence integrate the missing gradient accumulation, but this new PR will be for TF >= 2.2 only.
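The gradient accumulation jplu mentions (which model.fit() alone does not provide) can be sketched in plain Python; this is a hypothetical stand-in with illustrative names, not the accumulator from the library, and it uses floats where the real code would hold gradient tensors:

```python
class GradientAccumulator:
    """Hypothetical sketch of gradient accumulation: per-micro-batch gradients
    are summed in a buffer, and the optimizer step is applied only once every
    `accum_steps` batches, simulating a larger effective batch size."""

    def __init__(self, accum_steps):
        self.accum_steps = accum_steps
        self._step = 0
        self._buffer = None

    def accumulate(self, grads):
        # grads: list of floats standing in for per-variable gradient tensors.
        if self._buffer is None:
            self._buffer = [0.0] * len(grads)
        self._buffer = [b + g for b, g in zip(self._buffer, grads)]
        self._step += 1
        # True when enough micro-batches have been seen to apply an update.
        return self._step % self.accum_steps == 0

    def gradients(self):
        # Mean gradient over the accumulated micro-batches; reset the buffer.
        out = [b / self.accum_steps for b in self._buffer]
        self._buffer = None
        return out

acc = GradientAccumulator(accum_steps=2)
acc.accumulate([1.0, 2.0])          # first micro-batch: keep buffering
ready = acc.accumulate([3.0, 4.0])  # second micro-batch: time to apply
```

In a custom train step, `optimizer.apply_gradients(...)` would only be called when `accumulate` returns True, using the averaged gradients from `gradients()`.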

@LysandreJik (Member)

@patrickvonplaten It was possible, and we definitely aim to keep compatibility with Keras' fit method. We don't have many tutorials that cover it, though; having some would probably make it easier for new users coming from Keras to use our lib.

@julien-c, we've had the offline approval from @thomwolf, feel free to merge when you want. Glad to welcome this in the library!

@julien-c (Member) commented Jun 4, 2020

Just tweaked training_args.logging_dir to keep the same default as PyTorch (I like that it creates a new subfolder each time you relaunch a training).

Great job @jplu, thank you 💪
