Tensorflow improvements #4530
Conversation
Codecov Report
@@ Coverage Diff @@
## master #4530 +/- ##
==========================================
+ Coverage 75.63% 76.01% +0.38%
==========================================
Files 128 128
Lines 20979 21417 +438
==========================================
+ Hits 15867 16280 +413
- Misses 5112 5137 +25
Continue to review full report at Codecov.
Some commits are missing... I think it is due to the high error rate from GitHub.
```diff
 if not isinstance(config, PretrainedConfig):
     config = AutoConfig.from_pretrained(pretrained_model_name_or_path, **kwargs)

 for config_class, model_class in TF_MODEL_WITH_LM_HEAD_MAPPING.items():
-    if isinstance(config, config_class):
+    # Not using isinstance() here so that inheritance is not taken into account
+    if config_class == type(config):
```
In PyTorch, the different configs are sorted so that you never get a child class before its parent (precisely to prevent this), but this is a reasonable solution too.
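For context, a quick illustration of the difference being discussed, using stand-in stub classes rather than the real configs: `isinstance()` also matches subclasses, so a child config could be picked up by its parent's entry in the mapping, while the exact type check is unaffected by inheritance.

```python
class BertConfig:
    pass

# Hypothetical subclass, for illustration only.
class MyCustomBertConfig(BertConfig):
    pass

config = MyCustomBertConfig()

print(isinstance(config, BertConfig))  # True: matches through inheritance
print(type(config) == BertConfig)      # False: only an exact type match passes
```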
```python
        return loss_fn(labels, reduced_logits)


class TFSequenceClassificationAndMultipleChoiceLoss:
```
should we split into two different classes?
My thought was: since it is the exact same loss computation, why not merge the two names? But your proposal might indeed be clearer.
Maybe just alias one to the other or do a trivial sub-class
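A minimal sketch of the two options, assuming a shared loss class; the class names are illustrative, not necessarily the ones used in the PR:

```python
import tensorflow as tf

class TFSequenceClassificationLoss:
    def compute_loss(self, labels, logits):
        loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(
            from_logits=True, reduction=tf.keras.losses.Reduction.NONE
        )
        return loss_fn(labels, logits)

# Option 1: a plain alias, so both names point to the same class.
TFMultipleChoiceLoss = TFSequenceClassificationLoss

# Option 2 (alternative to option 1): a trivial subclass, which keeps
# distinct names in tracebacks and lets the two diverge later if needed.
class TFMultipleChoiceLoss(TFSequenceClassificationLoss):
    pass
```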
Very good point, I will do the update
Should be ok now.
This is great work! I love the added flexibility of the API, how similar it can be to our PyTorch models' API, and the coding style.
I find this a bit different from the PyTorch API in two respects:
- It uses mixins. I'm okay with them, as the readability is still good; they add a (maybe unnecessary?) layer of abstraction, but they greatly improve code sharing across models, which is welcome.
- The loss isn't computed when passing labels, but by directly calling model.compute_loss(x, y).
I'm not opposed to the first point, but I am a bit more opposed to the second. Is there something that prevents using labels in TensorFlow as we do in PyTorch? As we're aiming at API compatibility, I think this is something we should get right.
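To make the two styles concrete, here is a hedged sketch with a toy stand-in model (all names are illustrative, not the PR's actual classes):

```python
import tensorflow as tf

class ToyModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.dense = tf.keras.layers.Dense(2)

    def compute_loss(self, labels, logits):
        loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
        return loss_fn(labels, logits)

    def call(self, inputs, labels=None):
        logits = self.dense(inputs)
        # PyTorch-style: when labels are given, prepend the loss to the outputs.
        if labels is not None:
            return (self.compute_loss(labels, logits), logits)
        return (logits,)

model = ToyModel()
x = tf.random.normal((4, 8))
y = tf.constant([0, 1, 1, 0])

# Style 1 (as in the PR at this point): explicit loss call.
logits = model(x)[0]
loss = model.compute_loss(y, logits)

# Style 2 (PyTorch-style, as requested): labels in the forward call.
loss, logits = model(x, labels=y)
```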
print("isdict(1)") | ||
input_ids = inputs.get("input_ids") | ||
print(input_ids) | ||
|
Nice catch!
Thanks @LysandreJik for your constructive comments! For the second point, before answering, just to be sure: you mean that it would be more convenient if the output of the model directly contained the loss when labels are passed, as in PyTorch?
Yes, that's what I mean. I think having this be the same as the PyTorch API would make sense. It wouldn't be a breaking change either, as it would require the labels to be passed for the loss to be returned. I think doing this could still leverage the mixins, by calling the loss computation internally when labels are given.
Ok, that indeed makes sense, and I don't think it is a problem to do it that way. I will work on this today to see if there is any issue that would prevent us from doing it.
I agree with @LysandreJik's 2nd point. Maybe we can even take advantage of this to implement named tuples for the TF models' output, like @thomwolf and @patrickvonplaten intend to do for PyTorch (as it's going to be a breaking change in the TF models anyway, maybe we can do this at the same time?)
Since my last commit, the TF models now return the loss, like the PT ones, when labels are given. About the named tuples: it looks like a good idea indeed, but I think we should implement it in another PR so that it can be released at the same time as the PT change. No?
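Concretely, usage after this change would look like the following sketch (the checkpoint name and tensors are illustrative, and the loss-first output ordering is assumed from the PyTorch convention):

```python
import tensorflow as tf
from transformers import TFBertForSequenceClassification

model = TFBertForSequenceClassification.from_pretrained("bert-base-uncased")

input_ids = tf.constant([[101, 7592, 2088, 102]])  # toy token ids
labels = tf.constant([1])

# With labels passed, the loss is returned with the outputs, as in PT.
outputs = model(input_ids, labels=labels)
loss, logits = outputs[0], outputs[1]
```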
Yes, makes sense!
Ok, looks good to me. I have tested the new models with different examples that use the trainer and they all work. Tests look ok as well, except the quality one, which I don't know how to fix 😄
…ls to TF Flaubert
…eters for the QuestionAnswering task models.
```diff
@@ -25,7 +25,8 @@
 from .configuration_t5 import T5Config
 from .file_utils import DUMMY_INPUTS, DUMMY_MASK, add_start_docstrings, add_start_docstrings_to_callable
-from .modeling_tf_utils import TFPreTrainedModel, TFSharedEmbeddings, shape_list
+from .modeling_tf_utils import TFPreTrainedModel, TFSharedEmbeddings, keras_serializable, shape_list
```
Thanks for the changes here! Looks good to me
```diff
@@ -734,7 +734,7 @@ def call(self, inputs, **kwargs):
         return outputs


-class TFTransfoXLLMHead(tf.keras.layers.Layer):
+class TFTransfoXLWithLMHeadModel(tf.keras.layers.Layer):
```
This is a breaking change, no? Maybe we want to add an alias for TFTransfoXLLMHead for backward compatibility @LysandreJik
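For reference, the suggested backward-compatibility alias would be a one-liner along these lines (a sketch of the suggestion only; it assumes both names live in the same module, and the thread below resolves this differently, by renaming):

```python
# Keep the old name importable for existing user code.
TFTransfoXLLMHead = TFTransfoXLWithLMHeadModel
```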
My bad, I just pushed a commit to rename it.
```diff
     )

-    return optimizer
+    return optimizer, lr_schedule
```
This is also a breaking change; we should document it well so that users know.
Done!
Indeed, this is not backwards compatible. @jplu, do you expect this method to be currently used by users outside of the Trainer? Would this breaking change impact those users?
Honestly, no, I'm not expecting users to use this outside the TF Trainer. Also, the TF Trainer has been updated to use this new return format, like the PT one, including the examples.
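A hedged sketch of the new return shape; the library's actual implementation likely differs (it involves warmup and weight decay), so treat the body as illustrative:

```python
import tensorflow as tf

def create_optimizer(init_lr, num_train_steps, num_warmup_steps):
    # Illustrative schedule only; the real function wraps a decay schedule
    # with a warmup phase (num_warmup_steps is unused in this sketch).
    lr_schedule = tf.keras.optimizers.schedules.PolynomialDecay(
        initial_learning_rate=init_lr,
        decay_steps=num_train_steps,
        end_learning_rate=0.0,
    )
    optimizer = tf.keras.optimizers.Adam(learning_rate=lr_schedule)
    # Breaking change discussed above: callers now receive the schedule too.
    return optimizer, lr_schedule

# Call sites unpack both values:
optimizer, lr_schedule = create_optimizer(3e-5, 1000, 100)
```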
Right, I thought so too!
```diff
@@ -3,9 +3,9 @@
 from dataclasses import dataclass, field
 from typing import Optional

 import matplotlib.pyplot as plt
```
I had problems with isort on this file as well :D I think you might just want to revert this change manually to fix isort. It seems like you also have the wrong isort version... this isort bug is quite annoying.
Done! And thanks for the hint 😄
A more general question regarding training in TensorFlow (I'm not super familiar with TF 2.0 training, so I'm asking primarily to learn a bit :-) ): Was it possible and recommended to train transformer models with keras' fit() method?
This is a good question! Short answer: yes, it is still possible, but without any gradient accumulation; that's mostly why the trainer uses the advanced training approach of TensorFlow. I'm currently preparing a next PR that will integrate the new […]
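For readers unfamiliar with the plain Keras path mentioned here, a hedged sketch (the checkpoint name is illustrative, and train_dataset is assumed to be a tf.data.Dataset of (features, labels) batches; note there is no gradient accumulation this way):

```python
import tensorflow as tf
from transformers import TFBertForSequenceClassification

model = TFBertForSequenceClassification.from_pretrained("bert-base-uncased")

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=3e-5),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)
model.fit(train_dataset, epochs=3)
```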
@patrickvonplaten It was possible, and we definitely aim to keep compatibility with keras' fit().
@julien-c, we've had the offline approval from @thomwolf, feel free to merge whenever you want. Glad to welcome this in the library!
Just tweaked training_args.logging_dir to keep the same default as PyTorch (I like that it creates a new subfolder each time you relaunch a training). Great job @jplu, thank you 💪
Hello,
Here is a quite big PR that proposes several updates, among them changes to the mode and loss_name parameters for the TF Trainer.
Reviews are welcome :)
/cc @julien-c @LysandreJik @thomwolf