Custom TF weights loading #7422

jplu · 2020-09-28T09:44:42Z

This PR provides a custom weight loading function in order to take into account dynamic model architecture building. More precisely, the brand new loading function takes into account the authorized_unexpected_keys and authorized_missing_keys class attributes enabling the possibility to ignore some layers in the models.

codecov · 2020-09-28T10:12:04Z

Codecov Report

Merging #7422 into master will decrease coverage by 0.21%.
The diff coverage is 87.50%.

@@            Coverage Diff             @@
##           master    #7422      +/-   ##
==========================================
- Coverage   78.51%   78.30%   -0.22%     
==========================================
  Files         184      181       -3     
  Lines       36734    35917     -817     
==========================================
- Hits        28843    28125     -718     
+ Misses       7891     7792      -99

Impacted Files	Coverage Δ
src/transformers/modeling_tf_pytorch_utils.py	`88.37% <50.00%> (-1.94%)`	⬇️
src/transformers/modeling_tf_utils.py	`88.06% <92.30%> (+0.84%)`	⬆️
src/transformers/modeling_tf_bert.py	`98.91% <100.00%> (+<0.01%)`	⬆️
src/transformers/modeling_tf_mobilebert.py	`24.59% <0.00%> (-72.35%)`	⬇️
src/transformers/configuration_mobilebert.py	`26.47% <0.00%> (-70.59%)`	⬇️
src/transformers/modeling_mobilebert.py	`23.51% <0.00%> (-65.93%)`	⬇️
src/transformers/modeling_tf_xlm.py	`58.52% <0.00%> (-34.74%)`	⬇️
src/transformers/modeling_lxmert.py	`69.91% <0.00%> (-20.82%)`	⬇️
src/transformers/trainer_utils.py	`63.30% <0.00%> (-5.35%)`	⬇️
src/transformers/trainer.py	`63.23% <0.00%> (-1.59%)`	⬇️
... and 24 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 95f792a...6f52cc9. Read the comment docs.

sgugger

This is great, thanks for adding this to the TF side!
I have two nits on the docs and I know I'm the one to blame since I know where you copied it from ;-). We can fix the PreTrainedModel docstrings to match in this PR or I'll do another one for that.

src/transformers/modeling_tf_utils.py

jplu · 2020-09-28T14:57:11Z

Just merged your suggestions :)

patrickvonplaten · 2020-09-28T16:18:12Z

src/transformers/modeling_tf_bert.py

@@ -518,7 +518,7 @@ def __init__(self, config, **kwargs):
        self.return_dict = config.use_return_dict
        self.embeddings = TFBertEmbeddings(config, name="embeddings")
        self.encoder = TFBertEncoder(config, name="encoder")
-        self.pooler = TFBertPooler(config, name="pooler")
+        self.pooler = TFBertPooler(config, name="pooler") if add_pooling_layer else None


Thanks a lot for adding this - it's great that it works now :-) ! Could you maybe also add it for TFAlbert, TFMobileNet and TFLongformer as is done in the PyTorch versions?

patrickvonplaten · 2020-09-28T16:18:29Z

src/transformers/modeling_tf_bert.py

@@ -853,8 +855,7 @@ def call(self, inputs, **kwargs):

 @add_start_docstrings("""Bert Model with a `language modeling` head on top. """, BERT_START_DOCSTRING)
 class TFBertForMaskedLM(TFBertPreTrainedModel, TFMaskedLanguageModelingLoss):
-
-    authorized_missing_keys = [r"pooler"]
+    authorized_unexpected_keys = [r"pooler", r"nsp___cls"]


awesome :-)

patrickvonplaten · 2020-09-28T16:20:10Z

src/transformers/modeling_tf_bert.py

@@ -781,6 +781,8 @@ class TFBertForPreTrainingOutput(ModelOutput):
    BERT_START_DOCSTRING,
 )
 class TFBertModel(TFBertPreTrainedModel):
+    authorized_unexpected_keys = [r"nsp___cls"]


Why do we need to add "nsp__cls" to authorized_unexpected_keys here? I'm not sure that we need it

I think if one initializes a TFBertModel with nsp layer it is correct to show a warning that nsp__cls is not used because nsp__cls is not part of TFBertMainLayer in contrast to pooler.

We don't need it, just in case it appears in the weights we don't raise the exception. Do you think it would be better to authorize none keys here?

Fine, let's remove it then!

patrickvonplaten · 2020-09-28T16:21:52Z

src/transformers/modeling_tf_utils.py

+        resolved_archive_file (:obj:`str`):
+            The location of the H5 file.
+    """
+    from tensorflow.python.keras import backend as K


Can we move this import to the top?

patrickvonplaten

This is a great addition! This function allows better warnings and to remove the unnecessary self.pooler layer for some models.

Two things I would improve:

Can you add the add_pooling_layer logic to TFAbert, TFMobileBert and TFLongformer as well?
I don't think we should add layers such as nsp__cls to authorized_unexpected_keys as explained below

jplu · 2020-09-28T16:53:58Z

@patrickvonplaten I have done some updates, let me know if it looks like what you have in mind.

jplu · 2020-09-28T17:43:59Z

There is an issue with Longformer apparently.

jplu · 2020-09-28T20:06:39Z

Ok, I found why, and I should have thought about this much before.... 😣

We cannot have None into a tuple, the logic works only when the return_dict is True.

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

jplu · 2020-10-05T08:07:47Z

@LysandreJik are we able to merge?

LysandreJik

Great, thanks a lot @jplu!

LysandreJik · 2020-10-05T12:52:42Z

tests/test_modeling_tf_bert.py

-    @slow
    def test_model_from_pretrained(self):
-        # for model_name in TF_BERT_PRETRAINED_MODEL_ARCHIVE_LIST[:1]:
-        for model_name in ["bert-base-uncased"]:
-            model = TFBertModel.from_pretrained(model_name)
-            self.assertIsNotNone(model)
+        model = TFBertModel.from_pretrained("jplu/tiny-tf-bert-random")
+        self.assertIsNotNone(model)
+
+    def test_custom_load_tf_weights(self):
+        model, output_loading_info = TFBertForTokenClassification.from_pretrained(
+            "jplu/tiny-tf-bert-random", use_cdn=False, output_loading_info=True
+        )
+        self.assertEqual(sorted(output_loading_info["unexpected_keys"]), ["mlm___cls", "nsp___cls"])
+        for layer in output_loading_info["missing_keys"]:
+            self.assertTrue(layer.split("_")[0] in ["dropout", "classifier"])


I think it makes sense to put tiny models here and test those without slow tests in a single model, so as to have feedback on this without relying on the slow testing suite, but on each PR's CI.

patrickvonplaten · 2020-10-05T13:29:49Z

Good to merge for me

LysandreJik · 2020-10-05T13:58:01Z

Ran the slow tests, they pass.

* First try * Fix TF utils * Handle authorized unexpected keys when loading weights * Add several more authorized unexpected keys * Apply style * Fix test * Address Patrick's comments. * Update src/transformers/modeling_tf_utils.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/modeling_tf_utils.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Apply style * Make return_dict the default behavior and display a warning message * Revert * Replace wrong keyword * Revert code * Add forgot key * Fix bug in loading PT models from a TF one. * Fix sort * Add a test for custom load weights in BERT * Apply style * Remove unused import Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

This reverts commit c8b54af.

jplu requested review from patrickvonplaten, thomwolf, LysandreJik and sgugger September 28, 2020 09:45

jplu marked this pull request as draft September 28, 2020 09:52

jplu marked this pull request as ready for review September 28, 2020 10:15

sgugger approved these changes Sep 28, 2020

View reviewed changes

src/transformers/modeling_tf_utils.py Outdated Show resolved Hide resolved

src/transformers/modeling_tf_utils.py Outdated Show resolved Hide resolved

jplu mentioned this pull request Sep 28, 2020

Missing keys when loading weights in TF are not useful #7383

Closed

patrickvonplaten reviewed Sep 28, 2020

View reviewed changes

jplu marked this pull request as draft September 28, 2020 17:43

jplu marked this pull request as ready for review September 29, 2020 14:38

jplu force-pushed the tf-weight-load branch from 6cccda6 to 48104a2 Compare September 29, 2020 17:49

jplu and others added 9 commits October 5, 2020 09:55

First try

c1b7892

Fix TF utils

e15f40f

Handle authorized unexpected keys when loading weights

265ca0d

Add several more authorized unexpected keys

337f988

Apply style

9c5417c

Fix test

049dfd4

Address Patrick's comments.

6c95a89

Update src/transformers/modeling_tf_utils.py

859742b

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Update src/transformers/modeling_tf_utils.py

14dafb8

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

jplu added 8 commits October 5, 2020 09:55

Apply style

2820721

Make return_dict the default behavior and display a warning message

b3b7bd0

Revert

856c7de

Replace wrong keyword

94673bb

Revert code

ab042c5

Add forgot key

268ca54

Fix bug in loading PT models from a TF one.

1ead699

Fix sort

f40dbd7

jplu force-pushed the tf-weight-load branch from 295ab51 to f40dbd7 Compare October 5, 2020 07:55

jplu added 3 commits October 5, 2020 14:14

Add a test for custom load weights in BERT

03117e9

Apply style

6f52cc9

Remove unused import

294c56b

LysandreJik approved these changes Oct 5, 2020

View reviewed changes

LysandreJik reviewed Oct 5, 2020

View reviewed changes

LysandreJik force-pushed the tf-weight-load branch from 51c36e4 to 294c56b Compare October 5, 2020 13:58

LysandreJik merged commit 9cf7b23 into huggingface:master Oct 5, 2020

fabiocapsouza added a commit to fabiocapsouza/transformers that referenced this pull request Nov 15, 2020

Revert "Custom TF weights loading (huggingface#7422)"

df97476

This reverts commit c8b54af.

jplu deleted the tf-weight-load branch June 13, 2023 14:23

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Custom TF weights loading #7422

Custom TF weights loading #7422

jplu commented Sep 28, 2020

codecov bot commented Sep 28, 2020 •

edited

sgugger left a comment

jplu commented Sep 28, 2020

patrickvonplaten Sep 28, 2020

jplu Sep 28, 2020

patrickvonplaten Sep 28, 2020

patrickvonplaten Sep 28, 2020

patrickvonplaten Sep 28, 2020

jplu Sep 28, 2020

jplu Sep 28, 2020

patrickvonplaten Sep 28, 2020

patrickvonplaten left a comment

jplu commented Sep 28, 2020

jplu commented Sep 28, 2020

jplu commented Sep 28, 2020

jplu commented Oct 5, 2020

LysandreJik left a comment

LysandreJik Oct 5, 2020

patrickvonplaten commented Oct 5, 2020

LysandreJik commented Oct 5, 2020

Custom TF weights loading #7422

Custom TF weights loading #7422

Conversation

jplu commented Sep 28, 2020

codecov bot commented Sep 28, 2020 • edited

Codecov Report

sgugger left a comment

Choose a reason for hiding this comment

jplu commented Sep 28, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

patrickvonplaten left a comment

Choose a reason for hiding this comment

jplu commented Sep 28, 2020

jplu commented Sep 28, 2020

jplu commented Sep 28, 2020

jplu commented Oct 5, 2020

LysandreJik left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

patrickvonplaten commented Oct 5, 2020

LysandreJik commented Oct 5, 2020

codecov bot commented Sep 28, 2020 •

edited