
Fix T5 and BART for TF #9063

Merged
merged 14 commits into huggingface:master from jplu:fix-t5-bart on Dec 14, 2020
Conversation

@jplu (Contributor) commented Dec 11, 2020

What does this PR do?

This PR fixes the TensorFlow implementations of T5 and BART to make them compliant with graph compilation and execution, so that a SavedModel can be created for each of them.

The slow tests test_saved_model_with_hidden_states_output and test_saved_model_with_attentions_output are now passing for both models.
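For context, a minimal sketch (not taken from this PR) of the export path this unblocks; the checkpoint name and the dummy inputs are illustrative assumptions:

    # Hypothetical sketch: exporting a TF T5 model as a SavedModel, which
    # requires the forward pass to survive graph compilation + execution.
    import tensorflow as tf
    from transformers import TFT5Model

    model = TFT5Model.from_pretrained("t5-small")  # assumed checkpoint

    # Placeholder inputs; the ids and shapes are made up.
    inputs = {
        "input_ids": tf.constant([[13, 7, 14, 1]]),
        "decoder_input_ids": tf.constant([[0, 13, 7, 14]]),
    }
    outputs = model(inputs)  # eager sanity check

    # Saving traces the model into a graph; this is the step that used to fail.
    tf.saved_model.save(model, "saved_model/t5")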

@sgugger (Collaborator) left a comment:

Way cleaner! Thanks a lot for doing this!

Comment on lines 338 to 339
key = np.array(key, dtype=bool)
pt_inputs_dict[name] = torch.from_numpy(key).to(torch.long)
@sgugger (Collaborator):

Any reason to change a bool to a tensor? We can just keep it as a boolean, no?

@jplu (Contributor, author) Dec 11, 2020:

This is because the use_cache parameter returned by the _prepare_for_class method in the T5 test file is now a primitive boolean and no longer a tensor. Thanks to the changes, the value is not forced to be a tensor anymore :)

@sgugger (Collaborator):

Yes, I understand that. PyTorch accepts regular booleans in its models, which is why I'm confused about why we convert it to a PyTorch tensor here.

@jplu (Contributor, author):

This is because of the .numpy() call used to convert TF tensors to NumPy arrays. Plain booleans don't have this attribute.

@sgugger (Collaborator):

My question is: why not just pass pt_inputs_dict[name] = key?

@jplu (Contributor, author):

Ah ok!! Simply because I didn't know 😄 Just pushed the update^^
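To summarize the thread, a small hypothetical sketch of the two paths (the variable names mirror the snippet above; the value is made up):

    import numpy as np
    import torch

    key = True  # use_cache now arrives as a plain Python bool, not a tf.Tensor
    pt_inputs_dict = {}

    # Old path: force the value through NumPy into a torch.long tensor.
    # np.array was needed because a plain bool has no .numpy() method.
    pt_inputs_dict["use_cache"] = torch.from_numpy(np.array(key, dtype=bool)).to(torch.long)

    # New path: PyTorch models accept plain booleans directly.
    pt_inputs_dict["use_cache"] = key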

@LysandreJik (Member) left a comment:

Overall way cleaner. Happy to see the cast_bool_to_primitive go!
Thanks for working on this.

Comment on lines -568 to -575

-        output_attentions = output_attentions if output_attentions is not None else self.output_attentions
-        output_hidden_states = output_hidden_states if output_hidden_states is not None else self.output_hidden_states
-        use_cache = use_cache if use_cache is not None else self.use_cache
-        return_dict = return_dict if return_dict is not None else self.config.return_dict
-        if use_cache:
-            assert not training, "Training + use cache are incompatible"
-        # check attention mask and invert
-        use_cache = cast_bool_to_primitive(use_cache)
@LysandreJik (Member):

Does removing this mean the TFBartDecoder will not be able to handle the output attentions/hidden states/cache/return dict parameters on its own, but will instead rely on being called by the TFBart{Model, ForConditionalGeneration} classes?

If @patrickvonplaten is doing the same refactor for TF BART as he did for the PT version, then one of the enhancements it offers is being able to use the (TF)BartDecoder as a standalone model. Will this change prevent that?

@jplu (Contributor, author):

If the (TF)BartDecoder is meant to be used as a standalone model, input_processing must be added. I can add it just in case; it won't harm the current behavior, it would just be some extra processing. Should I do it?

@LysandreJik (Member):

I think @patrickvonplaten will probably take care of that in his PR, so no problem here. I would still wait for his review before merging!

@patrickvonplaten (Contributor):

I haven't looked at having either TFBart or TFT5 as an "encoder-only" or "decoder-only" model yet, because a) there is no TFEncoderDecoder model and b) of the issues that will be solved in this PR. So I'm 100% fine with deleting it for now.
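For reference, a hedged sketch of what wiring input_processing into a standalone decoder's call might look like, based only on the signature visible in the diff further down; the toy layer and the exact keyword handling are assumptions, not code from this PR:

    import tensorflow as tf
    from transformers.modeling_tf_utils import input_processing

    class TFToyDecoder(tf.keras.layers.Layer):
        # Illustrative layer; only the input_processing call pattern matters.
        def __init__(self, config, **kwargs):
            super().__init__(**kwargs)
            self.config = config

        def call(self, input_ids=None, attention_mask=None, use_cache=None, training=False):
            # Normalize tensors/primitives/dicts into one dict, so the layer
            # could be driven directly instead of only via TFBartModel.
            inputs = input_processing(
                func=self.call,
                config=self.config,
                input_ids=input_ids,
                attention_mask=attention_mask,
                use_cache=use_cache,
                training=training,
            )
            # Resolve defaults from the config, as the removed block above did.
            use_cache = inputs["use_cache"] if inputs["use_cache"] is not None else self.config.use_cache
            return inputs["input_ids"]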

Comment on lines -1149 to +1164

-            decoder_hidden_states=decoder_outputs[2],
-            decoder_attentions=decoder_outputs[3],
-            encoder_last_hidden_state=inputs["encoder_outputs"][0],
-            encoder_hidden_states=inputs["encoder_outputs"][1],
-            encoder_attentions=inputs["encoder_outputs"][2],
+            decoder_hidden_states=decoder_outputs.hidden_states,
+            decoder_attentions=decoder_outputs.attentions,
+            encoder_last_hidden_state=inputs["encoder_outputs"].last_hidden_state,
+            encoder_hidden_states=inputs["encoder_outputs"].hidden_states,
+            encoder_attentions=inputs["encoder_outputs"].attentions,
@LysandreJik (Member):

Infinitely cleaner
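Concretely, a small sketch of the difference (TFBaseModelOutput is the real transformers output type; the tensor shapes are placeholders):

    import tensorflow as tf
    from transformers.modeling_tf_outputs import TFBaseModelOutput

    # Dummy encoder output with made-up shapes.
    encoder_outputs = TFBaseModelOutput(
        last_hidden_state=tf.zeros((1, 4, 8)),
        hidden_states=(tf.zeros((1, 4, 8)),),
        attentions=(tf.zeros((1, 2, 4, 4)),),
    )

    # Before: positional indexing, which silently shifts when optional
    # fields (hidden_states, attentions) are not returned.
    last_hidden = encoder_outputs[0]

    # After: named attributes make the intent explicit and robust.
    last_hidden = encoder_outputs.last_hidden_state
    hidden_states = encoder_outputs.hidden_states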

@@ -354,7 +354,8 @@ def input_processing(func, config, input_ids, **kwargs):
         if isinstance(v, allowed_types) or v is None:
             output[k] = v
         else:
             raise ValueError(f"Data of type {type(v)} is not allowed only tf.Tensor is accepted for {k}.")
+            print(k, v)
@patrickvonplaten (Contributor):

I think we can delete the print(k, v), no?

@@ -1366,31 +1367,6 @@ def get_initializer(initializer_range: float = 0.02) -> tf.initializers.TruncatedNormal:
     return tf.keras.initializers.TruncatedNormal(stddev=initializer_range)


-def cast_bool_to_primitive(bool_variable: Union[tf.Tensor, bool], default_tensor_to_true=False) -> bool:
@patrickvonplaten (Contributor):

yeeees :-)
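For readers without the full diff, a hedged reconstruction of roughly what the removed helper did (the signature comes from the hunk above; the body is an approximation and may differ from the original):

    from typing import Union
    import tensorflow as tf

    def cast_bool_to_primitive(bool_variable: Union[tf.Tensor, bool], default_tensor_to_true=False) -> bool:
        # Eager tensors expose .numpy(); symbolic graph tensors do not, so
        # in graph mode the helper could only guess via a default value.
        if tf.is_tensor(bool_variable):
            if hasattr(bool_variable, "numpy"):
                return bool(bool_variable.numpy())
            elif default_tensor_to_true:
                return True
        # Plain Python bools pass straight through.
        return bool_variable

That graph-mode guessing is exactly why removing the helper, and passing primitive booleans end to end, makes the models compilable.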

@@ -1046,7 +1029,7 @@ def __init__(self, config, *inputs, **kwargs):
         self.use_cache = config.use_cache
         # final_bias_logits is registered as a buffer in pytorch, so not trainable for the sake of consistency.
         self.final_logits_bias = self.add_weight(
             name="/final_logits_bias", shape=[1, config.vocab_size], initializer="zeros", trainable=False
@patrickvonplaten (Contributor):

great catch!
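For context, a hedged sketch of the pattern under discussion: mirroring a PyTorch registered buffer with a non-trainable Keras weight. The wrapper layer is illustrative; only the add_weight call follows the diff:

    import tensorflow as tf

    class ToyFinalLogitsBias(tf.keras.layers.Layer):
        # Illustrative layer, not from the PR.
        def __init__(self, vocab_size: int, **kwargs):
            super().__init__(**kwargs)
            # PyTorch registers final_logits_bias as a buffer (no gradient),
            # so the TF side marks the weight non-trainable for consistency.
            # The PR itself uses name="/final_logits_bias"; the exact name
            # presumably matters when weights are matched up on (de)serialization.
            self.final_logits_bias = self.add_weight(
                name="final_logits_bias", shape=[1, vocab_size], initializer="zeros", trainable=False
            )

        def call(self, logits):
            return logits + self.final_logits_bias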

@@ -574,7 +591,10 @@ def check_hidden_states_output(config, inputs_dict, model_class):
             self.model_tester, "expected_num_hidden_layers", self.model_tester.num_hidden_layers + 1
         )

-        hidden_states = outputs[-1]
+        if hasattr(outputs, "hidden_states"):
@patrickvonplaten (Contributor):

Can we have an if model.config.is_encoder_decoder case here instead? It seems more in line with the PT tests, and we should test both decoder_hidden_states and encoder_hidden_states for encoder-decoder models.

@jplu (Contributor, author):

Better?

@jplu (Contributor) commented Dec 14, 2020

I should have addressed everybody's comments :)

    if self.is_encoder_decoder:
        hidden_states = outputs.decoder_hidden_states
    else:
        hidden_states = outputs.hidden_states
@patrickvonplaten (Contributor):

Sorry, I meant something a bit more like this:

            if model.config.is_encoder_decoder:
                encoder_hidden_states = outputs.encoder_hidden_states
                decoder_hidden_states = outputs.decoder_hidden_states
                
                self.assertEqual(config.output_attentions, False)
                self.assertEqual(len(encoder_hidden_states), expected_num_layers)
                self.assertListEqual(
                    list(encoder_hidden_states[0].shape[-2:]),
                    [self.model_tester.seq_length, self.model_tester.hidden_size],
                )
                self.assertEqual(len(decoder_hidden_states), expected_num_layers)
                self.assertListEqual(
                    list(decoder_hidden_states[0].shape[-2:]),
                    [self.model_tester.seq_length, self.model_tester.hidden_size],
                )
            else:
                hidden_states = outputs.hidden_states
                self.assertEqual(config.output_attentions, False)
                self.assertEqual(len(hidden_states), expected_num_layers)
                self.assertListEqual(
                    list(hidden_states[0].shape[-2:]),
                    [self.model_tester.seq_length, self.model_tester.hidden_size],
                )
1. It's always good to check both the encoder and the decoder outputs for an encoder-decoder model.
2. I prefer to stop using the self.is_encoder_decoder flag: the config better defines whether a model is an encoder-decoder, not really the test case IMO.

@jplu (Contributor, author):

Ok, just pushed the fix!!!

@patrickvonplaten (Contributor) left a comment:

I just have one comment left for the test. After that, I think we can merge and I'll rebase my TFBart refactor PR on the new changes here :-)

@patrickvonplaten mentioned this pull request Dec 14, 2020
@patrickvonplaten merged commit df3f4d2 into huggingface:master on Dec 14, 2020
@jplu deleted the fix-t5-bart branch on December 14, 2020 at 18:33