Test model outputs equivalence #6445

Merged: 5 commits into master on Aug 13, 2020

Conversation

LysandreJik (Member):

Adds a test to check that the model outputs keep the same values and order as the tuple output.
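
Conceptually, the new test runs each model once with return_dict=False and once with return_dict=True, then checks that the tuple entries match the ModelOutput values element-wise and in the same order. A compressed sketch of the idea (illustrative, not the exact helper merged in this PR):

    import torch

    def check_equivalence(model, tuple_inputs, dict_inputs, additional_kwargs):
        # Run the same model with both output formats.
        with torch.no_grad():
            tuple_output = model(**tuple_inputs, return_dict=False, **additional_kwargs)
            dict_output = model(**dict_inputs, return_dict=True, **additional_kwargs).to_tuple()

        def recursive_check(tuple_object, dict_object):
            # Compare values and ordering, descending into nested tuples
            # (e.g. hidden_states / attentions are themselves tuples of tensors).
            if isinstance(tuple_object, (list, tuple)):
                for t, d in zip(tuple_object, dict_object):
                    recursive_check(t, d)
            elif tuple_object is None:
                return
            else:
                assert torch.allclose(tuple_object, dict_object, atol=1e-5)

        recursive_check(tuple_output, dict_output)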

LysandreJik changed the title from "Test model outputs equivalence" to "[WIP] Test model outputs equivalence" on Aug 12, 2020
LysandreJik changed the title from "[WIP] Test model outputs equivalence" to "Test model outputs equivalence" on Aug 13, 2020

codecov bot commented Aug 13, 2020

Codecov Report

Merging #6445 into master will increase coverage by 0.38%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master    #6445      +/-   ##
==========================================
+ Coverage   79.95%   80.33%   +0.38%     
==========================================
  Files         153      153              
  Lines       27932    27928       -4     
==========================================
+ Hits        22332    22437     +105     
+ Misses       5600     5491     -109     
| Impacted Files | Coverage Δ |
|---|---|
| src/transformers/modeling_tf_longformer.py | 98.69% <100.00%> (+0.63%) ⬆️ |
| src/transformers/modeling_tf_bert.py | 69.06% <0.00%> (-27.52%) ⬇️ |
| src/transformers/modeling_bert.py | 88.26% <0.00%> (-0.17%) ⬇️ |
| src/transformers/modeling_xlm.py | 91.23% <0.00%> (+0.21%) ⬆️ |
| src/transformers/modeling_t5.py | 83.71% <0.00%> (+0.37%) ⬆️ |
| src/transformers/modeling_reformer.py | 96.09% <0.00%> (+0.41%) ⬆️ |
| src/transformers/modeling_tf_transfo_xl.py | 88.13% <0.00%> (+0.48%) ⬆️ |
| src/transformers/modeling_transfo_xl.py | 79.69% <0.00%> (+0.56%) ⬆️ |
| src/transformers/modeling_tf_flaubert.py | 88.34% <0.00%> (+0.61%) ⬆️ |
| ... and 14 more | |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Comment on lines -862 to +874
    - return outputs  # outputs, (hidden states), (attentions)
    + if not return_dict:
    +     return tuple(v for v in [hidden_states, all_hidden_states, all_attentions] if v is not None)
    + return TFBaseModelOutput(
    +     last_hidden_state=hidden_states, hidden_states=all_hidden_states, attentions=all_attentions
    + )

LysandreJik (Member, Author):

@patrickvonplaten Longformer wasn't working with return_dict=True, because the encoder only returned a tuple. It was called like this:

        encoder_outputs = self.encoder(
            [embedding_output, extended_attention_mask, output_attentions, output_hidden_states, padding_len],
            training=training,
        )

When using the return_dict flag, this code would then fail:

        return TFBaseModelOutputWithPooling(
            last_hidden_state=sequence_output,
            pooler_output=pooled_output,
            hidden_states=encoder_outputs.hidden_states,
            attentions=encoder_outputs.attentions,
        )

because encoder_outputs wasn't a dict/named tuple.
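
To spell out the failure mode, here is a toy illustration (the TFBaseModelOutput import path is assumed from the transformers version of that era; the tensor is a dummy):

    import tensorflow as tf
    from transformers.modeling_tf_outputs import TFBaseModelOutput

    hidden_states = tf.zeros((1, 8, 32))  # dummy last hidden state

    # What the old encoder effectively returned: a plain tuple.
    old_encoder_outputs = (hidden_states,)
    # old_encoder_outputs.hidden_states  -> AttributeError: 'tuple' object has no attribute 'hidden_states'

    # What the fixed encoder returns when return_dict=True.
    new_encoder_outputs = TFBaseModelOutput(last_hidden_state=hidden_states)
    print(new_encoder_outputs.hidden_states)  # None (not requested), but the attribute exists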

sgugger (Collaborator):

Good catch!

Comment on lines -995 to +1012
    -     [embedding_output, extended_attention_mask, output_attentions, output_hidden_states, padding_len],
    +     embedding_output,
    +     attention_mask=extended_attention_mask,
    +     padding_len=padding_len,
    +     output_attentions=output_attentions,
    +     output_hidden_states=output_hidden_states,
    +     return_dict=return_dict,

LysandreJik (Member, Author):

@patrickvonplaten After the previous fix was done, there was an issue with output_attentions being a tf.Tensor that couldn't be used as a Python bool. The fix was to pass the arguments to the encoder as keyword arguments instead of bundling them all into one input list.
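
To make the change concrete, here is a toy before/after sketch. The layer is made up; the point is that with the keyword-argument style the boolean flags stay Python bools instead of travelling through the same input structure as the tensors, where they can end up converted to tf.Tensor objects when the model is traced or saved:

    import tensorflow as tf

    class ToyEncoder(tf.keras.layers.Layer):
        # After the fix: tensors are positional, flags are plain keyword arguments,
        # so `output_attentions` remains a Python bool and can drive a Python `if`.
        def call(self, hidden_states, attention_mask=None, output_attentions=False,
                 output_hidden_states=False, return_dict=True):
            all_attentions = () if output_attentions else None
            if not return_dict:
                return tuple(v for v in [hidden_states, all_attentions] if v is not None)
            return {"last_hidden_state": hidden_states, "attentions": all_attentions}

    # Before the fix the encoder was called roughly as
    #     self.encoder([embedding_output, extended_attention_mask, output_attentions, ...])
    # with the boolean flags packed into the same list as the tensors.
    encoder = ToyEncoder()
    outputs = encoder(tf.zeros((1, 8, 4)), output_attentions=True)
    print(outputs["attentions"])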

Contributor:

I see, that's very good to know, thanks!

sgugger (Collaborator):

It would be good to try the same in TF T5, which also passes the inputs as one big list. Maybe this was the cause of our bugs.

Comment on lines +81 to +83
    if return_labels:
        if model_class in MODEL_FOR_MULTIPLE_CHOICE_MAPPING.values():
            inputs_dict["labels"] = torch.ones(self.model_tester.batch_size, dtype=torch.long, device=torch_device)

LysandreJik (Member, Author):

Forgot to add this in torch when I added the loss computation tests in TF. Will add tests for loss computation in torch soon.
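
For reference, the kind of torch loss-computation check being promised, as a minimal sketch (not a test that exists in this PR):

    import torch

    def check_loss_computation(model, inputs_dict_with_labels):
        # With labels present, the model should return a scalar loss, exposed as
        # the `loss` attribute of the ModelOutput (or the first tuple element).
        outputs = model(**inputs_dict_with_labels, return_dict=True)
        assert outputs.loss is not None
        assert outputs.loss.dim() == 0  # a scalar loss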

patrickvonplaten (Contributor):

(nit) The last 4 elifs are identical; maybe we can shorten the code there a bit with one combined if statement.
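
For illustration, the kind of consolidation being suggested. The mapping names below are placeholders standing in for the MODEL_FOR_*_MAPPING constants, not necessarily the exact four branches in the test file:

    import torch

    # Placeholders for the real MODEL_FOR_*_MAPPING constants used in the common tests.
    MAPPING_A, MAPPING_B, MAPPING_C, MAPPING_D = {}, {}, {}, {}

    def add_labels(inputs_dict, model_class, batch_size, seq_length, torch_device="cpu"):
        # Instead of four elif branches with identical bodies, group the mappings whose
        # models all expect the same labels tensor into one condition.
        if model_class in [
            *MAPPING_A.values(),
            *MAPPING_B.values(),
            *MAPPING_C.values(),
            *MAPPING_D.values(),
        ]:
            inputs_dict["labels"] = torch.zeros(
                (batch_size, seq_length), dtype=torch.long, device=torch_device
            )
        return inputs_dict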

Comment on lines +40 to +41
# For common tests
self.seq_length = self.decoder_seq_length

LysandreJik (Member, Author):

@patrickvonplaten this value is needed for building the labels. Is it alright if I add this?

patrickvonplaten (Contributor):

Awesome that we can remove the cast_to_bool hack here. Maybe we can remove it in modeling_tf_t5 as well.

patrickvonplaten (Contributor) left a review:

Awesome. Great test to check for output equivalence!

sgugger (Collaborator) left a review:

Looks great to me! Thanks for all the work!

@@ -88,20 +91,28 @@ def _prepare_for_class(self, inputs_dict, model_class, return_labels=False):

      if return_labels:
          if model_class in TF_MODEL_FOR_MULTIPLE_CHOICE_MAPPING.values():
    -         inputs_dict["labels"] = tf.ones(self.model_tester.batch_size)
    +         inputs_dict["labels"] = tf.ones(self.model_tester.batch_size, dtype=tf.int32)

sgugger (Collaborator):

Like Patrick's comment above, a lot of the tests have identical code in their conditions.


    tuple_inputs = self._prepare_for_class(inputs_dict, model_class)
    dict_inputs = self._prepare_for_class(inputs_dict, model_class)
    check_equivalence(model, tuple_inputs, dict_inputs, {"output_attentions": True})

sgugger (Collaborator):

Can we add 4 more:

return_labels=True, output_hidden_states=True
return_labels=True, output_attentions=True
output_hidden_states=True, output_attentions=True
return_labels=True, output_hidden_states=True, output_attentions=True

for the sake of completeness?
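
Following the pattern quoted above, those four additions might look roughly like this (return_labels goes through _prepare_for_class, while the output flags go into the kwargs dict passed to check_equivalence):

    tuple_inputs = self._prepare_for_class(inputs_dict, model_class, return_labels=True)
    dict_inputs = self._prepare_for_class(inputs_dict, model_class, return_labels=True)
    check_equivalence(model, tuple_inputs, dict_inputs, {"output_hidden_states": True})
    check_equivalence(model, tuple_inputs, dict_inputs, {"output_attentions": True})
    check_equivalence(
        model, tuple_inputs, dict_inputs, {"output_hidden_states": True, "output_attentions": True}
    )

    tuple_inputs = self._prepare_for_class(inputs_dict, model_class)
    dict_inputs = self._prepare_for_class(inputs_dict, model_class)
    check_equivalence(
        model, tuple_inputs, dict_inputs, {"output_hidden_states": True, "output_attentions": True}
    )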

LysandreJik (Member, Author):

Yes, we can!


sgugger commented Aug 13, 2020

Side note, you should double-check the slow tests test_saved_model_with_attentions_output and test_saved_model_with_hidden_states_output still pass with the changes for the longformer model, as they are the ones that fail for t5 when we remove the cast_to_bool thingy.

patrickvonplaten (Contributor):

> Side note, you should double-check the slow tests test_saved_model_with_attentions_output and test_saved_model_with_hidden_states_output still pass with the changes for the longformer model, as they are the ones that fail for t5 when we remove the cast_to_bool thingy.

They did not pass with Longformer before, as discussed with @jplu on PR #5764 (comment); they should actually pass now, I think :-)

LysandreJik merged commit f7cbc13 into master on Aug 13, 2020
LysandreJik deleted the test-model-outputs branch on August 13, 2020 at 15:59
fabiocapsouza added a commit to fabiocapsouza/transformers that referenced this pull request on Nov 15, 2020