Test model outputs equivalence #6445

Merged: 5 commits into master on Aug 13, 2020

Conversation

LysandreJik (Member):

Adds a test to check that the model outputs keep the same values and order as the tuple output.
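
Conceptually, the new test runs each model once with return_dict=False and once with return_dict=True, then checks that the tuple entries match the ModelOutput values element-wise and in the same order. A compressed sketch of the idea (illustrative, not the exact helper merged in this PR):

    import torch

    def check_equivalence(model, tuple_inputs, dict_inputs, additional_kwargs):
        # Run the same model with both output formats.
        with torch.no_grad():
            tuple_output = model(**tuple_inputs, return_dict=False, **additional_kwargs)
            dict_output = model(**dict_inputs, return_dict=True, **additional_kwargs).to_tuple()

        def recursive_check(tuple_object, dict_object):
            # Compare values and ordering, descending into nested tuples
            # (e.g. hidden_states / attentions are themselves tuples of tensors).
            if isinstance(tuple_object, (list, tuple)):
                for t, d in zip(tuple_object, dict_object):
                    recursive_check(t, d)
            elif tuple_object is None:
                return
            else:
                assert torch.allclose(tuple_object, dict_object, atol=1e-5)

        recursive_check(tuple_output, dict_output)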

LysandreJik changed the title from "Test model outputs equivalence" to "[WIP] Test model outputs equivalence" on Aug 12, 2020
LysandreJik changed the title from "[WIP] Test model outputs equivalence" to "Test model outputs equivalence" on Aug 13, 2020

codecov bot commented Aug 13, 2020

Codecov Report

Merging #6445 into master will increase coverage by 0.38%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master    #6445      +/-   ##
==========================================
+ Coverage   79.95%   80.33%   +0.38%     
==========================================
  Files         153      153              
  Lines       27932    27928       -4     
==========================================
+ Hits        22332    22437     +105     
+ Misses       5600     5491     -109     
| Impacted Files | Coverage Δ |
|---|---|
| src/transformers/modeling_tf_longformer.py | 98.69% <100.00%> (+0.63%) ⬆️ |
| src/transformers/modeling_tf_bert.py | 69.06% <0.00%> (-27.52%) ⬇️ |
| src/transformers/modeling_bert.py | 88.26% <0.00%> (-0.17%) ⬇️ |
| src/transformers/modeling_xlm.py | 91.23% <0.00%> (+0.21%) ⬆️ |
| src/transformers/modeling_t5.py | 83.71% <0.00%> (+0.37%) ⬆️ |
| src/transformers/modeling_reformer.py | 96.09% <0.00%> (+0.41%) ⬆️ |
| src/transformers/modeling_tf_transfo_xl.py | 88.13% <0.00%> (+0.48%) ⬆️ |
| src/transformers/modeling_transfo_xl.py | 79.69% <0.00%> (+0.56%) ⬆️ |
| src/transformers/modeling_tf_flaubert.py | 88.34% <0.00%> (+0.61%) ⬆️ |
| ... and 14 more | |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Comment on lines -862 to +874
    - return outputs  # outputs, (hidden states), (attentions)
    + if not return_dict:
    +     return tuple(v for v in [hidden_states, all_hidden_states, all_attentions] if v is not None)
    + return TFBaseModelOutput(
    +     last_hidden_state=hidden_states, hidden_states=all_hidden_states, attentions=all_attentions
    + )

LysandreJik (Member, Author):

@patrickvonplaten Longformer wasn't working with return_dict=True, because the encoder only returned a tuple. It was called like this:

        encoder_outputs = self.encoder(
            [embedding_output, extended_attention_mask, output_attentions, output_hidden_states, padding_len],
            training=training,
        )

When using the return_dict flag, this code would then fail:

        return TFBaseModelOutputWithPooling(
            last_hidden_state=sequence_output,
            pooler_output=pooled_output,
            hidden_states=encoder_outputs.hidden_states,
            attentions=encoder_outputs.attentions,
        )

because encoder_outputs wasn't a dict/named tuple.
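
To spell out the failure mode, here is a toy illustration (the TFBaseModelOutput import path is assumed from the transformers version of that era; the tensor is a dummy):

    import tensorflow as tf
    from transformers.modeling_tf_outputs import TFBaseModelOutput

    hidden_states = tf.zeros((1, 8, 32))  # dummy last hidden state

    # What the old encoder effectively returned: a plain tuple.
    old_encoder_outputs = (hidden_states,)
    # old_encoder_outputs.hidden_states  -> AttributeError: 'tuple' object has no attribute 'hidden_states'

    # What the fixed encoder returns when return_dict=True.
    new_encoder_outputs = TFBaseModelOutput(last_hidden_state=hidden_states)
    print(new_encoder_outputs.hidden_states)  # None (not requested), but the attribute exists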

sgugger (Collaborator):

Good catch!

Comment on lines -995 to +1012
    -     [embedding_output, extended_attention_mask, output_attentions, output_hidden_states, padding_len],
    +     embedding_output,
    +     attention_mask=extended_attention_mask,
    +     padding_len=padding_len,
    +     output_attentions=output_attentions,
    +     output_hidden_states=output_hidden_states,
    +     return_dict=return_dict,

LysandreJik (Member, Author):

@patrickvonplaten After the previous fix was done, there was an issue with output_attentions being a tf.Tensor that couldn't be used as a Python bool. The fix was to pass the arguments to the encoder as keyword arguments instead of bundling them all into one input list.
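
To make the change concrete, here is a toy before/after sketch. The layer is made up; the point is that with the keyword-argument style the boolean flags stay Python bools instead of travelling through the same input structure as the tensors, where they can end up converted to tf.Tensor objects when the model is traced or saved:

    import tensorflow as tf

    class ToyEncoder(tf.keras.layers.Layer):
        # After the fix: tensors are positional, flags are plain keyword arguments,
        # so `output_attentions` remains a Python bool and can drive a Python `if`.
        def call(self, hidden_states, attention_mask=None, output_attentions=False,
                 output_hidden_states=False, return_dict=True):
            all_attentions = () if output_attentions else None
            if not return_dict:
                return tuple(v for v in [hidden_states, all_attentions] if v is not None)
            return {"last_hidden_state": hidden_states, "attentions": all_attentions}

    # Before the fix the encoder was called roughly as
    #     self.encoder([embedding_output, extended_attention_mask, output_attentions, ...])
    # with the boolean flags packed into the same list as the tensors.
    encoder = ToyEncoder()
    outputs = encoder(tf.zeros((1, 8, 4)), output_attentions=True)
    print(outputs["attentions"])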

Contributor:

I see, that's very good to know, thanks!

sgugger (Collaborator):

It would be good to try the same in TF T5, which also passes the inputs as one big list. Maybe this was the cause of our bugs.

Comment on lines +81 to +83
    if return_labels:
        if model_class in MODEL_FOR_MULTIPLE_CHOICE_MAPPING.values():
            inputs_dict["labels"] = torch.ones(self.model_tester.batch_size, dtype=torch.long, device=torch_device)

LysandreJik (Member, Author):

Forgot to add this in torch when I added the loss computation tests in TF. Will add tests for loss computation in torch soon.
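
For reference, the kind of torch loss-computation check being promised, as a minimal sketch (not a test that exists in this PR):

    import torch

    def check_loss_computation(model, inputs_dict_with_labels):
        # With labels present, the model should return a scalar loss, exposed as
        # the `loss` attribute of the ModelOutput (or the first tuple element).
        outputs = model(**inputs_dict_with_labels, return_dict=True)
        assert outputs.loss is not None
        assert outputs.loss.dim() == 0  # a scalar loss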

patrickvonplaten (Contributor):

(nit) The last 4 elifs are identical; maybe we can shorten the code there a bit with one combined if statement.
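
For illustration, the kind of consolidation being suggested. The mapping names below are placeholders standing in for the MODEL_FOR_*_MAPPING constants, not necessarily the exact four branches in the test file:

    import torch

    # Placeholders for the real MODEL_FOR_*_MAPPING constants used in the common tests.
    MAPPING_A, MAPPING_B, MAPPING_C, MAPPING_D = {}, {}, {}, {}

    def add_labels(inputs_dict, model_class, batch_size, seq_length, torch_device="cpu"):
        # Instead of four elif branches with identical bodies, group the mappings whose
        # models all expect the same labels tensor into one condition.
        if model_class in [
            *MAPPING_A.values(),
            *MAPPING_B.values(),
            *MAPPING_C.values(),
            *MAPPING_D.values(),
        ]:
            inputs_dict["labels"] = torch.zeros(
                (batch_size, seq_length), dtype=torch.long, device=torch_device
            )
        return inputs_dict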

Comment on lines +40 to +41
# For common tests
self.seq_length = self.decoder_seq_length

LysandreJik (Member, Author):

@patrickvonplaten this value is needed for building the labels. Is it alright if I add this?

patrickvonplaten (Contributor):

Awesome that we can remove the cast_to_bool hack here. Maybe we can remove it in modeling_tf_t5 as well.

patrickvonplaten (Contributor) left a review:

Awesome. Great test to check for output equivalence!

sgugger (Collaborator) left a review:

Looks great to me! Thanks for all the work!

@@ -88,20 +91,28 @@ def _prepare_for_class(self, inputs_dict, model_class, return_labels=False):

      if return_labels:
          if model_class in TF_MODEL_FOR_MULTIPLE_CHOICE_MAPPING.values():
    -         inputs_dict["labels"] = tf.ones(self.model_tester.batch_size)
    +         inputs_dict["labels"] = tf.ones(self.model_tester.batch_size, dtype=tf.int32)

sgugger (Collaborator):

Like Patrick's comment above, a lot of the tests have identical code in their conditions.


    tuple_inputs = self._prepare_for_class(inputs_dict, model_class)
    dict_inputs = self._prepare_for_class(inputs_dict, model_class)
    check_equivalence(model, tuple_inputs, dict_inputs, {"output_attentions": True})

sgugger (Collaborator):

Can we add 4 more:

return_labels=True, output_hidden_states=True
return_labels=True, output_attentions=True
output_hidden_states=True, output_attentions=True
return_labels=True, output_hidden_states=True, output_attentions=True

for the sake of completeness?
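
Following the pattern quoted above, those four additions might look roughly like this (return_labels goes through _prepare_for_class, while the output flags go into the kwargs dict passed to check_equivalence):

    tuple_inputs = self._prepare_for_class(inputs_dict, model_class, return_labels=True)
    dict_inputs = self._prepare_for_class(inputs_dict, model_class, return_labels=True)
    check_equivalence(model, tuple_inputs, dict_inputs, {"output_hidden_states": True})
    check_equivalence(model, tuple_inputs, dict_inputs, {"output_attentions": True})
    check_equivalence(
        model, tuple_inputs, dict_inputs, {"output_hidden_states": True, "output_attentions": True}
    )

    tuple_inputs = self._prepare_for_class(inputs_dict, model_class)
    dict_inputs = self._prepare_for_class(inputs_dict, model_class)
    check_equivalence(
        model, tuple_inputs, dict_inputs, {"output_hidden_states": True, "output_attentions": True}
    )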

LysandreJik (Member, Author):

Yes, we can!


sgugger commented Aug 13, 2020

Side note, you should double-check the slow tests test_saved_model_with_attentions_output and test_saved_model_with_hidden_states_output still pass with the changes for the longformer model, as they are the ones that fail for t5 when we remove the cast_to_bool thingy.

patrickvonplaten (Contributor):

> Side note, you should double-check the slow tests test_saved_model_with_attentions_output and test_saved_model_with_hidden_states_output still pass with the changes for the longformer model, as they are the ones that fail for t5 when we remove the cast_to_bool thingy.

They did not pass with Longformer before, as discussed with @jplu on PR #5764 (comment); they should actually pass now, I think :-)

LysandreJik merged commit f7cbc13 into master on Aug 13, 2020
LysandreJik deleted the test-model-outputs branch on August 13, 2020 at 15:59
fabiocapsouza added a commit to fabiocapsouza/transformers that referenced this pull request on Nov 15, 2020