Test model outputs equivalence #6445
Conversation
Force-pushed from df36a7e to 5325bba.
Force-pushed from a93000d to 1d92630.
Codecov Report
@@ Coverage Diff @@
## master #6445 +/- ##
==========================================
+ Coverage 79.95% 80.33% +0.38%
==========================================
Files 153 153
Lines 27932 27928 -4
==========================================
+ Hits 22332 22437 +105
+ Misses 5600 5491 -109
-    return outputs  # outputs, (hidden states), (attentions)
+    if not return_dict:
+        return tuple(v for v in [hidden_states, all_hidden_states, all_attentions] if v is not None)
+    return TFBaseModelOutput(
+        last_hidden_state=hidden_states, hidden_states=all_hidden_states, attentions=all_attentions
+    )
@patrickvonplaten Longformer wasn't working with return_dict=True, because the encoder only output a tuple, which was then used as such:
encoder_outputs = self.encoder(
[embedding_output, extended_attention_mask, output_attentions, output_hidden_states, padding_len],
training=training,
)
When using the return_dict flag, this code would then fail:
return TFBaseModelOutputWithPooling(
last_hidden_state=sequence_output,
pooler_output=pooled_output,
hidden_states=encoder_outputs.hidden_states,
attentions=encoder_outputs.attentions,
)
because encoder_outputs wasn't a dict/named tuple.
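A plain-Python sketch of the difference (the namedtuple is a simplified stand-in for TFBaseModelOutput, not the real transformers class):

```python
from collections import namedtuple

# Bare tuple, as the Longformer encoder used to return: positional access only.
plain_outputs = ("last_hidden", "all_hidden", "all_attn")

# Named output type standing in for TFBaseModelOutput (hypothetical stand-in).
EncoderOutput = namedtuple("EncoderOutput", ["last_hidden_state", "hidden_states", "attentions"])
named_outputs = EncoderOutput("last_hidden", "all_hidden", "all_attn")

# Attribute access only works on the named variant, which is exactly what
# encoder_outputs.hidden_states relies on.
print(named_outputs.hidden_states)              # "all_hidden"
print(hasattr(plain_outputs, "hidden_states"))  # False
```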
Good catch!
-    [embedding_output, extended_attention_mask, output_attentions, output_hidden_states, padding_len],
+    embedding_output,
+    attention_mask=extended_attention_mask,
+    padding_len=padding_len,
+    output_attentions=output_attentions,
+    output_hidden_states=output_hidden_states,
+    return_dict=return_dict,
@patrickvonplaten After the previous fix, there was an issue with output_attentions being a tf.Tensor that couldn't be used as a Python bool. The fix was to pass the arguments to the encoder as kwargs instead of packed in a list.
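A plain-Python sketch of the failure mode (the SymbolicTensor class is a hypothetical stand-in for a graph-mode tf.Tensor; real Keras input packing is more involved):

```python
class SymbolicTensor:
    """Stand-in for a graph-mode tf.Tensor: truth-testing it is an error."""
    def __init__(self, value):
        self.value = value
    def __bool__(self):
        raise TypeError("symbolic tensor cannot be used as a Python bool")

def encoder(embedding_output, output_attentions=False):
    # Passed as a kwarg, the flag stays a plain Python bool, so branching works.
    if output_attentions:
        return (embedding_output, "attentions")
    return (embedding_output,)

# Flags packed into the positional input list can end up wrapped as tensors,
# and branching on them then raises:
packed = ["emb", SymbolicTensor(True)]
try:
    outputs = ("emb", "attentions") if packed[1] else ("emb",)
except TypeError as err:
    print("list-style call failed:", err)

# The same flag passed as a kwarg is unproblematic:
print(encoder("emb", output_attentions=True))
```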
I see, that's good to know, thanks!
It would be good to try the same in TF T5, which also passes the inputs as a big list. Maybe this was the cause of our bugs.
if return_labels:
    if model_class in MODEL_FOR_MULTIPLE_CHOICE_MAPPING.values():
        inputs_dict["labels"] = torch.ones(self.model_tester.batch_size, dtype=torch.long, device=torch_device)
Forgot to add this in torch when I added the loss computation tests in TF. Will add tests for loss computation in torch soon.
(nit) The last 4 elifs are identical; maybe we can shorten the code there a bit with one long if statement.
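The collapse could look something like this sketch (toy mappings and hypothetical names, not the test suite's real ones):

```python
# Toy stand-ins for the model mappings whose branches build labels identically.
SEQ_CLS_MAPPING = {"cfg_a": "SeqClsModel"}
TOKEN_CLS_MAPPING = {"cfg_b": "TokenClsModel"}
QA_MAPPING = {"cfg_c": "QAModel"}

def needs_seq_labels(model_class):
    # Before: one elif per mapping, all with the same body.
    # After: a single membership test over the combined values.
    combined = (
        set(SEQ_CLS_MAPPING.values())
        | set(TOKEN_CLS_MAPPING.values())
        | set(QA_MAPPING.values())
    )
    return model_class in combined

print(needs_seq_labels("TokenClsModel"))  # True
print(needs_seq_labels("UnknownModel"))   # False
```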
# For common tests
self.seq_length = self.decoder_seq_length
@patrickvonplaten this value is needed for the labels building. Is it alright if I add this?
Awesome that we can remove the
Awesome. Great test to check for output equivalence!
Looks great to me! Thanks for all the work!
@@ -88,20 +91,28 @@ def _prepare_for_class(self, inputs_dict, model_class, return_labels=False):

     if return_labels:
         if model_class in TF_MODEL_FOR_MULTIPLE_CHOICE_MAPPING.values():
-            inputs_dict["labels"] = tf.ones(self.model_tester.batch_size)
+            inputs_dict["labels"] = tf.ones(self.model_tester.batch_size, dtype=tf.int32)
As in Patrick's comment above, a lot of the tests have identical code in their conditions.
tuple_inputs = self._prepare_for_class(inputs_dict, model_class)
dict_inputs = self._prepare_for_class(inputs_dict, model_class)
check_equivalence(model, tuple_inputs, dict_inputs, {"output_attentions": True})
Can we add 4 more:
return_labels=True, output_hidden_states=True
return_labels=True, output_attentions=True
output_hidden_states=True, output_attentions=True
return_labels=True, output_hidden_states=True, output_attentions=True
for the sake of completeness?
Yes, we can!
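Rather than writing the four calls out by hand, the requested combinations could also be generated; a hypothetical sketch, not the PR's actual test code:

```python
from itertools import combinations

flags = ["return_labels", "output_hidden_states", "output_attentions"]

# All three pairs of flags, plus all three flags together.
requested = [dict.fromkeys(pair, True) for pair in combinations(flags, 2)]
requested.append(dict.fromkeys(flags, True))

for kwargs in requested:
    # A call like check_equivalence(model, tuple_inputs, dict_inputs, kwargs)
    # would go here in the real test.
    print(kwargs)
```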
Side note, you should double-check the slow tests.
They did not pass with Longformer before, as discussed with @jplu on the PR: #5764 (comment). They should actually pass now, I think :-)
Force-pushed from 9d61a77 to 400c5ad.
This reverts commit 0374349.
Adds a test to check that the model outputs keep the same values and order as the tuple output.
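In spirit, the equivalence check walks both output styles in parallel; a minimal stand-in sketch, using a namedtuple in place of the real model output class:

```python
from collections import namedtuple

# Simplified stand-in for the real model output class.
ModelOutput = namedtuple("ModelOutput", ["last_hidden_state", "hidden_states", "attentions"])

def check_outputs_equal(tuple_output, dict_output):
    """Check the dict-style output matches the tuple output in value and order."""
    assert len(tuple_output) == len(dict_output)
    for tuple_value, dict_value in zip(tuple_output, dict_output):
        assert tuple_value == dict_value
    return True

tuple_output = (1.0, 2.0, 3.0)
dict_output = ModelOutput(last_hidden_state=1.0, hidden_states=2.0, attentions=3.0)
print(check_outputs_equal(tuple_output, dict_output))  # True
```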