TF Model train and eval step metrics for seq2seq models.#14009

Merged
Rocketknight1 merged 2 commits into huggingface:master from pedro-r-marques:keras-fit-metrics
Oct 19, 2021

Conversation

@pedro-r-marques
Contributor

When using a model with a seq2seq output, compute metrics against the logits.

What does this PR do?

This PR changes the TF train and test steps so that metrics can be correctly computed when using keras model.fit.

The Keras Model train/test step functions are supposed to compare the labels (y_true) against the predictions (y_pred).
The previous code passed the ModelOutput dataclass (essentially a dict) as y_pred, which resulted in
TF/Keras attempting to compute metrics between the variable y (y_true) and each of the elements in the ModelOutput dict.
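A minimal sketch of the change described above, assuming TF 2.x with the legacy Keras `train_step` override API (`compiled_loss`/`compiled_metrics`). The model and layer names here are hypothetical stand-ins, not the actual transformers implementation; the point is that metrics are updated against the logits tensor rather than the whole output dict:

```python
import tensorflow as tf

class TinySeq2Seq(tf.keras.Model):
    """Hypothetical stand-in for a seq2seq model whose call() returns a dict."""

    def __init__(self):
        super().__init__()
        self.body = tf.keras.layers.Dense(8)  # stand-in for a real seq2seq body

    def call(self, inputs):
        # Real transformers models return a ModelOutput dataclass; a plain
        # dict keeps this sketch self-contained.
        return {"logits": self.body(inputs)}

    def train_step(self, data):
        x, y = data
        with tf.GradientTape() as tape:
            outputs = self(x, training=True)
            loss = self.compiled_loss(y, outputs["logits"])
        grads = tape.gradient(loss, self.trainable_variables)
        self.optimizer.apply_gradients(zip(grads, self.trainable_variables))
        # The fix: pass y_pred = logits, not the whole output dict, so Keras
        # does not try to match `y` against every entry in the dict.
        self.compiled_metrics.update_state(y, outputs["logits"])
        return {m.name: m.result() for m in self.metrics}
```

With this shape, `model.fit` reports per-step metrics (e.g. an accuracy or MAE compiled by the user) computed on the logits, as expected.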

Who can review?

@sgugger
@Rocketknight1

@Rocketknight1
Member

This seems like a good fix, thank you! The changes to train_step are very recent and we were expecting some initial problems like this. I'm going to do some testing with your PR today/tomorrow and hopefully approve and merge after that.

@Rocketknight1
Member

I did some quick testing - the impact of the change seems limited as y_pred is only used for computing metrics at that stage of the function, so I don't think this can have any major unwanted side-effects. Approving.

@pedro-r-marques
Contributor Author

pedro-r-marques commented Oct 15, 2021

@Rocketknight1 I did some more testing. When the model is compiled multiple times, the metric doesn't work correctly; if the model is compiled only once, it does work. I don't know whether this is a concern.

When using a model with a seq2seq output, compute metrics against the logits.
@pedro-r-marques
Contributor Author

@Rocketknight1 Please take another look. The test makes sure that the metric now works as expected. I also removed a call that computes the loss in test_step, which seems to me to be a duplicate.
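A self-contained sketch of the test_step cleanup mentioned above, with hypothetical names and the same legacy TF 2.x Keras override API as before. The loss is computed exactly once via `compiled_loss`; the removed vestigial call recomputed the same value:

```python
import tensorflow as tf

class TinySeq2Seq(tf.keras.Model):
    """Hypothetical stand-in for a seq2seq model whose call() returns a dict."""

    def __init__(self):
        super().__init__()
        self.body = tf.keras.layers.Dense(4)  # stand-in for a real seq2seq body

    def call(self, inputs):
        return {"logits": self.body(inputs)}

    def test_step(self, data):
        x, y = data
        logits = self(x, training=False)["logits"]
        # Loss is computed once here; a second, redundant call was removed.
        self.compiled_loss(y, logits)
        # As in train_step, metrics compare y against the logits tensor.
        self.compiled_metrics.update_state(y, logits)
        return {m.name: m.result() for m in self.metrics}
```

`model.evaluate` then runs this step and reports one loss value plus the compiled metrics.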

@Rocketknight1
Member

@pedro-r-marques You're correct that some vestigial code made it into test_step - that's on our end, and you did a good job spotting it! Let me double-check that bit and tidy it up in your branch before we merge.

@Rocketknight1
Member

Done! The old code comes from a period when we were experimenting with a different way of handling the model's internal loss computations, and should have been removed. I fixed it now, and the rest of the code looks good. If you're happy with my changes, we can merge once the tests are good.

@Rocketknight1
Member

(That torch failure has nothing to do with this PR, don't worry)

@pedro-r-marques
Contributor Author

@Rocketknight1 Checks are green now. Please merge the PR if you are happy with it. Thanks a lot!

@Rocketknight1
Member

The rebase reverted the fix to the vestigial code, so I took it out again. Will merge once it's all green!

@pedro-r-marques
Contributor Author

@Rocketknight1 apologies for squashing your changes unintentionally :-(. I cleaned up the git log, hopefully preserving your changes this time, and took another roll of the CI dice.

@Rocketknight1
Member

@pedro-r-marques It's okay, I wrecked your changes too. Git is just hard, lol. Anyway, it looks good now and tests are green, so merging!

@Rocketknight1 Rocketknight1 merged commit 122c2f8 into huggingface:master Oct 19, 2021