Align subclassing guide with docstring of Layer.add_loss() #1144
Merged
But I assume `Loss` objects can be used as well, is this correct? The wording makes it sound like that's discouraged.
My motivation for this change stems from design discussions with colleagues who had the impression that subclasses of `Loss` would be the natural way to encapsulate some custom loss functions that we planned to use with `add_loss()`.

So let's consider what class `Loss` adds on top of a simple tensor-to-tensor function from `(y_true, y_pred)` to a total loss: a `call()` function with automated application of per-example weights, ...

For use in `Model.compile(loss=...)`, all of this makes sense. For use with `Layer.add_loss(...)`, not so much:

1. `add_loss()` only takes the final result; there is no need to pass a parametrized callback.
2. `add_loss()` does not track a `Loss` object for serialization as such.
3. ...
4. The default reduction when a `Loss` object is called for `add_loss()` is `SUM_OVER_BATCH_SIZE`, and that will unexpectedly blow up in your face the moment you try to move the Model under a distribution strategy.

So, not only did the similarity in name point us towards an abstraction that was overly complex (items 1-3), it actually hurt us as soon as we wanted to leverage TensorFlow's key feature: scaling up with distributed training (item 4).
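To illustrate the point being made, here is a minimal sketch (not part of the PR; the layer name and `rate` parameter are made up) of the plain-tensor style the comment advocates, assuming the TensorFlow 2.x Keras API:

```python
# Illustrative sketch: passing a plain tensor to Layer.add_loss(),
# with no Loss object, no Reduction setting, and nothing extra to
# serialize. The layer name and rate value are hypothetical.
import tensorflow as tf


class ActivityRegularized(tf.keras.layers.Layer):
    """Identity layer that adds an L2 activity penalty via add_loss()."""

    def __init__(self, rate=0.1, **kwargs):
        super().__init__(**kwargs)
        self.rate = rate

    def call(self, inputs):
        # add_loss() takes the final scalar directly; the caller decides
        # how (and whether) to reduce over the batch.
        self.add_loss(self.rate * tf.reduce_sum(tf.square(inputs)))
        return inputs


layer = ActivityRegularized(rate=0.1)
_ = layer(tf.ones((2, 3)))
print(float(layer.losses[0]))  # ~0.6, i.e. 0.1 * sum of 6 ones squared
```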
Bottom line: Yes, I think it is appropriate to gently discourage an unreflected use of `Loss` objects with `add_loss()`. I would have liked to leave a stronger hint at item 4 (see the first commit of this PR), but I couldn't find a good link target to provide the necessary context.
Friendly ping... I think the proposed wording is completely correct and useful guidance for the user: while calling a `Loss` object is possible, with some care, a plain Tensor is often more appropriate (more often than the similar names suggest).
I agree, especially given that the majority of the `Loss` classes have corresponding loss functions to use; there really isn't a need to complicate the workflow with a class wrapper. It also doesn't seem appropriate to let a model builder easily make a mistake by using a reduction method that doesn't work with distribution strategies, so discouraging that is an option too. In any case, the way it's stated here lgtm.

Thanks again for the change and patience!