Fix the kl_div docs #67443
Conversation
Fixes #57459. After discussing the linked issue, we resolved that `F.kl_div` computes the right thing in order to be consistent with the rest of the losses in PyTorch. To avoid any confusion, these docs add a note discussing how the PyTorch implementation differs from the mathematical definition and the reasons for doing so. These docs also add an example that may further help understand the intended use of this loss.
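For illustration, here is a minimal sketch of how the PyTorch convention relates to the textbook definition of the KL divergence; the tensors and shapes are made up and this is not the exact example added in the docs:

```python
import torch
import torch.nn.functional as F

# Two discrete distributions over 5 outcomes.
p = torch.softmax(torch.randn(5), dim=0)  # distribution of the observations
q = torch.softmax(torch.randn(5), dim=0)  # distribution produced by the model

# Textbook definition: KL(p || q) = sum_i p_i * (log p_i - log q_i)
kl_math = torch.sum(p * (p.log() - q.log()))

# F.kl_div takes the model output first, as log-probabilities, and the
# target distribution second, mirroring the (input, target) convention
# used by the other PyTorch losses.
kl_torch = F.kl_div(q.log(), p, reduction="sum")

assert torch.allclose(kl_math, kl_torch)
```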
CI Flow Status ⚛️ CI Flow
You can add a comment to the PR and tag @pytorchbot with the following commands:

# ciflow rerun, "ciflow/default" will always be added automatically
@pytorchbot ciflow rerun

# ciflow rerun with additional labels "-l <ciflow/label_name>", which is equivalent to adding these labels manually and triggering the rerun
@pytorchbot ciflow rerun -l ciflow/scheduled -l ciflow/slow

For more information, please take a look at the CI Flow Wiki.
💊 CI failures summary and remediations: As of commit 1466581 (more details on the Dr. CI page): 💚 Looks good so far! There are no failures yet. 💚 This comment was automatically generated by Dr. CI. Please report bugs/suggestions to the (internal) Dr. CI Users group.
Some style suggestions and fixes inline and one larger possible mixup. Let's check this carefully to not mess this up again.
is set to `False`, the losses are instead summed for each minibatch. Ignored
when :attr:`reduce` is `False`. Default: `True`
RST needs double backticks for inline code. Please revert here and all other occurrences.
Double backticks are broken in the PyTorch theme. See pytorch/pytorch_sphinx_theme#130.
In linalg we adopted the single-backtick partial solution until this problem is solved.
See also the first two posts in #54878.
IIUC, the formatting is only broken for inline code with spaces. You "fixed" single words here, so this should not be an issue.
This is for consistency with the line

    log-space if :attr:`log_target`\ `= True`.

Note that when there's an assignment, the previous formatting was just wrong: there is an :attr: (which, by the way, is also broken), then an equals sign without any formatting, and then the value. This happened for example in the line :attr:`reduction` = ``'mean'``.
To avoid these problems and others, we went provisionally with this formatting until the issue above is fixed. I agree that if the formatting were right, we should just avoid single backticks and :attr: altogether, but here we are.
- Can't you do something like `log_target=True`? This is the PEP 8 style when passing kwargs, e.g. `criterion = nn.KLDivLoss(log_target=True)`.
- :attr: is not broken, you are using it wrong. It is used to reference attributes of a class and not the parameters of a callable. Sphinx doesn't generate any links for parameters, so there is nothing to cross-link against. Your example should be `reduction="mean"` if you want to express that the user needs to set something, or `reduction=="mean"` for a conditional. Both should be put in double backticks, respectively.
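To make the markup discussion concrete, here is a hypothetical docstring fragment (the function name and wording are made up for illustration and do not come from the PR) contrasting the two styles:

```python
def kl_div_markup_example():
    """Illustrative docstring only.

    Formatting argued against above (an :attr: role, a bare equals sign,
    and a separately formatted value)::

        Applies when :attr:`reduction` = ``'mean'``.

    Formatting suggested instead (spell out the keyword argument and wrap
    it entirely in double backticks)::

        Applies when ``reduction="mean"``; use ``reduction=="mean"`` when
        describing a conditional.
    """
```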
Addressed the points. Thanks @pmeier!
torch/nn/modules/loss.py
>>> # Sample a batch of distributions. Usually this would come from the dataset
>>> def normalize(p):
        return p / p.sum(1, keepdim=True)
>>> return p / p.sum(1, keepdim=True)
Indented continuation lines need to start with `...`
>>> return p / p.sum(1, keepdim=True)
...     return p / p.sum(1, keepdim=True)
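Putting the suggestion together, a corrected doctest block would look roughly like the following; this is a sketch assuming the example builds an `nn.KLDivLoss` with `reduction="batchmean"`, not the exact snippet merged in the PR:

```python
>>> import torch
>>> import torch.nn as nn
>>> kl_loss = nn.KLDivLoss(reduction="batchmean")
>>> # Sample a batch of distributions. Usually this would come from the dataset
>>> def normalize(p):
...     return p / p.sum(1, keepdim=True)
>>> target = normalize(torch.rand(3, 5))
>>> # The input is expected to contain log-probabilities
>>> input = torch.log_softmax(torch.randn(3, 5), dim=1)
>>> output = kl_loss(input, target)
```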
Only nitpicks and style fixes left.
Thanks for the update! Added a few comments below
Thanks for the detailed review @jbschlosser! I just addressed the comments. Could you have a second look at this, please?
LGTM! Thanks for the in-depth improvements :)
@jbschlosser has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@jbschlosser merged this pull request in 8e1ead8.
Stack from ghstack:

Fixes #57459

After discussing the linked issue, we resolved that `F.kl_div` computes the right thing in order to be consistent with the rest of the losses in PyTorch.

To avoid any confusion, these docs add a note discussing how the PyTorch implementation differs from the mathematical definition and the reasons for doing so.

These docs also add an example that may further help understand the intended use of this loss.

cc @brianjo @mruberry

Differential Revision: D32136888