
Conversation

lezcano
Collaborator

@lezcano lezcano commented Oct 28, 2021

Stack from ghstack:

Fixes #57459

After discussing the linked issue, we resolved that `F.kl_div` computes
the right thing so as to be consistent with the rest of the losses in
PyTorch.

To avoid any confusion, these docs add a note discussing how the PyTorch
implementation differs from the mathematical definition and the reasons
for this choice.

These docs also add an example that may further help users understand the
intended use of this loss.
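
A rough illustration of the intended use (a sketch of the idea, not necessarily the exact example added in the docs; the shapes and tensors below are made up):

import torch
import torch.nn.functional as F

# Hypothetical data: a batch of 3 distributions over 5 classes.
input = F.log_softmax(torch.randn(3, 5), dim=1)   # log-probabilities of Q
target = F.softmax(torch.randn(3, 5), dim=1)      # probabilities of P

# 'batchmean' divides the pointwise sum by the batch size, which matches the
# mathematical definition of KL(P || Q) averaged over the batch; the default
# 'mean' divides by the total number of elements instead.
loss = F.kl_div(input, target, reduction='batchmean')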

cc @brianjo @mruberry

Differential Revision: D32136888

@pytorch-probot

pytorch-probot bot commented Oct 28, 2021

CI Flow Status

⚛️ CI Flow

Ruleset - Version: v1
Ruleset - File: https://github.com/pytorch/pytorch/blob/14665818a3204e01df7b3e4de9e68c77e06cf1f5/.github/generated-ciflow-ruleset.json
PR ciflow labels: ciflow/default

Workflows Labels (bold enabled) Status
Triggered Workflows
linux-bionic-py3.6-clang9 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/noarch, ciflow/xla ✅ triggered
linux-vulkan-bionic-py3.6-clang9 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/vulkan ✅ triggered
linux-xenial-cuda11.3-py3.6-gcc7 ciflow/all, ciflow/cuda, ciflow/default, ciflow/linux ✅ triggered
linux-xenial-py3-clang5-mobile-build ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile ✅ triggered
linux-xenial-py3-clang5-mobile-custom-build-dynamic ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile ✅ triggered
linux-xenial-py3-clang5-mobile-custom-build-static ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile ✅ triggered
linux-xenial-py3.6-clang7-asan ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/sanitizers ✅ triggered
linux-xenial-py3.6-clang7-onnx ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/onnx ✅ triggered
linux-xenial-py3.6-gcc5.4 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux ✅ triggered
linux-xenial-py3.6-gcc7 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux ✅ triggered
linux-xenial-py3.6-gcc7-bazel-test ciflow/all, ciflow/bazel, ciflow/cpu, ciflow/default, ciflow/linux ✅ triggered
win-vs2019-cpu-py3 ciflow/all, ciflow/cpu, ciflow/default, ciflow/win ✅ triggered
win-vs2019-cuda11.3-py3 ciflow/all, ciflow/cuda, ciflow/default, ciflow/win ✅ triggered
Skipped Workflows
caffe2-linux-xenial-py3.6-gcc5.4 ciflow/all, ciflow/cpu, ciflow/linux 🚫 skipped
docker-builds ciflow/all 🚫 skipped
libtorch-linux-xenial-cuda10.2-py3.6-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux 🚫 skipped
libtorch-linux-xenial-cuda11.3-py3.6-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux 🚫 skipped
linux-bionic-cuda10.2-py3.9-gcc7 ciflow/all, ciflow/cuda, ciflow/linux, ciflow/slow 🚫 skipped
linux-xenial-py3-clang5-mobile-code-analysis ciflow/all, ciflow/linux, ciflow/mobile 🚫 skipped
parallelnative-linux-xenial-py3.6-gcc5.4 ciflow/all, ciflow/cpu, ciflow/linux 🚫 skipped
periodic-libtorch-linux-xenial-cuda11.1-py3.6-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled, ciflow/slow, ciflow/slow-gradcheck 🚫 skipped
periodic-linux-xenial-cuda11.1-py3.6-gcc7 ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-win-vs2019-cuda11.1-py3 ciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win 🚫 skipped

You can add a comment to the PR and tag @pytorchbot with the following commands:
# ciflow rerun, "ciflow/default" will always be added automatically
@pytorchbot ciflow rerun

# ciflow rerun with additional labels "-l <ciflow/label_name>", which is equivalent to adding these labels manually and triggering the rerun
@pytorchbot ciflow rerun -l ciflow/scheduled -l ciflow/slow

For more information, please take a look at the CI Flow Wiki.

@facebook-github-bot
Contributor

facebook-github-bot commented Oct 28, 2021

🔗 Helpful links

💊 CI failures summary and remediations

As of commit 1466581 (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


This comment was automatically generated by Dr. CI.

Please report bugs/suggestions to the (internal) Dr. CI Users group.


@lezcano lezcano requested review from mruberry and pmeier and removed request for albanD October 28, 2021 14:29
@lezcano lezcano added the module: docs label (Related to our documentation, both in docs/ and docblocks) Oct 28, 2021
lezcano added a commit that referenced this pull request Oct 28, 2021

ghstack-source-id: 348e8cd
Pull Request resolved: #67443
lezcano added a commit that referenced this pull request Oct 28, 2021

ghstack-source-id: 6fe51a8
Pull Request resolved: #67443
Collaborator

@pmeier pmeier left a comment


Some style suggestions and fixes inline, and one larger possible mix-up. Let's check this carefully so we don't mess this up again.

Comment on lines +430 to +431
is set to `False`, the losses are instead summed for each minibatch. Ignored
when :attr:`reduce` is `False`. Default: `True`
Collaborator


RST needs double backticks for inline code. Please revert here and all other occurrences.

Collaborator Author


Double backticks are broken in the PyTorch theme. See pytorch/pytorch_sphinx_theme#130.
In linalg we adopted the single-backtick partial solution until this problem is solved.
See also the first two posts in #54878.

Collaborator


IIUC, the formatting is only broken for inline code with spaces. You "fixed" single words here, so this should not be an issue.

Collaborator Author


This is for consistency with the line

log-space if :attr:`log_target`\ `= True`.

Note that when there's an assignment, the previous formatting was just wrong. There is an :attr: (which, btw, is also broken), then an equals sign without any formatting, and then the value. This happened, for example, in the line :attr:`reduction` = ``'mean'``.

To avoid these problems and others, we went provisionally with this formatting until the issue above is fixed. I agree that if the formatting were right, we should just avoid single backticks and :attr: altogether, but here we are.

Collaborator

@pmeier pmeier Oct 29, 2021


  1. Can't you do something like log_target=True? This is the PEP8 style when passing kwargs, e.g. criterion = nn.KLDivLoss(log_target=True).
  2. :attr: is not broken, you are using it wrong. It is used to reference attributes of a class, not the parameters of a callable. Sphinx doesn't generate any links for parameters, so there is nothing to cross-link against. Your example should be reduction="mean" if you want to express that the user needs to set something, or reduction=="mean" for a conditional. Both should be put in double backticks (see the sketch below).
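
For illustration, a minimal sketch of how this markup might look in the docstring (the signature and wording here are assumptions for the example, not the actual patch):

def kl_div(input, target, reduction='mean', log_target=False):
    """Hypothetical docstring excerpt showing the suggested markup.

    The ``target`` is expected in log-space when ``log_target=True``.
    If ``reduction == 'mean'``, the output is averaged over all elements;
    ``reduction='batchmean'`` averages over the batch dimension instead.
    """
    ...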

@lezcano lezcano requested a review from pmeier October 29, 2021 08:15
@lezcano
Collaborator Author

lezcano commented Oct 29, 2021

Addressed the points. Thanks @pmeier!

lezcano added a commit that referenced this pull request Oct 29, 2021

ghstack-source-id: ea73138
Pull Request resolved: #67443
>>> # Sample a batch of distributions. Usually this would come from the dataset
>>> def normalize(p):
>>>     return p / p.sum(1, keepdim=True)
Collaborator


Indented lines need to start with `...`.

Suggested change
>>>     return p / p.sum(1, keepdim=True)
...     return p / p.sum(1, keepdim=True)
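
For context, a minimal sketch of how the full doctest might read with the `...` prompts applied (the lines around the quoted snippet are assumptions, not necessarily the ones in the patch):

>>> import torch
>>> import torch.nn.functional as F
>>> # Sample a batch of distributions. Usually this would come from the dataset
>>> def normalize(p):
...     return p / p.sum(1, keepdim=True)
>>> target = normalize(torch.rand(3, 5))
>>> input = F.log_softmax(torch.randn(3, 5), dim=1)
>>> loss = F.kl_div(input, target, reduction='batchmean')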

Collaborator

@pmeier pmeier left a comment


Only nitpicks and style fixes left.

lezcano added a commit that referenced this pull request Oct 29, 2021

ghstack-source-id: 6eb16cb
Pull Request resolved: #67443
Contributor

@jbschlosser jbschlosser left a comment


Thanks for the update! Added a few comments below.

@lezcano lezcano requested a review from jbschlosser November 2, 2021 09:33
lezcano added a commit that referenced this pull request Nov 2, 2021

ghstack-source-id: ed55415
Pull Request resolved: #67443
@lezcano
Collaborator Author

lezcano commented Nov 2, 2021

Thanks for the detailed review, @jbschlosser! I just addressed the comments. Could you have a second look at this, please?

Contributor

@jbschlosser jbschlosser left a comment


LGTM! Thanks for the in-depth improvements :)

@jbschlosser
Contributor

@jbschlosser has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@jbschlosser merged this pull request in 8e1ead8.

@facebook-github-bot facebook-github-bot deleted the gh/Lezcano/23/head branch November 7, 2021 15:16