Support for target with class probs in CrossEntropyLoss #61044
Conversation
Force-pushed from e95f9ae to abd8c30.
@jbschlosser has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
This looks pretty good; some suggestions and comments below.
aten/src/ATen/native/LossNLL.cpp (Outdated)
// Compute weighted mean
ret = ret.sum() / (target * weight_).sum();
This... seems like a good argument for why a user would expect reduction=mean to return (-(input * target * weight_).mean()). I'm having a hard time coming up with a use case where someone wants probabilities and wants to do a weighted mean over the probabilities and weights.
At any rate, we should probably be consistent with our hard-target cross_entropy function...
Yeah, I agree it doesn't make sense to do a weighted mean over probabilities and weights. I did it this way here to maintain consistency with the hard-target cross-entropy loss: with one-hot targets, the results are equivalent between soft and hard if done like this :/
Also, to be fully precise: I think a mean computation that fits user intuitions would be -(input * target * weight_).sum(1).mean(). As in the non-weighted calculation, sum(1) should be taken first, before the mean, to be correct.
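For illustration only (not the PR's code): a minimal PyTorch sketch contrasting the two reduction conventions discussed above, assuming `input` holds log-probabilities and `target` holds class probabilities, both of shape (N, C), with per-class weights `weight_` of shape (C,).

```python
import torch

N, C = 4, 3
input = torch.log_softmax(torch.randn(N, C), dim=1)   # log-probabilities
target = torch.softmax(torch.randn(N, C), dim=1)      # soft targets; each row sums to 1
weight_ = torch.rand(C)                                # per-class weights

# Weighted-mean convention used in the PR (reduces to the hard-target behavior
# for one-hot targets): normalize by the total target-weighted mass.
loss_weighted_mean = -(input * target * weight_).sum() / (target * weight_).sum()

# Alternative "intuitive" mean: sum over classes first, then average over the batch.
loss_per_sample_mean = -(input * target * weight_).sum(1).mean()

print(loss_weighted_mean.item(), loss_per_sample_mean.item())
```

With unit class weights the two coincide (both reduce to the plain mean over the batch); with non-uniform weights they generally differ, which is the point of the discussion above.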
Force-pushed from 0ba16ea to 7e7cace.
Some minor comments but otherwise this LGTM!
weight,
reduction,
ignore_index);
Tensor ret;
NRVO yes, but I'd expect the compiler to do RVO. Not sure how to test for this though; feel free to leave the code as-is.
.. math::
    \ell(x, y) = \begin{cases}
        \frac{\sum_{n=1}^N l_n}{N}, &
        \text{if reduction} = \text{`mean';}\\
        \sum_{n=1}^N l_n, &
        \text{if reduction} = \text{`sum'.}
    \end{cases}
The mean case is only true if the input and target are of size (N, C). Otherwise, we divide by a factor that isn't the batch size -- for a tensor of shape (N, C, d1, d2, ..., dk) we end up dividing by a factor of tensor.numel() / C, right?
Maybe this is OK because we can view data of shape (N, C, d1, d2, ..., dk) as being a "batch" of (N, d1, d2, ..., dk) distributions.
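For a quick check (a sketch using the existing hard-target API, not this PR's soft-target path): with K-dimensional input of shape (N, C, d1, d2), no class weights, and reduction='mean', the summed loss is divided by N * d1 * d2, i.e. numel() / C rather than the batch size N.

```python
import torch
import torch.nn.functional as F

N, C, d1, d2 = 2, 3, 4, 5
input = torch.randn(N, C, d1, d2)             # unnormalized scores
target = torch.randint(0, C, (N, d1, d2))     # hard class indices

loss_sum = F.cross_entropy(input, target, reduction='sum')
loss_mean = F.cross_entropy(input, target, reduction='mean')

# 'mean' divides by the number of target elements: N * d1 * d2 == input.numel() / C
assert torch.allclose(loss_mean, loss_sum / (N * d1 * d2))
```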
Yeah, N is doing a lot of work implicitly here. I do think that data of shape (N, C, d1, d2, ..., dk) is conceptually a batch of (N, d1, d2, ..., dk) distributions (and as mentioned before, I think d1, ..., dk should have been added to the left of C for this to be clearer, but that ship has sailed).
While this was carried over to some extent from the old docs, each item in the formula is now more explicitly defined, so I think it needs to be more precise. Specifically, "N is the batch size" should change. Borrowing some terminology from KLDivLoss, we could do something like:
"N spans the minibatch dimension as well as dimensions d1, ..., dk in the case of K-dimensional loss"
wdyt?
N spans the minibatch dimension as well as dimensions d1, ..., dk in the case of K-dimensional loss
That sounds good
Force-pushed from 5137b89 to 8dab918.
@jbschlosser has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@jbschlosser merged this pull request in a42345a.
the commit message in a42345a (from the first post of this PR) says
@VitamintK That's right; it was only added to
Fixes #11959
Alternative approach to creating a new CrossEntropyLossWithSoftLabels class. This PR simply adds support for "soft targets" AKA class probabilities to the existing CrossEntropyLoss and NLLLoss classes. Implementation is dumb and simple right now, but future work can add higher-performance kernels for this case.
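For reference, a minimal usage sketch of the feature described above, assuming a PyTorch build that includes this change: CrossEntropyLoss accepts either class indices (hard targets) or class probabilities of the same shape as the input (soft targets).

```python
import torch
import torch.nn as nn

loss_fn = nn.CrossEntropyLoss()
input = torch.randn(4, 5)                      # (N, C) unnormalized logits

# Hard targets: class indices of shape (N,)
hard_target = torch.randint(0, 5, (4,))
print(loss_fn(input, hard_target))

# Soft targets: class probabilities of shape (N, C), each row summing to 1
soft_target = torch.softmax(torch.randn(4, 5), dim=1)
print(loss_fn(input, soft_target))
```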