
Conversation

thomasjpfan (Contributor)

Fixes #24609
ATen umbrella issue #24507
Related to #59765

There are no measurable performance differences between master and this PR when running the following benchmark:

Benchmark script
import torch
import torch.nn as nn
import time

torch.manual_seed(0)


def _time():
    # Wait for queued CUDA kernels to finish before reading the clock,
    # then return the current time in milliseconds.
    torch.cuda.synchronize()
    MS_PER_SECOND = 1000
    return time.perf_counter() * MS_PER_SECOND


device = "cuda"
C = 30
softmax = nn.LogSoftmax(dim=1)
n_runs = 250

for reduction in ["none", "mean", "sum"]:
    for N in [100_000, 500_000, 1_000_000]:
        elapsed = 0
        for i in range(n_runs):
            data = torch.randn(N, C, device=device, requires_grad=True)
            target = torch.empty(N, dtype=torch.long, device=device).random_(0, C)
            loss = nn.NLLLoss(reduction=reduction)
            input = softmax(data)
            result = loss(input, target)

            if reduction == "none":
                gradient = torch.randn(N, device=device)
            else:
                gradient = torch.randn(1, device=device).squeeze()

            # Only the backward pass is timed; it is the code path this PR migrates.
            t1 = _time()
            result.backward(gradient)
            t2 = _time()
            elapsed = elapsed + (t2 - t1)
        elapsed_avg = elapsed / n_runs
        print(
            f"input size({N}, {C}), reduction: {reduction} "
            f"elapsed time is {elapsed_avg:.2f} (ms)"
        )
    print()

master

input size(100000, 30), reduction: none elapsed time is 0.19 (ms)
input size(500000, 30), reduction: none elapsed time is 0.83 (ms)
input size(1000000, 30), reduction: none elapsed time is 1.66 (ms)

input size(100000, 30), reduction: mean elapsed time is 1.50 (ms)
input size(500000, 30), reduction: mean elapsed time is 7.19 (ms)
input size(1000000, 30), reduction: mean elapsed time is 14.35 (ms)

input size(100000, 30), reduction: sum elapsed time is 1.49 (ms)
input size(500000, 30), reduction: sum elapsed time is 7.17 (ms)
input size(1000000, 30), reduction: sum elapsed time is 14.21 (ms)

this PR

input size(100000, 30), reduction: none elapsed time is 0.19 (ms)
input size(500000, 30), reduction: none elapsed time is 0.83 (ms)
input size(1000000, 30), reduction: none elapsed time is 1.66 (ms)

input size(100000, 30), reduction: mean elapsed time is 1.48 (ms)
input size(500000, 30), reduction: mean elapsed time is 7.16 (ms)
input size(1000000, 30), reduction: mean elapsed time is 14.29 (ms)

input size(100000, 30), reduction: sum elapsed time is 1.49 (ms)
input size(500000, 30), reduction: sum elapsed time is 7.15 (ms)
input size(1000000, 30), reduction: sum elapsed time is 14.18 (ms)
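
Correctness spot check (sketch)

The timing script above only measures speed. As a rough correctness spot check (a sketch, not part of this PR's test suite), the migrated backward can be compared against a finite-difference gradient with torch.autograd.gradcheck, using double precision and a small batch since gradcheck is slow:

import torch
import torch.nn.functional as F

torch.manual_seed(0)

N, C = 8, 30
device = "cuda"
# gradcheck needs double precision to keep numerical error below its tolerance.
data = torch.randn(N, C, device=device, dtype=torch.double, requires_grad=True)
target = torch.randint(0, C, (N,), device=device)

for reduction in ["none", "mean", "sum"]:
    # Compare the analytical gradient of log_softmax + nll_loss against a
    # numerical estimate for every reduction mode; gradcheck raises on mismatch.
    def fn(inp):
        return F.nll_loss(F.log_softmax(inp, dim=1), target, reduction=reduction)

    assert torch.autograd.gradcheck(fn, (data,))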

thomasjpfan added the module: nn (Related to torch.nn) and triaged labels on Jun 18, 2021
thomasjpfan requested a review from ezyang as a code owner on Jun 18, 2021
facebook-github-bot (Contributor) commented on Jun 18, 2021

💊 CI failures summary and remediations

As of commit 3eccb41 (more details on the Dr. CI page and at hud.pytorch.org/pr/60299):


  • 4/4 failures possibly* introduced in this PR
    • 2/4 non-scanned failure(s)

🕵️ 2 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_bionic_rocm3_9_py3_6_build (1/2)

Step: "Spin up environment" (full log | diagnosis details | 🔁 rerun)

Waiting for a VM assignment: .......................................................................
Build-agent version 1.0.74137-e7d5cf4b (2021-06-21T13:20:20+0000)
Creating a dedicated VM with ubuntu-2004:202104-01 image
Waiting for a VM assignment: ............................................................................................................................................................................................................................................................................................................

We timed out preparing a VM for this build, potentially due to our infrastructure or cloud provider.  Please retry the build in a few minutes

Unexpected capacity error: error caused by capacity

See CircleCI build pytorch_macos_10_13_py3_test (2/2)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

Jun 21 22:55:44 ERROR [0.004s]: test_poisson_sample (__main__.TestDistributions)
Jun 21 22:55:44   File "distributions/test_distributions.py", line 805, in _check_sampler_discrete
Jun 21 22:55:44     chisq, p = scipy.stats.chisquare(counts[msk], pmf[msk] * num_samples)
Jun 21 22:55:44   File "/Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/scipy/stats/stats.py", line 6853, in chisquare
Jun 21 22:55:44     lambda_="pearson")
Jun 21 22:55:44   File "/Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/scipy/stats/stats.py", line 6694, in power_divergence
Jun 21 22:55:44     raise ValueError(msg)
Jun 21 22:55:44 ValueError: For each axis slice, the sum of the observed frequencies must agree with the sum of the expected frequencies to a relative tolerance of 1e-08, but the percent differences are:
Jun 21 22:55:44 0.008265582255680495
Jun 21 22:55:44 
Jun 21 22:55:44 ======================================================================
Jun 21 22:55:44 ERROR [0.004s]: test_poisson_sample (__main__.TestDistributions)
Jun 21 22:55:44 ----------------------------------------------------------------------
Jun 21 22:55:44 Traceback (most recent call last):
Jun 21 22:55:44   File "distributions/test_distributions.py", line 1333, in test_poisson_sample
Jun 21 22:55:44     failure_rate=1e-3)
Jun 21 22:55:44   File "distributions/test_distributions.py", line 805, in _check_sampler_discrete
Jun 21 22:55:44     chisq, p = scipy.stats.chisquare(counts[msk], pmf[msk] * num_samples)
Jun 21 22:55:44   File "/Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/scipy/stats/stats.py", line 6853, in chisquare
Jun 21 22:55:44     lambda_="pearson")
Jun 21 22:55:44   File "/Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/scipy/stats/stats.py", line 6694, in power_divergence
Jun 21 22:55:44     raise ValueError(msg)

ci.pytorch.org: 1 failed



thomasjpfan marked this pull request as ready for review on Jun 21, 2021
thomasjpfan requested a review from ngimel on Jun 21, 2021
ezyang removed their request for review on Jun 21, 2021
ngimel (Collaborator) left a comment

This looks good, thank you!

facebook-github-bot (Contributor)

@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

facebook-github-bot (Contributor)

@ngimel merged this pull request in 99ca2c5.

facebook-github-bot pushed a commit that referenced this pull request on Jun 23, 2021:
Summary:
Addresses part of #59765.

This PR adds byte support for nll_loss on the CPU for `input.dim() == 2`.

CUDA support will be implemented once the `nll_loss` migration to CUDA is completed in #60299 and #60097.

Pull Request resolved: #60308

Reviewed By: VitalyFedyunin

Differential Revision: D29329458

Pulled By: jbschlosser

fbshipit-source-id: d3585c4966030bc61e451f8aa817406a8a3acf47
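
As a rough illustration of the byte support described in the commit message above (a sketch, not taken from #60308's tests, and assuming "byte support" refers to accepting torch.uint8 target tensors), the CPU path for 2-D input would allow something like:

import torch
import torch.nn.functional as F

N, C = 4, 30
input = torch.randn(N, C).log_softmax(dim=1)            # 2-D floating-point input on CPU
target = torch.randint(0, C, (N,), dtype=torch.uint8)   # byte (uint8) class indices

# Assumption: uint8 targets are accepted on CPU after #60308; torch.long remains the default.
print(F.nll_loss(input, target))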

Labels: cla signed, Merged, module: nn (Related to torch.nn), open source, triaged

Successfully merging this pull request may close these issues.

Migrate nll_loss_backward from the TH to Aten (CUDA)