
Fix CTC loss for zero-length targets on GPU #23298

Closed · wants to merge 5 commits

Conversation

@t-vi (Collaborator) commented Jul 24, 2019:

Fixes: #18215 at last!

Also sprinkle tests...

@pytorchbot added labels: module: autograd, module: cuda, module: nn, module: operators (Jul 24, 2019)
@ifedan requested a review from gchanan (Jul 24, 2019, 20:47)
@ifedan added the triaged label (Jul 24, 2019)
test/test_nn.py: review thread (outdated, resolved)
Diff context for the review comments below:

template <typename target_t>
__device__ static inline int64_t get_target_prime(
    const target_t* __restrict__ target, int64_t offset, int64_t stride, int64_t idx, int64_t BLANK) {
  if (idx % 2 == 0) {
Contributor:
This is me being ignorant about how CTCLoss works :) For my education, what exactly does get_target_prime do? E.g., what is it called in the paper?

Contributor:

You've adjusted this function to handle target_length == 0 but in many of the call sites, there is already a condition that implies that if you get to this function, target length is nonzero. Does this hold in all of the call sites? I guess I'll go check now.

Contributor:

I guess in backwards, there are a few cases when you will get here when target_length == 0.

@t-vi (Collaborator, Author) commented Jul 26, 2019:

So the comment above the function, // this ad-hoc converts from targets (l in [1]) to augmented targets (l' in [1]); note that no bound-checking is done, isn't all that great. Maybe amending it helps: when l is the target sequence, l' is BLANK l_0 BLANK l_1 ... l_{target_length-1} BLANK.

I'll do a bit more analysis of whether we need the target-length condition here. It may well be that it is not called except with idx 0 in that case, which would work equally well...

Edit: Turns out that works well.
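
For reference, a minimal host-side sketch of the mapping described above, assuming the usual even/odd layout of the augmented target (the actual kernel helper is a __device__ template over target_t; the target type is fixed to int64_t here for illustration):

#include <cstdint>

// Sketch of the targets -> augmented-targets lookup discussed above:
// for a target sequence l of length target_length, the augmented sequence is
//   l' = BLANK, l_0, BLANK, l_1, ..., l_{target_length-1}, BLANK
// so even indices of l' are BLANK and odd index idx maps to l_{idx/2}.
// As the kernel comment says, no bounds checking is done.
static inline int64_t get_target_prime(const int64_t* target, int64_t offset,
                                       int64_t stride, int64_t idx,
                                       int64_t BLANK) {
  if (idx % 2 == 0) {
    return BLANK;                             // even position: blank
  }
  return target[offset + stride * (idx / 2)]; // odd position: label idx / 2
}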

@ezyang (Contributor) commented Jul 26, 2019:

Something that would make me feel more confident about this is specifically running all of the tests under cuda-memcheck and showing there aren't any memory access problems. Is this something you can do easily?

@ezyang (Contributor) commented Jul 26, 2019:

I know this is super goofy and will never happen in practice, but what happens if max_target_length == 0?

Diff context for the review comments below:

        target_length, BLANK);
    have_three =
        ((s < 2 * target_length - 1) &&
Contributor:

Always false when target_length == 0

@t-vi (Collaborator, Author) commented Jul 26, 2019:

(Edited) I actually changed the condition here to include target_length > 0 in the outer if; this removes the need to check the target length in get_target_prime.
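
To illustrate the point, a sketch of the guard pattern being discussed, reusing the get_target_prime sketch above (a simplified illustration, not the literal kernel diff):

// With target_length == 0 the condition (s < 2 * target_length - 1) can never
// hold, so have_three is already false; the value of the outer
// target_length > 0 guard is that get_target_prime is then never evaluated
// for an empty target at all.
static inline bool have_three_at(int64_t s, int64_t target_length,
                                 const int64_t* target, int64_t offset,
                                 int64_t stride, int64_t BLANK) {
  if (target_length == 0) {  // outer guard: empty target, blank-only path
    return false;
  }
  int64_t current = get_target_prime(target, offset, stride, s, BLANK);
  return (s < 2 * target_length - 1) &&
         (get_target_prime(target, offset, stride, s + 2, BLANK) != current);
}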

@ezyang (Contributor) left a review comment:

I'm not a CTCLoss algorithmic expert, but I did do a reasonable amount of auditing of target_length use sites and all of the adjustments look reasonable.

@ezyang (Contributor) commented Jul 26, 2019:

Test failure is real

Jul 26 00:02:24 ======================================================================
Jul 26 00:02:24 FAIL: test_CTCLoss_empty_target_cuda (__main__.TestNN)
Jul 26 00:02:24 ----------------------------------------------------------------------
Jul 26 00:02:24 Traceback (most recent call last):
Jul 26 00:02:24   File "/var/lib/jenkins/workspace/test/common_utils.py", line 456, in wrapper
Jul 26 00:02:24     method(*args, **kwargs)
Jul 26 00:02:24   File "test_nn.py", line 5559, in test_CTCLoss_empty_target_cuda
Jul 26 00:02:24     self._test_CTCLoss_empty_target('cuda')
Jul 26 00:02:24   File "test_nn.py", line 5552, in _test_CTCLoss_empty_target
Jul 26 00:02:24     self.assertAlmostEqual(-log_probs.sum(0)[[0, 2], 0], loss[[0, 2]], delta=3e-5)
Jul 26 00:02:24   File "/var/lib/jenkins/workspace/test/common_utils.py", line 643, in assertAlmostEqual
Jul 26 00:02:24     self.assertEqual(x, y, prec, msg, allow_inf)
Jul 26 00:02:24   File "/var/lib/jenkins/workspace/test/common_utils.py", line 610, in assertEqual
Jul 26 00:02:24     assertTensorsEqual(x, y)
Jul 26 00:02:24   File "/var/lib/jenkins/workspace/test/common_utils.py", line 596, in assertTensorsEqual
Jul 26 00:02:24     self.assertLessEqual(max_err, prec, message)
Jul 26 00:02:24 AssertionError: tensor(3.0518e-05, device='cuda:0', dtype=torch.float32) not less than or equal to 3e-05
Jul 26 00:02:24 

@ezyang (Contributor) left a review comment:

tests need to pass

@t-vi (Collaborator, Author) commented Jul 26, 2019:

Thank you for the thorough review!

@t-vi (Collaborator, Author) commented Jul 26, 2019:

> I know this is super goofy and will never happen in practice, but what happens if max_target_length == 0?

Then you'll be glad to hear we do test that in test_autograd.py :) The grid setup changes were needed for these cases.

I'll amend the PR for the other comments, thank you!
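
The launch-configuration diff itself is not quoted in this thread, so the following is only a hypothetical sketch of the kind of adjustment "grid setup changes" refers to: a dimension derived from a length that can be zero has to be clamped so the launch configuration stays valid.

#include <algorithm>
#include <cstdint>

// Hypothetical sketch (not the PR diff): deriving launch dimensions from a
// possibly zero max_target_length. Without the std::max clamp, a value of 0
// would drive threads_target down to 1 and grid_x down to 0, i.e. an invalid
// kernel launch configuration.
void pick_launch_dims(int64_t max_target_length, int max_threads,
                      int& threads_target, int64_t& grid_x) {
  int64_t positions = std::max<int64_t>(max_target_length, 1);  // clamp to >= 1
  threads_target = max_threads;
  while (threads_target / 2 >= positions && threads_target > 1) {
    threads_target /= 2;  // shrink the block until it roughly fits positions
  }
  grid_x = (positions + threads_target - 1) / threads_target;   // always >= 1
}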

@t-vi (Collaborator, Author) commented Jul 26, 2019:

So at least the most basic invocation of cuda-memcheck does not seem to detect any failures in the tests:

$ PYTHONPATH=./build/lib.linux-x86_64-3.7/ cuda-memcheck python3  test/test_nn.py TestNN.test_CTCLoss_empty_target_{cuda,cpu}
========= CUDA-MEMCHECK
/usr/lib/python3/dist-packages/numba/errors.py:104: UserWarning: Insufficiently recent colorama version found. Numba requires colorama >= 0.3.9
  warnings.warn(msg)
..
----------------------------------------------------------------------
Ran 2 tests in 0.180s

OK
========= ERROR SUMMARY: 0 errors
$ PYTHONPATH=build/lib.linux-x86_64-3.7/ cuda-memcheck python3 test/test_autograd.py TestAutograd.test_ctc_loss
========= CUDA-MEMCHECK
/usr/lib/python3/dist-packages/numba/errors.py:104: UserWarning: Insufficiently recent colorama version found. Numba requires colorama >= 0.3.9
  warnings.warn(msg)
.
----------------------------------------------------------------------
Ran 1 test in 6.869s

OK
========= ERROR SUMMARY: 0 errors

Regarding the tolerance in the failing test: I previously had the tolerance at 3e-5. Apparently that is not good enough, so I increased it to 1e-4. The loss is ~1.5e2, so this corresponds to a relative tolerance of ~6e-7. (I added a comment to the test.) The obvious alternative would be to run the check in double precision.

@ezyang (Contributor) commented Jul 26, 2019 via email.

@t-vi (Collaborator, Author) commented Jul 27, 2019:

So I think the remaining failures are spurious.

@t-vi (Collaborator, Author) commented Jul 30, 2019:

@ezyang: anything I can do to move this forward?

@facebook-github-bot (Contributor) left a comment:

@soumith is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

zdevito pushed a commit to zdevito/ATen that referenced this pull request Jul 31, 2019
Summary:
Fixes: pytorch/pytorch#18215 at last!

Also sprinkle tests...
Pull Request resolved: pytorch/pytorch#23298

Differential Revision: D16582145

Pulled By: soumith

fbshipit-source-id: bc8b1a629de0c2606e70a2218ccd135f4a9cdc5d
@facebook-github-bot (Contributor):
@soumith merged this pull request in 2e40857.

ssnl pushed a commit to ssnl/pytorch that referenced this pull request Aug 2, 2019 (same commit message as above).
soumith pushed a commit that referenced this pull request Aug 2, 2019 (same commit message as above).
Labels: Merged, module: autograd, module: cuda, module: nn, open source, triaged

Successfully merging this pull request may close these issues.

CTCLoss with empty target doesn't work well
8 participants