
Fix CTC loss for zero-length targets on GPU #23298

Closed · wants to merge 5 commits

Conversation

@t-vi (Collaborator) commented Jul 24, 2019:

Fixes: #18215 at last!

Also sprinkle tests...

@pytorchbot added labels: module: autograd, module: cuda, module: nn, module: operators (Jul 24, 2019)
@ifedan requested a review from gchanan (Jul 24, 2019, 20:47)
@ifedan added the triaged label (Jul 24, 2019)
test/test_nn.py: review thread (outdated, resolved)
Diff context for the review comments below:

template <typename target_t>
__device__ static inline int64_t get_target_prime(
    const target_t* __restrict__ target, int64_t offset, int64_t stride, int64_t idx, int64_t BLANK) {
  if (idx % 2 == 0) {
Contributor:
This is me being ignorant about how CTCLoss works :) For my education, what exactly does get_target_prime do? E.g., what is it called in the paper?

Contributor:

You've adjusted this function to handle target_length == 0 but in many of the call sites, there is already a condition that implies that if you get to this function, target length is nonzero. Does this hold in all of the call sites? I guess I'll go check now.

Contributor:

I guess in backwards, there are a few cases when you will get here when target_length == 0.

@t-vi (Collaborator, Author) commented Jul 26, 2019:

So the comment above the function, // this ad-hoc converts from targets (l in [1]) to augmented targets (l' in [1]); note that no bound-checking is done, isn't all that great. Maybe amending it helps: when l is the target sequence, l' is BLANK l_0 BLANK l_1 ... l_{target_length-1} BLANK.

I'll do a bit more analysis of whether we need the target-length condition here. It may well be that it is not called except with idx 0 in that case, which would work equally well...

Edit: Turns out that works well.
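
For reference, a minimal host-side sketch of the mapping described above, assuming the usual even/odd layout of the augmented target (the actual kernel helper is a __device__ template over target_t; the target type is fixed to int64_t here for illustration):

#include <cstdint>

// Sketch of the targets -> augmented-targets lookup discussed above:
// for a target sequence l of length target_length, the augmented sequence is
//   l' = BLANK, l_0, BLANK, l_1, ..., l_{target_length-1}, BLANK
// so even indices of l' are BLANK and odd index idx maps to l_{idx/2}.
// As the kernel comment says, no bounds checking is done.
static inline int64_t get_target_prime(const int64_t* target, int64_t offset,
                                       int64_t stride, int64_t idx,
                                       int64_t BLANK) {
  if (idx % 2 == 0) {
    return BLANK;                             // even position: blank
  }
  return target[offset + stride * (idx / 2)]; // odd position: label idx / 2
}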

@ezyang (Contributor) commented Jul 26, 2019:

Something that would make me feel more confident about this is specifically running all of the tests under cuda-memcheck and showing there aren't any memory access problems. Is this something you can do easily?

@ezyang (Contributor) commented Jul 26, 2019:

I know this is super goofy and will never happen in practice, but what happens if max_target_length == 0?

Diff context for the review comments below:

        target_length, BLANK);
    have_three =
        ((s < 2 * target_length - 1) &&
Contributor:

Always false when target_length == 0

@t-vi (Collaborator, Author) commented Jul 26, 2019:

(Edited) I actually changed the condition here to include target_length > 0 in the outer if; this removes the need to check the target length in get_target_prime.
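
To illustrate the point, a sketch of the guard pattern being discussed, reusing the get_target_prime sketch above (a simplified illustration, not the literal kernel diff):

// With target_length == 0 the condition (s < 2 * target_length - 1) can never
// hold, so have_three is already false; the value of the outer
// target_length > 0 guard is that get_target_prime is then never evaluated
// for an empty target at all.
static inline bool have_three_at(int64_t s, int64_t target_length,
                                 const int64_t* target, int64_t offset,
                                 int64_t stride, int64_t BLANK) {
  if (target_length == 0) {  // outer guard: empty target, blank-only path
    return false;
  }
  int64_t current = get_target_prime(target, offset, stride, s, BLANK);
  return (s < 2 * target_length - 1) &&
         (get_target_prime(target, offset, stride, s + 2, BLANK) != current);
}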

@ezyang (Contributor) left a review comment:

I'm not a CTCLoss algorithmic expert, but I did do a reasonable amount of auditing of target_length use sites and all of the adjustments look reasonable.

@ezyang (Contributor) commented Jul 26, 2019:

Test failure is real

Jul 26 00:02:24 ======================================================================
Jul 26 00:02:24 FAIL: test_CTCLoss_empty_target_cuda (__main__.TestNN)
Jul 26 00:02:24 ----------------------------------------------------------------------
Jul 26 00:02:24 Traceback (most recent call last):
Jul 26 00:02:24   File "/var/lib/jenkins/workspace/test/common_utils.py", line 456, in wrapper
Jul 26 00:02:24     method(*args, **kwargs)
Jul 26 00:02:24   File "test_nn.py", line 5559, in test_CTCLoss_empty_target_cuda
Jul 26 00:02:24     self._test_CTCLoss_empty_target('cuda')
Jul 26 00:02:24   File "test_nn.py", line 5552, in _test_CTCLoss_empty_target
Jul 26 00:02:24     self.assertAlmostEqual(-log_probs.sum(0)[[0, 2], 0], loss[[0, 2]], delta=3e-5)
Jul 26 00:02:24   File "/var/lib/jenkins/workspace/test/common_utils.py", line 643, in assertAlmostEqual
Jul 26 00:02:24     self.assertEqual(x, y, prec, msg, allow_inf)
Jul 26 00:02:24   File "/var/lib/jenkins/workspace/test/common_utils.py", line 610, in assertEqual
Jul 26 00:02:24     assertTensorsEqual(x, y)
Jul 26 00:02:24   File "/var/lib/jenkins/workspace/test/common_utils.py", line 596, in assertTensorsEqual
Jul 26 00:02:24     self.assertLessEqual(max_err, prec, message)
Jul 26 00:02:24 AssertionError: tensor(3.0518e-05, device='cuda:0', dtype=torch.float32) not less than or equal to 3e-05
Jul 26 00:02:24 

@ezyang (Contributor) left a review comment:

tests need to pass

@t-vi (Collaborator, Author) commented Jul 26, 2019:

Thank you for the thorough review!

@t-vi (Collaborator, Author) commented Jul 26, 2019:

> I know this is super goofy and will never happen in practice, but what happens if max_target_length == 0?

Then you'll be glad to hear we do test that in test_autograd.py :) The grid setup changes were needed for these cases.

I'll amend the PR for the other comments, thank you!
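
The launch-configuration diff itself is not quoted in this thread, so the following is only a hypothetical sketch of the kind of adjustment "grid setup changes" refers to: a dimension derived from a length that can be zero has to be clamped so the launch configuration stays valid.

#include <algorithm>
#include <cstdint>

// Hypothetical sketch (not the PR diff): deriving launch dimensions from a
// possibly zero max_target_length. Without the std::max clamp, a value of 0
// would drive threads_target down to 1 and grid_x down to 0, i.e. an invalid
// kernel launch configuration.
void pick_launch_dims(int64_t max_target_length, int max_threads,
                      int& threads_target, int64_t& grid_x) {
  int64_t positions = std::max<int64_t>(max_target_length, 1);  // clamp to >= 1
  threads_target = max_threads;
  while (threads_target / 2 >= positions && threads_target > 1) {
    threads_target /= 2;  // shrink the block until it roughly fits positions
  }
  grid_x = (positions + threads_target - 1) / threads_target;   // always >= 1
}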

@t-vi (Collaborator, Author) commented Jul 26, 2019:

So at least the most basic invocation of cuda-memcheck does not seem to detect any failures in the tests:

$ PYTHONPATH=./build/lib.linux-x86_64-3.7/ cuda-memcheck python3  test/test_nn.py TestNN.test_CTCLoss_empty_target_{cuda,cpu}
========= CUDA-MEMCHECK
/usr/lib/python3/dist-packages/numba/errors.py:104: UserWarning: Insufficiently recent colorama version found. Numba requires colorama >= 0.3.9
  warnings.warn(msg)
..
----------------------------------------------------------------------
Ran 2 tests in 0.180s

OK
========= ERROR SUMMARY: 0 errors
$ PYTHONPATH=build/lib.linux-x86_64-3.7/ cuda-memcheck python3 test/test_autograd.py TestAutograd.test_ctc_loss
========= CUDA-MEMCHECK
/usr/lib/python3/dist-packages/numba/errors.py:104: UserWarning: Insufficiently recent colorama version found. Numba requires colorama >= 0.3.9
  warnings.warn(msg)
.
----------------------------------------------------------------------
Ran 1 test in 6.869s

OK
========= ERROR SUMMARY: 0 errors

Regarding the tolerance in the failing test: I previously had the tolerance at 3e-5. Apparently that is not good enough, so I increased it to 1e-4. The loss is ~1.5e2, so this corresponds to a relative tolerance of ~6e-7. (I added a comment to the test.) The obvious alternative would be to run the check in double precision.

@ezyang (Contributor) commented Jul 26, 2019 via email.

@t-vi (Collaborator, Author) commented Jul 27, 2019:

So I think the remaining failures are spurious.

@t-vi (Collaborator, Author) commented Jul 30, 2019:

@ezyang: anything I can do to move this forward?

@facebook-github-bot (Contributor) left a comment:

@soumith is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

zdevito pushed a commit to zdevito/ATen that referenced this pull request Jul 31, 2019
Summary:
Fixes: pytorch/pytorch#18215 at last!

Also sprinkle tests...
Pull Request resolved: pytorch/pytorch#23298

Differential Revision: D16582145

Pulled By: soumith

fbshipit-source-id: bc8b1a629de0c2606e70a2218ccd135f4a9cdc5d
@facebook-github-bot (Contributor):
@soumith merged this pull request in 2e40857.

ssnl pushed a commit to ssnl/pytorch that referenced this pull request Aug 2, 2019 (same commit message as above).
soumith pushed a commit that referenced this pull request Aug 2, 2019 (same commit message as above).
Labels: Merged, module: autograd, module: cuda, module: nn, open source, triaged

Successfully merging this pull request may close these issues.

CTCLoss with empty target doesn't work well
8 participants