[JIT] Ensure offset is a multiple of 4 to fix "Philox" RNG in jitted kernels #50169

mcarilli · 2021-01-06T23:01:49Z

Immediately-upstreamable part of #50148.

This PR fixes what I'm fairly sure is a subtle bug with custom Philox class usage in jitted kernels. Philox constructors in kernels take the CUDA RNG generator's current offset. The Philox constructor then computes offset/4 (a truncating uint64_t division) to locate its internal start in its virtual Philox bitstream of 128-bit chunks. In other words, it assumes the incoming offset is a multiple of 4. But (in current code) that's not guaranteed. For example, the increments used by these eager kernels could easily make offset not divisible by 4. When that happens, the division drops the remainder, so a jitted kernel might start from a chunk that earlier kernels have already partly consumed.
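To make the truncation concrete, here is a minimal sketch in plain Python rather than the actual kernel code. The function name `philox_start_chunk` is hypothetical and models only the offset/4 step described above, not the real Philox class.

```python
def philox_start_chunk(offset: int) -> int:
    """Toy model of the offset/4 step: index of the first 128-bit chunk.

    Hypothetical helper for illustration only. Integer division drops
    the remainder, just like a uint64_t division would.
    """
    return offset // 4


# Offsets 8, 9, 10 and 11 all map to chunk 2, so a kernel handed offset 9
# starts at the same chunk as one handed offset 8.
for offset in (8, 9, 10, 11, 12):
    print(f"offset={offset:2d} remainder={offset % 4} "
          f"start_chunk={philox_start_chunk(offset)}")
```

Only offsets that are multiples of 4 map to distinct chunks, which is the assumption the constructor relies on.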

I figured the easiest fix was to round all incoming increments up to the nearest multiple of 4 in CUDAGeneratorImpl itself.
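As a rough sketch of that rounding, assuming a toy generator that only tracks an integer offset: `ToyGenerator` and `round_up_to_multiple_of_4` are hypothetical names for illustration, not the CUDAGeneratorImpl API.

```python
def round_up_to_multiple_of_4(increment: int) -> int:
    """Round a non-negative increment up to the next multiple of 4."""
    return ((increment + 3) // 4) * 4


class ToyGenerator:
    """Hypothetical stand-in for a generator that tracks a Philox offset."""

    def __init__(self) -> None:
        self.offset = 0

    def advance(self, increment: int) -> int:
        """Return the offset a kernel should use, then bump the stored offset."""
        current = self.offset
        self.offset += round_up_to_multiple_of_4(increment)
        return current


gen = ToyGenerator()
for requested in (1, 5, 4, 10):
    start = gen.advance(requested)
    # Every offset handed out stays divisible by 4.
    print(f"requested={requested:2d} start_offset={start:2d} "
          f"next_offset={gen.offset:2d}")
```

Because every increment is rounded before it is added, the stored offset stays a multiple of 4 as long as it starts as one.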

Another option would be to round the current offset up to the next multiple of 4 at the jit point of use. But that would be a jit-specific offset jump, so jit rng kernels wouldn't have a prayer of being bitwise accurate with eager rng kernels that used non-multiple-of-4 offsets. Restricting the offset to multiples of 4 for everyone at least gives jit rng the chance to match eager rng. (Of course, there are still many other ways the numerics could diverge, like if a jit kernel launches a different number of threads than an eager kernel, or assigns threads to data elements differently.)

facebook-github-bot · 2021-01-06T23:01:58Z

💊 CI failures summary and remediations

As of commit 25cdf69 (more details on the Dr. CI page):

1/1 failures possibly* introduced in this PR
- 1/1 non-CircleCI failure(s)

codecov · 2021-01-07T02:32:25Z

Codecov Report

Merging #50169 (25cdf69) into master (eef5eb0) will increase coverage by 0.18%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #50169      +/-   ##
==========================================
+ Coverage   80.49%   80.68%   +0.18%     
==========================================
  Files        1900     1900              
  Lines      206254   206254              
==========================================
+ Hits       166018   166409     +391     
+ Misses      40236    39845     -391

ngimel · 2021-01-08T19:27:46Z

Please add the PR description as a note somewhere in the code.

facebook-github-bot

@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

facebook-github-bot · 2021-01-11T19:57:39Z

@ngimel merged this pull request in 271240a.

checked out CUDAGeneratorImpl diff

b48fcd5

facebook-github-bot added the cla signed label Jan 6, 2021

mcarilli requested a review from ngimel January 6, 2021 23:02

pytorchbot added the open source label Jan 6, 2021

H-Huang added the triaged label Jan 7, 2021

ngimel approved these changes Jan 8, 2021

mcarilli added 2 commits January 8, 2021 15:55

comments to explain

df962b2

Rephrase explanation

25cdf69

ngimel approved these changes Jan 9, 2021

facebook-github-bot reviewed Jan 9, 2021

facebook-github-bot closed this in 271240a Jan 11, 2021

facebook-github-bot added the Merged label Jan 11, 2021
