Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[cuda] Preserve operations order between vectorized and non-vectorized in ln grad input #111488

Closed
wants to merge 29 commits into from

Conversation

valentinandrei
Copy link
Contributor

@valentinandrei valentinandrei commented Oct 18, 2023

The vectorized implementation in #111021 changed the order of arithmetic instructions in layer_norm_grad_input, causing non bitwise identical results when compared to the non-vectorized implementation. At merging, all accuracy checks passed, including internal inductor ones.

There are CI periodic inductor dynamo tests (e.g. pit_b_224) that run eager mode models several times and compare results. If the input buffers are aligned to the vector length, the vectorized implementation will be used. If not, the default one will be used. If the 2 eager runs end up having different buffer alignments, 2 implementations will be called and then the results would be very close but not bitwise identical. The tests check for bitwise identical results and in some cases they may fail.

This fix makes sure that the operation order between non-vectorized and vectorized is the same and the 2 implementations should produce bitwise identical results.

@pytorch-bot pytorch-bot bot added the release notes: cuda release notes category label Oct 18, 2023
@pytorch-bot
Copy link

pytorch-bot bot commented Oct 18, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/111488

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (3 Unrelated Failures)

As of commit 2a3da21 with merge base ba2ba96 (image):

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

BROKEN TRUNK - The following job failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@valentinandrei
Copy link
Contributor Author

@pytorchbot label "ciflow/periodic"

@pytorch-bot pytorch-bot bot added the ciflow/periodic Trigger jobs ran periodically on master (periodic.yml) on the PR label Oct 18, 2023
@valentinandrei
Copy link
Contributor Author

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Oct 19, 2023
@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

xuhancn pushed a commit to xuhancn/pytorch that referenced this pull request Nov 7, 2023
…d in ln grad input (pytorch#111488)

The vectorized implementation in pytorch#111021 changed the order of arithmetic instructions in `layer_norm_grad_input`, causing non bitwise identical results when compared to the non-vectorized implementation. At merging, all accuracy checks passed, including internal inductor ones.

There are CI periodic inductor dynamo tests (e.g. `pit_b_224`) that run eager mode models several times and compare results. If the input buffers are aligned to the vector length, the vectorized implementation will be used. If not, the default one will be used. If the 2 eager runs end up having different buffer alignments, 2 implementations will be called and then the results would be very close but not bitwise identical. The tests check for bitwise identical results and in some cases they may fail.

This fix makes sure that the operation order between non-vectorized and vectorized is the same and the 2 implementations **should** produce bitwise identical results.
Pull Request resolved: pytorch#111488
Approved by: https://github.com/malfet
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ciflow/periodic Trigger jobs ran periodically on master (periodic.yml) on the PR ciflow/trunk Trigger trunk jobs on your pull request Merged release notes: cuda release notes category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants