-
Notifications
You must be signed in to change notification settings - Fork 25.7k
[Static Runtime] Fix bug of creating output aliases in aten::embedding_bag #65516
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
🔗 Helpful links
💊 CI failures summary and remediationsAs of commit 7b873d3 (more details on the Dr. CI page):
🕵️ 3 new failures recognized by patternsThe following CI failures do not appear to be due to upstream breakages:
|
|
This pull request was exported from Phabricator. Differential Revision: D31104345 |
|
This pull request was exported from Phabricator. Differential Revision: D31104345 |
a532b6d to
9f01eca
Compare
|
This pull request was exported from Phabricator. Differential Revision: D31104345 |
9f01eca to
efd24f9
Compare
efd24f9 to
661f2ab
Compare
|
This pull request was exported from Phabricator. Differential Revision: D31104345 |
1 similar comment
|
This pull request was exported from Phabricator. Differential Revision: D31104345 |
661f2ab to
686e3f9
Compare
686e3f9 to
a80c2dd
Compare
|
This pull request was exported from Phabricator. Differential Revision: D31104345 |
1 similar comment
|
This pull request was exported from Phabricator. Differential Revision: D31104345 |
a80c2dd to
b702030
Compare
|
This pull request was exported from Phabricator. Differential Revision: D31104345 |
b702030 to
7b9d7a3
Compare
|
This pull request was exported from Phabricator. Differential Revision: D31104345 |
7b9d7a3 to
09fa9ae
Compare
09fa9ae to
f04c0d0
Compare
|
This pull request was exported from Phabricator. Differential Revision: D31104345 |
f04c0d0 to
adc871d
Compare
|
This pull request was exported from Phabricator. Differential Revision: D31104345 |
1 similar comment
|
This pull request was exported from Phabricator. Differential Revision: D31104345 |
adc871d to
57aaa0b
Compare
|
This pull request was exported from Phabricator. Differential Revision: D31104345 |
5c5baa2 to
107df03
Compare
|
This pull request was exported from Phabricator. Differential Revision: D31104345 |
1 similar comment
|
This pull request was exported from Phabricator. Differential Revision: D31104345 |
107df03 to
31a344a
Compare
|
This pull request was exported from Phabricator. Differential Revision: D31104345 |
31a344a to
d30c9ec
Compare
…g_bag (pytorch#65516) Summary: Pull Request resolved: pytorch#65516 This change fixes a bug that Static Runtime's `aten::embedding_bag` out variant implementation creates aliases in its managed output tensors. Managed output tensors should never be an alias with each other since writing to them can illegally overwrite others' contents unintentionally, and this exact problem was causing the bug at T97393697, causing SR to return wrong return values. This bug is detected in inline_cvr/remote_ro by a DCHECK, `verify_no_memory_overlap` (introduced by D30211705 (pytorch@3fb33b3)), but wasn't found so far since our testing didn't include running the model in the debug mode. Fortunately this bug is not hitting production since the aliases outputs are not used in production. This change fixes the root cause from `_embedding_bag_cpu_impl_out` by replacing alias creation with copying. Note that this change also includes a fundamental change in Static Runtime's unit testing: `testStaticRuntime` exercises the given graph 3 times: 1. profile run 2. run using the profile to allocate managed tensors 3. reuse the managed tensors -- newly added Adding 3 reveals this bug with a new unittest `EmbeddingBagWithManagedOutput`. Test Plan: - Confirmed that the crash experienced by `StaticRuntime.EmbeddingBagWithManagedOutput` disappears with this change (crash paste: P459807248). - Added `StaticRuntime.EmbeddingBagWithManagedOutput` to detect the same problem in the future. Reviewed By: hlu1 Differential Revision: D31104345 fbshipit-source-id: 89eee43111d4979b07a0d9bdd5049664a7d6774b
|
This pull request was exported from Phabricator. Differential Revision: D31104345 |
d30c9ec to
d5ba2cd
Compare
|
This pull request was exported from Phabricator. Differential Revision: D31104345 |
d5ba2cd to
7b873d3
Compare
Summary:
This change fixes a bug that Static Runtime's
aten::embedding_bagout variant implementation creates aliases in its managed output tensors.Managed output tensors should never be an alias with each other since writing to them can illegally overwrite others' contents unintentionally, and this exact problem was causing the bug at T97393697, causing SR to return wrong return values.
This bug is detected in inline_cvr/remote_ro by a DCHECK,
verify_no_memory_overlap(introduced by D30211705 (3fb33b3)), but wasn't found so far since our testing didn't include running the model in the debug mode. Fortunately this bug is not hitting production since the aliases outputs are not used in production.This change fixes the root cause from
_embedding_bag_cpu_impl_outby replacing alias creation with copying.Note that this change also includes a fundamental change in Static Runtime's unit testing:
testStaticRuntimeexercises the given graph 3 times:Adding 3 reveals this bug with a new unittest
EmbeddingBagWithManagedOutput.Test Plan:
Confirmed that the crash experienced by
StaticRuntime.EmbeddingBagWithManagedOutputdisappears with this change (crash paste: P459807248).Added
StaticRuntime.EmbeddingBagWithManagedOutputto detect the same problem in the future.Differential Revision: D31104345