sampled_addmm backward: fix incorrect gradient wrt self #103548
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/103548
Note: Links to docs will display an error until the docs builds have been completed.
✅ 1 Unrelated Failure
As of commit 4ed23c1: UNSTABLE - The following job failed but was likely due to flakiness present on trunk and has been marked as unstable.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
ghstack-source-id: a639df2ec7b5579743f3cb36560716d46bc227b3
Pull Request resolved: #103548
Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
As per the title. The previous gradient was correct only under sparse semantics, i.e. with the `alpha * (mat1 @ mat2)` term ignored. Under generic (non-masked) semantics, however, that parametrization is wrong in the backward pass, since the gradient has to be projected onto the sparsity pattern of `self` (see the sketch below). With this parametrization, `gradcheck` can be expected to succeed with either `masked=True` or `masked=False`, even after #103518 is fixed.
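As a reading aid only (not the PR's actual implementation): a minimal sketch of the projection described above, assuming the documented forward `out = alpha * (mat1 @ mat2) * spy(self) + beta * self`, where `spy(self)` is the 0/1 sparsity-pattern matrix and is non-differentiable. The helper name `grad_wrt_self` is hypothetical.

```python
import torch

# Hypothetical helper illustrating the projection: the alpha term depends
# on `self` only through the non-differentiable pattern spy(self), so
# d(out)/d(self) equals beta on the pattern. A generic (dense) incoming
# gradient must therefore be masked by self's sparsity pattern.
def grad_wrt_self(grad, self_csr, beta):
    # sparse_mask keeps only the entries at self's specified indices
    return beta * grad.sparse_mask(self_csr)
```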
Stack from ghstack (oldest at bottom):
cc @alexsamardzic @pearu @cpuhrsch @amjames @bhosmer @ezyang @gchanan @albanD @zou3519 @gqchen @soulitzer @lezcano @Varal7
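For completeness, a minimal sketch (not taken from the PR's test suite) of exercising both semantics with `torch.autograd.gradcheck`; the shapes, `beta`/`alpha` values, and densification of the output are illustrative assumptions.

```python
import torch

m, k, n = 4, 3, 5
# Sparse CSR `self` whose pattern samples the product mat1 @ mat2;
# relu() zeroes out roughly half the entries to create a pattern.
self_csr = torch.randn(m, n, dtype=torch.float64).relu().to_sparse_csr().requires_grad_()
mat1 = torch.randn(m, k, dtype=torch.float64, requires_grad=True)
mat2 = torch.randn(k, n, dtype=torch.float64, requires_grad=True)

def fn(s, a, b):
    # Densify so gradcheck compares plain dense outputs.
    return torch.sparse.sampled_addmm(s, a, b, beta=0.5, alpha=2.0).to_dense()

# With the corrected backward, both semantics are expected to pass.
for masked in (True, False):
    torch.autograd.gradcheck(fn, (self_csr, mat1, mat2), masked=masked)
```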