
Conversation


@CaoE CaoE commented Sep 25, 2023

Stack from ghstack (oldest at bottom):


pytorch-bot bot commented Sep 25, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/109994

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit c2b2fec with merge base a43c283:

BROKEN TRUNK - The following job failed but was also present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

CaoE added a commit that referenced this pull request Sep 25, 2023
ghstack-source-id: 3164a84
Pull Request resolved: #109994
@CaoE CaoE marked this pull request as draft September 25, 2023 06:44
CaoE added a commit that referenced this pull request Sep 26, 2023
ghstack-source-id: 56f1c48
Pull Request resolved: #109994
CaoE added a commit that referenced this pull request Sep 27, 2023
ghstack-source-id: b3bb224
Pull Request resolved: #109994
@CaoE CaoE added the ciflow/trunk, ciflow/periodic, ciflow/inductor, and ciflow/slow labels Sep 27, 2023
CaoE added a commit that referenced this pull request Oct 7, 2023
ghstack-source-id: 81dcc9a
Pull Request resolved: #109994
CaoE added a commit that referenced this pull request Dec 1, 2023
ghstack-source-id: 8cf3826
Pull Request resolved: #109994
CaoE added a commit that referenced this pull request Dec 4, 2023
ghstack-source-id: 9fba887
Pull Request resolved: #109994
CaoE added a commit that referenced this pull request Jan 29, 2024
ghstack-source-id: f02a56a
Pull Request resolved: #109994
CaoE added a commit that referenced this pull request Jan 30, 2024
ghstack-source-id: 0cb9838
Pull Request resolved: #109994
CaoE added a commit that referenced this pull request Jan 31, 2024
ghstack-source-id: 02fad89
Pull Request resolved: #109994
CaoE added a commit that referenced this pull request Feb 1, 2024
ghstack-source-id: 717129b
Pull Request resolved: #109994
@CaoE CaoE requested a review from ezyang February 1, 2024 01:56
CaoE added a commit that referenced this pull request Feb 1, 2024
ghstack-source-id: 2a83d48
Pull Request resolved: #109994
with self.assertWarns(FutureWarning):
    scaler.step(o1)
    scaler.step(o2)
    scaler.update()
Contributor

You should seriously just consider making a dedicated test file for this though

Collaborator Author
@CaoE CaoE Feb 2, 2024

Do you mean making a dedicated test file like test_grad_scaler.py for all of the GradScaler-related tests above? That's fine with me; I will move those tests to a new file, test_grad_scaler.py.
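
If it helps, here is a minimal sketch of what such a dedicated file might look like; the class name, the example test body, and the use of the device-type test machinery are assumptions for illustration, not the actual contents of the new file:

```python
# test/test_grad_scaler.py -- hypothetical skeleton, for illustration only
import torch
from torch.testing._internal.common_utils import TestCase, run_tests
from torch.testing._internal.common_device_type import (
    instantiate_device_type_tests,
    onlyNativeDeviceTypes,
    dtypes,
)


class TestGradScaler(TestCase):
    @onlyNativeDeviceTypes
    @dtypes(torch.float)
    def test_grad_scaler_state_dict(self, device, dtype):
        # Example of a GradScaler test that could be moved here: round-trip
        # the scaler state through state_dict()/load_state_dict().
        dev_type = torch.device(device).type
        scaler = torch.amp.GradScaler(device=dev_type, init_scale=3.0)
        restored = torch.amp.GradScaler(device=dev_type)
        restored.load_state_dict(scaler.state_dict())
        self.assertEqual(restored.get_scale(), 3.0)


instantiate_device_type_tests(TestGradScaler, globals())

if __name__ == "__main__":
    run_tests()
```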

@skipIfTorchDynamo("Failed running call_function for sparse_coo_tensor. See https://github.com/pytorch/pytorch/issues/118856")
@onlyNativeDeviceTypes
@dtypes(torch.float)
def test_grad_scaling_unscale_sparse(self, device, dtype):
Contributor

Any code changes besides changing device? (A diff perhaps?)

Collaborator Author
@CaoE CaoE Feb 2, 2024

No, only the device is changed.
Even simplified code like the snippet below fails (it also fails on CUDA with device="cuda"), raising:
torch._dynamo.exc.TorchRuntimeError: Failed running call_function <built-in method sparse_coo_tensor of type object at 0x7f105afdb9a0>(*(FakeTensor(..., size=(2, 3), dtype=torch.int64), FakeTensor(..., size=(3,)), (2, 3)), **{'device': 'cpu', 'dtype': torch.float32}): The tensor has a non-zero number of elements, but its data is not allocated yet. Caffe2 uses a lazy allocation, so you will need to call mutable_data() or raw_mutable_data() to actually allocate memory.

@onlyNativeDeviceTypes
@dtypes(torch.float)
def test_grad_scaling_unscale_sparse(self, device, dtype):
    i = torch.tensor([[0, 1, 1],
                      [2, 0, 2]], device="cpu", dtype=torch.int64)
    v = torch.tensor([16., 32., 64.], device="cpu", dtype=torch.float)
    s = torch.sparse_coo_tensor(i, v, torch.Size([2, 3]), device="cpu", dtype=torch.float)
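
For reference, a self-contained version of the failing pattern might look like the sketch below; the torch.compile wrapper and backend choice are assumptions used only to force the construction through Dynamo, where (per the error quoted above) the sparse_coo_tensor call is expected to fail, while plain eager execution succeeds.

```python
import torch

@torch.compile(backend="eager", fullgraph=True)
def build_sparse():
    # Same construction as in the test above; tracing it through Dynamo is
    # what triggers the TorchRuntimeError quoted in the comment above.
    i = torch.tensor([[0, 1, 1],
                      [2, 0, 2]], device="cpu", dtype=torch.int64)
    v = torch.tensor([16., 32., 64.], device="cpu", dtype=torch.float)
    return torch.sparse_coo_tensor(i, v, torch.Size([2, 3]), device="cpu", dtype=torch.float)

build_sparse()  # expected to raise under Dynamo; the same code runs fine in eager mode
```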


ezyang commented Feb 2, 2024

@pytorchbot merge

@pytorchmergebot

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.


Labels

ciflow/inductor
ciflow/mps (Run MPS tests (subset of trunk))
ciflow/periodic (Trigger jobs ran periodically on master (periodic.yml) on the PR)
ciflow/rocm (Trigger "default" config CI on ROCm)
ciflow/slow
ciflow/trunk (Trigger trunk jobs on your pull request)
Merged
open source
topic: not user facing (topic category)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants