WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1717594747.366986 228 cuda_executor.cc:1032] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
I0000 00:00:1717594747.369320 16 service.cc:145] XLA service 0x55fb8bed0e60 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
I0000 00:00:1717594747.369368 16 service.cc:153] StreamExecutor device (0): NVIDIA A10G, Compute Capability 8.6
I0000 00:00:1717594747.369825 16 se_gpu_pjrt_client.cc:853] Using BFC allocator.
I0000 00:00:1717594747.369881 16 gpu_helpers.cc:107] XLA backend allocating 17696931840 bytes on device 0 for BFCAllocator.
I0000 00:00:1717594747.369915 16 gpu_helpers.cc:147] XLA backend will use up to 5898977280 bytes on device 0 for CollectiveBFCAllocator.
I0000 00:00:1717594747.370073 16 cuda_executor.cc:1032] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
Traceback (most recent call last):
File "/app/torch_xla_issue.py", line 35, in <module>
loss = custom_loss_module.forward(pred=custom_pred[:, [-1], :], target=custom_target, is_dummy=True)
File "/app/torch_xla_issue.py", line 22, in forward
cos_sim = self.custom_cos_similarity_module.forward(x1=pred, x2=target)
File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/distance.py", line 89, in forward
return F.cosine_similarity(x1, x2, self.dim, self.eps)
RuntimeError: isDifferentiableType(variable.scalar_type()) INTERNAL ASSERT FAILED at "/src/pytorch/torch/csrc/autograd/functions/utils.h":75, please report a bug to PyTorch.
@JackCaoG, yes, the issue can also be reproduced with the XLA CPU backend. I also tried the same code with the master branch, and the issue doesn't exist there, so it only affects the 2.3 release.
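For reference, here is a minimal sketch of how the XLA CPU reproduction can be forced. PJRT_DEVICE is a standard torch_xla environment variable; exactly how the CPU run above was invoked is not shown in this thread, so treat this as an assumption.

import os
# Select the XLA CPU backend before torch_xla initializes; this lets the
# same script be exercised without a GPU or TPU.
os.environ["PJRT_DEVICE"] = "CPU"

import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()  # resolves to an XLA:CPU device under PJRT_DEVICE=CPU
x = torch.randn(1, 1, 16, device=device)
print(x.device)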
The simplest solution is to use the latest nightly docker build, us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:nightly_3.10_cuda_12.1_20240605. @ajayvohra2005, can you try this image instead?
🐛 Bug
Using torch.repeat leads to a runtime error.
To Reproduce
Steps to reproduce the behavior:
Docker Image: us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.3.0_3.10_cuda_12.1 (listed under Environment below)
Python script to reproduce the error:
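The reporter's script is not reproduced in this thread; below is a minimal sketch reconstructed from the traceback above. The class and variable names (CustomLoss, custom_pred, custom_target, is_dummy) and the tensor shapes are assumptions, not the original code.

# Hypothetical reconstruction of /app/torch_xla_issue.py; names and shapes
# are assumptions based on the traceback.
import torch
import torch.nn as nn
import torch_xla.core.xla_model as xm


class CustomLoss(nn.Module):
    def __init__(self):
        super().__init__()
        # nn.CosineSimilarity.forward is the frame that raises the
        # isDifferentiableType INTERNAL ASSERT in the traceback.
        self.custom_cos_similarity_module = nn.CosineSimilarity(dim=-1)

    def forward(self, pred, target, is_dummy=False):
        cos_sim = self.custom_cos_similarity_module.forward(x1=pred, x2=target)
        return (1.0 - cos_sim).mean()


device = xm.xla_device()

custom_pred = torch.randn(4, 8, 16, device=device, requires_grad=True)
# Building the target with torch.Tensor.repeat on the XLA device is what the
# report points at; replacing it with expand() (see Additional context below)
# avoids the error.
custom_target = torch.randn(1, 1, 16, device=device).repeat(4, 1, 1)

custom_loss_module = CustomLoss()
loss = custom_loss_module.forward(
    pred=custom_pred[:, [-1], :], target=custom_target, is_dummy=True
)

On the r2.3.0_3.10_cuda_12.1 image the forward call fails inside F.cosine_similarity as shown in the traceback; per the comments above, the same code is expected to complete on the master branch and nightly builds.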
Expected behavior
The script should run without error.
Environment
us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.3.0_3.10_cuda_12.1
Additional context
Using torch.expand instead is a workaround; a sketch follows below.
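For concreteness, here is a hedged sketch of the repeat-vs-expand difference; the shapes are illustrative assumptions. expand() returns a broadcast view without materializing copies, which sidesteps the code path that fails in this issue.

import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()
base = torch.randn(1, 1, 16, device=device)

# Per this issue, building the target with repeat() leads to the INTERNAL
# ASSERT later in F.cosine_similarity on the r2.3.0 CUDA wheel:
# target = base.repeat(4, 1, 1)

# Workaround reported above: build the same-shaped (4, 1, 16) target with
# expand(), which creates a broadcast view instead of copying data.
target = base.expand(4, 1, 16)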