
RuntimeError: isDifferentiableType(variable.scalar_type()) INTERNAL ASSERT FAILED when using torch.repeat #7197

Open
ajayvohra2005 opened this issue Jun 5, 2024 · 2 comments

Comments

@ajayvohra2005

🐛 Bug

Using torch.repeat leads to a runtime error:

WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1717594747.366986     228 cuda_executor.cc:1032] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
I0000 00:00:1717594747.369320      16 service.cc:145] XLA service 0x55fb8bed0e60 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
I0000 00:00:1717594747.369368      16 service.cc:153]   StreamExecutor device (0): NVIDIA A10G, Compute Capability 8.6
I0000 00:00:1717594747.369825      16 se_gpu_pjrt_client.cc:853] Using BFC allocator.
I0000 00:00:1717594747.369881      16 gpu_helpers.cc:107] XLA backend allocating 17696931840 bytes on device 0 for BFCAllocator.
I0000 00:00:1717594747.369915      16 gpu_helpers.cc:147] XLA backend will use up to 5898977280 bytes on device 0 for CollectiveBFCAllocator.
I0000 00:00:1717594747.370073      16 cuda_executor.cc:1032] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
Traceback (most recent call last):
  File "/app/torch_xla_issue.py", line 35, in <module>
    loss = custom_loss_module.forward(pred=custom_pred[:, [-1], :], target=custom_target, is_dummy=True)
  File "/app/torch_xla_issue.py", line 22, in forward
    cos_sim = self.custom_cos_similarity_module.forward(x1=pred, x2=target)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/distance.py", line 89, in forward
    return F.cosine_similarity(x1, x2, self.dim, self.eps)
RuntimeError: isDifferentiableType(variable.scalar_type()) INTERNAL ASSERT FAILED at "/src/pytorch/torch/csrc/autograd/functions/utils.h":75, please report a bug to PyTorch. 

To Reproduce

Steps to reproduce the behavior:

Docker Image:

us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.3.0_3.10_cuda_12.1

Python script to reproduce error:

import torch
import torch.nn as nn

import torch_xla.core.xla_model as xm

class CustomLoss(nn.Module):

    def __init__(self, cos_similarity_dim=2):
        super().__init__()

        self.custom_cos_similarity_module = nn.CosineSimilarity(dim=cos_similarity_dim)
        self.custom_mse_loss_module = nn.MSELoss(reduction="none")

    def forward(self, pred: torch.Tensor, target: torch.Tensor, is_dummy: bool):

        if is_dummy:
            # repeating pred along dim 1 triggers the INTERNAL ASSERT below on the r2.3.0 XLA build
            pred = pred.repeat((1, target.shape[1], 1))
            # the commented-out expand is the workaround noted under "Additional context"
            #pred = pred.expand((-1, target.shape[1], -1))
        else:
            assert pred.shape[1] == target.shape[1]

        cos_sim = self.custom_cos_similarity_module.forward(x1=pred, x2=target)
        custom_cos_loss_tensor = 1 - cos_sim
        custom_mse_loss_tensor = self.custom_mse_loss_module.forward(input=pred, target=target)

        return custom_cos_loss_tensor, custom_mse_loss_tensor
    

custom_loss_module = CustomLoss()

device = xm.xla_device()
custom_pred = torch.rand(size=(10, 150, 256), dtype=torch.float, requires_grad=True).to(device)
custom_target = torch.zeros(size=(10, 150, 256), dtype=torch.float, requires_grad=True).to(device)

loss = custom_loss_module.forward(pred=custom_pred[:, [-1], :], target=custom_target, is_dummy=True)
print(loss)

Expected behavior

The script should run without error.

Environment

  • Reproducible on XLA backend [CUDA]:
  • torch_xla version: us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.3.0_3.10_cuda_12.1
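
If it helps triage, here is a quick way to record the exact library versions shipped inside that container (a minimal sketch; it only relies on the standard __version__ attributes exposed by both packages):

import torch
import torch_xla

# Print the exact torch / torch_xla versions inside the docker image,
# so the report can be matched against a specific release.
print("torch:", torch.__version__)
print("torch_xla:", torch_xla.__version__)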

Additional context

Using torch.expand instead of torch.repeat is a workaround.
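
A minimal sketch of that workaround, using the shapes from the repro script above (treating the broadcast view returned by expand as interchangeable with repeat for this read-only use is an assumption of the sketch):

import torch
import torch.nn as nn
import torch_xla.core.xla_model as xm

device = xm.xla_device()
pred = torch.rand(size=(10, 1, 256), dtype=torch.float, requires_grad=True).to(device)
target = torch.zeros(size=(10, 150, 256), dtype=torch.float).to(device)

# expand broadcasts pred along dim 1 as a view instead of materializing copies
# like repeat does, and does not hit the INTERNAL ASSERT on the r2.3.0 build
pred_expanded = pred.expand((-1, target.shape[1], -1))

cos_sim = nn.CosineSimilarity(dim=2)(pred_expanded, target)
print(cos_sim.shape)  # expected: torch.Size([10, 150])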

@JackCaoG
Collaborator

JackCaoG commented Jun 5, 2024

@zpcore can you take a look at this one? I suspect you can repro with the CPU as well.

@zpcore
Collaborator

zpcore commented Jun 5, 2024

@JackCaoG, yes, the issue can also be reproduced with the XLA CPU backend. Meanwhile, I tried the same code on the master branch and the issue doesn't exist, so it only affects the 2.3 release.
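
For reference, a minimal sketch of the CPU repro (assuming the PJRT runtime used by the 2.3 wheels; setting PJRT_DEVICE=CPU before importing torch_xla selects the XLA CPU device):

# PJRT_DEVICE must be set before torch_xla is imported.
import os
os.environ["PJRT_DEVICE"] = "CPU"

import torch
import torch.nn.functional as F
import torch_xla.core.xla_model as xm

device = xm.xla_device()  # resolves to the XLA CPU device here

pred = torch.rand(size=(10, 1, 256), dtype=torch.float, requires_grad=True).to(device)
target = torch.zeros(size=(10, 150, 256), dtype=torch.float).to(device)

# repeat followed by cosine_similarity is the combination that trips the assert
pred = pred.repeat((1, target.shape[1], 1))
F.cosine_similarity(pred, target, dim=2)  # raises the INTERNAL ASSERT on the 2.3 release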

The simplest solution is to use the latest docker build us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:nightly_3.10_cuda_12.1_20240605. @ajayvohra2005, can you try this docker image instead?
