CUDA error: device-side assert triggered in sample_points_from_meshes function #117

rahuldey91 · 2020-03-19T23:23:50Z

Hi. First of all, thanks for developing this long-desired tool. Now, coming to the bug.

I just started working with PyTorch3D and was trying the tutorial from here: https://github.com/facebookresearch/pytorch3d/blob/master/docs/tutorials/deform_source_mesh_to_target_mesh.ipynb

I set up my own Jupyter notebook to reproduce the code. However, when I tried to visualize the meshes by calling the plot_pointcloud() function from the tutorial, I got the following error:
plot_pointcloud(trg_mesh, "Target mesh")

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-39-1e1d27f1793b> in <module>
      3 # print(trg_mesh._N)
      4 # trg_mesh.valid
----> 5 plot_pointcloud(trg_mesh, "Target mesh")
      6 # plot_pointcloud(src_mesh, "Source mesh")

<ipython-input-30-fa31b9ded440> in plot_pointcloud(mesh, title)
      2     # Sample points uniformly from the surface of the mesh
      3     print(mesh)
----> 4     points = sample_points_from_meshes(mesh, 5000)
      5     x, y, z = points.clone().detach().cpu().squeeze().unbind(1)
      6     fig = plt.figure(figsize=(5, 5))

~/miniconda3/envs/pytorch3d/lib/python3.6/site-packages/pytorch3d/ops/sample_points_from_meshes.py in sample_points_from_meshes(meshes, num_samples, return_normals)
     39           be filled with 0.
     40     """
---> 41     if meshes.isempty():
     42         raise ValueError("Meshes are empty.")
     43 

~/miniconda3/envs/pytorch3d/lib/python3.6/site-packages/pytorch3d/structures/meshes.py in isempty(self)
    430             bool indicating whether there is any data.
    431         """
--> 432         return self._N == 0 or self.valid.eq(False).all()
    433 
    434     def verts_list(self):

RuntimeError: CUDA error: device-side assert triggered

I noticed the error was coming from the member mesh.valid. When I accessed that member directly in the notebook, I got a similar error.
trg_mesh.valid

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
~/miniconda3/envs/pytorch3d/lib/python3.6/site-packages/IPython/core/formatters.py in __call__(self, obj)
    700                 type_pprinters=self.type_printers,
    701                 deferred_pprinters=self.deferred_printers)
--> 702             printer.pretty(obj)
    703             printer.flush()
    704             return stream.getvalue()

~/miniconda3/envs/pytorch3d/lib/python3.6/site-packages/IPython/lib/pretty.py in pretty(self, obj)
    392                         if cls is not object \
    393                                 and callable(cls.__dict__.get('__repr__')):
--> 394                             return _repr_pprint(obj, self, cycle)
    395 
    396             return _default_pprint(obj, self, cycle)

~/miniconda3/envs/pytorch3d/lib/python3.6/site-packages/IPython/lib/pretty.py in _repr_pprint(obj, p, cycle)
    682     """A pprint that just redirects to the normal repr function."""
    683     # Find newlines and replace them with p.break_()
--> 684     output = repr(obj)
    685     lines = output.splitlines()
    686     with p.group():

~/miniconda3/envs/pytorch3d/lib/python3.6/site-packages/torch/tensor.py in __repr__(self)
    157         # characters to replace unicode characters with.
    158         if sys.version_info > (3,):
--> 159             return torch._tensor_str._str(self)
    160         else:
    161             if hasattr(sys.stdout, 'encoding'):

~/miniconda3/envs/pytorch3d/lib/python3.6/site-packages/torch/_tensor_str.py in _str(self)
    309                 tensor_str = _tensor_str(self.to_dense(), indent)
    310             else:
--> 311                 tensor_str = _tensor_str(self, indent)
    312 
    313     if self.layout != torch.strided:

~/miniconda3/envs/pytorch3d/lib/python3.6/site-packages/torch/_tensor_str.py in _tensor_str(self, indent)
    207     if self.dtype is torch.float16 or self.dtype is torch.bfloat16:
    208         self = self.float()
--> 209     formatter = _Formatter(get_summarized_data(self) if summarize else self)
    210     return _tensor_str_with_formatter(self, indent, formatter, summarize)
    211 

~/miniconda3/envs/pytorch3d/lib/python3.6/site-packages/torch/_tensor_str.py in __init__(self, tensor)
     81         if not self.floating_dtype:
     82             for value in tensor_view:
---> 83                 value_str = '{}'.format(value)
     84                 self.max_width = max(self.max_width, len(value_str))
     85 

~/miniconda3/envs/pytorch3d/lib/python3.6/site-packages/torch/tensor.py in __format__(self, format_spec)
    407     def __format__(self, format_spec):
    408         if self.dim() == 0:
--> 409             return self.item().__format__(format_spec)
    410         return object.__format__(self, format_spec)
    411 

RuntimeError: CUDA error: device-side assert triggered

My configuration is:
Ubuntu: 18.04
Python: 3.6.10
Pytorch: 1.4.0
Pytorch3D: 0.1.1
CUDA: 10.1

Thanks!


gkioxari · 2020-03-19T23:48:41Z

Hi @rahuldey91! Thank you for your kind words.

This issue has been reported before (see #82 and #63) and is likely due to NaNs in your meshes. Could you print out the vertices or check them for NaNs before you run the sampling?
In the meantime, I will add a check at the beginning of mesh sampling which will raise a better error message!
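A minimal sketch of the kind of check being asked for here, assuming the vertices are available as a list of float tensors (with PyTorch3D this would presumably be the result of `trg_mesh.verts_list()`). The helper name `count_non_finite` and the sample data are hypothetical, and this is not the check that was added to the library:

```python
import torch


def count_non_finite(tensors):
    """Return, for each tensor, how many entries are NaN or +/-Inf."""
    # torch.isfinite is False for NaN and for +/-Inf, so inverting it
    # marks exactly the entries that would be a problem for sampling.
    return [int((~torch.isfinite(t)).sum().item()) for t in tensors]


# Hypothetical stand-in for a list of per-mesh vertex tensors.
verts_list = [
    torch.tensor([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]),
    torch.tensor([[0.0, float("nan"), 0.0], [1.0, 0.0, float("inf")]]),
]

bad_counts = count_non_finite(verts_list)
print(bad_counts)
```

Any non-zero count points at a mesh that should be inspected before it is passed to sample_points_from_meshes.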

gkioxari · 2020-03-20T03:17:56Z

I added a check that raises an error if non-finite values are passed (see 6c48ff6).

rahuldey91 · 2020-03-20T06:08:03Z

Hi @gkioxari! Thanks for your quick response and for pointing out the related issues. I was trying to check for NaNs in the mesh, but I got the same error even when calling trg_mesh.verts_list(). Then I noticed that my mesh was on device "cuda:7". I reran the code after changing the device to "cuda:0" and got the desired output without any errors. Could you help me understand why the data being on a device other than cuda:0 would produce an error?

gkioxari · 2020-03-20T17:58:36Z

This shouldn't create a problem. Note that we use these ops to train on multiple GPUs, e.g. when training Mesh R-CNN models with distributed training on 8 gpus. Is it possible that your data was living on different devices, or that your GPU is corrupt in any way? I can't think of other reasons why it would fail.
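One way to rule out the "data living on different devices" possibility is to group the tensors by the device they report. This is only a sketch: the helper name `devices_in_use` and the sample tensors are hypothetical, and the example runs on CPU so that it works without a GPU.

```python
import torch


def devices_in_use(named_tensors):
    """Map each device string to the names of the tensors stored on it."""
    by_device = {}
    for name, tensor in named_tensors.items():
        by_device.setdefault(str(tensor.device), []).append(name)
    return by_device


# Hypothetical stand-ins for the vertex and face tensors of a mesh.
tensors = {
    "verts": torch.zeros(4, 3),
    "faces": torch.zeros(2, 3, dtype=torch.int64),
}

placement = devices_in_use(tensors)
print(placement)
```

If the resulting dictionary has more than one key, some inputs are on a different device than the others.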

rahuldey91 · 2020-03-20T18:16:35Z

Here is my ipynb file to reproduce the error. If you change the device to device = torch.device("cuda:0"), it will run without errors. For any other gpu, it shoots this error.
sphere_to_dolphin.zip

nikhilaravi · 2020-03-20T18:29:52Z

@rahuldey91 are you using one GPU or multiple GPUs? If you are using a GPU other than the default (cuda:0), you may need to set it explicitly:

device = torch.device("cuda:7")
torch.cuda.set_device(device)
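The two lines above can be wrapped in a small guard so that the same notebook also runs on machines with fewer GPUs or none at all. This is a sketch, not part of PyTorch3D: the helper name `select_device` and the fallback to CPU are assumptions made for illustration.

```python
import torch


def select_device(index):
    """Return cuda:<index> and make it the current CUDA device if it exists,
    otherwise fall back to the CPU."""
    if torch.cuda.is_available() and 0 <= index < torch.cuda.device_count():
        device = torch.device(f"cuda:{index}")
        # Make this GPU the current device, as suggested in the comment above.
        torch.cuda.set_device(device)
        return device
    return torch.device("cpu")


device = select_device(7)
# Create tensors directly on the selected device.
x = torch.zeros(3, device=device)
print(device, x.device)
```

Calling the helper once at the top of the notebook, before any meshes are created, keeps the current CUDA device and the device of the data in agreement.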

rahuldey91 · 2020-03-20T21:12:53Z

Oh I see. That resolves the issue. You can go ahead and close it. Thanks.

gkioxari self-assigned this Mar 19, 2020

nikhilaravi added the bug label Mar 19, 2020

gkioxari assigned nikhilaravi Mar 20, 2020

gkioxari closed this as completed Mar 20, 2020
