Question on the environment required to run sol-renderer #5

heiwang1997 · 2021-05-21T15:14:33Z

Hi @tovacinni, thanks for this great work and the code release. I am trying to run your C++ renderer and it crashes with the error below. Could you guide me on how to solve this issue, at your convenience?

The system is Ubuntu 20.04. I've tried both an RTX 3090 and a 1080, and neither works. By the way, the Python part works well: I can run the training and generate the rendered armadillo. libtorch was downloaded from https://download.pytorch.org/libtorch/cu111/libtorch-cxx11-abi-shared-with-deps-1.8.1%2Bcu111.zip

Here is the error message:

    (nglod) my@ws:~/nglod/sol-renderer/build$ ./sdfRenderer ../../sdf-net/_results/armadillo.npz
    NLOD Demo starting...
    GPU Device 0: "Ampere" with compute capability 8.6
    
    terminate called after throwing an instance of 'c10::Error'
      what():  CUDA error: an illegal memory access was encountered
    Exception raised from nonzero_cuda_out_impl at /pytorch/aten/src/ATen/native/cuda/Indexing.cu:873 (most recent call first):
    frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x69 (0x7f6705badb29 in /home/my/nglod/sol-renderer/third-party/libtorch/lib/libc10.so)
    frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0xd2 (0x7f6705baaab2 in /home/my/nglod/sol-renderer/third-party/libtorch/lib/libc10.so)
    frame #2: void at::native::nonzero_cuda_out_impl<bool>(at::Tensor const&, at::Tensor&) + 0xebe (0x7f66a6227c4e in /home/my/nglod/sol-renderer/third-party/libtorch/lib/libtorch_cuda_cu.so)
    frame #3: at::native::nonzero_out_cuda(at::Tensor&, at::Tensor const&) + 0x1eb (0x7f66a6199c5b in /home/my/nglod/sol-renderer/third-party/libtorch/lib/libtorch_cuda_cu.so)
    frame #4: at::native::nonzero_cuda(at::Tensor const&) + 0xea (0x7f66a619a09a in /home/my/nglod/sol-renderer/third-party/libtorch/lib/libtorch_cuda_cu.so)
    frame #5: <unknown function> + 0x2e6a80b (0x7f66a6fd180b in /home/my/nglod/sol-renderer/third-party/libtorch/lib/libtorch_cuda_cu.so)
    frame #6: <unknown function> + 0x2e6a890 (0x7f66a6fd1890 in /home/my/nglod/sol-renderer/third-party/libtorch/lib/libtorch_cuda_cu.so)
    frame #7: at::Tensor c10::Dispatcher::call<at::Tensor, at::Tensor const&>(c10::TypedOperatorHandle<at::Tensor (at::Tensor const&)> const&, at::Tensor const&) const + 0xe7 (0x7f6692f17c57 in /home/my/nglod/sol-renderer/third-party/libtorch/lib/libtorch_cpu.so)
    frame #8: at::nonzero(at::Tensor const&) + 0x5e (0x7f6692d5338e in /home/my/nglod/sol-renderer/third-party/libtorch/lib/libtorch_cpu.so)
    frame #9: <unknown function> + 0x2f15a3e (0x7f6694791a3e in /home/my/nglod/sol-renderer/third-party/libtorch/lib/libtorch_cpu.so)
    frame #10: <unknown function> + 0x2f15ac0 (0x7f6694791ac0 in /home/my/nglod/sol-renderer/third-party/libtorch/lib/libtorch_cpu.so)
    frame #11: at::Tensor c10::Dispatcher::call<at::Tensor, at::Tensor const&>(c10::TypedOperatorHandle<at::Tensor (at::Tensor const&)> const&, at::Tensor const&) const + 0xe7 (0x7f6692f17c57 in /home/my/nglod/sol-renderer/third-party/libtorch/lib/libtorch_cpu.so)
    frame #12: at::nonzero(at::Tensor const&) + 0x5e (0x7f6692d5338e in /home/my/nglod/sol-renderer/third-party/libtorch/lib/libtorch_cpu.so)
    frame #13: <unknown function> + 0x4222b (0x555f01cd522b in ./sdfRenderer)
    frame #14: <unknown function> + 0x27750 (0x555f01cba750 in ./sdfRenderer)
    frame #15: <unknown function> + 0x1819a (0x555f01cab19a in ./sdfRenderer)
    frame #16: <unknown function> + 0x20194 (0x7f67060ed194 in /lib/x86_64-linux-gnu/libglut.so.3)
    frame #17: fgEnumWindows + 0x39 (0x7f67060f0c39 in /lib/x86_64-linux-gnu/libglut.so.3)
    frame #18: glutMainLoopEvent + 0x1cd (0x7f67060ed7bd in /lib/x86_64-linux-gnu/libglut.so.3)
    frame #19: glutMainLoop + 0x65 (0x7f67060edff5 in /lib/x86_64-linux-gnu/libglut.so.3)
    frame #20: <unknown function> + 0x18edc (0x555f01cabedc in ./sdfRenderer)
    frame #21: __libc_start_main + 0xf3 (0x7f6617f1a0b3 in /lib/x86_64-linux-gnu/libc.so.6)
    frame #22: <unknown function> + 0x1639e (0x555f01ca939e in ./sdfRenderer)
    
    Aborted (core dumped)


tovacinni · 2021-05-21T15:18:56Z

Thanks for your interest in our work!

What version of libtorch are you using? The code was tested on 1.7.1, and using a newer version may cause issues (but I haven't actually tried).

heiwang1997 · 2021-05-21T16:00:22Z

I was using 1.8.1. But just now I tried 1.7.1, which can be downloaded from here, but still no luck -- the error is the same 🤔

I saw in requirements.txt that the Python renderer expects PyTorch 1.6. Do the libtorch and PyTorch versions have to match?

tovacinni · 2021-05-21T16:18:45Z

Thanks for trying that out. If you can share with me the NPZ file you generated on Google Drive or something, I can try running it on my side & try to reproduce.

The Python PyTorch version shouldn't matter in theory, since it uses NPZ to bridge between the two and the C++ version uses its own separate PyTorch (libtorch).
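Since the NPZ file is the only thing passed from the Python side to the C++ renderer, one way to narrow the problem down is to compare the contents of a failing NPZ with a working one. The following is a minimal sketch, assuming NumPy is installed; `summarize_npz` is a hypothetical helper, not part of the nglod code, and the array names it prints depend entirely on what the export wrote.

```python
import sys

import numpy as np


def summarize_npz(path):
    """Return {name: (shape, dtype, all_finite)} for every array in an .npz archive."""
    summary = {}
    with np.load(path) as archive:
        for name in archive.files:
            arr = archive[name]
            # Only floating-point arrays can hold NaN/Inf; other dtypes count as finite.
            if np.issubdtype(arr.dtype, np.floating):
                finite = bool(np.isfinite(arr).all())
            else:
                finite = True
            summary[name] = (arr.shape, str(arr.dtype), finite)
    return summary


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python inspect_npz.py path/to/file.npz
    for name, (shape, dtype, finite) in summarize_npz(sys.argv[1]).items():
        print(f"{name}: shape={shape} dtype={dtype} finite={finite}")
```

Running this on both files and diffing the output would show missing arrays, unexpected shapes or dtypes, or non-finite values, any of which could plausibly lead to out-of-bounds indexing on the C++ side.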

heiwang1997 · 2021-05-21T16:29:37Z

Thanks for the fast response! Here is the npz file: https://drive.google.com/file/d/1EcGrddM3kS_IbVVuS8_3zvja6PCswv1i/view?usp=sharing

tovacinni · 2021-05-21T16:38:57Z

I just tried the NPZ and got the same error, but the renderer still works with the NPZs I have. There might be an issue with the NPZ export in the released code, so I'll take a deeper look at this later today.

heiwang1997 · 2021-05-21T16:47:31Z

Cool! Thanks for your help. Looking forward to your reply.

sixftninja · 2021-11-22T21:22:44Z

Hi @heiwang1997 ,

Did you try upgrading PyTorch? I was trying to run nglod on an A4000 GPU and found that PyTorch 1.6 does not support the Ampere architecture. Upgrading to the latest PyTorch worked.
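Whether an installed PyTorch build covers a given GPU can be checked by comparing the device's compute capability with the architectures the build was compiled for. The sketch below assumes the usual CUDA compatibility rules: `sm_XY` binaries run on devices with the same major version and an equal or higher minor version, and `compute_XY` PTX can be JIT-compiled for any equal or newer capability. `build_supports` is a hypothetical helper; `torch.cuda.get_arch_list()` and `torch.cuda.get_device_capability()` are the PyTorch calls it relies on, and they may be missing in very old releases.

```python
import re


def build_supports(capability, arch_list):
    """True if a build compiled for `arch_list` can run on a GPU with `capability`.

    capability: (major, minor) tuple, e.g. (8, 6) for an RTX 3090.
    arch_list:  strings such as "sm_75" or "compute_80".
    """
    major, minor = capability
    for arch in arch_list:
        match = re.match(r"(sm|compute)_(\d+)", arch)
        if not match:
            continue
        kind, number = match.group(1), int(match.group(2))
        a_major, a_minor = number // 10, number % 10
        if kind == "sm":
            # Native binaries: same major version, minor not newer than the device.
            if a_major == major and a_minor <= minor:
                return True
        elif (a_major, a_minor) <= (major, minor):
            # PTX: forward compatible with any equal or newer capability.
            return True
    return False


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available():
        cap = torch.cuda.get_device_capability(0)
        archs = torch.cuda.get_arch_list()
        print(f"device capability: {cap}, build archs: {archs}")
        print("supported:", build_supports(cap, archs))
```

This only covers the Python side. The C++ renderer links against its own libtorch, so that build's CUDA version and architecture list would need to be checked separately.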

Sylva-Lin · 2022-12-20T13:43:56Z

Hi @heiwang1997,
I also ran into these errors. How did you solve this in the end?

coledea mentioned this issue May 21, 2021

C++ Renderer: Failure to render higher LODs #6

Open

kingcodefish mentioned this issue Oct 1, 2021

mesh2sdf errors #20

Open
