train_ngp_nerf_occ.py: RuntimeError: CUDA error: invalid configuration argument #207
Comments
This is weird. May I know your CUDA and PyTorch versions?
python -c "import torch; print(torch.__version__)"
nvcc --version
And what's your GPU?
nvidia-smi
My guess is that this is related to the GPU you are using. |
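For anyone reporting the same error, the details requested above can be collected in one go. This is only a convenience sketch: `collect_env` is a hypothetical helper name, not part of nerfacc, and it assumes `nvcc` is on the `PATH` if a toolkit version is to be reported.

```python
import shutil
import subprocess


def collect_env():
    """Gather the PyTorch, CUDA, and GPU details asked for in this thread."""
    info = {"torch": None, "torch_cuda": None, "gpu": None, "nvcc": None}

    # CUDA toolkit version as reported by `nvcc --version`, if nvcc is installed.
    nvcc = shutil.which("nvcc")
    if nvcc is not None:
        out = subprocess.run([nvcc, "--version"], capture_output=True, text=True)
        lines = out.stdout.strip().splitlines()
        info["nvcc"] = next((line for line in lines if "release" in line), None)

    try:
        import torch
    except ImportError:
        return info  # PyTorch is not installed in this environment

    info["torch"] = torch.__version__
    info["torch_cuda"] = torch.version.cuda  # CUDA version PyTorch was built with
    if torch.cuda.is_available():
        info["gpu"] = torch.cuda.get_device_name(0)
    return info


for key, value in collect_env().items():
    print(f"{key}: {value}")
```

Pasting the printed lines into a comment makes it easier to compare setups, for example spotting a mismatch between the CUDA version PyTorch was built with and the one `nvcc` reports.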
Hello, I have the same problem! |
Sorry I mean which NVIDIA card are you using, e.g. V100? |
Hello, I'm using an A100. Thanks.
Regards, Sara Rojas
|
RTX 3090 |
I'm also having problems on an A100 with occupancy grids. The rays are inside the bounding box, but the ray indices, starts, and ends come back with 0 in the batch dimension. |
I also had this exact issue on my 3090, although it was working fine in another conda environment. Turns out that when I set up the new environment, I forgot to specify the torch version and therefore I was using torch 2.0.0 with cuda 11.7, then tiny-cuda-nn was compiled with this combination. Installing torch version 1.13.0 with pytorch-cuda 11.7 fixed the issue for me. |
I can reproduce this error with torch 2.0.0. With torch 1.13.0 everything seems to work fine. I'll come back to this issue once I figure out why this happens. In the meantime, using torch 1.13.0 seems to be a workaround. |
Thanks!
|
Just fixed it on the master branch! |
I also have this problem with torch 1.10 and CUDA 11.3. |
I have this problem with torch 1.11 and CUDA 11.3 on an NVIDIA 4090.
  File "/home/xxx/anaconda3/envs/conerf/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/media/.../model/render_utils.py", line 772, in render_sdf_image_with_xxx
    weights, _ = render_weight_from_alpha(
  File "/home/xxx/anaconda3/envs/conerf/lib/python3.9/site-packages/nerfacc/volrend.py", line 305, in render_weight_from_alpha
    trans = render_transmittance_from_alpha(
  File "/home/xxx/anaconda3/envs/conerf/lib/python3.9/site-packages/nerfacc/volrend.py", line 201, in render_transmittance_from_alpha
    packed_info = pack_info(ray_indices, n_rays)
  File "/home/xxx/anaconda3/envs/conerf/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/xxx/anaconda3/envs/conerf/lib/python3.9/site-packages/nerfacc/pack.py", line 43, in pack_info
    chunk_cnts = torch.zeros((n_rays,), device=device, dtype=dtype)
RuntimeError: CUDA error: invalid configuration argument |
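A note on reading tracebacks like this one: the last frame is a plain `torch.zeros` allocation, which is an unlikely source of an invalid launch configuration. CUDA kernel errors can be reported asynchronously at a later API call, so the launch that actually failed may be earlier in the program. A minimal sketch of forcing synchronous launches while debugging, assuming the variable is set before CUDA is initialized (`enable_blocking_launches` is a hypothetical helper name):

```python
import os


def enable_blocking_launches():
    """Make CUDA kernel launches synchronous so errors surface at the failing call."""
    # Must be set before CUDA is initialized, so call this before importing torch.
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"


enable_blocking_launches()
# import torch  # import afterwards, then rerun the failing code
```

Equivalently, the variable can be set in the shell when launching the script. This slows execution down, so it is only meant for debugging runs.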
I'm here to contribute more data points. I encountered the same issue with torch 1.11 and CUDA 11.3 on an NVIDIA 4090. The issue always happens in the render_image() function during validation. I have tried the following nerfacc versions so far: 0.3.2, 0.3.3, 0.3.4, 0.3.5. |
I've encountered the same problem, and after two days of debugging, I believe I've figured it out. The error is not related to the GPU model or the CUDA version. This is not even a NerfAcc bug. The error message that I see is similar to the one reported by @AIBluefisher:
But CUDA entered an invalid state before this point, and I added
The positions tensor is empty because
As a fix, I simply bypassed the NeRF / TorchNGP when the input positions are empty:
Alternatively, it might be worth exploring calling |
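The snippets from the comment above did not survive, so the following is not the commenter's code. It is a minimal sketch of the idea described (skip the network when the positions tensor is empty), with `query_or_empty`, `field_fn`, and `out_channels` as hypothetical names introduced for illustration.

```python
import torch


def query_or_empty(field_fn, positions, out_channels):
    """Call field_fn(positions), skipping the call when there are no samples."""
    if positions.shape[0] == 0:
        # No samples survived ray marching: return an empty result with the
        # expected trailing dimension instead of launching a zero-sized kernel.
        return positions.new_zeros((0, out_channels))
    return field_fn(positions)


def dummy_field(x):
    """Stand-in for the real network: maps (N, 3) positions to (N, 1) values."""
    return x.sum(dim=-1, keepdim=True)


empty = query_or_empty(dummy_field, torch.zeros((0, 3)), out_channels=1)
full = query_or_empty(dummy_field, torch.ones((4, 3)), out_channels=1)
print(empty.shape, full.shape)
```

The empty result carries no gradient, so during training a batch with no samples may also need the optimizer step to be skipped by the caller.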
Hi, I also encountered this issue:
I'm using PyTorch 1.12.1+cuda11.6 with CUDA compilation tools release 11.4, on an NVIDIA RTX A6000. |
Same problem in version 0.5.2: volrend.accumulate_along_rays cannot handle empty input. |
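Until the library handles this case, one possible workaround is to guard the call or fall back to a plain PyTorch accumulation when there are no samples. The sketch below is a stand-in written for illustration, not nerfacc's implementation; the name `accumulate_along_rays_ref` and its argument layout are assumptions.

```python
import torch


def accumulate_along_rays_ref(weights, values, ray_indices, n_rays):
    """Sum weights[i] * values[i] into the ray given by ray_indices[i].

    weights: (n_samples,), values: (n_samples, C), ray_indices: (n_samples,) int64.
    Returns a (n_rays, C) tensor; all zeros when n_samples == 0.
    """
    out = torch.zeros(
        (n_rays, values.shape[-1]), dtype=values.dtype, device=values.device
    )
    if ray_indices.numel() == 0:
        return out  # nothing to accumulate, so skip the scatter entirely
    out.index_add_(0, ray_indices, weights[:, None] * values)
    return out


weights = torch.tensor([0.5, 0.5, 1.0])
values = torch.tensor([[1.0], [3.0], [2.0]])
ray_indices = torch.tensor([0, 0, 2])
print(accumulate_along_rays_ref(weights, values, ray_indices, n_rays=3))
```

Rays that receive no samples (ray 1 here, or every ray when the input is empty) simply stay at zero.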
I faced the same issue using torch==1.12.1+cuda11.3 on an RTX 4090, but switching to torch 1.13.1+cuda11.6 solved it. |
Does anyone have an idea how I can solve this? I am not a developer but a designer using SVD. |
Hello,
It may be an easy problem to solve, but I have not been able to do it.
When running
python examples/train_ngp_nerf_occ.py --scene lego --data_root path
I get the following error when I evaluate the model (L. 236):
This does not happen when using
train_ngp_nerf_prop.py
Thanks!