
An error occurred when I trained the model on my own dataset. #67

Open
Sagiri18 opened this issue Mar 17, 2021 · 3 comments

Comments

@Sagiri18

When I tried to reproduce the R-C3D model for an action recognition task on my own dataset, I ran into the same problem:

[session 1][epoch  1][iter    1/  30] loss: 2.5127, lr: 1.00e-04
		fg/bg=(5/123), gt_twins: 1, time cost: 53.711857
		rpn_cls: 0.7111, rpn_twin: 0.0049, rcnn_cls: 1.7332, rcnn_twin 0.0635
one step data time: 2.2378
/pytorch/aten/src/THC/THCTensorScatterGather.cu:100: void THCudaTensor_gatherKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 3]: block: [0,0,0], thread: [0,0,0] Assertion `indexValue >= 0 && indexValue < src.sizes[dim]` failed.
/pytorch/aten/src/THC/THCTensorScatterGather.cu:100: void THCudaTensor_gatherKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 3]: block: [0,0,0], thread: [1,0,0] Assertion `indexValue >= 0 && indexValue < src.sizes[dim]` failed.
/pytorch/aten/src/THC/THCTensorScatterGather.cu:100: void THCudaTensor_gatherKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 3]: block: [0,0,0], thread: [2,0,0] Assertion `indexValue >= 0 && indexValue < src.sizes[dim]` failed.
/pytorch/aten/src/THC/THCTensorScatterGather.cu:100: void THCudaTensor_gatherKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 3]: block: [0,0,0], thread: [3,0,0] Assertion `indexValue >= 0 && indexValue < src.sizes[dim]` failed.
/pytorch/aten/src/THC/THCTensorScatterGather.cu:100: void THCudaTensor_gatherKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 3]: block: [0,0,0], thread: [4,0,0] Assertion `indexValue >= 0 && indexValue < src.sizes[dim]` failed.
/pytorch/aten/src/THC/THCTensorScatterGather.cu:100: void THCudaTensor_gatherKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 3]: block: [0,0,0], thread: [5,0,0] Assertion `indexValue >= 0 && indexValue < src.sizes[dim]` failed.
/pytorch/aten/src/THC/THCTensorScatterGather.cu:100: void THCudaTensor_gatherKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 3]: block: [0,0,0], thread: [6,0,0] Assertion `indexValue >= 0 && indexValue < src.sizes[dim]` failed.
/pytorch/aten/src/THC/THCTensorScatterGather.cu:100: void THCudaTensor_gatherKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 3]: block: [0,0,0], thread: [7,0,0] Assertion `indexValue >= 0 && indexValue < src.sizes[dim]` failed.
/pytorch/aten/src/THC/THCTensorScatterGather.cu:100: void THCudaTensor_gatherKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 3]: block: [0,0,0], thread: [8,0,0] Assertion `indexValue >= 0 && indexValue < src.sizes[dim]` failed.
/pytorch/aten/src/THC/THCTensorScatterGather.cu:100: void THCudaTensor_gatherKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 3]: block: [0,0,0], thread: [9,0,0] Assertion `indexValue >= 0 && indexValue < src.sizes[dim]` failed.
/pytorch/aten/src/THC/THCTensorScatterGather.cu:100: void THCudaTensor_gatherKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 3]: block: [0,0,0], thread: [10,0,0] Assertion `indexValue >= 0 && indexValue < src.sizes[dim]` failed.
/pytorch/aten/src/THC/THCTensorScatterGather.cu:100: void THCudaTensor_gatherKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 3]: block: [0,0,0], thread: [11,0,0] Assertion `indexValue >= 0 && indexValue < src.sizes[dim]` failed.
/pytorch/aten/src/THC/THCTensorScatterGather.cu:100: void THCudaTensor_gatherKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 3]: block: [0,0,0], thread: [12,0,0] Assertion `indexValue >= 0 && indexValue < src.sizes[dim]` failed.
/pytorch/aten/src/THC/THCTensorScatterGather.cu:100: void THCudaTensor_gatherKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 3]: block: [0,0,0], thread: [13,0,0] Assertion `indexValue >= 0 && indexValue < src.sizes[dim]` failed.
/pytorch/aten/src/THC/THCTensorScatterGather.cu:100: void THCudaTensor_gatherKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 3]: block: [0,0,0], thread: [14,0,0] Assertion `indexValue >= 0 && indexValue < src.sizes[dim]` failed.
/pytorch/aten/src/THC/THCTensorScatterGather.cu:100: void THCudaTensor_gatherKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 3]: block: [0,0,0], thread: [15,0,0] Assertion `indexValue >= 0 && indexValue < src.sizes[dim]` failed.
Traceback (most recent call last):
  File "/content/R-C3D.pytorch-pytorch-1.1/trainval_net.py", line 360, in <module>
    train_net(tdcnn_demo, dataloader, optimizer, args)
  File "/content/R-C3D.pytorch-pytorch-1.1/trainval_net.py", line 161, in train_net
    loss_temp += reduced_loss.item()
RuntimeError: CUDA error: device-side assert triggered
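
One general way to narrow this down, offered as a sketch rather than anything specific to this repository: CUDA kernels run asynchronously, so the Python line in the traceback (`reduced_loss.item()`) is often only where the error surfaced, not where the bad index was used. Setting the `CUDA_LAUNCH_BLOCKING` environment variable to `1` before CUDA is initialized makes kernel launches synchronous, so the traceback tends to point closer to the failing operation. The helper name below is hypothetical.

```python
import os


def enable_blocking_cuda_launches(env=os.environ):
    """Request synchronous CUDA kernel launches (hypothetical helper)."""
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env["CUDA_LAUNCH_BLOCKING"]


# Call this at the very top of the training script, before importing torch,
# so the setting is in place when the CUDA context is created.
enable_blocking_cuda_launches()
```

Training will be slower with this setting, so it is best used only while debugging.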

I'm a beginner. Can anyone give me some advice? Thanks a lot!
My environment is:

  1. Python 3.7
  2. CUDA 10.0
  3. PyTorch 1.1.0
  4. Torchvision 0.3.0

By the way, I am using Google Colab for this. Its default CUDA version is 11.2, but I need 10.0. I noticed that CUDA 10.0 is available under “usr/local”, so I pointed the symlink "usr/local/CUDA" to "usr/local/CUDA10.0", but when I check the GPU information it still shows CUDA 11.2, which confuses me.
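
A note that may explain the confusion, offered as a sketch: the CUDA version shown in the `nvidia-smi` header is the highest version the installed driver supports, not the toolkit selected by the symlink. Pre-built PyTorch wheels also typically bundle their own CUDA runtime, so it can be more informative to ask PyTorch which version it was built for. The function name `describe_cuda_build` is made up for illustration; `torch.__version__`, `torch.version.cuda` and `torch.cuda.is_available()` are standard PyTorch attributes.

```python
def describe_cuda_build():
    """Report which CUDA version the installed PyTorch build targets."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment.
        return {"torch": None, "built_for_cuda": None, "gpu_visible": False}
    return {
        "torch": torch.__version__,
        "built_for_cuda": torch.version.cuda,  # None for CPU-only builds
        "gpu_visible": torch.cuda.is_available(),
    }


print(describe_cuda_build())
```

If `built_for_cuda` reports the version you expect and `gpu_visible` is true, the 11.2 shown in the GPU information is probably not the cause of the assertion.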

Sagiri18 changed the title from "An error occurred when I train the model on my own dataset." to "An error occurred when I trained the model on my own dataset." on Mar 17, 2021


Mary-xl commented Aug 20, 2021

Is there any solution to this? I got the same issue here.


Mary-xl commented Aug 20, 2021

Hey buddy, check detclasslist.txt to see if it starts from 0. I solved the issue by correcting this part.
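
To check this quickly, here is a minimal sketch. It assumes each non-empty line of `detclasslist.txt` begins with an integer class ID followed by the class name; that layout is an assumption, so adjust the parsing if your file differs. The function names are illustrative and not part of the repository.

```python
def read_class_ids(lines):
    """Return the leading integer ID from every non-empty line."""
    ids = []
    for line in lines:
        parts = line.split()
        if parts:
            ids.append(int(parts[0]))
    return ids


def summarize_ids(ids):
    """Report the first and last ID and whether the IDs are consecutive."""
    if not ids:
        raise ValueError("no class IDs found")
    ordered = sorted(ids)
    expected = list(range(ordered[0], ordered[0] + len(ordered)))
    return {"first": ordered[0], "last": ordered[-1], "consecutive": ordered == expected}


# Example with made-up class names. For a real file, something like:
#   with open("detclasslist.txt") as f:
#       ids = read_class_ids(f)
sample = ["1 Jump", "2 Run", "", "3 Throw"]
print(summarize_ids(read_class_ids(sample)))
```

The `first` value shows at a glance whether the list starts at 0 or 1, and `consecutive` flags gaps in the numbering.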


BigBuffa1o commented Jun 21, 2023

> Hey buddy, check detclasslist.txt to see if it starts from 0. I solved the issue by correcting this part.

I hit the same problem with the list starting at 1. Should it start at 0 or 1? If it starts at 0, I assume background is class 0? If it starts at 1, does that mean we should add 1 when setting num_classes (for example, with 8 classes we should write 9 in the config instead)?
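
The arithmetic behind this question can be sketched without the model. The assertion in the log, `indexValue >= 0 && indexValue < src.sizes[dim]`, means every index must lie in `[0, size)`. If the labels are used as indices into a class dimension of size `num_classes` (an assumption about where the failing gather sits), then labels 1..8 need `num_classes` to be at least 9. Whether this repository reserves 0 for background is not confirmed here, although the `fg/bg` counts in the training log suggest a background class exists. `find_out_of_range` is a hypothetical helper.

```python
def find_out_of_range(labels, num_classes):
    """Return the labels that violate 0 <= label < num_classes."""
    return [label for label in labels if not 0 <= label < num_classes]


labels = list(range(1, 9))           # 8 action classes numbered 1..8
print(find_out_of_range(labels, 8))  # label 8 does not fit in 8 slots
print(find_out_of_range(labels, 9))  # nothing out of range; slot 0 stays free
```

Running the same check on the labels your data loader actually produces should show whether the config or the class list needs to change.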
