train error #86

Closed
njustymk opened this issue Apr 3, 2019 · 3 comments
Comments

@njustymk

njustymk commented Apr 3, 2019

0%| | 0/500000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "train.py", line 218, in <module>
    train(training_dbs, validation_db, args.start_iter)
  File "train.py", line 160, in train
    training_loss = nnet.train(**training)
  File "/data2/yanmengkai/CornerNet/nnet/py_factory.py", line 82, in train
    loss = self.network(xs, ys)
  File "/root/anaconda3/envs/CornerNet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/data2/yanmengkai/CornerNet/models/py_utils/data_parallel.py", line 66, in forward
    inputs, kwargs = self.scatter(inputs, kwargs, self.device_ids, self.chunk_sizes)
  File "/data2/yanmengkai/CornerNet/models/py_utils/data_parallel.py", line 77, in scatter
    return scatter_kwargs(inputs, kwargs, device_ids, dim=self.dim, chunk_sizes=self.chunk_sizes)
  File "/data2/yanmengkai/CornerNet/models/py_utils/scatter_gather.py", line 30, in scatter_kwargs
    inputs = scatter(inputs, target_gpus, dim, chunk_sizes) if inputs else []
  File "/data2/yanmengkai/CornerNet/models/py_utils/scatter_gather.py", line 25, in scatter
    return scatter_map(inputs)
  File "/data2/yanmengkai/CornerNet/models/py_utils/scatter_gather.py", line 18, in scatter_map
    return list(zip(*map(scatter_map, obj)))
  File "/data2/yanmengkai/CornerNet/models/py_utils/scatter_gather.py", line 20, in scatter_map
    return list(map(list, zip(*map(scatter_map, obj))))
  File "/data2/yanmengkai/CornerNet/models/py_utils/scatter_gather.py", line 15, in scatter_map
    return Scatter.apply(target_gpus, chunk_sizes, dim, obj)
  File "/root/anaconda3/envs/CornerNet/lib/python3.6/site-packages/torch/nn/parallel/_functions.py", line 87, in forward
    outputs = comm.scatter(input, ctx.target_gpus, ctx.chunk_sizes, ctx.dim, streams)
  File "/root/anaconda3/envs/CornerNet/lib/python3.6/site-packages/torch/cuda/comm.py", line 142, in scatter
    return tuple(torch._C._scatter(tensor, devices, chunk_sizes, dim, streams))
RuntimeError: CUDA error (10): invalid device ordinal (check_status at /opt/conda/conda-bld/pytorch_1532581333611/work/aten/src/ATen/cuda/detail/CUDAHooks.cpp:36)
frame #0: torch::cuda::scatter(at::Tensor const&, at::ArrayRef<long>, at::optional<std::vector<long, std::allocator<long> > > const&, long, at::optional<std::vector<CUDAStreamInternals*, std::allocator<CUDAStreamInternals*> > > const&) + 0x4e1 (0x7fe834104a11 in /root/anaconda3/envs/CornerNet/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #1: + 0xc42bab (0x7fe83410cbab in /root/anaconda3/envs/CornerNet/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #2: + 0x38a52b (0x7fe83385452b in /root/anaconda3/envs/CornerNet/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #3: _PyCFunction_FastCallDict + 0x154 (0x559d63309b94 in python3)
frame #4: + 0x19e67c (0x559d6339967c in python3)
frame #5: _PyEval_EvalFrameDefault + 0x2fa (0x559d633bbcba in python3)
frame #6: + 0x197a94 (0x559d63392a94 in python3)
frame #7: + 0x198941 (0x559d63393941 in python3)
frame #8: + 0x19e755 (0x559d63399755 in python3)
frame #9: _PyEval_EvalFrameDefault + 0x2fa (0x559d633bbcba in python3)
frame #10: PyEval_EvalCodeEx + 0x329 (0x559d63394459 in python3)
frame #11: + 0x19a264 (0x559d63395264 in python3)
frame #12: PyObject_Call + 0x3e (0x559d6330999e in python3)
frame #13: THPFunction_apply(_object*, _object*) + 0x38f (0x7fe833c32bcf in /root/anaconda3/envs/CornerNet/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #14: _PyCFunction_FastCallDict + 0x91 (0x559d63309ad1 in python3)
frame #15: + 0x19e67c (0x559d6339967c in python3)
frame #16: _PyEval_EvalFrameDefault + 0x2fa (0x559d633bbcba in python3)
frame #17: + 0x197dae (0x559d63392dae in python3)
frame #18: _PyFunction_FastCallDict + 0x1bb (0x559d63393e1b in python3)
frame #19: _PyObject_FastCallDict + 0x26f (0x559d63309f5f in python3)
frame #20: + 0x12a552 (0x559d63325552 in python3)
frame #21: PyIter_Next + 0xe (0x559d6334ec9e in python3)
frame #22: PySequence_Tuple + 0xf9 (0x559d63353ad9 in python3)
frame #23: _PyEval_EvalFrameDefault + 0x563a (0x559d633c0ffa in python3)
frame #24: + 0x197dae (0x559d63392dae in python3)
frame #25: _PyFunction_FastCallDict + 0x1bb (0x559d63393e1b in python3)
frame #26: _PyObject_FastCallDict + 0x26f (0x559d63309f5f in python3)
frame #27: + 0x12a552 (0x559d63325552 in python3)
frame #28: PyIter_Next + 0xe (0x559d6334ec9e in python3)
frame #29: PySequence_Tuple + 0xf9 (0x559d63353ad9 in python3)
frame #30: _PyEval_EvalFrameDefault + 0x563a (0x559d633c0ffa in python3)
frame #31: + 0x197dae (0x559d63392dae in python3)
frame #32: + 0x198941 (0x559d63393941 in python3)
frame #33: + 0x19e755 (0x559d63399755 in python3)
frame #34: _PyEval_EvalFrameDefault + 0x2fa (0x559d633bbcba in python3)
frame #35: + 0x197dae (0x559d63392dae in python3)
frame #36: + 0x198941 (0x559d63393941 in python3)
frame #37: + 0x19e755 (0x559d63399755 in python3)
frame #38: _PyEval_EvalFrameDefault + 0x2fa (0x559d633bbcba in python3)
frame #39: + 0x197a94 (0x559d63392a94 in python3)
frame #40: + 0x198941 (0x559d63393941 in python3)
frame #41: + 0x19e755 (0x559d63399755 in python3)
frame #42: _PyEval_EvalFrameDefault + 0x10ba (0x559d633bca7a in python3)
frame #43: + 0x19870b (0x559d6339370b in python3)
frame #44: + 0x19e755 (0x559d63399755 in python3)
frame #45: _PyEval_EvalFrameDefault + 0x2fa (0x559d633bbcba in python3)
frame #46: + 0x197a94 (0x559d63392a94 in python3)
frame #47: _PyFunction_FastCallDict + 0x3db (0x559d6339403b in python3)
frame #48: _PyObject_FastCallDict + 0x26f (0x559d63309f5f in python3)
frame #49: _PyObject_Call_Prepend + 0x63 (0x559d6330ea03 in python3)
frame #50: PyObject_Call + 0x3e (0x559d6330999e in python3)
frame #51: _PyEval_EvalFrameDefault + 0x1ab0 (0x559d633bd470 in python3)
frame #52: + 0x197a94 (0x559d63392a94 in python3)
frame #53: _PyFunction_FastCallDict + 0x1bb (0x559d63393e1b in python3)
frame #54: _PyObject_FastCallDict + 0x26f (0x559d63309f5f in python3)
frame #55: _PyObject_Call_Prepend + 0x63 (0x559d6330ea03 in python3)
frame #56: PyObject_Call + 0x3e (0x559d6330999e in python3)
frame #57: + 0x16b9b7 (0x559d633669b7 in python3)
frame #58: _PyObject_FastCallDict + 0x8b (0x559d63309d7b in python3)
frame #59: + 0x19e7ce (0x559d633997ce in python3)
frame #60: _PyEval_EvalFrameDefault + 0x2fa (0x559d633bbcba in python3)
frame #61: + 0x197a94 (0x559d63392a94 in python3)
frame #62: _PyFunction_FastCallDict + 0x3db (0x559d6339403b in python3)
frame #63: _PyObject_FastCallDict + 0x26f (0x559d63309f5f in python3)

njustymk closed this as completed May 4, 2019

@hejinsome

I have the same problem. Did you solve it?

@njustymk
Author

> I have the same problem. Did you solve it?

Solved. It was probably a problem with the batch settings. This issue is from a long time ago.
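
One plausible reading of "batch settings", given that the traceback fails while scattering with `chunk_sizes`: the configured list of per-GPU chunks has more entries than there are visible GPUs, so the scatter targets a device that does not exist. The sketch below is a hypothetical sanity check, not code from the repository; the function name `check_chunk_sizes` and the example numbers are assumptions made for illustration.

```python
def check_chunk_sizes(batch_size, chunk_sizes, num_gpus):
    """Return a list of problems found; an empty list means the settings agree.

    Hypothetical helper: batch_size and chunk_sizes are assumed to come from
    the training config, num_gpus from torch.cuda.device_count().
    """
    problems = []
    # One chunk is sent to each GPU, so there cannot be more chunks than GPUs.
    if len(chunk_sizes) > num_gpus:
        problems.append(
            "{} chunks configured but only {} GPU(s) visible".format(
                len(chunk_sizes), num_gpus
            )
        )
    # The chunks together must cover the whole batch exactly.
    if sum(chunk_sizes) != batch_size:
        problems.append(
            "chunk sizes sum to {} but batch_size is {}".format(
                sum(chunk_sizes), batch_size
            )
        )
    return problems


if __name__ == "__main__":
    # Illustrative values only: ten chunks on a machine with two GPUs.
    for problem in check_chunk_sizes(49, [4] + [5] * 9, num_gpus=2):
        print("config problem:", problem)
```

If the check reports a mismatch, the usual remedy would be to shorten `chunk_sizes` to one entry per available GPU and set the batch size to their sum, though whether that matches the fix used in this thread is not stated.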

@kszqq

kszqq commented Aug 19, 2019

How did you solve it? Can you tell me?
