Hello, I have been trying to train a model on the Jumping sequence using your code, but I get this CUDA kernel error:
python train.py --dataset_name monocular --root_dir $ROOT_DIR --img_wh 512 288 --start_end 0 30 --N_samples 128 --N_importance 0 --encode_t --use_viewdir --num_epochs 50 --batch_size 512 --optimizer adam --lr 5e-4 --lr_scheduler cosine --exp_name exp
Global seed set to 42
/home/phong/miniconda3/envs/deep/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/ddp.py:20: LightningDeprecationWarning: The `pl.plugins.training_type.ddp.DDPPlugin` is deprecated in v1.6 and will be removed in v1.8. Use `pl.strategies.ddp.DDPStrategy` instead.
rank_zero_deprecation(
/home/phong/miniconda3/envs/deep/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py:312: LightningDeprecationWarning: Passing <pytorch_lightning.plugins.training_type.ddp.DDPPlugin object at 0x7f74534b47c0> `strategy` to the `plugins` flag in Trainer has been deprecated in v1.5 and will be removed in v1.7. Use `Trainer(strategy=<pytorch_lightning.plugins.training_type.ddp.DDPPlugin object at 0x7f74534b47c0>)` instead.
rank_zero_deprecation(
/home/phong/miniconda3/envs/deep/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/callback_connector.py:96: LightningDeprecationWarning: Setting `Trainer(progress_bar_refresh_rate=1)` is deprecated in v1.5 and will be removed in v1.7. Please pass `pytorch_lightning.callbacks.progress.TQDMProgressBar` with `refresh_rate` directly to the Trainer's `callbacks` argument instead. Or, to disable the progress bar pass `enable_progress_bar = False` to the Trainer.
rank_zero_deprecation(
/home/phong/miniconda3/envs/deep/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/callback_connector.py:171: LightningDeprecationWarning: Setting `Trainer(weights_summary=None)` is deprecated in v1.5 and will be removed in v1.7. Please set `Trainer(enable_model_summary=False)` instead.
rank_zero_deprecation(
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
/home/phong/miniconda3/envs/deep/lib/python3.8/site-packages/pytorch_lightning/trainer/configuration_validator.py:154: LightningDeprecationWarning: The `LightningModule.get_progress_bar_dict` method was deprecated in v1.5 and will be removed in v1.7. Please use the `ProgressBarBase.get_metrics` instead.
rank_zero_deprecation(
Global seed set to 42
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/1
----------------------------------------------------------------------------------------------------
distributed_backend=nccl
All distributed processes registered. Starting with 1 processes
----------------------------------------------------------------------------------------------------
/home/phong/miniconda3/envs/deep/lib/python3.8/site-packages/torch/functional.py:568: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/conda/conda-bld/pytorch_1646755903507/work/aten/src/ATen/native/TensorShape.cpp:2228.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
Epoch 0: 1%| | 24/3539 [00:03<09:04, 6.46it/s, loss=0.208, train/col_l=0.0248, train/disp_l=0.0415, train/entropy_l=0.00239, train/cross_entropy_l=0.000, train/flow_fw_l=0.0434, train/flow_bw_l=0.0416, train/pho_l=0.0495, train/
/opt/conda/conda-bld/pytorch_1646755903507/work/aten/src/ATen/native/cuda/IndexKernel.cu:91: operator(): block: [11,0,0], thread: [0,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
... (the same assertion repeats for threads [1,0,0] through [9,0,0] of block [11,0,0])
Traceback (most recent call last):
File "/home/phong/miniconda3/envs/deep/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 719, in _call_and_handle_interrupt
return self.strategy.launcher.launch(trainer_fn, *args, trainer=self, **kwargs)
File "/home/phong/miniconda3/envs/deep/lib/python3.8/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 93, in launch
return function(*args, **kwargs)
File "/home/phong/miniconda3/envs/deep/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 809, in _fit_impl
results = self._run(model, ckpt_path=self.ckpt_path)
File "/home/phong/miniconda3/envs/deep/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1234, in _run
results = self._run_stage()
File "/home/phong/miniconda3/envs/deep/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1321, in _run_stage
return self._run_train()
File "/home/phong/miniconda3/envs/deep/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1351, in _run_train
self.fit_loop.run()
File "/home/phong/miniconda3/envs/deep/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 204, in run
self.advance(*args, **kwargs)
File "/home/phong/miniconda3/envs/deep/lib/python3.8/site-packages/pytorch_lightning/loops/fit_loop.py", line 268, in advance
self._outputs = self.epoch_loop.run(self._data_fetcher)
File "/home/phong/miniconda3/envs/deep/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 204, in run
self.advance(*args, **kwargs)
File "/home/phong/miniconda3/envs/deep/lib/python3.8/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 208, in advance
batch_output = self.batch_loop.run(batch, batch_idx)
File "/home/phong/miniconda3/envs/deep/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 204, in run
self.advance(*args, **kwargs)
File "/home/phong/miniconda3/envs/deep/lib/python3.8/site-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 88, in advance
outputs = self.optimizer_loop.run(split_batch, optimizers, batch_idx)
File "/home/phong/miniconda3/envs/deep/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 204, in run
self.advance(*args, **kwargs)
File "/home/phong/miniconda3/envs/deep/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 203, in advance
result = self._run_optimization(
File "/home/phong/miniconda3/envs/deep/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 256, in _run_optimization
self._optimizer_step(optimizer, opt_idx, batch_idx, closure)
File "/home/phong/miniconda3/envs/deep/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 369, in _optimizer_step
self.trainer._call_lightning_module_hook(
File "/home/phong/miniconda3/envs/deep/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1593, in _call_lightning_module_hook
output = fn(*args, **kwargs)
File "/home/phong/miniconda3/envs/deep/lib/python3.8/site-packages/pytorch_lightning/core/lightning.py", line 1644, in optimizer_step
optimizer.step(closure=optimizer_closure)
File "/home/phong/miniconda3/envs/deep/lib/python3.8/site-packages/pytorch_lightning/core/optimizer.py", line 168, in step
step_output = self._strategy.optimizer_step(self._optimizer, self._optimizer_idx, closure, **kwargs)
File "/home/phong/miniconda3/envs/deep/lib/python3.8/site-packages/pytorch_lightning/strategies/ddp.py", line 278, in optimizer_step
optimizer_output = super().optimizer_step(optimizer, opt_idx, closure, model, **kwargs)
File "/home/phong/miniconda3/envs/deep/lib/python3.8/site-packages/pytorch_lightning/strategies/strategy.py", line 193, in optimizer_step
return self.precision_plugin.optimizer_step(model, optimizer, opt_idx, closure, **kwargs)
File "/home/phong/miniconda3/envs/deep/lib/python3.8/site-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 155, in optimizer_step
return optimizer.step(closure=closure, **kwargs)
File "/home/phong/miniconda3/envs/deep/lib/python3.8/site-packages/torch/optim/lr_scheduler.py", line 65, in wrapper
return wrapped(*args, **kwargs)
File "/home/phong/miniconda3/envs/deep/lib/python3.8/site-packages/torch/optim/optimizer.py", line 88, in wrapper
return func(*args, **kwargs)
File "/home/phong/miniconda3/envs/deep/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/home/phong/miniconda3/envs/deep/lib/python3.8/site-packages/torch/optim/adam.py", line 100, in step
loss = closure()
File "/home/phong/miniconda3/envs/deep/lib/python3.8/site-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 140, in _wrap_closure
closure_result = closure()
File "/home/phong/miniconda3/envs/deep/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 148, in __call__
self._result = self.closure(*args, **kwargs)
File "/home/phong/miniconda3/envs/deep/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 134, in closure
step_output = self._step_fn()
File "/home/phong/miniconda3/envs/deep/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 427, in _training_step
training_step_output = self.trainer._call_strategy_hook("training_step", *step_kwargs.values())
File "/home/phong/miniconda3/envs/deep/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1763, in _call_strategy_hook
output = fn(*args, **kwargs)
File "/home/phong/miniconda3/envs/deep/lib/python3.8/site-packages/pytorch_lightning/strategies/ddp.py", line 341, in training_step
return self.model(*args, **kwargs)
File "/home/phong/miniconda3/envs/deep/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/home/phong/miniconda3/envs/deep/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 963, in forward
output = self.module(*inputs[0], **kwargs[0])
File "/home/phong/miniconda3/envs/deep/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/home/phong/miniconda3/envs/deep/lib/python3.8/site-packages/pytorch_lightning/overrides/base.py", line 82, in forward
output = self.module.training_step(*inputs, **kwargs)
File "train.py", line 187, in training_step
loss_d = self.loss(results, batch, **kwargs)
File "/home/phong/miniconda3/envs/deep/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/home/phong/data/Nvidia inter/Code/nsff_pl/losses.py", line 117, in forward
if valid_geo_fw.any():
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "train.py", line 316, in <module>
main(hparams)
File "train.py", line 300, in main
trainer.fit(system)
File "/home/phong/miniconda3/envs/deep/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 768, in fit
self._call_and_handle_interrupt(
File "/home/phong/miniconda3/envs/deep/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 736, in _call_and_handle_interrupt
self._teardown()
File "/home/phong/miniconda3/envs/deep/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1298, in _teardown
self.strategy.teardown()
File "/home/phong/miniconda3/envs/deep/lib/python3.8/site-packages/pytorch_lightning/strategies/ddp.py", line 447, in teardown
super().teardown()
File "/home/phong/miniconda3/envs/deep/lib/python3.8/site-packages/pytorch_lightning/strategies/parallel.py", line 134, in teardown
super().teardown()
File "/home/phong/miniconda3/envs/deep/lib/python3.8/site-packages/pytorch_lightning/strategies/strategy.py", line 444, in teardown
optimizers_to_device(self.optimizers, torch.device("cpu"))
File "/home/phong/miniconda3/envs/deep/lib/python3.8/site-packages/pytorch_lightning/utilities/optimizer.py", line 27, in optimizers_to_device
optimizer_to_device(opt, device)
File "/home/phong/miniconda3/envs/deep/lib/python3.8/site-packages/pytorch_lightning/utilities/optimizer.py", line 33, in optimizer_to_device
optimizer.state[p] = apply_to_collection(v, torch.Tensor, move_data_to_device, device)
File "/home/phong/miniconda3/envs/deep/lib/python3.8/site-packages/pytorch_lightning/utilities/apply_func.py", line 107, in apply_to_collection
v = apply_to_collection(
File "/home/phong/miniconda3/envs/deep/lib/python3.8/site-packages/pytorch_lightning/utilities/apply_func.py", line 99, in apply_to_collection
return function(data, *args, **kwargs)
File "/home/phong/miniconda3/envs/deep/lib/python3.8/site-packages/pytorch_lightning/utilities/apply_func.py", line 354, in move_data_to_device
return apply_to_collection(batch, dtype=dtype, function=batch_to)
File "/home/phong/miniconda3/envs/deep/lib/python3.8/site-packages/pytorch_lightning/utilities/apply_func.py", line 99, in apply_to_collection
return function(data, *args, **kwargs)
File "/home/phong/miniconda3/envs/deep/lib/python3.8/site-packages/pytorch_lightning/utilities/apply_func.py", line 347, in batch_to
data_output = data.to(device, **kwargs)
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
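To narrow this down I am re-running with CUDA_LAUNCH_BLOCKING=1, as the error message suggests, so the stack trace points at the op that actually fired the assert, and I am adding a host-side bounds check before the indexing that losses.py does around line 117. A minimal sketch of the check (assert_in_bounds, pix_idx, and feats are my placeholder names, not the actual variables in losses.py):

```python
import torch

def assert_in_bounds(idx: torch.Tensor, dim_size: int, name: str = "index"):
    """Fail on the host with a readable message instead of hitting the
    device-side assert in IndexKernel.cu. Advanced indexing accepts
    values in [-dim_size, dim_size)."""
    if idx.numel() == 0:
        return
    lo, hi = int(idx.min()), int(idx.max())
    if lo < -dim_size or hi >= dim_size:
        raise IndexError(f"{name}: min={lo}, max={hi}, "
                         f"outside valid range [-{dim_size}, {dim_size})")

# Placeholder usage -- `pix_idx` and `feats` stand in for whatever tensor
# is actually indexed in the loss:
# assert_in_bounds(pix_idx, feats.shape[0], name="pix_idx")
```

Once a CUDA assert fires, every later CUDA call on the process fails the same way (which is why the teardown traceback above also reports it), so the check has to run before the first offending indexing op.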