Training problem #22

AHappyFlyBird · 2021-02-02T14:56:54Z

Excuse me! When I run bash train.sh, the following error occurs. How can I solve it?

Using tensorboardX
/usr/local/lib/python3.6/dist-packages/sklearn/utils/linear_assignment_.py:21: DeprecationWarning: The linear_assignment_ module is deprecated in 0.21 and will be removed from 0.23. Use scipy.optimize.linear_sum_assignment instead.
DeprecationWarning)
Fix size testing.
training chunk_sizes: [16, 16]
input h w: 448 800
heads {'hm': 10, 'reg': 2, 'wh': 2, 'dep': 1, 'rot': 8, 'dim': 3, 'amodel_offset': 2, 'dep_sec': 1, 'rot_sec': 8, 'nuscenes_att': 8, 'velocity': 3}
weights {'hm': 1, 'reg': 1, 'wh': 0.1, 'dep': 1, 'rot': 1, 'dim': 1, 'amodel_offset': 1, 'dep_sec': 1, 'rot_sec': 1, 'nuscenes_att': 1, 'velocity': 1}
head conv {'hm': [256], 'reg': [256], 'wh': [256], 'dep': [256], 'rot': [256], 'dim': [256], 'amodel_offset': [256], 'dep_sec': [256, 256, 256], 'rot_sec': [256, 256, 256], 'nuscenes_att': [256, 256, 256], 'velocity': [256, 256, 256]}
Namespace(K=100, amodel_offset_weight=1, arch='dla_34', aug_rot=0, backbone='dla34', batch_size=32, chunk_sizes=[16, 16], custom_dataset_ann_path='', custom_dataset_img_path='', custom_head_convs={'dep_sec': 3, 'rot_sec': 3, 'velocity': 3, 'nuscenes_att': 3}, data_dir='/content/drive/My Drive/CenterFusion/src/lib/../../data', dataset='nuscenes', dataset_version='', debug=0, debug_dir='/content/drive/My Drive/CenterFusion/src/lib/../../exp/ddd/centerfusion/debug', debugger_theme='white', demo='', dense_reg=1, dep_res_weight=1, dep_weight=1, depth_scale=1, dim_weight=1, disable_frustum=False, dla_node='dcn', down_ratio=4, eval=False, eval_n_plots=0, eval_render_curves=False, exp_dir='/content/drive/My Drive/CenterFusion/src/lib/../../exp/ddd', exp_id='centerfusion', fix_res=True, fix_short=-1, flip=0.5, flip_test=False, fp_disturb=0, freeze_backbone=False, frustumExpansionRatio=0.0, gpus=[0, 1], gpus_str='0,1', head_conv={'hm': [256], 'reg': [256], 'wh': [256], 'dep': [256], 'rot': [256], 'dim': [256], 'amodel_offset': [256], 'dep_sec': [256, 256, 256], 'rot_sec': [256, 256, 256], 'nuscenes_att': [256, 256, 256], 'velocity': [256, 256, 256]}, head_kernel=3, heads={'hm': 10, 'reg': 2, 'wh': 2, 'dep': 1, 'rot': 8, 'dim': 3, 'amodel_offset': 2, 'dep_sec': 1, 'rot_sec': 8, 'nuscenes_att': 8, 'velocity': 3}, hm_dist_thresh={0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 1, 6: 1, 7: 1, 8: 0, 9: 0}, hm_disturb=0, hm_hp_weight=1, hm_to_box_ratio=0.3, hm_transparency=0.7, hm_weight=1, hp_weight=1, hungarian=False, ignore_loaded_cats=[], img_format='jpg', input_h=448, input_res=800, input_w=800, iou_thresh=0, keep_res=False, kitti_split='3dop', layers_to_freeze=['base', 'dla_up', 'ida_up'], load_model='../models/centerfusion_e60.pth', load_results='', lost_disturb=0, lr=0.00025, lr_step=[50], ltrb=False, ltrb_amodal=False, ltrb_amodal_weight=0.1, ltrb_weight=0.1, master_batch_size=16, max_age=-1, max_frame_dist=3, max_pc=1000, max_pc_dist=60.0, model_output_list=False, msra_outchannel=256, 
neck='dlaup', new_thresh=0.3, nms=False, no_color_aug=False, no_pause=False, no_pre_img=False, non_block_test=False, normalize_depth=True, not_cuda_benchmark=False, not_max_crop=False, not_prefetch_test=False, not_rand_crop=True, not_set_cuda_env=False, not_show_bbox=False, not_show_number=False, num_classes=10, num_epochs=60, num_head_conv=1, num_img_channels=3, num_iters=-1, num_resnet_layers=101, num_stacks=1, num_workers=4, nuscenes_att=True, nuscenes_att_weight=1, off_weight=1, optim='adam', out_thresh=-1, output_h=112, output_res=200, output_w=200, pad=31, pc_atts=['x', 'y', 'z', 'dyn_prop', 'id', 'rcs', 'vx', 'vy', 'vx_comp', 'vy_comp', 'is_quality_valid', 'ambig_state', 'x_rms', 'y_rms', 'invalid_state', 'pdh0', 'vx_rms', 'vy_rms'], pc_feat_channels={'pc_dep': 0, 'pc_vx': 1, 'pc_vz': 2}, pc_feat_lvl=['pc_dep', 'pc_vx', 'pc_vz'], pc_roi_method='pillars', pc_z_offset=0.0, pillar_dims=[1.5, 0.2, 0.2], pointcloud=True, pre_hm=False, pre_img=False, pre_thresh=-1, print_iter=0, prior_bias=-4.6, public_det=False, qualitative=False, r_a=250, r_b=5, radar_sweeps=3, reg_loss='l1', reset_hm=False, resize_video=False, resume=False, reuse_hm=False, root_dir='/content/drive/My Drive/CenterFusion/src/lib/../..', rot_weight=1, rotate=0, run_dataset_eval=True, same_aug_pre=False, save_all=False, save_dir='/content/drive/My Drive/CenterFusion/src/lib/../../exp/ddd/centerfusion', save_framerate=30, save_img_suffix='', save_imgs=[], save_point=[20, 40, 50], save_results=False, save_video=False, scale=0, secondary_heads=['velocity', 'nuscenes_att', 'dep_sec', 'rot_sec'], seed=317, shift=0.1, show_track_color=False, show_velocity=False, shuffle_train=True, sigmoid_dep_sec=True, skip_first=-1, sort_det_by_dist=False, tango_color=False, task='ddd', test_dataset='nuscenes', test_focal_length=-1, test_scales=[1.0], track_thresh=0.3, tracking=False, tracking_weight=1, train_split='mini_train', trainval=False, transpose_video=False, use_loaded_results=False, val_intervals=1, 
val_split='mini_val', velocity=True, velocity_weight=1, video_h=512, video_w=512, vis_gt_bev='', vis_thresh=0.3, warm_start_weights=False, weights={'hm': 1, 'reg': 1, 'wh': 0.1, 'dep': 1, 'rot': 1, 'dim': 1, 'amodel_offset': 1, 'dep_sec': 1, 'rot_sec': 1, 'nuscenes_att': 1, 'velocity': 1}, wh_weight=0.1, zero_pre_hm=False, zero_tracking=False)
cp: target 'Drive/CenterFusion/src/lib/../../exp/ddd/centerfusion/logs_2021-02-02-14-41/' is not a directory
Creating model...
Using node type: (<class 'model.networks.dla.DeformConv'>, <class 'model.networks.dla.DeformConv'>)
Warning: No ImageNet pretrain!!
loaded ../models/centerfusion_e60.pth, epoch 60
Traceback (most recent call last):
  File "main.py", line 140, in <module>
    main(opt)
  File "main.py", line 52, in main
    trainer.set_device(opt.gpus, opt.chunk_sizes, opt.device)
  File "/content/drive/My Drive/CenterFusion/src/lib/trainer.py", line 141, in set_device
    chunk_sizes=chunk_sizes).to(device)
  File "/content/drive/My Drive/CenterFusion/src/lib/model/data_parallel.py", line 127, in DataParallel
    return torch.nn.DataParallel(module, device_ids, output_device, dim)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/data_parallel.py", line 133, in __init__
    _check_balance(self.device_ids)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/data_parallel.py", line 19, in _check_balance
    dev_props = [torch.cuda.get_device_properties(i) for i in device_ids]
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/data_parallel.py", line 19, in <listcomp>
    dev_props = [torch.cuda.get_device_properties(i) for i in device_ids]
  File "/usr/local/lib/python3.6/dist-packages/torch/cuda/__init__.py", line 318, in get_device_properties
    raise AssertionError("Invalid device id")
AssertionError: Invalid device id

(My versions: Ubuntu 18.04, CUDA 10.1, PyTorch 1.2.0, torchvision 0.4.0, Python 3.6)

mrnabati · 2021-02-02T15:08:31Z

Hi. Looks like you are trying to access a GPU device that does not exist. If you only have one GPU, you need to change the following parameter in both train.sh and test.sh scripts:

export CUDA_VISIBLE_DEVICES=0,1

and also the --gpus parameter in those scripts.
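
The mismatch behind "Invalid device id" can be checked before launching training. The following is a minimal sketch, assuming the GPU list is passed as a comma-separated string such as "0,1" (as gpus_str appears in the log above) and that the number of visible devices is supplied by the caller, for example from torch.cuda.device_count(). The helper name invalid_gpu_ids is hypothetical and not part of CenterFusion.

```python
from typing import List


def invalid_gpu_ids(gpus_str: str, visible_count: int) -> List[int]:
    """Return the requested GPU ids that have no matching visible device.

    gpus_str      -- comma-separated ids, e.g. "0,1" (hypothetical helper input)
    visible_count -- number of visible devices; in a real script this would
                     typically come from torch.cuda.device_count()
    """
    requested = [int(tok) for tok in gpus_str.split(",") if tok.strip()]
    # Any id outside 0 .. visible_count - 1 is reported as having no device.
    return [gpu for gpu in requested if gpu not in range(visible_count)]


if __name__ == "__main__":
    # One visible GPU, but two ids requested: id 1 has no device behind it.
    print(invalid_gpu_ids("0,1", visible_count=1))
    # Requesting only id 0 on the same machine is fine.
    print(invalid_gpu_ids("0", visible_count=1))
```

An empty result means every requested id maps to a visible device; anything else points at the ids to remove from the scripts.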

AHappyFlyBird · 2021-02-03T06:48:06Z

I am so happy to see your reply. Thanks for your work. Your suggestion solved that problem, but a new issue appeared; the error is shown below. Looking forward to your reply.

Using tensorboardX
/usr/local/lib/python3.6/dist-packages/sklearn/utils/linear_assignment_.py:21: DeprecationWarning: The linear_assignment_ module is deprecated in 0.21 and will be removed from 0.23. Use scipy.optimize.linear_sum_assignment instead.
DeprecationWarning)
Fix size testing.
training chunk_sizes: [32]
input h w: 448 800
heads {'hm': 10, 'reg': 2, 'wh': 2, 'dep': 1, 'rot': 8, 'dim': 3, 'amodel_offset': 2, 'dep_sec': 1, 'rot_sec': 8, 'nuscenes_att': 8, 'velocity': 3}
weights {'hm': 1, 'reg': 1, 'wh': 0.1, 'dep': 1, 'rot': 1, 'dim': 1, 'amodel_offset': 1, 'dep_sec': 1, 'rot_sec': 1, 'nuscenes_att': 1, 'velocity': 1}
head conv {'hm': [256], 'reg': [256], 'wh': [256], 'dep': [256], 'rot': [256], 'dim': [256], 'amodel_offset': [256], 'dep_sec': [256, 256, 256], 'rot_sec': [256, 256, 256], 'nuscenes_att': [256, 256, 256], 'velocity': [256, 256, 256]}
Namespace(K=100, amodel_offset_weight=1, arch='dla_34', aug_rot=0, backbone='dla34', batch_size=32, chunk_sizes=[32], custom_dataset_ann_path='', custom_dataset_img_path='', custom_head_convs={'dep_sec': 3, 'rot_sec': 3, 'velocity': 3, 'nuscenes_att': 3}, data_dir='/content/drive/MyDrive/CenterFusion/src/lib/../../data', dataset='nuscenes', dataset_version='', debug=0, debug_dir='/content/drive/MyDrive/CenterFusion/src/lib/../../exp/ddd/centerfusion/debug', debugger_theme='white', demo='', dense_reg=1, dep_res_weight=1, dep_weight=1, depth_scale=1, dim_weight=1, disable_frustum=False, dla_node='dcn', down_ratio=4, eval=False, eval_n_plots=0, eval_render_curves=False, exp_dir='/content/drive/MyDrive/CenterFusion/src/lib/../../exp/ddd', exp_id='centerfusion', fix_res=True, fix_short=-1, flip=0.5, flip_test=False, fp_disturb=0, freeze_backbone=False, frustumExpansionRatio=0.0, gpus=[0], gpus_str='0', head_conv={'hm': [256], 'reg': [256], 'wh': [256], 'dep': [256], 'rot': [256], 'dim': [256], 'amodel_offset': [256], 'dep_sec': [256, 256, 256], 'rot_sec': [256, 256, 256], 'nuscenes_att': [256, 256, 256], 'velocity': [256, 256, 256]}, head_kernel=3, heads={'hm': 10, 'reg': 2, 'wh': 2, 'dep': 1, 'rot': 8, 'dim': 3, 'amodel_offset': 2, 'dep_sec': 1, 'rot_sec': 8, 'nuscenes_att': 8, 'velocity': 3}, hm_dist_thresh={0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 1, 6: 1, 7: 1, 8: 0, 9: 0}, hm_disturb=0, hm_hp_weight=1, hm_to_box_ratio=0.3, hm_transparency=0.7, hm_weight=1, hp_weight=1, hungarian=False, ignore_loaded_cats=[], img_format='jpg', input_h=448, input_res=800, input_w=800, iou_thresh=0, keep_res=False, kitti_split='3dop', layers_to_freeze=['base', 'dla_up', 'ida_up'], load_model='../models/centernet_baseline_e170.pth', load_results='', lost_disturb=0, lr=0.00025, lr_step=[50], ltrb=False, ltrb_amodal=False, ltrb_amodal_weight=0.1, ltrb_weight=0.1, master_batch_size=32, max_age=-1, max_frame_dist=3, max_pc=1000, max_pc_dist=60.0, model_output_list=False, msra_outchannel=256, 
neck='dlaup', new_thresh=0.3, nms=False, no_color_aug=False, no_pause=False, no_pre_img=False, non_block_test=False, normalize_depth=True, not_cuda_benchmark=False, not_max_crop=False, not_prefetch_test=False, not_rand_crop=True, not_set_cuda_env=False, not_show_bbox=False, not_show_number=False, num_classes=10, num_epochs=60, num_head_conv=1, num_img_channels=3, num_iters=-1, num_resnet_layers=101, num_stacks=1, num_workers=4, nuscenes_att=True, nuscenes_att_weight=1, off_weight=1, optim='adam', out_thresh=-1, output_h=112, output_res=200, output_w=200, pad=31, pc_atts=['x', 'y', 'z', 'dyn_prop', 'id', 'rcs', 'vx', 'vy', 'vx_comp', 'vy_comp', 'is_quality_valid', 'ambig_state', 'x_rms', 'y_rms', 'invalid_state', 'pdh0', 'vx_rms', 'vy_rms'], pc_feat_channels={'pc_dep': 0, 'pc_vx': 1, 'pc_vz': 2}, pc_feat_lvl=['pc_dep', 'pc_vx', 'pc_vz'], pc_roi_method='pillars', pc_z_offset=0.0, pillar_dims=[1.5, 0.2, 0.2], pointcloud=True, pre_hm=False, pre_img=False, pre_thresh=-1, print_iter=0, prior_bias=-4.6, public_det=False, qualitative=False, r_a=250, r_b=5, radar_sweeps=3, reg_loss='l1', reset_hm=False, resize_video=False, resume=False, reuse_hm=False, root_dir='/content/drive/MyDrive/CenterFusion/src/lib/../..', rot_weight=1, rotate=0, run_dataset_eval=True, same_aug_pre=False, save_all=False, save_dir='/content/drive/MyDrive/CenterFusion/src/lib/../../exp/ddd/centerfusion', save_framerate=30, save_img_suffix='', save_imgs=[], save_point=[20, 40, 50], save_results=False, save_video=False, scale=0, secondary_heads=['velocity', 'nuscenes_att', 'dep_sec', 'rot_sec'], seed=317, shift=0.1, show_track_color=False, show_velocity=False, shuffle_train=True, sigmoid_dep_sec=True, skip_first=-1, sort_det_by_dist=False, tango_color=False, task='ddd', test_dataset='nuscenes', test_focal_length=-1, test_scales=[1.0], track_thresh=0.3, tracking=False, tracking_weight=1, train_split='mini_train', trainval=False, transpose_video=False, use_loaded_results=False, val_intervals=1, 
val_split='mini_val', velocity=True, velocity_weight=1, video_h=512, video_w=512, vis_gt_bev='', vis_thresh=0.3, warm_start_weights=False, weights={'hm': 1, 'reg': 1, 'wh': 0.1, 'dep': 1, 'rot': 1, 'dim': 1, 'amodel_offset': 1, 'dep_sec': 1, 'rot_sec': 1, 'nuscenes_att': 1, 'velocity': 1}, wh_weight=0.1, zero_pre_hm=False, zero_tracking=False)
Creating model...
Using node type: (<class 'model.networks.dla.DeformConv'>, <class 'model.networks.dla.DeformConv'>)
Warning: No ImageNet pretrain!!
loaded ../models/centernet_baseline_e170.pth, epoch 28
Skip loading parameter nuscenes_att.0.weight, required shapetorch.Size([256, 67, 3, 3]), loaded shapetorch.Size([256, 64, 3, 3]).
Skip loading parameter nuscenes_att.2.weight, required shapetorch.Size([256, 256, 1, 1]), loaded shapetorch.Size([8, 256, 1, 1]).
Skip loading parameter nuscenes_att.2.bias, required shapetorch.Size([256]), loaded shapetorch.Size([8]).
Skip loading parameter velocity.0.weight, required shapetorch.Size([256, 67, 3, 3]), loaded shapetorch.Size([256, 64, 3, 3]).
Skip loading parameter velocity.2.weight, required shapetorch.Size([256, 256, 1, 1]), loaded shapetorch.Size([3, 256, 1, 1]).
Skip loading parameter velocity.2.bias, required shapetorch.Size([256]), loaded shapetorch.Size([3]).
No param dep_sec.0.weight.
No param dep_sec.0.bias.
No param dep_sec.2.weight.
No param dep_sec.2.bias.
No param dep_sec.4.weight.
No param dep_sec.4.bias.
No param dep_sec.6.weight.
No param dep_sec.6.bias.
No param rot_sec.0.weight.
No param rot_sec.0.bias.
No param rot_sec.2.weight.
No param rot_sec.2.bias.
No param rot_sec.4.weight.
No param rot_sec.4.bias.
No param rot_sec.6.weight.
No param rot_sec.6.bias.
No param nuscenes_att.4.weight.
No param nuscenes_att.4.bias.
No param nuscenes_att.6.weight.
No param nuscenes_att.6.bias.
No param velocity.4.weight.
No param velocity.4.bias.
No param velocity.6.weight.
No param velocity.6.bias.
Setting up validation data...
Dataset version
==> initializing mini_val data from /content/drive/MyDrive/CenterFusion/src/lib/../../data/nuscenes/annotations_3sweeps/mini_val.json,
images from /content/drive/MyDrive/CenterFusion/src/lib/../../data/nuscenes ...
loading annotations into memory...
Done (t=0.99s)
creating index...
index created!
Loaded mini_val 486 samples
Setting up train data...
Dataset version
==> initializing mini_train data from /content/drive/MyDrive/CenterFusion/src/lib/../../data/nuscenes/annotations_3sweeps/mini_train.json,
images from /content/drive/MyDrive/CenterFusion/src/lib/../../data/nuscenes ...
loading annotations into memory...
Done (t=3.87s)
creating index...
index created!
Loaded mini_train 1938 samples
Starting training...
ddd/centerfusion
Traceback (most recent call last):
  File "main.py", line 140, in <module>
    main(opt)
  File "main.py", line 84, in main
    log_dict_train, _ = trainer.train(epoch, train_loader)
  File "/content/drive/MyDrive/CenterFusion/src/lib/trainer.py", line 406, in train
    return self.run_epoch('train', epoch, data_loader)
  File "/content/drive/MyDrive/CenterFusion/src/lib/trainer.py", line 178, in run_epoch
    output, loss, loss_stats = model_with_loss(batch, phase)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/content/drive/MyDrive/CenterFusion/src/lib/trainer.py", line 123, in forward
    outputs = self.model(batch['image'], pc_hm=pc_hm, pc_dep=pc_dep, calib=calib)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/content/drive/MyDrive/CenterFusion/src/lib/model/networks/base_model.py", line 91, in forward
    feats = self.img2feats(x)
  File "/content/drive/MyDrive/CenterFusion/src/lib/model/networks/dla.py", line 622, in img2feats
    x = self.dla_up(x)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/content/drive/MyDrive/CenterFusion/src/lib/model/networks/dla.py", line 572, in forward
    ida(layers, len(layers) -i - 2, len(layers))
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/content/drive/MyDrive/CenterFusion/src/lib/model/networks/dla.py", line 543, in forward
    layers[i] = upsample(project(layers[i]))
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/content/drive/MyDrive/CenterFusion/src/lib/model/networks/dla.py", line 516, in forward
    x = self.conv(x)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/content/drive/MyDrive/CenterFusion/src/lib/model/networks/DCNv2/dcn_v2.py", line 128, in forward
    self.deformable_groups)
  File "/content/drive/MyDrive/CenterFusion/src/lib/model/networks/DCNv2/dcn_v2.py", line 31, in forward
    ctx.deformable_groups)
RuntimeError: Not compiled with GPU support (dcn_v2_forward at /content/drive/My Drive/CenterFusion/src/lib/model/networks/DCNv2/src/dcn_v2.h:35)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x33 (0x7f726a55b273 in /usr/local/lib/python3.6/dist-packages/torch/lib/libc10.so)
frame #1: dcn_v2_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, int, int, int, int, int, int, int, int, int) + 0x159 (0x7f7249ae4fd9 in /content/drive/MyDrive/CenterFusion/src/lib/model/networks/DCNv2/_ext.cpython-36m-x86_64-linux-gnu.so)
frame #2: <unknown function> + 0x16629 (0x7f7249af2629 in /content/drive/MyDrive/CenterFusion/src/lib/model/networks/DCNv2/_ext.cpython-36m-x86_64-linux-gnu.so)
frame #3: <unknown function> + 0x12a2d (0x7f7249aeea2d in /content/drive/MyDrive/CenterFusion/src/lib/model/networks/DCNv2/_ext.cpython-36m-x86_64-linux-gnu.so)
frame #4: python3() [0x50a4a5]

frame #6: python3() [0x507be4]
frame #7: python3() [0x588c8b]
frame #9: THPFunction_apply(_object, _object) + 0x9df (0x7f72b49f191f in /usr/local/lib/python3.6/dist-packages/torch/lib/libtorch_python.so)
frame #10: python3() [0x50a12f]
frame #13: python3() [0x594a01]
frame #16: python3() [0x507be4]
frame #18: python3() [0x594a01]
frame #19: python3() [0x54a971]
frame #21: python3() [0x50a433]
frame #24: python3() [0x594a01]
frame #27: python3() [0x507be4]
frame #29: python3() [0x594a01]
frame #30: python3() [0x54a971]
frame #32: python3() [0x50a433]
frame #35: python3() [0x594a01]
frame #38: python3() [0x507be4]
frame #40: python3() [0x594a01]
frame #41: python3() [0x54a971]
frame #43: python3() [0x50a433]
frame #46: python3() [0x594a01]
frame #49: python3() [0x507be4]
frame #51: python3() [0x594a01]
frame #52: python3() [0x54a971]
frame #54: python3() [0x50a433]
frame #56: python3() [0x5095c8]
frame #57: python3() [0x50a2fd]
frame #59: python3() [0x507be4]
frame #61: python3() [0x594a01]

mrnabati · 2021-02-10T03:34:43Z

Make sure you build the DCNv2 library after installing PyTorch, and that PyTorch is installed with GPU support. This error usually happens when DCNv2 is built against a PyTorch installation without GPU support.
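
Before rebuilding DCNv2, it can help to collect the facts that matter for a CUDA extension build in one place. The following is a minimal sketch: describe_build_env is a hypothetical helper, not part of CenterFusion or DCNv2, and it only reads attributes that PyTorch exposes (torch.version.cuda, torch.cuda.is_available() and torch.utils.cpp_extension.CUDA_HOME). How DCNv2's own build script uses these values is not shown here.

```python
def describe_build_env() -> dict:
    """Collect facts relevant to building a CUDA extension against PyTorch."""
    info = {"torch_installed": False}
    try:
        import torch
    except ImportError:
        return info  # PyTorch is missing, so nothing else can be checked

    info["torch_installed"] = True
    info["torch_version"] = torch.__version__
    # None for CPU-only PyTorch builds, a version string such as "10.2" otherwise.
    info["torch_built_with_cuda"] = torch.version.cuda
    # True only if a driver and at least one usable GPU are present.
    info["gpu_usable"] = torch.cuda.is_available()
    try:
        from torch.utils.cpp_extension import CUDA_HOME
    except ImportError:
        CUDA_HOME = None
    # None when PyTorch could not locate a CUDA toolkit (nvcc) on this machine.
    info["cuda_home_seen_by_torch"] = CUDA_HOME
    return info


if __name__ == "__main__":
    for key, value in describe_build_env().items():
        print(f"{key}: {value}")
```

A value of None for torch_built_with_cuda or cuda_home_seen_by_torch is worth resolving before running the DCNv2 build again.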

fabrizioschiano · 2021-10-06T18:02:23Z

I get the same error when I run:

bash experiments/test.sh

The error is:

  File "/home/fabrizioschiano/repositories/CenterFusion/src/lib/model/networks/dla.py", line 516, in forward
    x = self.conv(x)
  File "/home/fabrizioschiano/.virtualenvs/centerfusion/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/fabrizioschiano/repositories/CenterFusion/src/lib/model/networks/DCNv2/dcn_v2.py", line 161, in forward
    return dcn_v2_conv(
  File "/home/fabrizioschiano/repositories/CenterFusion/src/lib/model/networks/DCNv2/dcn_v2.py", line 23, in forward
    output = _backend.dcn_v2_forward(
RuntimeError: Not compiled with GPU support

Using tensorboardX
Fix size testing.
training chunk_sizes: [32]
input h w: 448 800
heads {'hm': 10, 'reg': 2, 'wh': 2, 'dep': 1, 'rot': 8, 'dim': 3, 'amodel_offset': 2, 'dep_sec': 1, 'rot_sec': 8, 'nuscenes_att': 8, 'velocity': 3}
weights {'hm': 1, 'reg': 1, 'wh': 0.1, 'dep': 1, 'rot': 1, 'dim': 1, 'amodel_offset': 1, 'dep_sec': 1, 'rot_sec': 1, 'nuscenes_att': 1, 'velocity': 1}
head conv {'hm': [256], 'reg': [256], 'wh': [256], 'dep': [256], 'rot': [256], 'dim': [256], 'amodel_offset': [256], 'dep_sec': [256, 256, 256], 'rot_sec': [256, 256, 256], 'nuscenes_att': [256, 256, 256], 'velocity': [256, 256, 256]}
Namespace(K=100, amodel_offset_weight=1, arch='dla_34', aug_rot=0, backbone='dla34', batch_size=32, chunk_sizes=[32], custom_dataset_ann_path='', custom_dataset_img_path='', custom_head_convs={'dep_sec': 3, 'rot_sec': 3, 'velocity': 3, 'nuscenes_att': 3}, data_dir='/home/fabrizioschiano/repositories/CenterFusion/src/lib/../../data', dataset='nuscenes', dataset_version='', debug=0, debug_dir='/home/fabrizioschiano/repositories/CenterFusion/src/lib/../../exp/ddd/centerfusion/debug', debugger_theme='white', demo='', dense_reg=1, dep_res_weight=1, dep_weight=1, depth_scale=1, dim_weight=1, disable_frustum=False, dla_node='dcn', down_ratio=4, eval=False, eval_n_plots=0, eval_render_curves=False, exp_dir='/home/fabrizioschiano/repositories/CenterFusion/src/lib/../../exp/ddd', exp_id='centerfusion', fix_res=True, fix_short=-1, flip=0.5, flip_test=True, fp_disturb=0, freeze_backbone=False, frustumExpansionRatio=0.0, gpus=[0], gpus_str='0', head_conv={'hm': [256], 'reg': [256], 'wh': [256], 'dep': [256], 'rot': [256], 'dim': [256], 'amodel_offset': [256], 'dep_sec': [256, 256, 256], 'rot_sec': [256, 256, 256], 'nuscenes_att': [256, 256, 256], 'velocity': [256, 256, 256]}, head_kernel=3, heads={'hm': 10, 'reg': 2, 'wh': 2, 'dep': 1, 'rot': 8, 'dim': 3, 'amodel_offset': 2, 'dep_sec': 1, 'rot_sec': 8, 'nuscenes_att': 8, 'velocity': 3}, hm_dist_thresh={0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 1, 6: 1, 7: 1, 8: 0, 9: 0}, hm_disturb=0, hm_hp_weight=1, hm_to_box_ratio=0.3, hm_transparency=0.7, hm_weight=1, hp_weight=1, hungarian=False, ignore_loaded_cats=[], img_format='jpg', input_h=448, input_res=800, input_w=800, iou_thresh=0, keep_res=False, kitti_split='3dop', layers_to_freeze=['base', 'dla_up', 'ida_up'], load_model='../models/centerfusion_e60.pth', load_results='', lost_disturb=0, lr=0.000125, lr_step=[60], ltrb=False, ltrb_amodal=False, ltrb_amodal_weight=0.1, ltrb_weight=0.1, master_batch_size=32, max_age=-1, max_frame_dist=3, max_pc=1000, max_pc_dist=60.0, 
model_output_list=False, msra_outchannel=256, neck='dlaup', new_thresh=0.3, nms=False, no_color_aug=False, no_pause=False, no_pre_img=False, non_block_test=False, normalize_depth=True, not_cuda_benchmark=False, not_max_crop=False, not_prefetch_test=False, not_rand_crop=False, not_set_cuda_env=False, not_show_bbox=False, not_show_number=False, num_classes=10, num_epochs=70, num_head_conv=1, num_img_channels=3, num_iters=-1, num_resnet_layers=101, num_stacks=1, num_workers=4, nuscenes_att=True, nuscenes_att_weight=1, off_weight=1, optim='adam', out_thresh=-1, output_h=112, output_res=200, output_w=200, pad=31, pc_atts=['x', 'y', 'z', 'dyn_prop', 'id', 'rcs', 'vx', 'vy', 'vx_comp', 'vy_comp', 'is_quality_valid', 'ambig_state', 'x_rms', 'y_rms', 'invalid_state', 'pdh0', 'vx_rms', 'vy_rms'], pc_feat_channels={'pc_dep': 0, 'pc_vx': 1, 'pc_vz': 2}, pc_feat_lvl=['pc_dep', 'pc_vx', 'pc_vz'], pc_roi_method='pillars', pc_z_offset=-0.0, pillar_dims=[1.5, 0.2, 0.2], pointcloud=True, pre_hm=False, pre_img=False, pre_thresh=-1, print_iter=0, prior_bias=-4.6, public_det=False, qualitative=False, r_a=250, r_b=5, radar_sweeps=6, reg_loss='l1', reset_hm=False, resize_video=False, resume=False, reuse_hm=False, root_dir='/home/fabrizioschiano/repositories/CenterFusion/src/lib/../..', rot_weight=1, rotate=0, run_dataset_eval=True, same_aug_pre=False, save_all=False, save_dir='/home/fabrizioschiano/repositories/CenterFusion/src/lib/../../exp/ddd/centerfusion', save_framerate=30, save_img_suffix='', save_imgs=[], save_point=[90], save_results=False, save_video=False, scale=0, secondary_heads=['velocity', 'nuscenes_att', 'dep_sec', 'rot_sec'], seed=317, shift=0, show_track_color=False, show_velocity=False, shuffle_train=False, sigmoid_dep_sec=True, skip_first=-1, sort_det_by_dist=False, tango_color=False, task='ddd', test_dataset='nuscenes', test_focal_length=-1, test_scales=[1.0], track_thresh=0.3, tracking=False, tracking_weight=1, train_split='train', trainval=False, 
transpose_video=False, use_loaded_results=False, val_intervals=10, val_split='mini_val', velocity=True, velocity_weight=1, video_h=512, video_w=512, vis_gt_bev='', vis_thresh=0.3, warm_start_weights=False, weights={'hm': 1, 'reg': 1, 'wh': 0.1, 'dep': 1, 'rot': 1, 'dim': 1, 'amodel_offset': 1, 'dep_sec': 1, 'rot_sec': 1, 'nuscenes_att': 1, 'velocity': 1}, wh_weight=0.1, zero_pre_hm=False, zero_tracking=False)
Dataset version 
==> initializing mini_val data from /home/fabrizioschiano/repositories/CenterFusion/src/lib/../../data/nuscenes/annotations_6sweeps/mini_val.json, 
 images from /home/fabrizioschiano/repositories/CenterFusion/src/lib/../../data/nuscenes ...
loading annotations into memory...
Done (t=0.83s)
creating index...
index created!
Loaded mini_val 486 samples
Creating model...
Using node type: (<class 'model.networks.dla.DeformConv'>, <class 'model.networks.dla.DeformConv'>)
Warning: No ImageNet pretrain!!
loaded ../models/centerfusion_e60.pth, epoch 60
Traceback (most recent call last):
  File "test.py", line 215, in <module>
    prefetch_test(opt)
  File "test.py", line 125, in prefetch_test
    ret = detector.run(pre_processed_images)
  File "/home/fabrizioschiano/repositories/CenterFusion/src/lib/detector.py", line 118, in run
    output, dets, forward_time = self.process(
  File "/home/fabrizioschiano/repositories/CenterFusion/src/lib/detector.py", line 321, in process
    output = self.model(images, pc_dep=pc_dep, calib=calib)[-1]
  File "/home/fabrizioschiano/.virtualenvs/centerfusion/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/fabrizioschiano/repositories/CenterFusion/src/lib/model/networks/base_model.py", line 91, in forward
    feats = self.img2feats(x)
  File "/home/fabrizioschiano/repositories/CenterFusion/src/lib/model/networks/dla.py", line 622, in img2feats
    x = self.dla_up(x)
  File "/home/fabrizioschiano/.virtualenvs/centerfusion/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/fabrizioschiano/repositories/CenterFusion/src/lib/model/networks/dla.py", line 572, in forward
    ida(layers, len(layers) -i - 2, len(layers))
  File "/home/fabrizioschiano/.virtualenvs/centerfusion/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/fabrizioschiano/repositories/CenterFusion/src/lib/model/networks/dla.py", line 543, in forward
    layers[i] = upsample(project(layers[i]))
  File "/home/fabrizioschiano/.virtualenvs/centerfusion/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/fabrizioschiano/repositories/CenterFusion/src/lib/model/networks/dla.py", line 516, in forward
    x = self.conv(x)
  File "/home/fabrizioschiano/.virtualenvs/centerfusion/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/fabrizioschiano/repositories/CenterFusion/src/lib/model/networks/DCNv2/dcn_v2.py", line 161, in forward
    return dcn_v2_conv(
  File "/home/fabrizioschiano/repositories/CenterFusion/src/lib/model/networks/DCNv2/dcn_v2.py", line 23, in forward
    output = _backend.dcn_v2_forward(
RuntimeError: Not compiled with GPU support

The DCNv2 library seems to be built correctly (I ran the make.sh file without errors).

I checked my PyTorch installation.

When I check my PyTorch version with:

python -c "import torch; print(torch.__version__)"

I get

1.9.1+cu102

Then, if I do:

python -c "import torch; print(torch.cuda.is_available())"

I get:

True

What am I doing wrong?

I will come back here if I find a solution.

fabrizioschiano · 2021-10-07T08:07:48Z

@AHappyFlyBird, how did you solve your problem?

fabrizioschiano · 2021-10-07T10:54:26Z

After some research I understood that the problem was that I did not actually have the CUDA toolkit installed.

You can check this by running:
nvcc -V

If the command is not found, the CUDA toolkit is not installed (or nvcc is not on your PATH).

I followed this guide:

https://docs.nvidia.com/cuda/cuda-installation-guide-linux/

And I installed CUDA from the following official link:

https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=20.04&target_type=deb_local

Then I installed the NVIDIA CUDA toolkit package simply with
sudo apt install nvidia-cuda-toolkit

Then you can do:

export CUDA_HOME=/usr/local/cuda-11

(Before doing this, check that this is the folder in which CUDA is installed on your machine.)
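
The folder check mentioned above can be scripted. The following is a minimal sketch, assuming a standard toolkit layout in which the compiler lives at bin/nvcc under the CUDA folder. Both helper names are hypothetical, and /usr/local/cuda-11 is only the example path from this post.

```python
import os
import shutil
from typing import Optional


def nvcc_in_cuda_home(cuda_home: str) -> bool:
    """True if <cuda_home>/bin/nvcc exists and is executable."""
    nvcc = os.path.join(cuda_home, "bin", "nvcc")
    return os.path.isfile(nvcc) and os.access(nvcc, os.X_OK)


def nvcc_on_path() -> Optional[str]:
    """Full path of the nvcc found on PATH, or None if the shell would not find it."""
    return shutil.which("nvcc")


if __name__ == "__main__":
    # Use CUDA_HOME if it is already exported, else the example path from the post.
    candidate = os.environ.get("CUDA_HOME", "/usr/local/cuda-11")
    print("nvcc on PATH:", nvcc_on_path())
    print(candidate, "contains nvcc:", nvcc_in_cuda_home(candidate))
```

If the second check prints False, the exported CUDA_HOME probably points at the wrong folder for this machine.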

Then, I had another problem:

what(): No CUDA GPUs are available

I found out what to do thanks to this issue.

I had to change the line

export CUDA_VISIBLE_DEVICES=1

To

export CUDA_VISIBLE_DEVICES=0

In the test.sh of this repository.
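
Why the value 1 hides every GPU on a single-GPU machine can be sketched as follows. This assumes the machine has exactly one physical GPU (the post does not state the count) and models only integer indices in CUDA_VISIBLE_DEVICES, using the rule from the CUDA documentation that devices listed after the first invalid index are not visible. The helper name visible_physical_gpus is hypothetical.

```python
from typing import List


def visible_physical_gpus(mask: str, physical_count: int) -> List[int]:
    """Physical GPU indices left visible by an integer-only CUDA_VISIBLE_DEVICES value.

    Enumeration stops at the first invalid index, so that entry and every
    entry after it are hidden. UUID-style entries are not handled here.
    """
    visible: List[int] = []
    for token in mask.split(","):
        token = token.strip()
        if not token.isdigit() or int(token) >= physical_count:
            break  # invalid entry: hide it and everything listed after it
        visible.append(int(token))
    return visible


if __name__ == "__main__":
    # Single physical GPU (index 0): asking for index 1 leaves nothing visible.
    print(visible_physical_gpus("1", physical_count=1))
    # Asking for index 0 exposes the only GPU.
    print(visible_physical_gpus("0", physical_count=1))
```

Inside the process, the visible devices are renumbered from 0, which is why the GPU list passed to the script has to stay consistent with this variable.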

I hope this helps someone else in the same situation.

fabrizioschiano · 2021-10-14T17:35:10Z

@mrnabati @AHappyFlyBird, do you think it is normal for all of the following to be printed at the beginning of training?

No param dep_sec.0.weight.
No param dep_sec.0.bias.
No param dep_sec.2.weight.
No param dep_sec.2.bias.
No param dep_sec.4.weight.
No param dep_sec.4.bias.
No param dep_sec.6.weight.
No param dep_sec.6.bias.
No param rot_sec.0.weight.
No param rot_sec.0.bias.
No param rot_sec.2.weight.
No param rot_sec.2.bias.
No param rot_sec.4.weight.
No param rot_sec.4.bias.
No param rot_sec.6.weight.
No param rot_sec.6.bias.
No param nuscenes_att.4.weight.
No param nuscenes_att.4.bias.
No param nuscenes_att.6.weight.
No param nuscenes_att.6.bias.
No param velocity.4.weight.
No param velocity.4.bias.
No param velocity.6.weight.
No param velocity.6.bias.
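
Messages of this form, like the "Skip loading parameter" lines in the earlier training log, are the kind of output a loader prints when it compares the parameter names and shapes stored in a checkpoint with those the model expects. The following is a minimal sketch under that assumption, using plain dicts of name to shape instead of real tensors. compare_state is a hypothetical helper and not the repository's loader, and the dep_sec.0.bias shape is illustrative only.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def compare_state(model: Dict[str, Shape], checkpoint: Dict[str, Shape]) -> List[str]:
    """List the model parameters that a checkpoint cannot fill in."""
    messages: List[str] = []
    for name, shape in model.items():
        if name not in checkpoint:
            # The checkpoint has no entry under this name at all.
            messages.append(f"No param {name}.")
        elif checkpoint[name] != shape:
            # The entry exists but its shape does not fit the model.
            messages.append(
                f"Skip loading parameter {name}, required shape {shape}, "
                f"loaded shape {checkpoint[name]}."
            )
    return messages


if __name__ == "__main__":
    # velocity.0.weight shapes are taken from the log earlier in this thread.
    model = {"velocity.0.weight": (256, 67, 3, 3), "dep_sec.0.bias": (256,)}
    checkpoint = {"velocity.0.weight": (256, 64, 3, 3)}
    for line in compare_state(model, checkpoint):
        print(line)
```

Under this reading, each "No param" line names a parameter that the loaded checkpoint does not contain.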

AHappyFlyBird closed this as completed Mar 11, 2021
