
Channels error when trying to train with 360 degree point cloud data (Argoverse) #44

Closed
chayangkultan96 opened this issue Apr 16, 2020 · 7 comments


@chayangkultan96

chayangkultan96 commented Apr 16, 2020

Hi @sshaoshuai,

Thanks for the great work! I am currently running into an error while trying to train with surround-view point cloud data. I am using the Argoverse dataset converted to the KITTI format: all labels have been put into the camera coordinate frame and the calibration files have been extracted. I've created a config file and redefined all the mean sizes and anchor points. I also created a dataloader and was able to generate the pkl files and the ground truth database.

I am currently able to train with the dataset only when I limit the point cloud range to [0, -40, -3, 70.4, 40, 1] (as provided). When I expand the point cloud range to [-80, -80, -10, 80, 80, 10] I get the error below.

I am training using a Docker image with 8 V100 GPUs.

Thanks!

UPDATE: I had to increase the number of input features to get it to work. Is this the correct approach?

File "train.py", line 155, in
main()
File "train.py", line 148, in main
max_ckpt_save_num=args.max_ckpt_save_num
File "/s/dat/UserFolders/ctan24/PCDet-master/tools/train_utils/train_utils.py", line 81, in train_model
leave_pbar=(cur_epoch + 1 == total_epochs)
File "/s/dat/UserFolders/ctan24/PCDet-master/tools/train_utils/train_utils.py", line 36, in train_one_epoch
loss, tb_dict, disp_dict = model_func(model, batch)
File "/s/dat/UserFolders/ctan24/PCDet-master/pcdet/models/init.py", line 25, in model_func
ret_dict, tb_dict, disp_dict = model(input_dict)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 493, in call
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/distributed.py", line 376, in forward
output = self.module(*inputs[0], **kwargs[0])
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 493, in call
result = self.forward(*input, **kwargs)
File "/s/dat/UserFolders/ctan24/PCDet-master/pcdet/models/detectors/PartA2_net.py", line 106, in forward
rpn_ret_dict = self.forward_rpn(**input_dict)
File "/s/dat/UserFolders/ctan24/PCDet-master/pcdet/models/detectors/PartA2_net.py", line 38, in forward_rpn
**{'gt_boxes': kwargs.get('gt_boxes', None)}
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 493, in call
result = self.forward(*input, **kwargs)
File "/s/dat/UserFolders/ctan24/PCDet-master/pcdet/models/bbox_heads/rpn_head.py", line 292, in forward
x = self.blocksi
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 493, in call
result = self.forward(*input, **kwargs)
File "/s/dat/UserFolders/ctan24/PCDet-master/pcdet/models/model_utils/pytorch_utils.py", line 88, in forward
input = module(input)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 493, in call
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/conv.py", line 338, in forward
self.padding, self.dilation, self.groups)
RuntimeError: Given groups=1, weight of size 128 256 3 3, expected input[1, 1536, 252, 252] to have 256 channels, but got 1536 channels instead

A separate issue that I ran into when training limited to the image FOV is shown below.
UPDATE: This issue seems to occur only when I train with multiple GPUs; I am able to train successfully with one GPU.

Traceback (most recent call last):
  File "train.py", line 155, in <module>
    main()
  File "train.py", line 148, in main
    max_ckpt_save_num=args.max_ckpt_save_num
  File "/s/dat/UserFolders/ctan24/PCDet-master/tools/train_utils/train_utils.py", line 81, in train_model
    leave_pbar=(cur_epoch + 1 == total_epochs)
  File "/s/dat/UserFolders/ctan24/PCDet-master/tools/train_utils/train_utils.py", line 36, in train_one_epoch
    loss, tb_dict, disp_dict = model_func(model, batch)
  File "/s/dat/UserFolders/ctan24/PCDet-master/pcdet/models/__init__.py", line 25, in model_func
    ret_dict, tb_dict, disp_dict = model(input_dict)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/distributed.py", line 376, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/s/dat/UserFolders/ctan24/PCDet-master/pcdet/models/detectors/PartA2_net.py", line 112, in forward
    batch_size, voxel_centers, coords, rpn_ret_dict, input_dict
  File "/s/dat/UserFolders/ctan24/PCDet-master/pcdet/models/detectors/PartA2_net.py", line 98, in forward_rcnn
    rcnn_ret_dict = self.rcnn_net.forward(rcnn_input_dict)
  File "/s/dat/UserFolders/ctan24/PCDet-master/pcdet/models/rcnn/partA2_rcnn_net.py", line 547, in forward
    targets_dict = self.assign_targets(batch_size, rcnn_dict)
  File "/s/dat/UserFolders/ctan24/PCDet-master/pcdet/models/rcnn/partA2_rcnn_net.py", line 27, in assign_targets
    targets_dict = proposal_target_layer(rcnn_dict, roi_sampler_cfg=self.rcnn_target_config)
  File "/s/dat/UserFolders/ctan24/PCDet-master/pcdet/models/model_utils/proposal_target_layer.py", line 14, in proposal_target_layer
    sample_rois_for_rcnn(rois, gt_boxes, roi_raw_scores, roi_labels, roi_sampler_cfg)
  File "/s/dat/UserFolders/ctan24/PCDet-master/pcdet/models/model_utils/proposal_target_layer.py", line 141, in sample_rois_for_rcnn
    gt_of_bg_rois = cur_gt[gt_assignment[bg_inds]]
IndexError: index is out of bounds for dimension with size 0

@sshaoshuai
Collaborator

(1) For training the SECOND/PartA2 configurations, you should make sure the number of voxelized channels in the height direction matches MODEL.RPN.RPN_HEAD.ARGS['num_input_features'], or simply make sure there are 40 voxelized height channels; for example, in my default config the channels are (1 - (-3)) / 0.1 = 40 (see the sketch below).
(2) Are there any scenes that don't have any ground truth boxes in your training data? If so, you may need to update the dataloader to make sure each input training sample has at least one ground truth box.
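To make the arithmetic in (1) concrete, here is a minimal standalone sketch of the check; the values mirror the default PartA2 config mentioned above, but this is illustrative code, not the repo's:

```python
# Standalone sanity check, assuming the default point cloud range and voxel size.
point_cloud_range = [0, -40, -3, 70.4, 40, 1]   # x_min, y_min, z_min, x_max, y_max, z_max
voxel_size = [0.05, 0.05, 0.1]                  # vx, vy, vz

z_min, z_max = point_cloud_range[2], point_cloud_range[5]
height_channels = round((z_max - z_min) / voxel_size[2])
print(height_channels)  # (1 - (-3)) / 0.1 = 40, as in the default config

# The expanded range [-80, -80, -10, 80, 80, 10] would give
# (10 - (-10)) / 0.1 = 200 height levels instead of 40, which is what breaks
# the RPN head's expected channel count. Either keep (z_max - z_min) / vz == 40
# or adjust MODEL.RPN.RPN_HEAD.ARGS['num_input_features'] to match.
```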

@chayangkultan96
Author

chayangkultan96 commented Apr 17, 2020

> (1) For training the SECOND/PartA2 configurations, you should make sure the number of voxelized channels in the height direction matches MODEL.RPN.RPN_HEAD.ARGS['num_input_features'], or simply make sure there are 40 voxelized height channels; for example, in my default config the channels are (1 - (-3)) / 0.1 = 40.
> (2) Are there any scenes that don't have any ground truth boxes in your training data? If so, you may need to update the dataloader to make sure each input training sample has at least one ground truth box.

Thanks for your reply. Just a couple of quick follow-ups:

(1) Could you elaborate a little more on why 40 is the chosen number and how it eventually maps to num_input_features, which is set to 256 in the default config? Maybe I'm missing something. How would I go about making sure they are equal for my setting?

(2) Good point, there are some scenes without ground truth data. I'll make the changes and see how it works; a sketch of the idea follows below.
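A minimal sketch of that dataloader change, assuming a PyTorch-style dataset whose `__getitem__` returns a dict with a `gt_boxes` array; the class and field names here are assumptions, not the actual PCDet classes:

```python
import numpy as np
from torch.utils.data import Dataset

class NonEmptyGTDataset(Dataset):
    """Wraps an existing dataset and resamples frames that have no GT boxes."""

    def __init__(self, base_dataset):
        self.base = base_dataset

    def __len__(self):
        return len(self.base)

    def __getitem__(self, index):
        data_dict = self.base[index]
        # Resample a random frame until we get one with at least one GT box,
        # so the RCNN target assignment never sees an empty gt tensor.
        while len(data_dict.get('gt_boxes', [])) == 0:
            index = np.random.randint(len(self.base))
            data_dict = self.base[index]
        return data_dict
```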

@sshaoshuai
Collaborator

You can refer to the code at https://github.com/sshaoshuai/PCDet/blob/master/pcdet/models/rpn/rpn_unet.py#L484 for the mapping to the BEV feature channels, which makes it clearer. Just set the height range and voxel size carefully to make sure there are 40 levels after voxelization.
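For anyone reading along: the mapping at the linked line is essentially a reshape of the dense 3D backbone output from (N, C, D, H, W) into (N, C*D, H, W) BEV features. A toy sketch follows; the shapes are assumptions consistent with the default config, where 128 channels times 2 remaining height levels gives num_input_features = 256 (check the linked line for the real code):

```python
import torch

# Assumed post-backbone shape: 128 feature channels over the 2 height levels
# that remain after the sparse convolutions downsample the 40 input levels.
N, C, D, H, W = 1, 128, 2, 200, 176
x = torch.zeros(N, C, D, H, W)

# Collapse the height dimension into the channels to form the BEV feature map.
bev = x.view(N, C * D, H, W)
print(bev.shape)  # torch.Size([1, 256, 200, 176]) -> num_input_features = 256
```

Under the same assumptions, the 1536 channels in the first traceback would correspond to 128 × 12, i.e. a taller voxel grid leaving 12 height levels after downsampling instead of 2.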

@chayangkultan96
Author

chayangkultan96 commented Apr 20, 2020

Thanks, I understand it now. @sshaoshuai Another question, not related to this issue: do any other parts of the code, apart from the dataloader, depend on the KITTI label coordinate convention, i.e. the xyz center being defined at the bottom of the object instead of the true center? What is the convention for xyz throughout the code?

Thanks!
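For reference, the KITTI convention in question places the label (x, y, z) at the bottom center of the box, with the camera y-axis pointing down, so shifting to a true (gravity) center is a half-height offset. A hedged sketch of that conversion; the function and argument names are made up for illustration:

```python
import numpy as np

def bottom_to_gravity_center(xyz, h):
    """xyz: (N, 3) bottom-center box locations in KITTI camera coordinates;
    h: (N,) box heights. The camera y-axis points down, so the true center
    sits half a height above the bottom: y_center = y_bottom - h / 2."""
    center = xyz.copy()
    center[:, 1] -= h / 2
    return center
```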

@chayangkultan96
Author

> (1) For training the SECOND/PartA2 configurations, you should make sure the number of voxelized channels in the height direction matches MODEL.RPN.RPN_HEAD.ARGS['num_input_features'], or simply make sure there are 40 voxelized height channels; for example, in my default config the channels are (1 - (-3)) / 0.1 = 40.
> (2) Are there any scenes that don't have any ground truth boxes in your training data? If so, you may need to update the dataloader to make sure each input training sample has at least one ground truth box.

Also, I rechecked my dataset and there are no frames without GT data, yet I am still occasionally getting this error. Any thoughts on this? Thanks!

@sshaoshuai
Collaborator

I cannot tell why it happens from this information alone; maybe you could try to catch the bug and print the variables there.
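One low-tech way to do that is sketched below: wrap the failing line in sample_rois_for_rcnn and dump the shapes before re-raising. The variable names come from the traceback above, but treat the exact placement as an assumption:

```python
# Inside pcdet/models/model_utils/proposal_target_layer.py, around the line
# that raises the IndexError (placement is approximate):
try:
    gt_of_bg_rois = cur_gt[gt_assignment[bg_inds]]
except IndexError:
    # Print everything needed to see why cur_gt ended up empty on this rank.
    print('cur_gt shape:', tuple(cur_gt.shape))
    print('gt_assignment shape:', tuple(gt_assignment.shape))
    print('bg_inds shape:', tuple(bg_inds.shape), 'values:', bg_inds)
    raise
```

Since the error shows up only with multiple GPUs, the shapes on the failing rank may reveal a batch whose ground truth tensor was emptied somewhere between the dataloader and the proposal target layer.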

@bugerry87

Converting the Argoverse dataset to the KITTI format? Hm... while I try to get through this mess of dicts and pickles, it seems this data management requires some refactoring.

Hope I can contribute something soon.
