
all-zero gt_boxes are loaded and cause runtime error #330

Closed
gardenbaby opened this issue Oct 25, 2020 · 11 comments
Labels
enhancement New feature or request to be closed

Comments

@gardenbaby

Hi, team,

I noticed that in dataset.py the following code was added to prevent loading a frame without gt_boxes.

if len(data_dict['gt_boxes']) == 0:
    new_index = np.random.randint(self.__len__())
    return self.__getitem__(new_index)

But it still happens that the gt_boxes in batch_dict for certain frames are all zeros before the batch reaches the model. Some operations (such as torch.max()) cannot handle this situation.

Here I added the following code in point_rcnn.py at the beginning of forward():

import pdb  # needed for the breakpoint below

def forward(self, batch_dict):
    for gt_box in batch_dict['gt_boxes']:
        if gt_box.max() == gt_box.min() == 0:
            pdb.set_trace()
    ...

Every time it stopped here, I printed batch_dict['gt_boxes'] and found one of the frames with all-zero gt_boxes, as follows.

(Pdb) gt_boxes = batch_dict['gt_boxes']
(Pdb) gt_boxes[0]
tensor([[0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0.]], device='cuda:0')

Since PointRCNN uses PointResidualCoder, if gt_boxes is all zeros, no foreground gt_boxes will be selected and the input to PointResidualCoder will be empty. The following assert in encode_torch() then raises a RuntimeError:

assert gt_classes.max() <= self.mean_size.shape[0]

RuntimeError: invalid argument 1: cannot perform reduction function max on tensor with no elements because the operation does not have an identity at /opt/conda/conda-bld/pytorch_1587428270644/work/aten/src/THC/generic/THCTensorMathReduce.cu:85

Here I trained the model with CLASS_NAMES: ['Cyclist'].

Is this expected, and what could be the reason behind it?
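To make the failure mode concrete, here is a minimal NumPy sketch (an illustration, not the OpenPCDet code itself): selecting foreground boxes from an all-zero gt_boxes array yields an empty array, and reducing over it raises, analogous to the torch error above.

```python
import numpy as np

# Hypothetical stand-in for one frame's gt_boxes: 9 boxes x 8 values
# (7 box parameters + 1 class label), all zeros, as in the pdb dump above.
gt_boxes = np.zeros((9, 8))

# Foreground boxes are those with a non-zero class label (last column).
fg_mask = gt_boxes[:, -1] > 0
fg_boxes = gt_boxes[fg_mask]
print(fg_boxes.shape)  # (0, 8) -- no foreground boxes survive

# Reducing an empty array fails, just like gt_classes.max() in encode_torch().
try:
    fg_boxes[:, -1].max()
except ValueError as e:
    print("reduction on empty selection fails:", e)
```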

@gardenbaby
Author

I reviewed the code and found that the problem comes from DataBaseSampler.

The IoUs of the sampled boxes are computed in __call__() (database_sampler.py), and only boxes that do not overlap with others are kept, so valid_mask (line 188) can be empty.

So when I choose only one class for training, such as Cyclist, even if the following constraint in dataset.py (line 127) is satisfied, all the remaining boxes could be anything but Cyclist.

 if len(data_dict['gt_boxes']) == 0:
     new_index = np.random.randint(self.__len__())
     return self.__getitem__(new_index)

When execution reaches the following code at line 132 of dataset.py, selected will be empty, so in the end no gt_boxes are selected.

selected = common_utils.keep_arrays_by_name(data_dict['gt_names'], self.class_names)
data_dict['gt_boxes'] = data_dict['gt_boxes'][selected]
data_dict['gt_names'] = data_dict['gt_names'][selected]

So I changed the code at line 127 of dataset.py from

if len(data_dict['gt_boxes']) == 0:
    ...

to

gt_boxes_mask = np.array([n in self.class_names for n in data_dict['gt_names']], dtype=np.bool_)
if gt_boxes_mask.sum() == 0:
    ...

It seems to work.
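To illustrate why the mask-based check catches frames that the length check misses, here is a small self-contained sketch with hypothetical frame contents:

```python
import numpy as np

# Hypothetical frame: gt_boxes exist, but none belong to the training class.
class_names = ['Cyclist']
gt_names = np.array(['Car', 'Pedestrian', 'Car'])
gt_boxes = np.zeros((3, 7))  # placeholder geometry; the content is irrelevant here

# Original check: passes, because the frame does have gt_boxes.
assert len(gt_boxes) > 0

# Proposed check: catches the frame, because no box matches class_names.
gt_boxes_mask = np.array([n in class_names for n in gt_names], dtype=np.bool_)
print(gt_boxes_mask.sum())  # 0 -> reload another frame instead
```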

@sshaoshuai
Collaborator

@gardenbaby , thank you for the information and I will check that.

sshaoshuai added the enhancement label on Oct 27, 2020
@xjjs

xjjs commented Oct 28, 2020

So when I choose only one class for training, such as Cyclist, even if the following constraint in dataset.py (line 127) is satisfied, all the remaining boxes could be anything but Cyclist.

 if len(data_dict['gt_boxes']) == 0:
     new_index = np.random.randint(self.__len__())
     return self.__getitem__(new_index)

If you choose only one class for training (Cyclist), then len(data_dict['gt_boxes']) == 0 means there are no gt_boxes for Cyclist.
However, some bin files in the 'gt_database' folder can be empty (0 size), so I think if the original gt_boxes contain no Cyclist and the sampled bins are empty, then we hit the error: assert gt_classes.max() <= self.mean_size.shape[0]

@xjjs

xjjs commented Oct 29, 2020


I still get the runtime error with the following change:

        if gt_boxes_mask.sum() == 0:
            new_index = np.random.randint(self.__len__())
            return self.__getitem__(new_index)

        data_dict = self.data_augmentor.forward(
            data_dict={
                **data_dict,
                'gt_boxes_mask': gt_boxes_mask
            }
        )
        # if len(data_dict['gt_boxes']) == 0:
        #     new_index = np.random.randint(self.__len__())
        #     return self.__getitem__(new_index)

@MartinHahner
Contributor

@gardenbaby
I applied your fix in my codebase and I had no issues with it 👍

@gardenbaby
Author

@xjjs

If you choose only one class for training (Cyclist), then len(data_dict['gt_boxes']) == 0 means there are no gt_boxes for Cyclist.
However, some bin files in the 'gt_database' folder can be empty (0 size), so I think if the original gt_boxes contain no Cyclist and the sampled bins are empty, then we hit the error: assert gt_classes.max() <= self.mean_size.shape[0]

I've checked gt_database and there are samples for different categories. Later I found the issue may be caused by sample filtering in DataBaseSampler and by the condition that re-triggers __getitem__() in DatasetTemplate. Please refer to my comment here for details.

I still get the runtime error with the following change:

        if gt_boxes_mask.sum() == 0:
            new_index = np.random.randint(self.__len__())
            return self.__getitem__(new_index)

        data_dict = self.data_augmentor.forward(
            data_dict={
                **data_dict,
                'gt_boxes_mask': gt_boxes_mask
            }
        )
        # if len(data_dict['gt_boxes']) == 0:
        #     new_index = np.random.randint(self.__len__())
        #     return self.__getitem__(new_index)

Simply commenting out the last 3 lines may not help, since they are there to recursively load another frame when no gt_boxes are found in the current one. My solution only changed the if condition.

I replaced

if len(data_dict['gt_boxes']) == 0:

with the following 2 lines.

gt_boxes_mask = np.array([n in self.class_names for n in data_dict['gt_names']], dtype=np.bool_)
if gt_boxes_mask.sum() == 0:

Please note that the new gt_boxes_mask here rechecks the gt_boxes of the selected categories, and is different from the one at line 119 of dataset.py.

@gardenbaby
Author

@MartinHahner
Thanks for the recheck.

@xjjs

xjjs commented Oct 30, 2020

@gardenbaby

gt_boxes_mask = np.array([n in self.class_names for n in data_dict['gt_names']], dtype=np.bool_)
if gt_boxes_mask.sum() == 0:


Please note that I added a new `gt_boxes_mask` here to recheck `gt_boxes` of the selected category, which is different from the one in line 119 [dataset.py](https://github.com/open-mmlab/OpenPCDet/blob/master/pcdet/datasets/dataset.py).

Thanks for your detailed reply. I have reviewed the code and found the following in database_sampler.py (line 196):

    if total_valid_sampled_dict.__len__() > 0:
        data_dict = self.add_sampled_boxes_to_scene(data_dict, sampled_gt_boxes, total_valid_sampled_dict)

If valid_mask (line 188) is empty, then self.add_sampled_boxes_to_scene will not be called, and gt_boxes_mask will not be used to update the gt_boxes (line 120).

Thus the data_dict returned in dataset.py (line 121) still keeps gt_boxes for other categories.

Still, some of the bin files in the gt_database folder are empty, which means the corresponding gt box contains no points; such boxes could be filtered out during database generation.
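As a possible guard against the empty bins described above, one could skip zero-size .bin files when loading or sampling from the database. This is only a sketch with hypothetical paths and a made-up helper name, not the OpenPCDet API:

```python
import os

def has_points(bin_path: str) -> bool:
    """Return True if the .bin file exists and is non-empty.

    Hypothetical helper; KITTI-style .bin files store raw float32 points,
    so a zero-size file means the gt box contained no points.
    """
    return os.path.isfile(bin_path) and os.path.getsize(bin_path) > 0

# Usage sketch: filter a (hypothetical) list of database entries before sampling.
db_infos = [{'path': 'gt_database/000001_Cyclist_0.bin'},
            {'path': 'gt_database/000002_Cyclist_1.bin'}]
valid_infos = [info for info in db_infos if has_points(info['path'])]
```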

@gardenbaby
Author

@xjjs
Thanks for the information.

@xjjs

xjjs commented Nov 3, 2020

@gardenbaby

With the code change, the runtime error for "assert gt_classes.max() <= self.mean_size.shape[0]" happens less often. But I got one today; I think the problem may lie in the empty bins in the gt_database folder, which contain no points but can still be sampled.

@sshaoshuai
Collaborator

Hi all,

This bug has been fixed in #340.

Actually I moved the empty check to the end of the prepare_data function since data_processor could also modify the gt_boxes.
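The control flow of the fix (checking for empty gt_boxes only at the end of prepare_data, after every step that can drop boxes) can be sketched as follows; the function and parameter names are illustrative, not the actual OpenPCDet signatures:

```python
import numpy as np

def prepare_data_sketch(data_dict, dataset_len, reload_fn, training=True):
    """Illustrative sketch: augmentation and data_processor steps (omitted
    here) run first, since either of them may remove gt_boxes; the empty
    check happens only at the very end."""
    # ... augmentation / data_processor would modify data_dict here ...
    if training and len(data_dict['gt_boxes']) == 0:
        # Recursively load a different random frame instead of returning
        # a frame without any gt_boxes.
        new_index = np.random.randint(dataset_len)
        return reload_fn(new_index)
    return data_dict
```

Here reload_fn stands in for self.__getitem__, so an empty frame is transparently replaced by another random one.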

The error in PointResidualCoder is another bug and I also fixed it in this PR.

Note that neither of these errors affects performance.

Thank you all for the bug information.
