
CaDDN Detector #538

Merged (7 commits, May 20, 2021)
Conversation

@codyreading (Contributor) commented May 14, 2021

Summary

CaDDN is a monocular 3D object detection method that estimates categorical depth distributions in order to generate 3D feature representations for detection. It was accepted to CVPR 2021 as an oral presentation.
Paper: https://arxiv.org/abs/2103.01100
Code: https://github.com/TRAILab/CaDDN

Changes

  • Updated kitti_dataset.py and dataset.py to support image, depth map, and 2D GT box loading
  • Added GET_ITEM_LIST to specify which data items to load
  • Added image data augmentation: random_flip_horizontal
  • Added CaDDN detector
  • Added kornia and torchvision requirements
  • Added modules:
    • DepthFFE: Frustum feature extractor via depth distribution estimation
    • DDNDeepLabV3/DDNTemplate: Estimate depth distributions
    • DDNLoss: Loss for DDN
    • FrustumToVoxel: Transforms frustum to voxel grid
    • FrustumGridGenerator: Generates frustum sampling grid
    • Sampler: Samples the frustum grid
    • Conv2DCollapse: Collapses voxel grid to BEV via concat. + 1x1 conv.
    • Balancer: Loss balancer for foreground/background pixels
    • BasicBlock2D: Conv2D + BN + ReLU block
  • Added functions:
    • calib_to_matricies: Generate transformation matrices from calib objects
    • calculate_grid_size: Calculate grid_size without VoxelGenerator
    • get_pad_params: Get padding parameters for image padding
    • bin_depths: Converts depth map into depth bin indices (see sketch after this list)
    • normalize_coords: Normalize grid coordinates between [-1, 1] (see sketch after this list)
    • compute_fg_mask: Compute foreground pixel mask for images based on 2D GT boxes
    • project_to_image: Project 3D points to the image via projection matrices using PyTorch
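
For context on two of the helpers above, a minimal sketch of what they could look like. This shows a uniform depth discretization only; the actual signatures and the additional discretization modes used in the PR may differ:

import torch

def bin_depths(depth_map, depth_min, depth_max, num_bins):
    # Uniform discretization only: map continuous depths to integer bin indices.
    bin_size = (depth_max - depth_min) / num_bins
    indices = (depth_map - depth_min) / bin_size
    # Clamp out-of-range depths to a valid bin and truncate to integer indices
    return indices.clamp(min=0, max=num_bins - 1).long()

def normalize_coords(coords, shape):
    # Normalize grid coordinates from [0, shape - 1] to [-1, 1] per dimension.
    shape = torch.as_tensor(shape, dtype=coords.dtype, device=coords.device)
    return coords / (shape - 1) * 2.0 - 1.0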

Results

Car AP@0.70, 0.70, 0.70:
bbox AP:89.9449, 80.0868, 78.7468
bev  AP:34.8573, 25.5907, 24.0973
3d   AP:27.7777, 21.3760, 18.6217
aos  AP:89.06, 78.95, 77.00
Car AP_R40@0.70, 0.70, 0.70:
bbox AP:95.1921, 82.6336, 77.4336
bev  AP:31.6678, 21.5871, 19.4323
3d   AP:23.7724, 16.0700, 13.6146
aos  AP:94.14, 81.31, 75.67
Car AP@0.70, 0.50, 0.50:
bbox AP:89.9449, 80.0868, 78.7468
bev  AP:62.3596, 46.0990, 44.8178
3d   AP:57.9290, 43.5075, 37.7651
aos  AP:89.06, 78.95, 77.00
Car AP_R40@0.70, 0.50, 0.50:
bbox AP:95.1921, 82.6336, 77.4336
bev  AP:62.5936, 46.1427, 42.2161
3d   AP:57.0378, 40.7755, 36.9172
aos  AP:94.14, 81.31, 75.67
Pedestrian AP@0.50, 0.50, 0.50:
bbox AP:47.8124, 40.4024, 37.0082
bev  AP:16.9941, 13.7987, 12.6136
3d   AP:15.4504, 13.0160, 11.8772
aos  AP:35.24, 29.69, 27.19
Pedestrian AP_R40@0.50, 0.50, 0.50:
bbox AP:46.6209, 39.5637, 33.1653
bev  AP:11.8095, 8.9009, 7.0719
3d   AP:10.0425, 7.2711, 5.7442
aos  AP:32.20, 26.81, 22.44
Pedestrian AP@0.50, 0.25, 0.25:
bbox AP:47.8124, 40.4024, 37.0082
bev  AP:33.0742, 27.3150, 22.2747
3d   AP:32.9383, 26.2553, 21.9029
aos  AP:35.24, 29.69, 27.19
Pedestrian AP_R40@0.50, 0.25, 0.25:
bbox AP:46.6209, 39.5637, 33.1653
bev  AP:29.8401, 23.4260, 19.0682
3d   AP:29.4945, 22.8943, 17.8585
aos  AP:32.20, 26.81, 22.44
Cyclist AP@0.50, 0.50, 0.50:
bbox AP:35.4436, 24.0008, 22.9112
bev  AP:11.1946, 9.8259, 9.8259
3d   AP:10.8464, 9.7608, 9.0909
aos  AP:28.58, 19.83, 19.19
Cyclist AP_R40@0.50, 0.50, 0.50:
bbox AP:32.0066, 20.0532, 18.7363
bev  AP:3.0830, 1.7541, 1.5551
3d   AP:2.7691, 1.4875, 1.2074
aos  AP:24.12, 14.54, 13.68
Cyclist AP@0.50, 0.25, 0.25:
bbox AP:35.4436, 24.0008, 22.9112
bev  AP:16.6019, 11.9957, 11.9923
3d   AP:16.3234, 11.8802, 11.9318
aos  AP:28.58, 19.83, 19.19
Cyclist AP_R40@0.50, 0.25, 0.25:
bbox AP:32.0066, 20.0532, 18.7363
bev  AP:12.2843, 6.3071, 5.8288
3d   AP:11.4585, 5.8544, 5.5210
aos  AP:24.12, 14.54, 13.68

@codyreading (Contributor Author) commented May 14, 2021

Tested PointPillar inference with the following command on a Titan XP to ensure no changes:
python test.py --cfg_file cfgs/kitti_models/pointpillar.yaml --batch_size 16 --ckpt ../checkpoints/pointpillar_7728.pth

Performance

Master:
Max GPU Memory Usage: 5952 MB
Max Memory Usage: 2802 MB
Total Time: 1:39
Avg Iteration Speed: 2.38 it/s

feature/CaDDN:
Max GPU Memory Usage: 5952 MB
Max Memory Usage: 2831 MB
Total Time: 1:40
Avg Iteration Speed: 2.36 it/s

Results

Master:

Car AP@0.70, 0.70, 0.70:
bbox AP:90.7786, 89.8062, 88.7936
bev  AP:89.6590, 87.1725, 84.3762
3d   AP:86.4617, 77.2839, 74.6530
aos  AP:90.77, 89.61, 88.47
Car AP_R40@0.70, 0.70, 0.70:
bbox AP:95.6607, 92.2403, 91.3167
bev  AP:92.0399, 88.0556, 86.6625
3d   AP:87.7518, 78.3964, 75.1843
aos  AP:95.64, 92.03, 90.97
Car AP@0.70, 0.50, 0.50:
bbox AP:90.7786, 89.8062, 88.7936
bev  AP:90.7894, 90.1848, 89.4635
3d   AP:90.7894, 90.0675, 89.2495
aos  AP:90.77, 89.61, 88.47
Car AP_R40@0.70, 0.50, 0.50:
bbox AP:95.6607, 92.2403, 91.3167
bev  AP:95.6987, 94.7077, 93.9983
3d   AP:95.6874, 94.3709, 93.4244
aos  AP:95.64, 92.03, 90.97
Pedestrian AP@0.50, 0.50, 0.50:
bbox AP:66.5436, 62.4922, 59.3026
bev  AP:61.6348, 56.2747, 52.6007
3d   AP:57.7500, 52.2916, 47.9072
aos  AP:48.63, 45.62, 42.93
Pedestrian AP_R40@0.50, 0.50, 0.50:
bbox AP:66.5852, 62.4351, 58.8016
bev  AP:61.5971, 56.0143, 52.0457
3d   AP:57.3015, 51.4145, 46.8715
aos  AP:45.89, 42.99, 40.03
Pedestrian AP@0.50, 0.25, 0.25:
bbox AP:66.5436, 62.4922, 59.3026
bev  AP:72.5064, 69.5191, 66.4626
3d   AP:72.4368, 69.3244, 65.3180
aos  AP:48.63, 45.62, 42.93
Pedestrian AP_R40@0.50, 0.25, 0.25:
bbox AP:66.5852, 62.4351, 58.8016
bev  AP:73.8776, 70.4969, 66.6494
3d   AP:73.7943, 70.2258, 66.0435
aos  AP:45.89, 42.99, 40.03
Cyclist AP@0.50, 0.50, 0.50:
bbox AP:85.2661, 72.9744, 68.9914
bev  AP:82.2593, 66.1110, 62.5585
3d   AP:80.0483, 62.6080, 59.5260
aos  AP:84.72, 71.09, 67.13
Cyclist AP_R40@0.50, 0.50, 0.50:
bbox AP:88.5723, 74.0385, 69.8009
bev  AP:85.2585, 66.2439, 62.2173
3d   AP:81.5670, 62.8074, 58.8314
aos  AP:87.91, 71.98, 67.81
Cyclist AP@0.50, 0.25, 0.25:
bbox AP:85.2661, 72.9744, 68.9914
bev  AP:86.6035, 70.6055, 66.9244
3d   AP:86.6035, 70.6055, 66.9244
aos  AP:84.72, 71.09, 67.13
Cyclist AP_R40@0.50, 0.25, 0.25:
bbox AP:88.5723, 74.0385, 69.8009
bev  AP:88.8812, 71.7453, 67.7714
3d   AP:88.8812, 71.7453, 67.7714
aos  AP:87.91, 71.98, 67.81

feature/CaDDN:

Car AP@0.70, 0.70, 0.70:
bbox AP:90.7786, 89.8062, 88.7936
bev  AP:89.6590, 87.1725, 84.3762
3d   AP:86.4617, 77.2839, 74.6530
aos  AP:90.77, 89.61, 88.47
Car AP_R40@0.70, 0.70, 0.70:
bbox AP:95.6607, 92.2403, 91.3167
bev  AP:92.0399, 88.0556, 86.6625
3d   AP:87.7518, 78.3964, 75.1843
aos  AP:95.64, 92.03, 90.97
Car AP@0.70, 0.50, 0.50:
bbox AP:90.7786, 89.8062, 88.7936
bev  AP:90.7894, 90.1848, 89.4635
3d   AP:90.7894, 90.0675, 89.2495
aos  AP:90.77, 89.61, 88.47
Car AP_R40@0.70, 0.50, 0.50:
bbox AP:95.6607, 92.2403, 91.3167
bev  AP:95.6987, 94.7077, 93.9983
3d   AP:95.6874, 94.3709, 93.4244
aos  AP:95.64, 92.03, 90.97
Pedestrian AP@0.50, 0.50, 0.50:
bbox AP:66.5436, 62.4922, 59.3026
bev  AP:61.6348, 56.2747, 52.6007
3d   AP:57.7500, 52.2916, 47.9072
aos  AP:48.63, 45.62, 42.93
Pedestrian AP_R40@0.50, 0.50, 0.50:
bbox AP:66.5852, 62.4351, 58.8016
bev  AP:61.5971, 56.0143, 52.0457
3d   AP:57.3015, 51.4145, 46.8715
aos  AP:45.89, 42.99, 40.03
Pedestrian AP@0.50, 0.25, 0.25:
bbox AP:66.5436, 62.4922, 59.3026
bev  AP:72.5064, 69.5191, 66.4626
3d   AP:72.4368, 69.3244, 65.3180
aos  AP:48.63, 45.62, 42.93
Pedestrian AP_R40@0.50, 0.25, 0.25:
bbox AP:66.5852, 62.4351, 58.8016
bev  AP:73.8776, 70.4969, 66.6494
3d   AP:73.7943, 70.2258, 66.0435
aos  AP:45.89, 42.99, 40.03
Cyclist AP@0.50, 0.50, 0.50:
bbox AP:85.2661, 72.9744, 68.9914
bev  AP:82.2593, 66.1110, 62.5585
3d   AP:80.0483, 62.6080, 59.5260
aos  AP:84.72, 71.09, 67.13
Cyclist AP_R40@0.50, 0.50, 0.50:
bbox AP:88.5723, 74.0385, 69.8009
bev  AP:85.2585, 66.2439, 62.2173
3d   AP:81.5670, 62.8074, 58.8314
aos  AP:87.91, 71.98, 67.81
Cyclist AP@0.50, 0.25, 0.25:
bbox AP:85.2661, 72.9744, 68.9914
bev  AP:86.6035, 70.6055, 66.9244
3d   AP:86.6035, 70.6055, 66.9244
aos  AP:84.72, 71.09, 67.13
Cyclist AP_R40@0.50, 0.25, 0.25:
bbox AP:88.5723, 74.0385, 69.8009
bev  AP:88.8812, 71.7453, 67.7714
3d   AP:88.8812, 71.7453, 67.7714
aos  AP:87.91, 71.98, 67.81

@sshaoshuai (Collaborator) left a comment

Thank you for the contribution, great work!
Welcome to CaDDN, the first monocular 3D detection work in OpenPCDet!

Please check the comments and see how we can further improve it to be more elegant.
Thank you!

import numpy as np


def random_flip_horizontal(image, depth_map, gt_boxes, calib):
Collaborator:
How about moving this to augmentor_utils.py with the function name random_image_flip_horizontal?

Contributor Author:
Done
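
As background for the augmentation discussed above, a rough sketch of a horizontal flip over the image-space inputs. This is an illustration only: it flips the image, depth map, and 2D GT boxes, and omits the consistent 3D box/calibration update that a full implementation also needs:

import numpy as np

def random_image_flip_horizontal(image, depth_map, gt_boxes2d):
    # Flip with 50% probability so the augmentation is random per sample
    enable = np.random.choice([False, True], p=[0.5, 0.5])
    if enable:
        image = np.fliplr(image)
        depth_map = np.fliplr(depth_map)
        # Mirror 2D boxes (x1, y1, x2, y2) about the vertical image centerline
        W = image.shape[1]
        x1, x2 = gt_boxes2d[:, 0].copy(), gt_boxes2d[:, 2].copy()
        gt_boxes2d[:, 0] = W - x2
        gt_boxes2d[:, 2] = W - x1
    return image, depth_map, gt_boxes2d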


if "calib_matricies" in self.dataset_cfg.GET_ITEM_LIST:
input_dict["trans_lidar_to_cam"], input_dict["trans_cam_to_img"] = kitti_utils.calib_to_matricies(calib)

Collaborator:
GET_ITEM_LIST is a good idea for various data sources.
However, points=self.get_lidar() is a common setting for LiDAR-based 3D object detection, so I think it should be kept by default so that previous configs can still use the KittiDataset class.

This part should be something like:

get_item_list = self.dataset_cfg.get('GET_ITEM_LIST', ['points'])

# load points 
if 'points' in get_item_list: 
   xxx
# load images
xxxx
# load depth_maps
xxxx
# load calib_matricies
xxxx
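
Spelled out, that suggestion could look roughly like this. This is a hypothetical sketch: get_image and get_depth_map are assumed helper names, and the dict keys other than the calib ones shown in the diff above are illustrative:

get_item_list = self.dataset_cfg.get('GET_ITEM_LIST', ['points'])

if 'points' in get_item_list:
    input_dict['points'] = self.get_lidar(sample_idx)
if 'images' in get_item_list:
    input_dict['images'] = self.get_image(sample_idx)          # assumed helper
if 'depth_maps' in get_item_list:
    input_dict['depth_maps'] = self.get_depth_map(sample_idx)  # assumed helper
if 'calib_matricies' in get_item_list:
    input_dict['trans_lidar_to_cam'], input_dict['trans_cam_to_img'] = \
        kitti_utils.calib_to_matricies(calib)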

Contributor Author:
Done

Comment on lines 33 to 36
self.voxel_grid = kornia.utils.create_meshgrid3d(depth=self.depth,
                                                 height=self.height,
                                                 width=self.width,
                                                 normalized_coordinates=False)
Collaborator:

Is it necessary to use kornia? It seems we could simply implement this function with native PyTorch operations, for example within one file under pcdet/utils.

Contributor Author:

I could re-implement the kornia functions; however, I use seven different functions throughout the code, and adding these implementations would add extra code to this repo that I don't feel is necessary. Additionally, I already need to add a dependency (torchvision), so the requirements need to be updated anyway.

Kornia Functions:
kornia.image_to_tensor
kornia.utils.create_meshgrid3d
kornia.transform_points
kornia.normalize
kornia.losses.FocalLoss
kornia.convert_points_to_homogeneous
kornia.convert_points_from_homogeneous
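
For reference, the reviewer's suggestion of a native-PyTorch replacement could look roughly like this for create_meshgrid3d. This is a sketch only and does not claim to match kornia's exact output convention or argument handling:

import torch

def create_meshgrid3d(depth, height, width, normalized_coordinates=False):
    # Build per-axis coordinate vectors
    zs = torch.arange(depth, dtype=torch.float32)
    ys = torch.arange(height, dtype=torch.float32)
    xs = torch.arange(width, dtype=torch.float32)
    if normalized_coordinates:
        # Rescale each axis from [0, size - 1] to [-1, 1]
        zs = 2.0 * zs / max(depth - 1, 1) - 1.0
        ys = 2.0 * ys / max(height - 1, 1) - 1.0
        xs = 2.0 * xs / max(width - 1, 1) - 1.0
    # 'ij' indexing is the default behavior; pass indexing='ij' explicitly on newer PyTorch
    z, y, x = torch.meshgrid(zs, ys, xs)
    grid = torch.stack((x, y, z), dim=-1)  # (D, H, W, 3), assumed (x, y, z) ordering
    return grid.unsqueeze(0)               # (1, D, H, W, 3)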


__all__ = {
    'FrustumToVoxel': FrustumToVoxel
}
Collaborator:

I think f2v is a type of vfe (voxel feature encoding/extraction), since it uses frustum features instead of point-wise features.
So how about moving f2v into the vfe folder and creating a module named something like FrustumVFE for FrustumToVoxel?

Contributor Author:

Moved f2v as a submodule of ImageVFE

@@ -6,7 +6,7 @@
from ...ops.iou3d_nms import iou3d_nms_utils
Collaborator:

Modify this file by treating f2v as a vfe module.

Contributor Author:

Done, moved f2v as a submodule of ImageVFE

@@ -0,0 +1,19 @@
import torch
Collaborator:

Maybe there is no need to create a separate file for this simple function.
For example, we could merge grid_utils.py, depth_utils.py, and transform_utils.py into a single file, transform_utils.py.

Contributor Author:

Done

@@ -0,0 +1,5 @@
from .depth_ffe import DepthFFE
Collaborator:

I'm not sure whether it would be better to also move ffe into the 'vfe' folder, since it seems ffe can only be used as a module preceding f2v.
If so, the overall framework would stay simple and clear even with the CaDDN implementation.

Contributor Author:

Moved ffe as a submodule of ImageVFE

@sshaoshuai (Collaborator) commented:

Nice code!

The only suggestion: how about fusing ffe + f2v into a new vfe module, since it also aims to extract voxel-wise features from image features?
If so, the integration of CaDDN will be natural, and I think the overall framework will be clearer and will not affect the existing OpenPCDet architecture for LiDAR-based 3D detection.

@codyreading (Contributor Author) commented May 18, 2021

Thanks for the quick review!

Sounds good. I'll make the requested changes and fuse the FFE + F2V into one module.

@codyreading (Contributor Author) commented:

This PR should be good to go. I made FFE (renamed to FFN) and F2V submodules of ImageVFE, which extracts voxel features from an image.
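
To illustrate that structure, a hypothetical skeleton of such an ImageVFE wrapper; the module and batch_dict key conventions here are assumptions for illustration, not the merged code:

import torch.nn as nn

class ImageVFE(nn.Module):
    # Composes the two submodules: FFN (image -> frustum features + depth distributions)
    # and F2V (frustum features -> voxel grid features).
    def __init__(self, ffn, f2v):
        super().__init__()
        self.ffn = ffn
        self.f2v = f2v

    def forward(self, batch_dict):
        batch_dict = self.ffn(batch_dict)  # add frustum features to the batch dict
        batch_dict = self.f2v(batch_dict)  # convert frustum features to voxel features
        return batch_dict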

@sshaoshuai (Collaborator) left a comment

Reviewed.

@sshaoshuai merged commit aaf9cbe into open-mmlab:master on May 20, 2021