[Add] CenterPoint compatibility with KITTI #924

Open · wants to merge 9 commits into base: 1.0
Conversation

robin-karlsson0 (Contributor)

This PR consists of two contributions:

  1. Modifies CenterPoint's head (mmdet3d/models/dense_heads/centerpoint_head.py) to make it compatible with datasets whose bbox annotations have 7 values (new, KITTI-like) or 9 values (current, nuScenes-like).
  2. Adds KITTI configuration files for CenterPoint.

Motivation

Currently there is no official implementation of CenterPoint for KITTI, while several issues have requested one (see #486, #225, #190). KITTI is a standard benchmarking dataset, and I think it is valuable to be able to refer to an official implementation and benchmark results when comparing CenterPoint with other models.

Modification

The changes in centerpoint_head.py are confined to how annotation bboxes are generated within two functions: get_targets_single() and loss().

For get_targets_single(), the number of bbox annotation elements is set dynamically from the config file (i.e. gt_annotation_num = len(self.train_cfg['code_weights'])). The torch.Tensor annotation box (anno_box) is created as either a 7- or 9-element tensor depending on gt_annotation_num.

For loss(), the annotation box is reassembled from the elements common to both datasets, with the velocity components (vx, vy) appended when present.

These changes allow centerpoint_head.py to work with both nuScenes and KITTI configurations without requiring any further code changes.
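
To illustrate the branching, here is a minimal standalone sketch (not the verbatim diff; split_gt_box is a hypothetical helper that only mirrors the logic described above):

    import torch

    def split_gt_box(gt_box: torch.Tensor, gt_annotation_num: int):
        """Split a gt bbox of (x, y, z, w, l, h, yaw[, vx, vy]) elements.

        gt_annotation_num is 7 for KITTI-like and 9 for nuScenes-like
        annotations; the PR derives it as len(self.train_cfg['code_weights']).
        """
        assert gt_annotation_num in (7, 9)
        xyz, dims, yaw = gt_box[:3], gt_box[3:6], gt_box[6:7]
        # Velocity components (vx, vy) only exist for 9-element annotations.
        vel = gt_box[7:9] if gt_annotation_num == 9 else None
        return xyz, dims, yaw, vel

    # KITTI-like box: 7 values, no velocity components.
    xyz, dims, yaw, vel = split_gt_box(
        torch.tensor([1.0, 2.0, 0.0, 1.9, 4.6, 1.7, 0.3]), 7)
    assert vel is None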

The new KITTI configuration file parameter values are the same as in #871.

The modifications are currently being validated with two experiment runs: nuScenes is validated with the existing config file centerpoint_0075voxel_second_secfpn_dcn_circlenms_4x8_cyclic_20e_nus.py, and KITTI with the new config file centerpoint_01voxel_second_secfpn_dcn_circlenms_4x8_cyclic_20e_kitti.py. Results will be reported once available.

codecov bot commented Sep 9, 2021

Codecov Report

Merging #924 (8687aea) into master (2efdb6b) will decrease coverage by 0.02%.
The diff coverage is 0.00%.

❗ Current head 8687aea differs from pull request most recent head 3f9135b. Consider uploading reports for commit 3f9135b to get more accurate results.

@@            Coverage Diff             @@
##           master     #924      +/-   ##
==========================================
- Coverage   49.33%   49.31%   -0.03%     
==========================================
  Files         210      210              
  Lines       16021    16028       +7     
  Branches     2556     2558       +2     
==========================================
  Hits         7904     7904              
- Misses       7614     7621       +7     
  Partials      503      503              
Flag        Coverage Δ
unittests   49.31% <0.00%> (-0.03%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files                                   Coverage Δ
mmdet3d/models/dense_heads/centerpoint_head.py   43.54% <0.00%> (-1.01%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 2efdb6b...3f9135b.

@robin-karlsson0 (Contributor Author)

Here are the results of the (partly ongoing) experiment runs. Since the output is not garbage, the code modifications likely work.

I noticed that the results reported in #871 use several different parameter values, like 80 epochs and batch size 8, so I'll run another experiment with corresponding parameters to see if the performance numbers match better.

KITTI experiment (20/20 epochs, voxel size 0.05 x 0.05 x 0.10)

Pedestrian AP@0.50, 0.50, 0.50:
bbox AP:58.8907, 53.7964, 51.8120
bev  AP:52.6117, 47.6410, 44.5468
3d   AP:48.8790, 43.1561, 39.0406
aos  AP:1.85, 2.11, 2.57
Pedestrian AP@0.50, 0.25, 0.25:
bbox AP:58.8907, 53.7964, 51.8120
bev  AP:66.8690, 62.5275, 59.8175
3d   AP:66.6481, 62.3058, 59.5542
aos  AP:1.85, 2.11, 2.57
Cyclist AP@0.50, 0.50, 0.50:
bbox AP:76.2551, 68.4002, 64.8945
bev  AP:72.6522, 61.4727, 58.4541
3d   AP:69.2906, 58.7568, 55.5770
aos  AP:0.33, 0.99, 0.96
Cyclist AP@0.50, 0.25, 0.25:
bbox AP:76.2551, 68.4002, 64.8945
bev  AP:75.4667, 66.3259, 62.4743
3d   AP:75.3724, 66.0972, 62.3836
aos  AP:0.33, 0.99, 0.96
Car AP@0.70, 0.70, 0.70:
bbox AP:90.3484, 88.7091, 80.3441
bev  AP:89.2393, 83.8527, 78.9589
3d   AP:85.4495, 75.3621, 67.9366
aos  AP:0.12, 0.53, 0.75
Car AP@0.70, 0.50, 0.50:
bbox AP:90.3484, 88.7091, 80.3441
bev  AP:90.4089, 89.5775, 86.4906
3d   AP:90.4022, 89.4933, 85.6491
aos  AP:0.12, 0.53, 0.75

Overall AP@easy, moderate, hard:
bbox AP:75.1647, 70.3019, 65.6835
bev  AP:71.5011, 64.3221, 60.6533
3d   AP:67.8731, 59.0916, 54.1847
aos  AP:0.77, 1.21, 1.43

nuScenes experiment (1/20 epochs)

mAP: 0.4224
mATE: 0.3379
mASE: 0.2769
mAOE: 0.5464
mAVE: 0.5787
mAAE: 0.2133
NDS: 0.5159
Eval time: 104.8s

Per-class results:

Object Class          AP     ATE    ASE    AOE    AVE    AAE
car                   0.765  0.219  0.166  0.307  0.451  0.202
truck                 0.372  0.397  0.241  0.192  0.412  0.250
bus                   0.566  0.407  0.192  0.224  0.999  0.313
trailer               0.237  0.534  0.222  1.060  0.411  0.165
construction_vehicle  0.094  0.703  0.425  1.172  0.138  0.345
pedestrian            0.735  0.204  0.312  0.691  0.493  0.133
motorcycle            0.240  0.238  0.260  0.544  1.374  0.271
bicycle               0.114  0.195  0.264  0.623  0.350  0.027
traffic_cone          0.529  0.191  0.408  nan    nan    nan
barrier               0.573  0.291  0.279  0.105  nan    nan

@robin-karlsson0 (Contributor Author)

The follow-up KITTI experiment results (80 epochs, voxel size 0.05 x 0.05 x 0.10, batch size 8, single GPU)

Overall AP@easy, moderate, hard:
bbox AP:84.4814, 75.5839, 73.6057
bev  AP:76.4759, 68.6294, 65.4713
3d   AP:72.9873, 61.9473, 59.5183
aos  AP:0.87, 1.64, 1.90

The results improved and are now comparable to those reported in #871, though generally 1-2 points lower. I will update the previous CenterPoint-KITTI config files to reflect the changes.

Overall AP@easy, moderate, hard:
bbox AP:85.4418, 76.7482, 75.1310
bev  AP:78.1172, 70.6400, 67.5259
3d   AP:74.7528, 63.4462, 60.6359
aos  AP:0.77, 1.21, 1.43

I fixed further inconsistencies with the config file in #871 and launched a new run to try to match its reported values (a config sketch follows the list):

  1. sample_groups=dict(Car=12, Pedestrian=10, Cyclist=10) → sample_groups=dict(Car=12, Pedestrian=6, Cyclist=6)

  2. post_center_range=point_cloud_range → post_center_limit_range=[-10, -50, -10, 80.4, 50, 10]

  3. Optimizer parameters: betas=(0.95, 0.99)
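
A minimal sketch of the adjusted fields, assuming the right-hand values above are the ones adopted to match #871 (all surrounding config keys omitted; the optimizer type is taken from the existing cyclic schedule configs):

    # Hedged config sketch, not the full file:
    db_sampler = dict(
        sample_groups=dict(Car=12, Pedestrian=6, Cyclist=6))
    test_cfg = dict(
        pts=dict(post_center_limit_range=[-10, -50, -10, 80.4, 50, 10]))
    optimizer = dict(type='AdamW', betas=(0.95, 0.99))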

Attaching the training log and processed config file for your reference
20210917_211652.log
centerpoint_005voxel_second_secfpn_4x8_cyclic_80e_kitti.txt

@robin-karlsson0 (Contributor Author)

Results of the latest experiment: now generally only 0.5-1 points lower than the values reported in #871.

Overall AP@easy, moderate, hard:
bbox AP:83.4772, 77.5470, 74.7893
bev  AP:76.3518, 69.0049, 65.9117
3d   AP:73.1059, 62.9784, 59.6075
aos  AP:0.84, 1.59, 1.83

I'm planning to push a version of the config modifications that does not modify the base dataset config file configs/_base_/datasets/kitti-3d-3class.py.

Log and compiled config file
20210918_171159.log
centerpoint_005voxel_second_secfpn_4x8_cyclic_80e_kitti.txt

@tianweiy (Contributor)

Just a comment: it seems the output stride is important for CenterPoint performance on KITTI. In the current configs (nuScenes experiments) we use 8x for VoxelNet, but I found that 4x works a bit better (though slower, and with a bit more memory during training). I can get to about 78 moderate AP in this repo: tianweiy/CenterPoint-KITTI#9

@robin-karlsson0 (Contributor Author)

@tianweiy Thank you for sharing your insight and your work on CenterPoint x KITTI!
Will try out your recommendation and check out the repository.

By the way, can you confirm whether the feature map dimensions are appropriate for the KITTI point cloud range? 🤔
Comparing the visualization of the 3D point cloud with the target centre points plotted on the feature map, only a small subset of the feature map seems to be relevant. I also visualized an (unrelated) feature map in RGB that indicates the same concern.

I have not changed parameters relating to the feature map dimensions, at least not intentionally.

[Image: point cloud visualization]
[Image: target centre points in feature map coordinates]
[Image: feature map visualization]

@tianweiy (Contributor)

By the way, can you confirm whether the feature map dimensions are appropriate for the KITTI point cloud range?

Which part do you want me to confirm? I agree that the center map looks super sparse compared to the whole map. Here the feature map downsampling factor is 8; as I said above, 4 should work slightly better.

@robin-karlsson0 (Contributor Author)

Which part do you want me to confirm? I agree that the center map looks super sparse compared to the whole map. Here the feature map downsampling factor is 8; as I said above, 4 should work slightly better.

Ah, I just meant to ask whether you found this odd as well. I understand now that it is likely related to the downsampling factor 8 → 4 difference. Will confirm, and thank you again 👍

@robin-karlsson0 (Contributor Author)

@tianweiy I tried implementing the out_size_factor = 4 modification as you suggested.

Changing the value from 8 to 4 changes the target feature map dimensions from (200, 176) to (400, 352). However, the feature map output by SparseEncoder (the middle encoder) remains (200, 176).
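
For reference, the arithmetic behind those dimensions, given the (1600, 1408) spatial grid from the sparse_shape in the snippet below:

    # out_size_factor = 8: 1600 / 8 = 200, 1408 / 8 = 176  ->  (200, 176)
    # out_size_factor = 4: 1600 / 4 = 400, 1408 / 4 = 352  ->  (400, 352)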

I therefore changed the stride of the second-last encoder convolution from 2 to 1 to increase the SparseEncoder output feature map to (400, 352), and halved the output channels to conserve the same total number of elements, and thus feature dimension, when rearranging the output tensor into shape (N, 256, 400, 352).

Does this correspond to your implementation? Sorry if I misunderstood something.

Currently running an experiment. I needed to reduce the batch size from 8 to 2 to fit the model into memory, so I have doubts about the correctness of my implementation.

    pts_middle_encoder=dict(
        type='SparseEncoder',
        in_channels=4,
        sparse_shape=[41, 1600, 1408],
        output_channels=64,  # reduced from 128
        order=('conv', 'norm', 'act'),
        encoder_channels=((16, 16, 32), (32, 32, 64), (64, 64, 128), (128, 128)),
        encoder_strides=(2, 2, 1, 1),  # drops the second-last stride-2 conv
        encoder_paddings=((0, 0, 1), (0, 0, 1), (0, 0, [0, 1, 1]), (0, 0)),
        block_type='basicblock'),

@Tai-Wang (Member)

@robin-karlsson0 Awesome work! Just a kind reminder that the aos numbers also seem strange. Please check whether the orientation estimation needs to be adjusted (possibly with opposite directions). Once your benchmark is ready, we will help you review the code and transfer the benchmark to our refactored coordinate systems in v1.0.0.dev0.

@tianweiy (Contributor)

@robin-karlsson0 I didn't change the 3D backbone and only added a 2x upsample at the end of the 2D backbone (before the detection head). This should take less memory.

    pts_middle_encoder=dict(
        type='SparseEncoder',
        in_channels=4,
        sparse_shape=[41, 1600, 1408],
        order=('conv', 'norm', 'act')),
    pts_backbone=dict(
        type='SECOND',
        in_channels=256,
        layer_nums=[5],
        layer_strides=[1],
        out_channels=[128]),
    pts_neck=dict(
        type='SECONDFPN',
        in_channels=[128],
        upsample_strides=[2],
        out_channels=[128]),

@robin-karlsson0 (Contributor Author)

@robin-karlsson0 I didn't change the 3D backbone and only added a 2x upsample at the end of the 2D backbone (before the detection head). This should take less memory.

OK, thanks for sharing! I will try out just upsampling too, and compare with the modified 3D backbone experiment currently running (epoch 68/80, ETA 8 h).

    cyclic_times=1,
    step_ratio_up=0.4,
)
# Although the max_epochs is 40, this schedule is usually used we
Review comment (Member):

max_epochs is 80 here, not 40.

Review comment (Member):

I just wonder whether this schedule is necessary to reproduce the CenterPoint results. We could also use cyclic_40 and repeat the dataset twice during training (although that can result in a slight difference in the learning rate schedule)?

@@ -45,6 +46,9 @@ def extract_pts_feat(self, pts, img_feats, img_metas):
        x = self.pts_backbone(x)
        if self.with_pts_neck:
            x = self.pts_neck(x)
        # Upsample output feature map spatial dimension to match target
Review comment (Member):

Is this condition generalizable? I think an explicit comparison between the output feature map size and the target size would be better.
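
For instance, a shape-driven check could look like this sketch (hypothetical helper, not part of the PR):

    import torch.nn.functional as F

    def match_feature_map(x, target_hw):
        # Upsample only when the neck output differs from the expected
        # head-input resolution, instead of testing out_size_factor == 4.
        if tuple(x.shape[-2:]) != tuple(target_hw):
            x = F.interpolate(x, size=tuple(target_hw), mode='bilinear',
                              align_corners=False)
        return x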

@Tai-Wang left a comment (Member):

Hi @robin-karlsson0, have you finished this PR? If yes, please have a look at my minor comments and let's work together to merge it into master.

@@ -45,6 +46,9 @@ def extract_pts_feat(self, pts, img_feats, img_metas):
        x = self.pts_backbone(x)
        if self.with_pts_neck:
            x = self.pts_neck(x)
        # Upsample output feature map spatial dimension to match target
        if self.train_cfg['pts']['out_size_factor'] == 4:
@KickCellarDoor commented Oct 26, 2021 (Contributor):

self.train_cfg['pts']['out_size_factor'] == 4 is incompatible with the default test script, since cfg.model.train_cfg is set to None. It's better to try self.test_cfg when self.train_cfg is not available.
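
A sketch of the suggested fallback (standalone for illustration; inside the head it would read self.train_cfg and self.test_cfg):

    def get_out_size_factor(train_cfg, test_cfg):
        # At test time cfg.model.train_cfg is None, so fall back to
        # test_cfg as suggested in the review.
        cfg = train_cfg if train_cfg is not None else test_cfg
        return cfg['pts']['out_size_factor']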


post_center_limit_range=[-10, -50, -10, 80.4, 50, 10],
max_per_img=500,
max_pool_nms=False,
min_radius=[4, 12, 10, 1, 0.85, 0.175],
Review comment (Contributor):

I think min_radius should be modified according to the classes.

@Xrenya commented Jul 3, 2024:

@KickCellarDoor Could you please explain how these radii were obtained? I calculated statistics of nuScenes for each object class and could not work out on what basis they were selected. For example, the first radius corresponds to the car class and equals 4 m.

The average car has the following dimensions:
vehicle.car [width=1.95447557, length=4.61892457, height=1.73131372]

@robin-karlsson0 (Contributor Author)

@Tai-Wang @KickCellarDoor Hello, and sorry for the long absence. I worked on this as part of a part-time job, which I left after acquiring financial support. In the meantime I've been focusing on my primary research, including a publication deadline. I recalled these half-finished OpenMMLab PRs and am again available to work towards completing them.

Please allow me some time to catch up and I'll address the reviews above 👍

@alexandrealmin

Hi @robin-karlsson0 @Tai-Wang @KickCellarDoor, thank you for your work on this feature. When do you think it will be merged?
