[Add] CenterPoint compatibility with KITTI #924
Conversation
Codecov Report
@@ Coverage Diff @@
## master #924 +/- ##
==========================================
- Coverage 49.33% 49.31% -0.03%
==========================================
Files 210 210
Lines 16021 16028 +7
Branches 2556 2558 +2
==========================================
Hits 7904 7904
- Misses 7614 7621 +7
Partials 503 503
Here are the results of the (partly ongoing) experiment runs. Since the output is not garbage, the code modifications likely work. I noticed that the reported results in #871 use several different parameter values, such as 80 epochs and batch size 8, so I'll run another experiment with corresponding parameters to see whether the performance numbers match better. KITTI experiment (20/20 epochs, voxel size 0.05 x 0.05 x 0.10)
nuScenes experiment (1/20 epochs)
The follow-up KITTI experiment results (80 epochs, voxel size 0.05 x 0.05 x 0.10, batch size 8, single GPU)
The results improved and are now comparable to those reported in #871, though in general 1~2 points lower. Will update the previous CenterPoint-KITTI config files to reflect the changes.
Fixed further inconsistencies relative to the config file in #871 and launched a new run to try to match the reported values. Attaching the training log and processed config file for your reference.
Results of the latest experiment: improved to be generally only 0.5-1 points lower than the values reported in #871.
Planning on pushing a version of the config modifications that does not modify the base dataset config file. Attaching the log and compiled config file.
Just a comment: it seems the output stride is important for CenterPoint performance on KITTI. In the current configs (nuScenes experiments), we use 8x for the voxelnet, but I found that 4x works a bit better (though slower, and it uses a bit more memory during training). I can get to about 78 moderate AP in this repo: tianweiy/CenterPoint-KITTI#9
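For reference, the output stride corresponds to the `out_size_factor` fields in the mmdet3d CenterPoint configs; a minimal sketch of the 8 --> 4 change being discussed (illustrative values only, not a complete config):

```python
# out_size_factor is the ratio between the voxel grid and the BEV feature map,
# i.e. the output stride mentioned above; 8 is the nuScenes default.
model = dict(
    pts_bbox_head=dict(bbox_coder=dict(out_size_factor=4)),
    train_cfg=dict(pts=dict(out_size_factor=4)),
    test_cfg=dict(pts=dict(out_size_factor=4)))
```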
@tianweiy Thank you for sharing your insight and your work on CenterPoint x KITTI! By the way, can you confirm whether the feature map dimensions are appropriate for the KITTI point cloud data range? 🤔 I have not changed parameters relating to the feature map dimensions, at least not intentionally. Point cloud visualization
Which part do you want me to confirm? I agree that the center map seems super sparse compared to the whole map. Here the feature map stride is 8, and as I said above, 4 should work slightly better.
Ah, I just meant to ask whether you found this odd as well. I understand now that it is likely related to the feature map stride 8 --> 4 difference. Will confirm, and thank you again 👍
@tianweiy Tried out implementing the change. Changing the value 8 --> 4 modifies the target feature map dimensions (200, 176) --> (400, 352). However, the feature map output by 'SparseEncoder' (the middle encoder) remains (200, 176). ==> I modified the stride 2 --> 1 in the second-last encoder convolution in order to increase the 'SparseEncoder' output feature map dimension to (400, 352), and reduced the output channels by 1/2 to conserve the same total number of elements, and thus the same feature dimension, when rearranging the output tensor into shape (N, 256, 400, 352). Does this correspond to your implementation? Sorry if I misunderstood something. Currently running an experiment; I needed to reduce the batch size 8 --> 2 to fit the model into memory, so I'm having doubts about the correctness of my implementation.
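A tiny shape check illustrating the element-count reasoning above (the depth dimension values are assumptions for illustration, not taken from the actual model):

```python
import torch

# Assumed original SparseEncoder output: (N, C=128, D=2, H=200, W=176),
# flattened into a (N, 256, 200, 176) BEV feature map.
orig = torch.zeros(1, 128, 2, 200, 176)
bev_orig = orig.flatten(1, 2)          # -> (1, 256, 200, 176)

# Assumed modified output after removing the stride-2 downsampling and
# halving the channels: (N, C=64, D=4, H=400, W=352).
mod = torch.zeros(1, 64, 4, 400, 352)
bev_mod = mod.flatten(1, 2)            # -> (1, 256, 400, 352)

print(bev_orig.shape, bev_mod.shape)   # the 256 feature channels are preserved
```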
@robin-karlsson0 Awesome work! Here is just a kind reminder that the AOS number also seems strange. Please check whether the orientation estimation needs to be adjusted (possibly with opposite directions). After your benchmark is ready, we will help you review the code and transfer the benchmark to our refactored coordinate systems in v1.0.0.dev0.
@robin-karlsson0 I didn't change the 3D backbone and only added a 2x upsample at the end of the 2D backbone (before the detection head). This should take less memory.
OK, thanks for sharing! Will try out just upscaling too, and compare with the modified 3D backbone experiment currently running (epoch 68/80, ETA 8h). |
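A minimal sketch of what such a 2x upsample before the detection head could look like (an assumption for illustration; not the actual code in either repository):

```python
import torch
import torch.nn as nn

# Learnable 2x upsampling appended after the 2D backbone/neck output.
# 512 channels matches the concatenated SECOND-FPN output used by CenterPoint.
upsample = nn.Sequential(
    nn.ConvTranspose2d(512, 512, kernel_size=2, stride=2),
    nn.BatchNorm2d(512),
    nn.ReLU(inplace=True))

bev_feat = torch.rand(1, 512, 200, 176)
print(upsample(bev_feat).shape)  # torch.Size([1, 512, 400, 352])
```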
cyclic_times=1,
step_ratio_up=0.4,
)
# Although the max_epochs is 40, this schedule is usually used we
max_epochs is 80
I just wonder if this schedule is necessary to reproduce the CenterPoint results? We could also use cyclic_40 with the dataset repeated twice during training (although it can result in a small difference in terms of the learning rate used)?
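For reference, the dataset repetition mentioned above is usually expressed with the `RepeatDataset` wrapper in mmdet3d configs; a rough sketch (the dataset fields are placeholders, not the PR's config):

```python
# Sketch: a cyclic 40-epoch schedule over a twice-repeated dataset,
# instead of training for 80 epochs on the plain dataset.
data = dict(
    train=dict(
        type='RepeatDataset',
        times=2,
        dataset=dict(
            type='KittiDataset',
            data_root='data/kitti/',
            ann_file='data/kitti/kitti_infos_train.pkl')))
runner = dict(type='EpochBasedRunner', max_epochs=40)
```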
@@ -45,6 +46,9 @@ def extract_pts_feat(self, pts, img_feats, img_metas):
x = self.pts_backbone(x)
if self.with_pts_neck:
    x = self.pts_neck(x)
# Upsample output feature map spatial dimension to match target
Is this condition generalizable? I think an explicit comparison between the output feature map size and the target size would be better.
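One way to express such a comparison (a sketch under the assumption that the target BEV size is known from the config; not the PR's actual code):

```python
import torch
import torch.nn.functional as F

def maybe_upsample(feats, target_size):
    """Upsample BEV feature maps only if they do not already match the target size."""
    if feats[0].shape[-2:] != tuple(target_size):
        feats = [
            F.interpolate(
                f, size=tuple(target_size), mode='bilinear', align_corners=False)
            for f in feats
        ]
    return feats

feats = [torch.rand(2, 512, 200, 176)]
print(maybe_upsample(feats, (400, 352))[0].shape)  # torch.Size([2, 512, 400, 352])
```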
Hi @robin-karlsson0, have you finished this PR? If yes, please have a look at my minor comments and let's work together to merge it into master.
@@ -45,6 +46,9 @@ def extract_pts_feat(self, pts, img_feats, img_metas):
x = self.pts_backbone(x)
if self.with_pts_neck:
    x = self.pts_neck(x)
# Upsample output feature map spatial dimension to match target
if self.train_cfg['pts']['out_size_factor'] == 4:
self.train_cfg['pts']['out_size_factor'] == 4 is incompatible with the default test script, since cfg.model.train_cfg is set to None. It's better to try self.test_cfg if self.train_cfg is not available.
It's better to try self.test_cfg when self.train_cfg is not available.
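A small sketch of the suggested fallback (illustrative helper, not the final implementation):

```python
def get_pts_cfg(train_cfg, test_cfg):
    """Return the point-branch config, falling back to test_cfg when train_cfg
    is None (as happens with the default test script)."""
    cfg = train_cfg if train_cfg is not None else test_cfg
    return cfg['pts']

# Usage inside the detector would then be, roughly:
# out_size_factor = get_pts_cfg(self.train_cfg, self.test_cfg)['out_size_factor']
```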
post_center_limit_range=[-10, -50, -10, 80.4, 50, 10],
max_per_img=500,
max_pool_nms=False,
min_radius=[4, 12, 10, 1, 0.85, 0.175],
I think min_radius should be modified according to the classes.
@KickCellarDoor Could you please explain how these radii were obtained? I have calculated the statistics of nuScenes for each object class and was not able to understand on what basis they were selected. For example, the first radius corresponds to the car class and is equal to 4 m, while the average car has the following dimensions: vehicle.car [width=1.95447557, length=4.61892457, height=1.73131372]
@Tai-Wang @KickCellarDoor Hello, and sorry for the long absence. I worked on this as part of a part-time position, which I left after having acquired financial support. In the meantime I've been focusing on my primary research, incl. a publication deadline. I recalled these half-finished OpenMMLab PRs and am again available to work towards completing them. Please allow me some time to catch up and I'll address the reviews above 👍
Hi @robin-karlsson0 @Tai-Wang @KickCellarDoor, thank you for your work on this feature. When do you think it will be merged?
This PR consists of two contributions:

1. Modify `mmdet3d/models/dense_heads/centerpoint_head.py` to make it compatible with datasets having bbox annotations with 7 values (new, KITTI-like) or 9 values (current, nuScenes-like).
2. Add a new CenterPoint configuration file for KITTI.

Motivation

Currently, there exists no official implementation of CenterPoint for KITTI, while several issues have requested one (see #486, #225, #190). KITTI is a standard benchmarking dataset, and I think it is of value to be able to refer to an official implementation and benchmark results when comparing CenterPoint with other models.
Modification
The code in
centerpoint_head.py
is restrained to how annotation bboxes are generated within two functions;def get_targets_single()
anddef loss()
.For
def get_targets_single()
, the number of bbox annotation elements is set dynamically according to the config file (i.e.gt_annotation_num = len(self.train_cfg['code_weights']
). Thetorch.Tensor
annotation box (anno_box
) is created either as a 7 or 9 element tensor depending ongt_annotation_num
.For
def loss()
, the annotation box is restructured first based on universal elements, in addition to the velocity components(vx, vy)
if such elements exist.These changes allow
centerpoint_head.py
to work with both nuScenes and KITTI configurations without requiring any further code changes.The new KITTI configuration file parameter values are the same as in #871.
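A condensed sketch of the described target-box logic (the helper name and slicing are illustrative assumptions, not the exact diff):

```python
import torch

def build_anno_box(gt_box: torch.Tensor, gt_annotation_num: int) -> torch.Tensor:
    """Keep the 7 KITTI-like values (x, y, z, dx, dy, dz, yaw) and append the
    velocity components (vx, vy) only for 9-value, nuScenes-like annotations."""
    anno_box = gt_box[:7]
    if gt_annotation_num == 9:
        anno_box = torch.cat([anno_box, gt_box[7:9]])
    return anno_box

nus_box = torch.tensor([1., 2., 0.5, 1.9, 4.6, 1.7, 0.3, 0.1, -0.2])
print(build_anno_box(nus_box, gt_annotation_num=len(nus_box)).shape)  # torch.Size([9])
```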
The modifications are currently being validated with two experiment runs: nuScenes is validated using the existing config file `centerpoint_0075voxel_second_secfpn_dcn_circlenms_4x8_cyclic_20e_nus.py`, and KITTI is validated using the new config file `centerpoint_01voxel_second_secfpn_dcn_circlenms_4x8_cyclic_20e_kitti.py`. Results will be reported once available.