
The confusion about results of 3DSSD between official and MMDet3D implementation. #612

Closed
Physu opened this issue Jun 2, 2021 · 22 comments

Comments


Physu commented Jun 2, 2021

Thanks to the developers for the extraordinary work!
I have a question about the 3DSSD evaluation results between the author's implementation and the MMDet3D implementation.
The author's released results:

| Methods | Easy AP | Moderate AP | Hard AP | Models |
| --- | --- | --- | --- | --- |
| 3DSSD | 91.71 | 83.30 | 80.44 | model |
| PointRCNN | 88.91 | 79.88 | 78.37 | model |

In MMDet3D, the result:

| Backbone | Class | Lr schd | Mem (GB) | Inf time (fps) | mAP | Download |
| --- | --- | --- | --- | --- | --- | --- |
| PointNet2SAMSG | Car | 72e | 4.7 | | 78.39 (81.00)¹ | model \| log |

I notice "Experiment details on KITTI datasets", which shows the difference between official implementation.

1. The official implementation is based on TensorFlow 1.4, but I guess PyTorch is not the reason for the lower performance; or is there a performance gap between TensorFlow and PyTorch?
2. There is about a two-point margin (81.0 vs. 83.3) between the two implementations; can we come up with some way to close it?

I am also using a single 2080Ti to train a train+val model with configs/3DSSD/3dssd_kitti-3d-car.py. I modified `ann_file=data_root + 'kitti_infos_train.pkl'` to `ann_file=data_root + 'kitti_infos_trainval.pkl'` and kept the rest of the code as-is (a sketch of this change is shown below).
When training is finished, I will evaluate on the test set and post the results here for discussion.
Thanks again!
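For readers who want to make the same change, here is a minimal sketch of the override, assuming the usual mmdet3d KITTI config layout (the exact nesting, e.g. a RepeatDataset wrapper around the train set, depends on the mmdet3d version):

```python
# Minimal sketch (assumption: standard mmdet3d KITTI config layout).
data_root = 'data/kitti/'

data = dict(
    train=dict(
        # original: ann_file=data_root + 'kitti_infos_train.pkl'
        ann_file=data_root + 'kitti_infos_trainval.pkl'))
```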


Physu commented Jun 7, 2021

I trained 3DSSD following the config in configs/3dssd/3dssd_kitti-3d-car.py with train+val data,
and modified the batch size from 4 to 8 and the lr from 0.002 to 0.004, keeping the rest as-is (sketched below).
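A minimal sketch of those two overrides, assuming the usual mmdet3d config keys (partial dicts are merged into the base config, whose other optimizer settings stay unchanged):

```python
# Minimal sketch (assumption: standard mmdet3d config keys).
data = dict(samples_per_gpu=8)  # per-GPU batch size: 4 -> 8
optimizer = dict(lr=0.004)      # learning rate: 0.002 -> 0.004, scaled with the batch size
```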
The test results (under AP40):

| Benchmark | Easy | Moderate | Hard |
| --- | --- | --- | --- |
| Car (Detection) | 94.91 % | 91.35 % | 87.47 % |
| Car (Orientation) | 0.01 % | 0.47 % | 0.63 % |
| Car (3D Detection) | 86.06 % | 76.48 % | 69.71 % |
| Car (Bird's Eye View) | 91.65 % | 86.69 % | 81.05 % |

There exists a large margin compared with the official 3DSSD (76.48 vs. 79.55). I am confused about this: did I set something wrong? Or what can I do to close this performance gap?
Thanks

Physu changed the title from "The confusion about results of SSD3D between author and MMDet3D implementation." to "The confusion about results of 3DSSD between official and MMDet3D implementation." on Jun 7, 2021

Tai-Wang commented Jun 8, 2021

The reason for the performance difference has been explained on the README page. Among the differences, two are the most important: the different evaluation code and the different train/val split. The first can yield about a 2 mAP difference, as noted in the README, while the second at least removes the influence of false-positive predictions in samples without ground truths.

In addition, we also cross-checked the benchmark by evaluating our results with their evaluation code and their results with our evaluation code; the results are almost the same. (Actually, we only reproduced 79.26 mAP with the official code, according to the record of @encore-zhou.)

As for the difference on the test set, there is some uncertainty and there are tricks involved. Have you ever tried training a model with the official code and submitting the result to the benchmark?


Physu commented Jun 8, 2021

Thanks for your feedback! The official code is implemented in TensorFlow. I will try to train a model, submit the result to the test server, and evaluate the performance. New results will be posted here as soon as I get them.


Physu commented Jun 8, 2021

By the way, is 79.26 evaluated on val data or test data? If it was evaluated on test data, then 79.26 vs. 79.55 (the official result on test data) is an acceptable margin. My result on test data has a 3 mAP margin, which is unacceptable.

> Actually, we only reproduce the 79.26 mAP with the official code according to the record of @encore-zhou


Tai-Wang commented Jun 8, 2021

It's evaluated on their val split and with their evaluation code (compared with the reported 83.3). So I guess there is a large range of fluctuation in performance on the validation set. You can have a try first, and then let's take a closer look at whether there is a gap between our implementation and the official one.


Physu commented Jun 8, 2021

Got it, I will try to reproduce the result following the official code.


Physu commented Jun 17, 2021

I used the official implementation and configs to train models in a Docker container.
The Python packages are listed below:
tensorflow 1.4.0
tensorflow-tensorboard 0.4.0
python 3.5
cuda 9.0
numpy 1.14.5

Total training iterations: 80700
Final checkpoint: model-79893 (not model-80700)

The results (screenshots for checkpoints model-79893, model-79086, model-78279, and model-77472) are summarized in the table below:

| Benchmark | Iterations | Easy | Moderate | Hard |
| --- | --- | --- | --- | --- |
| Car (Detection) | 77472 | 89.70 % | 82.84 % | 79.97 % |
| Car (Detection) | 78279 | 89.29 % | 82.69 % | 80.06 % |
| Car (Detection) | 79086 | 91.14 % | 82.79 % | 80.02 % |
| Car (Detection) | 79893 | 89.39 % | 82.54 % | 79.83 % |

It seems the official model's evaluation results are better than MMDet3D's, but the reason needs further study.

Tai-Wang commented:

It's a little strange, because when we reproduced 3DSSD, @encore-zhou only got the following performance with the official code:
[screenshot of results]

Maybe there is some fluctuation in performance?


Physu commented Jun 18, 2021

Maybe the author has improved the code implementation? Something is causing the performance gap. I will check the 3DSSD head and hope to find something that explains this situation.
[screenshot of results]

These are new results obtained a few minutes ago.


Physu commented Jun 18, 2021

By the way, this result was trained with more epochs; you can see that the performance improves further (reaching 82.9%).

Tai-Wang commented:

Yes, it is really strange, because we reproduced the above results in Aug. 2020 (as shown in the screenshot) and there have been no updates to the official repo after April 2020. We will look into this issue soon. In the meantime, if you make any progress, please feel free to share it here.

Tai-Wang reopened this Jun 18, 2021

Physu commented Jun 18, 2021

Thanks for reopening this issue! New findings will be posted here.


Physu commented Jul 12, 2021

Environment:
PyTorch 1.5
mmdet 1.3.9
mmdet3d 0.14.0
mmcv-full 1.3.9
Ubuntu 18.04

I used the official config configs/3DSSD/3dssd_4x4_kitti-3d-car.py and modified the per-GPU batch size from 4 to 8 (because I use 2 GPUs, while the official config assumes 4 GPUs); the learning rate and epochs were kept as-is.
I trained a model with two 2080Ti GPUs on the full train data (7481 samples). Finally I got validation results on the val split (3,769 samples):
[screenshot of validation results]
Then I generated the test submission file and submitted it to the test server (a sketch of packaging the per-frame result files for upload is given at the end of this comment):
[screenshot of test-server results]
The performance is not as good as I expected, and I just don't know why. Could you please give some opinions on this performance?
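For reference, once the test-set predictions have been dumped as KITTI-format txt files (one per test frame), packaging them for upload can look like the following sketch. Folder and file names are placeholders, and the exact archive layout should be checked against the KITTI benchmark's submission instructions:

```python
# Minimal sketch: zip per-frame KITTI-format result files for submission.
# 'results/kitti_test' is a hypothetical folder holding 000000.txt ... 007517.txt.
import os
import zipfile

result_dir = 'results/kitti_test'
with zipfile.ZipFile('kitti_test_submission.zip', 'w', zipfile.ZIP_DEFLATED) as zf:
    for name in sorted(os.listdir(result_dir)):
        if name.endswith('.txt'):
            zf.write(os.path.join(result_dir, name), arcname=name)
```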


Physu commented Jul 12, 2021

I find it hard to reproduce the results on the KITTI test set, even though you may already have a good result on val.


Physu commented Jul 15, 2021

If we set the confidence threshold to something greater than 0.0 (the default, which outputs all plausible predictions), e.g. 0.2, to filter the final predictions in predictions_in_test.txt, we get:
[screenshot of results]
Note that in the configs you can define your own threshold:

```python
test_cfg=dict(
    nms_cfg=dict(type='nms', iou_thr=0.1),
    sample_mod='spec',
    score_thr=0.0,  # Attention!!!
    per_class_proposal=True,
    max_output_num=100))
```

Though there is some improvement, it is far from the 79.57 moderate AP of 3DSSD on the leaderboard. I guess good post-processing is needed, but which other techniques can improve performance is still an open question.
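As an illustration of the score-threshold filtering described above, here is a minimal offline sketch. It assumes the predictions are stored as one KITTI-format txt file per frame (the folder names here are hypothetical); in the KITTI detection format, the confidence score is the last field of each line:

```python
# Minimal sketch: filter KITTI-format prediction files by a score threshold.
# Folder names are hypothetical placeholders.
import os

SCORE_THR = 0.2
src_dir = 'predictions_in_test'
dst_dir = 'predictions_filtered'
os.makedirs(dst_dir, exist_ok=True)

for name in os.listdir(src_dir):
    if not name.endswith('.txt'):
        continue
    with open(os.path.join(src_dir, name)) as f:
        lines = f.readlines()
    # keep only detections whose confidence (last field) reaches the threshold
    kept = [l for l in lines if l.strip() and float(l.split()[-1]) >= SCORE_THR]
    with open(os.path.join(dst_dir, name), 'w') as f:
        f.writelines(kept)
```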


Wuziyi616 commented Jul 16, 2021

@Physu Have you ever tried generating a submission using the official code and submitting it to the test server to see the test-set result? Also, it seems to me that changing mmdet3d's training batch and GPUs from 4x4 to 8x2 improves the val set results a lot?

Please kindly provide more observations and I will try to look into this issue.


Physu commented Jul 16, 2021

@Wuziyi616 Thanks for your attention! Does official code mean dvlab-research/3DSSD or other methods?
Besides, in order to learn more about the evaluation procedure, I used traveller59/kitti-object-eval-python to evaluate results on the val set (i.e., get every LiDAR bin's predictions, save each in a txt file, and finally obtain 3769 txt files); a sketch of this evaluation is shown after this comment.
I find that, with no other post-processing involved, the results:
[screenshot of results]
are slightly better than the mmdet3d evaluation result (maybe it is unfair to compare this way, since the hyperparameters may change):
[screenshot of results]
If I use a confidence threshold of 0.2 to filter out false positives, the result improves further:
[screenshot of results]
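A minimal sketch of running that evaluation with traveller59/kitti-object-eval-python, following the usage shown in that repository's README (all paths below are placeholders):

```python
# Minimal sketch following the kitti-object-eval-python README; paths are placeholders.
import kitti_common as kitti
from eval import get_official_eval_result

def read_imageset_file(path):
    with open(path, 'r') as f:
        return [int(line) for line in f]

dt_annos = kitti.get_label_annos('/path/to/result_txt_folder')         # the 3769 prediction files
val_ids = read_imageset_file('/path/to/kitti/ImageSets/val.txt')
gt_annos = kitti.get_label_annos('/path/to/kitti/training/label_2', val_ids)
print(get_official_eval_result(gt_annos, dt_annos, 0))                 # 0 = Car
```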


Physu commented Jul 16, 2021

I will reproduce with the 4x4 setting, and then we can look further into the difference.

Wuziyi616 commented:

> @Wuziyi616 Thanks for your attention! Does official code mean dvlab-research/3DSSD or other methods?
> [...]

Exactly, the official code I meant is dvlab's code; I think that's the official code release for 3DSSD, isn't it? As you mentioned in this reply, you said you would like to submit test results using that code; have you done that?


Physu commented Jul 18, 2021

Thanks for your attention. My submission opportunities are running out; the results will be updated soon.


jlqzzz commented Sep 1, 2021

@Physu
Have you tried to reproduce the multi-class version of 3DSSD (that is, predicting car, pedestrian, and cyclist at the same time)?

Machine97 commented:

@Physu
Hi, have you ever tried generating a submission using the official code and submitting it to the test server to see the test-set result?
