
Preparation for CLOCs #2

Closed
shaunkheng97 opened this issue Jan 14, 2021 · 12 comments
shaunkheng97 commented Jan 14, 2021

Hi, I am planning to fuse YOLOv4 with SECOND/PointPillars. Will you be providing a tutorial/guide on extracting the bounding boxes before NMS?

pangsu0613 (Owner)

Hello, for SECOND-V1.5 (newer versions of SECOND should be very similar), check the file 'voxelnet.py' (https://github.com/traveller59/second.pytorch/blob/v1.5/second/pytorch/models/voxelnet.py). At line 377, `batch_box_preds = preds_dict["box_preds"]` holds the raw output from the SECOND network, i.e. the encodings of the bounding boxes before NMS. First you need to decode them (line 387); the decoded boxes are [x, y, z, w, l, h, r] in the lidar coordinate frame. If you need them in the camera coordinate frame, use the functions in https://github.com/traveller59/second.pytorch/blob/v1.5/second/pytorch/core/box_torch_ops.py to transform them. 'box_torch_ops.py' also provides many other useful 2D/3D bounding box and coordinate transformation functions.
As for YOLOv4, I am not very familiar with the YOLOv4 codebase, but in my experience, just setting the NMS score threshold to 0 (if you are using sigmoid scores, 0 means no thresholding) also works fine.
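For reference, the standard SECOND residual box encoding can be inverted with a few lines of NumPy. This is a simplified, hypothetical sketch of what the decode in 'box_torch_ops.py' does (assuming the default encoding: x/y offsets normalized by the anchor diagonal, z by the anchor height, log-encoded sizes, and a plain yaw offset), not the library call itself:

```python
import numpy as np

def decode_second_boxes(box_preds, anchors):
    """Decode raw SECOND box regressions into [x, y, z, w, l, h, r].

    box_preds, anchors: (N, 7) arrays with columns [x, y, z, w, l, h, r].
    Simplified, illustrative re-implementation of the standard SECOND
    residual decoding, not the actual second.pytorch code.
    """
    xa, ya, za, wa, la, ha, ra = np.split(anchors, 7, axis=-1)
    xt, yt, zt, wt, lt, ht, rt = np.split(box_preds, 7, axis=-1)
    diag = np.sqrt(la ** 2 + wa ** 2)   # anchor diagonal normalizes x/y offsets
    x = xt * diag + xa
    y = yt * diag + ya
    z = zt * ha + za                    # z offset is normalized by anchor height
    w = np.exp(wt) * wa                 # sizes are log-encoded
    l = np.exp(lt) * la
    h = np.exp(ht) * ha
    r = rt + ra                         # yaw is a plain offset
    return np.concatenate([x, y, z, w, l, h, r], axis=-1)
```

With this decoding in hand, the decoded boxes are in the lidar frame and would still need the camera-frame transforms from 'box_torch_ops.py' if required.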

CodeDragon18

Hi,
when will you update your code? I am waiting to try it.

shaunkheng97 (Author) commented Jan 19, 2021

Currently I have a YOLOv4 model trained on the BDD dataset and a SECOND v1.6 model trained on the KITTI dataset. My questions are:

  1. If I plan to evaluate CLOCs on the KITTI dataset, is it recommended to train YOLOv4 on KITTI to optimize performance?
  2. From my understanding, I do not need to retrain the networks, but I would have to run inference without NMS on a dataset to produce the input for CLOCs. Is that correct?

pangsu0613 (Owner)

@shaunkheng97

  1. Yes, it is recommended to train the 2D detector (here, your YOLOv4) on the KITTI dataset. If the 2D detector performs poorly on KITTI, it will spoil the fusion.
  2. Yes, you are right. You don't need to re-train the networks; just run inference without NMS, or without NMS score thresholding. The point is to get more raw outputs from the network.
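As a concrete illustration, "inference without NMS score thresholding" just means keeping every candidate box instead of filtering by confidence. A hypothetical helper (names are illustrative, not from either codebase) could look like:

```python
import numpy as np

def raw_candidates(boxes, scores, score_thresh=0.0):
    """Filter candidate boxes by a confidence threshold.

    With sigmoid scores in (0, 1) and score_thresh=0.0, the mask keeps
    everything, so the downstream fusion stage (e.g. CLOCs) sees all of
    the detector's raw candidates rather than a post-NMS subset.
    """
    keep = scores > score_thresh
    return boxes[keep], scores[keep]
```

With `score_thresh=0.0`, the output is identical to the input; raising the threshold back up recovers the usual pre-NMS filtering.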

pangsu0613 (Owner)

@CodeDragon18
Thank you for your interest. I have been really busy these days, but I have started working on it and will upload an early version as soon as possible.

shaunkheng97 (Author)

Alright, I’ll work on it in the meantime! Thanks!

shaunkheng97 (Author)

Hi, I am just curious and confused about the training.

I used 90% of the 7480-frame KITTI training set for training and 10% for validation. If I run inference without NMS, would I have to reuse the training data as the inference set? Wouldn't it be contradictory to use the same data for both training and inference?

pangsu0613 (Owner)

Yes, you are right. Ideally, one should divide the dataset into 3 parts: part 1 for training the 3D and 2D detectors, part 2 for training CLOCs, and part 3 for validation only. But for KITTI there are two caveats. First, the 3712-frame mini-training / 3769-frame validation split is so popular that many researchers use it for their experiments, so it is good to report results on the 3769-frame validation set for comparison. Second, KITTI is a relatively small dataset; I think it is too small to divide into 3 parts. So I just use the popular 3712-frame mini-training set to train the 3D/2D detectors and CLOCs, and do validation on the 3769-frame validation set. This is NOT the most rigorous way to train, but even so, I still get some improvements. For other, larger datasets (such as nuScenes, Waymo and Argoverse), dividing into 3 parts would be the better choice.
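The three-part division described above can be sketched in a few lines (the fractions here are purely illustrative; as noted, KITTI itself is likely too small for this):

```python
import random

def three_way_split(frame_ids, fracs=(0.5, 0.3, 0.2), seed=0):
    """Shuffle frame ids and split them into three disjoint parts:
    detector training, CLOCs training, and validation.
    The fractions are illustrative, not a recommendation."""
    ids = list(frame_ids)
    random.Random(seed).shuffle(ids)          # deterministic shuffle
    n_det = int(len(ids) * fracs[0])
    n_clocs = int(len(ids) * fracs[1])
    det_train = ids[:n_det]
    clocs_train = ids[n_det:n_det + n_clocs]
    val = ids[n_det + n_clocs:]
    return det_train, clocs_train, val
```

Fixing the seed keeps the split reproducible across the detector-training and CLOCs-training runs, which matters because the three parts must stay disjoint.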

shaunkheng97 (Author)

So for now, should I retrain YOLOv4 on some random 3712 frames, then run inference again on all 7480 frames as the CLOCs input? What was SECOND's training setup like? I believe it is more ideal to train YOLOv4 in the same way as the SECOND model you used.

I might try to train on nuScenes if I can successfully train CLOCs on KITTI.

pangsu0613 (Owner)

Yes, it would be better to train YOLOv4 on the 3712 frames, provided 3712 frames are enough to train it.
Also, the 3712 + 3769 split is NOT random; it is a fixed, well-known conventional split used by many researchers so that people can compare different networks on the same validation set. I remember the split was proposed in a 2015 paper named '3D Object Proposals for Accurate Object Class Detection'. You can find the split under /CLOCs/second/data/ImageSet; there are multiple text files there. 'train.txt' contains all the frame numbers for the 3712-frame mini-training set, and 'val.txt' contains all the frame numbers for the 3769-frame validation set. SECOND uses the 3712-frame mini-training set for training.
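Those ImageSet files are plain text with one zero-padded frame number per line, so loading a split is a one-liner (a minimal sketch; the path below is the repo location mentioned above):

```python
def read_split(path):
    """Read a KITTI ImageSet split file (e.g. train.txt or val.txt) and
    return its frame numbers as a list of strings like '000123'."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]
```

Usage would look like `frames = read_split("CLOCs/second/data/ImageSet/train.txt")`, after which `len(frames)` should be 3712 for the mini-training set.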

shaunkheng97 (Author)

Alright. I'll attempt to train YOLOv4 on the 3712-frame mini-training set first, and will get back to CLOCs soon!

FaFaLiu commented Apr 24, 2021

Did you succeed? I also want to do this.
