
Some ques. #6

Closed

huixiancheng opened this issue Nov 15, 2021 · 4 comments

Comments

@huixiancheng
Contributor

Hi, Dr. Zhao. I have some questions about the details and look forward to your answers.

  1. I generated and submitted test-sequence predictions from your pre-trained model to the official CodaLab server. The result is shown below and is close to what I expected; in my past submissions, the test split usually scores a slightly lower mIoU than the validation split. So I would like to know whether the 60 mIoU reported in your paper was obtained with TTA (test-time augmentation, fine-tuning on the validation set, etc.). Also, could you provide the detailed per-class IoU for that 60 mIoU result?
    ********************************************************************************
    INTERFACE:
    Data:  /tmp/codalab/tmpjFJIFr/run/input/ref
    Predictions:  /tmp/codalab/tmpjFJIFr/run/input/res
    Backend:  numpy
    Split:  test
    Config:  /tmp/codalab/tmpjFJIFr/run/program/semantic-kitti.yaml
    Limit:  None
    Codalab:  /tmp/codalab/tmpjFJIFr/run/output
    ********************************************************************************
    Opening data config file /tmp/codalab/tmpjFJIFr/run/program/semantic-kitti.yaml
    Ignoring xentropy class  0  in IoU evaluation
    [IOU EVAL] IGNORE:  [0]
    [IOU EVAL] INCLUDE:  [ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]
    Evaluating sequences: 10% 20% 30% 40% 50% 60% 70% 80% 90% Validation set:
    Acc avg 0.896
    IoU avg 0.586
    IoU class 1 [car] = 0.930
    IoU class 2 [bicycle] = 0.457
    IoU class 3 [motorcycle] = 0.420
    IoU class 4 [truck] = 0.279
    IoU class 5 [other-vehicle] = 0.326
    IoU class 6 [person] = 0.626
    IoU class 7 [bicyclist] = 0.581
    IoU class 8 [motorcyclist] = 0.305
    IoU class 9 [road] = 0.908
    IoU class 10 [parking] = 0.583
    IoU class 11 [sidewalk] = 0.749
    IoU class 12 [other-ground] = 0.201
    IoU class 13 [building] = 0.885
    IoU class 14 [fence] = 0.595
    IoU class 15 [vegetation] = 0.831
    IoU class 16 [trunk] = 0.643
    IoU class 17 [terrain] = 0.678
    IoU class 18 [pole] = 0.526
    IoU class 19 [traffic-sign] = 0.600
    ********************************************************************************
    below can be copied straight for paper table
    0.930,0.457,0.420,0.279,0.326,0.626,0.581,0.305,0.908,0.583,0.749,0.201,0.885,0.595,0.831,0.643,0.678,0.526,0.600
  2. In addition, I still have doubts about the FPS tests on SemanticKITTI across other works, since there is no uniform and fair comparison (same hardware). So I would like to confirm whether, in the code here, it is necessary to call torch.cuda.synchronize() around the timed section, like this:

     torch.cuda.synchronize()
     a = time.time()
     with torch.cuda.amp.autocast(enabled=args.if_mixture):
         semantic_output = model(input_tensor)
     torch.cuda.synchronize()
     b = time.time()

     Just like in RangeNet++:
     https://github.com/PRBonn/lidar-bonnetal/blob/5a5f4b180117b08879ec97a3a05a3838bce6bb0f/train/tasks/semantic/modules/user.py#L136-L141
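The measurement pattern above depends on flushing the CUDA queue before reading each timestamp, since kernel launches return immediately. As a sketch of that logic in pure Python (no torch dependency; the `sync` callback and `time_fn` helper are illustrative, not from this repository, and `sync` stands in for `torch.cuda.synchronize`):

```python
import time
from statistics import mean, stdev

def time_fn(fn, n_frames, sync=None):
    """Time fn() over n_frames runs, calling sync() before and after each
    run so that asynchronously launched GPU work is fully drained."""
    times = []
    for _ in range(n_frames):
        if sync:
            sync()  # drain pending GPU work before starting the clock
        start = time.time()
        fn()
        if sync:
            sync()  # wait for the timed work itself to finish
        times.append(time.time() - start)
    return mean(times), stdev(times)

# Usage with a dummy CPU workload; with PyTorch one would pass
# fn=lambda: model(input_tensor) and sync=torch.cuda.synchronize.
m, s = time_fn(lambda: sum(range(10_000)), n_frames=10)
```

Without the sync calls, `time.time()` only brackets the kernel *launches*, so the measured time can be far smaller than the real inference latency.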

Looking forward to your reply again.

@placeforyiming
Owner

placeforyiming commented Nov 24, 2021


  1. The test result reported in the paper was obtained using only the training set, with a slightly different but heavier model that included the ASPP module and took input without normals. I modified the model to make it faster and more stable with similar performance (58.6 vs. 59.5), without fine-tuning or cherry-picking on the validation set. I think some refinements could push this code to better performance, so I believe the current version is the better solution.

  2. I don't know what you mean by torch.cuda.synchronize(). This code supports multi-GPU training, but all the current results and settings were obtained on a single 2080 Ti.

@huixiancheng
Contributor Author

Thank you very much for your reply.
For 1: I would also like to know whether the model in the original paper version corresponds to ResNet34_aspp_1 or ResNet34_aspp_2.
For 2: Although I don't fully understand the rationale behind it, many sources give a similar explanation (CUDA calls are asynchronous, so data transfer and processing run in the background): https://discuss.pytorch.org/t/how-to-measure-execution-time-in-pytorch/111458 and https://discuss.pytorch.org/t/doing-qr-decomposition-on-gpu-is-much-slower-than-on-cpu/21213/6.
In any case, here is the speed from my test of 100 frames on an RTX 3080 with torch.cuda.synchronize() and AMP (this is also not entirely rigorous, since the GPU may not have been warmed up first):

Mean CNN inference time:0.029579203128814697     std:0.0035621014516948072
Mean NLA inference time:0.00027779102325439456   std:2.5095342563410048e-05
Total Frames:100
Finished Infering
Mean KNN inference time:0.0018239092826843261    std:0.002006522343807084
Total Frames:100
Finished Infering

Despite the difference in CNN inference speed, your proposed NLA algorithm is simpler and more efficient than KNN. Great work! 👍👍
Also, if you agree that synchronization is needed, could you kindly report speed-test results on the 2080 Ti when you have some free time? (I couldn't find a 2080 Ti, so I chose a close substitute, the RTX 3080.)

@placeforyiming
Owner


  1. I can't remember the details; the training scheduler was also slightly different from the current version. If you want the exact numbers, you can find them on the SemanticKITTI leaderboard, ranked around 70 now under the name Yiming. If you want higher performance, you can train with the current framework for more epochs or with a CosineAnnealing scheduler, as in SalsaNext. The provided checkpoint is a direct outcome without any extra tricks.

  2. I'm moving across countries and don't have a desktop to run any code right now.

  3. This code uses half-precision floating point via AMP. I usually skip the first several frames when measuring inference time, to avoid counting the one-off cost of launching the operators.
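The warm-up point above can be made concrete: drop the first few timed frames before computing statistics, since they include one-off operator-launch and warm-up costs. A minimal sketch (pure Python; `summarize_times` and `n_warmup` are illustrative names, not from the repository):

```python
from statistics import mean, stdev

def summarize_times(frame_times, n_warmup=5):
    """Drop the first n_warmup measurements (operator launch / warm-up
    overhead) and report mean and std over the remaining frames."""
    steady = frame_times[n_warmup:]
    if len(steady) < 2:
        raise ValueError("need at least 2 post-warm-up frames")
    return mean(steady), stdev(steady)

# First frames dominated by warm-up, rest steady:
times = [0.30, 0.12, 0.031, 0.030, 0.029, 0.030, 0.031, 0.030]
m, s = summarize_times(times, n_warmup=3)
```

Including the warm-up frames in the example above would roughly double the reported mean, which is exactly the distortion skipping them is meant to avoid.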

@huixiancheng
Contributor Author

Thanks! 🙇
