
A few questions about your work #34

Closed
Bin-ze opened this issue Dec 7, 2022 · 1 comment


Bin-ze commented Dec 7, 2022

Thank you for your excellent work! I have a few questions.

  1. How is the synthetic data obtained? I noticed that the synthetic data uses Cityscapes-like classes; how is the ground truth in the BEV perspective synthesized? I would like to use this generation method to synthesize indoor-scene data, so could you elaborate?
  2. Regarding Section III-C of the paper (Single-Input Model): is the input to this part the projection, via IPM, of the inference results of a trained segmentation algorithm, so that the function of the network here is to correct the errors caused by IPM? The application pipeline would then be: input image → segmentation result → IPM → single-input model → final result. Is my understanding correct?
  3. In Section III-D (Multi-Input Model), do you integrate IPM into the network and perform end-to-end segmentation in the BEV perspective? But what if there is no segmentation ground truth in the BEV perspective? For example, if I only have segmentation annotations in the perspective view, does that mean I cannot use this method at all? How would I apply it to real scenarios?
  4. Following from the above, is it correct to understand that the key to the algorithm lies in generating segmentation annotations in the BEV perspective, i.e., going back to the first question, that the most important part is the data synthesis? I am also curious: if I only have a self-labeled 2D indoor-scene dataset, how can I extend your method to the BEV perspective?

Looking forward to your reply!

lreiher (Member) commented Dec 10, 2022

  1. As stated in the paper, the synthetic training data was generated with a simulation tool called Virtual Test Drive (VTD). The ground truth therefore is directly obtained from the simulation. The semantic class coloring palette is akin to Cityscapes.
  2. Yes, that's correct (see the sketch after this list). Note however that the results in the paper were obtained with perfectly segmented input images from simulation. In practice, you would have to run a dedicated segmentation model first, as we also did in Section V.B.
  3. Yes, in III.D, IPM is basically integrated into the network. The input to the network still consists of semantically segmented images, in order to decrease the domain gap between synthetic training data and real-world data.
  4. You need to have ground truth data at hand, i.e. semantic segmentation in BEV. Our approach is to generate synthetic training data using simulation tools such as VTD or CARLA, giving us the ground truth in BEV basically for free. By running on semantically segmented input images, we hope to decrease the domain gap between simulation and the real world, so that a trained model can successfully be applied in the real world as well. In your specific case, you might also want to consider generating synthetic datasets, or alternatively you could first check whether standard IPM already gives you usable results (see our ipm.py).
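
For illustration, here is a minimal sketch of the single-input pipeline confirmed in item 2: segment the camera image, warp the segmentation to BEV with a plain homography (the flat-world IPM assumption), and let the single-input model correct the remaining distortions. The `segmentation_model`, `single_input_model`, and homography `H` below are placeholders, not code from this repository; the repository's actual IPM implementation is the `ipm.py` script referenced above.

```python
# Hedged sketch of the single-input pipeline, assuming placeholder Keras-style models:
# camera image -> semantic segmentation -> IPM (homography warp) -> single-input model -> BEV map.
import cv2
import numpy as np

def camera_to_bev(image, segmentation_model, single_input_model, H, bev_size=(512, 512)):
    """H is a 3x3 homography derived from camera intrinsics/extrinsics (cf. ipm.py)."""
    # 1. Semantic segmentation in the perspective (camera) view,
    #    assumed to return a color-coded segmentation image of shape (height, width, 3).
    seg = segmentation_model.predict(image[np.newaxis, ...])[0]

    # 2. Inverse Perspective Mapping: warp the segmentation onto the ground plane.
    #    Errors appear wherever the flat-world assumption is violated (e.g. at cars, walls).
    seg_ipm = cv2.warpPerspective(seg, H, bev_size)

    # 3. The single-input model corrects those errors and yields the final BEV segmentation.
    bev = single_input_model.predict(seg_ipm[np.newaxis, ...])[0]
    return bev
```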

Bin-ze closed this as completed Dec 17, 2022