Thank you for your excellent work! I have some questions to ask.
I would like to know how the synthetic data was obtained. I noticed that it follows the Cityscapes convention; how is the ground truth from the BEV perspective synthesized? I would like to use this generation method to synthesize indoor scene data. Could you elaborate?
Regarding Section III-C of the paper (Single-Input Model): is the input to this part the projection, via IPM, of the inference results of a trained segmentation model? So the role of the network here is to correct the errors caused by IPM? The application pipeline would then be:
input image → segmentation result → IPM → single-input model → final result. Is my understanding correct?
Regarding Section III-D (Multi-Input Model): do you integrate IPM into the network and perform end-to-end segmentation in BEV? But what if there is no segmentation ground truth in BEV? How should this be handled? For example, if I only have segmentation annotations in the perspective view, does that mean I cannot use this method at all? How would I apply it to real scenarios?
Following from the questions above: can I conclude that the key to the algorithm lies in generating segmentation annotations from the BEV perspective? Going back to the first question, that would mean the most important part is data synthesis. Is my understanding correct? I am curious: if I only have a self-labeled 2D indoor scene dataset, how can I extend your method to the BEV perspective?
Looking forward to your reply!
As stated in the paper, the synthetic training data was generated with a simulation tool called Virtual Test Drive (VTD). The ground truth therefore is directly obtained from the simulation. The semantic class coloring palette is akin to Cityscapes.
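Such color-coded ground truth is typically converted into per-class training targets before feeding a segmentation network. A minimal sketch (the three palette entries are from the official Cityscapes colors; the function itself is our illustration, not code from this repo):

```python
import numpy as np

# Three entries of the official Cityscapes color palette (RGB); the simulation's
# ground truth uses the same coloring convention. The full palette has ~30 classes.
PALETTE = {
    "road": (128, 64, 128),
    "sidewalk": (244, 35, 232),
    "car": (0, 0, 142),
}

def color_to_one_hot(label_img, palette=PALETTE):
    """Convert a color-coded label image (H, W, 3) into one-hot targets (H, W, C)."""
    masks = [np.all(label_img == np.array(color), axis=-1) for color in palette.values()]
    return np.stack(masks, axis=-1).astype(np.float32)

# Tiny example: one "road" pixel, the rest unlabeled background.
label = np.zeros((2, 2, 3), dtype=np.uint8)
label[0, 0] = (128, 64, 128)
one_hot = color_to_one_hot(label)
```

Pixels whose color matches no palette entry simply get an all-zero target vector here; a real pipeline would map them to a dedicated background/void class.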
Yes, that's correct. Note however that the results in the paper were obtained with perfectly segmented input images from simulation. In practice, you would have to run a dedicated segmentation model first, as we also did in section V.B.
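The inference pipeline discussed above can be written down as a simple composition (all function names here are hypothetical placeholders, not the repo's API; this only encodes the order of operations):

```python
# Hypothetical sketch of the single-input pipeline: segmentation -> IPM -> correction.
def predict_bev(image, segment, ipm, refine):
    seg = segment(image)        # 1. semantic segmentation in the perspective view
    bev_naive = ipm(seg)        # 2. classical IPM projection (flat-world artifacts)
    return refine(bev_naive)    # 3. single-input model corrects the IPM errors

# Stand-in callables just to show the data flow end to end:
result = predict_bev(3, lambda x: x + 1, lambda x: x * 2, lambda x: x - 1)
```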
Yes, in III.D, IPM is basically integrated into the network. The input to the network is still semantically segmented images in order to decrease the domain gap between synthetic training data and real-world data.
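To illustrate what "integrating IPM into the network" can look like in principle (this is our simplification, not the paper's exact layer): because the camera geometry is fixed, the source pixel that feeds each BEV cell can be precomputed once, and the projection reduces to a gather over the feature map, through which gradients flow into the feature values.

```python
import numpy as np

# Sketch of IPM as a fixed "projection layer". The index maps below are made up
# for illustration; in a real model they would come from the IPM homography.
def make_projection_layer(src_v, src_u):
    def layer(features):                 # features: (H, W, C) class-score map
        return features[src_v, src_u]    # output: (H_bev, W_bev, C)
    return layer

src_v = np.array([[0, 1], [2, 3]])       # row index of the source pixel per BEV cell
src_u = np.array([[1, 1], [0, 2]])       # column index per BEV cell
layer = make_projection_layer(src_v, src_u)

feat = np.arange(4 * 4 * 3).reshape(4, 4, 3)
bev = layer(feat)                        # shape: (2, 2, 3)
```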
You need to have ground truth data at hand, i.e. semantic segmentation in BEV. Our approach is to generate synthetic training data using simulation tools such as VTD or CARLA, which gives us the BEV ground truth basically for free. By running on semantically segmented input images, we hope to reduce the domain gap between simulation and the real world, such that a trained model can be successfully applied in the real world as well. In your specific case, you might also want to consider generating a synthetic dataset, or you could first check whether standard IPM already gives you usable results (see our ipm.py).
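For reference, classical IPM boils down to a planar homography under the flat-world assumption. Below is a minimal, self-contained sketch with a toy top-down camera (the parameter values and function names are ours, not the calibration or API used by ipm.py in this repo):

```python
import numpy as np

def ipm_homography(K, R, t):
    """Homography mapping ground-plane points (X, Y, 1), i.e. z = 0, to image pixels.
    K: 3x3 intrinsics, R: 3x3 world-to-camera rotation, t: translation (t = -R @ C)."""
    return K @ np.column_stack((R[:, 0], R[:, 1], t))

def warp_to_bev(seg, H, bev_shape, x_range, y_range):
    """Nearest-neighbor inverse warp: project each BEV cell's ground-plane
    coordinate into the image and copy the class label found there."""
    h_bev, w_bev = bev_shape
    X, Y = np.meshgrid(np.linspace(*x_range, w_bev), np.linspace(*y_range, h_bev))
    ground = np.stack([X.ravel(), Y.ravel(), np.ones(X.size)])
    px = H @ ground
    u = np.rint(px[0] / px[2]).astype(int)
    v = np.rint(px[1] / px[2]).astype(int)
    valid = (px[2] > 0) & (u >= 0) & (u < seg.shape[1]) & (v >= 0) & (v < seg.shape[0])
    bev = np.zeros(h_bev * w_bev, dtype=seg.dtype)
    bev[valid] = seg[v[valid], u[valid]]
    return bev.reshape(h_bev, w_bev)

# Toy setup: camera 10 m above the ground, looking straight down.
K = np.array([[100.0, 0, 64], [0, 100, 64], [0, 0, 1]])
R = np.array([[0.0, 1, 0], [1, 0, 0], [0, 0, -1]])   # world-to-camera rotation
t = np.array([0.0, 0, 10.0])
H = ipm_homography(K, R, t)

seg = np.full((128, 128), 7, dtype=np.uint8)          # dummy single-class segmentation
bev = warp_to_bev(seg, H, (32, 32), (0.5, 1.5), (0.5, 1.5))
```

Everything the flat-world assumption gets wrong (vehicles, walls, anything with height) is exactly what the learned models in the paper are meant to correct.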