Can the scale ambiguity be resolved with a multi-frame approach? #59

JinraeKim opened this issue Sep 2, 2022 · 1 comment

@JinraeKim

Hi, I'm trying to implement self-supervised depth estimation for robotic applications and have just started studying this area.
Thank you in advance for the impressive work!

My concern is that if one uses only RGB images, metric scale awareness is impossible in principle (even with a multi-frame approach); the predictions would still be scale-consistent, though.
Many works therefore use "median scaling" against the ground-truth depth map (see the sketch below), but that does not make sense for deployment, as the true depth would not be accessible.
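For concreteness, the median-scaling evaluation convention I mean is usually implemented roughly like this (a minimal NumPy sketch; the function and variable names are illustrative, and the validity mask typically excludes pixels without ground truth):

```python
import numpy as np

def median_scale(pred_depth, gt_depth, valid_mask):
    """Rescale a predicted depth map so its median matches the ground truth.

    Standard evaluation trick for scale-ambiguous monocular methods. Note
    that it requires ground-truth depth, which is exactly what is missing
    at deployment time.
    """
    ratio = np.median(gt_depth[valid_mask]) / np.median(pred_depth[valid_mask])
    return pred_depth * ratio
```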

In this case, what would be the best practice for a realistic application?

@JamieWatson683
Collaborator

Hi - thanks for your interest, and sorry for the delay in getting back to you!

It's a good question, and this is an active area of research.

There are two approaches that I can think of (although there are probably many more!):

  • Use a device with an IMU and run a visual-inertial SLAM system to compute metrically scaled poses, e.g. an iPhone running ARKit.
  • In autonomous driving (or potentially robotics), where the camera is mounted at a fixed, known height above the ground plane, you have a constraint that can be used to recover real-world scale. A few papers use this method to obtain ground-truth scale when training from video on KITTI; see the sketch after this list.
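To make the second idea concrete, here is a minimal sketch of recovering a global metric scale from a known camera mounting height. This is my own illustration under assumed conventions, not code from any specific paper: the height constant, the camera-coordinate convention, and the assumption that the input points come from road pixels are all illustrative.

```python
import numpy as np

CAMERA_HEIGHT_M = 1.65  # assumed known mounting height above the road

def scale_from_ground_plane(ground_points):
    """Estimate a metric scale factor for an up-to-scale point cloud.

    ground_points: (N, 3) back-projected points in camera coordinates
    (x right, y down, z forward), taken from pixels assumed to lie on
    the road surface.
    """
    # Fit a plane through the points: its unit normal is the right
    # singular vector with the smallest singular value of the centred
    # point matrix.
    centroid = ground_points.mean(axis=0)
    _, _, vt = np.linalg.svd(ground_points - centroid)
    normal = vt[-1]
    # Distance from the camera centre (the origin) to the fitted plane,
    # i.e. the camera height in the network's arbitrary units.
    est_height = abs(normal @ centroid)
    # Multiplying the predicted depths by this factor makes them metric.
    return CAMERA_HEIGHT_M / est_height
```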

Hope this helps!
