What is the use of 1.312 on line 47 in DistDepth/utils.py? #27

Open
ch654833197 opened this issue Nov 12, 2023 · 3 comments

@ch654833197

What is the use of 1.312 on line 47 in DistDepth/utils.py? Why is the scale factor here calculated like this, without using stereo matching? Can you explain the principle? Thanks!

@choyingw
Collaborator

The term is associated with the 13.12 cm stereo baseline used to render the SimSIN dataset for training. The unit of stereo_baseline used during training is 10 cm (stereo_T[0, 3] = side_sign * baseline_sign * 0.1), so we multiply by 1.312 to match the scale learned when training on SimSIN. The logic is the same as STEREO_SCALE_FACTOR in the MonoDepth2 test script (https://github.com/nianticlabs/monodepth2/blob/master/test_simple.py).
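
A minimal sketch of that conversion, assuming MonoDepth2-style depth bounds of 0.1 m and 10 m (the function name and default values here are illustrative, not the repository's exact code):

# Illustrative sketch: map a network output in [0, 1] to metric depth.
# The scale is learned under the 0.1 m baseline set in stereo_T, while
# SimSIN was rendered with a 0.1312 m baseline, hence the 1.312 factor.

SIMSIN_BASELINE = 0.1312    # baseline used to render SimSIN (metres)
TRAIN_BASELINE = 0.1        # baseline set in stereo_T during training (metres)
SCALE_FACTOR = SIMSIN_BASELINE / TRAIN_BASELINE   # = 1.312

def level_to_metric_depth(level, min_depth=0.1, max_depth=10.0):
    """Convert network output `level` in [0, 1] to metric depth in metres."""
    min_out = 1.0 / max_depth                            # smallest inverse depth
    max_out = 1.0 / min_depth                            # largest inverse depth
    scaled_out = min_out + (max_out - min_out) * level   # inverse depth at the learned (10 cm) scale
    return SCALE_FACTOR / scaled_out                     # depth consistent with the 13.12 cm baseline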

@ch654833197
Author


Thank you very much for your reply! Can I understand it as the formula Depth = Baseline * focal / disparity being applied in this part of the code? Baseline corresponds to 1.312, and scaled_out is equivalent to disparity/focal? Is the output of the network actually a disparity? For example, is normalizing to [0, 10] here equivalent to doing a focal-length transformation? In supervised training, what would the result be if normalization were not performed here?

min_out = 1 / max_depth
max_out = 1 / min_depth
scaled_out = min_out + (max_out - min_out) * level
depth = 1.312 / scaled_out

@choyingw
Collaborator

One view is, as you said, that scaled_out is disparity/focal (an inverse depth) under the 10 cm baseline, so we just scale up for the wider actual baseline.

Another view: it is a ratio problem. Suppose a network learns depth from the intrinsic/extrinsic projection under an assumed baseline of 10 cm (and since training is self-supervised, no ground truth is provided to correct the scale), while the actual baseline is 13.12 cm. From the equation depth = baseline * focal / disparity, for the same image with the same focal length and disparity, depth is proportional to baseline, so the actual depth is the prediction multiplied by 1.312.
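
A tiny numeric check of that ratio argument (the focal length and disparity values below are made up, purely for illustration):

# Illustrative numbers only: depth = baseline * focal / disparity
focal = 500.0      # focal length in pixels (made-up value)
disparity = 20.0   # disparity in pixels (made-up value)

depth_assumed = 0.10 * focal / disparity    # 2.50 m under the assumed 10 cm baseline
depth_actual = 0.1312 * focal / disparity   # 3.28 m under the true 13.12 cm baseline

print(depth_actual / depth_assumed)         # 1.312, the factor applied in utils.py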

Normalizing to [0, 10] gives a range prior on the depth in the views, which can actually cause a slight scale mismatch with the test scene, the same issue seen in the MonoDepth works. In supervised learning you have full metric supervision and do not need this scaling; even if you learn in inverse depth, you still get metric inverse depth.
