question about the Pre-processing #1

Closed
Robertwyq opened this issue Aug 17, 2021 · 11 comments

@Robertwyq

Can you provide the code for the preprocessing part? For dynamic videos, how do you get accurate camera poses and the intrinsics K? I see you use DAVIS as an example; I want to know how to handle the other videos in this dataset.

@ztzhang
Contributor

ztzhang commented Aug 17, 2021

Hi,

Most of the preprocessing was done using internal tools at Google, but as a rule of thumb it usually works well to use Mask R-CNN to segment the foreground object, and then run COLMAP with the segmented masks.

For the DAVIS example, most of the calibration was done using the camera calibration tool in Nuke, with some manual selection of detected keypoints.
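A minimal sketch of the Mask R-CNN + COLMAP workflow described above, assuming torchvision's off-the-shelf detector, a 0.7 score threshold, and `frames/` / `masks/` paths (all assumptions, not the repo's actual pipeline). COLMAP skips feature extraction where the mask is black, so the detected (presumably dynamic) instances are painted black:

```python
# Hypothetical sketch: mask dynamic foreground with Mask R-CNN, then run COLMAP.
import subprocess
from pathlib import Path

import numpy as np
import torch
import torchvision
from PIL import Image

IMAGE_DIR, MASK_DIR = Path("frames"), Path("masks")  # assumed layout
MASK_DIR.mkdir(exist_ok=True)

model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT").eval()

for frame in sorted(IMAGE_DIR.glob("*.jpg")):
    img = torchvision.io.read_image(str(frame)).float() / 255.0
    with torch.no_grad():
        pred = model([img])[0]
    # Union of confident instance masks = dynamic foreground.
    keep = pred["scores"] > 0.7
    fg = (pred["masks"][keep, 0] > 0.5).any(dim=0).numpy()
    # COLMAP ignores features where the mask is 0: foreground black, background white.
    mask = np.where(fg, 0, 255).astype(np.uint8)
    # COLMAP expects the mask name to be the image filename plus ".png".
    Image.fromarray(mask).save(MASK_DIR / (frame.name + ".png"))

# Sparse reconstruction using only background features.
subprocess.run(["colmap", "feature_extractor",
                "--database_path", "db.db",
                "--image_path", str(IMAGE_DIR),
                "--ImageReader.mask_path", str(MASK_DIR)], check=True)
subprocess.run(["colmap", "exhaustive_matcher", "--database_path", "db.db"], check=True)
Path("sparse").mkdir(exist_ok=True)
subprocess.run(["colmap", "mapper",
                "--database_path", "db.db",
                "--image_path", str(IMAGE_DIR),
                "--output_path", "sparse"], check=True)
```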

@Robertwyq
Author

Thank you for your reply. I noticed that you mention using ORB-SLAM2 and COLMAP to produce camera pose estimates in your paper. I wonder whether that approach can succeed without masks or manual selection when there are only a small number of moving foreground objects.

@ztzhang
Contributor

ztzhang commented Aug 18, 2021

ORB-SLAM typically works on reasonably dynamic scenes, but it does require camera intrinsics. I think you can assume a reasonable focal length, and pass the keypoints and camera calibration to COLMAP for further optimization.
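A minimal sketch of seeding COLMAP with a guessed focal length instead of EXIF data. The 60° horizontal field of view and frame size are assumptions; bundle adjustment is left free to refine the guess:

```python
# Hypothetical sketch: derive a focal-length guess from an assumed FOV,
# fix it as the initial intrinsics, and let COLMAP's BA refine it.
import math
import subprocess

W, H = 1920, 1080   # frame size (assumed)
fov_deg = 60.0      # assumed horizontal field of view
f = 0.5 * W / math.tan(math.radians(fov_deg) / 2.0)

subprocess.run(["colmap", "feature_extractor",
                "--database_path", "db.db",
                "--image_path", "frames",
                "--ImageReader.single_camera", "1",
                "--ImageReader.camera_model", "SIMPLE_PINHOLE",
                "--ImageReader.camera_params", f"{f},{W / 2},{H / 2}"], check=True)
# ba_refine_focal_length=1 lets bundle adjustment correct the guess
# (set it to 0 to keep the assumed focal length fixed instead).
subprocess.run(["colmap", "mapper",
                "--database_path", "db.db",
                "--image_path", "frames",
                "--output_path", "sparse",
                "--Mapper.ba_refine_focal_length", "1"], check=True)
```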

@Robertwyq
Author

Thank you very much 😁

@Robertwyq reopened this Aug 26, 2021
@Robertwyq
Author

I want to confirm the pose format required for preprocessing. Is the pose coordinate system consistent with COLMAP's, i.e., the world-to-camera transformation Tcw?

@ztzhang
Contributor

ztzhang commented Aug 26, 2021

Hi, we assume an x-right, y-down image coordinate system with the origin at the top left, and the pose matrices in the npz files are camera-to-world transformations.
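For reference, a small sketch (a hypothetical helper, not the repo's code) converting COLMAP's world-to-camera pose into the camera-to-world matrices described above:

```python
# COLMAP stores world-to-camera (R, t); the npz files store camera-to-world 4x4s.
import numpy as np

def w2c_to_c2w(R_w2c: np.ndarray, t_w2c: np.ndarray) -> np.ndarray:
    """Invert a world-to-camera pose (COLMAP's Tcw) into camera-to-world."""
    c2w = np.eye(4)
    c2w[:3, :3] = R_w2c.T
    c2w[:3, 3] = -R_w2c.T @ t_w2c  # camera center in world coordinates
    return c2w
```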

@Robertwyq
Author

Robertwyq commented Aug 27, 2021

One additional finding: does the text overlay in the output video swap "refine" and "initial"? It looks like the first one is the initial depth and the second one is the refined depth.

@ztzhang
Contributor

ztzhang commented Aug 27, 2021

I think the order is correct; the initial depth will flicker due to temporal inconsistency.

The refined depth might suffer some loss of detail due to flow inaccuracies on fast-moving or thin structures.

@Robertwyq
Author

Thank you for your reply. When I tested some videos of road scenes, I found that the refinement blurs a lot of distant detail, even though the network's initial estimate still preserves it. I suspect this is mainly due to the influence of the flow information. Do you have any suggestions on adjusting the network parameters?

@ztzhang
Contributor

ztzhang commented Aug 31, 2021

Hi, since our method takes optical flow and camera poses as geometric cues, objects farther away in the scene need more accurate flow and larger baselines. The single-image depth maps are trained in a supervised way and are therefore agnostic to such issues, but they are not temporally consistent.
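A back-of-the-envelope illustration of why this happens: with focal length f (in pixels) and baseline b, a triangulated depth Z = f·b/d has error roughly Z²·σ/(f·b) for a flow error of σ pixels, so the error grows quadratically with distance. The numbers below are assumptions, not measurements:

```python
# Triangulation error vs. distance, for an assumed flow error of 0.5 px.
f = 1000.0   # focal length in pixels (assumed)
b = 0.1      # effective baseline between frames in meters (assumed)
sigma = 0.5  # optical-flow error in pixels (assumed)

for Z in (2.0, 10.0, 50.0):
    # Z = f*b/d  =>  dZ ≈ Z^2 * sigma / (f * b)
    print(f"depth {Z:5.1f} m -> error ~ {Z**2 * sigma / (f * b):6.2f} m")
```

With these numbers the depth error goes from ~0.02 m at 2 m to ~12.5 m at 50 m, which is why distant road-scene structures are the first to lose detail.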

@Robertwyq
Author

Thank you for your reply.
