
Custom dataset with poses and intrinsics included -> NSVF Format #10

Open
phelps-matthew opened this issue Dec 15, 2021 · 7 comments

@phelps-matthew

I have a large dataset comprising renders of a single object taken over a fairly dense sampling of poses (rotations and translations). I also have the camera intrinsics and distortion coefficients (though it looks like these are usually not incorporated in most radiance field work?).

I was hoping you might be able to lend some guidance on how I can use this supplemental information to form a dataset that is compatible with svox2. Specifically, do you have any tips on how I might leverage colmap and colmap2nsvf.py? When running proc_colmap.sh on a directory of raw images, I see it produces its own pose.txt estimates, database.db, and points.npy and appears to sample only a subset of the given images. Are there any modifications I should be making that are immediately evident to you?

Any help is greatly appreciated!

@sxyu (Owner)

sxyu commented Dec 15, 2021

Hi, thanks for the question.

If you want to use your own camera poses, you will have to convert them into our NSVF-based format, which is fairly simple anyway (see below). Besides proc_colmap.sh there is also proc_record3d.py, which converts captures from the iPhone app Record3D to our format; it may be a helpful example.

Currently svox2 itself only supports the pinhole model (fx/fy/cx/cy).
The run_colmap.py script (called by proc_colmap.sh) does estimate radial distortion parameters with COLMAP by default, but it then undistorts the images. For simplicity, you can also use OpenCV to undistort your own images.

Format reference:

intrinsics.txt: 4x4 matrix,

fx 0 0 cx 
0 fy 0 cy
0 0 1 0
0 0 0 1

images/ or rgb/: images (*.png or *.jpg)
pose/: 4x4 c2w pose matrix for each image (*.txt), OpenCV convention
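Writing this format out takes only a few lines of numpy; a sketch with placeholder intrinsics and pose values (note it uses the conventional pinhole layout with cx, cy in the third column -- double-check against what the loader actually parses):

```python
import os
import numpy as np

fx, fy, cx, cy = 500.0, 500.0, 320.0, 240.0  # placeholder intrinsics

os.makedirs("my_dataset/pose", exist_ok=True)

# intrinsics.txt: a single 4x4 matrix.
intrinsics = np.array([[fx, 0.0,  cx, 0.0],
                       [0.0,  fy, cy, 0.0],
                       [0.0, 0.0, 1.0, 0.0],
                       [0.0, 0.0, 0.0, 1.0]])
np.savetxt("my_dataset/intrinsics.txt", intrinsics)

# pose/<name>.txt: one 4x4 camera-to-world matrix per image (OpenCV convention).
c2w = np.eye(4)  # placeholder identity pose
np.savetxt("my_dataset/pose/0_000001.txt", c2w)
```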

@phelps-matthew (Author)

phelps-matthew commented Dec 15, 2021

Thank you kindly! I may try undistorting all my images, though the distortion coefficients are very small here, so I'm going to ignore them for the moment.

I was able to get the NSVF dataset loader working after formatting my images and poses to the following convention (I had to add a conversion from grayscale to RGB):

<dataset_name>
|-- bbox.txt         # bounding-box file
|-- intrinsics.txt   # 4x4 camera intrinsics
|-- images
    |-- 0_000001.png        # target image for each view
    ...
    |-- 1_000001.png
    ...
|-- pose
    |-- 0_000001.txt        # camera pose for each view (4x4 matrices)
    ...
    |-- 1_000001.txt
    ...

I'll continue training and testing; granted, there are quite a number of hyperparameters to adjust here, but I'm hoping to start seeing the rough formation of my imaged object.

Do you know which convention the rotation matrices should follow for NSVF? I'm having a difficult time determining whether my axes are aligned with its standard. For example, here is my distribution of camera poses:

[image: distribution of camera poses]

@phelps-matthew (Author)

In case someone else finds this helpful: I believe COLMAP follows the convention of a projection matrix transforming 3D world coordinates to camera coordinates. Hence, to go from the pose distribution above (formed from the W -> C transformation) to this one,
[image: corrected distribution of camera poses]

try the following:

import numpy as np

# Given so3, a 3x3 W -> C SO(3) matrix, and r, the translation vector,
# form the correct 4x4 transformation matrix.
#   X_c = R X_w + t          (world to cam, what I had before)
#   X_w = R^T X_c - R^T t    (cam to world, what was needed)
Rt = np.matmul(so3.transpose(), r)
trans = np.vstack((np.hstack((so3.transpose(), -Rt.reshape(-1, 1))), [0, 0, 0, 1]))
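The snippet above can be sanity-checked against a plain 4x4 matrix inverse; a self-contained sketch with a made-up rotation and translation:

```python
import numpy as np

# Made-up world-to-camera pose: a rotation about z plus a translation.
theta = 0.3
so3 = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                [np.sin(theta),  np.cos(theta), 0.0],
                [0.0,            0.0,           1.0]])
r = np.array([1.0, 2.0, 3.0])

# Camera-to-world via X_w = R^T X_c - R^T t.
Rt = np.matmul(so3.transpose(), r)
trans = np.vstack((np.hstack((so3.transpose(), -Rt.reshape(-1, 1))), [0, 0, 0, 1]))

# It should equal the inverse of the 4x4 world-to-camera matrix.
w2c = np.vstack((np.hstack((so3, r.reshape(-1, 1))), [0, 0, 0, 1]))
print(np.allclose(trans, np.linalg.inv(w2c)))  # True
```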

You can then view the result using python view_data.py <data_root>. All one needs is images, poses, and intrinsics following the above format (no bbox.txt or other files are strictly needed).

@phelps-matthew (Author)

images/ or rgb/: images (*.png or *.jpg)
pose/: 4x4 c2w pose matrix for each image (*.txt), OpenCV convention

Apologies, I totally missed this remark! Would have saved myself a headache 😂

@qhdqhd

qhdqhd commented Jan 2, 2022

How can I use views with different intrinsics (images captured by multiple cameras)?

@LinGeLin

LinGeLin commented Jan 3, 2022

What does <CHECKPOINT.npz> <data_dir> mean?

@povolann

povolann commented Jan 2, 2023

I am a little bit confused about the intrinsics matrix; shouldn't it be like this?

fx 0 cx 0
0 fy cy 0
0 0 1 0
0 0 0 1
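For what it's worth, the conventional pinhole matrix does place cx and cy in the third column, as in this comment; a quick numpy check (illustrative values) that a point on the optical axis then projects to the principal point:

```python
import numpy as np

fx, fy, cx, cy = 500.0, 500.0, 320.0, 240.0  # illustrative intrinsics
K = np.array([[fx, 0.0,  cx, 0.0],
              [0.0,  fy, cy, 0.0],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])

# A camera-space point on the optical axis (homogeneous coordinates).
X_c = np.array([0.0, 0.0, 2.0, 1.0])
u, v, w, _ = K @ X_c

# After the perspective divide, it lands exactly on (cx, cy).
print(u / w, v / w)  # 320.0 240.0
```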
