This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

What's basetransf matrix used for? #6

Closed
Qingcsai opened this issue May 5, 2022 · 5 comments

Comments

@Qingcsai

Qingcsai commented May 5, 2022

Hi, I have a small question about applying this code to my own data.
What is the self.basetransf matrix in multiviewvideo.py used for?
I see this 3x4 matrix is applied to all camera poses and all the frametransf, but what is its purpose? :)

mvp/data/multiviewvideo.py, lines 410 to 415 in d758f53:

```python
if "camera" in self.keyfilter:
    result["campos"] = np.dot(self.basetransf[:3, :3].T, self.campos[cam] - self.basetransf[:3, 3])
    result["camrot"] = np.dot(self.basetransf[:3, :3].T, self.camrot[cam].T).T
    result["focal"] = self.focal[cam]
    result["princpt"] = self.princpt[cam]
    result["camindex"] = self.allcameras.index(cam)
```

mvp/data/multiviewvideo.py, lines 385 to 387 in d758f53:

```python
result[k] = to4x4(np.dot(
    np.linalg.inv(to4x4(frametransf)),
    to4x4(self.basetransf))[:3, :4])
```
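If I'm reading this right, it composes the inverse of the per-frame motion with basetransf. A toy check I ran (the matrices and the to4x4 reimplementation here are mine, not real data):

```python
import numpy as np

def to4x4(m):
    # Pad a 3x4 matrix to 4x4 (my reimplementation of the repo helper).
    return np.concatenate([m, np.array([[0.0, 0.0, 0.0, 1.0]])], axis=0)

# Toy matrices (mine, not real data): frametransf is the per-frame motion,
# basetransf recenters the object.
frametransf = np.concatenate([np.eye(3), np.array([[1.0], [0.0], [0.0]])], axis=1)
basetransf = np.concatenate([np.eye(3), np.array([[1.0], [2.0], [3.0]])], axis=1)

# Same composition as the snippet above: undo the frame motion, apply basetransf.
result = to4x4(np.dot(np.linalg.inv(to4x4(frametransf)),
                      to4x4(basetransf))[:3, :4])
print(result[:3, 3])  # translation becomes (0, 2, 3): basetransf shifted back by the frame offset
```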

Besides, I find it necessary to apply this basetransf: when I change it to an identity matrix, training doesn't converge. So how do I get my own basetransf?

Your answer will help me a lot!
Thank you!

@stephenlombardi

Hi,

The purpose of basetransf is to center the object of interest. We need it because the coordinate frame (i.e., the extrinsics) of the cameras may not place the object at the origin of the world. For example, in our camera calibration process one of the cameras is selected as the world origin (0,0,0) after calibration (in other words, one of the cameras is the origin of the coordinate frame). However, the raymarching code shoots rays from the camera and intersects them with the axis-aligned box [-1,1]^3. For that reason, we need basetransf, which transforms the camera locations/orientations into a new coordinate frame where the object the cameras are pointed at sits at the world origin (0,0,0).

There's also a parameter called "volradius", which is the radius of this axis-aligned bounding box [-1,1]^3 in world space. In other words, it scales the coordinate frame down so that the object fits in [-1,1]^3.
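Here's a small sketch of what that recentering does to the camera positions (the numbers and variable names are illustrative, not from the repo):

```python
import numpy as np

# Illustrative setup: two cameras looking at an object centered
# at (10, 0, 5) in the calibration coordinate frame.
object_center = np.array([10.0, 0.0, 5.0])
campos = np.array([[12.0, 0.0, 5.0],
                   [10.0, 2.0, 5.0]])

# basetransf is 3x4: rotation in [:3, :3], translation in [:3, 3].
basetransf = np.concatenate([np.eye(3), object_center[:, None]], axis=1)

# The same transform as in multiviewvideo.py, vectorized over cameras:
# campos' = R^T (campos - t), moving cameras into the object-centered frame.
R, t = basetransf[:3, :3], basetransf[:3, 3]
campos_centered = (campos - t) @ R
print(campos_centered)  # object now sits at the origin; cameras are offsets from it
```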

To determine basetransf for your own data, there are a few things you can try. The simplest would be to average the camera positions and put the average into the last column of basetransf, with the identity matrix along the diagonal, i.e.,
[1., 0., 0., cam_average[0]],
[0., 1., 0., cam_average[1]],
[0., 0., 1., cam_average[2]].
Note that this will only work well if your cameras are located on a sphere and all pointed inward to the same point. In general, you want the last column of basetransf to be the object position in the camera coordinate frame. Let me know if this helps.
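That recipe can be sketched like this (the helper name and the toy camera rig are made up for illustration; it only makes sense under the inward-pointing-sphere assumption above):

```python
import numpy as np

def basetransf_from_campos(campos):
    """Identity rotation, translation = average camera position.
    (basetransf_from_campos is my name, not a function in the repo.)"""
    cam_average = campos.mean(axis=0)
    return np.concatenate([np.eye(3), cam_average[:, None]], axis=1)

# Toy rig: six cameras on a sphere of radius 3 around an object at (1, 2, 3),
# all pointed inward.
center = np.array([1.0, 2.0, 3.0])
offsets = 3.0 * np.array([[1, 0, 0], [-1, 0, 0],
                          [0, 1, 0], [0, -1, 0],
                          [0, 0, 1], [0, 0, -1]], dtype=np.float64)
campos = center + offsets

basetransf = basetransf_from_campos(campos)
print(basetransf[:, 3])  # -> [1. 2. 3.], the object position
```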

@Qingcsai
Author

Qingcsai commented May 6, 2022

Thanks! That helps a lot. My cameras are located on a sphere with about 100+ views, and after translating the scene/object to the world origin (0,0,0), training still doesn't converge, for some other reason; I'm looking into it.

@Qingcsai
Author

Qingcsai commented May 11, 2022

Hi @stephenlombardi,
I'm still having some trouble applying your code to the InterHand dataset; could you help me?
The cameras are located on a sphere with about 100+ views. I set basetransf with the identity matrix along the diagonal, as you said, and the translation to one of the cameras, so that the cameras are pointed at the world origin (0,0,0).

And in the simplest case I use the Neural Volumes setting rather than the MVP setting.

However, I find it converges very slowly during training.

Here is the rendering result after training for 5000 epochs, with batchsize=16 and volradius=512:

render_ROM0_None.mp4

As you can see, the volume is distributed everywhere rather than concentrated in a tight area, which I think is abnormal.

The images are 512*334 in size; am I setting volradius=512 correctly? I ask because the dryice1 data is 667*1024 in size and uses volradius=256.
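For reference, here is how I'm sanity-checking the scale, based on your description of volradius (a rough sketch with made-up numbers, not from the repo): after dividing by volradius, the cameras should land somewhat outside the [-1,1]^3 box, with the object inside it.

```python
import numpy as np

# Made-up numbers: three cameras ~800 world units from the recentered object.
campos_centered = np.array([[800.0, 0.0, 0.0],
                            [0.0, 800.0, 0.0],
                            [-800.0, 0.0, 0.0]])
volradius = 512.0

# Dividing by volradius puts coordinates in the raymarcher's unit-box frame;
# cameras should sit a bit outside [-1, 1]^3, the object inside it.
normalized = campos_centered / volradius
print(np.linalg.norm(normalized, axis=1))  # ~1.56 for each camera
```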

@Qingcsai
Author

Qingcsai commented May 17, 2022

Oh, it's training normally with the NV setting now, after setting the right camera parameters. Closing as resolved.

@TinBacon

TinBacon commented Nov 10, 2022

> Oh, it's training normally with the NV setting now, after setting the right camera parameters. Closing as resolved.

Hello, I have the same problem as you. Could you tell me which camera parameters you set that made everything work?

Thanks very much!
