About the camera intrinsics matrix #4

Closed
OasisYang opened this issue Jul 30, 2021 · 11 comments

@OasisYang

Hi! Thanks for your wonderful dataset!
I have a question about the camera intrinsics matrix. I found that for all the data the principal_point is [0, 0], which is quite rare for real-world cameras. Could you please explain this briefly? Thanks in advance.

@davnov134
Contributor

Hi, we store the intrinsics in the PyTorch3D convention. More info here:
https://pytorch3d.org/docs/cameras
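
For illustration, a minimal sketch of wrapping the stored values in a PyTorch3D camera object, assuming a single viewpoint annotation with the fields used later in this thread (focal_length, principal_point, R, T); it ignores the legacy vs. ndc_isotropic distinction discussed further below:

import torch
from pytorch3d.renderer import PerspectiveCameras

def camera_from_viewpoint(viewpoint):
    # focal_length and principal_point stay in the NDC convention;
    # PyTorch3D cameras interpret them directly, so no conversion to pixels is needed here.
    return PerspectiveCameras(
        focal_length=torch.tensor([viewpoint.focal_length]),
        principal_point=torch.tensor([viewpoint.principal_point]),
        R=torch.tensor([viewpoint.R]),
        T=torch.tensor([viewpoint.T]),
    )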

@OasisYang
Author

Thanks for your reply! I know I need to convert the given principal point to screen space. But what I mean is that, even after converting, the principal point is located exactly at the center of the image, which is not very common. Did you warp the images or apply other preprocessing steps to ensure this?

@davnov134
Contributor

The location of the principal point is decided by the COLMAP image rectification algorithm.
I just checked the raw COLMAP data, and it seems that the COLMAP undistorter also resamples the image so that the principal point coincides exactly with the center of the image.
Thanks for spotting this.

@liuyuan-pal

Thanks for sharing this dataset.

I finally figured out how to convert the annotations to OpenCV-style extrinsics and intrinsics, which may be helpful for others:

import numpy as np

def co3d_annotation_to_opencv_pose(entry):
    p = entry.viewpoint.principal_point
    f = entry.viewpoint.focal_length
    h, w = entry.image.size
    K = np.eye(3)
    s = (min(h, w) - 1) / 2
    K[0, 0] = f[0] * (w - 1) / 2
    K[1, 1] = f[1] * (h - 1) / 2
    K[0, 2] = -p[0] * s + (w - 1) / 2
    K[1, 2] = -p[1] * s + (h - 1) / 2

    R = np.asarray(entry.viewpoint.R).T   # note the transpose here
    T = np.asarray(entry.viewpoint.T)
    pose = np.concatenate([R, T[:, None]], 1)
    pose = np.diag([-1, -1, 1]).astype(np.float32) @ pose  # flip the direction of the x and y axes

    # "pose" is the extrinsic and "K" is the intrinsic
    # pose = [R|t]
    # x_img = K (R @ x_wrd + t)
    # x_img is in pixels
    return K, pose

However, I still have a question about how to convert points from the estimated depth maps into the coordinate system of "pointcloud.ply".
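
For the back-projection part of that question, here is a minimal numpy sketch of the standard pinhole unprojection, assuming K and pose come from the function above and that the depth values are consistent with the annotated cameras (the alignment with pointcloud.ply itself is not verified here):

import numpy as np

def depth_to_world(depth, K, pose):
    # depth: (h, w) depth map in the same units as the camera translation
    # K: 3x3 pixel-space intrinsics; pose: 3x4 world-to-camera [R|t]
    h, w = depth.shape
    v, u = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    pix = np.stack([u, v, np.ones_like(u)], -1).reshape(-1, 3).astype(np.float64)
    # back-project pixels to camera space: x_cam = depth * K^{-1} [u, v, 1]^T
    x_cam = (np.linalg.inv(K) @ pix.T).T * depth.reshape(-1, 1)
    # invert the world-to-camera pose: x_wrd = R^T (x_cam - t)
    R, t = pose[:, :3], pose[:, 3]
    return (x_cam - t) @ R  # same as (R.T @ (x_cam - t).T).T, shape (h*w, 3)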

@MaybeOjbk

(quoting @liuyuan-pal's comment above, including the conversion snippet and the question about pointcloud.ply)

Thanks a lot. Also, we should use the parameters of train_dataset[idx].camera instead of those of entry.viewpoint when we crop the images, because after cropping and resizing, principal_point and focal_length may change.

@MaybeOjbk

MaybeOjbk commented Dec 18, 2021

My test code is here:

import numpy as np

train_dataset = datasets['train']

def co3d_annotation_to_opencv_pose(idx):
    camera = train_dataset[idx].camera
    p = camera.principal_point[0]
    f = camera.focal_length[0]
    R = camera.R[0]
    T = camera.T[0]
    _, h, w = train_dataset[idx].image_rgb.size()
    K = np.eye(3)
    s = (min(h, w) - 1) / 2
    K[0, 0] = f[0] * (w - 1) / 2
    K[1, 1] = f[1] * (h - 1) / 2
    K[0, 2] = -p[0] * s + (w - 1) / 2
    K[1, 2] = -p[1] * s + (h - 1) / 2

    R = np.asarray(R).T   # note the transpose here
    T = np.asarray(T)
    pose = np.concatenate([R, T[:, None]], 1)
    pose = np.diag([-1, -1, 1]).astype(np.float32) @ pose  # flip the direction of the x and y axes
    return K, pose

@shapovalov
Contributor

Please note that the PyTorch3D NDC convention places the −1 and 1 coordinates at the corners of the image, not at the centres of the corner pixels, so you should not subtract 1 from h, w, and min(h, w).

Please try to use the provided data loaders. If they do not fulfil some of your needs, please let us know.

The reference for parsing the viewpoint (applying the crop if needed) is https://github.com/facebookresearch/co3d/blob/main/dataset/co3d_dataset.py#L490 .
For conversion to OpenCV format, PyTorch3D has a function
https://github.com/facebookresearch/pytorch3d/blob/main/pytorch3d/utils/camera_conversions.py#L65
with the actual implementation in
https://github.com/facebookresearch/pytorch3d/blob/main/pytorch3d/renderer/camera_conversions.py#L61 .
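
As a usage sketch of that conversion function, assuming cameras is a PerspectiveCameras batch such as train_dataset[idx].camera and (h, w) is the corresponding image size in pixels:

import torch
from pytorch3d.utils import opencv_from_cameras_projection

# Returns OpenCV-style rotation, translation and pixel-space intrinsics
# for each camera in the batch; image_size is (height, width) per camera.
R_cv, tvec_cv, K_cv = opencv_from_cameras_projection(
    cameras, image_size=torch.tensor([[h, w]])
)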

@Red-Fairy

(quoting @MaybeOjbk's test code above)

Thanks for sharing. I tried to obtain the cam2world matrix by using your code to get the pose and then inverting it. However, I found that the translation part of every cam2world matrix (i.e., [:, 3:]) has a negative third entry, indicating that the object is placed at z < 0, which seems very strange. Did you observe this phenomenon? Thanks in advance.
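
For reference, the inversion step can be written in closed form; a minimal numpy sketch, assuming pose is the 3x4 world-to-camera [R|t] returned above (the third entry of the returned translation column is the quantity being discussed):

import numpy as np

def invert_pose(pose):
    # pose: 3x4 world-to-camera [R|t]; returns the 3x4 camera-to-world [R^T | -R^T t]
    R, t = pose[:, :3], pose[:, 3]
    cam_center = -R.T @ t  # camera position in world coordinates
    return np.concatenate([R.T, cam_center[:, None]], 1)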

@AzmiHaider92

AzmiHaider92 commented Feb 16, 2024

Edited:

In these docs:
https://pytorch3d.org/docs/cameras

the focal lengths are:

s = min(w,h)
K[0, 0] = f[0] * s / 2
K[1, 1] = f[1] * s / 2

And in the code posted above, the focal lengths are:

K[0, 0] = f[0] * w / 2
K[1, 1] = f[1] * h / 2

Which one is correct?

Thanks

@shapovalov
Contributor

At some point after the CO3Dv1 release, PyTorch3D changed its NDC convention; the two conventions differ for non-square images.
If the viewpoint annotation contains an intrinsics_format: ndc_isotropic field, it uses the new format (your former snippet applies); otherwise it uses the legacy format (the latter snippet applies).

See https://github.com/facebookresearch/co3d/blob/main/co3d/dataset/data_types.py#L72

Hope it helps.
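
Putting the two conventions together, a hedged sketch of the NDC-to-pixel conversion keyed on that field (names follow the intrinsics_format field discussed above; this is illustrative, not the reference implementation in the provided data loaders):

import numpy as np

def ndc_intrinsics_to_pixels(f, p, h, w, intrinsics_format):
    # f, p: NDC-space focal length and principal point; h, w: image size in pixels
    K = np.eye(3)
    if intrinsics_format == "ndc_isotropic":
        # new convention: a single isotropic scale, half of the shorter image side
        sx = sy = min(h, w) / 2
    else:
        # legacy convention: per-axis scales
        sx, sy = w / 2, h / 2
    K[0, 0] = f[0] * sx
    K[1, 1] = f[1] * sy
    K[0, 2] = w / 2 - p[0] * sx
    K[1, 2] = h / 2 - p[1] * sy
    return K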

@AzmiHaider92

Yes. This is very helpful.
Thank you so much.
