
Matching the geometry to the input image #22

Closed · jamalknight opened this issue Mar 26, 2020 · 6 comments
Labels: question (Further information is requested)
@jamalknight

Hi there

I have a question about aligning the geometry to a camera in a 3D app like Maya.
Object classification/segmentation works fine on my example image.

The geometry obj file is created, but I was wondering if there is a way to align the geometry with a perspective camera that matches the original image.

  • The generated geometry sits in 3D space, not aligned to the camera
  • The geometry can be aligned manually by eye, but that might not be accurate

I would like it to match the image - is there a relatively simple way this could be done?

@gkioxari (Contributor) commented Mar 26, 2020

Hi @jamalknight! This is an excellent question! I want to give a complete (and thus somewhat long!) response, as I think it will be useful for others as well. My answer below assumes a perspective camera.
(Edit: my earlier answer misunderstood the issue, but this version should address your question!)

What does Mesh R-CNN output?

First, let's see what Mesh R-CNN outputs. Mesh R-CNN returns the 3D shape of an object in the camera coordinate system, confined to a 3D box that respects the aspect ratio of the object detected in the image. If you provide the focal length f of the camera and the actual depth t_z of the object's center, i.e. how far the center of the object is from the image plane along the Z axis, then Mesh R-CNN will pixel-align the predicted 3D shape with the image, and the prediction will correspond to the true metric size of the object: its actual scale in the real world!
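For concreteness, this is just the standard pinhole model (my notation, nothing specific to the codebase): a point (X, Y, Z) in camera coordinates projects through focal length f to

```latex
(u, v) \;=\; \left( f\,\frac{X}{Z},\; f\,\frac{Y}{Z} \right),
\qquad
(sX,\; sY,\; sZ) \;\mapsto\; \left( f\,\frac{sX}{sZ},\; f\,\frac{sY}{sZ} \right) = (u, v)
\quad\text{for any } s > 0 .
```

Scaling the shape and its depth together therefore leaves every pixel unchanged: pixel alignment alone can never pin down metric scale, which is exactly why the made-up (f, t_z) in the demo (see below) still pixel-aligns.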

Metric Scale

While for most images the focal length f is available from the image metadata, knowing t_z is difficult. Also note that Mesh R-CNN does not predict t_z, because Pix3D does not contain useful metric depth for its objects. In the Pix3D annotations, the provided tuple (f, t_z) corresponds neither to the actual camera metadata nor to the metric depth of the object; it is computed at annotation time by their annotation process and tool, and is thus somewhat ad hoc. This is why we don't tackle the problem of estimating t_z (also known as the scene layout prediction problem).

I don't care about metric scale. I just want to pixel align.

However, if you don't care about metric scale and only want to pixel-align the object to the image, that is possible with our demo! The demo runs with a default focal length f=20 (this is the Blender focal length assuming a 32mm sensor width; it is not the true focal length of the image, we make it up!). The demo also places the object at some arbitrary t_z > 0.0; again, this is not the true metric depth of the object. Given these choices of (f, t_z), the demo outputs an object shape placed at t_z. The metric size of the predicted object will not correspond to the true size of the object in the world, but it will be a scaled version of it. Now, to pixel-align the predicted shape with the image, all you need to do is render the 3D mesh with f=20. Note that the value 20 is inconsequential; you would get the same pixel alignment with any other f, but it's important that the value of f you pick when running the demo is also the one you use when rendering!
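If you'd rather skip Blender, here is a minimal PyTorch3D sketch of that last rendering step. To be clear, this is a sketch and not the demo's own code: the .obj filename is a made-up placeholder, the field of view is just the 20mm lens / 32mm sensor converted, and depending on axis conventions you may need to flip the mesh.

```python
import math
import torch
from pytorch3d.io import load_objs_as_meshes
from pytorch3d.renderer import (
    FoVPerspectiveCameras, MeshRasterizer, MeshRenderer, PointLights,
    RasterizationSettings, SoftPhongShader, TexturesVertex,
)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Placeholder filename: use whatever .obj demo.py wrote for you.
mesh = load_objs_as_meshes(["0_mesh_sofa_0.994.obj"], device=device)
# The demo's .obj carries no texture, so give it flat white vertex colors.
mesh.textures = TexturesVertex(
    verts_features=torch.ones_like(mesh.verts_padded())
)

# A 20mm lens on a 32mm-wide sensor corresponds to FoV = 2*atan(16/20) ≈ 77.3°.
cameras = FoVPerspectiveCameras(
    device=device, fov=2.0 * math.degrees(math.atan(16.0 / 20.0))
)

renderer = MeshRenderer(
    rasterizer=MeshRasterizer(
        cameras=cameras,
        raster_settings=RasterizationSettings(image_size=512),
    ),
    shader=SoftPhongShader(
        device=device,
        cameras=cameras,
        lights=PointLights(device=device, location=[[0.0, 0.0, 0.0]]),
    ),
)

# The demo already placed the mesh at its chosen t_z along +Z, which is where
# PyTorch3D's default camera looks, so no extra transform should be needed
# (up to possible X/Y flips between conventions).
images = renderer(mesh)  # (1, 512, 512, 4) RGBA tensor
```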

Here is an example! When I run the demo on an input image (1st image), it recognizes the sofa (2nd image). I get a 3D shape prediction for the sofa which, after rendering in Blender with focal length f=20, gives the final result (3rd image).

[image: input]
[image: segmentation]
[image: rendered output]

@vadimkantorov commented Aug 5, 2021

@gkioxari Could you please share the Blender script (if you still have it) that you used to produce the third image? I also wonder whether it's possible to render the mesh onto the image using PyTorch3D directly. Still figuring this out (new to 3D)...

@vadimkantorov commented Aug 6, 2021

I've imported the "chair" mesh (that I obtained from running demo.py) into Blender, but the camera looks away from the object. Do I need to manually reset the camera position / orientation? If yes, what should it be (location, rotation angles, focal length, clip start/end)?

Thank you!

[screenshot: Blender viewport with the camera facing away from the imported mesh]

@vadimkantorov commented Aug 6, 2021

I am currently using: location (0, 0, 0), rotation (180°, 0°, 180°), focal length 20 mm, clip start/end 0.1 m / 100 m.
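In script form, this is what I'm setting (a minimal bpy sketch, assuming the default scene's "Camera" object; I'm not sure yet whether the 180° rotations are exactly right for the mesh's axis convention):

```python
import math
import bpy

cam = bpy.data.objects["Camera"]              # default scene camera
cam.location = (0.0, 0.0, 0.0)
cam.rotation_euler = (math.pi, 0.0, math.pi)  # (180°, 0°, 180°)

cam.data.lens = 20.0                          # focal length in mm (the demo's f)
cam.data.sensor_width = 32.0                  # the 32mm sensor the demo assumes
cam.data.clip_start = 0.1                     # metres
cam.data.clip_end = 100.0
```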

I'm still getting a gray render when I switch to camera view. Is it a problem with the lighting, or maybe clipping?

Here's the original image and mesh produced by MeshRCNN: chair3.zip

Thank you @gkioxari !

@vadimkantorov commented Aug 6, 2021

Hmm, enabling "Depth of Field" makes it show something. Not sure why "Depth of Field" is needed...

@vadimkantorov commented Aug 6, 2021

Now I get the render below. Does it make sense?

I'd welcome any advice on adjusting the camera/light parameters :) I'm a complete noob in Blender :(

[image: predicted chair mask]

[screenshot: the rendered result in Blender]
