Understanding the depth returned by render_data #125
Hey @Neburski, could you confirm that the raw depth value coming from render_data is 2 meters at the center pixel? At first glance, I believe this is an issue with how you are using the camera intrinsics matrix. Note that the depth returned by nvisii is the distance from the ray origin, which is (0, 0, 0) in your case. These values cannot be transformed using the intrinsics matrix, since these distances do not start from a near plane. Instead, I recommend using the "position" option to get XYZ coordinates in world space, rather than doing this transformation yourself from depth.
Hey @natevm, Thank you for your response. The depth as you described it is also how I understood it and is what I'm currently after. As I'm still learning the NVISII code base and learning how to use OptiX, I'm a bit at a loss as to why I'm only seeing half the depth I'm expecting. In any case, I hope we can both figure out where it's going wrong. First, my script is based on the NVISII example: When I use the "position" option, I obtain the correct location of the plane at 2 m. That is good news! See the green boxes in the next figure, which state the location of the plane when using the "position" option. The red boxes are for the "depth" option. However, I'm still confused why the "depth" option doesn't return the expected depth. My understanding is that it should return the ray length from the camera (ray origin) at (0, 0, 0) until intersection with the plane (closest hit), but I'm only getting half of what I'm expecting. The reason why I'm mostly interested in the "depth" option is that it directly gives me the ray length from origin to the closest hit. With the "position" option, I would still need to account for the location of the camera. This is easily checked by placing the camera 1 m backwards, i.e., Also to clarify, I am trying to find where in the NVISII code the "depth" is calculated and stored, i.e., where the payload.tHit is being saved. Do you have any pointers on how I can debug this so I can 'step' through the process of a single ray? Also, why do I need to set the frame_count to 2? I'm primarily interested in a single frame (for now). Finally, here is my updated script to reproduce the figures I posted above (for the expected depth at 3 m).
import os
import nvisii
import numpy as np
import math
import matplotlib.pyplot as plt
plt.rcParams["figure.autolayout"] = True
width = 500
height = 500
nvisii.initialize(headless=True, verbose=True, lazy_updates = True)
camera = nvisii.entity.create(
name = "camera",
transform = nvisii.transform.create("camera"),
camera = nvisii.camera.create(
name = "camera",
aspect = float(width)/float(height)
)
)
# Place the camera 1 m back from the origin, at (0, 0, -1), looking in the
# positive z-direction. y-axis is the up direction in the final 'image data'
camera.get_transform().look_at(
at = (0,0,0),
up = (0,1,0),
eye = (0,0,-1)
)
nvisii.set_camera_entity(camera)
# Create a scene to use for exporting segmentations
plane = nvisii.entity.create(
name="plane",
mesh = nvisii.mesh.create_plane("plane", size=(2, 2)),
transform = nvisii.transform.create("plane"),
material = nvisii.material.create("plane")
)
plane_distance = 2
plane.get_material().set_roughness(1.0)
plane.get_transform().set_position((0,0,plane_distance))
plane.get_transform().set_scale((1,1,1))
nvisii.set_dome_light_intensity(0)
# We only use primary ray for the center of each pixel. This results in 1 ray /
# pixel.
nvisii.sample_pixel_area(
x_sample_interval = (.5, .5),
y_sample_interval = (.5, .5)
)
depth_array = nvisii.render_data(
width=int(width),
height=int(height),
start_frame=0,
frame_count=2, # When this is 1, we get all 0 ray lengths
bounce=int(0),
options="depth"
)
depth_array = np.array(depth_array).reshape(height,width,4)
depth_array = np.flipud(depth_array)
# save the segmentation image
xyz_dat = nvisii.render_data(
width=int(width),
height=int(height),
start_frame=0,
frame_count=2, # When this is 1, we get all 0 ray lengths
bounce=int(0),
options="position"
)
xyz_dat = np.array(xyz_dat).reshape(height,width,4)
xyz_dat = np.flipud(xyz_dat)
# The following line calculates ray lengths from the origin, which in our case
# is the distance between camera origin and plane intersection points.
# The xyz_dat array appears to be working with homogeneous coordinates, thus we
# should divide the XYZ coordinates by the fourth coordinate to appropriately
# scale the XYZ coordinates.
xyz_depth = np.linalg.norm(xyz_dat[..., :3]/xyz_dat[..., 3, np.newaxis], axis=-1)
def convert_from_uvd(u, v, d, fx, fy, cx, cy):
    x_over_z = (cx - u) / fx
    y_over_z = (cy - v) / fy
    z = d / np.sqrt(1. + x_over_z**2 + y_over_z**2)
    x = x_over_z * z
    y = y_over_z * z
    return x, y, z
xyz = []
intrinsics = camera.get_camera().get_intrinsic_matrix(width,height)
# Convert the depth / pixel to xyz coordinates
for i in range(height):
    for j in range(width):
        x, y, z = convert_from_uvd(i, j, depth_array[i, j, 0],
            intrinsics[0][0], intrinsics[1][1], intrinsics[2][0], intrinsics[2][1])
        xyz.append([x, y, z])
pts = np.array(xyz).reshape(width, height, -1)
ray_len = np.linalg.norm(pts, axis=-1)
ray_len = np.where(ray_len>100, np.nan, ray_len)
fig, axs = plt.subplots(2, 2)
img = axs[0, 0].imshow(pts[..., 2], cmap=plt.cm.viridis_r)
axs[0, 0].set_title(f"Raw Depth: avg = {np.mean(depth_array[..., 0]):.2f} [m] v.s. {plane_distance=:.2f} [m]")
fig.colorbar(img, ax=axs[0, 0], label="Raw Depth [m]")
img = axs[0, 1].imshow(xyz_dat[..., 2], cmap=plt.cm.viridis_r)
axs[0, 1].set_title(f"Raw Depth from XYZ: avg_z = {np.mean(xyz_dat[..., 2]):.2f} [m] v.s. {plane_distance=:.2f} [m]")
fig.colorbar(img, ax=axs[0, 1], label="Raw Depth from XYZ [m]")
img = axs[1, 0].imshow(xyz_depth, cmap=plt.cm.viridis_r)
axs[1, 0].set_title(f"Primary Ray Lengths from XYZ: minimum = {np.min(xyz_depth):.2f} [m] v.s. {plane_distance=:.2f} [m]")
fig.colorbar(img, ax=axs[1, 0], label="Ray Length from XYZ [m]")
img = axs[1, 1].imshow(ray_len, cmap=plt.cm.viridis_r)
axs[1, 1].set_title(f"Primary Ray Lengths: minimum = {np.min(ray_len):.2f} [m] v.s. {plane_distance=:.2f} [m]")
fig.colorbar(img, ax=axs[1, 1], label="Ray Length [m]")
fig.canvas.manager.full_screen_toggle()
for ax in axs.flatten():
    ax.set_xlabel("X Pixel [#]")
    ax.set_ylabel("Y Pixel [#]")
plt.show(block = True)
# let's clean up the GPU
nvisii.deinitialize()
Apologies for the double post, I had a better look at the I only have a 'basic' understanding of how OptiX works (I'm still learning), but when looking at all the In any case, looking at line 915 (see screenshot above), and considering that the "position" option for
Some more digging. So, my understanding is that the saved I'm assuming that in my case, it is going into the Is it possible that OptiX has a different internal representation for the geometry than I expect?
With ray tracing, rays are traced in world space; the rays are only guided by the inverse of the intrinsics matrix to determine the field of view, and hit depths do not necessarily follow the near plane of this frustum anymore. With a rasterizer, hit depth is measured from each triangle to the near plane at each individual pixel, while in a ray tracer there is no near plane; instead, it's the distance from the hit object to the origin of the ray. The latter definition is required to handle secondary rays, which are needed for reflections / refractions and do not follow an intrinsics matrix.
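The distinction between planar depth and ray distance can be checked numerically: for a pinhole camera, the planar depth z and the ray distance d to the same hit point satisfy d = z * sqrt(1 + (x/z)^2 + (y/z)^2), so the two only coincide at the principal point. A minimal numpy sketch (the focal lengths, principal point, and pixel below are made-up values, not taken from the thread):

```python
import numpy as np

# Hypothetical pinhole camera: focal lengths in pixels, principal point at image center.
fx = fy = 400.0
cx = cy = 250.0

# A pixel away from the principal point.
u, v = 100.0, 300.0
x_over_z = (u - cx) / fx
y_over_z = (v - cy) / fy

# A hit on a fronto-parallel plane at planar depth z = 2 m.
z = 2.0

# The ray distance from the camera origin to the hit point is longer than z...
d = z * np.sqrt(1.0 + x_over_z**2 + y_over_z**2)

# ...and the planar depth is recovered by inverting the relationship.
z_back = d / np.sqrt(1.0 + x_over_z**2 + y_over_z**2)
assert np.isclose(z_back, z)
assert d > z  # equality only at the principal point, where x_over_z = y_over_z = 0
```

This is the same geometric relationship the convert_from_uvd helper in the script above relies on.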
Hm, I didn’t realize that code was from our examples. @jontremblay wrote that I believe. I still do not think that code is correct though for the reasons I explained earlier. We should probably double-check the correctness of that example…
The closest hit program called depends on the ray type, visibility flags, and instance ID. In this case, the volume closest hit program is only called for instances referencing that program in their hit record. So it won't be called unless your object is a volume, as I don't add a reference to that closest hit program for surface instances. Yes, feel free to rename the issue. :)
I am out this weekend. I have an updated version somewhere; from what I
remember there was a bug and I forgot to push the changes. If you have a fix,
that might simplify the process. Sorry about that.
…On Fri, Sep 10, 2021 at 13:38 Nathan V. Morrical ***@***.***> wrote:
Hm, I didn’t realize that code was from our examples. @jontremblay
<https://github.com/jontremblay> wrote that I believe. I still do not
think that code is correct though for the reasons I explained earlier. We
should probably fix that example…
Am I correct in understanding that only the
OPTIX_CLOSEST_HIT_PROGRAM(TriangleMesh) is called when an intersection is
found, i.e., the OPTIX_INTERSECT_PROGRAM(VolumeIntersection) will not be
called in this case? I'm asking because a special prd.tHit is calculated in
the OPTIX_INTERSECT_PROGRAM(VolumeIntersection) and I don't immediately
understand what is happening in that program.
The closest hit program called depends on the ray type, visibility flags,
and instance ID. In this case, the volume closest hit program is only
called for instances referencing that program in their hit record. So it
won’t be called unless your object is a volume, as I don’t add a reference
to that closest hit program for surface instances.
Yes, feel free to rename the issue. :)
Oh, looking at your code again, I don't believe the intrinsics matrix accounts for the extrinsics of the camera, which might account for this discrepancy. The extrinsics account for the transform object's rotation, scale, and translation, while the intrinsics is just the projection matrix. So I don't believe that -1 offset on your camera is being accounted for. In computer graphics, we typically refer to the extrinsics as the view matrix, which is the inverse of the object's transform matrix. You can get the transformation from world space to camera space using camera.get_transform().get_world_to_local_matrix().
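As a sketch of what applying the extrinsics looks like: transform the world-space hit positions into camera space before measuring ray lengths. The 4x4 matrix below is a hand-built stand-in for whatever camera.get_transform().get_world_to_local_matrix() actually returns (a pure translation, ignoring rotation and any convention differences), so treat the shapes and signs as assumptions:

```python
import numpy as np

# Stand-in world-to-camera (view) matrix for a camera sitting at (0, 0, -1)
# in world space; in the real script this would come from
# camera.get_transform().get_world_to_local_matrix().
camera_pos = np.array([0.0, 0.0, -1.0])
view = np.eye(4)
view[:3, 3] = -camera_pos  # inverse of a pure translation

# World-space hit points (N x 3), e.g. from render_data(options="position").
pts_world = np.array([[0.0, 0.0, 2.0],
                      [0.5, 0.5, 2.0]])

# Promote to homogeneous coordinates and apply the view matrix.
pts_h = np.hstack([pts_world, np.ones((pts_world.shape[0], 1))])
pts_cam = (view @ pts_h.T).T[:, :3]

# Ray lengths are now measured from the camera, not the world origin:
# the on-axis point sits 3 m in front of a camera at z = -1.
ray_len = np.linalg.norm(pts_cam, axis=-1)
```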
@jontremblay I missed this part. I suppose it’s not that convert_from_uvd function after all.
After some more testing there are some peculiar results. When doing two calls to Any ideas what is going on?
Overview of changes
First a short explanation of the changes in the script:
Results
The results depend on the order of the two
Order 0
This case is as before when using
When using
When using
Order 1
In this case we first trace for the XYZ positions, then we trace for the depth values. When
When
When
Script
import os
import nvisii
import numpy as np
import math
import matplotlib.pyplot as plt
plt.rcParams["figure.figsize"] = 12, 9
plt.rcParams["figure.dpi"] = 100
plt.rcParams["figure.autolayout"] = True
# Some quick cli arguments
import argparse as ap
parser = ap.ArgumentParser(description="Reference Script")
parser.add_argument(
"--order", type=int, default=0, help="Select the order of the tracing"
)
parser.add_argument("--sf", type=int, default=0, help="The start frame")
parser.add_argument("--nf", type=int, default=2, help="The frame count")
args = parser.parse_args()
order = args.order
start_frame = args.sf
num_frame = args.nf
# Bail if invalid order selected
assert order in range(2), "The selected order should be 0 or 1"
width = 500
height = 500
nvisii.initialize(headless=True, verbose=True, lazy_updates=True)
camera = nvisii.entity.create(
name="camera",
transform=nvisii.transform.create("camera"),
camera=nvisii.camera.create(name="camera", aspect=float(width) / float(height)),
)
# Place the camera at the origin looking in the positive z-direction.
# y-axis is the up direction in the final 'image data'
camera.get_transform().look_at(at=(0, 0, 1), up=(0, 1, 0), eye=(0, 0, 0))
nvisii.set_camera_entity(camera)
# Create a scene to use for exporting segmentations
plane = nvisii.entity.create(
name="plane",
mesh=nvisii.mesh.create_plane("plane", size=(2, 2)),
transform=nvisii.transform.create("plane"),
material=nvisii.material.create("plane"),
)
plane_distance = 2
plane.get_material().set_roughness(1.0)
plane.get_transform().set_position((0, 0, plane_distance))
plane.get_transform().set_scale((1, 1, 1))
nvisii.set_dome_light_intensity(0)
# We only use primary ray for the center of each pixel. This results in 1 ray /
# pixel.
nvisii.sample_pixel_area(x_sample_interval=(0.5, 0.5), y_sample_interval=(0.5, 0.5))
if order == 0:
    fSupTitle = f"Order: Depth $\\rightarrow$ XYZ, {start_frame=}, {num_frame=}"
    depth_array = nvisii.render_data(
        width=int(width),
        height=int(height),
        start_frame=start_frame,
        frame_count=start_frame + num_frame,
        bounce=int(0),
        options="depth",
    )
    xyz_dat = nvisii.render_data(
        width=int(width),
        height=int(height),
        start_frame=start_frame,
        frame_count=start_frame + num_frame,
        bounce=int(0),
        options="position",
    )
elif order == 1:
    fSupTitle = f"Order: XYZ $\\rightarrow$ Depth, {start_frame=}, {num_frame=}"
    xyz_dat = nvisii.render_data(
        width=int(width),
        height=int(height),
        start_frame=start_frame,
        frame_count=start_frame + num_frame,
        bounce=int(0),
        options="position",
    )
    depth_array = nvisii.render_data(
        width=int(width),
        height=int(height),
        start_frame=start_frame,
        frame_count=start_frame + num_frame,
        bounce=int(0),
        options="depth",
    )
# Reshape the depth array. The depth_array has shape (height, width, 4). For
# the last axis, the first 3 indices are the depth, and all three contain the
# same value. This is by design in NVISII, i.e., the frame buffer shape. Only
# the first index is selected as the others are not needed.
depth_array = np.array(depth_array).reshape(height, width, 4)[..., 0]
depth_array = np.flipud(depth_array)
# The array xyz_dat contains the coordinates of the ray intersections. This
# also comes as a (height, width, 4) array. The xyz_dat array appears to be
# working with homogeneous coordinates, thus we should divide the XYZ
# coordinates by the fourth coordinate to appropriately scale them.
xyz_dat = np.array(xyz_dat).reshape(height, width, 4)
xyz_dat = np.flipud(xyz_dat)
xyz_dat = xyz_dat[..., :3] / xyz_dat[..., 3, np.newaxis]
# Calculates the ray lengths from the origin to the coordinate. In this case
# this is the distance between camera location and the plane intersection
# points.
ray_len = np.linalg.norm(xyz_dat, axis=-1)
fig, axs = plt.subplots(2, 2)
fig.suptitle(fSupTitle)
img = axs[0, 0].imshow(depth_array, cmap=plt.cm.viridis_r)
axs[0, 0].set_title(
f"NVISII depth: min. = {np.min(depth_array):.2f} [m] v.s. {plane_distance=:.2f} [m]"
)
fig.colorbar(img, ax=axs[0, 0], label="Ray Length (NVISII Depth) [m]")
img = axs[0, 1].imshow(ray_len, cmap=plt.cm.viridis_r)
axs[0, 1].set_title(
f"NVISII XYZ: min. = {np.min(ray_len):.2f} [m] v.s. {plane_distance=:.2f} [m]"
)
fig.colorbar(img, ax=axs[0, 1], label="Ray Length (NVISII XYZ) [m]")
# Keep this plot empty. Normally would contain the Z coordinate calculated from depth.
axs[1, 0].remove()
img = axs[1, 1].imshow(xyz_dat[..., 2], cmap=plt.cm.viridis_r)
axs[1, 1].set_title(
f"Z-coordinate from XYZ: avg = {np.mean(xyz_dat[..., 2]):.2f} [m] v.s. {plane_distance=:.2f} [m]"
)
fig.colorbar(img, ax=axs[1, 1], label="Z-coordinate (NVISII XYZ) [m]")
#fig.canvas.manager.full_screen_toggle()
for ax in axs.flatten():
    ax.set_xlabel("X Pixel [#]")
    ax.set_ylabel("Y Pixel [#]")
plt.show(block=True)
# let's clean up the GPU
nvisii.deinitialize()
Ok, so I looked at it with the help of Moustafa on some scenes we generated, and the depth does not match the real distance of the ray. Should we tag this as a bug @natevm and tell people to use position?
According to the above results, the depth does indeed match the intended results. It's just that those results seem to depend on the sample or function order… the latter would be a bug; the former may or may not be a bug depending on how subsequent Python code manipulates the data. With antialiasing on, of course the per pixel results will change depending on frame start and frame count. But I wouldn't expect such large differences. Swapping the order suggests to me that there is some sort of race condition happening. I'd be curious to see if these problems go away when using interactive mode instead of headless mode. @Neburski would you be able to test that? I'm still not convinced we have a handle on what the bug is @TontonTremblay. So I'm hesitant to conclude at the moment that it's how depth is being calculated.
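The frame_count sensitivity described here is consistent with a progressive renderer that keeps a running average over frames. The sketch below is a guess at that accumulation scheme (not taken from NVISII's source): with a deterministic primary ray per pixel every frame contributes the same value, while jittered samples make the buffer depend on how many frames were accumulated.

```python
import numpy as np

def accumulate(samples):
    """Progressive running average: acc_n = acc_{n-1} + (s_n - acc_{n-1}) / n."""
    acc = 0.0
    for n, s in enumerate(samples, start=1):
        acc += (s - acc) / n
    return acc

# Deterministic sampling (sample_pixel_area pinned to the pixel center):
# every frame sees the same depth, so the average is stable.
assert np.isclose(accumulate([2.0, 2.0, 2.0]), 2.0)

# Jittered (antialiased) sampling: the result depends on frame count.
assert np.isclose(accumulate([2.0, 2.1]), 2.05)
```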
I have a set of utilities that extract depth images from
intrinsics = visii.entity.get(
camera_name).get_camera().get_intrinsic_matrix(width, height)
intrinsics = mat_to_numpy(intrinsics)
distance = render_utils.render_distance(width, height)
depth = render_utils.depth_image_from_distance_image(
    distance, intrinsics)
The depth image also looks correct to me. The camera is looking at
Code snippet that contains the relevant functions used above.
def reshape_data_tuple(data, width, height):
    """Reshapes a data tuple to OpenCV coordinates
    """
    # H x W x 4 array but (0,0) is bottom left corner
    # need to flip it using PIL
    data = np.array(data).reshape(height, width, 4)
    # flip along first axis
    # this is equivalent to PIL.FLIP_TOP_BOTTOM
    data = np.flip(data, axis=0)
    return data
def render_distance(width, height):
    """Renders a distance image.
    - This is NOT A DEPTH IMAGE in the standard computer vision
      conventions.
    - Background pixels are assigned a distance of 0
    d = sqrt(X_c^2 + Y_c^2 + Z_c^2) in OpenCV convention
    https://docs.opencv.org/3.4/d9/d0c/group__calib3d.html
    It's the distance from camera origin to the point P.
    Args:
        width: int, image width
        height: int, image height
    Returns:
        distance: numpy array H x W, dtype = np.float (meters)
        Missing/background values are set to 0 distance
    """
    visii.sample_pixel_area(
        x_sample_interval=(.5, .5),
        y_sample_interval=(.5, .5))
    # tuple
    # background/empty pixels get assigned the value
    # -3.4028234663852886e+38
    distance = visii.render_data(
        width=int(width),
        height=int(height),
        start_frame=0,
        frame_count=1,
        bounce=int(0),
        options="depth"
    )
    # H x W x 4, dtype=fl
    distance = reshape_data_tuple(distance, width, height)
    # H x W
    distance = distance[..., 0]
    # render segmentation image to detect background pixels
    # and set their distance values to 0
    seg = render_segmentation(width, height)
    distance[seg < 0] = 0
    return distance
def depth_image_from_distance_image(distance, intrinsics):
    """Computes depth image from distance image.
    Background pixels have depth of 0
    Args:
        distance: HxW float array (meters)
        intrinsics: 3x3 float array
    Returns:
        z: HxW float array (meters)
    """
    fx = intrinsics[0, 0]
    cx = intrinsics[0, 2]
    fy = intrinsics[1, 1]
    cy = intrinsics[1, 2]
    height, width = distance.shape
    xlin = np.linspace(0, width - 1, width)
    ylin = np.linspace(0, height - 1, height)
    px, py = np.meshgrid(xlin, ylin)
    x_over_z = (px - cx) / fx
    y_over_z = (py - cy) / fy
    z = distance / np.sqrt(1. + x_over_z**2 + y_over_z**2)
    return z
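A quick synthetic round trip of this conversion: for a fronto-parallel plane at constant depth Z, the distance image is Z * sqrt(1 + x_over_z^2 + y_over_z^2), and converting back should recover a constant Z everywhere. The sketch below reimplements the same math with made-up intrinsics so it runs standalone:

```python
import numpy as np

def depth_from_distance(distance, intrinsics):
    # Same math as depth_image_from_distance_image above.
    fx, fy = intrinsics[0, 0], intrinsics[1, 1]
    cx, cy = intrinsics[0, 2], intrinsics[1, 2]
    h, w = distance.shape
    px, py = np.meshgrid(np.arange(w), np.arange(h))
    x_over_z = (px - cx) / fx
    y_over_z = (py - cy) / fy
    return distance / np.sqrt(1.0 + x_over_z**2 + y_over_z**2)

# Made-up intrinsics for a 100 x 100 image.
K = np.array([[120.0, 0.0, 50.0],
              [0.0, 120.0, 50.0],
              [0.0, 0.0, 1.0]])

# Synthesize the distance image a plane at Z = 2 m would produce...
Z = 2.0
px, py = np.meshgrid(np.arange(100), np.arange(100))
dist = Z * np.sqrt(1.0 + ((px - 50.0) / 120.0)**2 + ((py - 50.0) / 120.0)**2)

# ...and confirm the conversion recovers a constant depth image.
z_img = depth_from_distance(dist, K)
assert np.allclose(z_img, Z)
```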
@manuelli Thank you for testing this perpendicular case. The previous one is difficult to validate with everything at the correct distances. Regarding your remark about depth vs. distance (I'm aware of the difference): both are incorrect in my case. Can I ask some of you to try the script I placed in a previous post above? It is the full script, which I have in a file called
python -i plane_depth.py
If possible, could you also try the following option:
python -i plane_depth.py --order 1
Can you then report what the results are?
@Neburski could you run that script with headless=False and see if you still get this issue? I suspect this is a race condition when headless mode is enabled. |
@natevm When I set Also, swapping the order no longer has any impact, as it should be.
Aha, this is useful information. That confirms this is indeed a race condition when lazy updating is enabled. (I think headless mode always assumes lazy updating, which delays updating scene acceleration structures until a render call, at which point all updates are batched for improved performance) |
For reference I rendered those scenes with
For the moment, I'm going to change the behavior of "lazy_updates". In retrospect, lazy_updates was kind of a temporary measure to try to gain some perf. But I think a better mechanism is to just use the "enable_updates" and "disable_updates" functions to scope sections where many components are being generated. So for now I'm going to remove lazy_updates from the initialize function and assume the "false" behavior there.
When using the render_data method, I noticed the following two issues when working with the attached minimal working example at the bottom:
Either I'm not understanding the 'depth' option properly, I'm incorrectly configuring the ray tracer, or there could be a bug.
Short explanation of the minimal working example
At the end of the script, it plots the depth and the primary ray lengths and as can be seen from those plots the resulting depth is 1 m instead of the expected 2 m.
Details about my setup
I compiled NVISII from scratch using the following:
I'm more familiar with Linux systems; however, the NVIDIA GPU I have available is currently in a Windows-only system. I'm not able to build the Debug version to use Nsight (mostly due to swig requiring the Python debugging library? Some pointers would be appreciated).
Feel free to request more information.
Minimal working example.