<a href="https://colab.research.google.com/github/haosulab/SAPIEN-tutorial/blob/master/rendering/1_camera.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

> Note: Some core features of SAPIEN are not available on Colab, including the interactive viewer and ray-tracing functionalities. You need to run SAPIEN locally for full features. You can also find the latest SAPIEN tutorial at [SAPIEN's documentation](https://sapien.ucsd.edu/docs/latest/index.html).

# Rendering in SAPIEN

SAPIEN is integrated with the powerful `SapienRenderer`, which supports both rasterization and ray tracing. With `SapienRenderer`, you can generate photorealistic images or depth maps at an incredibly high speed. In this tutorial series, you will learn how to render in SAPIEN.

# Rendering Tutorial 1: Camera

In this tutorial, you will learn the following:

- Create a camera (`CameraEntity`) and mount it on an actor
- Render RGB, depth, point cloud, and segmentation off-screen



## Preparation

> Note: You need a GPU runtime to run this notebook. If you are running on Colab, you might see a warning asking you to restart the runtime after running the following cell for the first time. In that case, restart the runtime as instructed and rerun the cell; otherwise, imported packages may not work correctly. The warning should disappear after restarting.

In [None]:
%pip install sapien
%pip install open3d

import sapien.core as sapien
import numpy as np
import open3d as o3d
import os
from PIL import Image, ImageColor

# Detect whether the notebook is running on Colab
try:
    import google.colab
    IN_COLAB = True
except ImportError:
    IN_COLAB = False

## Create and mount a camera

First of all, let’s set up the engine, renderer, scene, lighting, and load a URDF file.

In [None]:
engine = sapien.Engine()
renderer = sapien.SapienRenderer()
engine.set_renderer(renderer)

scene = engine.create_scene()
scene.set_timestep(1 / 100.0)

if IN_COLAB:
    !gdown -q 1phqv-pvgvYHmkJKI3KH8uHwxwf-HKA-m
    !unzip -o -q 179.zip
    assets_dir = "."
else:
    assets_dir = "../assets"

urdf_path = os.path.join(assets_dir, "179/mobility.urdf")

loader = scene.create_urdf_loader()
loader.fix_root_link = True
asset = loader.load_kinematic(urdf_path)


scene.set_ambient_light([0.5, 0.5, 0.5])
scene.add_directional_light([0, 1, -1], [0.5, 0.5, 0.5], shadow=True)
scene.add_point_light([1, 2, 2], [1, 1, 1], shadow=True)
scene.add_point_light([1, -2, 2], [1, 1, 1], shadow=True)
scene.add_point_light([-1, 0, 1], [1, 1, 1], shadow=True)

We create the Vulkan-based renderer by calling `sapien.SapienRenderer(offscreen_only=...)`. If `offscreen_only=True`, the on-screen display is disabled, so the renderer works without a window server such as an X server. This avoids the usual difficulties of setting up an X server and OpenGL.

Next, you can create a camera as follows:

In [None]:
near, far = 0.1, 100
width, height = 640, 480
camera = scene.add_camera(
    name="camera",
    width=width,
    height=height,
    fovy=np.deg2rad(35),
    near=near,
    far=far,
)
camera.set_pose(sapien.Pose(p=[1, 0, 0]))

This camera is now placed at coordinate [1, 0, 0] without rotation.
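
The `fovy`, `width`, and `height` arguments together determine the camera's projection. As a rough illustration, the sketch below derives a textbook pinhole intrinsic matrix from them. It assumes square pixels and a principal point at the image centre, and `pinhole_intrinsics` is a hypothetical helper rather than part of SAPIEN, so treat its output as an approximation to compare against whatever the camera object itself reports.

```python
import numpy as np


def pinhole_intrinsics(width, height, fovy):
    """Textbook 3x3 pinhole intrinsics (hypothetical helper, not a SAPIEN API).

    Assumes square pixels and a centred principal point. The matrix follows
    the computer-vision image convention: x right, y down, z forward.
    """
    fy = height / (2.0 * np.tan(fovy / 2.0))  # focal length in pixels
    fx = fy  # assumption: square pixels
    cx, cy = width / 2.0, height / 2.0  # assumption: centred principal point
    return np.array([
        [fx, 0.0, cx],
        [0.0, fy, cy],
        [0.0, 0.0, 1.0],
    ])


# Same values as the camera created above
K = pinhole_intrinsics(640, 480, np.deg2rad(35))
print(K)
```

A ray tilted upward by half of `fovy` should project onto the top image row under these assumptions, which is a quick way to sanity-check the focal length.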

A camera can also be mounted onto an `Actor` so that it keeps a fixed pose relative to the actor:

In [None]:
camera_mount_actor = scene.create_actor_builder().build_kinematic()
camera.set_parent(parent=camera_mount_actor, keep_pose=False)

# Compute the camera pose by specifying forward(x), left(y) and up(z)
cam_pos = np.array([-2, -2, 3])
forward = -cam_pos / np.linalg.norm(cam_pos)
left = np.cross([0, 0, 1], forward)
left = left / np.linalg.norm(left)
up = np.cross(forward, left)
mat44 = np.eye(4)
mat44[:3, :3] = np.stack([forward, left, up], axis=1)
mat44[:3, 3] = cam_pos
camera_mount_actor.set_pose(sapien.Pose.from_transformation_matrix(mat44))

The camera is mounted on the `camera_mount_actor` through `set_parent`. The pose of the camera relative to the mount is specified through `set_local_pose`.

> Note: Calling `set_local_pose` without a parent sets the global pose of the camera. Calling `set_pose` with a parent results in an error, as it is ambiguous.
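
The pose computation in the mounting cell can be wrapped into a small reusable function. The sketch below is NumPy-only; `look_at_matrix` and its `target` and `world_up` parameters are hypothetical names introduced for illustration. It reproduces the forward(x), left(y), up(z) construction and checks that the result is a proper rotation.

```python
import numpy as np


def look_at_matrix(cam_pos, target=(0.0, 0.0, 0.0), world_up=(0.0, 0.0, 1.0)):
    """4x4 pose whose x-axis points from cam_pos toward target.

    Hypothetical helper. Columns of the rotation are forward (x), left (y),
    and up (z). Degenerate if the view direction is parallel to world_up.
    """
    cam_pos = np.asarray(cam_pos, dtype=float)
    forward = np.asarray(target, dtype=float) - cam_pos
    forward = forward / np.linalg.norm(forward)
    left = np.cross(world_up, forward)
    left = left / np.linalg.norm(left)
    up = np.cross(forward, left)
    mat44 = np.eye(4)
    mat44[:3, :3] = np.stack([forward, left, up], axis=1)
    mat44[:3, 3] = cam_pos
    return mat44


mat44 = look_at_matrix([-2, -2, 3])
R = mat44[:3, :3]
# A valid rotation is orthonormal with determinant +1
print(np.allclose(R.T @ R, np.eye(3)), np.isclose(np.linalg.det(R), 1.0))
```

The resulting matrix can be passed to `sapien.Pose.from_transformation_matrix`, exactly as in the cell above.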

Adding and mounting a camera can also be done in one step with the convenience function `add_mounted_camera`. The following code has exactly the same effect as calling `add_camera` followed by `set_parent`.

    near, far = 0.1, 100
    width, height = 640, 480
    camera_mount_actor = scene.create_actor_builder().build_kinematic()
    camera = scene.add_mounted_camera(
        name="camera",
        actor=camera_mount_actor,
        pose=sapien.Pose(),  # relative to the mounted actor
        width=width,
        height=height,
        fovy=np.deg2rad(35),
        near=near,
        far=far,
    )

If the mounted actor is kinematic (or static), the camera moves along with the actor when the pose of the actor is changed through `set_pose`. If the actor is dynamic, the camera moves along with it during dynamic simulation.
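
Conceptually, the world pose of a mounted camera is the actor's world pose composed with the camera's local pose. The NumPy sketch below illustrates that composition with plain 4x4 matrices. The helper functions and the numeric poses are made up for illustration and do not call SAPIEN.

```python
import numpy as np


def translation(x, y, z):
    """4x4 homogeneous translation (illustrative helper)."""
    T = np.eye(4)
    T[:3, 3] = [x, y, z]
    return T


def rot_z(theta):
    """4x4 homogeneous rotation about the z-axis (illustrative helper)."""
    c, s = np.cos(theta), np.sin(theta)
    T = np.eye(4)
    T[:2, :2] = [[c, -s], [s, c]]
    return T


# Hypothetical poses: mount at (1, 0, 0) rotated 90 degrees about z,
# camera 0.5 units ahead of the mount along the mount's x-axis.
T_world_mount = translation(1, 0, 0) @ rot_z(np.pi / 2)
T_mount_cam = translation(0.5, 0, 0)
T_world_cam = T_world_mount @ T_mount_cam

# Moving the mount moves the camera; the local pose stays unchanged.
T_world_mount_moved = translation(0, 0, 2)
T_world_cam_moved = T_world_mount_moved @ T_mount_cam
print(T_world_cam[:3, 3], T_world_cam_moved[:3, 3])
```

Only the mount's pose changes between the two cases, which mirrors how a mounted camera follows its actor.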

> Note: The axis conventions in SAPIEN follow those used in robotics, which differ from those of many graphics packages (such as OpenGL and Blender). For a SAPIEN camera, the x-axis points forward, the y-axis left, and the z-axis upward.
>
> However, do note that the “position” texture (camera-space point cloud) obtained from the camera still follows the graphics convention (x-axis right, y-axis upward, z-axis backward). This maintains consistency of SAPIEN with most other graphics software. This will be further discussed below.
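
The two camera conventions differ only by a fixed axis permutation. The sketch below builds that 3x3 matrix purely from the axis descriptions in the note above (graphics: x right, y up, z backward; SAPIEN camera: x forward, y left, z up). The names `SAPIEN_FROM_OPENGL` and `opengl_to_sapien_camera` are hypothetical and not part of SAPIEN.

```python
import numpy as np

# Columns are the OpenGL camera axes expressed in the SAPIEN camera frame:
#   OpenGL x (right)    = -SAPIEN y
#   OpenGL y (up)       = +SAPIEN z
#   OpenGL z (backward) = -SAPIEN x
SAPIEN_FROM_OPENGL = np.array([
    [0.0, 0.0, -1.0],
    [-1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
])


def opengl_to_sapien_camera(points):
    """Re-express camera-space points of shape (N, 3) in the SAPIEN camera convention."""
    return np.asarray(points, dtype=float) @ SAPIEN_FROM_OPENGL.T


# One unit in front of the camera: -z in OpenGL, +x in SAPIEN
print(opengl_to_sapien_camera([[0.0, 0.0, -1.0]]))
```

Because the matrix is a pure rotation, distances between points are unchanged by the conversion.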

## Render an RGB image

To render from a camera, you first need to push all object states to the renderer with `update_render()`. Then call `take_picture()` to start the rendering task on the GPU.

In [None]:
scene.step()  # make everything set
scene.update_render()
camera.take_picture()

Now, we can acquire the RGB image rendered by the camera.

In [None]:
rgba = camera.get_float_texture('Color')  # [H, W, 4]
# An alias is also provided
# rgba = camera.get_color_rgba()  # [H, W, 4]
rgba_img = (rgba * 255).clip(0, 255).astype("uint8")
Image.fromarray(rgba_img)

## Generate point cloud

A point cloud is a common representation of 3D scenes. The following code shows how to acquire one in SAPIEN.

In [None]:
# Each pixel is (x, y, z, render_depth) in camera space (OpenGL/Blender)
position = camera.get_float_texture('Position')  # [H, W, 4]

We acquire a “position” image with 4 channels. The first 3 channels represent the 3D position of each pixel in the OpenGL camera space, and the last channel stores the z-buffer value commonly used in rendering. When this value is 1, the position of this pixel is beyond the far plane of the camera frustum.
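
To make the layout concrete, here is a small sketch on synthetic data. The 2x2 texture and the `model_matrix` values are invented for illustration. It masks out pixels whose z-buffer value is 1 and applies a rigid transform in two equivalent ways: rotation plus translation, and homogeneous coordinates.

```python
import numpy as np

# Synthetic 2x2 "position" texture: (x, y, z, z-buffer value) per pixel
position = np.array([
    [[0.0, 0.0, -1.0, 0.5], [0.5, 0.0, -2.0, 0.7]],
    [[0.0, 0.5, -3.0, 0.9], [0.0, 0.0, 0.0, 1.0]],  # last pixel: beyond the far plane
], dtype=np.float32)

valid = position[..., 3] < 1  # boolean mask of shape [H, W]
points_cam = position[..., :3][valid]  # [N, 3], here N = 3

# Hypothetical camera-to-world transform: 90 degree rotation about z, then a shift
model_matrix = np.eye(4)
model_matrix[:3, :3] = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]
model_matrix[:3, 3] = [1, 2, 3]

# Row-vector form: p_world = R @ p_cam + t for every row
points_world = points_cam @ model_matrix[:3, :3].T + model_matrix[:3, 3]

# Equivalent homogeneous form
homog = np.concatenate([points_cam, np.ones((len(points_cam), 1))], axis=1)
points_world_h = (homog @ model_matrix.T)[:, :3]
print(np.allclose(points_world, points_world_h))
```

The row-vector form avoids building homogeneous coordinates, which is why it is convenient for large point clouds.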

In [None]:
# OpenGL/Blender: y up and -z forward
points_opengl = position[..., :3][position[..., 3] < 1]
points_color = rgba[position[..., 3] < 1][..., :3]
# Model matrix is the transformation from OpenGL camera space to SAPIEN world space
# camera.get_model_matrix() must be called after scene.update_render()!
model_matrix = camera.get_model_matrix()
points_world = points_opengl @ model_matrix[:3, :3].T + model_matrix[:3, 3]

Note that the position is represented in the OpenGL camera space, where the negative z-axis points forward and the y-axis is upward. Thus, to acquire a point cloud in the SAPIEN world space (x forward and z up), we provide `get_model_matrix()`, which returns the transformation from the OpenGL camera space to the SAPIEN world space. Let's visualize the point cloud with Open3D:

In [None]:
pcd = o3d.geometry.PointCloud()
pcd.points = o3d.utility.Vector3dVector(points_world)
pcd.colors = o3d.utility.Vector3dVector(points_color)
coord_frame = o3d.geometry.TriangleMesh.create_coordinate_frame()
o3d.visualization.draw_plotly([pcd, coord_frame])

The depth map can be obtained from the same position texture by negating its z channel:

In [None]:
depth = -position[..., 2]
depth_image = (depth * 1000.0).astype(np.uint16)
Image.fromarray(depth_image)

The point cloud and depth map are based on the *clean* depth obtained during the rendering process. In a future tutorial, you will see that SAPIEN is also able to generate *realistic* depth that closely resembles the depth computed by real-world depth sensors. Using realistic depth can greatly help close the sim-to-real gap, which we will discuss in the future.
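
One detail to watch when storing depth in millimetres: `uint16` holds at most 65535, which is about 65.5 m, while the far plane in this tutorial is 100. A direct cast of larger values is not guaranteed to give a meaningful result. The sketch below clips before casting; `depth_to_uint16_mm` is a hypothetical helper, and saturating at the maximum is just one possible policy.

```python
import numpy as np


def depth_to_uint16_mm(depth_m):
    """Convert metric depth to uint16 millimetres, saturating instead of overflowing.

    Hypothetical helper: values above 65.535 m are clipped to 65535.
    """
    depth_mm = np.asarray(depth_m, dtype=np.float64) * 1000.0
    max_mm = np.iinfo(np.uint16).max
    return np.clip(np.round(depth_mm), 0, max_mm).astype(np.uint16)


demo = depth_to_uint16_mm(np.array([[0.0, 1.5], [65.0, 100.0]]))
print(demo)
```

If saturated pixels should be treated as invalid instead, they could be set to 0 after clipping, which is a common convention for missing depth.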

## Visualize segmentation

SAPIEN provides interfaces to acquire both mesh-level and actor-level segmentation.

In [None]:
seg_labels = camera.get_uint32_texture('Segmentation')  # [H, W, 4]
colormap = sorted(set(ImageColor.colormap.values()))
color_palette = np.array(
    [ImageColor.getrgb(color) for color in colormap], dtype=np.uint8
)
label0_image = seg_labels[..., 0].astype(np.uint8)  # mesh-level
label1_image = seg_labels[..., 1].astype(np.uint8)  # actor-level
# Or you can use aliases below
# label0_image = camera.get_visual_segmentation()
# label1_image = camera.get_actor_segmentation()
label0_pil = Image.fromarray(color_palette[label0_image])
print("Mesh-level segmentation")
display(label0_pil)
print("Actor-level segmentation")
label1_pil = Image.fromarray(color_palette[label1_image])
display(label1_pil)
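
Two practical points about the cell above: casting labels to `uint8` wraps IDs above 255, and indexing a palette with an ID larger than the palette length raises an error. The sketch below, on synthetic labels with a made-up palette, wraps IDs with a modulo and counts pixels per ID. `colorize_labels` is a hypothetical helper, and distinct IDs may share a colour after wrapping.

```python
import numpy as np


def colorize_labels(labels, palette):
    """Map integer labels to colours, wrapping IDs that exceed the palette size.

    Hypothetical helper; different IDs can collide on the same colour.
    """
    labels = np.asarray(labels)
    return palette[labels % len(palette)]


rng = np.random.default_rng(0)
palette = rng.integers(0, 256, size=(16, 3), dtype=np.uint8)  # made-up 16-colour palette
labels = np.array([[0, 1, 1], [2, 17, 300]], dtype=np.uint32)  # synthetic segmentation IDs

colored = colorize_labels(labels, palette)  # [H, W, 3] uint8 image
ids, counts = np.unique(labels, return_counts=True)  # pixel count per ID
print(dict(zip(ids.tolist(), counts.tolist())))
```

The per-ID pixel counts are a quick way to see which meshes or actors are actually visible in the frame.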