Visualization of HDF5 VR demos.

Source: https://github.com/StanfordVL/behavior/blob/main/docs/vr_demos.md

## Processed demos
The following are the available keys to index into the hdf5 file. The dimensionality of each component is noted parenthetically, where `N` indicates the number of frames in the demo.

- action (N x 28) -- see the BehaviorRobot description in the [Embodiments section](agents.md) for details about how this robot is actuated. The vector contains two additional dimensions for the `hand reset` action in VR, which teleports a simulated hand to the exact pose of its VR hand controller when the two have diverged. These dimensions are not used by AI agents but are needed to interpret the demos.
- proprioception (N x 22) -- proprioceptive feedback. More details in the [Embodiments section](agents.md).
- rgb (N x 128 x 128 x 3) -- RGB image from the camera
- depth (N x 128 x 128 x 1) -- depth map
- seg (N x 128 x 128 x 1) -- semantic segmentation of the scene
- ins_seg (N x 128 x 128 x 1) -- instance segmentation
- highlight (N x 128 x 128 x 1) -- binary mask of activity-relevant objects, set for all objects included in the activity goal (except the agent and the floor)
- task_obs (N x 456) -- task observations, including the ground-truth state of the robot and the ground-truth poses and grasping state of up to a fixed number of activity-relevant objects
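The image modalities are full arrays over every frame, so it is worth estimating their size before reading a whole dataset into memory. The sketch below only does the arithmetic; it assumes 4-byte `float32` storage (which is what the file inspected below reports for the image keys) and a 3000-frame demo. Since h5py reads datasets lazily, indexing a single frame such as `hdf_file["rgb"][i]` reads only that slice instead of the full array. `IMAGE_SHAPES` and `dataset_nbytes` are illustrative names, not part of the dataset tooling.

```python
from math import prod

# Per-frame shapes of the image keys listed above (N is the leading frame axis).
IMAGE_SHAPES = {
    "rgb": (128, 128, 3),
    "depth": (128, 128, 1),
    "seg": (128, 128, 1),
    "ins_seg": (128, 128, 1),
    "highlight": (128, 128, 1),
}


def dataset_nbytes(num_frames, frame_shape, itemsize=4):
    """Bytes needed to hold num_frames frames in memory (itemsize=4 assumes float32)."""
    return num_frames * prod(frame_shape) * itemsize


for key, shape in IMAGE_SHAPES.items():
    megabytes = dataset_nbytes(3000, shape) / 1e6
    print(f"{key:>9}: {megabytes:.1f} MB for 3000 frames")
```

For example, `rgb` alone works out to 3000 × 128 × 128 × 3 × 4 bytes, roughly 590 MB.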

In [11]:
import h5py
import numpy as np

hdf5_path="/datasets/behavior-100-replay/bottling_fruit_0_Wainscott_0_int_0_2021-05-24_19-46-46_episode.hdf5"


def print_hdf5_tree(file, indent=0):
    """
    Recursively prints the structure of an HDF5 file in tree form.
    """
    if isinstance(file, h5py.Group):  # If it's a group, iterate through its items
        for key, item in file.items():
            print("  " * indent + f"- {key}")
            if isinstance(item, h5py.Group):
                print_hdf5_tree(item, indent + 1)  # Recurse into groups
            elif isinstance(item, h5py.Dataset):
                print("  " * (indent + 1) + f"(Dataset: shape={item.shape}, dtype={item.dtype})")


with h5py.File(hdf5_path, 'r') as hdf_file:
    print("HDF5 File Structure:")
    print_hdf5_tree(hdf_file)
    print(hdf_file.attrs.keys())

HDF5 File Structure:
- action
  (Dataset: shape=(3000, 28), dtype=float64)
- depth
  (Dataset: shape=(3000, 128, 128, 1), dtype=float32)
- highlight
  (Dataset: shape=(3000, 128, 128, 1), dtype=float32)
- ins_seg
  (Dataset: shape=(3000, 128, 128, 1), dtype=float32)
- proprioception
  (Dataset: shape=(3000, 22), dtype=float32)
- rgb
  (Dataset: shape=(3000, 128, 128, 3), dtype=float32)
- seg
  (Dataset: shape=(3000, 128, 128, 1), dtype=float32)
- task_obs
  (Dataset: shape=(3000, 456), dtype=float32)
<KeysViewHDF5 ['/metadata/activity', '/metadata/activity_id', '/metadata/physics_timestep', '/metadata/render_timestep', '/metadata/scene_id', '/metadata/vr_settings']>


In [15]:
with h5py.File(hdf5_path, "r") as hdf_file:
    # Only the highlight mask is needed here; reading just this key avoids loading about 1 GB of images
    highlight = hdf_file["highlight"][:]  # (3000, 128, 128, 1)

np.unique(highlight)

array([0., 1.], dtype=float32)
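Since `highlight` is binary, its per-frame mean is the fraction of pixels covered by activity-relevant objects, which is a quick way to find frames where those objects are in view. A minimal sketch, assuming `highlight` is the `(N, 128, 128, 1)` array loaded above; the function names and the 1% threshold are illustrative choices, not part of the dataset.

```python
import numpy as np


def highlight_coverage(highlight):
    """Fraction of pixels marked as activity-relevant in each frame, shape (N,)."""
    return highlight.reshape(highlight.shape[0], -1).mean(axis=1)


def frames_with_relevant_objects(highlight, min_fraction=0.01):
    """Indices of frames where the mask covers at least min_fraction of the image."""
    return np.flatnonzero(highlight_coverage(highlight) >= min_fraction)


# Usage with the array loaded above:
# coverage = highlight_coverage(highlight)
# visible = frames_with_relevant_objects(highlight)
```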

In [4]:
import h5py
import numpy as np
import av
import cv2


# Open the HDF5 file

with h5py.File(hdf5_path, "r") as hdf_file:
    depth = hdf_file["depth"][:]          # (3000, 128, 128, 1)
    rgb = hdf_file["rgb"][:]              # (3000, 128, 128, 3)
    seg = hdf_file["seg"][:]              # (3000, 128, 128, 1)
    highlight = hdf_file["highlight"][:]  # (3000, 128, 128, 1)

# Normalize depth and highlight to [0, 255] and convert to uint8
# (assumes depth is stored normalized to [0, 1], like rgb)
depth = (depth.squeeze(-1) * 255).clip(0, 255).astype(np.uint8)          # (N, 128, 128)
highlight = (highlight.squeeze(-1) * 255).clip(0, 255).astype(np.uint8)  # (N, 128, 128), 0 or 255

# Convert RGB from [0, 1] to [0, 255] and uint8
rgb = (rgb * 255).clip(0, 255).astype(np.uint8)

# Segmented RGB (RGB + segmentation overlay), assuming seg values lie in [0, 1].
# seg is float32, so the sum is computed in float and saturates instead of wrapping.
seg_colored = np.repeat(seg, 3, axis=-1) * 255  # (N, 128, 128, 3)
segmented_rgb = (rgb + seg_colored).clip(0, 255).astype(np.uint8)

# RGB with highlight overlay.
# Upcast before adding: uint8 + uint8 wraps modulo 256, and clip() cannot undo that.
highlight_colored = np.repeat(highlight[..., None], 3, axis=-1)  # (N, 128, 128, 3)
rgb_highlight = (rgb.astype(np.int16) + highlight_colored).clip(0, 255).astype(np.uint8)

# Output video path
output_path = "visualizations.mp4"
fps = 30  # matches the 1/30 s render timestep

# Create a PyAV video container
container = av.open(output_path, mode="w")
stream = container.add_stream("h264", rate=fps)
# stream = container.add_stream("mpeg4", rate=fps)  # alternative codec
stream.width = 128 * 4  # four 128-px panels side by side
stream.height = 128
stream.pix_fmt = "yuv420p"

# Encode frames: depth | rgb | rgb + seg | rgb + highlight
for i in range(depth.shape[0]):
    depth_frame = cv2.cvtColor(depth[i], cv2.COLOR_GRAY2RGB)  # replicate to 3 channels
    combined_frame = np.hstack((depth_frame, rgb[i], segmented_rgb[i], rgb_highlight[i]))

    # PyAV converts the rgb24 frame to the stream's yuv420p format when encoding
    frame = av.VideoFrame.from_ndarray(combined_frame, format="rgb24")
    for packet in stream.encode(frame):
        container.mux(packet)

# Flush frames still buffered in the encoder, then finalize the file
for packet in stream.encode():
    container.mux(packet)
container.close()
print(f"Video saved to {output_path}")


Video saved to visualizations.mp4
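Additive overlays need care with dtypes: NumPy arithmetic on two `uint8` arrays wraps around modulo 256 before any `clip` runs. The helpers below sketch two safer options: an additive overlay computed in a wider integer type, and an alpha blend that tints masked pixels instead of saturating them to white. `overlay_add`, `overlay_tint`, the tint color, and `alpha` are illustrative choices, not part of the dataset or the original cell.

```python
import numpy as np


def overlay_add(base, mask, value=255):
    """Add value to base wherever mask is nonzero, saturating at 255 instead of wrapping."""
    boost = (mask != 0).astype(np.int16) * value
    if boost.ndim == base.ndim - 1:
        boost = boost[..., None]  # broadcast a single-channel (H, W) mask over color channels
    return np.clip(base.astype(np.int16) + boost, 0, 255).astype(np.uint8)


def overlay_tint(base, mask, color=(255, 0, 0), alpha=0.5):
    """Blend a solid color into an (H, W, 3) image wherever the (H, W) mask is nonzero."""
    out = base.astype(np.float32)
    sel = mask != 0
    if sel.ndim == out.ndim:
        sel = sel[..., 0]  # accept (H, W, 1) masks as well
    out[sel] = (1 - alpha) * out[sel] + alpha * np.asarray(color, dtype=np.float32)
    return out.round().astype(np.uint8)


# Wrap-around demo: uint8 addition overflows silently.
a = np.array([200], dtype=np.uint8)
print(a + a)              # [144]
print(overlay_add(a, a))  # [255]
```

Tinting keeps the underlying RGB visible under the mask, which can make the highlighted objects easier to recognize than a pure white overlay.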


In [5]:
from IPython.display import Video

# Path to the video file
video_path = "visualizations.mp4"

# Display the video in the notebook (the path is resolved relative to the notebook)
Video(video_path, width=640)

## Raw VR demos
The metadata is stored as attributes of the HDF5 file and can be read from the file's `attrs` using the following keys:
- `/metadata/start_time`: the date the demo was recorded
- `/metadata/physics_timestep`: the simulated time duration of each step of the physics simulator (1/300 seconds for all our demos)
- `/metadata/render_timestep`: the simulated time between each rendered image, determines the framerate (1/30 seconds). `render_timestep / physics_timestep` gives the number of physics simulation steps between two generated images (10)
- `/metadata/git_info`: the git info for activity definition, `iGibson`, `ig_dataset`, and `ig_assets`. This is used to ensure participants are using a compatible version of iGibson if replaying the demo
- `/metadata/task_name`: The name of the activity, e.g. `washing_dishes`, `putting_away_groceries`...
- `/metadata/task_instance`: The instance of the activity that specifies the state (pose, extended state) of the sampled activity relevant objects at initialization
- `/metadata/scene_id`: The scene (`Rs_int`, `Wainscott_0_int`, etc.) where the activity was recorded
- `/metadata/filter_objects`: Whether only activity-relevant objects were recorded
- `/metadata/obj_body_id_to_name`: mapping of pybullet IDs to the semantic object name
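As a worked example of the timing fields: with `physics_timestep` = 1/300 s and `render_timestep` = 1/30 s, each rendered frame spans (1/30) / (1/300) = 10 physics steps, and an N-frame demo covers N × 1/30 s of simulated time (100 s for the 3000-frame demo inspected above). The helper below is a sketch that reads these two values from any mapping keyed like the list above, so it should work with an h5py `attrs` object or a plain dict; `timing_summary` is a hypothetical name, not part of iGibson.

```python
def timing_summary(attrs, num_frames):
    """Derive per-frame physics steps, frame rate, and simulated duration from demo metadata."""
    physics_dt = float(attrs["/metadata/physics_timestep"])
    render_dt = float(attrs["/metadata/render_timestep"])
    # Round because the ratio of two floating-point timesteps is rarely an exact integer.
    steps_per_frame = round(render_dt / physics_dt)
    return {
        "steps_per_frame": steps_per_frame,
        "fps": 1.0 / render_dt,
        "duration_s": num_frames * render_dt,
    }


# Usage (the values below mirror the documented timesteps):
summary = timing_summary(
    {"/metadata/physics_timestep": 1 / 300, "/metadata/render_timestep": 1 / 30},
    num_frames=3000,
)
print(summary["steps_per_frame"], round(summary["duration_s"], 6))  # 10 100.0
```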


### HDF5 Content

The following are the available keys to index into the hdf5 file. The dimensionality of each component is noted parenthetically, where `N` indicates the number of frames in the demo.

- `frame_data` (N x 4)
- `goal_status`
    - `satisfied` (N x total_goals) -- satisfied top-level goal predicates, where total_goals is the number of top-level predicates
    - `unsatisfied` (N x total_goals) -- unsatisfied top-level goal predicates, where total_goals is the number of top-level predicates
- `physics_data`
    - `<bullet_id>` -- one group per activity-relevant scene object, keyed by its pybullet body ID as a string
        - `position` (N x 3) -- the 3D position of the object's center of mass
        - `orientation` (N x 4) -- the quaternion orientation of the object
        - `joint_state` (N x number of object joints) -- the pybullet joint state of each of the object's joints
- `vr` 
    - `vr_camera`
        - `right_eye_view` (N x 4 x 4) -- the camera view matrix
        - `right_eye_proj` (N x 4 x 4) -- the camera projection matrix 
        - `right_camera_pos` (N x 3) -- The 3D position of the camera 
    - `vr_device_data` 
        - `hmd` (N x 17) -- see below
        - `left_controller` (N x 27) -- see below
        - `right_controller` (N x 27) -- see below 
        - `vr_position_data` (N x 12) -- see below
        - `torso_tracker` (N x 8) -- see below
    - `vr_button_data`
        - `left_controller` (N x 3) -- see below
        - `right_controller` (N x 3) -- see below
    - `vr_eye_tracking_data` 
        - `left_controller` (N x 9) -- see below 
    - `vr_event_data` 
        - `left_controller` (N x 28) -- see below
        - `right_controller` (N x 28) -- see below
        - `reset_actions` (N x 2) -- reset for left and right controller
- `agent_actions`
    - `vr_robot` (N x 28) -- the BehaviorRobot action; see the `action` description in the processed demos section above
- `action` -- unused 
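Because `physics_data` holds one group per pybullet body ID, a natural first pass is to check which activity-relevant objects actually moved during the demo. A sketch, assuming each body group contains the `position` dataset described above; it accepts an h5py group or a nested dict of arrays. The function names are hypothetical, and the default threshold of 0.01 (1 cm if positions are in meters) is an arbitrary illustrative choice. It measures net displacement only, so an object moved and then returned to its starting point is not flagged.

```python
import numpy as np


def object_displacements(physics_data):
    """Map body ID -> straight-line distance between first and last recorded position."""
    displacements = {}
    for body_id, body in physics_data.items():
        position = np.asarray(body["position"][:])  # (N, 3) center-of-mass trajectory
        displacements[int(body_id)] = float(np.linalg.norm(position[-1] - position[0]))
    return displacements


def moved_objects(physics_data, threshold=0.01):
    """Sorted body IDs whose net displacement exceeds threshold (same units as position)."""
    return sorted(b for b, d in object_displacements(physics_data).items() if d > threshold)


# Usage with the raw demo opened via h5py:
# with h5py.File(hdf5_path, "r") as f:
#     print(moved_objects(f["physics_data"]))
```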

The entries below are not keys. They describe which indices of the last dimension of each array above hold which quantity:

- `hmd` (17) 
    - hmd tracking data is valid (1)
    - translation (3) 
    - rotation (4)
    - right vector (3) 
    - up vector (3)
    - forward vector (3)
- `left_controller`/`right_controller` (27) 
    - controller tracking data is valid (1)
    - translation (3)
    - rotation (4)
    - right vector (3) 
    - up vector (3)
    - forward vector (3)
    - base_rotation (4)
    - base_rotation * controller_rotation (4)
    - applied_force (6) -- p.getConstraintState(controller_constraint_id)
- `vr_button_data` (3)
    - trigger fraction (1) -- open: 0 -> closed: 1
    - touchpad x position (1) -- left: -1 -> right: 1
    - touchpad y position (1) -- bottom: -1 -> top: 1
- `vr_eye_tracking_data` (9)
    - eye-tracking data is valid (1)
    - origin of gaze in world space (3) 
    - direction vector of gaze in world space (3)
    - left pupil diameter (1)
    - right pupil diameter (1) 
- `vr_position_data` (12)
    - position of the system in iGibson space (3)
    - offset of the system from the origin (3)
    - applied force to vr body (6) -- p.getConstraintState(body_constraint_id)
- `torso_tracker` (8)
    - torso tracker is valid (1)
    - position (3)
    - rotation (4) 
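The layouts above can be turned into named slices of each array. A sketch, assuming the fields are stored contiguously in the order listed; the `*_LAYOUT` names and `split_fields` are hypothetical, and the widths are copied from the lists above. The helper checks that the widths add up to the array's last dimension, which is why the controller layout is not included: its listed widths sum to 31 rather than 27.

```python
import numpy as np

# Field widths copied from the layout lists above (assumed contiguous, in listed order).
HMD_LAYOUT = [
    ("valid", 1), ("translation", 3), ("rotation", 4),
    ("right", 3), ("up", 3), ("forward", 3),
]
EYE_TRACKING_LAYOUT = [
    ("valid", 1), ("gaze_origin", 3), ("gaze_direction", 3),
    ("left_pupil_diameter", 1), ("right_pupil_diameter", 1),
]
VR_POSITION_LAYOUT = [("system_position", 3), ("system_offset", 3), ("applied_force", 6)]
TORSO_TRACKER_LAYOUT = [("valid", 1), ("position", 3), ("rotation", 4)]


def split_fields(array, layout):
    """Split the last axis of array into named views according to layout."""
    total = sum(width for _, width in layout)
    if array.shape[-1] != total:
        raise ValueError(f"layout covers {total} columns, array has {array.shape[-1]}")
    fields, start = {}, 0
    for name, width in layout:
        fields[name] = array[..., start:start + width]
        start += width
    return fields


# Usage: hmd = split_fields(f["vr"]["vr_device_data"]["hmd"][:], HMD_LAYOUT)
# hmd["translation"] is then an (N, 3) array of headset positions.
```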

In [9]:
hdf5_path = "/datasets/behavior-100-replay/bottling_fruit_0_Wainscott_0_int_0_2021-05-24_19-46-46.hdf5"

# print_hdf5_tree is defined in the first code cell above

with h5py.File(hdf5_path, 'r') as hdf_file:
    print("HDF5 File Structure:")
    print_hdf5_tree(hdf_file)
    print(hdf_file.attrs.keys())

HDF5 File Structure:
- agent_actions
  - vr_robot
    (Dataset: shape=(3000, 28), dtype=float64)
- frame_data
  (Dataset: shape=(3000, 4), dtype=float64)
- goal_status
  - satisfied
    (Dataset: shape=(3000, 5), dtype=float64)
  - unsatisfied
    (Dataset: shape=(3000, 5), dtype=float64)
- physics_data
  - 1
    - joint_state
      (Dataset: shape=(3000, 0), dtype=float64)
    - orientation
      (Dataset: shape=(3000, 4), dtype=float64)
    - position
      (Dataset: shape=(3000, 3), dtype=float64)
  - 103
    - joint_state
      (Dataset: shape=(3000, 3), dtype=float64)
    - orientation
      (Dataset: shape=(3000, 4), dtype=float64)
    - position
      (Dataset: shape=(3000, 3), dtype=float64)
  - 107
    - joint_state
      (Dataset: shape=(3000, 0), dtype=float64)
    - orientation
      (Dataset: shape=(3000, 4), dtype=float64)
    - position
      (Dataset: shape=(3000, 3), dtype=float64)
  - 120
    - joint_state
      (Dataset: shape=(3000, 1), dtype=float64)
    - orientat