# LIBERO Dataset Exploration

This notebook explores the LIBERO spatial dataset structure stored in HDF5 format. Each file contains multiple robot manipulation demonstrations.

## Dataset Components

- **Actions**: 7D normalized vectors `[Δx, Δy, Δz, Δroll, Δpitch, Δyaw, gripper]` (range [-1, 1])
- **Observations**: 
  - RGB images: `agentview_rgb`, `eye_in_hand_rgb` (128×128×3)
  - End-effector: `ee_pos` (3D), `ee_ori` (3D axis-angle), `ee_states` (6D concatenated)
  - Robot: `joint_states` (7D), `gripper_states` (2D)
- **Episode Info**: `dones` (termination flags), `rewards` (sparse: 0/1)
- **States**: `robot_states` (9D: gripper + ee_pos + quaternion), `states` (92D environment state)


In [14]:
import h5py
from pathlib import Path

# LIBERO dataset exploration
# The dataset is stored in HDF5 format
# Each HDF5 file contains multiple demonstrations of a specific task

# Change this if you want a different suite / task
dataset_root = Path("../libero/datasets/libero_spatial")
hdf5_file = dataset_root / "pick_up_the_black_bowl_next_to_the_cookie_box_and_place_it_on_the_plate_demo.hdf5"

# Open the HDF5 file and extract demonstration data
with h5py.File(hdf5_file, "r") as f:
    data = f["data"]
    demos = list(data.keys())  # Get list of all demonstration IDs in the file

    # Select one demonstration by its position in the key list (index 6 here)
    demo0 = data[demos[6]]
    
    # Print available data keys to understand the structure
    print("One demo data contains the following keys:", list(demo0.keys()))
    
    # Extract all data from the demonstration into a dictionary
    # [:] reads the actual data from the HDF5 dataset (not just a reference)
    demo0_data = {
        'actions': demo0["actions"][:],  # Robot actions: [num_timesteps, action_dim] - typically 7D (position, orientation, gripper)
        'dones': demo0["dones"][:],  # Episode termination flags: [num_timesteps] - True when task is complete
        'rewards': demo0["rewards"][:],  # Reward signal: [num_timesteps] - sparse rewards for task completion
        'obs': {key: demo0["obs"][key][:] for key in demo0["obs"].keys()},  # Observations: dict of various sensor readings
        'robot_states': demo0["robot_states"][:],  # Full robot state: [num_timesteps, state_dim]
        'states': demo0["states"][:]  # Environment state: [num_timesteps, state_dim]
    }

One demo data contains the following keys: ['actions', 'dones', 'obs', 'rewards', 'robot_states', 'states']
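
A caveat when indexing `demos` by position: h5py normally lists group members in alphanumeric name order, so position 6 in the list is not necessarily the seventh recorded demonstration. The sketch below sorts keys numerically. It assumes that every key has the form `demo_<integer>`, which is not shown in the output above, and `sort_demo_keys` is a hypothetical helper, not part of LIBERO or h5py.

```python
import re


def sort_demo_keys(keys):
    """Sort keys such as 'demo_10' by their integer suffix (assumed naming scheme)."""

    def numeric_suffix(key):
        match = re.fullmatch(r"demo_(\d+)", key)
        if match is None:
            raise ValueError(f"unexpected demo key: {key!r}")
        return int(match.group(1))

    return sorted(keys, key=numeric_suffix)


# Plain string ordering places 'demo_10' before 'demo_2'
alphabetical = sorted(["demo_0", "demo_1", "demo_2", "demo_10"])
numeric = sort_demo_keys(alphabetical)
print(alphabetical)
print(numeric)
```

With the real file, `sort_demo_keys(list(data.keys()))` would give a list where position `i` corresponds to `demo_<i>`, provided the naming assumption holds.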


In [15]:
# Summarize the action array: representation, shape, and value range
print('LIBERO represents actions in end-effector Cartesian space.')
print('The seven dimensions are the end-effector position and orientation deltas plus a gripper command:')
print('Δx, Δy, Δz, Δroll, Δpitch, Δyaw, gripper')
print('Action shape:', demo0_data['actions'].shape)
print('Actions are stored normalized (min, max):', demo0_data['actions'].min(), demo0_data['actions'].max())

LIBERO represents actions in end-effector Cartesian space.
The seven dimensions are the end-effector position and orientation deltas plus a gripper command:
Δx, Δy, Δz, Δroll, Δpitch, Δyaw, gripper
Action shape: (141, 7)
Actions are stored normalized (min, max): -1.0 1.0
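
To make the 7D layout concrete, here is a minimal sketch that splits one action row into its named parts and checks the [-1, 1] range. `split_action` is a hypothetical helper and the example values are made up, not taken from the dataset. It accepts a plain list or a NumPy row.

```python
def split_action(action):
    """Split one 7D action row into translation, rotation, and gripper parts."""
    if len(action) != 7:
        raise ValueError(f"expected 7 values, got {len(action)}")
    return {
        "translation": [float(v) for v in action[0:3]],  # Δx, Δy, Δz
        "rotation": [float(v) for v in action[3:6]],     # Δroll, Δpitch, Δyaw
        "gripper": float(action[6]),                     # gripper command
    }


def in_normalized_range(action, low=-1.0, high=1.0):
    """Return True if every component lies inside [low, high]."""
    return all(low <= float(v) <= high for v in action)


# Made-up example row, not read from the HDF5 file
example_action = [0.1, 0.0, -0.2, 0.0, 0.05, 0.0, -1.0]
parts = split_action(example_action)
print(parts)
print(in_normalized_range(example_action))
```

On the loaded data, `split_action(demo0_data['actions'][0])` would apply the same split to the first timestep.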


In [16]:
# Check the shape of the done flags
# Each timestep has a 0/1 flag indicating whether the episode terminated
print('In the done array, a one indicates the end of the demonstration')
print('The shape of the done array is:', demo0_data['dones'].shape)
print(demo0_data['dones'])

In the done array, a one indicates the end of the demonstration
The shape of the done array is: (141,)
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1]


In [17]:
# Check the shape of the reward signal
print('LIBERO typically uses sparse rewards: 0 most of the time, 1 when the task completes')
print('The shape of the reward signal is:', demo0_data['rewards'].shape)
# View all reward values
# In sparse reward settings, most values are 0, with 1 at task completion
print(demo0_data['rewards'])

LIBERO typically uses sparse rewards: 0 most of the time, 1 when the task completes
The shape of the reward signal is: (141,)
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1]
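
In this demonstration the printed `dones` and `rewards` arrays are identical, with a single 1 at the last timestep. A small sketch for checking that pattern follows. The lists are stand-ins that mirror the printed output (141 entries), and `terminal_indices` is a hypothetical helper. Whether the same pattern holds for every demonstration is not shown here.

```python
def terminal_indices(flags):
    """Return the positions at which a 0/1 flag array equals 1."""
    return [i for i, flag in enumerate(flags) if int(flag) == 1]


# Stand-ins mirroring the printed arrays: 140 zeros followed by a single one
dones = [0] * 140 + [1]
rewards = [0] * 140 + [1]

done_steps = terminal_indices(dones)
reward_steps = terminal_indices(rewards)
print("done at:", done_steps)
print("reward at:", reward_steps)
print("same pattern:", done_steps == reward_steps)
```

The same function works on the NumPy arrays directly, for example `terminal_indices(demo0_data['dones'])`.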


In [18]:
print('The vector robot_states contains the full robot configuration as computed by the physics engine:')
print('-- 2 variables for the left and right gripper joint states')
print('-- 3 variables for the end-effector position')
print('-- 4 variables for the gripper base orientation as a quaternion')
print(demo0_data['robot_states'][0,:])

The vector robot_states contains the full robot configuration as computed by the physics engine:
-- 2 variables for the left and right gripper joint states
-- 3 variables for the end-effector position
-- 4 variables for the gripper base orientation as a quaternion
[ 0.0362455  -0.03621179 -0.21422191 -0.01943306  1.17593119  0.99883435
 -0.03581595 -0.0300125  -0.01209963]


In [39]:
# Joint positions of the robot arm at the first timestep (7 values)
print(demo0_data['obs']['joint_states'][0,:])

[ 0.01097094 -0.17940503 -0.06104203 -2.45571358  0.01567369  2.21628218
  0.79897862]


In [19]:
print('The first 2 elements of robot_states are the left and right gripper joint states; compare with gripper_states:')
print(demo0_data['obs']['gripper_states'][0,:])

print('The next 3 are the end-effector position; compare with ee_pos:')
print(demo0_data['obs']['ee_pos'][0,:])

The first 2 elements of robot_states are the left and right gripper joint states; compare with gripper_states:
[ 0.0362455  -0.03621179]
The next 3 are the end-effector position; compare with ee_pos:
[-0.21422191 -0.01943306  1.17593119]
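
The comparison above suggests a fixed layout for each 9D `robot_states` row. A minimal sketch of slicing it, using the first-timestep values printed earlier. `split_robot_state` is a hypothetical helper, and the unit-norm check on the last four values is only a sanity test that they behave like a quaternion.

```python
import math


def split_robot_state(state):
    """Split a 9D robot_states row into gripper joints, position, and quaternion."""
    if len(state) != 9:
        raise ValueError(f"expected 9 values, got {len(state)}")
    return {
        "gripper": [float(v) for v in state[0:2]],
        "ee_pos": [float(v) for v in state[2:5]],
        "ee_quat": [float(v) for v in state[5:9]],
    }


# First-timestep row copied from the printed robot_states output
robot_state = [0.0362455, -0.03621179, -0.21422191, -0.01943306, 1.17593119,
               0.99883435, -0.03581595, -0.0300125, -0.01209963]

parts = split_robot_state(robot_state)
quat_norm = math.sqrt(sum(v * v for v in parts["ee_quat"]))
print(parts["gripper"])
print(parts["ee_pos"])
print(parts["ee_quat"], "norm:", quat_norm)
```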


In [24]:
print('The variable states contains the full state of the environment, including the positions of the objects:')
print('Each timestep is a vector with 92 entries; the array shape is:', demo0_data['states'].shape)
print(demo0_data['states'].min())

The variable states contains the full state of the environment, including the positions of the objects:
Each timestep is a vector with 92 entries; the array shape is: (141, 92)
-2.4574334135159837


In [31]:
print('The obs entry contains the following keys:')
print('-- agentview_rgb, eye_in_hand_rgb: the views from 2 different cameras')
print('-- ee_pos, ee_ori, ee_states: the end-effector position and orientation. In ee_states the position and orientation are concatenated')
print('-- gripper_states: the joint states of the gripper')
print('-- joint_states: the joint states of the robot arm')
print(demo0_data['obs'].keys())

The obs entry contains the following keys:
-- agentview_rgb, eye_in_hand_rgb: the views from 2 different cameras
-- ee_pos, ee_ori, ee_states: the end-effector position and orientation. In ee_states the position and orientation are concatenated
-- gripper_states: the joint states of the gripper
-- joint_states: the joint states of the robot arm
dict_keys(['agentview_rgb', 'ee_ori', 'ee_pos', 'ee_states', 'eye_in_hand_rgb', 'gripper_states', 'joint_states'])
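
Rather than printing each shape in its own cell, the nested dictionary can be walked once. The sketch below is one way to do that. `summarize_shapes` is a hypothetical helper, and the `SimpleNamespace` objects stand in for NumPy arrays so the example runs without the dataset.

```python
from types import SimpleNamespace


def summarize_shapes(tree, prefix=""):
    """Return {path: shape} for every leaf of a nested dict.

    Leaves without a .shape attribute are reported with shape None.
    """
    shapes = {}
    for key, value in tree.items():
        path = f"{prefix}/{key}" if prefix else key
        if isinstance(value, dict):
            shapes.update(summarize_shapes(value, path))
        else:
            shapes[path] = getattr(value, "shape", None)
    return shapes


# Stand-ins for arrays, using shapes reported elsewhere in this notebook
fake_demo = {
    "actions": SimpleNamespace(shape=(141, 7)),
    "obs": {
        "ee_pos": SimpleNamespace(shape=(141, 3)),
        "joint_states": SimpleNamespace(shape=(141, 7)),
    },
}

summary = summarize_shapes(fake_demo)
for path, shape in summary.items():
    print(path, shape)
```

With the loaded data, `summarize_shapes(demo0_data)` would list every array, including those under `obs`.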


In [33]:
print('The agent view and the eye in hand view are 3 channel images with a resolution of 128x128')
print(demo0_data['obs']['agentview_rgb'].shape)
print(demo0_data['obs']['eye_in_hand_rgb'].shape)

The agent view and the eye in hand view are 3 channel images with a resolution of 128x128
(141, 128, 128, 3)
(141, 128, 128, 3)


In [35]:
print('The ee_states contains the position and orientation of the end-effector. The ee_ori is a 3D vector; its values match the axis-angle (rotation vector) form of the quaternion in robot_states')
print(demo0_data['obs']['ee_states'].shape)
print(demo0_data['obs']['ee_pos'].shape)
print(demo0_data['obs']['ee_ori'].shape)

print('The ee_states concatenates the ee_pos and ee_ori')
print(demo0_data['obs']['ee_states'][0, :])
print(demo0_data['obs']['ee_pos'][0, :])
print(demo0_data['obs']['ee_ori'][0, :])

The ee_states contains the position and orientation of the end-effector. The ee_ori is a 3D vector; its values match the axis-angle (rotation vector) form of the quaternion in robot_states
(141, 6)
(141, 3)
(141, 3)
The ee_states concatenates the ee_pos and ee_ori
[-0.21422191 -0.01943306  1.17593119  3.16233381 -0.11339417 -0.09502031]
[-0.21422191 -0.01943306  1.17593119]
[ 3.16233381 -0.11339417 -0.09502031]
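
The meaning of `ee_ori` can be checked against the quaternion stored in the last four entries of `robot_states`. The sketch below converts that quaternion to an axis-angle (rotation) vector and compares it with the printed `ee_ori` row. It assumes the quaternion is ordered `(x, y, z, w)`, and the close agreement it finds supports both that ordering and the axis-angle reading. `quat_xyzw_to_axis_angle` is a hypothetical helper written for this check, not a LIBERO function.

```python
import math


def quat_xyzw_to_axis_angle(quat):
    """Convert a unit quaternion (x, y, z, w) to a rotation vector (axis * angle)."""
    x, y, z, w = quat
    w = max(-1.0, min(1.0, w))          # guard acos against rounding error
    angle = 2.0 * math.acos(w)
    sin_half = math.sqrt(1.0 - w * w)   # equals the norm of (x, y, z) for a unit quaternion
    if sin_half < 1e-8:
        return [0.0, 0.0, 0.0]          # no rotation, axis undefined
    scale = angle / sin_half
    return [x * scale, y * scale, z * scale]


# Values copied from the first-timestep outputs printed in this notebook
quat = [0.99883435, -0.03581595, -0.0300125, -0.01209963]  # robot_states[0, 5:9]
ee_ori = [3.16233381, -0.11339417, -0.09502031]            # obs['ee_ori'][0]

rotvec = quat_xyzw_to_axis_angle(quat)
max_err = max(abs(a - b) for a, b in zip(rotvec, ee_ori))
print("rotation vector:", rotvec)
print("max abs difference from ee_ori:", max_err)
```

The length of the rotation vector is slightly larger than π, which is possible for an axis-angle vector computed this way but would be unusual for a roll angle reported in [-π, π].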


In [37]:
print('The gripper states contains the joint states of the gripper')
print(demo0_data['obs']['gripper_states'].shape)

print('The joint states contains the joint states of the robot arm')
print(demo0_data['obs']['joint_states'].shape)

The gripper states contains the joint states of the gripper
(141, 2)
The joint states contains the joint states of the robot arm
(141, 7)
