# 💿 Dataset Conversion

This notebook converts raw robot recordings (`.mcap` files) into the LeRobot format required for training. 

The process involves:
1.  **Exploring** the available raw data.
2.  **Configuring** the dataset parameters (e.g., observations, actions).
3.  **Running** the conversion script.

--- 
## 1. Explore Raw Data

First, list the available raw data directories. Each task directory (e.g. `5_drawer-opening`) holds `.mcap` files from different teleoperation sessions, grouped into subdirectories such as `anonymous/`.

In [1]:
!du -sh /data/raw/*

60G	/data/raw/1_practice
31G	/data/raw/5_drawer-opening
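If you prefer to stay in Python (for example, to reuse the sizes later), the sketch below approximates the `du -sh` listing with `pathlib`. The helper names `human_size`, `dir_size` and `summarize` are hypothetical. It sums apparent file sizes (`st_size`), whereas `du` reports allocated disk blocks, so the numbers can differ slightly.

```python
import pathlib


def human_size(num_bytes: int) -> str:
    """Format a byte count with 1024-based units, roughly like `du -h`."""
    size = float(num_bytes)
    for unit in ("B", "K", "M", "G"):
        if size < 1024:
            return f"{size:.1f}{unit}"
        size /= 1024
    return f"{size:.1f}T"


def dir_size(path: pathlib.Path) -> int:
    """Sum the apparent sizes (st_size) of all regular files below `path`."""
    return sum(p.stat().st_size for p in path.rglob("*") if p.is_file())


def summarize(root: pathlib.Path) -> dict:
    """Map each immediate subdirectory of `root` to its total size in bytes."""
    return {d.name: dir_size(d) for d in sorted(root.iterdir()) if d.is_dir()}


if __name__ == "__main__":
    root = pathlib.Path("/data/raw")  # same root as the `du` call above
    if root.is_dir():
        for name, size in summarize(root).items():
            print(f"{human_size(size)}\t{root / name}")
```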


In [2]:
!du -sh /data/raw/5_drawer-opening/anonymous/*

27M	/data/raw/5_drawer-opening/anonymous/20251119_093948_prepare.mcap
725M	/data/raw/5_drawer-opening/anonymous/20251119_093955_prepare.mcap
179M	/data/raw/5_drawer-opening/anonymous/20251119_094049_task_success_good.mcap
410M	/data/raw/5_drawer-opening/anonymous/20251119_094107_prepare.mcap
188M	/data/raw/5_drawer-opening/anonymous/20251119_094138_task_success_good.mcap
474M	/data/raw/5_drawer-opening/anonymous/20251119_094158_prepare.mcap
212M	/data/raw/5_drawer-opening/anonymous/20251119_094234_task_success_good.mcap
398M	/data/raw/5_drawer-opening/anonymous/20251119_094259_prepare.mcap
186M	/data/raw/5_drawer-opening/anonymous/20251119_094329_task_success_good.mcap
687M	/data/raw/5_drawer-opening/anonymous/20251119_094348_prepare.mcap
188M	/data/raw/5_drawer-opening/anonymous/20251119_094440_task_success_good.mcap
489M	/data/raw/5_drawer-opening/anonymous/20251119_094501_prepare.mcap
151M	/data/raw/5_drawer-opening/anonymous/20251119_094537_task_success_good.mcap
2.8G	/data/raw/5_d
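The listing mixes two kinds of recordings: `prepare` sessions and `task_success_good` sessions. The sketch below shows one way to pick out only the successful episodes by filename. It assumes the `YYYYMMDD_HHMMSS_<label>.mcap` naming seen above, and `parse_recording` / `select_success_episodes` are hypothetical helpers, not part of `example_policies`. Whether `convert_episodes` applies such a filter itself is not shown in this notebook.

```python
import datetime
import pathlib
import re
from typing import List, Optional, Tuple

# Assumed naming pattern, inferred from the listing above:
# YYYYMMDD_HHMMSS_<label>.mcap, e.g. 20251119_094049_task_success_good.mcap
RECORDING_RE = re.compile(r"^(\d{8})_(\d{6})_(.+)\.mcap$")


def parse_recording(path: pathlib.Path) -> Optional[Tuple[datetime.datetime, str]]:
    """Return (start time, label) for a recording file, or None if the name does not match."""
    match = RECORDING_RE.match(path.name)
    if match is None:
        return None
    date, clock, label = match.groups()
    started = datetime.datetime.strptime(date + clock, "%Y%m%d%H%M%S")
    return started, label


def select_success_episodes(directory: pathlib.Path) -> List[pathlib.Path]:
    """Hypothetical filter: keep recordings labelled 'task_success*', oldest first."""
    selected = []
    for path in directory.glob("*.mcap"):
        parsed = parse_recording(path)
        if parsed is not None and parsed[1].startswith("task_success"):
            selected.append((parsed[0], path))
    return [path for _, path in sorted(selected)]


if __name__ == "__main__":
    raw_dir = pathlib.Path("/data/raw/5_drawer-opening/anonymous")
    if raw_dir.is_dir():
        for path in select_success_episodes(raw_dir):
            print(path.name)
```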

--- 
## 2. Configure Conversion

Now, specify the input and output paths and define the dataset's structure. 

> **Action Required:** Update `RAW_DATA_DIR` and `OUTPUT_DIR` below.

In [3]:
import pathlib
from example_policies.data_ops.config.pipeline_config import PipelineConfig, ActionLevel

# --- Paths ---
# TODO: Set the input directory containing your .mcap files.
RAW_DATA_DIR = pathlib.Path("/data/raw/5_drawer-opening/anonymous/")

# TODO: Set your desired output directory name.
OUTPUT_DIR = pathlib.Path("/data/sort_duebel")

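# --- Configuration ---
# TODO: A descriptive label for the task, used for VLA-style text conditioning.
# NOTE: This label (like OUTPUT_DIR) still describes a screw/plug sorting task,
# while RAW_DATA_DIR points at drawer-opening recordings. Make sure the label
# matches the task actually performed in your recordings.
TASK_LABEL = "sort the screws and plugs in the respective box sections"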

cfg = PipelineConfig(
    task_name=TASK_LABEL,
    # Observation features to include in the dataset.
    include_tcp_poses=True,
    include_rgb_images=True,
    include_depth_images=False,
    # Action representation. DELTA_TCP is recommended as a good default;
    # this configuration uses ActionLevel.TCP instead.
    action_level=ActionLevel.TCP,
    # Subsampling and filtering. These are task-dependent.
    target_fps=10,
    max_pause_seconds=0.2,
    min_episode_seconds=1,
)

print(f"Input path:  {RAW_DATA_DIR}")
print(f"Output path: {OUTPUT_DIR}")

Input path:  /data/raw/5_drawer-opening/anonymous
Output path: /data/sort_duebel
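`PipelineConfig` applies the subsampling and filtering settings internally, and its exact behaviour is not shown here. As a hypothetical illustration only, the sketch below shows what subsampling to `target_fps` and a `min_episode_seconds` filter could look like on a list of frame timestamps in seconds. `max_pause_seconds` is not modelled, and the function names are made up for this example.

```python
from typing import List, Sequence


def subsample_timestamps(
    timestamps: Sequence[float], target_fps: float, tol: float = 1e-6
) -> List[float]:
    """Keep roughly one frame per 1/target_fps seconds on a grid anchored at the first frame."""
    if not timestamps:
        return []
    period = 1.0 / target_fps
    start = timestamps[0]
    kept: List[float] = []
    k = 0  # index of the next grid tick we are waiting for
    for t in timestamps:
        # `tol` absorbs floating-point error when t lies exactly on a tick.
        if t >= start + k * period - tol:
            kept.append(t)
            # Jump to the first tick after t, so a gap in the recording
            # does not produce a burst of back-to-back frames.
            k = int((t - start + tol) / period) + 1
    return kept


def long_enough(timestamps: Sequence[float], min_episode_seconds: float) -> bool:
    """Hypothetical episode filter: drop episodes shorter than min_episode_seconds."""
    return bool(timestamps) and timestamps[-1] - timestamps[0] >= min_episode_seconds


# Example: a 1 s recording at 30 Hz subsampled to target_fps=10.
ts = [i / 30 for i in range(31)]
kept = subsample_timestamps(ts, target_fps=10)
print(len(kept), long_enough(kept, min_episode_seconds=1))  # 11 True
```

Anchoring the grid at the first frame, rather than at the last kept frame, keeps the output spacing from drifting when frame timestamps are not exact multiples of the period.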


--- 
## 3. Run Conversion

This cell executes the conversion process. It may take a while depending on the size of your data. You will see progress updates printed below.
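Because the conversion can run for a long time, it may be worth checking the paths first. The `preflight` function below is a hypothetical helper, not part of `example_policies`; it only reports problems and does not change anything on disk.

```python
import pathlib
from typing import List


def preflight(raw_dir: pathlib.Path, output_dir: pathlib.Path) -> List[str]:
    """Hypothetical sanity check: return human-readable problems (empty list = looks fine)."""
    problems: List[str] = []
    if not raw_dir.is_dir():
        problems.append(f"RAW_DATA_DIR is not a directory: {raw_dir}")
    elif not any(raw_dir.glob("*.mcap")):
        problems.append(f"No .mcap files directly inside RAW_DATA_DIR: {raw_dir}")
    if output_dir.is_file():
        problems.append(f"OUTPUT_DIR is an existing file: {output_dir}")
    elif output_dir.is_dir() and any(output_dir.iterdir()):
        # Whether convert_episodes overwrites, appends, or refuses is not shown here.
        problems.append(f"OUTPUT_DIR already exists and is not empty: {output_dir}")
    return problems


# Usage in the notebook, with the variables from the configuration cell:
# for problem in preflight(RAW_DATA_DIR, OUTPUT_DIR):
#     print("WARNING:", problem)
```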

In [5]:
from example_policies.data_ops.dataset_conversion import convert_episodes

# Convert the recordings in RAW_DATA_DIR into a LeRobot dataset at OUTPUT_DIR
# using the pipeline settings in `cfg` from the configuration cell.
convert_episodes(RAW_DATA_DIR, OUTPUT_DIR, cfg)

ModuleNotFoundError: No module named 'example_policies.utils'
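The recorded run failed with `ModuleNotFoundError` for `example_policies.utils`. The cell does not import that module directly, so the error is presumably raised while importing `dataset_conversion`. A common cause is that the notebook kernel uses a different Python environment than the one where `example_policies` is installed, or that the installed copy is outdated. The hypothetical helper below reports which modules the current kernel can locate, plus the interpreter path.

```python
import importlib.util
import sys
from typing import Dict, Iterable


def module_status(names: Iterable[str]) -> Dict[str, bool]:
    """Report whether each dotted module name can be located in this interpreter."""
    status: Dict[str, bool] = {}
    for name in names:
        try:
            # find_spec imports the parent packages of dotted names first.
            status[name] = importlib.util.find_spec(name) is not None
        except ImportError:
            # A missing (or broken) parent package raises instead of returning None.
            status[name] = False
    return status


if __name__ == "__main__":
    print("Kernel interpreter:", sys.executable)
    checks = ["example_policies", "example_policies.utils", "example_policies.data_ops"]
    for name, ok in module_status(checks).items():
        print(f"{'ok' if ok else 'MISSING':8} {name}")
```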

--- 
## ✅ Done!

Your new dataset is ready at the output path you specified. You can now proceed to the next notebook to train a policy.
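To sanity-check the result, the sketch below reads the dataset's metadata, assuming the converter writes the usual LeRobot layout with a `meta/info.json` file. The field names (`fps`, `total_episodes`, `total_frames`, `features`) are taken from recent LeRobot dataset versions and may differ in yours; the helper names are hypothetical. The reported `fps` should presumably match `target_fps` (10 in this configuration).

```python
import json
import pathlib


def read_dataset_info(dataset_dir: pathlib.Path) -> dict:
    """Load meta/info.json, assuming the LeRobot dataset directory layout."""
    with (dataset_dir / "meta" / "info.json").open() as f:
        return json.load(f)


def summarize_info(info: dict) -> str:
    """One-line summary; .get() tolerates fields missing in other LeRobot versions."""
    return (
        f"fps={info.get('fps')}, "
        f"episodes={info.get('total_episodes')}, "
        f"frames={info.get('total_frames')}, "
        f"features={sorted(info.get('features', {}))}"
    )


if __name__ == "__main__":
    dataset_dir = pathlib.Path("/data/sort_duebel")  # OUTPUT_DIR from the config cell
    if (dataset_dir / "meta" / "info.json").is_file():
        print(summarize_info(read_dataset_info(dataset_dir)))
```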