# 💿 Dataset Conversion

This notebook converts raw robot recordings (`.mcap` files) into the LeRobot format required for training. 

The process involves:
1.  **Exploring** the available raw data.
2.  **Configuring** the dataset parameters (e.g., observations, actions).
3.  **Running** the conversion script.

--- 
## 1. Explore Raw Data

First, list the available raw data directories and their disk usage. Each directory contains a set of `.mcap` files from different teleoperation sessions.

In [None]:
!du -sh /data/raw/*

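`du -sh` reports only the total size of each directory. The minimal pathlib sketch below also counts the `.mcap` files per directory. `summarize_raw_dirs` is a hypothetical helper, not part of `example_policies`. It searches recursively, so it does not assume whether session files sit directly in each directory or in nested folders.

```python
import pathlib


def summarize_raw_dirs(root: pathlib.Path) -> dict[str, tuple[int, int]]:
    """Map each subdirectory of `root` to (number of .mcap files, total .mcap bytes)."""
    summary = {}
    for directory in sorted(p for p in root.iterdir() if p.is_dir()):
        # rglob also picks up .mcap files inside nested session folders.
        mcap_files = [f for f in directory.rglob("*.mcap") if f.is_file()]
        total_bytes = sum(f.stat().st_size for f in mcap_files)
        summary[directory.name] = (len(mcap_files), total_bytes)
    return summary


if __name__ == "__main__":
    raw_root = pathlib.Path("/data/raw")
    if raw_root.is_dir():
        for name, (count, size) in summarize_raw_dirs(raw_root).items():
            print(f"{name:40s} {count:5d} files {size / 1e9:8.2f} GB")
```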
--- 
## 2. Configure Conversion

Now, specify the input and output paths and define the dataset's structure. 

> **Action Required:** Replace the `[TODO]` placeholders in `RAW_DATA_DIR`, `OUTPUT_DIR`, and `TASK_LABEL` below.

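A cheap guard against a forgotten placeholder is to scan the configured values for the literal `[TODO]` marker before converting. The sketch below uses a hypothetical helper, `find_placeholders`. In the notebook you would pass it `RAW_DATA_DIR`, `OUTPUT_DIR`, and `TASK_LABEL` after running the configuration cell. The example uses the unedited defaults, so all three are reported.

```python
import pathlib


def find_placeholders(values: dict[str, object], marker: str = "[TODO]") -> list[str]:
    """Return the names of all values whose string form still contains `marker`."""
    return [name for name, value in values.items() if marker in str(value)]


# Example with the unedited defaults from the configuration cell.
leftover = find_placeholders(
    {
        "RAW_DATA_DIR": pathlib.Path("/data/raw/[TODO]"),
        "OUTPUT_DIR": pathlib.Path("/data/[TODO]"),
        "TASK_LABEL": "pick and stack the [TODO] colored lego block on the white lego block",
    }
)
if leftover:
    # In the notebook, raising ValueError here instead would stop before conversion.
    print("Still contains placeholders:", ", ".join(leftover))
```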
In [None]:
import pathlib
from example_policies.data_ops.config.pipeline_config import PipelineConfig
from example_policies.utils.action_order import ActionMode

# --- Paths ---
# TODO: Set the input directory containing your .mcap files.
RAW_DATA_DIR = pathlib.Path("/data/raw/[TODO]")

# TODO: Set your desired output directory name.
OUTPUT_DIR = pathlib.Path("/data/[TODO]")

# --- Configuration ---
# TODO: A descriptive label for the task, used for VLA-style text conditioning.
TASK_LABEL = "pick and stack the [TODO] colored lego block on the white lego block"

cfg = PipelineConfig(
    task_name=TASK_LABEL,
    # Observation features to include in the dataset.
    include_tcp_poses=True,
    include_rgb_images=True,
    include_depth_images=False,
    include_last_command=False,
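    # Action representation. TCP (tool center point) actions are a good default.
    action_level=ActionMode.TCP,
    # Subsampling and filtering. These values are task-dependent.
    recording_fps=20,  # frame rate of the raw recordings
    target_fps=10,  # frame rate of the converted dataset
    max_pause_seconds=0.2,
    min_episode_seconds=1,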
)

print(f"Input path:  {RAW_DATA_DIR}")
print(f"Output path: {OUTPUT_DIR}")

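The `recording_fps`/`target_fps` pair implies a downsampling ratio, and `min_episode_seconds` implies a minimum episode length. The sketch below shows that arithmetic under an assumption: conversion keeps every k-th frame, where k = `recording_fps / target_fps`, and drops episodes shorter than `min_episode_seconds`. The real `PipelineConfig` pipeline may work differently, for example by resampling on timestamps. All three helpers are hypothetical.

```python
def subsample_stride(recording_fps: int, target_fps: int) -> int:
    """Recorded frames per kept frame; this sketch assumes an integer ratio."""
    if target_fps <= 0 or recording_fps % target_fps != 0:
        raise ValueError("sketch assumes recording_fps is a positive multiple of target_fps")
    return recording_fps // target_fps


def kept_frame_indices(n_frames: int, recording_fps: int, target_fps: int) -> list[int]:
    """Indices of the recorded frames kept by simple stride-based downsampling."""
    return list(range(0, n_frames, subsample_stride(recording_fps, target_fps)))


def long_enough(n_frames: int, recording_fps: int, min_episode_seconds: float) -> bool:
    """True if the episode's recorded duration meets the minimum length."""
    return n_frames / recording_fps >= min_episode_seconds


# With recording_fps=20 and target_fps=10, every second frame is kept:
# a 3 s episode (60 recorded frames) becomes 30 frames.
indices = kept_frame_indices(60, recording_fps=20, target_fps=10)
print(len(indices), indices[:4])
# A 0.5 s episode (10 frames at 20 Hz) falls below min_episode_seconds=1.
print(long_enough(10, recording_fps=20, min_episode_seconds=1))
```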
--- 
## 3. Run Conversion

This cell runs the conversion. Depending on the size of your data, it may take a while; progress updates are printed below the cell.

In [None]:
from example_policies.data_ops.dataset_conversion import convert_episodes
from example_policies.data_ops.utils.conversion_utils import get_selected_episodes

# Select episodes to convert; success_only=True skips recordings not marked as successful.
episode_paths = get_selected_episodes(RAW_DATA_DIR, success_only=True)
convert_episodes(episode_paths, OUTPUT_DIR, cfg)

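A quick sanity check after conversion is to read the dataset's metadata. This assumes the converter writes a LeRobot v2-style layout, where `meta/info.json` records summary fields such as `fps`, `total_episodes`, and `total_frames`. The file path and key names are assumptions about the LeRobot format version in use. `summarize_lerobot_dataset` is a hypothetical helper. If the assumption holds, `fps` should presumably match `target_fps`. In the notebook, pass `OUTPUT_DIR` instead of the placeholder path.

```python
import json
import pathlib


def summarize_lerobot_dataset(root: pathlib.Path) -> dict:
    """Read a few summary fields from <root>/meta/info.json, if that file exists.

    Assumes a LeRobot v2-style layout; returns an empty dict otherwise.
    """
    info_path = root / "meta" / "info.json"
    if not info_path.is_file():
        return {}
    info = json.loads(info_path.read_text())
    # .get() reports a missing key as None instead of raising KeyError.
    return {key: info.get(key) for key in ("fps", "total_episodes", "total_frames")}


summary = summarize_lerobot_dataset(pathlib.Path("/data/[TODO]"))  # pass OUTPUT_DIR here
print(summary or "No meta/info.json found; check the output path and dataset layout.")
```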
--- 
## ✅ Done!

Your new dataset is ready at the output path you specified. You can now proceed to the next notebook to train a policy.