<a href="https://colab.research.google.com/github/jorgemunozl/vla-test/blob/main/TEST/thirteen_tests.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Thirteen Test - Test our processor, CUDA needed?

Let's check that our preprocessor builds the task prompt and the subtask tokens correctly on a small debug subset of the dataset.

## Setup

The following cells set up the environment:
1. Clone the XHUMAN repository
2. Install dependencies
3. Authenticate with HuggingFace Hub
4. Import required libraries

In [1]:
# Clone

In [2]:
%cd XHUMAN

/content/XHUMAN


In [3]:
!uv pip install -e .[pi]

Using Python 3.12.12 environment at: /usr
Resolved 215 packages in 30.67s
Prepared 45 packages in 21.44s
Uninstalled 14 packages in 969ms
Installed 45 packages in 250ms
 + async-lru==2.1.0
 + av==15.1.0
 + comm==0.2.3
 + deepdiff==8.6.1
 - diffusers==0.36.0
 + diffusers==0.35.2
 + draccus==0.10.0
 + evdev==1.9.2
 + faker==40.1.2
 + feetech-servo-sdk==1.0.0
 - huggingface-hub==0.36.0
 + huggingface-hub==0.35.3
 + inquirerpy==0.3.4
 - ipykernel==6.17.1
 + ipykernel==7.1.0
 + je

In [5]:
import time
from contextlib import nullcontext
from typing import Any

import torch
from accelerate import Accelerator
from accelerate.utils import DistributedDataParallelKwargs
from torch.optim import Optimizer

from lerobot.configs import parser
from lerobot.datasets.sampler import EpisodeAwareSampler
from lerobot.datasets.utils import cycle
from lerobot.optim.factory import make_optimizer_and_scheduler
from lerobot.policies.pretrained import PreTrainedPolicy
from lerobot.utils.logging_utils import AverageMeter, MetricsTracker
from lerobot.utils.random_utils import set_seed
from lerobot.utils.train_utils import load_training_state
from lerobot.utils.utils import (
    format_big_number,
    has_method,
    init_logging,
)

from xhuman.policies.factory import make_xhuman_policy, make_xhuman_pre_post_processors
from xhuman.configs.train import TrainPipelineConfigXHUMAN
from xhuman.datasets.factory import make_dataset_xhuman
from xhuman.datasets.utils import split_train_eval_episodes
from xhuman.logger import logger

## Helper Functions

These functions handle dataset loading and policy updates. They are designed to work with distributed training using HuggingFace Accelerate.

In [6]:
def load_dataset(
    cfg: TrainPipelineConfigXHUMAN,
    episodes: list[int],
    is_main_process: bool = True,
    accelerator: Accelerator | None = None,
):
    """
    Load the dataset restricted to `episodes`.

    The main process builds the dataset first (downloading it if needed); the other
    processes load it afterwards from the local cache, which avoids race conditions.
    """
    cfg.dataset.episodes = episodes
    dataset = None

    if is_main_process:
        logger.info("Creating dataset")
        dataset = make_dataset_xhuman(cfg)

    # Without an accelerator (single-process run) there is nothing to wait for
    if accelerator is not None:
        accelerator.wait_for_everyone()

    # Now all other processes can safely load the dataset
    if not is_main_process:
        dataset = make_dataset_xhuman(cfg)

    return dataset

In [7]:
from xhuman.policies.pi05ki.configuration_pi05ki import PI05KIConfig

policy_config = PI05KIConfig(repo_id="none", device="cuda")

## Configuration and Setup

Configure your dataset and policy settings here. The dataset configuration specifies which HuggingFace repository to load, and the policy configuration sets up the PI05 model architecture.

In [8]:
from xhuman.configs.default import LerobotDatasetConfig

dataset_config = LerobotDatasetConfig(
    repo_id="NONHUMAN-RESEARCH/test-general-idx",
)

In [9]:

cfg = TrainPipelineConfigXHUMAN(
    dataset=dataset_config,
    policy=policy_config,  # Example policy configuration; replace with your own policy config
)
cfg.validate()

## Training Setup

Initialize the Accelerator for distributed training and set up the training environment. The accelerator automatically handles:
- Multi-GPU training
- Mixed precision training
- Gradient synchronization across processes

In [10]:
# Create Accelerator
# It will automatically detect if running in distributed mode or single-process mode
# We set step_scheduler_with_optimizer=False to prevent accelerate from adjusting the lr_scheduler steps based on the num_processes
# We set find_unused_parameters=True to handle models with conditional computation
ddp_kwargs = DistributedDataParallelKwargs(find_unused_parameters=True)
accelerator = Accelerator(step_scheduler_with_optimizer=False, kwargs_handlers=[ddp_kwargs])

init_logging(accelerator=accelerator)

# Determine if this is the main process (for logging and checkpointing)
is_main_process = accelerator.is_main_process

# Set seed if specified
if cfg.seed is not None:
    set_seed(cfg.seed, accelerator=accelerator)

# Use accelerator's device
device = accelerator.device
torch.backends.cudnn.benchmark = True
torch.backends.cuda.matmul.allow_tf32 = True

In [32]:
import lerobot.datasets.lerobot_dataset
from lerobot.datasets.utils import load_nested_dataset as original_load_nested_dataset
import datasets

# Monkey patch to handle schema mismatch
# The dataset parquet files have a schema that conflicts with the strict fixed-length definition
# expected by the code. We patch the loader to ignore strict features if loading fails.
def patched_load_nested_dataset(pq_dir, features=None, episodes=None):
    try:
        return original_load_nested_dataset(pq_dir, features=features, episodes=episodes)
    except Exception as e:
        # `Exception` already covers DatasetGenerationError and CastError,
        # so listing them separately was redundant.
        # Fallback if strict schema casting fails
        print(
            f"Warning: Dataset loading failed with strict features ({type(e).__name__}). "
            "Retrying without features constraint."
        )
        return original_load_nested_dataset(pq_dir, features=None, episodes=episodes)

lerobot.datasets.lerobot_dataset.load_nested_dataset = patched_load_nested_dataset

DEBUG_MODE = True  # Set to False for full training
DEBUG_MAX_EPISODES = 3  # Use only first N episodes for debugging

# First, get total episodes count (load minimal dataset to check)
if is_main_process:
    temp_dataset = make_dataset_xhuman(cfg)
    total_episodes = temp_dataset.meta.total_episodes
    del temp_dataset
    logger.info(f"Total episodes available: {total_episodes}")
else:
    # Placeholder; replaced by the main process's count in the broadcast below
    total_episodes = None

accelerator.wait_for_everyone()

# Share the real count so that every process builds the same episode list
# (a hard-coded fallback would make the processes disagree on the split)
from accelerate.utils import broadcast_object_list

total_episodes = broadcast_object_list([total_episodes], from_process=0)[0]

# Limit episodes for debugging
if DEBUG_MODE:
    episodes = list(range(min(DEBUG_MAX_EPISODES, total_episodes)))
    if is_main_process:
        logger.info(f"DEBUG MODE: Using only {len(episodes)} episodes")
else:
    episodes = list(range(total_episodes))

# Split episodes
train_episodes, eval_episodes = split_train_eval_episodes(
    episodes, split_ratio=cfg.split_ratio, seed=42
)

# Load dataset with ONLY train episodes (proper way to filter)
# This uses the load_dataset helper function which sets cfg.dataset.episodes
if is_main_process:
    logger.info(f"Loading train dataset with {len(train_episodes)} episodes")
dataset = load_dataset(cfg, train_episodes, is_main_process=is_main_process, accelerator=accelerator)
dataset.train_with_subtasks = True
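
The internals of `split_train_eval_episodes` are not shown in this notebook. As a rough mental model only, a seeded split could look like the sketch below. `split_episodes` is a hypothetical stand-in, and the real helper may shuffle or round differently.

```python
import random


def split_episodes(episodes: list[int], split_ratio: float, seed: int) -> tuple[list[int], list[int]]:
    """Hypothetical stand-in for split_train_eval_episodes (not the XHUMAN implementation)."""
    shuffled = list(episodes)
    # A private Random instance keeps the split reproducible without touching global RNG state
    random.Random(seed).shuffle(shuffled)
    # Keep at least one training episode, even for very small debug runs
    n_train = max(1, int(len(shuffled) * split_ratio))
    return sorted(shuffled[:n_train]), sorted(shuffled[n_train:])


# Illustrative values only: 3 debug episodes and an assumed 80/20 ratio
train_ids, eval_ids = split_episodes(list(range(3)), split_ratio=0.8, seed=42)
print(train_ids, eval_ids)
```

Because the seed is fixed, every process that calls the function with the same episode list gets the same split, which is what distributed training needs.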





## Dataset and Model Information

Display metadata about the loaded dataset. The effective batch size (accounting for distributed training) is logged a few cells later. The metadata includes:
- Total number of episodes in the repository
- Number of training episodes selected
- Number of frames and episodes actually loaded

In [33]:
# Display dataset metadata
if is_main_process:
    print("=" * 80)
    print("DATASET METADATA")
    print("=" * 80)
    print(f"\nDataset Repository: {dataset.repo_id}")
    print(f"Total Episodes: {dataset.meta.total_episodes}")
    print(f"Training Episodes: {len(train_episodes)}")
    print(f"Number of Frames: {dataset.num_frames:,}")
    print(f"Number of Episodes (loaded): {dataset.num_episodes}")


    print("\n" + "=" * 80)

DATASET METADATA

Dataset Repository: NONHUMAN-RESEARCH/test-general-idx
Total Episodes: 437
Training Episodes: 2
Number of Frames: 2,251
Number of Episodes (loaded): 2



In [64]:
# Create processors
preprocessor, postprocessor = make_xhuman_pre_post_processors(
    policy_cfg=cfg.policy,
    pretrained_path=cfg.policy.pretrained_path,
)

In [65]:
# Print training info
if is_main_process:
    logger.info(f"Output dir: {cfg.output_dir}")
    logger.info(f"Steps: {cfg.steps} ({format_big_number(cfg.steps)})")
    logger.info(f"Dataset frames: {dataset.num_frames} ({format_big_number(dataset.num_frames)})")
    logger.info(f"Dataset episodes: {dataset.num_episodes}")
    num_processes = accelerator.num_processes
    effective_bs = cfg.batch_size * num_processes
    logger.info(f"Effective batch size: {cfg.batch_size} x {num_processes} = {effective_bs}")

In [66]:
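# Choose the sampler and shuffle settings for the training dataloader
# Note: the debug DataLoader built below does not use `sampler` or `shuffle`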
if hasattr(cfg.policy, "drop_n_last_frames"):
    logger.info(f"Dropping {cfg.policy.drop_n_last_frames} last frames")
    shuffle = False
    sampler = EpisodeAwareSampler(
        dataset.meta.episodes["dataset_from_index"],
        dataset.meta.episodes["dataset_to_index"],
        drop_n_last_frames=cfg.policy.drop_n_last_frames,
        shuffle=True,
    )
else:
    logger.info("Not dropping any frames")
    shuffle = True
    sampler = None
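
To make the effect of `drop_n_last_frames` concrete, here is a simplified sketch of the index selection an episode-aware sampler performs. It is not lerobot's implementation; `episode_aware_indices` is a hypothetical helper, and it assumes each episode covers the half-open range `[from_index, to_index)`.

```python
def episode_aware_indices(from_indices, to_indices, drop_n_last_frames=0):
    """Frame indices to sample from, skipping the last frames of each episode.

    Assumes each episode covers the half-open range [from_index, to_index).
    """
    indices = []
    for start, end in zip(from_indices, to_indices):
        # Stop early so the last `drop_n_last_frames` frames are never sampled
        indices.extend(range(start, end - drop_n_last_frames))
    return indices


# Two toy episodes: frames 0-4 and frames 5-9
print(episode_aware_indices([0, 5], [5, 10], drop_n_last_frames=2))
# -> [0, 1, 2, 5, 6, 7]
```

Shuffling then happens over this reduced index list, which is why `shuffle` must stay `False` on the `DataLoader` itself when a sampler is passed.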

In [67]:
# Training initialization
# This logs the start of training and shows how many episodes will be used
if is_main_process:
    logger.info("Start offline training on a fixed dataset")
    logger.info(f"Train episodes: {len(train_episodes)}")
    logger.info(f"Total training steps: {cfg.steps}")

In [68]:
from torch.utils.data import Dataset

class SmartSubset(Dataset):
    def __init__(self, dataset, indices):
        self.dataset = dataset
        self.indices = indices

    def __getitem__(self, idx):
        return self.dataset[self.indices[idx]]

    def __len__(self):
        return len(self.indices)

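    def __getattr__(self, name):
        # __getattr__ only runs when normal lookup fails, so attributes such as
        # 'meta' or 'fps' that this class lacks are looked up on the original dataset.
        # Guard our own fields to avoid infinite recursion while the object is
        # being copied or unpickled (before __init__ has run).
        if name in ("dataset", "indices"):
            raise AttributeError(name)
        return getattr(self.dataset, name)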

# --- USAGE ---
# Use SmartSubset instead of torch.utils.data.Subset
debug_subset = SmartSubset(dataset, range(0, 50))

# Now create your loader normally
train_dataloader = torch.utils.data.DataLoader(
    debug_subset,
    batch_size=2,
    shuffle=True,
    num_workers=0,
    drop_last=True
)
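
The delegation trick can be checked without torch. The sketch below uses a hypothetical `ToyDataset` with a `meta` attribute and a wrapper that mirrors the subset class above, including a guard on the wrapper's own fields; the names and values are illustrative only.

```python
import copy


class ToyDataset:
    """Hypothetical stand-in for a dataset that carries extra metadata."""

    def __init__(self):
        self.items = list(range(100))
        self.meta = {"fps": 30}

    def __getitem__(self, idx):
        return self.items[idx]

    def __len__(self):
        return len(self.items)


class DelegatingSubset:
    def __init__(self, dataset, indices):
        self.dataset = dataset
        self.indices = list(indices)

    def __getitem__(self, idx):
        return self.dataset[self.indices[idx]]

    def __len__(self):
        return len(self.indices)

    def __getattr__(self, name):
        # Only called when normal lookup fails. Refuse to delegate the wrapper's
        # own fields so a half-initialised copy cannot recurse forever.
        if name in ("dataset", "indices"):
            raise AttributeError(name)
        return getattr(self.dataset, name)


subset = DelegatingSubset(ToyDataset(), range(10, 20))
print(len(subset), subset[0], subset.meta["fps"])
# -> 10 10 30

# Copying works because the guard stops the lookup of 'dataset' on the empty copy
subset_copy = copy.copy(subset)
```

Attributes that neither the wrapper nor the wrapped dataset define still raise `AttributeError`, so typos are not silently swallowed.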

In [86]:
train_dataloader

<torch.utils.data.dataloader.DataLoader at 0x78562ffd8da0>

In [69]:
dl_iter = iter(train_dataloader)

In [62]:
import sys
import importlib

# Polyfill 'imp' for Python 3.12 compatibility
if "imp" not in sys.modules:
    import types
    imp = types.ModuleType("imp")
    imp.reload = importlib.reload
    sys.modules["imp"] = imp

%load_ext autoreload
%autoreload 2

# Call Subtask Processor
Important: track the index needed for the loss mask.

In [76]:
frames = next(dl_iter)

In [77]:
frames.keys()

dict_keys(['observation.images.left', 'observation.images.top', 'observation.images.right', 'action', 'observation.state', 'timestamp', 'frame_index', 'episode_index', 'index', 'task_index', 'general_task_index', 'action_is_pad', 'task', 'general_task', 'train_with_subtask'])

In [79]:
frames["subtask"] = frames["task"]

In [80]:
batch = preprocessor.subtask(frames)

In [81]:
batch.keys()

dict_keys(['action', 'next.reward', 'next.done', 'next.truncated', 'info', 'action_is_pad', 'task', 'index', 'task_index', 'episode_index', 'subtask', 'subtask_tokens', 'observation.images.left', 'observation.images.top', 'observation.images.right', 'observation.state', 'observation.language.tokens', 'observation.language.attention_mask'])

In [82]:
sub, tokens = batch["subtask_tokens"], batch["observation.language.tokens"]

In [84]:
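# `tokenize` is the PaliGemma tokenizer created in cell In [51] further down
# (the cells were executed out of order)
sublang = tokenize.batch_decode(sub, skip_special_tokens=True)
sublang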

['pick up the strawberry and put it in the basket',
 'pick up the strawberry and put it in the basket']

In [85]:
tokenslang = tokenize.batch_decode(tokens, skip_special_tokens=True)
tokenslang


['Task: pick up the strawberry and put it in the basket. Subtask: ',
 'Task: pick up the strawberry and put it in the basket. Subtask: ']
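
The decoded outputs show the prompt ending in `Subtask: ` and the subtask text held separately, so the loss mask has to know where the subtask tokens begin. The sketch below is one way to track that index, under assumptions that this notebook does not confirm: the model sees `[prompt tokens | subtask tokens]` back to back with padding removed, only the subtask tokens are supervised, and a subtask attention mask is available. `build_loss_mask` is a hypothetical helper.

```python
def build_loss_mask(prompt_attention_mask, subtask_attention_mask):
    """Return (start_index, loss_mask) for one sample.

    Assumes the sequence layout [prompt tokens | subtask tokens] with padding
    removed, and that only the subtask tokens contribute to the loss.
    """
    prompt_len = sum(prompt_attention_mask)    # number of real prompt tokens
    subtask_len = sum(subtask_attention_mask)  # number of real subtask tokens
    loss_mask = [0] * prompt_len + [1] * subtask_len
    return prompt_len, loss_mask


# Toy sample: 5 real prompt tokens (3 pads) and 3 real subtask tokens (1 pad)
start, mask = build_loss_mask([1, 1, 1, 1, 1, 0, 0, 0], [1, 1, 1, 0])
print(start, mask)
# -> 5 [0, 0, 0, 0, 0, 1, 1, 1]
```

The returned `start` is the index to keep track of: every position before it belongs to the prompt and is excluded from the loss.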

# Test loss calculation

In [58]:
tokens_id = "observation.language.tokens"
mask_id = "observation.language.attention_mask"

tokens, masks = batch[tokens_id], batch[mask_id]
noise = None
time = None  # NOTE: this shadows the `time` module imported at the top of the notebook
# The processor should expose a method that returns the tokenized subtask
# as a list of tensors


In [51]:
from transformers import AutoTokenizer

tokenize = AutoTokenizer.from_pretrained("google/paligemma-3b-pt-224")
stokens = tokenize.batch_decode(tokens)

In [60]:
stokens

['<bos>Task: pick up the strawberry and put it in the basket. Subtask: <pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad>',
 '<bos>Task: pic