This repository contains code for data generation and robotic policy evaluation in the RoboBenchMart benchmark.
We present the dsynth (Darkstore Synthesizer) package, which includes specialized ManiSkill environments for retail setups, a scene generation algorithm, motion planning solvers for data collection, and scripts for policy evaluation.
The ManiSkill simulator requires the Vulkan API.
sudo apt-get install libvulkan1
To test your Vulkan installation:
sudo apt install vulkan-tools
vulkaninfo
For troubleshooting, see the ManiSkill installation guide.
git clone https://gitlab.2a2i.org/cv/robo/darkstore-synthesizer
cd darkstore-synthesizer
conda create -n dsynth python=3.10
conda activate dsynth
pip install -r requirements.txt
pip install mplib==0.2.1
To test your ManiSkill installation:
python -m mani_skill.examples.demo_random_action
Download RoboCasa assets:
python -m mani_skill.utils.download_asset RoboCasa
Download assets from HuggingFace:
hf download emb-ai/RoboBenchMart_assets --repo-type dataset --local-dir assets
If you do not plan to fine-tune your policy on RoboBenchMart training data (you can still generate it yourself from scratch) or evaluate it in training environments, you can skip this step. Otherwise, download demo data from HuggingFace:
hf download emb-ai/RoboBenchMart_demo_envs --repo-type dataset --local-dir demo_envs
Generate a simple scene:
python scripts/generate_scene_continuous.py ds_continuous=small_scene
The default save directory is generated_envs/, but you can change it using ds_continuous.output_dir=<YOUR_PATH>.
To visualize the generated environment in the SAPIEN viewer:
python scripts/show_env_in_sim.py generated_envs/ds_small_scene/ -s 42 --gui
The seed (-s) controls randomization of the scene layout, item arrangement, textures, and the robot's initial position.
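The role of the seed can be illustrated with a minimal sketch. This is not the actual dsynth sampler; `sample_layout` is a hypothetical stand-in showing the contract the seed provides: the same seed always reproduces the same scene, and a different seed produces a different one.

```python
import random

# Toy stand-in for scene randomization (not the real dsynth sampler):
# the seed fully determines every item's (x, y) placement.
def sample_layout(seed: int, n_items: int = 5) -> list[tuple[float, float]]:
    rng = random.Random(seed)
    return [(round(rng.uniform(0.0, 2.0), 3), round(rng.uniform(0.0, 1.0), 3))
            for _ in range(n_items)]

assert sample_layout(42) == sample_layout(42)  # same seed: same scene
assert sample_layout(42) != sample_layout(43)  # new seed: new scene
```

This determinism is what later makes training scenes exactly reproducible for evaluation.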
You can use teleoperation to record demonstration trajectories:
python scripts/run_teleop_fetch.py --scene-dir generated_envs/ds_small_scene/
See the example notebook for tutorials on scene generation, importing scenes into ManiSkill, and motion planning using the dsynth package.
We present seven atomic and three composite tasks for evaluating mobile manipulation policies in retail environments. These tasks can be grouped into two categories: pick-and-place (PnP) and opening/closing.
- PickToBasket: pick a specified item and place it in the attached basket
- MoveFromBoardToBoard: move a specified item to the next board
- PickFromFloor: pick an item from the floor and place it on a shelf
- Open Showcase / Close Showcase: open or close a specified door of the vertical showcase
- Open Fridge / Close Fridge: open or close the ice cream fridge door
Each task consists of two components:
- ManiSkill environment – defines the target object (for PnP), success criteria, robot initial position, and wall/ceiling textures
- Scene data – defines layouts, objects present in the scenes, and their arrangement on shelves, which are imported into the ManiSkill environment during episode initialization
Thus, different target objects require different ManiSkill environments, and different item sets require distinct scene data.
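The env/scene-data split described above can be sketched as a pair of identifiers. This is purely illustrative; dsynth's real classes and field names differ, and only the environment ID and scene directories are taken from this README.

```python
from dataclasses import dataclass

# Illustrative only: not dsynth's actual API.
@dataclass(frozen=True)
class TaskSpec:
    env_id: str      # ManiSkill environment: target object, success criteria
    scene_dir: str   # scene data: layout and item arrangement on shelves

# The same environment can be paired with different scene data:
train = TaskSpec("PickToBasketContNiveaEnv", "demo_envs/pick_to_basket")
test = TaskSpec("PickToBasketContNiveaEnv",
                "demo_envs/test_unseen_items_pick_to_basket")
assert train.env_id == test.env_id and train.scene_dir != test.scene_dir
```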
Each atomic task is evaluated under the following setups:
- Train scenes – same scenes and target objects as used in training
- Train scenes with initial pose randomization – same as training but with a different initial robot pose
- Test scenes – unseen layouts and object arrangements, but seen target objects
- Out-of-distribution items – unseen scenes and unseen target items (for PnP tasks only)
For more details, see the task documentation.
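Since each setup is scored over a fixed number of rollouts (e.g. 30 per task below), per-setup success rates carry noticeable binomial uncertainty. The following is a small sketch, not part of the benchmark code, that computes a success rate with a 95% Wilson confidence interval:

```python
import math

def success_rate(successes: int, n: int) -> tuple[float, float, float]:
    """Point estimate plus a 95% Wilson score interval for n rollouts."""
    z = 1.96
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return p, center - half, center + half

# e.g. 18 successful rollouts out of 30:
rate, lo, hi = success_rate(18, 30)
```

With only 30 rollouts the interval is wide (roughly ±0.17 here), which is worth keeping in mind when comparing policies across setups.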
| Model | Description | Weights Downloading |
|---|---|---|
| Octo | Octo-base finetuned with 1 history image and 50 action horizon | hf download emb-ai/RoboBenchMart_octo --repo-type model --local-dir models/octo |
| Pi0 | Finetuned | hf download emb-ai/RoboBenchMart_pi0 --repo-type model --local-dir models/pi0 |
| Pi05 | Finetuned | hf download emb-ai/RoboBenchMart_pi05 --repo-type model --local-dir models/pi05 |
Follow the official installation instructions to set up the Octo environment.
Launch the Octo server (within the Octo environment):
python scripts/octo_server.py --model-path <PATH_TO_OCTO_WEIGHTS>
Follow the official installation instructions to set up the Pi0 environment.
Apply a small patch to the Pi0 repository to add RoboBenchMart:
git apply path_to_robobenchmart/scripts/add_rbm.patch
Launch Pi0 (or Pi05) server (inside the Pi0 repository):
XLA_PYTHON_CLIENT_MEM_FRACTION=0.6 uv run scripts/serve_policy.py policy:checkpoint --policy.config=pi0_eval_rbm --policy.dir=<PATH_TO_Pi0_CHECKPOINT>
XLA_PYTHON_CLIENT_MEM_FRACTION=0.6 uv run scripts/serve_policy.py policy:checkpoint --policy.config=pi05_eval_rbm --policy.dir=<PATH_TO_Pi05_CHECKPOINT>
Generate test scenes (not needed if you have already downloaded the demo data, as they are included):
bash bash/generate_test_scenes.sh
Run the evaluation client script scripts/eval_policy_client.py.
Example for evaluating in the PickToBasketContNiveaEnv environment with 30 rollouts:
python scripts/eval_policy_client.py -e PickToBasketContNiveaEnv --scene-dir demo_envs/test_unseen_items_pick_to_basket --eval-subdir policy_evaluation --max-horizon 500 --num-traj 30
To save a video of each rollout, add the --save-video flag.
Evaluation results will be saved in demo_envs/test_unseen_items_pick_to_basket/evaluation/policy_evaluation.
We recommend using different subdirectories (via --eval-subdir) for different policies to avoid mixing results.
Evaluation on tasks with out-of-distribution target items is done similarly, with the appropriate environment ID (-e) and scene directory (--scene-dir).
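The output layout described above can be sketched as a small helper. This mirrors the `<scene-dir>/evaluation/<eval-subdir>` structure stated in this README; the helper itself is hypothetical, not part of the repo.

```python
from pathlib import Path

def eval_dir(scene_dir: str, eval_subdir: str) -> Path:
    """Results land in <scene_dir>/evaluation/<eval_subdir>."""
    return Path(scene_dir) / "evaluation" / eval_subdir

# Distinct --eval-subdir values keep per-policy results apart:
octo = eval_dir("demo_envs/test_unseen_items_pick_to_basket", "octo")
pi0 = eval_dir("demo_envs/test_unseen_items_pick_to_basket", "pi0")
assert octo != pi0
```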
See the item distribution here.
To reproduce training environments for evaluation, you must specify the exact seeds used during trajectory collection (motion planning).
These seeds are stored in JSON files included in the demo data from HuggingFace.
Specify the path to these JSON files using --json-path:
python scripts/eval_policy_client.py --scene-dir demo_envs/pick_to_basket --json-path demo_envs/pick_to_basket/demos/motionplanning/pick_to_basket_stars_250traj_4workers.json --eval-subdir policy_evaluation --max-horizon 500 --num-traj 30
Note that demo_envs/pick_to_basket is a directory containing training scenes.
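The idea behind these JSON files can be sketched as follows. The schema here (`trajectories`, `seed`) is a guess for illustration only; the actual field names in the demo data may differ. The point is simply that replaying the recorded per-trajectory seeds reproduces the exact scenes used during motion planning.

```python
import json
import os
import tempfile

# Hypothetical schema, for illustration only.
demo = {"trajectories": [{"seed": 17, "success": True},
                         {"seed": 23, "success": False},
                         {"seed": 31, "success": True}]}

with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump(demo, f)
    path = f.name

with open(path) as f:
    seeds = [t["seed"] for t in json.load(f)["trajectories"]]
os.remove(path)
# Evaluating with exactly these seeds recreates the training scenes.
```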
To evaluate a policy with additional randomization in the robot's initial position, specify a separate seed using --robot-init-pose-start-seed 10000.
Choose a large seed (>1000) to ensure the robot's starting position differs from the training setup.
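Why a large offset works can be shown with a toy model (again, `init_pose` is a stand-in, not the benchmark's actual pose sampler): each seed deterministically maps to one pose, so a start seed far beyond the training range draws poses the policy never saw during training, with collisions astronomically unlikely.

```python
import random

# Toy stand-in for initial-pose randomization: one seed, one pose.
def init_pose(seed: int) -> tuple[float, float, float]:
    rng = random.Random(seed)
    return tuple(round(rng.uniform(-1.0, 1.0), 4) for _ in range(3))

# Suppose training used seeds 0..999; an offset start seed avoids them all.
train_poses = {init_pose(s) for s in range(1000)}
assert init_pose(10000) not in train_poses
```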
To run evaluations on seen, unseen, and out-of-distribution items (for Octo):
bash bash/eval_model.sh --model octo
For Pi0/Pi05:
bash bash/eval_model.sh --model pi0
bash bash/eval_model.sh --model pi05
Generate scenes for composite tasks:
bash bash/generate_scenes_composite.sh
Evaluation for composite tasks uses the scripts/eval_policy_composite_client.py script, which has a similar interface:
python scripts/eval_policy_composite_client.py --env-id PickNiveaFantaEnv --scene-dir demo_envs/composite_pick_to_basket --eval-subdir policy_evaluation_composite --max-horizon 1000 --num-traj 30 --save-video
To run the full evaluation on composite tasks (for Octo):
bash bash/eval_model_composite_tasks.sh --model octo
For Pi0/Pi05:
bash bash/eval_model_composite_tasks.sh --model pi0
bash bash/eval_model_composite_tasks.sh --model pi05
You can download raw .h5 trajectories (~50 GB) collected via motion planning:
hf download emb-ai/RoboBenchMart_demo_envs_mp --repo-type dataset --local-dir demo_envs
Next, replay all trajectories to obtain visual observations:
bash bash/replay.sh
To convert the data to RLDS format, refer to the RLDS builder repository.
If you have not downloaded the demo data from HuggingFace, you need to collect demonstration trajectories yourself.
First, run motion planning to collect raw .h5 trajectories without visual observations in training environments:
bash bash/run_mp_all.sh
The resulting trajectories are stored in ../demos/motionplanning within the scene directories.
Motion planning is time-consuming.
We recommend running per-environment scripts such as bash/run_mp_CloseDoorFridgeContEnv.sh, bash/run_mp_MoveFromBoardToBoardVanishContEnv.sh, etc., in parallel to accelerate trajectory generation.
Next, replay all trajectories to obtain visual observations:
bash bash/replay.sh
To convert the data to RLDS format, refer to the RLDS builder repository.
To generate demo data from scratch, first generate training scenes:
bash bash/generate_scenes.sh
The remaining steps are the same as above.
If you find RoboBenchMart useful for your work, please cite:
@article{soshin2025robobenchmart,
title={RoboBenchMart: Benchmarking Robots in Retail Environment},
author={Soshin, Konstantin and Krapukhin, Alexander and Spiridonov, Andrei and Shepelev, Denis and Bukhtuev, Gregorii and Kuznetsov, Andrey and Shakhuro, Vlad},
journal={arXiv preprint arXiv:2511.10276},
year={2025}
}
