Duke University
Paper: https://arxiv.org/abs/2511.07654
Video: https://youtu.be/NwvgLdydJFk
Website: http://generalroboticslab.com/TimeAwarePolicy
conda create --name timeaware python=3.8
conda activate timeaware
pip install -r requirements.txt --no-cache-dir
# IsaacGym installation
cd isaacgym/python && pip install -e . && cd ../..
Run following command to play with the time-aware policy!
๐ฎ Keyboard Controls
- โฌ๏ธ Increase the time ratio by 0.1
- โฌ๏ธ Reduce the time ratio by 0.1
python tw_evaluation.py --rendering --num_envs 1 --par_configs --checkpoint 20250717_162724_tw_FrankaCubeStack --index_episode best_rew --keyboard_ctrl --draw_scevel --goal_speed 0.6
python tw_evaluation.py --rendering --num_envs 1 --par_configs --checkpoint 20250715_123940_tw_FrankaGmPour --index_episode best_rew --keyboard_ctrl --draw_scevel --goal_speed 0.6
python tw_evaluation.py --rendering --num_envs 1 --par_configs --checkpoint 20250730_151924_tw_FrankaCabinet --index_episode best_rew --keyboard_ctrl --draw_scevel --goal_speed 0.6
Replace TASK_NAME to one of names from FrankaCubeStack, FrankaGmPour, or FrankaCabinet.
If you want to add your own custom environments, please follow steps described in IsaacGymEnvs.
During training, all results are saved in the \train_res folder.
We use wandb for logging. Therefore, you might need to log to your own account.
python tw_training.py --saving --fix_priv --task_name TASK_NAME
Replace CKPT to the time-unaware policy ckpt name (same as its folder name and wandb name).
python tw_training.py --saving --fix_priv --reset_critic --warmup_iters 50 --no_dense --epstimeRewardScale "[100, 100]" --successRewardScale 1000 --index_episode init --checkpoint CKPT --task_name TASK_NAME
Replace CKPT to the time-optimal policy ckpt name in the previous stage.
python tw_training.py --saving --stu_train --lr 5e-4 --warmup_rand --time_ratio --quiet False --wandb False --index_episode best_rew --checkpoint CKPT --task_name TASK_NAME
Replace CKPT to the augmented time-optimal policy ckpt name in the previous stage.
python tw_evaluation.py --saving --num_envs 10000 --target_success_eps 10000 --target_record_eps 1000 --save_threshold 10 --record_init_configs --use_par_checkpoint --index_episode best --checkpoint CKPT
Replace CKPT to same augmented time-optimal policy ckpt name in the previous stage.
This stage will use configurations that collected in the previous stage.
python tw_training.py --saving --lr 2e-4 --gamma 1. --no_dense --time2end --time_ratio --ratio_range "[0.2, 1]" --use_cost --fixed_configs --epstimeRewardScale "[100, 100]" --index_episode best --checkpoint CKPT --task_name TASK_NAME
After each evaluation, results are saved in the \eval_res folder.
python tw_evaluation.py --saving --num_envs 2000 --target_success_eps 2000 --strict_eval
# (For cube stacking task only) Use a container as target instead of another cube +
--use_container
# For the time-unaware policy +
--index_episode init --checkpoint CKPT
# For the time-aware policy +
--par_configs --index_episode best_rew --goal_ratio_range "[0.2, 1.0, 0.1]" --checkpoint CKPT
python tw_evaluation.py --saving --num_envs 2000 --target_success_eps 2000 --strict_eval
# Cube stacking (increase the restitution) +
--add_restitution
# Granular Media Pouring (increase the number of beans) +
--num_gms_eval 40
# Drawer Opening
# Increase the joint friction +
--friction_mul 2
# Increase weights in the drawer +
--num_props_eval 6
# For the time-unaware policy +
--index_episode init --checkpoint CKPT
# For the time-unaware policy & joint interpolation baseline +
--interpolate_joints 4
# For the time-aware policy +
--par_configs --index_episode best_rew --goal_ratio_range "[0.2, 1.0, 0.1]" --checkpoint CKPT
This experiment uses FrankaCubeStack task.
python tw_evaluation.py --saving --num_envs 2000 --target_success_eps 2000 --strict_eval --apply_disturbances --disturbance_v 10
# For the time-unaware policy +
--index_episode init --checkpoint CKPT
# For the time-aware policy +
--par_configs --index_episode best_rew --goal_ratio_range "[0.2, 1.0, 0.1]" --checkpoint CKPT
Heuristic stage-wise control: the manipulation process is divided into distinct stages. Each stage is assigned a tailored time ratio.
python tw_evaluation.py --saving --num_envs 2000 --target_success_eps 2000 --strict_eval --par_configs --index_episode best_rew --goal_speed 0.5 --checkpoint CKPT
# Cube stacking +
--budget_portion "[0.15, 0.35, 0.15, 0.35]" --speed_describe "[1, 0, 1, 0]"
# Granular media pouring +
--budget_portion "[0.5, 0.5]" --speed_describe "[1, 0]"
# Drawer opening +
--budget_portion "[0.2, 0.2, 0.3, 0.3]" --speed_describe "[1, 0, 1, 0]"
Online interface control: the user provides real-time time ratio via a simple and intuitive interface (e.g., keyboard or slider) to directly steer the behavior of the robot to align with high-level human intents.
python tw_evaluation.py --rendering --draw_scevel --keyboard_ctrl --simple_layout --num_envs 1 --par_configs --index_episode best_rew --checkpoint CKPT
We are using the joint impedance controller and the gripper control from franka_ros_interface. Please follow the documentation to setup the controller.
We using realsense camera for the experiment. To calibrate the camera external matrix, you can follow Franka-camera calibration.
After setting up the controller and the camera, you can start to receive the command and send to the joint impedance controller and gripper controller.
python tw_evaluation.py --num_envs 1 --real_robot --par_configs --index_episode best_rew --checkpoint CKPT
# To specify the real world scheduled time (e.g. 10s) +
--goal_time 10s
# To specify the time ratio (e.g. 0.5) +
--goal_speed 0.5
# To use stage wise control (e.g. 10s in total, fast then slow)
--goal_time 10 --budget_portion "[0.4, 0.6]" --speed_describe "[1, 0]"
Real robot online interface control
python tw_evaluation.py --num_envs 1 --real_robot --par_configs --keyboard_ctrl --draw_scevel --goal_speed 0.2 --index_episode best_rew --checkpoint CKPT
.
โโโ envs # Simulaion environments
โ โโโ assets
โ โโโ isaacgymenvs
โโโ model # Policy architecture
โ โโโ agent.py
โ โโโ utils.py
โโโ train_res # Training results and checkpoints
โ โโโ FrankaCabinet
โ โโโ FrankaCubeStack
โ โโโ FrankaGmPour
โโโ eval_res # Evaluation results
โ โโโ FrankaCabinet
โ โโโ FrankaCubeStack
โ โโโ FrankaGmPour
โโโ real_robot # Real robot scripts (object detector and communicator)
โ โโโ DemoCamera.py
โ โโโ RealSenseCamera.py
โ โโโ SocketClient.py
โ โโโ StateEstimator.py
โโโ isaacgym # Isaacgym simulator
โโโ tw_training.py # Training script
โโโ tw_training_utils.py # Training helper script
โโโ tw_evaluation.py # Evaluation script
โโโ tw_evaluation_utils.py # Evaluation helper script
โโโ tf_utils.py # Transformation helper script
โโโ utils.py # General (I/O) helper script
โโโ requirements.txt # Dependencies
โโโ README.md
This repository is released under the CC BY-NC-ND 4.0 License. Duke University has filed patent rights for the technology associated with this article. For further license rights, including using the patent rights for commercial purposes, please contact Duke's Office for Translation and Commercialization (otcquestions@duke.edu) and reference OTC DU9041PROV. See LICENSE for additional details.
This work is supported by DARPA TIAMAT program under award HR00112490419, ARO under award W911NF2410405, and ARL STRONG program under awards W911NF2320182, W911NF2220113, and W911NF242021.
If you find our paper or codebase helpful, please consider citing:
@misc{jia2025timeawarepolicylearningadaptive,
title={Time-Aware Policy Learning for Adaptive and Punctual Robot Control},
author={Yinsen Jia and Boyuan Chen},
year={2025},
eprint={2511.07654},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2511.07654},
}

