FlashSAC

Official implementation of

FlashSAC: Fast and Stable Off-Policy Reinforcement Learning for High-Dimensional Robot Control

Project Page PDF

Donghu Kim*1, Youngdo Lee*2,3, Minho Park2, Kinam Kim2, Takuma Seno4, I Made Aswin Nahrendra3, Sehee Min1, Daniel Palenicek5,6, Florian Vogt7, Danica Kragic7, Jan Peters5,6,8, Jaegul Choo2, Hojoon Lee1

1Holiday Robotics, 2KAIST, 3KRAFTON, 4Turing Inc, 5TU Darmstadt, 6hessian.AI, 7KTH Royal Institute of Technology, 8German Research Center for AI (DFKI)

(* indicates equal contribution)

arXiv'2026.

🎬 Teaser Video

teaser-2_3.mp4

About FlashSAC

FlashSAC is a fast and stable off-policy reinforcement learning algorithm that achieves the highest asymptotic performance in the shortest wall-clock time for high-dimensional robotic control.

This repository (FlashSAC) provides the full training framework, agent implementations, and environment integrations used in the paper, supporting over 100 tasks across diverse simulators: IsaacLab, MuJoCo Playground, ManiSkill, Genesis, HumanoidBench, MyoSuite, MuJoCo, Meta-World, and DeepMind Control Suite.

If you're using PPO, try FlashSAC!

Installation

1. Install uv

curl -LsSf https://astral.sh/uv/install.sh | sh
export PATH="$HOME/.local/bin:$PATH"

2. Pin Python Version

| Configuration | Ubuntu | GPU | Python |
| --- | --- | --- | --- |
| Config 1 | 22.04 | RTX 30x0, 40x0 | `uv python pin 3.10.18` |
| Config 2 | 24.04 | RTX 50x0, Bx00 (Blackwell) | `uv python pin 3.11.14` |

3. Install Dependencies

uv sync

4. Install MuJoCo

wget https://github.com/deepmind/mujoco/releases/download/2.1.0/mujoco210-linux-x86_64.tar.gz
tar xvf mujoco210-linux-x86_64.tar.gz && rm mujoco210-linux-x86_64.tar.gz
mkdir -p ~/.mujoco && mv mujoco210 ~/.mujoco/mujoco210

Add to ~/.bashrc:

export CUDA_HOME=/usr/local/cuda
export PATH=$CUDA_HOME/bin:$PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/home/$USER/.mujoco/mujoco210/bin:${LD_LIBRARY_PATH}
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/nvidia
export MUJOCO_GL="egl"
export MUJOCO_EGL_DEVICE_ID="0"
export MKL_SERVICE_FORCE_INTEL="0"

Verify:

source ~/.bashrc
uv run python -c "import gymnasium; gymnasium.make('HalfCheetah-v4')"

5. Optional Environment Dependencies

By default, only MuJoCo and DMC are available. Install additional environments with:

uv sync --extra <environment>

Available extras: isaaclab, mujoco-playground, maniskill, genesis, humanoid-bench, myosuite, metaworld, all

Note

mujoco-playground has known issues with JAX > 0.5.2 (NaN values, training collapse — see issue #153) and may not work with Python 3.11.

Note

isaaclab cannot be installed alongside genesis or humanoid-bench due to dependency conflicts. If you need IsaacLab, install it in a separate virtual environment with uv sync --extra isaaclab. For the same reason, all installs every extra except isaaclab.

Training

Single Experiment

uv run python train.py

Override config values via --overrides:

uv run python train.py --overrides env=dmc --overrides env.env_name='humanoid-walk'
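For intuition, the dotted `--overrides` syntax follows Hydra's `key=value` semantics. Below is a minimal illustrative sketch of how a dotted override maps onto a nested config dict; it is not FlashSAC's actual parser (Hydra also handles config groups and type inference):

```python
# Illustrative sketch of Hydra-style dotted overrides (not the repo's actual code).
def apply_overrides(config: dict, overrides: list) -> dict:
    """Apply 'a.b=value' strings to a nested dict, mimicking Hydra semantics."""
    for override in overrides:
        dotted_key, value = override.split("=", 1)
        *parents, leaf = dotted_key.split(".")
        node = config
        for key in parents:
            node = node.setdefault(key, {})
        node[leaf] = value.strip("'\"")  # Hydra also infers types; we keep strings
    return config

config = {"env": {"env_name": "HalfCheetah-v4"}}
apply_overrides(config, ["env.env_name='humanoid-walk'"])
# config["env"]["env_name"] is now "humanoid-walk"
```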

Batch Experiments

Example scripts for each environment are provided in scripts/:

bash scripts/run_mujoco.sh
bash scripts/run_isaaclab.sh

Configuration

Configs are managed via Hydra. The base config is configs/flashSAC_base.yaml, with modular sub-configs under configs/agent/ and configs/env/.

Logging

Both Weights & Biases and TensorBoard are supported. Set logger_type in configs/flashSAC_base.yaml:

logger_type: 'wandb'        # or 'tensorboard'

TensorBoard logs are saved to runs/. Launch with:

tensorboard --logdir runs

Performance Optimizations

FlashSAC adapts its configuration based on the simulator type for optimal speed:

| Setting | GPU simulators (IsaacLab, MJP, Genesis, ManiSkill) | CPU simulators (MuJoCo, DMC, HBench, MyoSuite) |
| --- | --- | --- |
| `num_envs` | 1024 | 1 |
| `batch_size` | 2048 | 512 |
| AMP | On | Off |
| Buffer device | `cuda:0` | `cpu` |

Note

torch.compile mode is determined by Python version. This is configured automatically — do not change it manually.

| Python | Compile mode | PyTorch | Notes |
| --- | --- | --- | --- |
| 3.10 | `reduce-overhead` | 2.5.1 | Legacy default |
| 3.11 | `max-autotune` | 2.9.1 | `reduce-overhead` causes 5–10x slowdowns after PyTorch 2.8 |

We use PyTorch 2.9.1 for Python 3.11 instead of 2.7.1 (IsaacLab's default), since IsaacLab will eventually migrate to newer versions. See pyproject.toml for version pinning details.
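The version-based selection above can be expressed as a small helper. This is an illustrative sketch only; the actual logic is part of the repo's configuration code and should not be changed manually:

```python
import sys

# Illustrative sketch: pick the torch.compile mode the table above prescribes.
# FlashSAC configures this automatically; shown here only to make the rule explicit.
def compile_mode(version=sys.version_info[:2]):
    if version >= (3, 11):
        # reduce-overhead regresses badly (5-10x) on PyTorch >= 2.8
        return "max-autotune"
    return "reduce-overhead"
```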

Key design choices:

  • AMP off for small batches — AMP incurs a GPU/CPU sync that becomes a bottleneck when batch and model sizes are small.
  • CPU buffer for CPU simulators — With only 1 env, the overhead of GPU buffer operations outweighs the benefit. GPU buffer only pays off with large parallel envs.
  • Compiled critical paths — Weight normalization, target critic EMA, _select_min_q_log_probs, and _compute_categorical_td_target are compiled for speed.
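For intuition, the target critic EMA mentioned above is the standard Polyak averaging rule. A framework-free sketch follows (the repo applies the same rule to PyTorch tensors inside a compiled function; `tau` below is an illustrative value, not the paper's setting):

```python
# Polyak (EMA) update for target critic parameters, shown on plain Python lists.
# target <- (1 - tau) * target + tau * online, applied elementwise.
def ema_update(target, online, tau=0.005):
    return [(1.0 - tau) * t + tau * o for t, o in zip(target, online)]

target = [0.0, 1.0]
online = [1.0, 1.0]
target = ema_update(target, online, tau=0.5)  # -> [0.5, 1.0]
```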

See the scripts/ directory for recommended per-environment configurations.

Checkpointing

Agent checkpoints and replay buffers can be saved and loaded during training.

Saving

Checkpoints are saved automatically at the end of training by default. To save at regular intervals, set save_checkpoint_per_interaction_step and optionally save_buffer_per_interaction_step:

uv run python train.py \
    --overrides save_checkpoint_per_interaction_step=24400 \
    --overrides save_buffer_per_interaction_step=24400

Checkpoints are saved to models/<group>/<exp>/<env_name>/seed<seed>-<timestamp>/step<N>/ and include the actor, critic, target critic, temperature, reward normalizer, and agent state (update step, grad scaler).
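The checkpoint layout above can be rendered by a small path helper. This is a hypothetical sketch mirroring the directory template; the repo builds this path internally and the argument names here are illustrative:

```python
from pathlib import Path

# Hypothetical helper mirroring the checkpoint layout described above:
# models/<group>/<exp>/<env_name>/seed<seed>-<timestamp>/step<N>/
def checkpoint_dir(group, exp, env_name, seed, timestamp, step):
    return Path("models") / group / exp / env_name / f"seed{seed}-{timestamp}" / f"step{step}"

checkpoint_dir("flash", "base", "humanoid-walk", 0, "20260101-120000", 24400)
# -> models/flash/base/humanoid-walk/seed0-20260101-120000/step24400
```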

Loading

To resume training from a checkpoint, provide agent_load_path and optionally buffer_load_path:

uv run python train.py \
    --overrides agent_load_path='models/.../step24400' \
    --overrides buffer_load_path='models/.../step24400'

By default, optimizer and reward normalizer states are also restored. This can be configured via agent.load_optimizer and agent.load_reward_normalizer in the agent config.

Visualization (IsaacLab)

Trained IsaacLab agents can be visualized in the Isaac Sim viewport using play_isaaclab.py. This uses the same Hydra config system as training — pass the same --overrides you trained with so the network architecture matches the checkpoint.

uv run python play_isaaclab.py \
    --checkpoint_path 'models/.../step24400' \
    --num_envs 16 \
    --num_episodes 10 \
    --overrides env=isaaclab \
    --overrides env.env_name='Isaac-Velocity-Flat-G1-v0' \
    --overrides agent=flashSAC \
    --overrides agent.asymmetric_observation=true \
    --overrides agent.buffer_max_length=1

Key arguments:

| Argument | Description |
| --- | --- |
| `--checkpoint_path` | Path to the saved checkpoint directory (contains `actor.pt`, etc.) |
| `--num_envs` | Number of parallel environments to visualize (default: 16) |
| `--num_episodes` | Number of episodes to run (default: 10) |

Note

agent.buffer_max_length can be set to a small value (e.g., 1) since the replay buffer is not used during play.

Project Structure

flash_rl/
  agents/       # Agent implementations (FlashSAC, random)
  buffers/      # Replay buffer implementations
  common/       # Logger (wandb / tensorboard)
  envs/         # Environment wrappers (Gymnasium 1.1 API)
  evaluation.py # Evaluation and video recording
configs/           # Hydra configs (base, agent, env)
scripts/           # Launch scripts per environment
results/           # Experiment results and plots
train.py           # Training entry point
play_isaaclab.py   # IsaacLab visualization entry point

Development

uv sync --dev    # install formatters, linter, type checker
./bin/lint       # run Black, Ruff, Mypy

Citation

@article{kim2026flashsac,
  title={FlashSAC: Fast and Stable Off-Policy Reinforcement Learning for High-Dimensional Robot Control},
  author={Kim, Donghu and Lee, Youngdo and Park, Minho and Kim, Kinam and Nahrendra, I Made Aswin and Seno, Takuma and Min, Sehee and Palenicek, Daniel and Vogt, Florian and Kragic, Danica and Peters, Jan and Choo, Jaegul and Lee, Hojoon},
  journal={arXiv preprint arXiv:2604.04539},
  year={2026}
}
