This project uses MuJoCo, Gymnasium, and Stable-Baselines3 for reinforcement learning–based robotic control.
⚠️ This setup was tested on Ubuntu 22.04 running inside WSL2 (Windows Subsystem for Linux).
It should also work on native Ubuntu 22.04 / 24.04.
Update package lists and install required system libraries:
sudo apt-get update
sudo apt-get install -y git python3 python3-venv python3-pip
sudo apt-get install -y libglfw3 libglew-dev libgl1-mesa-glx libosmesa6

Create and activate a clean Python virtual environment:
python3 -m venv .venv
source .venv/bin/activate
python -m pip install -U pip

Install MuJoCo, Gymnasium, and Stable-Baselines3:
pip install mujoco gymnasium "stable-baselines3[extra]"

The Franka Emika Panda model is sourced from the official DeepMind Menagerie:
git clone https://github.com/google-deepmind/mujoco_menagerie.git

If a setup script is provided, run:
bash setup.sh

To train the PPO agent:
python train_ppo.py

To run a trained model and visualize the rollout:
python eval_rollout_viewer.py

Always activate the virtual environment before running training or rollout scripts:
source .venv/bin/activate

To evaluate the policy, we conducted inference tests across different spatial scales and temporal frequencies.
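
As a rough illustration of what varying the spatial scale and temporal frequency of a figure-8 reference means, the sketch below samples target points from a Lissajous-style curve. The function names `figure8_target` and `sample_trajectory`, the parametrization itself, the center point, and the example scale values are all assumptions made for illustration; only the 1.0 Hz and 1.3 Hz frequencies come from this README, and the project's actual trajectory definition may differ.

```python
import math


def figure8_target(t, scale, freq_hz, center=(0.5, 0.0, 0.4)):
    """Return a hypothetical (x, y, z) figure-8 target at time t (seconds).

    Assumed parametrization: a 1:2 Lissajous curve in the y-z plane,
    with `scale` (meters) as the half-width of the figure-8.
    """
    phase = 2.0 * math.pi * freq_hz * t
    cx, cy, cz = center
    y = cy + scale * math.sin(phase)
    z = cz + 0.5 * scale * math.sin(2.0 * phase)
    return (cx, y, z)


def sample_trajectory(scale, freq_hz, duration_s, dt=0.01):
    """Sample the target path at a fixed time step dt (seconds)."""
    n_steps = int(round(duration_s / dt)) + 1
    return [figure8_target(i * dt, scale, freq_hz) for i in range(n_steps)]


if __name__ == "__main__":
    # Scale values here are placeholders, not the scales used in this project.
    for scale in (0.10, 0.15):
        for freq_hz in (1.0, 1.3):
            path = sample_trajectory(scale, freq_hz, duration_s=1.0 / freq_hz)
            width = max(p[1] for p in path) - min(p[1] for p in path)
            print(f"scale={scale:.2f} m, freq={freq_hz:.1f} Hz: "
                  f"{len(path)} points, width={width:.3f} m")
```

Under this parametrization, raising the frequency shortens the time available to complete one loop, while raising the scale lengthens the path, so both increase the speed the end-effector must reach.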
Important (visualization key):
- Red Line: Target trajectory.
- Green Line: Actual end-effector path.
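
The mean position error reported in the result tables can be read as the average distance between the red target path and the green end-effector path. The sketch below shows one way to compute it, assuming both paths are sampled at the same time steps as (x, y, z) points in meters and that the error is the mean Euclidean distance. The function name `mean_position_error_cm` is hypothetical, and the evaluation script may compute the metric differently.

```python
import math


def mean_position_error_cm(target_path, actual_path):
    """Mean Euclidean distance between paired 3D points, in centimeters.

    Both paths are sequences of (x, y, z) tuples in meters, sampled at
    the same time steps (an assumption of this sketch).
    """
    if len(target_path) != len(actual_path):
        raise ValueError("paths must have the same number of samples")
    if not target_path:
        raise ValueError("paths must not be empty")
    total_m = sum(math.dist(t, a) for t, a in zip(target_path, actual_path))
    return 100.0 * total_m / len(target_path)


if __name__ == "__main__":
    # Toy data: the actual path is offset from the target by 2 cm in y.
    target = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.2, 0.0, 0.0)]
    actual = [(0.0, 0.02, 0.0), (0.1, 0.02, 0.0), (0.2, 0.02, 0.0)]
    print(f"Mean position error: {mean_position_error_cm(target, actual):.2f} cm")
```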
Inference Testing: Small Out-of-Distribution (1.0 Hz vs 1.3 Hz)

| 1.0 Hz Frequency | 1.3 Hz Frequency |
|---|---|
| Mean Position Error: 2.17 cm | Mean Position Error: 2.11 cm |
Note on Methodology: Results demonstrate zero-shot generalization to a compressed figure-8 scale (
Inference Testing: Large Out-of-Distribution (1.0 Hz vs 1.3 Hz)

| 1.0 Hz Frequency | 1.3 Hz Frequency |
|---|---|
| Mean Position Error: 4.99 cm | Mean Position Error: 6.80 cm |
Note on Methodology: Results demonstrate zero-shot generalization to an expanded figure-8 scale (
Kinematic Limit Analysis: Our testing identifies a critical physical boundary for the Franka Panda. For any trajectory where the scale parameter
Inference Testing: In-Distribution Baseline (1.0 Hz vs 1.3 Hz)

| 1.0 Hz Frequency (Baseline) | 1.3 Hz Frequency (Stress Test) |
|---|---|
| Mean Position Error: 2.62 cm | Mean Position Error: 2.86 cm |
Note on Methodology: These results represent the In-Distribution performance on the training scale (
Inference Testing: Specialized Policy (0.75 Hz Baseline)

| In-Distribution Testing (0.75 Hz) |
|---|
| Mean Position Error: 1.84 cm |
Note on Methodology: These results represent the Specialized Policy performance on the standard training scale (
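
To make the frequency sensitivity in the 1.0 Hz vs 1.3 Hz tables easier to compare, the sketch below computes the relative change in mean position error between the two frequencies. The error values are copied from the tables above; the dictionary layout and the helper name `relative_change_pct` are illustrative only. The 0.75 Hz specialized policy is left out because it has no 1.3 Hz counterpart.

```python
# Reported mean position errors in cm: (1.0 Hz, 1.3 Hz), taken from the tables.
REPORTED_ERRORS_CM = {
    "Small out-of-distribution": (2.17, 2.11),
    "Large out-of-distribution": (4.99, 6.80),
    "In-distribution baseline": (2.62, 2.86),
}


def relative_change_pct(err_low_hz, err_high_hz):
    """Percentage change in error going from the lower to the higher frequency."""
    return 100.0 * (err_high_hz - err_low_hz) / err_low_hz


for name, (err_1_0, err_1_3) in REPORTED_ERRORS_CM.items():
    change = relative_change_pct(err_1_0, err_1_3)
    print(f"{name}: {err_1_0:.2f} cm -> {err_1_3:.2f} cm ({change:+.1f}%)")
```

With these numbers, the large out-of-distribution case is the most frequency-sensitive (roughly +36%), the in-distribution baseline degrades by about 9%, and the small-scale case is essentially unchanged.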






