rew35860/RobotArm_RL


Figure-8 Robot Tracking — Setup Guide

This project uses MuJoCo, Gymnasium, and Stable-Baselines3 for reinforcement learning–based robotic control.

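⚠️ This setup was tested on Ubuntu 22.04 running inside WSL2 (Windows Subsystem for Linux).
It should also work on native Ubuntu 22.04 / 24.04. On 24.04 the `libgl1-mesa-glx` package listed in step 1 may be unavailable; `libgl1` is the usual replacement.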


1. System Dependencies (Ubuntu)

Update package lists and install required system libraries:

sudo apt-get update
sudo apt-get install -y git python3 python3-venv python3-pip
sudo apt-get install -y libglfw3 libglew-dev libgl1-mesa-glx libosmesa6
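On WSL2 or a headless machine there may be no display for GLFW to open a window on. MuJoCo's Python bindings read the `MUJOCO_GL` environment variable (`glfw`, `egl`, or `osmesa`) to choose a rendering backend. The following is a minimal sketch, assuming that software rendering through the `libosmesa6` package installed above is an acceptable fallback; the helper name `select_mujoco_backend` is hypothetical and not part of this repository.

```python
import os


def select_mujoco_backend(env=None):
    """Pick a MUJOCO_GL value, keeping any value the user already set.

    Assumption: fall back to OSMesa (software rendering) when no display
    is advertised through DISPLAY or WAYLAND_DISPLAY.
    """
    env = os.environ if env is None else env
    if env.get("MUJOCO_GL"):
        return env["MUJOCO_GL"]  # respect an explicit user choice
    has_display = bool(env.get("DISPLAY") or env.get("WAYLAND_DISPLAY"))
    backend = "glfw" if has_display else "osmesa"
    env["MUJOCO_GL"] = backend
    return backend
```

Call `select_mujoco_backend()` before `import mujoco` so that the variable is already set when the bindings load.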

2. Create Python Virtual Environment

Create and activate a clean Python virtual environment:

python3 -m venv .venv
source .venv/bin/activate
python -m pip install -U pip

3. Install Python Dependencies

Install MuJoCo, Gymnasium, and Stable-Baselines3:

pip install mujoco gymnasium "stable-baselines3[extra]"
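To confirm that the three packages landed in the active environment, a quick check such as the one below may help. It is a sketch using only the standard library; the function name `missing_packages` is hypothetical, and the import names (`mujoco`, `gymnasium`, `stable_baselines3`) are the ones these packages normally expose.

```python
import importlib.util

# Import names of the packages installed in this step.
REQUIRED = ("mujoco", "gymnasium", "stable_baselines3")


def missing_packages(names=REQUIRED):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```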

4. Robot Assets (MuJoCo Menagerie)

The Franka Emika Panda model comes from the official Google DeepMind MuJoCo Menagerie. Clone it with:

git clone https://github.com/google-deepmind/mujoco_menagerie.git
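After cloning, it can be useful to verify that the Panda model file is where a script expects it. The sketch below assumes the Menagerie keeps the model under `franka_emika_panda/` with a `scene.xml` entry point, and that the clone sits in the current directory; which file this project's scripts actually load is not stated here, so treat both names as assumptions. The helper `find_panda_model` is hypothetical.

```python
from pathlib import Path


def find_panda_model(menagerie_root="mujoco_menagerie", filename="scene.xml"):
    """Return the path to the Panda model file, or None if it is absent.

    Assumption: the model lives in <menagerie_root>/franka_emika_panda/.
    """
    path = Path(menagerie_root) / "franka_emika_panda" / filename
    return path if path.is_file() else None


if __name__ == "__main__":
    model = find_panda_model()
    print(model if model else "Panda model not found; check the clone location.")
```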

5. Automated Setup (Optional)

If a setup script is provided, run:

bash setup.sh

🏋️ Training

To train the PPO agent:

python train_ppo.py

👁️ Evaluation / Viewer

To run a trained model and visualize the rollout:

python eval_rollout_viewer.py

⚠️ Usage Reminder

Always activate the virtual environment before running training or rollout scripts:

source .venv/bin/activate
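A script can also guard against being launched outside the virtual environment. This is a small sketch based on the standard `sys.prefix` / `sys.base_prefix` comparison; the function name `in_virtualenv` is hypothetical and the training scripts in this repository are not known to include such a check.

```python
import sys


def in_virtualenv(prefix=None, base_prefix=None):
    """Return True when the interpreter runs inside a venv.

    Inside a venv, sys.prefix points at the environment while
    sys.base_prefix points at the base Python installation.
    """
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix
    return prefix != base_prefix


if __name__ == "__main__":
    if not in_virtualenv():
        print("Warning: no virtual environment active; run 'source .venv/bin/activate'.")
```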

📊 Performance Validation

To evaluate the policy, we conducted inference tests across different spatial scales and temporal frequencies.

Important

Visualization Key:

  • Red Line: Target trajectory.
  • Green Line: Actual end-effector path.

Inference Testing: Small Out-of-Distribution (1.0 Hz vs 1.3 Hz)

| Frequency | Figure | Mean Position Error |
|---|---|---|
| 1.0 Hz | 1.0Hz Small OOD | 2.17 cm |
| 1.3 Hz | 1.3Hz Small OOD | 2.11 cm |

Note on Methodology: Results demonstrate zero-shot generalization to a compressed figure-8 scale ($0.10 \times 0.06$ m) that was not seen during training. To reduce the effect of episode-to-episode variation, the reported metric is the grand mean over 10 randomized episodes: the tracking error is first averaged over time within each 20-second test rollout, and these per-episode means are then averaged across all 10 episodes.
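The two-stage averaging described in the methodology note can be sketched as follows. This is an illustration only: it assumes the position error at each step is the Euclidean distance between target and end-effector positions, and the function names are hypothetical rather than taken from the evaluation scripts.

```python
import math
from statistics import fmean


def tracking_errors(targets, actuals):
    """Per-step Euclidean distance (m) between target and actual positions."""
    return [math.dist(t, a) for t, a in zip(targets, actuals)]


def episode_mean(errors):
    """Temporal average of the tracking error within one rollout."""
    return fmean(errors)


def grand_mean(per_episode_errors):
    """Ensemble average of the per-episode temporal means."""
    return fmean([episode_mean(errors) for errors in per_episode_errors])
```

Averaging per episode first gives every rollout equal weight in the final number, even if rollouts were to differ in the number of recorded steps.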

Inference Testing: Large Out-of-Distribution (1.0 Hz vs 1.3 Hz)

| Frequency | Figure | Mean Position Error |
|---|---|---|
| 1.0 Hz | 1.0Hz Large OOD | 4.99 cm |
| 1.3 Hz | 1.3Hz Large OOD | 6.80 cm |

Note on Methodology: Results demonstrate zero-shot generalization to an expanded figure-8 scale ($0.22 \times 0.15$ m). Reported metrics are the grand mean of 10 randomized 20-second episodes.

Kinematic Limit Analysis: Our testing identifies a critical physical boundary for the Franka Panda. For any trajectory where the scale parameter $a > 0.20$ m, the robot arm reaches the edge of its operational workspace. At these scales, the joints (specifically Joints 4 and 5) encounter hardware saturation, preventing the end-effector from completing the full path. The increased error and 0% success rate at 1.3 Hz reflect this mechanical constraint rather than policy divergence.
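To make the scale parameter concrete, here is a hedged sketch of a figure-8 reference path and a check against the $a > 0.20$ m boundary. The parameterization (a 1:2 Lissajous curve with half-widths `a` and `b`) is an assumption; the exact curve, its center, and the plane it lies in are not specified in this README, and both function names are hypothetical.

```python
import math


def figure8_point(t, a, b, freq_hz):
    """Offset (m) from the figure-8 center at time t (s).

    Assumed form: x = a*sin(phase), y = b*sin(2*phase),
    with phase = 2*pi*freq_hz*t, so one full figure-8 takes 1/freq_hz s.
    """
    phase = 2.0 * math.pi * freq_hz * t
    return (a * math.sin(phase), b * math.sin(2.0 * phase))


def exceeds_workspace_limit(a, a_limit=0.20):
    """True when the scale parameter a (m) is past the reported boundary."""
    return a > a_limit
```

Under this assumed form, the largest excursion along the first axis equals `a`, which is why a larger `a` pushes the arm toward the edge of its workspace.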

Inference Testing: In-Distribution Baseline (1.0 Hz vs 1.3 Hz)

| Frequency | Figure | Mean Position Error |
|---|---|---|
| 1.0 Hz (Baseline) | 1.0Hz In-Distribution | 2.62 cm |
| 1.3 Hz (Stress Test) | 1.3Hz In-Distribution | 2.86 cm |

Note on Methodology: These results represent the In-Distribution performance on the training scale ($0.14 \times 0.10$ m). Reported metrics are the grand mean derived from 10 randomized episodes. For each 20-second test rollout, a temporal average of the tracking error is computed, followed by an ensemble average across all 10 episodes to ensure statistical robustness.
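The mean position errors from the three 1.0 Hz vs 1.3 Hz comparisons above can be put side by side to see how sensitive each scale is to the higher frequency. The snippet below only re-uses the numbers reported in this README; the dictionary layout and the function name `relative_change_pct` are illustrative choices.

```python
# (error at 1.0 Hz, error at 1.3 Hz) in cm, as reported above.
REPORTED_CM = {
    "small OOD (0.10 x 0.06 m)": (2.17, 2.11),
    "in-distribution (0.14 x 0.10 m)": (2.62, 2.86),
    "large OOD (0.22 x 0.15 m)": (4.99, 6.80),
}


def relative_change_pct(err_low_freq, err_high_freq):
    """Percentage change in error when moving from 1.0 Hz to 1.3 Hz."""
    return 100.0 * (err_high_freq - err_low_freq) / err_low_freq


if __name__ == "__main__":
    for scale, (e10, e13) in REPORTED_CM.items():
        print(f"{scale}: {relative_change_pct(e10, e13):+.1f}%")
```

The large out-of-distribution scale shows by far the biggest increase, which is consistent with the workspace limit discussed in the kinematic limit analysis.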

Inference Testing: Specialized Policy (0.75 Hz Baseline)

| Test | Figure | Mean Position Error |
|---|---|---|
| In-Distribution Testing (0.75 Hz) | 1.0Hz Specialized Baseline | 1.84 cm |

Note on Methodology: These results show the Specialized Policy's performance on the standard training scale ($0.14 \times 0.10$ m) at its target frequency of 0.75 Hz. To reduce the effect of episode-to-episode variation, the reported metric is the grand mean over 50 randomized episodes: the tracking error is first averaged over time within each 20-second test rollout, and these per-episode means are then averaged across all 50 episodes.
