OpenKrab/ClawRL
ClawReinforcementLearning 🦞🧠

Bringing Reinforcement Learning to OpenClaw/Krab agents: self-improving AI that learns from real actions and rewards.

Transform your static agents into adaptive learners that continuously improve from experience. Built on Stable-Baselines3 and Gymnasium, ClawReinforcementLearning enables your agents to:

  • 📚 Learn from real task outcomes and user feedback
  • 🎯 Adapt strategies based on reward signals
  • 🔄 Improve automatically over time with minimal human intervention
  • ⚡ Train locally with full privacy and control
  • 🌐 Integrate seamlessly with ClawFlow ecosystem

Quick Start (30 seconds)

# Install
pip install -r requirements.txt

# Run basic training
python scripts/train.py

# Use in your agent
from src.agent.rl_agent import RLAgent
agent = RLAgent()
action = agent.predict(obs)

Features

| Feature | Details |
| --- | --- |
| 🧠 Multiple Algorithms | PPO (default), DQN, A3C |
| 📊 Custom Environment | Gymnasium-compatible ClawEnv |
| 💾 Smart Model Management | Auto-save, checkpointing, best-model tracking |
| 🔗 Deep Integration | ClawGraph, ClawMemory, ClawFlow compatible |
| 📈 Monitoring | TensorBoard integration, reward tracking |
| 🚀 Production Ready | Error handling, logging, config management |

Installation

Option 1: ClawHub (Recommended)

clawflow install openkrab/claw-reinforcement-learning

Option 2: GitHub + Manual

git clone https://github.com/OpenKrab/ClawReinforcementLearning.git
cd ClawReinforcementLearning
pip install -r requirements.txt

Usage Examples

Basic Training

from src.agent.rl_agent import RLAgent

# Initialize
agent = RLAgent()

# Train
agent.train(timesteps=50000)

# Predict
obs, _ = agent.env.reset()
action = agent.predict(obs)
print(f"Smart action: {action}")

Training Script

# Train with custom parameters
python scripts/train.py \
  --algorithm PPO \
  --timesteps 100000 \
  --learning-rate 0.0001

# Monitor with TensorBoard
tensorboard --logdir ./logs
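A minimal argument parser matching the flags above might look like the following. This is an illustrative sketch only; the shipped scripts/train.py may accept additional options.

```python
import argparse

def parse_args(argv=None):
    """Parse training flags. Mirrors the CLI shown above; the real
    scripts/train.py may support more options."""
    p = argparse.ArgumentParser(description="Train a ClawRL agent")
    p.add_argument("--algorithm", default="PPO", choices=["PPO", "DQN", "A3C"])
    p.add_argument("--timesteps", type=int, default=100_000)
    p.add_argument("--learning-rate", type=float, default=1e-4)
    return p.parse_args(argv)
```

Passing an explicit `argv` list (instead of reading `sys.argv`) keeps the parser easy to exercise from tests and notebooks.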

ClawFlow Integration

# In your ClawFlow config
integrations:
  rl:
    enabled: true
    auto_train_interval: "1d"  # Train daily
    reward_source: clawgraph   # Get rewards from ClawGraph
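The `auto_train_interval` shorthand ("1d", "12h", "30m", …) can be parsed in a few lines. This is an illustrative sketch, not ClawFlow's actual parser:

```python
def parse_interval(spec: str) -> int:
    """Convert a shorthand interval like "1d" or "30m" to seconds.
    Illustrative only -- ClawFlow's real parser may differ."""
    units = {"s": 1, "m": 60, "h": 3600, "d": 86400}
    value, unit = spec[:-1], spec[-1]
    if unit not in units or not value.isdigit():
        raise ValueError(f"Bad interval: {spec!r}")
    return int(value) * units[unit]
```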

Architecture

┌──────────────────────────────────────┐
│    ClawReinforcementLearning         │
├──────────────────────────────────────┤
│                                      │
│  RLAgent (Stable-Baselines3)         │
│  ├─ Train: Learn from rewards        │
│  ├─ Predict: Choose actions          │
│  └─ Save/Load: Model management      │
│                                      │
│  ClawEnv (Gymnasium)                 │
│  ├─ Observations: State from graph   │
│  ├─ Actions: Discrete action space   │
│  └─ Rewards: Custom reward function  │
│                                      │
└──────────────────────────────────────┘
         │
         ├─→ ClawGraph (state/rewards)
         ├─→ ClawMemory (observations)
         ├─→ ClawFlow (orchestration)
         └─→ ClawTeam (multi-agent)

Configuration

Edit config.yaml:

rl:
  algorithm: PPO          # Algorithm to use
  total_timesteps: 10000  # Total environment steps to train for
  learning_rate: 0.0003   # Learning rate
  batch_size: 64          # Batch size
  
  env:
    max_steps: 50         # Episode length
    action_space: 10      # Number of actions
    observation_space: 20 # State vector size
  
  reward:
    success: 10.0         # Success reward
    failure: -5.0         # Failure penalty
    step_penalty: -0.1    # Per-step cost
  
  save_path: ~/.openkrab/rl_models
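To make the env settings concrete, here is a plain-Python sketch of the reset/step contract those values configure. It is illustrative only: the real ClawEnv subclasses `gymnasium.Env`, returns NumPy observations, and computes rewards from actual task outcomes rather than random draws.

```python
import random

class SketchEnv:
    """Illustrative stand-in for ClawEnv, wired to the config.yaml
    defaults above (max_steps, action/observation sizes, rewards)."""

    def __init__(self, max_steps=50, n_actions=10, obs_size=20):
        self.max_steps = max_steps
        self.n_actions = n_actions
        self.obs_size = obs_size
        self.steps = 0

    def reset(self):
        self.steps = 0
        obs = [random.random() for _ in range(self.obs_size)]
        return obs, {}  # (observation, info), mirroring Gymnasium's API

    def step(self, action):
        assert 0 <= action < self.n_actions
        self.steps += 1
        obs = [random.random() for _ in range(self.obs_size)]
        outcome = random.random()  # placeholder for a real task outcome
        if outcome < 0.05:          # task solved: success reward
            reward, terminated = 10.0, True
        elif outcome < 0.10:        # task failed: failure penalty
            reward, terminated = -5.0, True
        else:                       # still working: per-step cost
            reward, terminated = -0.1, False
        truncated = self.steps >= self.max_steps  # episode length cap
        return obs, reward, terminated, truncated, {}
```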

Project Structure

ClawReinforcementLearning/
├── src/
│   ├── __init__.py
│   ├── env/
│   │   ├── __init__.py
│   │   └── claw_env.py          # Gymnasium environment
│   └── agent/
│       ├── __init__.py
│       └── rl_agent.py          # RL agent wrapper
│
├── scripts/
│   └── train.py                 # Training script
│
├── examples/
│   └── basic_training.py        # Usage examples
│
├── tests/
│   └── test_env.py              # Unit tests
│
├── config.yaml                  # Configuration
├── requirements.txt             # Dependencies
├── SKILL.md                     # ClawHub skill manifest
└── README.md                    # This file

Training Your Own Model

Step 1: Prepare Environment

from src.env.claw_env import ClawEnv

env = ClawEnv()
obs, info = env.reset()

Step 2: Train

from src.agent.rl_agent import RLAgent

agent = RLAgent()
agent.train(timesteps=50000)  # Train

Step 3: Deploy

# Load and use
agent = RLAgent()
agent.model = agent.load_model()

obs, _ = agent.env.reset()
for _ in range(100):
    action = agent.predict(obs)
    obs, reward, done, truncated, info = agent.env.step(action)
    if done or truncated:
        break

Integration with ClawFlow

Once trained, use your model in ClawFlow:

# clawflow.yaml
agents:
  main:
    skills:
      - name: rl
        enabled: true
        model_path: ~/.openkrab/rl_models/ppo_claw.zip
      - name: claw_browser
      - name: claw_tools
    
    decision_policy: "reinforcement"  # Use RL for decisions

Performance Tips

  1. Increase Training Time: More timesteps = better convergence

    agent.train(timesteps=500000)  # More is better
  2. Tune Reward Function: Customize rewards in config.yaml

    reward:
      success: 20.0      # Higher reward for wins
      step_penalty: -0.2 # More penalty for slow actions
  3. Use GPU: Install PyTorch with CUDA

    pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
  4. Monitor Progress: Use TensorBoard

    tensorboard --logdir ./logs
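Alongside the TensorBoard curves, a rolling mean of episode rewards is a cheap convergence check. A plain-Python sketch:

```python
from collections import deque

def smoothed_rewards(rewards, window=100):
    """Rolling mean of episode rewards over a fixed window --
    a quick sanity check that training is trending upward."""
    buf = deque(maxlen=window)
    out = []
    for r in rewards:
        buf.append(r)
        out.append(sum(buf) / len(buf))
    return out
```

If the smoothed curve plateaus well below the configured success reward, revisit the reward shaping or extend training.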

Roadmap

  • v0.1: Basic PPO training, Gymnasium integration
  • 🔜 v0.2: ClawGraph integration for smart state observations
  • 🔜 v0.3: Multi-agent RL with ClawTeam
  • 🔜 v0.4: Safe training in ClawSandbox
  • 🔜 v0.5: User feedback reward integration
  • 🔜 v1.0: Production monitoring & auto-scaling

Troubleshooting

Model Not Found

python scripts/train.py --force-retrain

Dependencies Missing

pip install -r requirements.txt --upgrade

Out of Memory (GPU)

Reduce batch size in config.yaml:

batch_size: 32  # From 64

Slow Training

  • Use GPU (install CUDA-enabled PyTorch)
  • Increase batch_size (if memory allows)
  • Reduce observation_space size
  • Use PPO (typically the fastest of the supported algorithms)

Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit changes (git commit -m 'Add amazing feature')
  4. Push to branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

License

MIT License - See LICENSE

Author

Built with 🦞 by the OpenKrab community


Ready to train smarter agents? Start with:

python scripts/train.py

Questions? Open an issue or join our Discord!
