🤖 RL Intern

Your autonomous Reinforcement Learning engineer.
Reads papers. Builds agents. Ships models. All while you sleep.

"Give it a task. Walk away. Come back to a trained model on the Hub."

✨ What is RL Intern?

RL Intern is an autonomous AI agent that acts as your personal RL engineer. Give it a natural language task and it will:

📄 Research — search arXiv, read papers, understand SOTA algorithms
🌍 Explore — inspect Gymnasium environments, understand state/action spaces
🛠️ Implement — generate complete training scripts (PPO, DQN, SAC, TD3)
🏋️ Train — run experiments in a sandboxed subprocess with live logs
📊 Evaluate — report mean reward, std deviation, episode stats
🚀 Ship — upload trained models directly to the Hugging Face Hub

🏗️ Architecture

The agent runs an async agentic loop with two queues:

Queue	Direction	Purpose
`submission_queue`	CLI → Agent	User input, approvals, undo, compact
`event_queue`	Agent → CLI	Streaming text, tool calls, results, errors

The loop routes every operation through Session → ToolRouter → 15 built-in RL tools (+ any MCP servers you configure), while a DoomLoopDetector automatically breaks stuck repetitive patterns.

⚡ Quick Start

1. Clone & install

git clone https://github.com/enfibiotech/rl-intern-.git
cd rl-intern-
uv sync
uv tool install -e .

Requires: Python 3.11+, uv

2. Set your API keys

cp .env.example .env

# Required
ANTHROPIC_API_KEY=sk-ant-...

# Required for HuggingFace uploads
HF_TOKEN=hf_...

# Optional
GITHUB_TOKEN=ghp_...
WANDB_API_KEY=...

3. Run it

rl-intern

That's it. You're now talking to your RL engineer. 🎉

🚀 Usage

Interactive mode (REPL)

rl-intern

> train a PPO agent on CartPole-v1 for 500k steps and upload to HuggingFace
> search for recent offline RL papers on arXiv
> what MuJoCo environments are best for SAC?
> evaluate my model at ./models/best_model on LunarLander-v2

Headless / one-shot mode

rl-intern "train DQN on LunarLander-v2 and report the mean reward"

CLI flags

Flag	Default	Description
`--model`	`claude-sonnet-4-20250514`	Any LiteLLM model string
`--max-iterations`	`300`	Agentic loop iteration cap
`--no-stream`	off	Disable token streaming
`--auto-approve`	off	Skip approval gates (use carefully)
`--config`	`configs/main_agent_config.json`	Custom config path

rl-intern --model anthropic/claude-opus-4-6 --max-iterations 150 "your prompt"
rl-intern --auto-approve "run a quick CartPole benchmark"

Slash commands (interactive mode)

Command	Action
`/help`	Show all commands
`/compact`	Manually compact the context window
`/undo`	Undo the last message
`/quit`	Exit

🧰 Built-in Tools (15)

Tool	Description
`search_rl_papers`	Search arXiv for RL papers by query
`read_rl_paper`	Fetch full abstract + metadata for an arXiv ID
`list_rl_environments`	List Gymnasium envs with optional filter
`inspect_environment`	Show obs/action space and spec for any env
`search_rl_docs`	Search docs for Gymnasium, SB3, CleanRL, TRL, RLlib
`generate_training_script`	Generate complete SB3 training script (PPO/DQN/SAC/TD3)
`run_bash`	Execute shell commands in sandbox (approval required)
`run_python`	Execute Python code in sandbox (approval required)
`read_file`	Read any file with optional line limit
`write_file`	Write file to disk (approval required)
`search_hf_rl_models`	Search HF Hub for trained RL models
`upload_model_to_hf`	Upload a model folder to HF Hub (approval required)
`install_package`	`pip install` any package (approval required)
`evaluate_model`	Run N evaluation episodes and report reward stats
`create_plan`	Output a numbered plan before complex tasks

🔌 MCP Server Support

RL Intern supports the Model Context Protocol — connect any MCP server to extend capabilities with external tools.

Edit configs/main_agent_config.json:

{
  "mcpServers": {
    "filesystem": {
      "transport": "stdio",
      "command": "npx -y @modelcontextprotocol/server-filesystem /tmp/workspace",
      "args": []
    },
    "github": {
      "transport": "stdio",
      "command": "npx -y @modelcontextprotocol/server-github",
      "env": { "GITHUB_PERSONAL_ACCESS_TOKEN": "${GITHUB_TOKEN}" }
    },
    "wandb": {
      "transport": "http",
      "url": "https://mcp.wandb.ai/sse",
      "headers": { "Authorization": "Bearer ${WANDB_API_KEY}" }
    }
  }
}

MCP tools appear automatically, prefixed with their server name:

> use filesystem__read_file to check my training logs
> search github for PPO implementations

Supported transports: stdio (local subprocess) · http / sse (remote server)

🧩 Adding Custom Tools

Drop a ToolSpec into agent/core/tools.py or pass it at runtime:

from agent.core.tools import ToolSpec
from agent.core.session import Session

async def my_handler(env_id: str) -> str:
    return f"Custom result for {env_id}"

my_tool = ToolSpec(
    name="my_rl_tool",
    description="Does something custom with a Gymnasium env",
    parameters={
        "type": "object",
        "properties": {
            "env_id": {"type": "string", "description": "Gymnasium env ID"}
        },
        "required": ["env_id"]
    },
    handler=my_handler,
    requires_approval=False,
)

session = Session(
    model_name="anthropic/claude-sonnet-4-20250514",
    extra_tools=[my_tool],
)

🐍 Programmatic Usage

Use RL Intern as a Python library — no CLI needed:

import asyncio
from agent.core.session import Session
from agent.core.agent_loop import AgentLoop
from agent.core.events import EventType

async def run(task: str):
    session = Session(model_name="anthropic/claude-sonnet-4-20250514")
    agent = AgentLoop(session, max_iterations=50)
    asyncio.create_task(agent.run())

    await agent.event_queue.get()          # READY
    await agent.send_user_input(task)

    while True:
        event = await agent.event_queue.get()
        if event.type == EventType.ASSISTANT_CHUNK:
            print(event.data["text"], end="", flush=True)
        elif event.type == EventType.TURN_COMPLETE:
            break

asyncio.run(run("List the top 5 Gymnasium environments for PPO"))

See examples/programmatic_usage.py for the full version.

🐳 Docker

docker build -t rl-intern .
docker run -it \
  -e ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY \
  -e HF_TOKEN=$HF_TOKEN \
  -v $(pwd)/workspace:/workspace \
  rl-intern

🗂️ Project Structure

---

🧪 Running Tests

uv run pytest tests/ -v

🤝 Contributing

Contributions are welcome! To add a new built-in tool:

Add an async def handle_* function in agent/core/tools.py
Add a ToolSpec entry in create_builtin_tools()
Add tests in tests/test_core.py
Open a PR 🎉

Made with ❤️ for the RL community

⭐ Star this repo if RL Intern saved you time!

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
agent		agent
configs		configs
examples		examples
src		src
tests		tests
.env.example		.env.example
.gitignore		.gitignore
.python-version		.python-version
Dockerfile		Dockerfile
README.md		README.md
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

🤖 RL Intern

✨ What is RL Intern?

🏗️ Architecture

⚡ Quick Start

1. Clone & install

2. Set your API keys

3. Run it

🚀 Usage

Interactive mode (REPL)

Headless / one-shot mode

CLI flags

Slash commands (interactive mode)

🧰 Built-in Tools (15)

🔌 MCP Server Support

🧩 Adding Custom Tools

🐍 Programmatic Usage

🐳 Docker

🗂️ Project Structure

🧪 Running Tests

🤝 Contributing

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

🤖 RL Intern

✨ What is RL Intern?

🏗️ Architecture

⚡ Quick Start

1. Clone & install

2. Set your API keys

3. Run it

🚀 Usage

Interactive mode (REPL)

Headless / one-shot mode

CLI flags

Slash commands (interactive mode)

🧰 Built-in Tools (15)

🔌 MCP Server Support

🧩 Adding Custom Tools

🐍 Programmatic Usage

🐳 Docker

🗂️ Project Structure

🧪 Running Tests

🤝 Contributing

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages