An extensible agentic RL framework for training multi-turn agents across tool use, memory, and expert routing.
This repository is organized as a reusable training stack for:
- multi-turn rollouts
- task-level rewards
- trajectory conversion
- benchmark inspection
- HPC-scale training workflows
The project is meant to be a clean base for building new agentic RL methods, not a collection of one-off training scripts.
Implemented methods:
- `loop_agent`: planner / executor / verifier style tool-use RL
- `memory_agent`: chunk-wise memory compression for long-context QA
- `expert_router`: routing across retrieval and external expert models under cost-aware preferences
Core packages:
- `agentic_rl.multi_turn`: shared trajectory expansion and GRPO reward normalization
- `agentic_rl.core`: shared runtime utilities such as LLM engine adapters
- `agentic_rl.methods.registry`: typed method registry and method metadata
- `agentic_rl.cli`: unified inspection entrypoint for methods and benchmarks
Examples:

```bash
agentic-rl list-methods
agentic-rl show-method loop_agent
agentic-rl benchmarks
```

The repository includes a minimal HPC training layer:
- `requirements-train.txt` for environment setup
- `configs/hpc.env.example` for cluster paths and runtime variables
- `configs/models/*.sh` and `configs/methods/*.sh` for model/method launch configs
- `scripts/launch_train.sh` as the shared Ray + training entrypoint
- `scripts/*.sbatch` templates for debug and formal jobs
- `scripts/preflight_check.py` for dataset/checkpoint/path validation
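A typical submission flow might look like the sketch below; the `configs/hpc.env` destination, the bare `preflight_check.py` invocation, and the `debug.sbatch` template name are illustrative assumptions, not documented interfaces.

```bash
# Illustrative submission flow; adapt paths and template names to your cluster.
cp configs/hpc.env.example configs/hpc.env   # fill in cluster paths and runtime variables

# Validate datasets, checkpoints, and paths before queueing a job.
python scripts/preflight_check.py

# Submit a debug job from one of the sbatch templates (name is illustrative).
sbatch scripts/debug.sbatch
```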
See `docs/hpc_training.md` before submitting jobs.
Notes:
- Install `slime` separately or through your preferred environment setup.
- `expert_router` expects external services for retrieval and expert models.
- For `func_call` mode, set `AGENTIC_RL_TAU2_ROOT` to an external TAU2 checkout or asset directory, as in the example below.
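For instance (the path is a placeholder):

```bash
# Point the framework at an external TAU2 checkout or asset directory
# before running func_call mode; replace the placeholder path.
export AGENTIC_RL_TAU2_ROOT=/path/to/tau2
```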
This repository includes work developed with reference to upstream open-source projects.
See THIRD_PARTY_NOTICES.md for redistribution and attribution details.