Analyzes a Python LangGraph agent file and generates a ready-to-run Braintrust Remote eval — no execution required. It discovers node names, system prompts, and model configuration by reading source code directly, then wires them as tunable parameters so you can iterate on any node's prompt or model from the Braintrust Playground and observe end-to-end agent behavior without experimenting in production.
- Python 3.10+
- Node.js 18+ (for the generated TypeScript eval)
- A Braintrust account
- `BRAINTRUST_API_KEY` and `OPENAI_API_KEY`
```bash
pip install -r requirements.txt
npm install
cp .env.example .env
# fill in your API keys
```

Try it against the included example agent:
```bash
python run.py --agent langgraph_agent.py
```

Or point it at your own agent:
```bash
python run.py --agent path/to/my_agent.py
```

This generates `eval.py`. Start the Remote eval dev server:
```bash
braintrust eval eval.py --dev
```

Then, in Braintrust: Configuration → Remote evals → add `http://localhost:8300`, create a Playground, add a Task → Remote eval, and start iterating.
```
--agent        Path to your Python LangGraph agent file (default: langgraph_agent.py)
--factory      Factory function name (default: build_graph)
--params       Per-node params to expose (default: prompt,model)
               Available: max_tokens, model, prompt, temperature
               Use "all" to expose everything
--extra-param  App-specific parameter (repeatable). Format: key:type:default:description
               Types: string, number, boolean
--lang         Output language: python (default) or typescript
--project      Braintrust project name (default: "LangGraph Agent")
--output-key   State field to return as output
```
```bash
# Default — generates eval.py with prompt and model parameters
python run.py --agent my_agent.py

# Opt in to temperature tuning as well
python run.py --agent my_agent.py --params prompt,model,temperature

# Expose everything
python run.py --agent my_agent.py --params all

# Generate a TypeScript eval instead (self-contained, no Python infra needed)
python run.py --agent my_agent.py --lang typescript
```

For parameters that can't be auto-discovered (RAG search depth, feature flags, tool toggles), pass them at the command line with `--extra-param`. These are injected into the generated eval and passed through to `build_graph` at runtime.
Format: `key:type:default:description` — types are `string`, `number`, `boolean`.
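For illustration, the spec format above could be parsed roughly as follows (a hypothetical `parse_extra_param` helper, not the tool's actual code):

```python
def parse_extra_param(spec: str) -> dict:
    """Parse a key:type:default:description spec into a parameter dict.

    Illustrative sketch only. split(":", 3) yields exactly four fields,
    so colons inside the description survive; a default value containing
    a colon would not.
    """
    key, type_, default, description = spec.split(":", 3)
    if type_ not in ("string", "number", "boolean"):
        raise ValueError(f"unsupported type: {type_}")
    # Coerce the default to the declared type.
    if type_ == "number":
        value = float(default) if "." in default else int(default)
    elif type_ == "boolean":
        value = default.lower() == "true"
    else:
        value = default
    return {"key": key, "type": type_, "default": value, "description": description}

print(parse_extra_param("ragSearchDepth:number:5:Number of chunks to retrieve"))
# {'key': 'ragSearchDepth', 'type': 'number', 'default': 5, 'description': 'Number of chunks to retrieve'}
```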
```bash
python run.py --agent my_agent.py \
  --extra-param "ragSearchDepth:number:5:Number of chunks to retrieve" \
  --extra-param "webSearchEnabled:boolean:false:Enable the web search tool"
```

Tool descriptions are auto-discovered from `@tool` docstrings and `Tool()`/`StructuredTool()` constructors. Each tool gets a `<toolName>Description` parameter. You can override a tool's auto-discovered description with an explicit `--extra-param`:
```bash
python run.py --agent my_agent.py \
  --extra-param "searchDescription:string:Search recent news only.:Web search tool"
```

The tool uses Python's `ast` module to parse your agent file without executing it. It:
- Finds `graph.add_node("name", fn)` calls to map node names to functions
- Finds `SystemMessage(content=...)` inside each function and resolves the value — handles string literals, module-level constants, and `{**DEFAULT_PROMPTS, ...}` dict patterns
- Finds `ChatOpenAI(model=...)` (and other supported LLM classes) for model and temperature
- Finds tool descriptions from `@tool` docstrings and `Tool()`/`StructuredTool()` constructors — each becomes a `{toolName}Description` parameter, always included automatically
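As a rough illustration of the approach (a simplified sketch, not the tool's actual parser), the stdlib `ast` module can recover these patterns from source text without importing or running it:

```python
import ast

# Example agent source, held as a string — it is parsed, never executed.
AGENT_SRC = '''
SYSTEM_PROMPT = "You are a helpful research assistant."

def researcher(state):
    llm = ChatOpenAI(model="gpt-4o", temperature=0.2)
    msgs = [SystemMessage(content=SYSTEM_PROMPT)] + state["messages"]
    return {"messages": [llm.invoke(msgs)]}

graph.add_node("researcher", researcher)
'''

def discover(source: str) -> dict:
    tree = ast.parse(source)
    # Pass 1: module-level string constants, used to resolve prompt references.
    constants = {}
    for stmt in tree.body:
        if isinstance(stmt, ast.Assign) and isinstance(stmt.value, ast.Constant):
            for target in stmt.targets:
                if isinstance(target, ast.Name):
                    constants[target.id] = stmt.value.value
    found = {"nodes": {}, "prompts": [], "models": []}
    # Pass 2: calls of interest anywhere in the tree.
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        fn = node.func
        if isinstance(fn, ast.Attribute) and fn.attr == "add_node":
            # graph.add_node("name", fn) -> node-name to function mapping
            found["nodes"][node.args[0].value] = node.args[1].id
        elif isinstance(fn, ast.Name) and fn.id == "SystemMessage":
            for kw in node.keywords:
                if kw.arg == "content":
                    if isinstance(kw.value, ast.Constant):      # string literal
                        found["prompts"].append(kw.value.value)
                    elif isinstance(kw.value, ast.Name):        # named constant
                        found["prompts"].append(constants.get(kw.value.id))
        elif isinstance(fn, ast.Name) and fn.id == "ChatOpenAI":
            # ChatOpenAI(model=..., temperature=...) -> model config
            found["models"].append({kw.arg: kw.value.value for kw in node.keywords})
    return found

info = discover(AGENT_SRC)
print(info["nodes"])    # {'researcher': 'researcher'}
print(info["prompts"])  # ['You are a helpful research assistant.']
print(info["models"])   # [{'model': 'gpt-4o', 'temperature': 0.2}]
```

This sketch only handles literal and named-constant prompts; the dict-merge pattern and tool-docstring discovery described above would require additional `ast` handling along the same lines.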
Limitations: prompts imported from other modules, loaded from files or environment variables, or built dynamically (f-strings, string concatenation) won't be auto-discovered. Pass them via `--extra-param` or edit the generated eval file directly.
Supported LLM classes: `ChatOpenAI`, `AzureChatOpenAI`, `ChatAnthropic`, `ChatBedrock`, `ChatGoogleGenerativeAI`, `ChatCohere`