Agent Skills for the full Build → Evaluate → Optimize lifecycle of LLM pipelines on orq.ai.
Skills are multi-step workflows that require reasoning (e.g. build an agent, run an experiment);
Commands are quick actions for immediate results (list traces, show analytics).
Each skill encodes best practices from prompt engineering, agent design, evaluation methodology, and experimentation into repeatable workflows, covering everything from creating agents and writing prompts, through trace analysis and dataset generation, to running validated experiments and iterating on results.
Built on the standard Agent Skills format, so the skills work with any compatible agent (Claude Code, Cursor, Gemini CLI, and others).
- An orq.ai account
- An API key from Settings → API Keys
export ORQ_API_KEY=your-key-here
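A minimal sketch of the setup step above, with an optional sanity check bolted on (the placeholder check is just a convenience for catching an unconfigured key, not part of orq.ai):

```shell
# Export the key (replace the placeholder with your real key from Settings → API Keys)
export ORQ_API_KEY=your-key-here

# Optional sanity check: warn if the key is missing or still the placeholder
if [ -z "${ORQ_API_KEY:-}" ] || [ "$ORQ_API_KEY" = "your-key-here" ]; then
  echo "warning: ORQ_API_KEY is not configured yet" >&2
fi
```

Add the `export` line to your shell profile (e.g. `~/.bashrc` or `~/.zshrc`) so the key is available in every session where skills or MCP calls run.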
Use this if you want easy access to all components — skills, MCP tools, and trace hooks — in one install. Installed via the orq-ai/claude-plugins marketplace.
# In Claude Code:
/plugin marketplace add orq-ai/claude-plugins
# Install all 3 plugins
/plugin install orq-skills@orq-claude-plugin
/plugin install orq-mcp@orq-claude-plugin
/plugin install orq-trace@orq-claude-plugin

| Plugin | What it gives you |
|---|---|
| orq-skills | Skills, commands, and agents for the Build → Evaluate → Optimize lifecycle |
| orq-mcp | MCP server registration — Claude can call orq.ai APIs directly |
| orq-trace | OTLP tracing hooks that capture Claude Code sessions into orq.ai |
Verify with the interactive onboarding — checks ORQ_API_KEY, MCP reachability, and credentials:
/orq:quickstart
Use this when you're on a non-Claude agent (Cursor, Gemini CLI, Cline, Copilot CLI, Codex, Windsurf, and many others), or when you only want the skills without MCP/trace hooks.
npx skills add orq-ai/orq-skills

Auto-detects your agent and writes skills to the correct location (e.g. .claude/skills/, .cursor/rules/). Run it inside your project directory.
Agent-specific install guides:
Use this when you want orq.ai MCP tools in a tool that isn't the Claude Code plugin (Claude Desktop, other MCP-capable clients, or manual Claude Code setup).
# Manual registration in Claude Code
claude mcp add --transport http orq-workspace https://my.orq.ai/v2/mcp \
--header "Authorization: Bearer ${ORQ_API_KEY}"

Most other clients accept a JSON block with a url and headers:
{
"mcpServers": {
"orq-workspace": {
"type": "http",
"url": "https://my.orq.ai/v2/mcp",
"headers": { "Authorization": "Bearer ${ORQ_API_KEY}" }
}
}
}

Quick-action slash commands. Use /orq:<command> in Claude Code.
| Command | What It Does | Usage |
|---|---|---|
| quickstart | Interactive onboarding — credentials, MCP setup, skills tour | /orq:quickstart |
| workspace | Workspace overview — agents, deployments, prompts, datasets, experiments | /orq:workspace [section] |
| traces | Query and summarize traces with filters | /orq:traces [--deployment name] [--status error] [--last 24h] |
| models | List available AI models by provider | /orq:models [search-term] |
| analytics | Usage analytics — requests, cost, tokens, errors | /orq:analytics [--last 24h] [--group-by model] |
/orq:workspace agents # Show only agents
/orq:traces --status error --last 1h # Recent errors
/orq:models gpt-4 # Search for GPT-4 variants
/orq:analytics --group-by deployment # Cost per deployment
Skills are triggered by describing what you need. Claude picks the right skill automatically.
| Skill | What It Does | Documentation |
|---|---|---|
| setup-observability | Set up orq.ai observability for LLM applications — AI Router proxy, OpenTelemetry, tracing setup, and trace enrichment | SKILL.md |
| invoke-deployment | Invoke orq.ai deployments, agents, and models via the Python SDK or HTTP API — pass prompt variables, stream responses, and generate integration code | SKILL.md |
| build-agent | Design, create, and configure an orq.ai Agent with tools, instructions, knowledge bases, and memory | SKILL.md |
| build-evaluator | Create validated LLM-as-a-Judge evaluators following evaluation best practices | SKILL.md |
| analyze-trace-failures | Read production traces, identify what's failing, build failure taxonomies, and categorize issues | SKILL.md |
| run-experiment | Create and run orq.ai experiments — compare configurations with specialized agent, conversation, and RAG evaluation | SKILL.md |
| compare-agents | Run cross-framework agent comparisons using evaluatorq — compare orq.ai, LangGraph, CrewAI, OpenAI Agents SDK, and others | SKILL.md |
| generate-synthetic-dataset | Generate and curate evaluation datasets — structured generation, quick from description, expansion, and dataset maintenance | SKILL.md |
| optimize-prompt | Analyze and optimize system prompts using a structured prompting guidelines framework | SKILL.md |
"I need a customer support agent" → build-agent
"Create test cases for it" → generate-synthetic-dataset
"Build an evaluator for response accuracy" → build-evaluator
"Run an experiment to get a baseline" → run-experiment
/orq:traces --status error --last 24h # Find errors
"Analyze these failures" → analyze-trace-failures
"Fix the prompt based on the failure analysis" → optimize-prompt
"Re-run the experiment to verify the fix" → run-experiment
/orq:analytics --group-by deployment # Spot high error rates
"Analyze traces for the checkout agent" → analyze-trace-failures
"Build evaluators for the failure modes" → build-evaluator
"Generate a dataset covering edge cases" → generate-synthetic-dataset
"Run an experiment and compare" → run-experiment
"Optimize the prompt based on results" → optimize-prompt
"My prompt isn't performing well, help me improve it" → optimize-prompt
"Create test cases to compare before and after" → generate-synthetic-dataset
"Build an evaluator for [specific dimension]" → build-evaluator
"Run an experiment: current vs optimized prompt" → run-experiment
"Refine the prompt based on failure cases" → optimize-prompt