The definitive OpenAI, Anthropic, Google, MCP, Harness, Evals, and Production Agent Systems learning roadmap.
If this repository helps you, consider giving it a ⭐
The AI industry has entered the Agentic Era. Building production-grade AI systems now requires mastering agents, tool use, MCP, memory, long-running workflows, coding agents, agent harnesses, evals, and safety — but the knowledge is scattered across OpenAI blogs, Anthropic engineering posts, SDK docs, cookbooks, and research papers.
This repository consolidates 129 curated resources into one structured learning roadmap.
The goal: Become a world-class Agentic Engineer.
If you treat Claude Code as a coding CLI, many capabilities can feel like magic: it reads files, runs commands, edits code, delegates work, and stays oriented during complex tasks.
From an engineering perspective, the core is much simpler:
model + tools + one loop.
Understanding that loop makes the rest of the system easier to reason about:
- When the agent should plan first, and when it should act immediately
- Why an explicit todo list reduces drift in longer tasks
- Why subagents improve exploration while protecting the main context
- How skills, MCP, and hooks each add capability around the same core loop
These pages are based on the upstream English Markdown tutorials from shareAI-lab/mini-claude-code, with added Study Notes and inline source code for this handbook.
Supporting files are included in the same folder: requirements.txt, .env.example, v0_bash_agent_mini.py, and skills/.
Build shared vocabulary for workflow vs agent, tool loop, handoff, guardrails.
Should I build an agent? (4-question checklist from Barry Zhang's talk)
| Question | If No → Workflow | If Yes → Agent |
|---|---|---|
| Is the task complex enough? | Decision tree is fully mappable | Ambiguous problem space |
| Is the task valuable enough? | <$0.10 per run | >$1 per run, cost doesn't matter |
| Are all core capabilities doable? | Weak links break the chain | Model handles every step well |
| Is error cost low & detectable? | High cost + hard to detect → human-in-the-loop | Errors caught by tests/CI |
Think like the agent. Most failures come from designing with a human perspective. Put yourself inside the agent's context window: you only see ~10K–20K tokens (system prompt + tool descriptions + recent observations). Ask: does the agent have enough information to act correctly at each step?
→ Source: How We Build Effective Agents
| # | Title | Vendor |
|---|---|---|
| 1 | Prompt guidance | OpenAI |
| 2 | Function Calling | OpenAI |
| 3 | Tool use overview | Anthropic |
| 4 | Function calling - Gemini API | |
| 5 | Building effective agents | Anthropic |
| 6 | New tools for building agents | OpenAI |
| 7 | Agents SDK overview | OpenAI |
| Title | Vendor |
|---|---|
| How We Build Effective Agents: Barry Zhang, Anthropic | Anthropic |
| Phistory — Claude Code & Codex CLI System Prompt Diff History | Community |
| System Prompts | Anthropic |
| OpenAI Agents SDK examples | OpenAI |
| Structured Outputs for Multi-Agent Systems | OpenAI |
Build a customer service/ticket triage agent: router → specialist → evaluator, with all outputs constrained by structured schemas.
Understand MCP server/client, remote vs local, tool loading, approval, connector boundaries.
| # | Title | Vendor |
|---|---|---|
| 1 | Introducing the Model Context Protocol | Anthropic |
| 2 | MCP and Connectors | OpenAI |
| 3 | Building MCP servers for ChatGPT Apps and API integrations | OpenAI |
| Title | Vendor |
|---|---|
| Code execution with MCP: Building more efficient agents | Anthropic |
| Model Context Protocol - Codex | OpenAI |
| OpenAI Docs MCP | OpenAI |
| Build your ChatGPT UI | OpenAI |
Build a read-only repo/docs MCP server, then create an eval to verify the agent correctly cites documentation.
Learn to control context window, short/long-term memory, skills/plugins, CLAUDE.md/AGENTS.md.
| # | Title | Vendor |
|---|---|---|
| 1 | Effective context engineering for AI agents | Anthropic |
| 2 | Equipping agents for the real world with Agent Skills | Anthropic |
| 3 | Agent Skills Specification | Agent Skills |
| 4 | Agent Skills | Anthropic |
| 5 | Skills | OpenAI |
| 6 | Building Reliable Agents with Memory and Compaction | OpenAI |
| Title | Vendor |
|---|---|
| Custom instructions with AGENTS.md - Codex | OpenAI |
| Best practices for Claude Code | Anthropic |
| Agent Skills - Codex | OpenAI |
| Skills in OpenAI API | OpenAI |
Implement the same task as a Skill/Plugin, then measure accuracy and token cost across three variants: no skill, long prompt, and skill-based.
Master agent runtime: event stream, thread, tool execution, state, sandbox, approval, recovery.
| # | Title | Vendor |
|---|---|---|
| 1 | Unrolling the Codex agent loop | OpenAI |
| 2 | Unlocking the Codex harness: how we built the App Server | OpenAI |
| 3 | Effective harnesses for long-running agents | Anthropic |
| Title | Vendor |
|---|---|
| The next evolution of the Agents SDK | OpenAI |
| Using PLANS.md for multi-hour problem solving | OpenAI |
| Harness design for long-running application development | Anthropic |
| Scaling Managed Agents: Decoupling the brain from the hands | Anthropic |
Build a mini coding harness: plan file, shell tool, apply patch, test gate, event log, and resume capability.
Compare Codex vs Claude Code product/SDK forms; learn multi-agent, IDE, workspace collaboration.
| # | Title | Vendor |
|---|---|---|
| 1 | Introducing Codex | OpenAI |
| 2 | Best practices for Claude Code | Anthropic |
| 3 | Enabling Claude Code to work more autonomously | Anthropic |
| Title | Vendor |
|---|---|
| Introducing the Codex app | OpenAI |
| Introducing workspace agents in ChatGPT | OpenAI |
| Apple's Xcode now supports Claude Agent SDK | Anthropic |
| Building Consistent Workflows with Codex CLI & Agents SDK | OpenAI |
Run both OpenAI/Codex and Claude Code style workflows on the same repo: issue → plan → patch → tests → PR summary.
Build pre/post-launch eval loop, trace loop, safety boundaries, permissions, regression monitoring.
| # | Title | Vendor |
|---|---|---|
| 1 | Demystifying evals for AI agents | Anthropic |
| 2 | Testing Agent Skills Systematically with Evals | OpenAI |
| 3 | Build an Agent Improvement Loop with Traces, Evals, and Codex | OpenAI |
| Title | Vendor |
|---|---|
| Running Codex safely at OpenAI | OpenAI |
| How we contain Claude across products | Anthropic |
| Evals API Use-case - MCP Evaluation | OpenAI |
| Measuring AI agent autonomy in practice | Anthropic |
Build a smoke/macro eval suite for your agent: task success rate, tool misuse, prompt injection resistance, latency, cost, and human approval count.
Priority guide: P0 = must-read (architectural/conceptual), P1 = highly useful (implementation detail), P2 = optional context (background/releases).
| Priority | Title | Vendor | Topic | Key Idea | Date |
|---|---|---|---|---|---|
| P0 | OpenAI for Developers in 2025 | OpenAI | Agents; MCP; Platform | Annual overview: systematic walkthrough of Responses API, Agents SDK, AgentKit, Codex, MCP, Apps SDK, and AGENTS.md. | 2025-12-30 |
| P0 | New tools for building agents | OpenAI | Agents; Responses API; Tools | Key starting point for OpenAI's agent platform: Responses API, built-in web/file/computer tools, Agents SDK, tracing/observability. | 2025-03-11 |
| P0 | Introducing AgentKit | OpenAI | Agents; Evals; AgentKit | AgentKit, expanded evals, agent RFT: the official agent toolchain from prototype to production. | 2025-10-06 |
| P0 | Prompt guidance | OpenAI | Prompting; Models; Agent UX | Official model-specific prompting guidance for outcome-first prompts, reasoning effort, preambles, and validation rules in tool-heavy workflows. | Current docs |
| P0 | System Prompts | Anthropic | System prompts; Claude; Behavior | Claude web/mobile system prompt release notes; useful for studying production prompting patterns and behavioral scaffolding. | Current docs |
| P0 | Agents SDK overview | OpenAI | Agents; SDK | Official SDK entry point: concepts and boundaries of agent, tool, handoff, guardrail, and tracing. | Current docs |
| P0 | Introducing the Model Context Protocol | Anthropic | MCP; Standards | The origin article for MCP: an open standard connecting AI assistants to data, tools, and systems. | 2024-11-25 |
| P0 | Building effective agents | Anthropic | Agents; Patterns; Frameworks | Essential agent primer: workflow vs agent, prompt/tool/retrieval, orchestrator-worker, evaluator-optimizer patterns. | 2024-12-19 |
| P0 | New tools and features in the Responses API | OpenAI | MCP; Responses API; Tools | Responses API extended to remote MCP servers, image/code/file tools; see how OpenAI integrates MCP into its runtime. | 2025-05-21 |
| P0 | MCP and Connectors | OpenAI | MCP; Connectors; Responses API | Official guide to connecting remote MCP servers and connectors; includes approvals and security considerations. | Current docs |
| P0 | Building MCP servers for ChatGPT Apps and API integrations | OpenAI | MCP; ChatGPT Apps; API | Official guide to writing MCP servers: supply tools/knowledge to ChatGPT Apps, deep research, and API integrations. | Current docs |
| P0 | Building a Deep Research MCP Server | OpenAI | MCP; Deep research | Minimal implementation of a search/fetch MCP server for Deep Research. | 2025-06-25 |
| P0 | Model Context Protocol - Codex | OpenAI | MCP; Codex | How Codex CLI/IDE connects to MCP servers, adding Figma, browser, docs, and internal tool context to agents. | Current docs |
| P0 | Introducing Codex | OpenAI | Agents; Coding; Sandbox | Cloud-based software engineering agent: parallel tasks, repo sandbox, running tests/linters/type checkers, producing auditable evidence. | 2025-05-16 |
| P0 | Unrolling the Codex agent loop | OpenAI | Harness; Agent loop; Codex | How Codex CLI chains prompt, tool schema, MCP tools, Responses API, and context management into an agent loop. | 2026-01-23 |
| P0 | Unlocking the Codex harness: how we built the App Server | OpenAI | Harness; Codex App Server; JSON-RPC | Core harness article: Codex core, App Server, JSON-RPC, streaming progress, approval, diff, and thread management. | 2026-02-04 |
| P0 | From model to agent: Equipping the Responses API with a computer environment | OpenAI | Harness; Responses API; Sandbox | Responses API + shell tool + hosted containers form the agent runtime; essential for understanding the model-to-agent execution environment. | 2026-03-10 |
| P0 | Harness engineering: leveraging Codex in an agent-first world | OpenAI | Harness; Agent-first engineering | Design product code, tests, CI, docs, and observability to be agent-readable/executable; learn agent-first repo organization. | 2026-02-11 |
| P0 | The next evolution of the Agents SDK | OpenAI | Harness; Agents SDK; MCP; Skills | Agents SDK harness becomes more complete: memory, sandbox orchestration, Codex-like filesystem tools, MCP, skills, AGENTS.md. | 2026-04-15 |
| P0 | Building Consistent Workflows with Codex CLI & Agents SDK | OpenAI | MCP; Codex; Agents SDK | Codex CLI as an MCP server integrated with Agents SDK; real multi-agent dev workflow. | 2025-10-01 |
| P0 | Building Reliable Agents with Memory and Compaction | OpenAI | Memory; Compaction; Reliability | Memory and compaction design for long-context/multi-turn agents. | 2026-05-01 |
| P0 | Build an Agent Improvement Loop with Traces, Evals, and Codex | OpenAI | Evals; Traces; Self-improvement | Connect traces, evals, and Codex fixes into an agent improvement loop. | 2026-05-12 |
| P0 | Eval Driven System Design - From Prototype to Production | OpenAI | Evals; Production | Use evals as the driving force for system design; ideal for moving agents from demo to production. | 2025-06-02 |
| P0 | Testing Agent Skills Systematically with Evals | OpenAI | Evals; Skills; Agents | Systematically test agent skills with evals; establish quality gates before skill release. | 2026-01-22 |
| P0 | Evals API Use-case - MCP Evaluation | OpenAI | MCP; Evals | Evaluate QA/retrieval capabilities with MCP tools; ideal for building an MCP regression suite. | 2025-06-09 |
| P0 | Running Codex safely at OpenAI | OpenAI | Safety; Sandbox; Codex | How OpenAI runs Codex internally: sandbox, approvals, network policy, agent-native telemetry. | 2026-05-20 |
| P0 | Building Governed AI Agents - A Practical Guide to Agentic Scaffolding | OpenAI | Governance; Guardrails; Agents | Governed agent scaffolding: permissions, guardrails, auditing, and organizational policies. | 2026-02-23 |
| P0 | Macro Evals for Agentic Systems | OpenAI | Evals; Agentic systems | Evaluate agents at the end-to-end/macro level, not just individual step outputs. | 2026-05-19 |
| P0 | Best practices for Claude Code | Anthropic | Coding agents; Claude Code | Claude Code methodology: verification loop, explore-plan-code, CLAUDE.md, permissions, MCP, subagents, context management. | 2025-04-18 |
| P0 | How we built our multi-agent research system | Anthropic | Agents; Multi-agent; Research | Claude Research multi-agent architecture: planner + parallel research agents + synthesis; production multi-agent experience. | 2025-06-13 |
| P0 | Writing effective tools for AI agents - with AI agents | Anthropic | Tools; MCP; Evals | Tool quality determines agent quality: tool descriptions, context budget, eval, and letting Claude optimize its own tools. | 2025-09-11 |
| P0 | Effective context engineering for AI agents | Anthropic | Context; Agents | Context is the agent's core resource: selection, compression, isolation, persistence, and context pollution control. | 2025-09-29 |
| P0 | Enabling Claude Code to work more autonomously | Anthropic | Claude Code; Agent SDK; Subagents | Claude Agent SDK, subagents, hooks, background tasks, checkpoints, and other autonomous coding agent capabilities. | 2025-09-29 |
| P0 | Equipping agents for the real world with Agent Skills | Anthropic | Skills; Agents | Agent Skills as modular capability packages: instructions, resources, scripts — reducing context burden and improving reliability. | 2025-10-16 |
| P0 | Agent Skills | Anthropic | Skills; Claude; Progressive disclosure | Official Claude Agent Skills docs: modular instructions, metadata, scripts, resources, and on-demand loading across Claude products. | Current docs |
| P0 | Skills | OpenAI | Skills; API; Shell environments | Official OpenAI API guide for uploading, managing, and attaching reusable Skills to hosted and local shell environments. | Current docs |
| P0 | Agent Skills Specification | Agent Skills | Skills; Specification; Progressive disclosure | Complete skill package format: SKILL.md frontmatter, optional scripts/references/assets, file references, and validation. | Current docs |
| P0 | Code execution with MCP: Building more efficient agents | Anthropic | MCP; Code execution; Context | Key article on MCP scale challenges: reduce token overhead with code execution/on-demand tools; learn progressive disclosure. | 2025-11-04 |
| P0 | Introducing advanced tool use on Claude Developer Platform | Anthropic | Tools; MCP; Advanced tool use | Tool search, deferred loading, programmatic tool calling; solving context pollution from large numbers of MCP tools. | 2025-11-24 |
| P0 | Effective harnesses for long-running agents | Anthropic | Harness; Long-running agents | Essential harness reading: working across multiple context windows, task logging, external state, agent self-recovery. | 2025-11-26 |
| P0 | Demystifying evals for AI agents | Anthropic | Evals; Agents | Agent evals are more complex than static evals: multi-turn, tools, state changes, creative solutions, failure taxonomy. | 2026-01-09 |
| P0 | Measuring AI agent autonomy in practice | Anthropic | Agents; Autonomy; Measurement | Quantify agent autonomy using metrics like task duration and supervision needs; ideal for building autonomy benchmarks. | 2026-02-18 |
| P0 | Harness design for long-running application development | Anthropic | Harness; Application development | Harness design patterns for delegating long-running app development tasks to agents; compare with OpenAI Codex harness. | 2026-03-24 |
| P0 | Scaling Managed Agents: Decoupling the brain from the hands | Anthropic | Managed agents; Harness | Decouple the model brain from execution hands/harness, keeping interfaces stable as the harness evolves. | 2026-04-08 |
| P0 | How we contain Claude across products | Anthropic | Safety; Containment; Agents | Blast radius of powerful agent releases, human-in-the-loop, and containment strategies. | 2026-05-25 |
| P1 | Structured Outputs for Multi-Agent Systems | OpenAI | Agents; Multi-agent; Structured outputs | Use strict schemas to constrain structured messages and handoffs between multiple agents. | 2024-08-06 |
| P1 | Introducing computer use, a new Claude 3.5 Sonnet, and Claude 3.5 Haiku | Anthropic | Agents; Computer use | Claude computer use beta starting point: the model uses a computer via screenshots and actions. | 2024-10-22 |
| P1 | Raising the bar on SWE-bench Verified with Claude 3.5 Sonnet | Anthropic | Agents; Coding; Evals | SWE-bench agent scaffolding article: same model performance strongly depends on harness/scaffolding. | 2025-01-06 |
| P1 | Introducing Operator | OpenAI | Agents; Computer use; Safety | Early product form of browser-based agents: model clicks, types, and executes tasks on web pages, emphasizing user confirmation and safety boundaries. | 2025-01-23 |
| P1 | Computer-Using Agent | OpenAI | Agents; Computer use | Understand how CUA combines vision, mouse/keyboard actions, and environment feedback into an agent loop; compare with Claude computer use. | 2025-01-23 |
| P1 | Claude 3.7 Sonnet and Claude Code | Anthropic | Agents; Coding; Claude Code | Early release of Claude Code, marking Claude's entry into the agentic coding tool space. | 2025-02-24 |
| P1 | The think tool: Enabling Claude to stop and think in complex tool use situations | Anthropic | Tools; Reasoning; Agents | Give the model an explicit think tool in complex tool-use chains; learn tool design for policy-heavy/multi-step decisions. | 2025-03-20 |
| P1 | Evaluating Agents with Langfuse | OpenAI | Evals; Agents | Observe and evaluate Agents SDK runs with Langfuse; learn tracing/eval workflows. | 2025-03-31 |
| P1 | Parallel Agents with the OpenAI Agents SDK | OpenAI | Agents; Parallelism; Agents SDK | Parallel agent patterns: decompose tasks, execute in parallel, aggregate results. | 2025-05-01 |
| P1 | Multi-Agent Portfolio Collaboration with OpenAI Agents SDK | OpenAI | Agents; Multi-agent; Portfolio | Multi-agent collaboration business example: research, analysis, combined output. | 2025-05-28 |
| P1 | MCP-Powered Agentic Voice Framework | OpenAI | MCP; Voice; Agents | Voice agent + MCP paradigm: real-time interaction, tool extension, task execution. | 2025-06-17 |
| P1 | Deep Research API with the Agents SDK | OpenAI | Agents; Deep research; Agents SDK | Integrate Deep Research API into Agents SDK workflows. | 2025-06-25 |
| P1 | Desktop Extensions: One-click MCP server installation for Claude Desktop | Anthropic | MCP; Claude Desktop; Packaging | Package local MCP servers as one-click install extensions; learn MCP distribution/installation/local permission issues. | 2025-06-26 |
| P1 | Building a Supply-Chain Copilot with OpenAI Agent SDK and Databricks MCP Servers | OpenAI | MCP; Agents; Databricks | Enterprise data platform MCP + Agent SDK business agent example. | 2025-07-08 |
| P1 | Introducing ChatGPT agent: bridging research and action | OpenAI | Agents; ChatGPT; Computer use | End-user-facing ChatGPT agent: combining research, browser, computer use, file/slide capabilities. | 2025-07-17 |
| P1 | ChatGPT agent System Card | OpenAI | Agents; Safety; Evals | Learn pre-launch risk classification, evaluation, permissions, human confirmation, and abuse prevention for agent products. | 2025-07-17 |
| P1 | Context Engineering - Short-Term Memory Management with Sessions | OpenAI | Context; Sessions; Agents | How short-term memory/session state affects agent reliability. | 2025-09-09 |
| P1 | Introducing upgrades to Codex | OpenAI | Agents; Coding; IDE | Codex evolves from research preview to daily dev tool: CLI, IDE, web/mobile collaboration, and more independent task execution. | 2025-09-15 |
| P1 | Introducing Claude Sonnet 4.5 | Anthropic | Agents; Claude Agent SDK; Computer use | Sonnet 4.5 emphasizes coding, complex agents, computer use, with simultaneous Agent SDK launch. | 2025-09-29 |
| P1 | Introducing apps in ChatGPT and the new Apps SDK | OpenAI | MCP; Apps; ChatGPT | Apps SDK extends UI and tool server via MCP; entry point for understanding the ChatGPT app / MCP app ecosystem. | 2025-10-06 |
| P1 | Build your ChatGPT UI | OpenAI | MCP; Apps SDK; UI | Build custom UI components that turn structured MCP tool results into interactive ChatGPT app interfaces. | Current docs |
| P1 | Codex is now generally available | OpenAI | Agents; Coding; Codex SDK | Codex GA, Slack integration, Codex SDK, admin tools; see how coding agents enter enterprise management. | 2025-10-06 |
| P1 | Using PLANS.md for multi-hour problem solving | OpenAI | Codex; Long-running; Planning | ExecPlan files and cross-context task management for multi-hour coding-agent work. | 2025-10-07 |
| P1 | Beyond permission prompts: making Claude Code more secure and autonomous | Anthropic | Safety; Permissions; Claude Code | From simple permission prompts to fine-grained security policies, reducing autonomous mode risk and interruptions. | 2025-10-20 |
| P1 | Introducing Aardvark: OpenAI's agentic security researcher | OpenAI | Agents; Security | Security-domain agent form: continuous scanning, issue verification, fix suggestions; later integrated as Codex Security. | 2025-10-30 |
| P1 | Build a coding agent with GPT 5.1 | OpenAI | Agents; Coding | Build a coding agent from scratch: understand file editing, command execution, loops, and verification. | 2025-11-13 |
| P1 | OpenAI co-founds Agentic AI Foundation | OpenAI | MCP; Standards; AGENTS.md | MCP, AGENTS.md, and agent standards enter the Linux Foundation/AAIF context; understand ecosystem standardization. | 2025-12-09 |
| P1 | Donating MCP and establishing the Agentic AI Foundation | Anthropic | MCP; Standards; AAIF | Anthropic donates MCP to Linux Foundation/AAIF; read alongside OpenAI's AAIF article. | 2025-12-09 |
| P1 | Context Engineering for Personalization - Long-Term Memory Notes | OpenAI | Context; Long-term memory; Agents | How long-term memory serves as agent personalization/state management. | 2026-01-05 |
| P1 | Supercharging Codex with JetBrains MCP at Skyscanner | OpenAI | MCP; Codex; IDE | Real IDE/MCP case study: how Codex CLI accesses IDE context and dev tools via JetBrains MCP. | 2026-01-11 |
| P1 | Designing AI-resistant technical evaluations | Anthropic | Evals; Technical hiring | How strong agents continuously break technical evaluations; relevant to benchmark contamination prevention and eval design. | 2026-01-21 |
| P1 | Inside OpenAI's in-house data agent | OpenAI | Agents; Data; Memory | Internal data agent case study: memory, Codex, data context, reliability; learn enterprise knowledge/data agents. | 2026-01-29 |
| P1 | Introducing the Codex app | OpenAI | Agents; Coding; Multi-agent | Desktop command center for agents: multi-threaded/parallel long tasks, project-level agent workflows. | 2026-02-02 |
| P1 | Apple's Xcode now supports Claude Agent SDK | Anthropic | Claude Agent SDK; Xcode; MCP | Embed Claude Agent SDK in Xcode: harness, subagents, background tasks, plugins, MCP. | 2026-02-03 |
| P1 | Quantifying infrastructure noise in agentic coding evals | Anthropic | Evals; Coding agents; Infrastructure | Environment configuration significantly impacts scores in agentic coding evals; control infrastructure noise in both production and benchmarks. | 2026-02-05 |
| P1 | Building a C compiler with a team of parallel Claudes | Anthropic | Multi-agent; Coding; Long-running | Parallel Claude teams completing large engineering tasks; learn multi-agent division of labor, coordination, and long-running execution. | 2026-02-05 |
| P1 | Codex Security: now in research preview | OpenAI | Agents; Security; Codex | Productization of an agentic security researcher: vulnerability discovery, verification, fix suggestions, reducing triage noise. | 2026-03-06 |
| P1 | Eval awareness in Claude Opus 4.6's BrowseComp performance | Anthropic | Evals; Agent awareness | Risk of models recognizing/adapting to evaluations; relevant to agent benchmark credibility discussions. | 2026-03-06 |
| P1 | How we built Claude Code auto mode: a safer way to skip permissions | Anthropic | Safety; Permissions; Autonomy | Claude Code auto mode risk classification, allow/block rules, exception handling, and security testing. | 2026-03-25 |
| P1 | Migrate a Legacy Codebase with Sandbox Agents | OpenAI | Agents; Sandbox; Evals | Sandbox agent evaluation and execution patterns in large legacy code migrations. | 2026-04-07 |
| P1 | Codex for (almost) everything | OpenAI | Agents; Codex; MCP; Plugins | Codex app expanded to Windows/macOS, computer use, in-app browser, memory, plugins, MCP servers. | 2026-04-16 |
| P1 | Computer Use Agents in Daytona Sandboxes | OpenAI | Computer use; Sandbox; Agents | Computer-use agents and sandbox runtimes; compare with Operator/CUA/Claude computer use. | 2026-04-19 |
| P1 | Introducing workspace agents in ChatGPT | OpenAI | Agents; Workspace; Governance | Workspace agents: shared agents, permissions, tools, memory, safeguards; ideal for team collaboration agent design. | 2026-04-22 |
| P1 | Building workspace agents in ChatGPT to complete repeatable, end-to-end work | OpenAI | Workspace agents; ChatGPT | Practical workspace agents for repeatable end-to-end team workflows. | 2026-04-22 |
| P1 | Speeding up agentic workflows with WebSockets in the Responses API | OpenAI | Agents; Latency; Responses API | Optimize latency by treating agentic rollouts as long-lived connections/tasks; learn production agent transport and caching. | 2026-05-01 |
| P1 | Agents for financial services | Anthropic | Agents; Finance; MCP | Ten ready-to-run agent templates, Claude Code/Cowork plugins, Managed Agents cookbooks, MCP app. | 2026-05-05 |
| P1 | Migrate from the Claude Agent SDK to the OpenAI Agents SDK | OpenAI | Agents SDK; Migration | Compare Claude Agent SDK and OpenAI Agents SDK from a migration perspective; ideal for dual-stack learning. | 2026-05-07 |
| P1 | Building a safe, effective sandbox to enable Codex on Windows | OpenAI | Safety; Sandbox; Codex | Coding agent sandbox design on Windows: file access, network restrictions, approval tradeoffs. | 2026-05-13 |
| P1 | Building self-improving tax agents with Codex | OpenAI | Agents; Evals; Self-improvement | Combine production traces, expert feedback, Codex loop, and eval infrastructure into self-improving business agents. | 2026-05-27 |
| P1 | SchemaFlow: Agentic Database Change Impact Analysis, SQL Generation, and Eval Guardrails | OpenAI | Evals; SQL; Agent guardrails | Guardrails and eval guardrails examples for data/SQL agents. | 2026-06-05 |
| P1 | Agents SDK quickstart | OpenAI | Agents; SDK | Quickly build a minimal agent; understand the code patterns of run, tool, and handoff. | Current docs |
| P1 | OpenAI Agents SDK examples | OpenAI | Agents SDK; Patterns; Examples | Practical examples for agent patterns, MCP, memory, guardrails, approvals, handoffs, and streaming. | Current docs |
| P1 | MCP Apps compatibility in ChatGPT | OpenAI | MCP; Apps SDK; UI | Understand MCP Apps UI standards, iframe/bridge, and compatibility between ChatGPT and other hosts. | Current docs |
| P1 | Use Codex with the Agents SDK | OpenAI | MCP; Codex; Agents SDK | Use Codex as an MCP server for other agents to call; ideal for multi-agent dev workflows. | Current docs |
| P1 | Agent approvals and security - Codex | OpenAI | Safety; Approvals; Codex | Official reference for Codex approval modes, sandbox, network access; read alongside OpenAI/Anthropic safety articles. | Current docs |
| P1 | Agent Skills - Codex | OpenAI | Codex; Skills; Plugins | Skills/Plugins as reusable workflow packages; compare with Anthropic Agent Skills. | Current docs |
| P1 | Skills in OpenAI API | OpenAI | Skills; OpenAI API | Cookbook example for using Skills in the OpenAI API and connecting skill bundles to agent workflows. | Current docs |
| P1 | Custom instructions with AGENTS.md - Codex | OpenAI | AGENTS.md; Context | How AGENTS.md provides persistent project specifications for agents; establish repo-level agent contracts. | Current docs |
| P1 | Agents SDK integrations and observability | OpenAI | Observability; MCP; Tracing | Tracing, MCP integration, provider/observability; essential for production agent debugging. | Current docs |
| P1 | Secure MCP Tunnel | OpenAI | MCP; Security; Private tools | Securely expose private/intranet MCP servers to supported OpenAI surfaces; ideal for enterprise deployment. | Current docs |
| P1 | How Claude Code works | Anthropic | Claude Code; Agentic loop; Harness | Under-the-hood architecture of Claude Code: the agentic loop (gather context → act → verify), built-in tool categories, context window management, and extension points. | Current docs |
| P0 | learn-claude-code | Community | Harness; Agent loop; Tools; Context | Hands-on 20-lesson tutorial building a Claude Code–like agent harness from scratch: agent loop, tool integration, context compaction, multi-agent coordination, permissions, MCP plugins. | 2026 |
| P0 | Dive into Claude Code: The Design Space of Today's and Future AI Agent Systems | Academic | Agent architecture; Claude Code; Design space | Deep technical analysis of Claude Code's architecture: agentic loop, permission system, context compaction, extensibility (MCP/plugins/skills/hooks), subagent delegation, and comparison with open-source alternatives. | 2026-04-14 |
| P0 | Function Calling | OpenAI | Tools; Function calling; API | Official guide to function/tool calling: define functions with JSON schemas, handle model tool calls, execute and return results. | Current docs |
| P0 | Tool use overview | Anthropic | Tools; Tool use; API | Connect Claude to external tools and APIs: client vs server tools, the agentic loop, strict schema conformance, and when Claude decides to call tools. | Current docs |
| P0 | Function calling - Gemini API | Tools; Function calling; API | Enable Gemini models to connect with external tools via function calling: single-turn, multi-turn, parallel, and sequential function chains. | Current docs | |
| P2 | Orchestrating Agents: Routines and Handoffs (archived) | OpenAI | Agents; Handoffs; Orchestration | Historical cookbook for routines and handoffs; useful conceptually, but archived and not the current recommended implementation path. | 2024-10-10 |
| P2 | Introducing Contextual Retrieval | Anthropic | Context; Retrieval; RAG | Not agent-specific, but important for agent RAG/context: prepend context to chunks before retrieval to improve recall. | 2024-09-19 |
| P2 | Developing a computer use model | Anthropic | Computer use; Agents | More technical explanation of how the computer-use model moves the mouse, clicks, types, and reads screen feedback. | 2024-10-22 |
| P2 | Introducing Claude 4 | Anthropic | Agents; Coding; Long-running | Overview of Claude Opus/Sonnet 4 capabilities: coding, advanced reasoning, agent workflows. | 2025-05-22 |
| P2 | Claude for Financial Services | Anthropic | Agents; Connectors; Finance | Vertical industry agent/connector productization case; understand data, permissions, and tool integration in finance. | 2025-07-15 |
| P2 | Advancing Claude for Financial Services | Anthropic | Agents; Skills; Finance | Claude for Excel, real-time data connectors, pre-built Agent Skills for vertical industry productization. | 2025-10-27 |
| P2 | Introducing GPT-5.3-Codex | OpenAI | Agents; Coding model; Evals | Codex-native model and long-running coding/terminal/agentic benchmarks; understand how model capabilities serve the harness. | 2026-02-05 |
| P2 | Introducing OpenAI Frontier | OpenAI | Agents; Enterprise; Governance | Enterprise AI coworker/agent platform: shared context, onboarding, permissions, guardrails, governance. | 2026-02-10 |
| P2 | Introducing Claude Sonnet 4.6 | Anthropic | Agents; Planning; Computer use | Sonnet 4.6 emphasizes coding, computer use, long-context reasoning, agent planning. | 2026-02-17 |
| P2 | Introducing Claude Opus 4.6 | Anthropic | Agents; Long-running; Tool use | Model release perspective on long-running tasks, agentic harness, subagents, and tool call capabilities. | 2026-02-25 |
| P2 | Introducing Claude Opus 4.7 | Anthropic | Agents; Long-running; Coding | Stronger software engineering and long-running task performance; track how model capabilities impact agent workloads. | 2026-04-16 |
| P2 | An update on recent Claude Code quality reports | Anthropic | Reliability; Claude Code; Agent SDK | Postmortem on Claude Code/Agent SDK quality regression; learn agent product operations and regression control. | 2026-04-23 |
| P2 | Introducing Claude Opus 4.8 | Anthropic | Agents; Dynamic workflows; Long-running | Dynamic workflows, hundreds of parallel subagents, long-running agentic tasks — latest model/product direction. | 2026-05-28 |
| P2 | Codex for every role, tool, and workflow | OpenAI | Agents; Codex; Plugins | Codex expands from development to knowledge work: role-specific plugins, Sites, annotations, parallel workflows. | 2026-06-02 |
| P2 | Codex is becoming a productivity tool for everyone | OpenAI | Agents; Knowledge work | Usage data shows how non-developers use Codex for reports, spreadsheets, research, automation, and lightweight tools. | 2026-06-02 |
| P2 | OpenAI Docs MCP | OpenAI | MCP; Docs; Context | Official OpenAI docs MCP server; connect docs directly to local agents/IDEs. | Current docs |
| P2 | Codex SDK | OpenAI | Codex SDK; Automation | Programmatically control Codex in CI/CD or internal tools; embed coding agents into existing workflows. | Current docs |
| P2 | When AI builds itself | Anthropic | Agents; Recursive self-improvement; Safety | How AI systems accelerate their own development through recursive self-improvement; three possible futures and the need for verifiable coordination. | 2026-05 |
- AI Engineers
- Agent Engineers
- LLM Engineers
- Platform Engineers
- Research Engineers
- AI Startup Founders
Contributions are welcome. If you find:
- New OpenAI resources
- New Anthropic resources
- MCP updates
- Agent evaluation frameworks
- Production engineering articles
Please open a pull request.
The goal of this project is to become the System Design Primer for Agentic Engineering.
If you're serious about building production AI agents, start here.