A structured learning path covering AI agents from fundamentals to production-ready deployment in a Kubernetes cluster — with runnable code, lessons learned, and blog-post-ready documentation.
| Phase | Topic | Blocks |
|---|---|---|
| Phase 1 | Fundamentals | 3 |
| Phase 2 | Abstractions | 3 |
| Phase 3 | Multi-Agent Systems | 2 |
| Phase 4 | Production Engineering | 4 |
| Phase 5 | Agentic MLOps | 5 |
Core Question: What distinguishes an AI agent from an LLM pipeline?
| Block | Topic |
|---|---|
| 1.1 | Agent Anatomy — Pattern Comparison (ReAct, Planning, Reflection) |
| 1.2 | Tool Use — Isolated Experiment with the Anthropic API |
| 1.3 | ReAct Agent from Scratch — Wikipedia + Calculator |
Key Insight: Frameworks abstract away the mechanics — anyone who has manually built the ReAct loop understands what LangGraph does under the hood.
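The mechanics in question can be sketched as a bare ReAct loop in a few dozen lines. Everything below is a hypothetical sketch: `fake_llm` stands in for a real model call, and only a calculator tool is wired up.

```python
import re

# Tool registry: the agent can only act on the world through these functions.
TOOLS = {
    "calculator": lambda expr: str(eval(expr)),  # toy only; never eval untrusted input
}

def fake_llm(transcript: str) -> str:
    """Stand-in for a real model call. A real LLM reads the transcript
    and decides whether to act (Action) or answer (Final Answer)."""
    if "Observation:" not in transcript:
        return "Thought: I need to compute this.\nAction: calculator[17 * 23]"
    return "Thought: I have the result.\nFinal Answer: 391"

def react_loop(question: str, max_steps: int = 5) -> str:
    transcript = f"Question: {question}"
    for _ in range(max_steps):
        reply = fake_llm(transcript)
        transcript += "\n" + reply
        if "Final Answer:" in reply:
            return reply.split("Final Answer:")[-1].strip()
        # Parse "Action: tool[input]", execute the tool, feed back an Observation.
        match = re.search(r"Action: (\w+)\[(.+?)\]", reply)
        if match:
            tool, arg = match.groups()
            transcript += f"\nObservation: {TOOLS[tool](arg)}"
    return "Gave up after max_steps."

print(react_loop("What is 17 * 23?"))  # -> 391
```

The think → act → observe cycle above is exactly what frameworks hide; Block 1.3 builds this loop with real Wikipedia and calculator tools.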
Core Question: The manual agent from Phase 1 does not scale. How do frameworks solve state management, checkpointing, and tool standardization?
| Block | Topic |
|---|---|
| 2.1 | LangGraph Basics — StateGraph, Checkpointing, Fixed Graph (Pattern A) |
| 2.2 | MCP — Model Context Protocol, Custom MCP Server with FastMCP |
| 2.3 | Multi-Tool Agent — LangGraph + MCP + Web Search (Pattern B: Single Agent) |
Key Insight: LangGraph solves state and cycles, MCP solves tool standardization — the combination is the current state of the art.
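To make the insight concrete: a state graph with cycles plus a checkpointer is, at its core, a node-dispatch loop that persists state after every step. The sketch below is plain Python, not the LangGraph API; it only illustrates the mechanics that `StateGraph` and `MemorySaver` wrap.

```python
from copy import deepcopy

# A "graph" maps node names to (function, router) pairs. The router
# inspects the state and names the next node — this is what allows cycles.
def agent_node(state):
    state["steps"] += 1
    if state["steps"] >= 3:
        state["done"] = True
    return state

def route(state):
    return "END" if state.get("done") else "agent"  # loop back until done

GRAPH = {"agent": (agent_node, route)}

def run(state, entry="agent", checkpoints=None):
    """Dispatch loop: run a node, checkpoint the state, follow the edge."""
    node = entry
    while node != "END":
        fn, router = GRAPH[node]
        state = fn(state)
        if checkpoints is not None:
            checkpoints.append(deepcopy(state))  # resumable history, MemorySaver-style
        node = router(state)
    return state

saved = []
final = run({"steps": 0}, checkpoints=saved)
print(final["steps"], len(saved))  # -> 3 3
```

Because every intermediate state is checkpointed, a crashed or interrupted run can resume from the last saved state instead of starting over — the property that makes cyclic agent graphs practical.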
Core Question: When do multiple specialized agents pay off — and when is one enough?
| Block | Topic |
|---|---|
| 3.1 | Supervisor Pattern (Pattern C) — SRE Incident Agent with Specialized Sub-Agents |
| 3.2 | Handoff and Evaluation — Supervisor vs. Handoff vs. Single Agent with Real Numbers |
Key Insight: Multi-agent is not automatically better. Evaluation with task success, cost, and steps shows when the added complexity is worth it.
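A minimal evaluation harness for that comparison needs only three metrics per run. The numbers below are illustrative placeholders, not measured results; the structure is what Block 3.2 fills with real data.

```python
from dataclasses import dataclass

@dataclass
class RunResult:
    success: bool      # did the agent solve the task?
    cost_usd: float    # total token cost of the run
    steps: int         # LLM + tool calls until termination

def summarize(name: str, results: list[RunResult]) -> dict:
    """Aggregate one architecture's runs into the three headline metrics."""
    n = len(results)
    return {
        "architecture": name,
        "task_success": sum(r.success for r in results) / n,
        "avg_cost_usd": round(sum(r.cost_usd for r in results) / n, 4),
        "avg_steps": sum(r.steps for r in results) / n,
    }

# Illustrative numbers only — real values come from running your own test set.
single = [RunResult(True, 0.02, 4), RunResult(False, 0.03, 6), RunResult(True, 0.02, 5)]
supervisor = [RunResult(True, 0.07, 9), RunResult(True, 0.08, 11), RunResult(True, 0.06, 8)]

for name, results in [("single", single), ("supervisor", supervisor)]:
    print(summarize(name, results))
```

Reading success, cost, and steps side by side is what turns "multi-agent feels better" into a defensible engineering decision.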
Core Question: "Works on my laptop" is not a production standard. What is missing to truly bring an agent to production?
| Block | Topic |
|---|---|
| 4.1 | Observability — LangSmith Tracing, Token Cost, Step Distribution |
| 4.2 | Evaluation — Test Sets, LLM-as-Judge, CI/CD Integration |
| 4.3 | Guardrails — Budget Limiter, Prompt Injection Detection, Circuit Breaker |
| 4.4 | Deployment — FastAPI REST Service with Docker Compose |
Key Insight: The gap between notebook and production is even larger for agents than for ML models — but many MLOps patterns transfer directly.
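Two of the guardrails from Block 4.3 are small enough to sketch directly. This is a hypothetical, framework-free version: a per-session budget limiter and a consecutive-failure circuit breaker, both raised as exceptions the agent runner must catch.

```python
class BudgetExceeded(Exception): pass
class CircuitOpen(Exception): pass

class Guardrails:
    """Sketch of two production guardrails: stop an agent that burns
    too much money, and stop one that keeps failing in a loop."""
    def __init__(self, budget_usd: float = 1.0, max_failures: int = 3):
        self.budget_usd = budget_usd
        self.spent = 0.0
        self.max_failures = max_failures
        self.failures = 0

    def charge(self, cost_usd: float) -> None:
        """Call after every LLM request with its token cost."""
        self.spent += cost_usd
        if self.spent > self.budget_usd:
            raise BudgetExceeded(f"spent {self.spent:.2f} of {self.budget_usd:.2f} USD")

    def record(self, ok: bool) -> None:
        """Call after every tool call; trips after N consecutive failures."""
        self.failures = 0 if ok else self.failures + 1
        if self.failures >= self.max_failures:
            raise CircuitOpen(f"{self.failures} consecutive failures")

g = Guardrails(budget_usd=0.05)
g.charge(0.02); g.charge(0.02)   # within budget
try:
    g.charge(0.02)               # 0.06 > 0.05: the run is stopped
except BudgetExceeded as e:
    print("stopped:", e)
```

The point of raising exceptions rather than logging warnings is that an autonomous loop has no human watching it; the only safe default is to halt.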
Core Question: How do you apply what you have learned to a real MLOps scenario — and run the agent fully autonomously in the cluster?
| Block | Topic |
|---|---|
| 5.1 | MCP Server — FastMCP Server for MLflow (Tracking + Model Registry) |
| 5.2 | ReAct Agent — LangGraph Agent with ToolNode and MemorySaver |
| 5.3 | Kubernetes Deployment — FastAPI in the Cluster, Persistent Sessions |
| 5.4 | Tracing — Self-hosted Langfuse, Callback Pattern, ArgoCD (GitOps) |
| 5.5 | Self-Hosted LLM — vLLM with Qwen 2.5 7B, Fully Cluster-Internal |
Key Insight: Switching from Claude to a self-hosted LLM requires 5 lines of code changes. The real work lies in the GPU infrastructure.
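The switch is small because vLLM exposes an OpenAI-compatible API: what changes is essentially the model name and the base URL. The sketch below isolates those settings in a factory; the in-cluster URL and the model identifiers are placeholder examples, not values from this repo.

```python
def llm_settings(provider: str) -> dict:
    """Return the handful of settings that differ between providers.
    The cluster-internal URL is a placeholder for your own Service DNS name."""
    if provider == "anthropic":
        # Hosted API: the SDK knows its own endpoint, only the model varies.
        return {"model": "claude-sonnet-4-20250514", "base_url": None}
    if provider == "vllm":
        # vLLM serves an OpenAI-compatible API, so any OpenAI-style client
        # works once base_url points at the in-cluster Service.
        return {
            "model": "Qwen/Qwen2.5-7B-Instruct",
            "base_url": "http://vllm.llm.svc.cluster.local:8000/v1",
        }
    raise ValueError(f"unknown provider: {provider}")

print(llm_settings("vllm")["base_url"])
```

Everything else — the graph, the tools, the tracing — stays untouched, which is why the remaining effort is GPU scheduling, model download, and serving infrastructure rather than application code.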
agentic-ai/
├── phase-1-fundamentals/
│ ├── block-1-agent-anatomy/
│ ├── block-2-tool-use/
│ └── block-3-react-agent/
│
├── phase-2-frameworks/
│ ├── block-1-langgraph-basics/
│ ├── block-2-mcp-server/
│ └── block-3-multi-tool-agent/
│
├── phase-3-multi-agent/
│ ├── block-1-supervisor-agent/
│ └── block-2-handoff-evaluation/
│
├── phase-4-production/
│ ├── block-1-observability/
│ ├── block-2-evaluation/
│ ├── block-3-guardrails/
│ └── block-4-deployment/
│
└── phase-5-agentic-mlops/
├── block-1-mlflow-mcp-server/
├── block-2-mlflow-agent/
├── block-3-deployment/
├── block-4-tracing/
└── block-5-self-hosted-llm/
Each block contains its own README with setup instructions, code explanation, and lessons learned.
| Area | Tools |
|---|---|
| LLM | Anthropic Claude, Qwen 2.5 7B (vLLM) |
| Orchestration | LangGraph (StateGraph, ToolNode, Checkpointing) |
| Tool Integration | MCP (Model Context Protocol), FastMCP |
| Tracing | LangSmith, Langfuse (self-hosted) |
| Evaluation | LangSmith Eval, LLM-as-Judge |
| ML Platform | MLflow (Tracking + Model Registry) |
| Deployment | Docker, Kubernetes, ArgoCD (GitOps) |
| API | FastAPI |
- Anthropic: Building Effective Agents
- Lilian Weng: LLM Powered Autonomous Agents
- LangGraph Documentation
- MCP Specification
- vLLM Documentation
- Langfuse
MIT