A curated roadmap and resource collection for leveling up from AI Engineer to AI Systems Architect — with 13 production-ready portfolio projects to build along the way.
The AI engineering landscape is evolving fast. New frameworks, models, and patterns emerge weekly. This repo cuts through the noise with:
- A clear career progression from junior AI engineer to systems architect
- 13 hands-on projects covering LLMs, agents, RAG, multi-agent systems, and protocols
- Curated resources — only the best, most relevant material for 2025-2026
- Enterprise focus — production patterns, not just tutorials
┌─────────────────────────────────────────────────────────────────────┐
│ AI ENGINEER → SYSTEMS ARCHITECT │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ LEVEL 1: AI Engineer │
│ ├── LLM fundamentals (tokenization, generation, fine-tuning) │
│ ├── Prompt engineering & evaluation │
│ ├── RAG pipelines & vector databases │
│ ├── API design for AI services │
│ └── Projects: #1 #2 #5 #7 │
│ │
│ LEVEL 2: Senior AI Engineer │
│ ├── Agent architectures (ReAct, Plan-and-Execute) │
│ ├── Multi-agent orchestration patterns │
│ ├── Agentic frameworks (LangGraph, CrewAI, ADK) │
│ ├── Streaming & real-time AI systems │
│ └── Projects: #3 #4 #6 #8 │
│ │
│ LEVEL 3: AI Platform Engineer │
│ ├── MLOps & model serving (vLLM, TensorRT, Triton) │
│ ├── Kubernetes for AI workloads │
│ ├── Observability & evaluation frameworks │
│ ├── Protocol design (MCP, A2A, UCP) │
│ └── Projects: #8 #9 #10 │
│ │
│ LEVEL 4: AI Systems Architect │
│ ├── Enterprise agent framework patterns │
│ ├── Multi-agent system design at scale │
│ ├── Compliance, security & governance for AI │
│ ├── Cross-framework interoperability │
│ └── Projects: #11 #12 #13 │
│ │
└─────────────────────────────────────────────────────────────────────┘
13 production-ready projects, each in its own repo with Docker setup, tests, CI/CD, and Kubernetes manifests. Clone any project and run it in under 2 minutes.
# Clone any project
git clone https://github.com/samuelvinay91/<project-name>.git
cd <project-name>
# Option 1: Docker (recommended)
docker compose up --build
# Option 2: Local
pip install -e ".[dev]"
uvicorn src.<package>.main:app --reload --port <port>
# Run tests
pytest tests/ -v

| Resource | Description | Level |
|---|---|---|
| Neural Networks: Zero to Hero | Andrej Karpathy's legendary series — build GPT from scratch | Beginner → Intermediate |
| fast.ai Practical Deep Learning | Top-down practical approach to deep learning | Beginner |
| Stanford CS224N: NLP with Deep Learning | Comprehensive NLP & transformer theory | Intermediate |
| Stanford CS229: Machine Learning | Andrew Ng's foundational ML course | Beginner → Intermediate |
| DeepLearning.AI Short Courses | Bite-sized courses on LangChain, RAG, fine-tuning, agents | All levels |
| Hugging Face NLP Course | Hands-on transformers & NLP with the HF ecosystem | Beginner → Intermediate |
| Full Stack LLM Bootcamp | Production LLM application development | Intermediate |
| LLM University by Cohere | Comprehensive LLM concepts and applications | Beginner → Intermediate |
| Resource | Description |
|---|---|
| Anthropic Prompt Engineering Guide | Official Claude prompt engineering best practices |
| OpenAI Prompt Engineering Guide | Official GPT prompt engineering techniques |
| Prompt Engineering Guide | Community-driven comprehensive guide to prompting techniques |
| DAIR.AI Prompt Engineering | Research-backed prompting patterns and techniques |
| Framework | Best For | Maintainer | Link |
|---|---|---|---|
| LangGraph | Complex stateful agent workflows with cycles | LangChain | Docs |
| CrewAI | Role-based multi-agent collaboration | CrewAI | Docs |
| Google ADK | Agent Development Kit with sequential/parallel patterns | Google | Docs |
| Microsoft Agent Framework | Enterprise agents with middleware pipelines | Microsoft | GitHub |
| OpenAI Agents SDK | Simple agent loops with handoffs | OpenAI | Docs |
| LlamaIndex | Data-centric RAG and agent workflows | LlamaIndex | Docs |
| AutoGen | Multi-agent conversation patterns | Microsoft | Docs |
| Semantic Kernel | Enterprise AI orchestration for .NET/Python/Java | Microsoft | Docs |
| Protocol | Purpose | Link |
|---|---|---|
| MCP (Model Context Protocol) | Standardized tool/resource access for LLMs | Spec |
| A2A (Agent-to-Agent) | Inter-agent communication protocol by Google | Spec |
| UCP (Universal Commerce Protocol) | Agentic commerce transactions | Spec |
| OpenAI Function Calling | Structured tool use for LLMs | Docs |
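
The protocols above share one core idea: a tool is described by a name plus a JSON Schema for its arguments, and the host validates and routes each call the model emits. The sketch below illustrates that idea only. The field names (`name`, `arguments`, `handler`), the `get_order_status` tool, and the `dispatch` helper are hypothetical and are not copied from the MCP, A2A, UCP, or OpenAI specs, so check each spec for the exact wire format.

```python
import json
from typing import Any


def get_order_status(order_id: str) -> dict[str, str]:
    # Hypothetical tool; a real one would query an order service.
    return {"order_id": order_id, "status": "shipped"}


# Illustrative registry: each tool carries a JSON Schema for its arguments.
TOOLS: dict[str, dict[str, Any]] = {
    "get_order_status": {
        "description": "Look up the status of an order.",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
        "handler": get_order_status,
    }
}


def dispatch(tool_call_json: str) -> str:
    """Route one model-emitted tool call and return a JSON string result."""
    call = json.loads(tool_call_json)
    tool = TOOLS.get(call["name"])
    if tool is None:
        return json.dumps({"error": f"unknown tool: {call['name']}"})
    args = call.get("arguments", {})
    # Minimal check of required fields; a real host would validate the full schema.
    missing = [p for p in tool["parameters"]["required"] if p not in args]
    if missing:
        return json.dumps({"error": f"missing arguments: {missing}"})
    return json.dumps(tool["handler"](**args))


if __name__ == "__main__":
    print(dispatch('{"name": "get_order_status", "arguments": {"order_id": "A1"}}'))
```

Returning errors as data instead of raising lets the model see the failure and retry with corrected arguments, which is a common design choice in tool-using agents.
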
| Tool | Purpose | Link |
|---|---|---|
| LangSmith | LLM observability, tracing, and evaluation | Docs |
| Weights & Biases Weave | AI application tracing and evaluation | Docs |
| Arize Phoenix | Open-source LLM observability | GitHub |
| Braintrust | Eval, logging, and prompt playground | Docs |
| Tool | Purpose | Link |
|---|---|---|
| vLLM | High-throughput LLM serving with PagedAttention | GitHub |
| TensorRT-LLM | NVIDIA optimized LLM inference | GitHub |
| Triton Inference Server | Multi-framework model serving at scale | GitHub |
| Ollama | Run LLMs locally with one command | GitHub |
| llama.cpp | CPU/GPU inference for LLMs in C++ | GitHub |
| SGLang | Fast serving framework for LLMs and VLMs | GitHub |
| Tool | Purpose | Link |
|---|---|---|
| MLflow | Experiment tracking, model registry, deployment | Docs |
| Weights & Biases | Experiment tracking, dataset versioning | Docs |
| Ray | Distributed compute for ML training and serving | Docs |
| DVC | Data version control and ML pipelines | Docs |
| Kubeflow | ML workflows on Kubernetes | Docs |
| Database | Strengths | Link |
|---|---|---|
| Qdrant | Rust-based, fast, rich filtering | Docs |
| Pinecone | Fully managed, serverless option | Docs |
| Weaviate | Multi-modal, GraphQL API | Docs |
| ChromaDB | Simple, lightweight, great for prototyping | Docs |
| Milvus | Scalable, GPU-accelerated similarity search | Docs |
| pgvector | PostgreSQL extension — use your existing DB | GitHub |
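
Every database in this table answers the same basic query: given a query embedding, return the stored items whose embeddings are most similar. A minimal brute-force sketch of that query in plain Python is shown below. The document IDs and 3-dimensional vectors are made up for illustration; production systems use embeddings from a model and approximate nearest-neighbor indexes rather than a full scan.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query: list[float], docs: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the IDs of the k documents most similar to the query (full scan)."""
    ranked = sorted(
        docs,
        key=lambda doc_id: cosine_similarity(query, docs[doc_id]),
        reverse=True,
    )
    return ranked[:k]


# Toy "embeddings"; the IDs and values are hypothetical.
DOCS = {
    "refund-policy": [0.9, 0.1, 0.0],
    "shipping-times": [0.1, 0.9, 0.0],
    "gpu-sizing": [0.0, 0.1, 0.9],
}

if __name__ == "__main__":
    print(top_k([0.8, 0.2, 0.0], DOCS))  # ['refund-policy', 'shipping-times']
```

The full scan costs time proportional to the number of stored vectors per query, which is why dedicated vector databases trade a little recall for indexed search.
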
| Tool | Purpose | Link |
|---|---|---|
| Hugging Face TRL | RLHF, DPO, SFT training | Docs |
| Unsloth | 2-5x faster fine-tuning with 80% less memory | GitHub |
| Axolotl | Streamlined fine-tuning with YAML configs | GitHub |
| PEFT | Parameter-efficient fine-tuning (LoRA, QLoRA) | Docs |
| LitGPT | Pretrain, fine-tune, deploy LLMs | GitHub |
| Pattern | When to Use | Project Example |
|---|---|---|
| RAG (Retrieval-Augmented Generation) | Ground LLM responses in external data | Agent RAG (#7) |
| Agentic RAG | Dynamic retrieval with query routing | Agent RAG (#7) |
| ReAct (Reason + Act) | Tool-using agents with reasoning traces | Ask-the-Web (#3) |
| Supervisor Pattern | Central agent delegates to specialists | Capstone (#6) |
| Sequential Pipeline | Ordered multi-step processing | Incident Response (#12) |
| Parallel Fan-Out | Independent tasks run concurrently | Incident Response (#12) |
| Human-in-the-Loop | Approval gates for high-stakes decisions | Compliance Audit (#11) |
| Event-Driven Flow | Conditional routing based on intermediate results | Contract Lifecycle (#13) |
| Middleware Pipeline | Cross-cutting concerns (auth, logging, PII redaction) | Compliance Audit (#11) |
| State Machine | Complex multi-step transactions | UCP Merchant (#9) |
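
Two of the patterns in this table, Sequential Pipeline and Parallel Fan-Out, can be sketched together with `asyncio`. The step functions (`check_logs`, `check_metrics`, `summarize`) and the incident string are hypothetical stand-ins and are not taken from the Incident Response project; a real agent step would call an LLM or a tool where the sketch sleeps.

```python
import asyncio


async def check_logs(incident: str) -> str:
    # Hypothetical specialist step; the sleep stands in for an LLM or tool call.
    await asyncio.sleep(0.01)
    return f"logs reviewed for {incident}"


async def check_metrics(incident: str) -> str:
    await asyncio.sleep(0.01)
    return f"metrics reviewed for {incident}"


async def fan_out(incident: str) -> list[str]:
    # Parallel fan-out: independent checks run concurrently.
    # asyncio.gather returns results in the order the awaitables were passed.
    return list(await asyncio.gather(check_logs(incident), check_metrics(incident)))


async def summarize(findings: list[str]) -> str:
    return "; ".join(findings)


async def handle_incident(incident: str) -> str:
    # Sequential pipeline: normalize -> fan-out -> summarize, each step feeding the next.
    normalized = incident.strip().lower()
    findings = await fan_out(normalized)
    return await summarize(findings)


if __name__ == "__main__":
    print(asyncio.run(handle_incident("  DB-Outage ")))
    # logs reviewed for db-outage; metrics reviewed for db-outage
```

Fan-out only pays off when the branches do not depend on each other's output; steps that need an earlier result belong in the sequential part of the pipeline.
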
| Resource | Description | Link |
|---|---|---|
| Chip Huyen — AI Engineering | Definitive guide to building AI applications | Book |
| Designing Machine Learning Systems | ML system design end-to-end (Chip Huyen) | Book |
| Eugene Yan — Applied ML | Practical patterns for production ML | Blog |
| The AI Engineer's Handbook | Architecture patterns for LLM applications | Newsletter |
| System Design for ML | Interview prep meets real-world ML design | GitHub |
| Tool | Purpose | Link |
|---|---|---|
| RAGAS | RAG evaluation framework | Docs |
| DeepEval | LLM evaluation with 14+ metrics | GitHub |
| Promptfoo | LLM output testing and red-teaming | Docs |
| Inspect AI | AI safety evaluations by AISI | GitHub |
| Giskard | ML model testing and vulnerability scanning | GitHub |
| Provider | Key Services | Link |
|---|---|---|
| AWS | Bedrock (managed LLMs), SageMaker (training/serving), Inferentia | Docs |
| Google Cloud | Vertex AI, Gemini API, Cloud TPUs, GKE for ML | Docs |
| Azure | Azure OpenAI Service, Azure ML, AKS for AI | Docs |
| Resource | Description | Link |
|---|---|---|
| Kubeflow | End-to-end ML platform on Kubernetes | Docs |
| KServe | Standardized ML model serving on K8s | Docs |
| NVIDIA GPU Operator | Automated GPU management in K8s | Docs |
| Ray on Kubernetes | Distributed ML compute on K8s with KubeRay | Docs |
| Kustomize | Template-free K8s configuration management | Docs |
| Certification | Provider | Focus |
|---|---|---|
| AWS Machine Learning Specialty | AWS | ML on AWS — SageMaker, data engineering, modeling |
| Google Professional ML Engineer | GCP | ML systems design on Google Cloud |
| Azure AI Engineer Associate | Azure | AI solution design on Azure |
| Terraform Associate | HashiCorp | Infrastructure as Code for AI infrastructure |
| CKA (Certified Kubernetes Administrator) | CNCF | Kubernetes administration for AI workloads |
| Book | Author | Why It Matters |
|---|---|---|
| AI Engineering | Chip Huyen | The book for building LLM applications in production (2025) |
| Designing Machine Learning Systems | Chip Huyen | End-to-end ML system design — the architect's bible |
| Build a Large Language Model (From Scratch) | Sebastian Raschka | Deep understanding of transformer internals |
| Hands-On Large Language Models | Jay Alammar, Maarten Grootendorst | Practical LLM patterns with code |
| Natural Language Processing with Transformers | Lewis Tunstall et al. | HuggingFace-centric NLP from the team that built it |
| Book | Author | Why It Matters |
|---|---|---|
| Generative Deep Learning (2nd Ed) | David Foster | Comprehensive generative AI — VAEs, GANs, diffusion, transformers |
| Deep Learning | Ian Goodfellow et al. | The foundational theory reference |
| Designing Data-Intensive Applications | Martin Kleppmann | System design fundamentals — essential for AI architects |
| Building Microservices (2nd Ed) | Sam Newman | Service architecture patterns used in AI platforms |
| The Staff Engineer's Path | Tanya Reilly | Leadership and influence for senior technical roles |
| Blog | Author | Focus |
|---|---|---|
| Lil'Log | Lilian Weng (formerly OpenAI) | Deep technical surveys on AI topics |
| Chip Huyen's Blog | Chip Huyen | ML systems, AI engineering, industry trends |
| Eugene Yan | Eugene Yan (Amazon) | Applied ML, RecSys, production patterns |
| Jay Alammar | Jay Alammar | Visual explanations of transformers and LLMs |
| Sebastian Raschka | Sebastian Raschka | LLM research, fine-tuning, practical AI |
| Simon Willison | Simon Willison | LLM tools, prompt engineering, practical AI |
| Hamel Husain | Hamel Husain | MLOps, LLM evaluation, practical engineering |
| Newsletter | Description | Link |
|---|---|---|
| The Batch | Andrew Ng's weekly AI news digest | Subscribe |
| Ahead of AI | Sebastian Raschka's research roundup | Subscribe |
| The AI Engineer | Swyx's newsletter on AI engineering | Subscribe |
| Interconnects | Nathan Lambert on RLHF, alignment, and LLMs | Subscribe |
| AI Tidbits | Sahar Mor's weekly AI news | Subscribe |
| Podcast | Description | Link |
|---|---|---|
| Latent Space | The AI engineer podcast — deep technical interviews | Listen |
| Practical AI | Real-world AI/ML applications and tools | Listen |
| TWIML AI | This Week in Machine Learning & AI | Listen |
| Gradient Dissent | Weights & Biases podcast on ML engineering | Listen |
| Lex Fridman Podcast | Long-form interviews with AI researchers | Listen |
| Model | Developer | Strengths | Link |
|---|---|---|---|
| Llama 3/4 | Meta | Best open-weight general-purpose LLMs | HuggingFace |
| Mistral / Mixtral | Mistral AI | Excellent efficiency, MoE architecture | HuggingFace |
| Gemma 2/3 | Google | Strong small models (2B-27B) | HuggingFace |
| Qwen 2.5/3 | Alibaba | Competitive multilingual models | HuggingFace |
| DeepSeek V3/R1 | DeepSeek | Strong reasoning, open-weight | HuggingFace |
| Phi-3/4 | Microsoft | Best-in-class small language models | HuggingFace |
| FLUX | Black Forest Labs | State-of-the-art open image generation | HuggingFace |
| Resource | Purpose | Link |
|---|---|---|
| Hugging Face Hub | Largest open model & dataset repository | Hub |
| MMLU / MMLU-Pro | Massive multitask language understanding benchmark | Paper |
| HumanEval / SWE-bench | Code generation benchmarks | GitHub |
| LMSYS Chatbot Arena | Crowdsourced LLM comparison leaderboard | Leaderboard |
| Open LLM Leaderboard | HuggingFace's open model rankings | Leaderboard |
| Resource | Description | Link |
|---|---|---|
| OWASP Top 10 for LLMs | Security risks in LLM applications | Docs |
| NIST AI Risk Management Framework | Government framework for AI risk management | Docs |
| Anthropic Research | AI safety research and responsible scaling | Blog |
| EU AI Act | European regulation for AI systems | Overview |
| AI Alignment Forum | Technical AI safety research discussion | Forum |
| Community | Platform | Link |
|---|---|---|
| Hugging Face | Discord + Forums | Join |
| LangChain | Discord | Join |
| MLOps Community | Slack | Join |
| r/MachineLearning | Reddit | Visit |
| r/LocalLLaMA | Reddit | Visit |
| Latent Space | Discord | Join |
| Conference | Focus | Link |
|---|---|---|
| NeurIPS | Top ML research conference | Site |
| ICML | International Conference on ML | Site |
| AI Engineer Summit | Applied AI engineering | Site |
| KubeCon | Cloud-native + AI infrastructure | Site |
| Tool | Purpose | Link |
|---|---|---|
| Claude Code | AI coding assistant in the terminal | Docs |
| Cursor | AI-first code editor | Site |
| Continue | Open-source AI code assistant | GitHub |
| uv | Fast Python package manager (10-100x faster than pip) | GitHub |
| Ruff | Extremely fast Python linter and formatter | GitHub |
| Docker | Containerization for reproducible AI environments | Docs |
For those who want a structured approach:
| Week | Focus | Projects | Resources |
|---|---|---|---|
| 1-2 | LLM Fundamentals | #1 LLM Playground | Karpathy's Zero to Hero, HF NLP Course |
| 3-4 | Prompt Engineering & Fine-Tuning | #2 Customer Support Chatbot | Anthropic/OpenAI prompt guides, PEFT docs |
| 5-6 | Agents & Tool Use | #3 Ask-the-Web, #4 Deep Research | LangGraph docs, DeepLearning.AI courses |
| 7-8 | RAG & Multi-Agent Systems | #6 Capstone, #7 Agent RAG | LlamaIndex docs, RAG survey papers |
| 9 | Protocols & Interop | #8 MCP & A2A | MCP spec, A2A spec |
| 10 | Commerce & Real-World AI | #9 UCP Merchant, #10 Shopping Agent | UCP spec, state machine design |
| 11 | Enterprise Frameworks | #11 Compliance, #12 Incident Response | MS Agent Framework, Google ADK docs |
| 12 | Advanced Patterns & Portfolio | #13 Contract Lifecycle, polish portfolio | CrewAI docs, system design resources |
Contributions welcome! This list is community-maintained.
1. Fork this repository
2. Add your resource in the appropriate category
3. Submit a pull request with a clear description
- Resources must be high-quality and actively maintained
- Prefer free/open-source resources, but paid resources are OK if they're exceptional
- Each entry needs a working link and brief description
- Follow the existing table format
- No duplicates — check existing entries first
See CONTRIBUTING.md for detailed guidelines.
If you find this useful, give it a star! It helps others discover these resources.
This work is licensed under CC0 1.0 Universal. To the extent possible under law, the author has waived all copyright and related rights to this work.
Built with 💡 by samuelvinay91 — from the AI Engineer Portfolio project
