Open-source engineering curricula under one CC BY-SA 4.0 license. HLD and DSA today; more to come.
HLD Handbook: 159 chapters + 22 trade-off pages · 181 pages · ~773,000 words · 719 Mermaid diagrams · 3,100+ citations

DSA Handbook: 120 chapters across 15 parts · 37 pattern decision pages · 5 long-form editorials · ~470,000 words · 226 Mermaid diagrams · sibling Python / Java / C++ / Go solutions for 155 LeetCode problems · 46 interactive widget specs
Read free at handbook.academy — landing page links to both books · HLD at hld.handbook.academy · DSA at dsa.handbook.academy (public beta)
- What this is
- Reading the handbooks online
- What's in this repo
- Why this exists
- Who this is for
- How these handbooks compare
- Start reading
- The full HLD curriculum — 12 parts + 22-page Trade-offs Library
- The full DSA curriculum — 15 parts + 37 pattern decision pages + editorials + 46 widgets
- Study plans
- Project statistics
- Quality standards
- Contributing
- Project structure
- Development setup
- Governance
- FAQ
- License
- Citation
- Acknowledgments
- Community
This repository hosts two sibling open-source engineering handbooks under content/, sharing the same CC BY-SA 4.0 license, the same contribution workflow, and the same CI quality bar:
- The HLD Handbook (`content/hld/`) — an opinionated, end-to-end textbook on high-level software design, distributed systems, and modern infrastructure. 159 teaching chapters across 12 parts plus a 22-page Trade-offs Library — 181 pages, ~773,000 words, 719 Mermaid diagrams, 3,100+ citations. Equivalent in scope to a ~2,400-page book — longer than Designing Data-Intensive Applications and Alex Xu's System Design Interview volumes 1+2 combined. Covers TCP/IP and the OS contract through to LLM serving, multi-agent orchestration, and post-quantum crypto.
- The DSA Handbook (`content/dsa/`) — a practice-first data-structures-and-algorithms curriculum aimed at coding interviews and competitive programming. 120 chapters across 15 parts plus 37 pattern decision pages, 5 long-form LC editorials, and 46 interactive widget specs. ~470,000 words, 226 Mermaid diagrams. Each chapter teaches a structure or pattern, then walks the canonical LeetCode problems, with sibling `sol.py` / `sol.java` / `sol.cpp` / `sol.go` files for 155 problems. Python is inlined in the chapter; the other three languages are linked.
More curricula are planned. This repo is structured as an umbrella so additional handbooks (e.g. operating systems, databases, ML systems) can land alongside HLD and DSA over time, sharing the same workflow and license.
Every concept in either handbook is taught inline as a full-length article: introduction, first-principles explanation, diagrams, worked examples, trade-offs, production gotchas, citations to primary sources. No stubs. No "coming soon." No external blog redirects. No bullet outlines that tell you to go somewhere else to actually learn.
Every .md file under content/ also renders natively on GitHub — all 945 Mermaid diagrams display in the GitHub UI, all footnotes work, all cross-references resolve. The websites add search, dark mode, per-chapter diagram zoom, social cards, OG images, the interactive DSA widgets, and fast client-side navigation.
Both handbooks are continuously updated. Every chapter's frontmatter declares date_created and date_updated, so you can see exactly how fresh each page is. We care about numbers being right today, not right in 2022.
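If you want to check freshness from a script rather than by eye, the frontmatter dates can be read with a few lines of Python. This is a minimal sketch, not tooling that ships with the repo. It assumes the frontmatter is a leading block delimited by `---` lines and that both fields hold ISO dates (`YYYY-MM-DD`). Only the field names `date_created` and `date_updated` come from the handbooks; the function names and the sample chapter are hypothetical.

```python
from __future__ import annotations

from datetime import date


def read_frontmatter_dates(markdown_text: str) -> dict[str, date]:
    """Extract date_created / date_updated from a leading '---' frontmatter block."""
    lines = markdown_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no frontmatter at the top of the file
    dates: dict[str, date] = {}
    for line in lines[1:]:
        if line.strip() == "---":  # closing delimiter ends the frontmatter
            break
        key, sep, value = line.partition(":")
        if sep and key.strip() in ("date_created", "date_updated"):
            # Assumption: ISO 8601 calendar dates, optionally wrapped in quotes.
            dates[key.strip()] = date.fromisoformat(value.strip().strip("\"'"))
    return dates


def days_since_update(markdown_text: str, today: date) -> int | None:
    """Return the age of the page in days, or None if date_updated is absent."""
    updated = read_frontmatter_dates(markdown_text).get("date_updated")
    return None if updated is None else (today - updated).days


# Hypothetical chapter used only for illustration.
sample = """---
title: Example chapter
date_created: 2024-11-02
date_updated: 2025-03-15
---
# Body
"""
print(days_since_update(sample, date(2025, 4, 14)))
```

Passing `today` explicitly keeps the check deterministic, which is convenient if you run it in CI.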
Each book has its own subdomain. The umbrella landing page links to both:
| Site | URL | What it serves |
|---|---|---|
| Landing page | handbook.academy | Two cards — pick HLD or DSA. |
| HLD Handbook | hld.handbook.academy | The full HLD curriculum. |
| DSA Handbook | dsa.handbook.academy | The full DSA curriculum, with interactive widgets. |
Each subdomain serves one book at the root — URLs do not nest. HLD chapters live at hld.handbook.academy/curriculum/..., not handbook.academy/hld/.... Same for DSA.
This repository is the canonical content source for both books. PRs to either handbook are welcome.
- `content/hld/` — 159 HLD chapters across 12 parts + 22 trade-off decision pages. The chapter template is intro → first principles → diagrams → worked example → trade-offs → production gotchas → references. See STYLE_GUIDE.md.
- `content/dsa/` — 120 DSA chapters across 15 parts. The chapter template is practice-first: cheat-sheet table, deep dive on the structure or pattern, prompt cards for representative LeetCode problems, and `<details>` solution + common-mistakes blocks. Code samples live as sibling files (`sol.py`, `sol.java`, `sol.cpp`, `sol.go`) under each chapter's directory. See STYLE_GUIDE.md § DSA-specific deviations and `content/dsa/part-1-linear-data-structures/00-arrays.md` as the canonical example.
- `content/dsa/editorials/` — long-form LeetCode editorials covering hard problems with multiple approaches.
- `content/dsa/patterns/` — pattern decision references that compare interchangeable approaches (e.g. recursion vs iteration, BFS vs DFS, sliding window vs prefix sum).
- `content/dsa/widgets/` — 46 YAML specs for the interactive widgets that render on the DSA website. On GitHub these are callouts that link to the spec; on the website the widget renders.
- `content/dsa/_problem-registry.yml` — canonical `LC-NNN` ID registry; the website source-of-truth for every LeetCode problem the curriculum touches.
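For contributors adding a DSA problem, the sibling-file convention above can be checked locally before opening a PR. The sketch below is illustrative only and is not the repo's CI. The four file names come from this README; the `missing_solutions` helper and the `demo-problem` directory are hypothetical, and the exact directory that holds the files in the real tree is an assumption.

```python
import tempfile
from pathlib import Path

# The four sibling solution files named in this README.
EXPECTED_FILES = ("sol.py", "sol.java", "sol.cpp", "sol.go")


def missing_solutions(problem_dir: Path) -> list[str]:
    """Return the expected solution files that are absent from problem_dir."""
    return [name for name in EXPECTED_FILES if not (problem_dir / name).is_file()]


# Demo against a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo-problem"  # hypothetical directory name
    demo.mkdir()
    for name in ("sol.py", "sol.go"):
        (demo / name).write_text("")   # empty placeholder files
    demo_missing = missing_solutions(demo)

print(demo_missing)  # ['sol.java', 'sol.cpp']
```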
The open-source curricula for system design and DSA have a shape, and that shape is frustrating:
- Link dumps — curated lists that point at dozens of scattered blog posts, LeetCode discuss threads, and talks. Great for discovery, useless as a learning path. You end up with 60 tabs open and no through-line.
- Teaser-and-redirect repos — READMEs with a few hundred words on each topic that nudge you toward a paid course hosted elsewhere. The GitHub repo is the marketing page; the actual teaching is behind a $150-$400+/year paywall.
- Monolithic READMEs frozen in time — a single 5,000-line `README.md` that was great in 2021 but hasn't kept up with the modern stack. No LLM serving, no CRDTs, no post-quantum crypto, no FinOps. And nobody wants to review a PR against a 5,000-line file.
- Surface-level outlines — bullet-point summaries that tell you what exists without explaining why you'd pick one approach over another. You learn the vocabulary without the judgment.
- Interview-only prep — focused on passing a specific 45-minute screen, not on actually operating systems at scale or actually understanding the algorithm.
These handbooks fix all of that:
- 100% inline content. Every chapter is a full teaching article written from scratch, living in this repo as plain Markdown. Nothing is a stub. Nothing redirects you elsewhere to learn.
- Progressive curricula. HLD has 159 teaching chapters across 12 parts plus a 22-page Trade-offs Library, sequenced from prerequisites (Part 0) to Staff+ topics. DSA has 120 chapters across 15 parts, from arrays to bitmask DP, with each part introducing a tighter pattern family. Each chapter declares its prerequisites, learning objectives, estimated reading time, and difficulty tier.
- Research-backed. Every HLD chapter ends with a Further Reading & References section citing primary sources — SIGMOD and OSDI papers, RFCs, IETF drafts, engineering postmortems, official docs, canonical books. 3,100+ citations across HLD. DSA chapters cite the original papers behind each algorithm where they exist (Floyd, Tarjan, Knuth-Morris-Pratt, Aho-Corasick).
- Opinionated. Every topic picks a recommended approach and explains why. Where reasonable people disagree, the trade-offs are made explicit — HLD has a dedicated 22-page Trade-offs Library; DSA has 37 pattern decision pages that compare interchangeable approaches (recursion vs iteration, BFS vs DFS, sliding window vs prefix sum, and so on).
- Modern (2025+). HLD covers LLM serving, RAG pipelines, AI agents, multi-agent orchestration, CRDTs, edge computing, FinOps, post-quantum cryptography, platform engineering, local-first software, differential privacy, MCP. DSA covers the modern competitive-programming toolkit: monotonic stacks/deques, Morris traversal, suffix arrays, Aho-Corasick, bitmask DP, randomised algorithms.
- Practice-first for DSA. Every DSA chapter ships with sibling code samples in Python, Java, C++, and Go for the canonical LeetCode problems it teaches. Python is inlined in the chapter; the other three languages are one click away. 155 LeetCode problems are covered in this format.
- Interactive on the website. The DSA book's 46 widget specs render as live, animated visualisations on `dsa.handbook.academy` (sliding windows that you can drag, monotonic stacks that animate the pop sequence, etc.). On GitHub the widget callouts link to the YAML spec.
These handbooks are written for:
- SDE1 engineers preparing for SDE2 interviews. Read DSA Parts 0-9 for coding-round fluency, then HLD Part 0 + Part 1 + Part 11, plus 8-10 targeted case studies from HLD Part 8, plus the top-5 most-cited trade-off pages.
- SDE2s preparing for Senior/Staff. The full HLD Parts 3-7 are the core "you need to know this to design anything meaningful" material. Parts 6 (Reliability) and 7 (Security) separate Senior candidates from Staff candidates. DSA Part 14 covers narration and trade-off articulation in the algorithmic round.
- Senior/Staff engineers refreshing or filling gaps. HLD Part 9 (AI/ML systems) and the Trade-offs Library are valuable even if you've been doing this for 15 years. Modern LLM serving and vector search weren't a thing when most of us learned backend.
- Self-taught engineers without a CS degree who want the vocabulary and the reasoning that bootcamps and YouTube don't teach. HLD Part 0 is explicit prerequisites: TCP/IP, the OS contract, database internals, API design. DSA Part 0 is the foundations — Big-O, recursion, bit manipulation — before pattern-matching becomes useful.
- Career switchers moving from adjacent roles (frontend → backend, backend → infra, SWE → MLE) who need to build system-level intuition fast.
- Competitive programmers and ICPC/Codeforces contestants who want a structured reference for the algorithmic toolkit (KMP, Aho-Corasick, suffix arrays, segment trees, Dijkstra, Bellman-Ford, DP variants).
- Teachers and course creators who want to build curriculum without writing thousands of pages from scratch. The CC BY-SA 4.0 license explicitly allows this.
- Anyone operating a production system at non-trivial scale who wants a reference shelf that covers the full stack — from packet-level networking to multi-tenant SaaS isolation models.
These handbooks are NOT for:
- Absolute beginners with no programming experience. HLD Part 0 assumes you can read code and have built at least one CRUD app. DSA Part 0 assumes you can write a `for` loop and a recursive function. If you're brand new, start with The Odin Project or CS50, then come back.
- People who want low-level language-specific tutorials. We don't teach Rust syntax or Go's goroutines — we teach concepts that apply across languages.
| Feature | This repo (HLD + DSA) | Typical OSS system-design repos | Typical OSS DSA repos | Paid courses ($150-$400+/yr) | O'Reilly-style books |
|---|---|---|---|---|---|
| Free and open-source | Yes | Yes | Yes | No | No |
| 100% inline content (no external redirects) | Yes | No | No | Yes | Yes |
| Both HLD + DSA in one curriculum | Yes | No | No | Rarely | No |
| 150+ HLD chapters, 100+ DSA chapters | Yes | No | No | Only their sliver | Usually one topic |
| 56 end-to-end HLD case studies | Yes | 5-15 typical | — | 15-25 typical | Varies |
| Sibling code in Python / Java / C++ / Go | Yes | — | Rarely all 4 | Sometimes | Rarely |
| Interactive widgets on the DSA site | Yes | — | No | Sometimes | No |
| Dedicated Trade-offs Library (22 pages) | Yes | No | — | No | No |
| 37 DSA pattern decision pages | Yes | — | No | No | No |
| Covers LLMs, RAG, agents, multimodal AI | Yes | Rarely modern | — | Sometimes | Rarely |
| Opinionated decisions, not just options | Yes | No | No | Sometimes | Varies |
| 3,100+ primary-source citations | Yes | No | — | No | Yes |
| Community-editable, PRs welcome | Yes | Yes | Yes | No | No |
| Updated continuously | Yes | Often frozen | Often frozen | Varies | Every 3-5 years |
For the best reading experience, visit hld.handbook.academy for HLD or dsa.handbook.academy for DSA — free, no sign-up, full search, dark mode, per-chapter diagram zoom, and the live DSA widgets. The links below open the source on GitHub, where Mermaid diagrams also render natively.
Pick one:
I want a quick HLD taste. Read these three in order — they give you the vocabulary and the first real worked example:
- Scalability: Growing a System Without Breaking It
- Back-of-the-Envelope Estimation
- Design a URL Shortener (TinyURL / bit.ly)
I want a quick DSA taste. Read these three to see the chapter shape and the sibling-code workflow:
- Arrays: static, dynamic, multi-dimensional
- Two pointers: opposite ends
- Sliding window: variable size
I'm preparing for an SDE2 interview in the next 6 weeks. Follow the SDE1 → SDE2 study plan.
I'm a Senior engineer refreshing my distributed-systems foundations. Read HLD Part 3 — Distributed Systems Theory straight through.
I want to learn AI-systems design. Read HLD Part 9 — AI & ML System Design + the 8 AI case studies in Part 8 (chapters 30-37).
I want to grind interview algorithms. Open the DSA curriculum. Read Parts 0-2 in order, then jump into the pattern parts (3-9) as the problems you encounter call for them.
I want to read both books cover-to-cover. Start at HLD Part 0 for the systems track and DSA Part 0 for the algorithms track. At one 25-minute chapter per day, the whole repo is roughly a year of reading.
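The "roughly a year" figure can be sanity-checked with simple arithmetic. The sketch below uses only the page counts quoted in this README. The uniform 25 minutes per page and the average month length are simplifying assumptions, and `reading_plan` is a hypothetical helper.

```python
# Page counts quoted in this README.
HLD_PAGES = 159 + 22          # teaching chapters + Trade-offs Library pages
DSA_CHAPTERS = 120
DSA_EXTRAS = 37 + 5           # pattern decision pages + long-form editorials

# Assumption: every page takes the same 25 minutes and you read one per day.
MINUTES_PER_PAGE = 25
DAYS_PER_MONTH = 30.4         # average month length, an approximation


def reading_plan(pages: int) -> tuple[float, float]:
    """Return (months at one page per day, total hours of reading)."""
    return pages / DAYS_PER_MONTH, pages * MINUTES_PER_PAGE / 60


core = HLD_PAGES + DSA_CHAPTERS
everything = core + DSA_EXTRAS
for label, pages in [("chapters only", core),
                     ("with pattern pages and editorials", everything)]:
    months, hours = reading_plan(pages)
    print(f"{label}: {pages} pages, ~{months:.1f} months, ~{hours:.0f} hours")
```

Under these assumptions the chapters alone take about ten months, and adding the pattern pages and editorials brings the total to a little over eleven, which is consistent with the estimate above.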
159 chapters across 12 parts + a 22-page Trade-offs Library, 181 pages in total. Each part below is collapsed by default — click to expand the chapter list. Linking to a specific part from elsewhere in the README auto-expands it on GitHub.
| # | Part | Chapters | Difficulty | Reading time |
|---|---|---|---|---|
| 0 | Prerequisites | 5 | Beginner | ~3 hrs |
| 1 | Core Fundamentals | 7 | Beginner-Intermediate | ~4 hrs |
| 2 | Building Blocks | 16 | Intermediate | ~10 hrs |
| 3 | Distributed Systems Theory | 11 | Intermediate-Advanced | ~9 hrs |
| 4 | Data Systems | 10 | Intermediate-Advanced | ~8 hrs |
| 5 | Architecture Patterns | 11 | Intermediate | ~8 hrs |
| 6 | Reliability & Operations | 11 | Intermediate-Advanced | ~8 hrs |
| 7 | Security at Scale | 10 | Intermediate-Advanced | ~7 hrs |
| 8 | Case Studies | 56 | Intermediate-Advanced | ~45 hrs |
| 9 | AI & ML System Design | 15 | Intermediate-Advanced | ~11 hrs |
| 10 | Emerging Patterns | 1 | Intermediate-Advanced | ~45 min |
| 11 | Interview Framework | 6 | Intermediate | ~4 hrs |
| T | Trade-offs Library | 22 | Intermediate | ~8 hrs |
Part 0 — Prerequisites (5 chapters) — networking, OS, data structures, databases, API design
Audience: engineers without a CS degree, or anyone who wants to confirm their foundations before moving on. Difficulty: Beginner. Total reading time: ~3 hours.
Foundational topics that the rest of the handbook assumes. If you can explain TCP's three-way handshake, the difference between a process and a thread, a B-tree's internal structure, and why idempotent PUT is better than non-idempotent POST for a retry-prone endpoint — you can skip this part.
- Networking Fundamentals for System Design — OSI and TCP/IP layers, TCP vs UDP, HTTP/1.1 vs HTTP/2 vs HTTP/3, DNS resolution, TLS handshake, what "the network is unreliable" really means in practice.
- Operating System Essentials for System Design — Processes vs threads, context switching cost, virtual memory, page cache, filesystem I/O, epoll/kqueue/IOCP, why `O_DIRECT` matters for databases.
- Data Structures for Distributed Systems — Hash tables, B-trees, LSM-trees, skip lists, Bloom filters, tries, HyperLogLog, Count-Min Sketch, and when each shows up in real systems.
- Database Fundamentals for System Design — Transactions, isolation levels (read-committed to serializable), indexes, query planners, joins, and why your ORM hides things from you that you need to see.
- API Design Basics: REST, GraphQL, gRPC, and the Hard Parts — Resource modeling, idempotency, versioning, pagination, rate-limit headers, error envelopes, HATEOAS in theory vs practice.
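As a quick self-test for the PUT-versus-POST question in the Part 0 introduction, here is a toy illustration of why retries interact badly with a non-idempotent create. It is a sketch with an in-memory dictionary standing in for a server, not a real HTTP stack. The class `InMemoryOrders`, the helper `send_with_retry`, and the ID `order-123` are all hypothetical names.

```python
import uuid


class InMemoryOrders:
    """Toy stand-in for a server-side store; no real HTTP is involved."""

    def __init__(self) -> None:
        self.orders: dict[str, dict] = {}

    def post(self, body: dict) -> str:
        # POST-style create: the server mints a fresh ID on every call,
        # so a retried request produces a second, duplicate resource.
        order_id = str(uuid.uuid4())
        self.orders[order_id] = body
        return order_id

    def put(self, order_id: str, body: dict) -> str:
        # PUT-style upsert: the client names the resource, so repeating
        # the call leaves the store in the same state as calling it once.
        self.orders[order_id] = body
        return order_id


def send_with_retry(call, attempts: int = 2):
    """Simulate a client whose first response is lost, so it sends again."""
    result = None
    for _ in range(attempts):
        result = call()
    return result


body = {"item": "book", "qty": 1}

via_post = InMemoryOrders()
send_with_retry(lambda: via_post.post(body))

via_put = InMemoryOrders()
send_with_retry(lambda: via_put.put("order-123", body))

print(len(via_post.orders), len(via_put.orders))  # 2 1
```

The retried POST leaves two orders behind, while the retried PUT leaves one, which is the property a retry-prone endpoint needs.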
Part 1 — Core Fundamentals (7 chapters) — scalability, latency, availability, consistency, estimation, trade-off thinking
Audience: everybody — read this part even if you know the topic, because the vocabulary here is used for the rest of the book. Difficulty: Beginner-Intermediate. Total reading time: ~4 hours.
The vocabulary and the reasoning habits that every later chapter assumes. "Scalability," "consistency," "trade-off," and "back-of-envelope" get defined here rigorously so they mean something specific when we use them later.
Part 2 — Building Blocks (16 chapters) — load balancers, caches, queues, CDNs, databases, sharding, pub/sub, rate limiting, blob storage, geo, edge
Audience: anyone who'll assemble backend systems. This part is the Lego brick inventory. Difficulty: Intermediate. Total reading time: ~10 hours.
Deep dives on the pieces you assemble to build real systems: load balancers, caches, queues, CDNs, databases (SQL and NoSQL), partitioning, replication, pub/sub, rate limiters, service discovery, blob storage, geospatial indexes, and edge compute.
- Load Balancers: Spreading Traffic, Absorbing Failure
- Reverse Proxies and API Gateways: The Smart Edge
- Content Delivery Networks: Moving Bytes Closer to Users
- Caching: From Browser to Database
- SQL Databases: The Boring Technology That Wins
- NoSQL Databases: Picking the Right Non-Relational Tool
- Database Partitioning and Sharding: When One Node Is Not Enough
- Database Replication: Keeping Copies in Sync
- Message Queues and Streaming: Decoupling at Scale
- Pub/Sub: Fan-Out and Event-Driven Systems
- Real-Time Communication: WebSockets, SSE, and Long Polling
- Rate Limiting: Protecting Systems from Themselves
- Service Discovery and Service Mesh: Finding and Talking to Services
- Blob and Object Storage: Storing the Big Stuff
- Geospatial Indexing: Geohash, Quadtree, R-tree, S2, and H3
- Edge Computing (Cloudflare Workers, Lambda@Edge, Deno Deploy)
Part 3 — Distributed Systems Theory (11 chapters) — consensus, consistency, clocks, CRDTs, transactions
Audience: SDE2+ preparing for Senior/Staff. If you haven't internalized linearizability vs serializability, read this. Difficulty: Intermediate-Advanced. Total reading time: ~9 hours.
The theory that makes distributed systems distributed: consensus (Raft/Paxos), the full consistency spectrum, CAP/PACELC in 2025 framing, logical clocks, CRDTs, distributed transactions (2PC/Saga), exactly-once delivery, failure detection, consistent hashing, and Merkle-tree anti-entropy.
- Consensus Protocols: How Distributed Systems Agree
- Consistency Deep Dive: Linearizability, Serializability, and the Spectrum Between
- Quorums and Replication: The Math of R + W > N
- CAP and PACELC: The Tradeoff That Keeps Confusing People
- Clocks and Ordering: Lamport, Vector, and Hybrid Logical Clocks
- CRDTs: Conflict-Free Replicated Data Types
- Distributed Transactions: 2PC, Saga, and When to Avoid Both
- Idempotency and Exactly-Once: The Honest Truth About Delivery Guarantees
- Failure Detection: Deciding a Node Is Dead
- Consistent Hashing: Keys to Nodes Without Global Reshuffles
- Merkle Trees and Anti-Entropy: Keeping Replicas in Sync Cheaply
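The quorum rule named in the chapter list above, R + W > N, can be checked by brute force for small clusters. The sketch below enumerates every possible write set and read set and confirms that they always share at least one replica exactly when R + W > N. The function name `quorums_always_overlap` is hypothetical, and the enumeration is only practical for small N.

```python
from itertools import combinations


def quorums_always_overlap(n: int, r: int, w: int) -> bool:
    """True if every W-sized write set intersects every R-sized read set."""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


# (N, R, W) configurations: two that satisfy R + W > N and two that do not.
configs = [(3, 2, 2), (3, 1, 1), (5, 2, 4), (5, 2, 3)]
for n, r, w in configs:
    overlap = quorums_always_overlap(n, r, w)
    print(f"N={n} R={r} W={w}: R+W>N is {r + w > n}, always overlaps: {overlap}")
```

An overlapping replica is what lets a read observe the most recent acknowledged write; when R + W is at most N, some read set can miss every replica that took the write.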
Part 4 — Data Systems (10 chapters) — storage engines, OLTP/OLAP, warehouses/lakes, streams, search, time-series, graph, vector, KV
Audience: anyone who owns a data pipeline or picks a database. Difficulty: Intermediate-Advanced. Total reading time: ~8 hours.
Every flavor of data system you might pick, and when each actually fits. Storage engines (B-tree vs LSM), OLTP vs OLAP, warehouses/lakes/lakehouses, streaming vs batch, CDC, search, time-series, graph, vector, and key-value.
- Storage Engines: B-Trees, LSM-Trees, and Why Your Database Feels the Way It Does
- OLTP vs OLAP: Row Stores, Column Stores, and Matching Shape to Workload
- Data Warehouses and Data Lakes: Structure, Schema, and the Lakehouse
- Stream vs Batch Processing: Lambda, Kappa, and the End of That Debate
- Change Data Capture: Streaming the Database's Inner Monologue
- Search Systems: Inverted Indexes, BM25, and Running Elasticsearch in Production
- Time-Series Databases: Metrics, Events, and Retention at Scale
- Graph Databases: Property Graphs, Cypher, and When Joins Are the Problem
- Vector Databases: Embeddings, ANN Indexes, and the Retrieval Layer for AI
- Key-Value Stores: Redis, Memcached, DynamoDB, and Picking the Right Hash Table
Part 5 — Architecture Patterns (11 chapters) — monolith vs micro, event-driven, CQRS, ES, serverless, BFF, strangler, hex, multi-region, multi-tenant, CRDT apps
Audience: engineers making architecture-level decisions or leading service migrations. Difficulty: Intermediate. Total reading time: ~8 hours.
The architectural shapes you choose between when you design anything bigger than a single service: monolith vs microservices, event-driven, CQRS, event sourcing, serverless, BFF, strangler fig, hexagonal/clean, multi-region, multi-tenancy, CRDT-based apps.
- Monolith vs Microservices: Team Topology, Conway's Law, and the Distributed System Tax
- Event-Driven Architecture: Notifications, State Transfer, and Choreography
- CQRS: Separating Reads from Writes Without Losing Your Mind
- Event Sourcing: Events as the Source of Truth
- Serverless: Functions, Cold Starts, and When FaaS Actually Saves Money
- Backend for Frontend: Per-Client API Aggregation Done Right
- Strangler Fig: Incremental Migration Without a Big Bang
- Hexagonal and Clean Architecture: Keeping Business Logic Independent
- Multi-Region Architecture: Active-Passive, Active-Active, and CRDTs
- Multi-Tenancy: Silo, Pool, and the SaaS Isolation Spectrum
- CRDT Applications (Yjs, Automerge, Local-First Software)
Part 6 — Reliability & Operations (11 chapters) — observability, SLOs, resilience, scaling, deploys, chaos, incidents, FinOps, platform
Audience: anyone on-call, anyone who signs SLAs, anyone paying a cloud bill. Difficulty: Intermediate-Advanced. Total reading time: ~8 hours.
The engineering that separates "it compiles" from "it runs reliably at 3 a.m. on a long weekend": observability, SLOs, resilience patterns, auto-scaling, deployments, chaos engineering, incident response, health checks, FinOps, platform engineering.
- Observability: Metrics, Logs, Traces, and the OpenTelemetry Standard
- SLI, SLO, SLA, and Error Budgets: Making Reliability Quantitative
- Resilience Patterns: Timeouts, Retries, Circuit Breakers, and Bulkheads
- Graceful Degradation: When Partial Service Beats No Service
- Auto-Scaling and Capacity Planning: From HPA to Predictive Scaling
- Deployment Strategies: Blue-Green, Canary, Rolling, and Feature Flags
- Chaos Engineering: Breaking Things on Purpose
- Incident Management: From Detection to Blameless Postmortem
- Health Checks and Readiness: Telling the Truth About Whether You're Up
- Cost Optimization and FinOps
- Platform Engineering: IDPs, Golden Paths, and DX
Part 7 — Security at Scale (10 chapters) — AuthN/Z, OAuth/OIDC, JWT, mTLS, secrets, DDoS/WAF, compliance, supply chain, privacy, PQC
Audience: Senior/Staff engineers, platform teams, security-adjacent builders. Difficulty: Intermediate-Advanced. Total reading time: ~7 hours.
Security architecture for real systems, not a CISSP crib sheet: AuthN vs AuthZ, OAuth2/OIDC, JWT (and why you probably shouldn't), mTLS, secrets management, DDoS/WAF, compliance (GDPR/DPDP/CCPA), software supply chain, privacy-preserving systems, post-quantum cryptography.
- Authentication vs Authorization: Identity, Permissions, and Access Models
- OAuth 2.0 and OpenID Connect: Delegated Authorization and Identity Done Right
- JWT Deep Dive: Signed Tokens, Claims, and the Revocation Problem
- mTLS and Service-to-Service Authentication: SPIFFE, Service Mesh, and Zero Trust
- Secrets Management: Vault, KMS, and the End of Secrets in Config Files
- DDoS Protection and WAFs: Mitigating Volumetric and Application Attacks
- Data Residency and Compliance Architecture (GDPR, DPDP, CCPA, Right-to-Erasure)
- Supply Chain Security: SBOM, SLSA, Sigstore, and Defending Against xz-utils
- Privacy-Preserving Systems (Differential Privacy, Federated Learning)
- Post-Quantum Cryptography: Migrating to ML-KEM, ML-DSA, and a Crypto-Agile Future
Part 8 — Case Studies (56 chapters) — 56 end-to-end designs, grouped by theme; the centerpiece of interview prep
Audience: interview prep candidates, engineers building adjacent systems, anyone who learns best from worked examples. Difficulty: Intermediate-Advanced. Total reading time: ~45 hours.
56 end-to-end system designs, each following a consistent structure: requirements (functional + non-functional), back-of-envelope numbers, high-level architecture, data model, deep-dive components, scalability and reliability considerations, and real-world references. Grouped thematically below for easier navigation.
Core primitives (chapters 00-03) — the four designs you see in every interview
Messaging & social (chapters 04-09) — notifications, chat, feeds, photos, crawlers, autocomplete
- Design a Notification System (Push, SMS, Email at Scale)
- Design a Chat System (WhatsApp / Messenger / Signal)
- Design a Social Media Feed (Twitter / Instagram / LinkedIn)
- Design a Photo Sharing Service (Instagram)
- Design a Web Crawler (Googlebot-style)
- Design Search Autocomplete (Typeahead Suggestions)
Media & consumer products (chapters 10-17) — video, ride-hailing, maps, file sync, editing, cache, recommenders
- Design a Video Streaming Service (YouTube / Twitch / TikTok)
- Design Netflix (End-to-End)
- Design a Ride-Hailing Service (Uber / Lyft)
- Design Google Maps (Routing and Tile Rendering)
- Design a File Sync Service (Dropbox / Google Drive)
- Design Collaborative Editing (Google Docs / Figma / Notion)
- Design a Distributed Cache (Memcached / Redis Cluster)
- Design a Recommendation System (Netflix / YouTube / TikTok)
Commerce & financial (chapters 18-21) — ticketing, payments, stock exchange, food delivery
Data & infrastructure (chapters 22-29) — metrics, ad-click, logs, proximity, leaderboards, IDs, hotels, schedulers
- Design a Metrics Pipeline (Prometheus / InfluxDB / Thanos)
- Design Ad-Click Aggregation (Real-Time Stream Processing)
- Design a Logging Platform (ELK / Loki / Splunk)
- Design a Proximity Service (Nearby Friends / Yelp)
- Design a Real-Time Leaderboard
- Design a Unique ID Generator (Snowflake, ULID, TSID, UUIDv7)
- Design a Hotel Reservation System (Booking.com / Airbnb)
- Design a Distributed Job Scheduler (Airflow / Temporal / Distributed Cron)
AI systems (chapters 30-37) — ChatGPT, RAG, coding agents, AI search, voice, moderation, semantic cache, model routing
- Design ChatGPT (Conversational AI at Scale)
- Design an Enterprise RAG System
- Design a Coding Agent (Claude Code / GitHub Copilot / Cursor)
- Design Perplexity (AI Search with Citations)
- Design a Voice Agent (Alexa / Siri-Class Realtime)
- Design a Content Moderation System at Scale
- Design a Semantic Cache for LLM Applications
- Design a Model Router and Gateway (OpenRouter / LiteLLM)
Infra services (chapters 38-39) — feature flags, DNS
Consumer products II (chapters 40-49) — dating, auctions, SaaS, video conf, email, live comments, fraud, fitness, online judge, price tracking
- Design a Dating App (Tinder / Hinge / Bumble)
- Design an Online Auction (eBay / Catawiki)
- Design a Multi-Tenant SaaS Platform
- Design a Video Conferencing System (Zoom / Google Meet)
- Design an Email Service at Gmail Scale (1.8B Users, 300B Messages/Day)
- Design Live Comments at Scale (FB Live / YouTube Live / Twitch Chat)
- Design a Fraud Detection System (Stripe Radar / PayPal / Feedzai)
- Design a Fitness Tracking Service (Strava / MapMyRun)
- Design an Online Judge (LeetCode / Codeforces / HackerEarth)
- Design a Price Tracking Service (CamelCamelCamel / Honey / Keepa)
Developer & ops platforms (chapters 50-55) — API gateway, CI/CD, observability, search engine, brokerage, chat-at-scale
- Design an API Gateway at Scale (Kong / AWS API Gateway / Apigee / Envoy)
- Design a CI/CD Platform (GitHub Actions / GitLab CI / CircleCI)
- Design an Observability Platform (Datadog / New Relic / Honeycomb)
- Design a Search Engine (Google-Scale / Brave Search)
- Design a Brokerage Platform (Robinhood / E*TRADE / Interactive Brokers)
- Design Channel-Scale Chat (Discord / Slack)
Part 9 — AI & ML System Design (15 chapters) — LLM serving, RAG, vector search, agents, LLMOps, safety, recommenders, multimodal
Audience: anyone building with or around LLMs, agents, or production ML. Difficulty: Intermediate-Advanced. Total reading time: ~11 hours.
Modern AI-systems architecture, treated with the same rigor as Part 3: LLM serving, RAG, vector search, agent architectures, multi-agent orchestration, LLM evaluation, LLMOps, cost optimization, safety, ML fundamentals, feature stores, recommenders, real-time AI, multimodal, and the data infra underneath all of it.
- LLM Serving Architecture (vLLM, TGI, TensorRT-LLM)
- RAG Pipelines (Retrieval-Augmented Generation)
- Vector Search at Scale (HNSW, IVF-PQ, DiskANN)
- AI Agent Architectures (ReAct, Reflection, Planning, Tool Use, Memory)
- Multi-Agent Orchestration (LangGraph, OpenAI Agents SDK, AutoGen, Swarm)
- LLM Evaluation and Observability (Ragas, LangSmith, TruLens, LLM-as-Judge)
- LLMOps and Prompt Engineering (Versioning, Guardrails, Red-Teaming)
- LLM Cost Optimisation (Semantic Cache, Model Routing, Cascading, Prompt Caching)
- LLM Safety and Guardrails (OWASP LLM Top 10, Prompt Injection, PII, Jailbreaks)
- ML System Design Fundamentals
- Feature Stores and Model Serving (Feast, Tecton, KServe, BentoML, MLflow)
- Recommendation Systems Deep Dive (DLRM, Two-Tower, Embedding Retrieval, Cold Start)
- Realtime AI and Voice Agents (Streaming Inference, WebRTC, LiveKit, Deepgram)
- Multimodal AI Systems (CLIP, Whisper, LayoutLM, Document AI)
- Data Infrastructure for AI (Embedding Pipelines, Chunking, Unstructured ETL, MCP)
Part 10 — Emerging Patterns (1 chapter) — green/sustainable computing; growing list of forward-looking topics
Audience: Staff+ engineers and architects thinking past 2026. Difficulty: Intermediate-Advanced. Total reading time: ~45 minutes.
Forward-looking topics that are adjacent to everything else. Currently one chapter, with more planned (WebAssembly at the edge, unikernels, confidential computing, on-device AI).
Part 11 — Interview Framework (6 chapters) — RESHADED/PEDALS/ADEPT, requirements, diagrams, trade-offs, company flavors, RFCs
Audience: anyone preparing for or giving system-design interviews. Difficulty: Intermediate. Total reading time: ~4 hours.
How to run a 45-minute system-design interview, from both sides of the whiteboard. Compares RESHADED / PEDALS / ADEPT frameworks, teaches requirements scoping, diagramming, trade-off articulation, company-specific flavors, and RFC/design-doc authoring for Staff-level work.
- Interview Frameworks Compared (RESHADED, PEDALS, ADEPT)
- Requirements Scoping: Functional, Non-Functional, and MoSCoW
- Diagramming Skills for System Design Interviews
- Trade-off Articulation: Saying 'It Depends' Well
- Company-Specific Interview Flavors (Amazon, Google, Meta, Netflix)
- Design Doc Authoring: RFCs, ADRs, and the Staff Engineer's Written Output
Trade-offs Library (22 pages) — the canonical "X vs Y" decision pages cross-referenced from every part
Audience: everyone — these pages are referenced from every other part. Difficulty: Intermediate. Total reading time: ~8 hours.
The 22 most-asked architectural-choice questions, each answered in a dedicated decision-comparison page: flowchart, comparison table, "when to pick A" vs "when to pick B" sections, real-world examples, and citations.
- Strong vs Eventual Consistency
- ACID vs BASE
- SQL vs NoSQL
- Latency vs Throughput
- CAP and PACELC Applied
- Cache Strategies: Cache-Aside vs Write-Through vs Write-Behind
- Batch vs Stream Processing
- Load Balancer vs Reverse Proxy vs API Gateway
- REST vs gRPC vs GraphQL
- Polling vs Long-Polling vs SSE vs WebSockets vs Webhooks
- Rate Limiting Algorithms: Token Bucket vs Sliding Window
- Optimistic vs Pessimistic Concurrency Control
- Partitioning Schemes: Range, Hash, Consistent Hash, Directory
- B-tree vs LSM-tree Storage
- Monolith vs Microservices
- Replication Topologies: Leader-Follower, Multi-Leader, Leaderless
- Distributed Transactions: 2PC vs Saga vs TCC
- Push vs Pull (Fan-out, Messaging, Feed)
- Lambda vs Kappa Architecture
- Vertical vs Horizontal Scaling
- Normalization vs Denormalization
- Single-Region vs Multi-Region Deployment
120 chapters across 15 parts, plus 37 pattern decision pages, 5 long-form editorials, and 46 interactive widget specs. Each chapter is centered on a specific data structure or algorithmic pattern, taught with worked LeetCode problems. Sibling sol.py / sol.java / sol.cpp / sol.go files live under each problem directory; the Python solution is inlined in the chapter, the others are linked. Editorials, pattern decision diagrams, and widget YAML specs are siblings under content/dsa/.
Each part below is collapsed by default — click to expand the chapter list. Linking to a specific part from elsewhere in the README auto-expands it on GitHub.
| # | Part | Chapters | Difficulty | Reading time |
|---|---|---|---|---|
| 0 | Foundations | 7 | Beginner | ~3 hrs |
| 1 | Linear Data Structures | 8 | Beginner | ~5 hrs |
| 2 | Search & Sort | 8 | Beginner-Intermediate | ~5 hrs |
| 3 | Two Pointers, Sliding Window, Prefix Sums | 6 | Intermediate | ~4 hrs |
| 4 | Stack & Queue Patterns | 5 | Intermediate | ~3 hrs |
| 5 | Linked Lists | 6 | Intermediate | ~4 hrs |
| 6 | Trees & Heaps | 11 | Intermediate-Advanced | ~7 hrs |
| 7 | Recursion & Backtracking | 7 | Intermediate-Advanced | ~5 hrs |
| 8 | Graphs | 13 | Intermediate-Advanced | ~9 hrs |
| 9 | Dynamic Programming | 15 | Advanced | ~10 hrs |
| 10 | Greedy | 5 | Intermediate-Advanced | ~3 hrs |
| 11 | Bit Manipulation | 4 | Intermediate | ~2 hrs |
| 12 | Strings & Pattern Matching | 6 | Intermediate-Advanced | ~4 hrs |
| 13 | Design the Data Structure | 7 | Intermediate-Advanced | ~5 hrs |
| 14 | Interview Framework | 12 | All levels | ~5 hrs |
| P | Pattern decision pages | 37 | Intermediate | ~10 hrs |
| E | Long-form editorials | 5 | Intermediate | ~2 hrs |
| W | Interactive widget specs | 46 | — | reference |
Part 0 — Foundations (7 chapters) — Big-O, recursion, bit ops, interview math, language idioms, choosing your language
Audience: anyone starting interview prep, or returning after a long pause. Difficulty: Beginner. Total reading time: ~3 hours.
The mental models and language fluency that everything else assumes. Big-O, the recursion mental model, bit-manipulation primer, the math you actually need for interviews, language idioms across Python / Java / C++ / Go, and how to pick which language to interview in.
Part 1 — Linear Data Structures (8 chapters) — arrays, dynamic-array internals, strings, hash maps, stacks, queues, matrices
Audience: everyone — these are the structures that show up in 70% of all interview problems. Difficulty: Beginner. Total reading time: ~5 hours.
The contiguous-memory and bucket-based data structures: arrays (static / dynamic / multi-dimensional), the amortized-O(1) doubling rule that makes a vector work, strings as encoded byte arrays, hash maps and hash sets, the load-factor / collision math behind them, stacks and queues with the call-stack analogy, and matrix manipulation tricks (rotate-in-place, spiral, transpose).
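The amortized-O(1) claim behind the doubling rule can be checked by counting copies. The following is a minimal sketch, not code taken from the chapter; `count_copies` is a hypothetical helper that simulates a dynamic array starting at capacity 1 and doubling whenever it is full.

```python
def count_copies(n_appends: int) -> int:
    """Simulate a doubling dynamic array and count element copies caused by resizes."""
    capacity, size, copies = 1, 0, 0
    for _ in range(n_appends):
        if size == capacity:
            copies += size   # a resize moves every existing element once
            capacity *= 2
        size += 1
    return copies
```

For example, `count_copies(10)` returns 15 (resizes at sizes 1, 2, 4, and 8). Across n appends the resizes copy fewer than 2n elements in total, which is why the per-append cost averages out to a constant.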
Part 2 — Search & Sort (8 chapters) — linear/binary search, comparison & linear-time sorts, heap sort, quickselect
Audience: every interviewee — binary search alone shows up in roughly a quarter of all problems. Difficulty: Beginner-Intermediate. Total reading time: ~5 hours.
How to find things and how to order things. Linear search and when it's actually the right answer; the canonical binary search written without off-by-one bugs; the lower_bound / upper_bound / peak / rotated-array variants; the comparison-sort family (insertion, merge, quicksort with the production hybrids like Timsort and Introsort); heap sort and why n log n is the comparison-sort lower bound; the linear-time sorts (counting, radix, bucket); and quickselect for the top-k problems.
- Linear search and what it's good for
- Binary search: the canonical version
- Binary search variants: lower_bound, upper_bound, peaks, and rotated arrays
- Comparison sorts I: insertion sort and merge sort
- Comparison sorts II: quicksort, partition, and the production hybrids
- Heap sort and the n log n lower bound
- Linear-time sorts: counting, radix, bucket
- Quickselect: linear-time selection
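As an illustration of the off-by-one-free style this part describes, here is a minimal `lower_bound` sketch over a half-open interval. It is an assumption about the general shape of such code, not the chapter's own listing; Python's standard `bisect.bisect_left` computes the same index and is used below only as a cross-check.

```python
from bisect import bisect_left


def lower_bound(a, target):
    """Return the first index i with a[i] >= target, or len(a) if there is none."""
    lo, hi = 0, len(a)            # search the half-open interval [lo, hi)
    while lo < hi:
        mid = (lo + hi) // 2
        if a[mid] < target:
            lo = mid + 1          # a[mid] is too small, so the answer is right of mid
        else:
            hi = mid              # a[mid] qualifies, so keep it inside the interval
    return lo


data = [1, 3, 3, 5, 8]
same_as_stdlib = all(lower_bound(data, t) == bisect_left(data, t) for t in range(0, 10))
```

Keeping the interval half-open means the loop ends exactly when `lo == hi`, so there is no separate "found" branch to get wrong.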
Part 3 — Two Pointers, Sliding Window, Prefix Sums (6 chapters) — the array-walking patterns that turn O(n²) into O(n)
Audience: anyone whose nested-loop solutions keep timing out. Difficulty: Intermediate. Total reading time: ~4 hours.
Three closely related patterns that all amount to "walk the array smarter": opposite-ends two pointers (Container With Most Water, 3Sum), same-direction two pointers, fixed and variable sliding windows, prefix sums and difference arrays, and the prefix-sum + hash-map combo for "subarray sum equals K"-shaped problems.
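The prefix-sum + hash-map combo mentioned above can be sketched in a few lines. This is an illustrative version under the usual problem statement (count contiguous subarrays whose sum equals `k`); the function name is hypothetical and the chapter's own solution may differ.

```python
def subarray_sum(nums, k):
    """Count contiguous subarrays of nums whose elements sum to k."""
    seen = {0: 1}     # prefix sum -> how many times it has occurred so far
    total = 0         # running prefix sum
    count = 0
    for x in nums:
        total += x
        # A subarray ending here sums to k iff some earlier prefix equals total - k.
        count += seen.get(total - k, 0)
        seen[total] = seen.get(total, 0) + 1
    return count
```

Unlike a sliding window, this works with negative numbers, because it never assumes the running sum grows as the window expands.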
Part 4 — Stack & Queue Patterns (5 chapters) — monotonic stack/deque, min/max stack, expression parsing, queue-from-stacks
Audience: anyone struggling with Next-Greater-Element and sliding-window-maximum problems. Difficulty: Intermediate. Total reading time: ~3 hours.
Stack and queue as algorithmic patterns, not just data structures. Monotonic stacks (Daily Temperatures, Largest Rectangle in Histogram), monotonic deques (Sliding Window Maximum), min/max stacks (O(1) min query under push/pop), expression parsing (Shunting-Yard / RPN), and the queue-from-stacks amortization argument.
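A monotonic stack is easiest to see on Daily Temperatures. The sketch below is one common way to write it, offered as an illustration rather than the chapter's listing; the function name is an assumption.

```python
def daily_temperatures(temps):
    """For each day, return how many days until a strictly warmer one (0 if none)."""
    answer = [0] * len(temps)
    stack = []  # indices of days still waiting; their temperatures never increase toward the top
    for i, t in enumerate(temps):
        # Today resolves every waiting day that is colder than today.
        while stack and temps[stack[-1]] < t:
            j = stack.pop()
            answer[j] = i - j
        stack.append(i)
    return answer
```

Each index is pushed once and popped at most once, so the nested loop still runs in O(n) overall.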
Part 5 — Linked Lists (6 chapters) — pointer rewiring, reversal, k-group reversal, Floyd's cycle, merging, LRU cache
Audience: anyone who's drawn boxes-and-arrows on a whiteboard and gotten lost. Difficulty: Intermediate. Total reading time: ~4 hours.
Linked lists are pointer surgery. Sentinel-node patterns, the canonical iterative reversal (and the recursive cousin), reverse-in-groups-of-k, Floyd's tortoise-and-hare cycle detection (and why the cycle-start formula works), merging sorted lists, and the LRU cache as the canonical hash-map + doubly-linked-list combo.
Part 6 — Trees & Heaps (11 chapters) — traversals, Morris, BFS, heaps, BST, AVL/RB, tries, tree DP, segment trees
Audience: anyone past the linear-data-structure phase of prep. Difficulty: Intermediate-Advanced. Total reading time: ~7 hours.
The hierarchical structures and the priority queue. Binary tree fundamentals, the three depth-first traversals (pre/in/post — both recursive and iterative), Morris traversal for O(1) space, level-order BFS, heaps and priority queues (heapify in O(n)), binary search trees, AVL rotations, a red-black overview, tries, the tree-DP primer (post-order with side state), and an introduction to segment trees for range queries.
- Binary tree fundamentals
- Tree traversals: pre, in, post
- Morris traversal: O(1)-space inorder by threading
- Level-order traversal: BFS on trees
- Heaps and priority queues
- Binary search trees
- AVL trees and rotations
- Red-black trees: an overview
- Tries
- Tree DP primer: post-order with side state
- Segment trees
Part 7 — Recursion & Backtracking (7 chapters) — recursion patterns, the template, subsets/perms, N-Queens, Sudoku, randomized algos
Audience: anyone whose subset/permutation/combination solutions feel like guesswork. Difficulty: Intermediate-Advanced. Total reading time: ~5 hours.
Backtracking is just DFS with state-restoration discipline. The recursion patterns (linear, tree, divide-and-conquer); the backtracking template you can adapt to any constraint problem; subsets, combinations, and permutations; N-Queens with pruning; Sudoku with constraint propagation and forward checking; word search on a grid; and Fisher-Yates, reservoir sampling, and rejection sampling for randomized algorithms.
- Recursion patterns: linear, tree, and divide-and-conquer
- The backtracking template
- Subsets, combinations, permutations
- N-Queens: pruning and constraint propagation
- Sudoku solver: constraint propagation and forward checking
- Word search and grid backtracking
- Randomized algorithms: Fisher-Yates, reservoir sampling, rejection sampling
Part 8 — Graphs (13 chapters) — BFS/DFS, components, topo, cycles, bipartite, union-find, Dijkstra, Bellman-Ford, MSTs
Audience: anyone preparing for FAANG-tier interviews — graph problems are the differentiator. Difficulty: Intermediate-Advanced. Total reading time: ~9 hours.
The graph chapters are interview-grade end-to-end. Adjacency-list vs adjacency-matrix representations; BFS and DFS as orthogonal traversal templates; connected components and flood fill (Number of Islands); topological sort via Kahn's queue and via DFS post-order reverse; cycle detection on directed and undirected graphs; bipartite checking; union-find with path compression and union-by-rank; Dijkstra's algorithm; Bellman-Ford and negative cycles; and minimum spanning trees via both Kruskal and Prim.
- Graph representation
- Breadth-first search
- Depth-first search
- Connected components and flood fill
- Topological sort: Kahn's algorithm
- Topological sort: DFS post-order reverse
- Cycle detection in graphs
- Bipartite check
- Union-Find: parent forests, path compression, and union by rank
- Dijkstra's shortest-path algorithm
- Bellman-Ford and negative cycles
- Minimum spanning tree: Kruskal's algorithm
- Minimum spanning trees: Prim's algorithm
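Union-find with path compression and union by rank fits in a small class. The following is a minimal sketch for elements numbered `0..n-1`; the class and attribute names are assumptions for illustration, not the handbook's reference implementation.

```python
class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))  # each element starts as its own root
        self.rank = [0] * n           # upper bound on the height of each tree
        self.components = n

    def find(self, x):
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the walked path straight at the root.
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b):
        """Merge the sets containing a and b; return False if they were already merged."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra           # union by rank: attach the shorter tree under the taller
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1
        self.components -= 1
        return True
```

A `union` that returns `False` signals that the edge would close a cycle, which is the check Kruskal's algorithm relies on.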
Part 9 — Dynamic Programming (15 chapters) — memo↔tab, 1D/grid/interval/tree/bitmask, knapsack, LCS, edit distance, LIS, palindromes
Audience: anyone for whom "DP problems just don't click"; this is the part candidates fear most. Difficulty: Advanced. Total reading time: ~10 hours.
The most demanding part of the handbook. Build DP up from recursion (top-down memoization) and re-derive the bottom-up tabulation; 1D-state DPs (Climbing Stairs, House Robber, decision DP); the string-prefix decision DPs (Decode Ways, Word Break); 0/1 and unbounded knapsacks; longest-common-subsequence and edit distance; LIS in O(n²) and the patience-sort O(n log n) variant; palindrome DP; interval DP (matrix chain, burst balloons); grid DP; tree DP; and the bitmask DP family for "all subsets" problems.
- Dynamic Programming: From Recursion to Memoization
- DP: bottom-up tabulation
- Dynamic programming on a 1D state
- Decode Ways and Word Break: string-prefix decision DP
- 0/1 knapsack
- Unbounded knapsack: when items can be picked over and over
- Longest common subsequence
- Edit distance
- Longest Increasing Subsequence: the quadratic DP
- LIS: patience sort
- Palindrome DP
- Interval DP: matrix chain and burst balloons
- Grid DP: forward fills, backward survives
- Tree DP: states that travel up the call stack
- Bitmask DP
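The O(n log n) patience-sort variant of LIS is short enough to show in full. This is a sketch for the strictly increasing case that returns only the length; the function name is an assumption, and the chapter may present it differently.

```python
from bisect import bisect_left


def lis_length(nums):
    """Length of the longest strictly increasing subsequence."""
    tails = []  # tails[k] = smallest possible tail of an increasing subsequence of length k + 1
    for x in nums:
        i = bisect_left(tails, x)   # first pile whose top is >= x
        if i == len(tails):
            tails.append(x)         # x extends the longest subsequence found so far
        else:
            tails[i] = x            # x gives a smaller tail for length i + 1
    return len(tails)
```

Note that `tails` is not itself a valid subsequence; only its length is meaningful. Using `bisect_left` is what makes the comparison strict, so repeated values do not extend a subsequence.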
Part 10 — Greedy (5 chapters) — when local choices win, intervals, activity selection, Huffman, jump games
Audience: anyone who's been burned by a greedy that "looked right" and got Wrong Answer. Difficulty: Intermediate-Advanced. Total reading time: ~3 hours.
When local choices yield global optima, and how to prove it. Greedy thinking framed against DP; interval scheduling (the sorting comparator is the algorithm); activity selection and the task-scheduler family; Huffman encoding; and Jump Games / Gas Station as canonical "scan once, maintain a running invariant" problems.
Part 11 — Bit Manipulation (4 chapters) — the bit-ops cookbook, XOR patterns, bitmask techniques, performance tricks
Audience: anyone preparing for low-level / performance-oriented interviews (HFT, embedded, kernel, GPU). Difficulty: Intermediate. Total reading time: ~2 hours.
The bit-ops cookbook (set/clear/toggle, lowest-set-bit, popcount); XOR patterns (Single Number I/II/III, Missing Number); bitmask techniques as compact subset state; and bit-level performance tricks for the "your code is correct, just make it 10× faster" interview round.
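Three of the cookbook entries named above, sketched in Python for illustration. The function names are assumptions, and `popcount` is written for non-negative integers only.

```python
from functools import reduce
from operator import xor


def lowest_set_bit(x: int) -> int:
    """Isolate the lowest set bit, e.g. 0b1100 -> 0b0100."""
    return x & -x


def popcount(x: int) -> int:
    """Count set bits of a non-negative int by clearing the lowest one each round."""
    count = 0
    while x:
        x &= x - 1   # clears the lowest set bit
        count += 1
    return count


def single_number(nums):
    """Every value appears twice except one; equal pairs cancel under XOR."""
    return reduce(xor, nums, 0)
```

For example, `single_number([4, 1, 2, 1, 2])` returns 4, because `1 ^ 1` and `2 ^ 2` both collapse to 0.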
Part 12 — Strings & Pattern Matching (6 chapters) — naive matching, Rabin-Karp, KMP, Z-array, Aho-Corasick, suffix arrays
Audience: anyone interviewing where strings are a focus area (search infra, NLP infra, IDEs, compiler tooling). Difficulty: Intermediate-Advanced. Total reading time: ~4 hours.
Substring search beyond the naive O(n·m) baseline. Rabin-Karp with rolling hashes; KMP and its failure function (the table many texts call the prefix function); the Z-algorithm; Aho-Corasick for matching many patterns in one pass; and an introduction to suffix arrays for "all substring" queries.
Part 13 — Design the Data Structure (7 chapters) — LRU/LFU, min stacks, hit counter, trie autocomplete, Twitter feed, game state
Audience: SDE2+ candidates — "design X" is the staple senior-coding-round question type. Difficulty: Intermediate-Advanced. Total reading time: ~5 hours.
The bridge between DSA and HLD: how to compose primitives into APIs that meet a per-operation complexity contract. The full LRU treatment with concurrency and multi-tier framing; LFU via frequency-bucketed doubly-linked lists; min stacks and max-frequency stacks; hit counters and rate limiters; trie-backed autocomplete; Twitter-feed design; and a game-state design vignette.
Part 14 — Interview Framework (12 chapters) — pattern recognition, clarifying questions, communicating, complexity, mocks, per-company tracks
Audience: anyone within a month of an actual interview window. Difficulty: All levels. Total reading time: ~5 hours.
The meta-skills that turn a correct solution into a passing one. Pattern-recognition drills; the first-five-minutes clarifying-questions script; how to narrate while you code; how to discuss complexity convincingly; running productive mock interviews; the most common pitfalls; Amazon Leadership Principles (brief and narration); the Meta AI-assisted round (format, prompting tactics, common failures); and the per-company tracks index that points at company-specific reading lists.
- Pattern recognition: the fastest skill to develop
- The first five minutes: clarifying questions
- Communicating during the interview
- Complexity discussions
- The mock interview process
- Common pitfalls
- Amazon Leadership Principles, briefly
- Amazon Leadership Principles narration
- Meta's AI round: the format
- Meta's AI round: prompting tactics
- Meta's AI round: common failures
- Per-company tracks: index and how to use
Audience: anyone deciding "which pattern fits this problem?" mid-interview. Difficulty: Intermediate. Total reading time: ~10 hours; each page is a 15-30 min decision aid.
Each page is a focused decision-comparison: a Mermaid decision tree, a "pick A vs pick B" table, archetype problems on each side, and references. Click to expand the full list.
All 37 pattern decision pages
- Recursion vs Iteration
- Memoization vs Tabulation
- Hash Map vs Sorted Map
- Array vs Linked List
- Stack vs Queue vs Deque
- Heap vs Sorted Array vs BST
- Two-Heap Median Archetypes
- k-Way Merge: Cursor Heap
- BFS vs DFS on Graphs
- Shortest-Paths Decision Tree
- Kruskal vs Prim
- Quicksort vs Mergesort vs Heapsort
- Binary Search vs Linear Scan
- Two Pointers vs Sliding Window
- Iterative vs Morris vs Recursive Traversal
- Trie vs Hash Map vs Sorted Set
- Union-Find vs DFS
- Topological Sort: Kahn vs DFS
- Greedy vs DP
- Backtracking vs DP
- Recursion vs DP
- When to Use Bit Manipulation
- BFS vs DFS on Trees
- Quickselect vs Heap for Top-K
- Sliding Window Archetypes
- Two-Pointer Archetypes
- Prefix-Sum Archetypes
- Monotonic Stack Archetypes
- Monotonic Deque Archetypes
- In-Place Linked-List Reversal
- Cyclic-Sort Archetypes
- Top-K: Heap or Quickselect
- Merge-Intervals Archetypes
- Tree-DP Archetypes
- Bitmask-DP Archetypes
- Backtracking Template & Pruning
- Sweep-Line Archetypes
Audience: anyone who's solved an "easy" problem and wants the senior-engineer-level analysis behind it. Difficulty: Intermediate. Total reading time: ~2 hours.
Five canonical interview problems treated as full editorial-style essays: every approach (brute force → optimized → optimal), the proof of correctness, the edge cases, the production framing, and the follow-up variants.
All 5 long-form editorials
- LC-001 — Two Sum (full editorial under LC-001-two-sum/)
- LC-003 — Longest Substring Without Repeating Characters (full editorial under LC-003-longest-substring-without-repeating-characters/)
- LC-005 — Longest Palindromic Substring (full editorial under LC-005-longest-palindromic-substring/)
- LC-011 — Container With Most Water (full editorial under LC-011-container-with-most-water/)
- LC-015 — 3Sum (full editorial under LC-015-3sum/)
Audience: contributors authoring or editing the website's interactive widgets. Format: YAML; rendered as interactive animations on dsa.handbook.academy.
Each widget is described as keyframe data plus narration in YAML. The website's renderer turns each spec into an interactive (play/pause/scrub/step) animation alongside the chapter prose. Click to expand the full list.
Editorial-tied widgets (5)
Pattern widgets (41)
- w-01 — Recursion call stack
- w-02 — Hash table
- w-03 — Matrix rotation
- w-04 — Binary search
- w-05 — Sorting visualizer
- w-06 — Quicksort partition
- w-07 — Quickselect
- w-08 — Two-pointer 3Sum
- w-09 — Sliding window expansion
- w-10 — Prefix-sum cumulative
- w-11 — Monotonic stack
- w-12 — Amortized queue via stacks
- w-12 — Monotonic deque
- w-13 — Linked-list pointer rewiring
- w-14 — Floyd cycle
- w-15 — LRU cache
- w-16 — Tree traversal animator
- w-17 — Morris thread
- w-18 — Heap operations
- w-19 — BST rotations
- w-20 — Trie
- w-21 — Segment tree
- w-22 — Backtracking tree
- w-23 — N-Queens
- w-24 — Sudoku grid
- w-25 — Graph BFS
- w-26 — Graph DFS
- w-27 — Topological sort (Kahn)
- w-28 — Union-Find
- w-29 — Dijkstra
- w-30 — MST: Kruskal & Prim
- w-31 — DP table fill
- w-32 — Knapsack fill
- w-33 — LIS via patience
- w-34 — Bitmask DP
- w-35 — Prefix-sum + hash combo
- w-36 — Interval scheduling
- w-37 — Bellman-Ford
- w-38 — KMP
- w-39 — Huffman encoding
- w-40 — Z-array
Widgets are catalogued in content/dsa/widgets/_widget-registry.yml; see content/dsa/widgets/README.md for authoring conventions.
You don't need one — pick any chapter and start reading. These are here if you prefer a pre-built route, typically because you're preparing for a specific interview window or filling a specific gap.
Goal: pass a 45-60 minute system-design screen at a mid-to-senior level. Roughly 8-10 hours of reading per week.
| Week | Focus | Chapters |
|---|---|---|
| 1 | Foundations | Part 0 + Part 1 (12 chapters, ~7 hrs) |
| 2 | Building blocks I | Part 2 chapters 0-7: load balancers, proxies, CDN, cache, SQL, NoSQL, partitioning, replication |
| 3 | Building blocks II | Part 2 chapters 8-15: queues, pub/sub, real-time, rate limiting, service mesh, blob storage, geo, edge |
| 4 | Case studies (core) | Pick 5 from Part 8 chapters 0-9: URL shortener, rate limiter, chat, feed, web crawler, autocomplete |
| 5 | Case studies (your target) | Pick 5 more from Part 8 relevant to your target company (see Company-Specific Flavors) |
| 6 | Interview mechanics | Part 11 + top 5 most-cited pages in Trade-offs Library |
Goal: operate at Senior level, own cross-team architecture, pass loops at Senior+ bars.
| Phase | Focus | Duration |
|---|---|---|
| 1 | Foundations + building blocks | 3 weeks — Parts 0-2 (28 chapters) |
| 2 | Distributed theory + data + architecture | 4 weeks — Parts 3, 4, 5 (32 chapters) |
| 3 | Case studies (all 56) | 4 weeks — Part 8 |
| 4 | Reliability, security, AI, frontier, interview | 2 weeks — Parts 6, 7, 9, 10, 11 + Trade-offs Library |
Read everything in order. Use the end-of-chapter questions for active recall. Averaging one 25-minute chapter per day gets you through the whole thing in about 6 months, with buffer for harder chapters and re-reading.
If you already know the fundamentals and want to become fluent specifically in AI-systems design:
| Week | Focus | Chapters |
|---|---|---|
| 1 | LLM serving + RAG | Part 9 chapters 0-3 |
| 2 | Agents + evaluation | Part 9 chapters 3-9 |
| 3 | AI case studies | Part 8 chapters 30-37 (ChatGPT, RAG, coding agent, Perplexity, voice, moderation, semantic cache, model router) |
| 4 | ML fundamentals | Part 9 chapters 9-14 + Recommendation System case study |
If you have a loop on Monday:
- Saturday morning: How to Approach a System Design Question + Back-of-the-Envelope Estimation + Requirements Scoping + Diagramming Skills
- Saturday afternoon: 3 case studies in your domain
- Sunday morning: 3 more case studies + Trade-off Articulation
- Sunday evening: Company-Specific Interview Flavors for your target
| Metric | HLD Handbook | DSA Handbook | Repo total |
|---|---|---|---|
| Parts | 12 + Trade-offs Library | 15 | 27 + library |
| Teaching chapters | 159 | 120 | 279 |
| Decision/pattern pages | 22 trade-offs | 37 patterns | 59 |
| Long-form editorials | — | 5 | 5 |
| Interactive widget specs | — | 46 | 46 |
| Sibling code samples (per language) | — | 155 problems × 4 langs ≈ 620 files | 620 |
| Total Markdown pages | 181 | 162 | 343 |
| Total words | ~773,000 | ~470,000 | ~1,240,000 |
| Mermaid diagrams | 719 | 226 | 945 |
| Citations to primary sources | 3,100+ | hundreds | 3,500+ |
| Estimated total reading time | ~110 hours | ~60 hours | ~170 hours |
| Equivalent printed book page count | ~2,400 pages | ~1,500 pages | ~3,900 pages |
For comparison, the HLD book alone is longer than Designing Data-Intensive Applications (~600 pages) + Alex Xu's System Design Interview Vol. 1 (~300 pages) + Vol. 2 (~340 pages) combined. The DSA book is comparable in length to Cracking the Coding Interview (~700 pages) plus Elements of Programming Interviews (~480 pages).
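The repo totals in the table can be re-derived from the per-book figures. The snippet below is only a sanity check written for this README; the dictionary layout and names are assumptions, and the real counts come from the repository's own stats script.

```python
# Figures copied from the statistics table above.
hld = {"chapters": 159, "decision_pages": 22, "editorials": 0,
       "words": 773_000, "diagrams": 719}
dsa = {"chapters": 120, "decision_pages": 37, "editorials": 5,
       "words": 470_000, "diagrams": 226}


def pages(book):
    """Markdown pages = teaching chapters + decision/pattern pages + editorials."""
    return book["chapters"] + book["decision_pages"] + book["editorials"]


totals = {
    "pages": pages(hld) + pages(dsa),                  # 181 + 162
    "words": hld["words"] + dsa["words"],              # rounds to ~1,240,000
    "diagrams": hld["diagrams"] + dsa["diagrams"],
    "solution_files": 155 * 4,                         # 155 problems in 4 languages
}
```

Under these inputs `totals` comes out to 343 pages, 1,243,000 words, 945 diagrams, and 620 solution files, matching the table after rounding.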
Every chapter in this repository passes 7 automated CI checks on every PR. The validators run against both books and dispatch on a per-book schema:
- markdownlint — Markdown style and structure conformance.
- typos — source-code spell-check with a project-specific allowlist for technical terms.
- Citation integrity — every `[^1]`-style footnote has a corresponding `[^1]: source` definition; every citation is a real URL; no orphan citations.
- Frontmatter validation — HLD chapters declare `title`, `difficulty`, `prerequisites`, `date_created`, `date_updated`, `reading_time_minutes`, `tags` (canonical taxonomy), and `technologies` (curated allowlist). DSA chapters declare `title`, `slug`, `part`, `chapter`, `difficulty`, `languages` (subset of `python`/`java`/`cpp`/`go`), `canonical_test`, `widgets`, and `ladder` (referencing `LC-NNN` IDs in `_problem-registry.yml`). Editorials and pattern decision pages have their own thinner schemas. See `scripts/check-frontmatter.mjs` for the per-book branches.
- Mermaid diagram validation — all 945 diagrams across both books must parse with `@mermaid-js/mermaid-cli` on CI so broken syntax doesn't ship.
- Vale — prose-style linter for voice, passive voice, weasel words, and banned phrases. Custom rules in `.vale.ini`.
- lychee — external link checker run weekly; flags rotted URLs so citations stay valid.
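To make the citation-integrity rule concrete, here is a rough Python sketch of the idea: collect footnote references and definitions, then report the mismatches. It is not the repository's validator (that lives in `scripts/check-citations.mjs`); the regexes and the `check_footnotes` name are assumptions, and a reference immediately followed by a colon in running prose would be missed.

```python
import re

# A definition is a footnote label at the start of a line followed by a colon.
DEFINITION = re.compile(r"^\[\^([^\]]+)\]:", re.MULTILINE)
# A reference is any other footnote label (not followed by a colon).
REFERENCE = re.compile(r"\[\^([^\]]+)\](?!:)")


def check_footnotes(markdown: str) -> dict:
    """Return footnote labels that are referenced but undefined, and defined but unused."""
    defined = set(DEFINITION.findall(markdown))
    referenced = set(REFERENCE.findall(markdown))
    return {
        "missing_definitions": sorted(referenced - defined),
        "orphan_definitions": sorted(defined - referenced),
    }


sample = "A claim.[^1] Another claim.[^2]\n\n[^1]: source A\n[^3]: source C\n"
report = check_footnotes(sample)
```

For the sample text, `report` flags `2` as missing a definition and `3` as an orphan.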
CI configuration lives in .github/workflows/content-ci.yml. Validator scripts are in scripts/.
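The per-book frontmatter check can be pictured as a required-field lookup keyed by book. The Python below is only an illustrative sketch built from the field names listed above; the real validator is the JavaScript in `scripts/check-frontmatter.mjs`, and the function name, error strings, and sample values here are hypothetical.

```python
# Field names come from the list above; everything else is illustrative.
REQUIRED = {
    "hld": {"title", "difficulty", "prerequisites", "date_created",
            "date_updated", "reading_time_minutes", "tags", "technologies"},
    "dsa": {"title", "slug", "part", "chapter", "difficulty",
            "languages", "canonical_test", "widgets", "ladder"},
}
ALLOWED_LANGUAGES = {"python", "java", "cpp", "go"}


def validate_frontmatter(book: str, frontmatter: dict) -> list:
    """Return a list of problems; an empty list means the chapter passes this sketch."""
    errors = [f"missing field: {name}"
              for name in sorted(REQUIRED[book] - set(frontmatter))]
    if book == "dsa":
        unknown = set(frontmatter.get("languages", [])) - ALLOWED_LANGUAGES
        errors += [f"unknown language: {lang}" for lang in sorted(unknown)]
    return errors


draft = {"title": "x", "slug": "x", "part": 2, "chapter": 1,
         "difficulty": "Beginner", "languages": ["python", "rust"],
         "canonical_test": "t", "widgets": []}
problems = validate_frontmatter("dsa", draft)
```

Here `problems` reports the missing `ladder` field and the unsupported `rust` entry. This sketch does not cover the taxonomy, allowlist, or `LC-NNN` registry checks that the description above also mentions.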
Beyond automation, every chapter is reviewed for:
- Internal consistency (terminology matches Part 1 definitions)
- Difficulty calibration (a "Beginner" chapter doesn't assume Staff-level context)
- Diagram quality (Mermaid, not screenshots; captioned; accessibility-tagged)
- Citation quality (primary sources preferred over summarizing blog posts)
Contributions of all sizes are welcome, from a typo fix to a full chapter. Read CONTRIBUTING.md for the full workflow. A short summary:
| Time | What you can do | Issue required? |
|---|---|---|
| 5 min | Fix a typo or dead link | No |
| 15 min | Add a missing citation or update an out-of-date number | No |
| 30 min | Add a real-world example or clarify a confusing paragraph | No |
| 1 hour | Create a Mermaid diagram for an existing chapter | Optional |
| 2-4 hours | Review a chapter for technical accuracy and leave feedback | Yes |
| 4-8 hours | Write a full chapter from an outline | Yes, required |
| Translator | Translate one chapter or the full handbook into your language | Yes |
- Read the STYLE_GUIDE.md for voice, structure, diagram conventions, and citation format.
- Open an issue first for anything bigger than 30 minutes of work — this ensures you don't duplicate in-flight work.
- Fork, branch, edit. Use descriptive branch names like `fix/raft-quorum-math` or `add/mcp-protocol-chapter`.
- Run validators locally (optional): `npm install`, then `npm run check:all`. If you skip this, CI will run them on your PR anyway and tell you what to fix.
- Submit a pull request using the PR template. Small fix? A one-sentence description is fine. Full chapter? Describe the pedagogical approach and list your primary sources.
- Respond to review. A maintainer will review within 7 days (usually faster). For content chapters, expect at least one round of technical review.
- Correctness. If you're citing a number, cite the primary source. If you're claiming a property (like "Raft guarantees linearizability"), cite the paper.
- Clarity. Write for the difficulty tier declared in the chapter's frontmatter. Don't introduce Part-7 concepts in a Part-0 chapter.
- Opinion with evidence. If you think the chapter should recommend something different, make the case with citations, not vibes.
- Pedagogical structure. Intro → first principles → diagram → worked example → trade-offs → production gotchas → references. Deviate only when the topic genuinely demands it.
If you have operated one of the systems we cover in Part 8 (payment processing, real-time chat, feeds, video streaming, etc.), your review is worth more than a month of solo research. Open an issue with your expertise area and which chapter you'd like to review.
- Security issues in validator scripts or CI: open a private security advisory.
- Code of conduct concerns: email hello@handbook.academy.
- Legal or licensing questions: email hello@handbook.academy.
- Direct contact: @invincible04.
engineering-handbook/
├── content/ # Two open-source books (CC BY-SA 4.0)
│ ├── hld/ # HLD Handbook — 181 pages
│ │ ├── part-0-prerequisites/ # 5 chapters
│ │ ├── part-1-core-fundamentals/ # 7 chapters
│ │ ├── part-2-building-blocks/ # 16 chapters
│ │ ├── part-3-distributed-systems-theory/ # 11 chapters
│ │ ├── part-4-data-systems/ # 10 chapters
│ │ ├── part-5-architecture-patterns/ # 11 chapters
│ │ ├── part-6-reliability-and-operations/ # 11 chapters
│ │ ├── part-7-security-at-scale/ # 10 chapters
│ │ ├── part-8-case-studies/ # 56 chapters
│ │ ├── part-9-ai-ml-system-design/ # 15 chapters
│ │ ├── part-10-emerging-patterns/ # 1 chapter
│ │ ├── part-11-interview-framework/ # 6 chapters
│ │ └── trade-offs/ # 22 decision pages
│ └── dsa/ # DSA Handbook — 120 chapters
│ ├── part-0-foundations/ # 7 chapters
│ ├── part-1-linear-data-structures/ # 8 chapters
│ ├── part-2-search-sort/ # 8 chapters
│ ├── part-3-pointers-window-prefix/ # 6 chapters
│ ├── part-4-stack-queue-patterns/ # 5 chapters
│ ├── part-5-linked-lists/ # 6 chapters
│ ├── part-6-trees-heaps/ # 11 chapters
│ ├── part-7-recursion-backtracking/ # 7 chapters
│ ├── part-8-graphs/ # 13 chapters
│ ├── part-9-dynamic-programming/ # 15 chapters
│ ├── part-10-greedy/ # 5 chapters
│ ├── part-11-bit-manipulation/ # 4 chapters
│ ├── part-12-strings-pattern-matching/ # 6 chapters
│ ├── part-13-design-the-data-structure/ # 7 chapters
│ ├── part-14-interview-framework/ # 12 chapters
│ ├── editorials/ # Per-LC long-form editorials
│ ├── patterns/ # Pattern decision references
│ ├── widgets/ # YAML specs for interactive widgets
│ ├── _problem-registry.yml # Canonical LC-NNN registry
│ └── _widget-registry.yml # Canonical w-NN/e-LCNNN registry
│
├── writing-guides/ # Author handbooks (CC BY-SA 4.0)
│ ├── case-study-template.md # Skeleton for Part 8 case studies
│ └── trade-off-template.md # Skeleton for trade-off decision pages
│
├── scripts/ # Content validators used by CI
│ ├── check-citations.mjs # Footnote integrity
│ ├── check-frontmatter.mjs # YAML frontmatter schema (per-book)
│ ├── check-mermaid.mjs # Mermaid syntax validator
│ ├── content-stats.mjs # Word / diagram / citation counts
│ ├── technologies.json # Curated technology name taxonomy (HLD)
│ └── update-frontmatter-dates.mjs # Bulk-touch date_updated on edited files
│
├── .github/
│ ├── workflows/
│ │ ├── content-ci.yml # PR validation pipeline (both books)
│ │ └── stale.yml # Issue/PR hygiene
│ ├── ISSUE_TEMPLATE/ # Correction / Request / Writing templates
│ ├── PULL_REQUEST_TEMPLATE.md
│ ├── CODEOWNERS
│ ├── FUNDING.yml
│ ├── dependabot.yml
│ └── release.yml
│
├── STYLE_GUIDE.md # Voice, structure, diagram conventions
├── CONTRIBUTING.md # Full contribution workflow
├── CODE_OF_CONDUCT.md # Contributor Covenant v2.1
├── CONTRIBUTORS.md # All-contributors list
├── SECURITY.md # Security disclosure process
├── CITATION.cff # Citation metadata (Zenodo, academic tools)
├── LICENSE # CC BY-SA 4.0 (everything in this repo)
├── package.json # Dev dependencies for validators
├── lychee.toml # Link-check configuration
├── .vale.ini # Prose style rules
├── .markdownlint.json # Markdown style rules
└── .typos.toml # Spell-check allowlist
You don't need anything installed to contribute content — edit Markdown, push, let CI validate. But if you want to run the checks locally before pushing:
- Node.js 20+ (the `.nvmrc` file pins the exact version)
- npm (ships with Node)
- Optional: Vale for prose-style linting
- Optional: lychee for link checking
git clone https://github.com/handbook-academy/engineering-handbook.git
cd engineering-handbook
nvm use # switches to the Node version pinned in .nvmrc
npm install # installs markdownlint, @mermaid-js/mermaid-cli, etc.
# Run all 7 content checks (same as CI)
npm run check:all
# Individual checks
npm run lint # markdownlint
npm run check:citations # footnote integrity
npm run check:frontmatter # YAML schema
npm run check:mermaid # Mermaid diagram syntax
npm run check:typos # spell-check
npm run check:links # lychee link check (slow; runs weekly)
# Optional
npm run prose # Vale prose-style linter (requires Vale installed)
npm run stats # word / diagram / citation counts
Use Markdownlab (source) for editing chapters. It's a browser-based Markdown editor with live preview, Mermaid rendering, and the same callouts/footnotes/admonitions this handbook uses — no install, no setup.
Open any chapter directly in the editor by prefixing its GitHub URL with https://markdownlab.vercel.app/:
https://markdownlab.vercel.app/https://github.com/handbook-academy/engineering-handbook/blob/main/content/hld/part-1-core-fundamentals/00-scalability.md
That URL opens the chapter in Markdownlab pre-loaded and ready to edit. Paste your edits back into a fork → branch → PR.
The repo includes .editorconfig, .nvmrc, .markdownlint.json, .vale.ini, and .typos.toml so any editor that reads them picks up project settings automatically.
Benevolent maintainer model. The project is currently solo-maintained by @invincible04. Major architectural decisions (curriculum scope, licensing, CI strategy) are made by the maintainer after consultation with active contributors via Issues/Discussions.
Decision process:
- Typo / dead link / factual correction: any maintainer can merge.
- New chapter or major rewrite: requires an issue with pedagogical rationale and at least one maintainer approval.
- Curriculum scope change (new part, renamed part, reordering): requires a Discussion thread open for at least 7 days before merge.
- License change: requires consent from all contributors; CC BY-SA 4.0 for everything in this repository is a stable commitment.
Becoming a maintainer. Sustained, high-quality contributions over 3+ months can result in a maintainer invitation. Maintainership includes merge rights on your areas of expertise and a vote on curriculum-scope decisions.
See CODEOWNERS for the current maintainer routing.
Is this really free?
Yes, and specifically: everything in this repository — both books' content, writing guides, style guide, validator scripts — is released under CC BY-SA 4.0. Anyone (including us) can share and adapt it, but adaptations must carry the same license. That means the chapter text cannot be paywalled by anybody — not by us, not by a future company, not by anyone who forks this repo. You can always read both books free at handbook.academy.
Can I use this to build a paid course?
Yes. CC BY-SA 4.0 allows commercial use provided you:
- Attribute the original ("Adapted from The Engineering Handbook, CC BY-SA 4.0, https://github.com/handbook-academy/engineering-handbook")
- License your derivative under the same CC BY-SA 4.0 terms
- Indicate what you changed
You can teach courses, run boot camps, build YouTube channels, or sell books derived from this content — under share-alike. What you can't do is take the content, strip the attribution, and relicense it as proprietary.
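The three conditions above can be folded into one notice that travels with a derivative work. The sketch below is one possible wording, not an official or required format: the helper name `attribution_notice` and the "Changes:" sentence layout are assumptions, while the attribution text and repository URL come from the list above.

```python
REPO_URL = "https://github.com/handbook-academy/engineering-handbook"
LICENSE_NAME = "CC BY-SA 4.0"


def attribution_notice(changes: str) -> str:
    """Build a notice covering attribution, changes made, and share-alike.

    Hypothetical helper; the sentence layout is an illustrative choice.
    """
    changes = changes.strip()
    if not changes:
        # The conditions above ask you to indicate what you changed.
        raise ValueError("describe what you changed")
    return (
        f"Adapted from The Engineering Handbook, {LICENSE_NAME}, {REPO_URL}. "
        f"Changes: {changes}. "
        f"This adaptation is licensed under {LICENSE_NAME}."
    )


print(attribution_notice("condensed HLD Part 1 into a six-week syllabus"))
```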
Can I translate it into my language?
Please do. Translations are explicitly welcomed. Open an issue saying which language you'd like to translate into and which chapters you plan to start with. Translations live in parallel directories (e.g. content-hi/ for Hindi, content-zh/ for Chinese) under the same repository, or as a linked sister repo — your call.
How is this different from `donnemartin/system-design-primer`?
system-design-primer is a link-dump README that points at scattered blog posts and talks. It's great as a discovery tool. The HLD Handbook in this repo is inline content — every topic is taught fully within these pages. They're complementary: system-design-primer for breadth of resources, this handbook for depth of teaching.
How is this different from Alex Xu's books?
Alex Xu's books are excellent and we cite them in nearly every case study. The differences:
- Scope: ~640 pages (Vols 1+2) vs ~2,400 equivalent pages of HLD here.
- Freshness: Xu Vol. 2 was published in 2022 and doesn't cover LLMs, RAG, agents, CRDTs, post-quantum crypto, MCP, etc. This handbook does.
- Format: Xu is case-study-focused. This handbook has 56 case studies plus 103 foundational HLD chapters that teach the building blocks.
- License: Xu's books are copyrighted and paid. This handbook is CC BY-SA 4.0.
- Community: Xu's books are one-author; this handbook accepts PRs.
- Bonus: this repo also includes a 120-chapter DSA curriculum.
How is the DSA book different from NeetCode / LeetCode editorials / "Cracking the Coding Interview"?
- Structured by pattern, not by problem. Every DSA chapter is a teaching chapter on a structure or pattern (e.g. monotonic stack, sliding-window-variable, segment tree). LeetCode problems show up as worked examples inside the chapter, not as the unit of study.
- Four languages, side by side. Each canonical problem ships with sol.py, sol.java, sol.cpp, and sol.go siblings. You read the chapter in Python, then click through to your interview language without re-deriving the logic.
- 37 pattern decision pages. When a problem could be solved by recursion or iteration, BFS or DFS, sliding window or prefix sum, there's a pattern decision page that compares them and tells you which to reach for first. This is the part NeetCode skips.
- Interactive widgets. On dsa.handbook.academy, the relevant chapters render live, animated visualisations for sliding windows, monotonic stacks, Morris traversal, quickselect partitioning, and so on. Useful when the words on the page aren't enough.
- Open and editable. CTCI is paywalled and frozen at one author's voice. This is CC BY-SA 4.0 and accepts PRs.
Is this the same as the websites at hld.handbook.academy and dsa.handbook.academy?
The content is identical. The websites add a polished reading UI: full-text search, dark mode, syntax-highlighted code blocks, per-chapter diagram zoom, social cards, OG images, fast client-side navigation, and — on the DSA site — the live interactive widgets. Same content, better reading experience — and still free, with no sign-up.
Why don't you include an "Awesome" list of external resources?
We cite primary sources inline where they're relevant. A separate "awesome" list would duplicate that work and go stale faster. If a resource is worth reading, it's cited in a chapter's References section.
I want to submit a chapter on [topic X]. How?
Open an issue first, using the writing template, that explains:
- Which part this chapter belongs in
- Why it belongs (not already covered? modern emerging topic? missing from Part X?)
- Proposed outline (1-2 paragraphs)
- Your primary sources
A maintainer will respond with scope feedback within a week. Once approved, fork → draft → PR. For Part 8 chapters use writing-guides/case-study-template.md; for trade-off pages use writing-guides/trade-off-template.md. All other chapters follow the skeleton documented in STYLE_GUIDE.md.
How do I cite this in an academic paper?
Use the BibTeX entry in the Citation section below, or the CITATION.cff file (which academic tools like Zotero and Zenodo can parse directly).
Everything in this repository — the 343 chapters and decision pages across both books, writing guides, style guide, validator scripts, templates, and configuration — is licensed under Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0).
What CC BY-SA 4.0 means in practice:
You can:
- Read it, share it, print it, screenshot it — forever.
- Build courses, boot camps, YouTube channels, or books derived from it.
- Translate it into any language.
- Use it commercially.
You must:
- Attribute the original (link back to this repo).
- Release derivatives under the same CC BY-SA 4.0 license (share-alike).
- Not relicense the content as proprietary or impose additional restrictions.
Nobody can paywall the chapter text itself under CC BY-SA 4.0. You can build anything on top of it, as long as the text itself stays open. Read both handbooks free at handbook.academy — no sign-up, no paywall, ever.
If you reference this project in academic work, blog posts, books, or courses:
@misc{engineering-handbook,
author = {Soni, Aayush},
title = {The Engineering Handbook: Open-Source Engineering Curricula (HLD + DSA)},
year = {2026},
publisher = {GitHub},
url = {https://github.com/handbook-academy/engineering-handbook},
note = {HLD: 159 chapters + 22 trade-off pages, ~773K words. DSA: 120 chapters across 15 parts, ~470K words, 155 LeetCode problems with 4-language solutions. CC BY-SA 4.0.}
}
"The Engineering Handbook (HLD + DSA)" by Aayush Soni, https://github.com/handbook-academy/engineering-handbook, CC BY-SA 4.0.
The CITATION.cff file in this repository is a Citation File Format descriptor. GitHub's "Cite this repository" button, Zenodo, Zotero, and academic reference managers can parse it directly.
This project stands on the shoulders of many, and we cite primary sources liberally throughout. A few that shaped the handbook most:
Canonical books — HLD:
- Martin Kleppmann — Designing Data-Intensive Applications is the foundation underneath most of HLD Parts 3-4.
- Alex Xu — System Design Interview Volumes 1 and 2 defined the interview case-study format we build on in HLD Part 8.
- Chip Huyen — Designing Machine Learning Systems and AI Engineering shape HLD Part 9.
- Betsy Beyer et al. (Google) — Site Reliability Engineering and The SRE Workbook are behind HLD Part 6.
- Mark Richards & Neal Ford — Fundamentals of Software Architecture and Software Architecture: The Hard Parts for HLD Part 5.
Canonical books — DSA:
- Cormen, Leiserson, Rivest, Stein (CLRS) — Introduction to Algorithms is the bedrock for proofs, complexity arguments, and the algorithms in DSA Parts 2, 8, 9, and 12.
- Robert Sedgewick & Kevin Wayne — Algorithms, 4th Edition shapes the data-structure exposition in DSA Parts 1, 5, 6, and 8.
- Jon Bentley — Programming Pearls is the spirit behind the prompt cards.
- Steven & Felix Halim — Competitive Programming for the contest-driven techniques in DSA Parts 9, 11, and 12.
- Adnan Aziz, Tsung-Hsien Lee, Amit Prakash — Elements of Programming Interviews for the problem-shaping that influenced the DSA prompt cards.
- Gayle Laakmann McDowell — Cracking the Coding Interview for setting the bar on interview-style explanations.
Open-source references:
- Donne Martin — system-design-primer.
- NeetCode — neetcode.io and the NeetCode 150 problem list for the DSA practice ladder template.
- LeetCode community — for the canonical problem corpus and the editorial conventions we adapt.
- The Kubernetes, CNCF, and OpenTelemetry communities — for cloud-native and observability standards.
- The Apache projects (Kafka, Cassandra, Hadoop, Spark, Flink) — for foundational distributed-systems code and docs.
Papers, RFCs, and postmortems:
- SIGMOD / VLDB / OSDI / SOSP / NSDI papers — the bedrock of distributed systems.
- IETF / W3C / IANA — for networking and web standards.
- Every engineer who has written a public postmortem. Real outages taught us everything about graceful degradation, blast radius, and the limits of testing.
We teach the same concepts in our own words, with our own diagrams, and we cite the originals.
- GitHub Discussions — design questions, feedback, sharing what you're building.
- GitHub Issues — corrections, content requests, chapter proposals.
- Email — hello@handbook.academy for private or sensitive matters.
- Maintainer — @invincible04.
If you find this useful, star the repo. Stars help signal to new contributors that the project is worth their time.
HLD: Networking Fundamentals · Scalability · URL Shortener Case Study · Trade-offs Library
DSA: Arrays · Two pointers · Sliding window (variable) · Patterns
Or skim the full HLD curriculum or the full DSA curriculum above.