I got tired of answering "which agent framework should we use?" with "it depends" and then spending an hour qualifying that, so I went through 44 of them and wrote down what I found. Saved me from ever having that conversation again, hopefully. Probably not.
February 2026. This is a snapshot. The field moves fast enough that some of this will be wrong by the time you read it. Check the dates on individual files and adjust your expectations accordingly.
Not an "awesome list". Those exist, they're fine for discovery, but they don't help you choose. This is more like: I looked at each framework in enough depth to form an opinion, and then I wrote the opinion down. With some kind of evidence, usually.
I split them up into three tiers based on how deep I went:
- Tier 1 (11 frameworks): 3,000+ word analyses. Architecture, context handling, tradeoffs, failure modes, code examples. The stuff you'd actually need to make a decision.
- Tier 2 (16 frameworks): 1,000–1,500 words. The important bits: when to use it, when not to.
- Tier 3 (17 frameworks): 100–200 words. What it is, whether it matters.
One consistent dimension across all of them: context engineering. How much does this framework actually help you manage what goes into the model's context window? The answer is almost always "less than you'd hope," but the specifics differ in interesting ways. If context engineering as a discipline is what you're after, that's what contextpatterns.com is for.
The detailed analysis files (architecture breakdowns, code examples, comparison tables) were produced with heavy AI assistance. I directed the research, verified the findings, and used the frameworks myself, but the structured reference material was not written by hand, and it reads like it. The editorial voice lives in this README, the synthesis, and the quoted notes at the top of each analysis file. The rest is research output that I've reviewed in varying detail for accuracy but not rewritten for personality. Seemed more honest to say that than to pretend otherwise; it's not like you can't tell anyway. Celebrate the em dashes!
| Framework | Best For | Key Differentiator |
|---|---|---|
| LangChain / LangGraph | Production agents, large ecosystem | 127K stars, 600+ integrations, graph orchestration |
| CrewAI | Multi-agent collaboration | Role-based agents, Flows for deterministic control |
| AutoGen (Microsoft) | Enterprise, Microsoft stack | Dual API (AgentChat + Core), distributed runtimes |
| Letta (MemGPT) | Long-running conversations | Hierarchical memory: core / recall / archival |
| Vercel AI SDK | TypeScript / web apps | Best-in-class streaming, React hooks |
| Pydantic AI | Type-safe Python agents | Full Pydantic v2 validation, Logfire observability |
| OpenAI Agents SDK | OpenAI ecosystem | Handoffs, guardrails, fastest time-to-agent |
| Anthropic / Claude Code | Computer use, coding agents | Pattern-based (not a framework), 200K context |
| Mastra | TypeScript agents with memory | Working memory + semantic recall, Next.js native |
| Haystack (deepset) | Enterprise-scale RAG | Pipeline architecture, Elasticsearch integration |
| Google ADK / Genkit | Google Cloud / Gemini | A2A protocol, Vertex AI, built-in tracing |
| Framework | Focus |
|---|---|
| AG2 | AutoGen community fork |
| Agno | Lightweight Python agents (formerly Phidata) |
| AutoGen Studio | Visual no-code AutoGen UI |
| AutoGPT | Early autonomous agent, mostly historical |
| DSPy | Prompt optimization via programming |
| Instructor | Structured LLM output (Pydantic wrapper) |
| LangFlow | Visual LangChain pipeline builder |
| LlamaIndex | RAG and data retrieval |
| Mirascope | Lightweight, type-safe Python |
| n8n | Workflow automation with AI nodes |
| OpenHands | Open-source coding agent (formerly OpenDevin) |
| Open Interpreter | Local code execution agent |
| Phidata | Production agents with built-in memory |
| Pi | Minimal context-focused coding agent |
| Semantic Kernel | Microsoft enterprise, C# / Python |
| Smolagents | HuggingFace, local models, code execution |
| File | What It Contains |
|---|---|
| comparisons.md | 5 comparison matrices: features, context engineering, provider lock-in, production readiness, DX |
| synthesis.md | Cross-framework patterns, ecosystem trends, consolidation analysis, practitioner recommendations |
The context engineering gap was the big one. Every framework will tell you how many tokens are in your context window. None of them will tell you whether those tokens are actually helping. No quality monitoring, no proactive compression when context starts degrading, no real isolation between sub-agents. You get a number, and figuring out what to do with it is your problem.
I was surprised at first, but after going through twenty-something frameworks it started making sense. Context management is genuinely hard, and most of these projects are already struggling with orchestration, memory, and tooling. Context quality is apparently next year's problem. Every year.
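To make the gap concrete, here's a rough sketch; it's not any framework's actual API. tiktoken and the message shape are just stand-ins, and it assumes a recent tiktoken that knows the model name. The first function is the part you get everywhere; the second is the part you're left to invent.

```python
# Sketch of the gap: counting tokens is solved, judging them is not.
import tiktoken

def count_tokens(messages: list[dict], model: str = "gpt-4o") -> int:
    """The part every framework gives you: a number."""
    enc = tiktoken.encoding_for_model(model)
    return sum(len(enc.encode(m["content"])) for m in messages)

def context_is_still_useful(messages: list[dict]) -> bool:
    """The part nobody gives you: whether those tokens are helping.
    You'd have to invent your own heuristic (relevance scoring,
    staleness checks, answer-quality probes)."""
    raise NotImplementedError("this is the universal gap")

history = [{"role": "user", "content": "Summarize the Q3 report."}]
print(count_tokens(history))          # works everywhere
# context_is_still_useful(history)    # works nowhere
```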
A lot of these frameworks exist. Not all of them will in a year. Four are pulling ahead far enough that the gap is already meaningful: LangChain/LangGraph on ecosystem breadth, CrewAI for multi-agent, Vercel AI SDK for TypeScript/web, Pydantic AI for type-safe Python. Everything else is either very niche, early, or quietly losing contributors.
AutoGen is the interesting case. Microsoft seems to be heading in a slightly different direction with each release, which is not a great sign if you're trying to build on top of it. Worth watching, wouldn't build anything critical on it right now.
Even frameworks that didn't start with graph-based execution are adding it now. LangGraph pushed this pattern into the mainstream, and it's become the expected baseline. If a framework doesn't support graph-based control flow in some form, that's a meaningful limitation.
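For reference, this is roughly what the pattern looks like in LangGraph. The node names and the toy routing logic are mine, so treat it as a sketch rather than a template, and check the current API before copying it.

```python
# Minimal graph-based control flow: write, review, loop until approved.
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class State(TypedDict):
    draft: str
    approved: bool

def write(state: State) -> dict:
    # A real node would call the model; return only the keys you update.
    return {"draft": "first attempt"}

def review(state: State) -> dict:
    # A real reviewer would call a model or apply a rubric.
    return {"approved": len(state["draft"]) > 0}

graph = StateGraph(State)
graph.add_node("write", write)
graph.add_node("review", review)
graph.add_edge(START, "write")
graph.add_edge("write", "review")
# Route back to "write" until the reviewer approves, then stop.
graph.add_conditional_edges("review", lambda s: END if s["approved"] else "write")

app = graph.compile()
print(app.invoke({"draft": "", "approved": False}))
```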
Most frameworks treat memory as a feature you bolt on. Letta is the only one where memory is load-bearing architecture; it's the whole point. Everywhere else, if you need your agent to remember something across sessions, you're building it yourself with a vector store and duct tape. Probably more fragile than you'd like.
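The duct tape version looks something like this. Everything here is invented for illustration: the bag-of-words "embedding" stands in for a real embedding model, and `memory.jsonl` stands in for whatever store you'd actually use.

```python
# Crude cross-session memory: append turns to disk, rank them against the
# next session's query, pull back the closest few.
import json
import math
from collections import Counter
from pathlib import Path

STORE = Path("memory.jsonl")  # hypothetical persistence layer

def embed(text: str) -> Counter:
    return Counter(text.lower().split())  # stand-in for a real embedding model

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def remember(text: str) -> None:
    with STORE.open("a") as f:
        f.write(json.dumps({"text": text}) + "\n")

def recall(query: str, k: int = 3) -> list[str]:
    if not STORE.exists():
        return []
    rows = [json.loads(line) for line in STORE.read_text().splitlines()]
    ranked = sorted(rows, key=lambda r: cosine(embed(query), embed(r["text"])), reverse=True)
    return [r["text"] for r in ranked[:k]]

remember("User prefers answers in French.")
print(recall("What language should replies be in?"))
```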
Pydantic AI grew quickly, and I think it's because type errors in agent systems are genuinely miserable to debug at runtime. You're chasing phantom bugs through LLM output that looked right but wasn't, and you don't find out until three tool calls later. Frameworks that catch this earlier save real time. Not everyone needs it, but once you've been burned by it in production, it starts feeling a lot less optional.
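Here's the failure mode in miniature, using plain Pydantic v2 rather than Pydantic AI itself (which wires this kind of validation into the agent loop). The schema and the hardcoded "LLM output" are made up.

```python
# Validate structured LLM output at the boundary instead of discovering the
# problem several tool calls downstream.
from pydantic import BaseModel, ValidationError

class Refund(BaseModel):
    order_id: int
    amount: float
    reason: str

# Looks plausible, isn't valid: order_id is not an int, amount is not a float.
llm_output = '{"order_id": "ORD-123", "amount": "full", "reason": "damaged"}'

try:
    refund = Refund.model_validate_json(llm_output)
except ValidationError as e:
    # Caught here, at parse time, rather than when "full" hits your payments API.
    print(e)
```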
| Use Case | Primary | Runner-up | Key Tradeoff |
|---|---|---|---|
| Production chatbot with memory | Pydantic AI + Letta | LangChain + LangGraph | Type safety vs. ecosystem breadth |
| Multi-agent research system | CrewAI | LangGraph | Autonomy vs. control |
| Enterprise workflow automation | LangChain + LangSmith | Google ADK (if GCP) | Vendor support vs. cloud lock-in |
| TypeScript / Next.js project | Vercel AI SDK | Mastra | Streaming/UI focus vs. orchestration power |
| Quick prototype | OpenAI Agents SDK | Vercel AI SDK | Speed vs. flexibility |
| Code generation agent | Claude Code (patterns) | Smolagents | Product polish vs. customization |
| Enterprise RAG pipeline | Haystack | LlamaIndex | Pipeline power vs. community size |
Each Tier 1 analysis maps the framework against 8 context engineering patterns. Here's where the ecosystem stands:
| Pattern | Frameworks with strong support |
|---|---|
| Select, Don't Dump | Haystack, LlamaIndex, LangChain (via retrievers) |
| Write Outside the Window | Letta, Mastra, LangChain, AutoGen |
| Compress & Restart | Letta (automatic), LangChain (ConversationSummaryMemory) |
| Recursive Delegation | CrewAI, LangGraph, OpenAI Agents SDK (handoffs) |
| Progressive Disclosure | Letta, Mastra, Haystack |
| Isolate | LangGraph (subgraphs), limited elsewhere |
| The Pyramid | LangChain (PromptTemplate), manual elsewhere |
| Context Rot Awareness | Nobody. Universal gap. |
That last row is the telling one.
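For what it's worth, even a crude version of that missing check isn't hard to sketch. Everything below (thresholds, message shape, the chars-to-tokens ratio) is invented for illustration, which is sort of the point: no framework ships even this much.

```python
# Invented heuristic for "Context Rot Awareness": flag the window when it's
# over budget or dominated by tool output.
def _approx_tokens(m: dict) -> int:
    return len(m["content"]) // 4  # rough heuristic: ~4 chars per token

def context_looks_rotten(messages: list[dict], budget: int = 100_000) -> bool:
    total = sum(_approx_tokens(m) for m in messages)
    tool_bulk = sum(_approx_tokens(m) for m in messages if m["role"] == "tool")
    return total > budget or (total > 0 and tool_bulk / total > 0.5)

history = [{"role": "tool", "content": "log line\n" * 2_000},
           {"role": "user", "content": "so what changed?"}]
print(context_looks_rotten(history))  # True: tool output dominates the window
```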
| Pattern | Frameworks | Best For | Watch Out For |
|---|---|---|---|
| Graph-based | LangGraph, Mastra, Google ADK | Production reliability, audit trails | Upfront graph design complexity |
| Multi-agent / role-based | CrewAI, OpenAI Agents SDK | Complex tasks, collaboration | Debugging opacity |
| Memory-first | Letta | Long-running agents, assistants | Architectural complexity |
| Pipeline-based | Haystack, LlamaIndex | RAG, knowledge retrieval | Not agent-native |
| Pattern-based (no framework) | Anthropic / Claude Code | Full control, custom architectures | You build everything |
| Tool-first / code execution | Smolagents, Open Interpreter | Local models, automation | Security (obviously) |
| Risk | Frameworks | Why |
|---|---|---|
| High | Claude Code, Google ADK | Provider-specific, hard to migrate away from |
| Medium | Letta, Semantic Kernel, AutoGen | Optimized for specific providers/ecosystems |
| Low | Pydantic AI, Vercel AI SDK, Mastra, LangChain | Clean abstractions, swap providers without rewriting |
For each framework I read the docs, looked at source code, ran examples, and read the GitHub issues. The issues are often more honest than the docs; you learn a lot about a framework from what people complain about. I checked how active the maintainers were, how they handled breaking changes, and formed an opinion.
For the Tier 1 analyses I went deeper: architecture decomposition, tracing design decisions, and deliberately trying things that should break to see how they fail.
The dimensions I applied consistently:
- Architecture and how the pieces fit together
- Context management: what's built in, what you're building yourself
- Tool system flexibility
- Multi-agent support and what it actually looks like when you use it (not what the marketing page says)
- Memory: short-term, long-term, or "we'll add that later"
- Developer experience: time to first working thing, time to first confusing error
- Production readiness: observability, error handling, retries
- Lock-in and how painful switching would be
CC BY-NC 4.0 — Share and adapt with attribution, non-commercial use only.