AI Agent Framework Analysis

I got tired of answering "which agent framework should we use?" with "it depends" and then spending an hour qualifying that, so I went through 44 of them and wrote down what I found. Saved me from ever having that conversation again, hopefully. Probably not.

February 2026. This is a snapshot. The field moves fast enough that some of this will be wrong by the time you read it. Check the dates on individual files and adjust your expectations accordingly.


What this is

Not an "awesome list". Those exist, they're fine for discovery, but they don't help you choose. This is more like: I looked at each framework in enough depth to form an opinion, and then I wrote the opinion down. With some kind of evidence, usually.

I split them up into three tiers based on how deep I went:

  • Tier 1 (11 frameworks): 3,000+ word analyses. Architecture, context handling, tradeoffs, failure modes, code examples. The stuff you'd actually need to make a decision.
  • Tier 2 (16 frameworks): 1,000–1,500 words. The important bits, when to use it, when not to.
  • Tier 3 (17 frameworks): 100–200 words. What it is, whether it matters.

One consistent dimension across all of them: context engineering. How much does this framework actually help you manage what goes into the model's context window? The answer is almost always "less than you'd hope," but the specifics differ in interesting ways. If context engineering as a discipline is what you're after, that's what contextpatterns.com is for.

A note on how this was made

The detailed analysis files (architecture breakdowns, code examples, comparison tables) were produced with heavy AI assistance. I directed the research, verified the findings, and used the frameworks myself, but the structured reference material was not written by hand, and it reads like it. The editorial voice lives in this README, the synthesis, and the quoted notes at the top of each analysis file. The rest is research output that I've reviewed in varying detail for accuracy but not rewritten for personality. It seemed more honest to say that than to pretend otherwise; it's not like you can't tell anyway. Celebrate the em-dashes!


Quick Navigation

Tier 1

| Framework | Best For | Key Differentiator |
|---|---|---|
| LangChain / LangGraph | Production agents, large ecosystem | 127K stars, 600+ integrations, graph orchestration |
| CrewAI | Multi-agent collaboration | Role-based agents, Flows for deterministic control |
| AutoGen (Microsoft) | Enterprise, Microsoft stack | Dual API (AgentChat + Core), distributed runtimes |
| Letta (MemGPT) | Long-running conversations | Hierarchical memory: core / recall / archival |
| Vercel AI SDK | TypeScript / web apps | Best-in-class streaming, React hooks |
| Pydantic AI | Type-safe Python agents | Full Pydantic v2 validation, Logfire observability |
| OpenAI Agents SDK | OpenAI ecosystem | Handoffs, guardrails, fastest time-to-agent |
| Anthropic / Claude Code | Computer use, coding agents | Pattern-based (not a framework), 200K context |
| Mastra | TypeScript agents with memory | Working memory + semantic recall, Next.js native |
| Haystack (deepset) | Enterprise-scale RAG | Pipeline architecture, Elasticsearch integration |
| Google ADK / Genkit | Google Cloud / Gemini | A2A protocol, Vertex AI, built-in tracing |

Tier 2

| Framework | Focus |
|---|---|
| AG2 | AutoGen community fork |
| Agno | Lightweight Python agents (formerly Phi) |
| AutoGen Studio | Visual no-code AutoGen UI |
| AutoGPT | Early autonomous agent, mostly historical |
| DSPy | Prompt optimization via programming |
| Instructor | Structured LLM output (Pydantic wrapper) |
| LangFlow | Visual LangChain pipeline builder |
| LlamaIndex | RAG and data retrieval |
| Mirascope | Lightweight, type-safe Python |
| n8n | Workflow automation with AI nodes |
| OpenHands | Open-source coding agent (formerly OpenDevin) |
| Open Interpreter | Local code execution agent |
| Phidata | Production agents with built-in memory |
| Pi | Minimal context-focused coding agent |
| Semantic Kernel | Microsoft enterprise, C# / Python |
| Smolagents | HuggingFace, local models, code execution |

Tier 3

BeeAgent, Atomic Agents, ControlFlow, Julep, E2B, Composio, Browser Use, Crawl4AI, Dify, Flowise, MetaGPT, BabyAGI, SuperAGI, AgentGPT, Strands (AWS), Swarms, TaskWeaver

Synthesis & Comparisons

| File | What It Contains |
|---|---|
| comparisons.md | 5 comparison matrices: features, context engineering, provider lock-in, production readiness, DX |
| synthesis.md | Cross-framework patterns, ecosystem trends, consolidation analysis, practitioner recommendations |

What I found

Nobody is doing context engineering well

This was the big one. Every framework will tell you how many tokens are in your context window. None of them will tell you whether those tokens are actually helping. No quality monitoring, no proactive compression when context starts degrading, no real isolation between sub-agents. You get a number, and figuring out what to do with it is your problem.
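
To make that concrete, here is roughly the plumbing you end up writing yourself. This is a framework-agnostic sketch, not anyone's actual API: `count_tokens` and `summarize` are placeholders for whatever tokenizer and summarization call you use, and the 70% threshold is an arbitrary policy choice.

```python
# Framework-agnostic sketch of the context plumbing most teams write themselves.
# `count_tokens` and `summarize` are placeholders for real implementations.

MAX_CONTEXT_TOKENS = 128_000
COMPRESS_AT = 0.7  # start compressing well before the hard limit


def count_tokens(messages: list[dict]) -> int:
    # Placeholder: use your provider's tokenizer here instead of this estimate.
    return sum(len(m["content"]) // 4 for m in messages)


def summarize(messages: list[dict]) -> str:
    # Placeholder: an LLM call that condenses old turns into a short summary.
    raise NotImplementedError


def maybe_compress(messages: list[dict]) -> list[dict]:
    """Fold older turns into a summary once the window is ~70% full.

    This is the part frameworks leave to you: they report the token count,
    but the policy for acting on it is your problem.
    """
    if count_tokens(messages) < MAX_CONTEXT_TOKENS * COMPRESS_AT:
        return messages
    system, *rest = messages
    old, recent = rest[:-10], rest[-10:]
    summary = {"role": "system", "content": "Summary of earlier turns: " + summarize(old)}
    return [system, summary, *recent]
```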

I was surprised at first, but after going through twenty-something frameworks it started making sense. Context management is genuinely hard, and most of these projects are already struggling with orchestration, memory, and tooling. Context quality is apparently next year's problem. Every year.

The ecosystem is consolidating

A lot of these frameworks exist. Not all of them will in a year. Four are pulling ahead far enough that the gap is already meaningful: LangChain/LangGraph on ecosystem breadth, CrewAI for multi-agent, Vercel AI SDK for TypeScript/web, Pydantic AI for type-safe Python. Everything else is either very niche, early, or quietly losing contributors.

AutoGen is the interesting case. Microsoft seems to be heading in a slightly different direction with each release, which is not a great sign if you're trying to build on top of it. Worth watching; I wouldn't build anything critical on it right now.

Graph orchestration won

Even frameworks that didn't start with graph-based execution are adding it now. LangGraph pushed this pattern into the mainstream, and it's become the expected baseline. If a framework doesn't support graph-based control flow in some form, that's a meaningful limitation.
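
If "graph-based control flow" sounds abstract, here's a toy version in plain Python: nodes are functions over a shared state dict, and edges are router functions that pick the next node. It mirrors the shape of LangGraph-style APIs but deliberately uses no real framework; all names are invented for illustration.

```python
# Toy graph orchestrator: nodes transform a shared state, routers choose edges.
from typing import Callable

State = dict
Node = Callable[[State], State]


class Graph:
    def __init__(self):
        self.nodes: dict[str, Node] = {}
        self.edges: dict[str, Callable[[State], str]] = {}

    def add_node(self, name: str, fn: Node):
        self.nodes[name] = fn

    def add_edge(self, src: str, router: Callable[[State], str]):
        # router inspects the state and returns the name of the next node
        self.edges[src] = router

    def run(self, start: str, state: State) -> State:
        current = start
        while current != "END":
            state = self.nodes[current](state)
            state["trace"] = state.get("trace", []) + [current]  # audit trail
            current = self.edges[current](state)
        return state


# Example: plan once, loop on "act" until done, then stop.
g = Graph()
g.add_node("plan", lambda s: {**s, "plan": f"answer: {s['question']}"})
g.add_node("act", lambda s: {**s, "steps": s.get("steps", 0) + 1})
g.add_edge("plan", lambda s: "act")
g.add_edge("act", lambda s: "END" if s["steps"] >= 2 else "act")

print(g.run("plan", {"question": "What won?"}))
```

The point of the pattern is the explicit trace and the explicit routing decisions: you can audit which node ran and why, which is what makes graph-based execution attractive for production.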

Memory is still mostly "good luck"

Most frameworks treat memory as a feature you bolt on. Letta is the only one where memory is load-bearing architecture; it's the whole point. Everywhere else, if you need your agent to remember something across sessions, you're building it yourself with a vector store and duct tape. Probably more fragile than you'd like.
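
For the record, this is the duct tape in question, sketched with a toy embedding function and a plain list standing in for the vector store; swap both for your real embedding model and database.

```python
# The "vector store and duct tape" version of cross-session memory.
import math


def embed(text: str) -> dict[str, float]:
    # Placeholder embedding: bag-of-words counts. Replace with a real model.
    vec: dict[str, float] = {}
    for word in text.lower().split():
        vec[word] = vec.get(word, 0.0) + 1.0
    return vec


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(a[k] * b.get(k, 0.0) for k in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


memory: list[tuple[dict[str, float], str]] = []  # stand-in for a vector DB


def remember(fact: str):
    memory.append((embed(fact), fact))


def recall(query: str, k: int = 3) -> list[str]:
    scored = sorted(memory, key=lambda item: cosine(embed(query), item[0]), reverse=True)
    return [fact for _, fact in scored[:k]]


# Between sessions you persist `memory`; at the start of each turn you
# prepend recall(user_message) to the prompt. That's the duct tape.
remember("User prefers answers in Dutch.")
remember("User is evaluating CrewAI for a support bot.")
print(recall("which framework is the user looking at?"))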

Type safety turns out to matter

Pydantic AI grew quickly, and I think it's because type errors in agent systems are genuinely miserable to debug at runtime. You're chasing phantom bugs through LLM output that looked right but wasn't, and you don't find out until three tool calls later. Frameworks that catch this earlier save real time. Not everyone needs it, but once you've been burned by it in production, it starts feeling a lot less optional.
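
A minimal illustration of the failure mode, using plain Pydantic v2 (which is essentially what Pydantic AI wires in for you); the schema and the raw payload are invented for the example.

```python
# Catch bad LLM output at the boundary instead of three tool calls later.
from pydantic import BaseModel, ValidationError


class TicketAction(BaseModel):
    ticket_id: int
    action: str          # e.g. "close", "escalate"
    notify_user: bool


# Output that "looked right" but isn't: ticket_id is not an integer.
raw = '{"ticket_id": "not-a-number", "action": "close", "notify_user": true}'

try:
    parsed = TicketAction.model_validate_json(raw)
except ValidationError as exc:
    # Fail here, with field-level errors, rather than passing junk
    # into the next tool call and debugging it downstream.
    print(exc.errors())
```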


If you just want a recommendation

| Use Case | Primary | Runner-up | Key Tradeoff |
|---|---|---|---|
| Production chatbot with memory | Pydantic AI + Letta | LangChain + LangGraph | Type safety vs. ecosystem breadth |
| Multi-agent research system | CrewAI | LangGraph | Autonomy vs. control |
| Enterprise workflow automation | LangChain + LangSmith | Google ADK (if GCP) | Vendor support vs. cloud lock-in |
| TypeScript / Next.js project | Vercel AI SDK | Mastra | Streaming/UI focus vs. orchestration power |
| Quick prototype | OpenAI Agents SDK | Vercel AI SDK | Speed vs. flexibility |
| Code generation agent | Claude Code (patterns) | Smolagents | Product polish vs. customization |
| Enterprise RAG pipeline | Haystack | LlamaIndex | Pipeline power vs. community size |

Context Engineering Pattern Coverage

Each Tier 1 analysis maps the framework against 8 context engineering patterns. Here's where the ecosystem stands:

| Pattern | Frameworks with strong support |
|---|---|
| Select, Don't Dump | Haystack, LlamaIndex, LangChain (via retrievers) |
| Write Outside the Window | Letta, Mastra, LangChain, AutoGen |
| Compress & Restart | Letta (automatic), LangChain (ConversationSummaryMemory) |
| Recursive Delegation | CrewAI, LangGraph, OpenAI Agents SDK (handoffs) |
| Progressive Disclosure | Letta, Mastra, Haystack |
| Isolate | LangGraph (subgraphs), limited elsewhere |
| The Pyramid | LangChain (PromptTemplate), manual elsewhere |
| Context Rot Awareness | Nobody. Universal gap. |

That last row is the telling one.


Architecture Patterns

| Pattern | Frameworks | Best For | Watch Out For |
|---|---|---|---|
| Graph-based | LangGraph, Mastra, Google ADK | Production reliability, audit trails | Upfront graph design complexity |
| Multi-agent / role-based | CrewAI, OpenAI Agents SDK | Complex tasks, collaboration | Debugging opacity |
| Memory-first | Letta | Long-running agents, assistants | Architectural complexity |
| Pipeline-based | Haystack, LlamaIndex | RAG, knowledge retrieval | Not agent-native |
| Pattern-based (no framework) | Anthropic / Claude Code | Full control, custom architectures | You build everything |
| Tool-first / code execution | Smolagents, Open Interpreter | Local models, automation | Security (obviously) |

Lock-in Risk

| Risk | Frameworks | Why |
|---|---|---|
| High | Claude Code, Google ADK | Provider-specific, hard to migrate away from |
| Medium | Letta, Semantic Kernel, AutoGen | Optimized for specific providers/ecosystems |
| Low | Pydantic AI, Vercel AI SDK, Mastra, LangChain | Clean abstractions, swap providers without rewriting |

How I evaluated these

For each framework I read the docs, looked at source code, ran examples, and read the GitHub issues. The issues are often more honest than the docs; you learn a lot about a framework from what people complain about. I checked how active the maintainers were, how they handled breaking changes, and formed an opinion.

For the Tier 1 analyses I went deeper: architecture decomposition, tracing design decisions, and deliberately trying things that should break to see how they fail.

The dimensions I applied consistently:

  • Architecture and how the pieces fit together
  • Context management: what's built in, what you're building yourself
  • Tool system flexibility
  • Multi-agent support and what it actually looks like when you use it (not what the marketing page says)
  • Memory: short-term, long-term, or "we'll add that later"
  • Developer experience: time to first working thing, time to first confusing error
  • Production readiness: observability, error handling, retries
  • Lock-in and how painful switching would be

License

CC BY-NC 4.0 — Share and adapt with attribution, non-commercial use only.
