# Microsoft Agent Framework + Llama Stack Integration

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/meta-llama/llama-stack/blob/main/docs/notebooks/microsoft_agent_framework/microsoft_agent_framework_llama_stack_integration.ipynb)

## Overview

This notebook demonstrates how to use **Microsoft Agent Framework** (successor to AutoGen) with **Llama Stack** as the backend.

> **Note:** This notebook uses Microsoft Agent Framework, which replaces AutoGen. For the migration guide, see: [Microsoft Agent Framework Migration Guide](https://learn.microsoft.com/en-us/agent-framework/migration-guide/from-autogen/)

### Use Cases Covered:
1. **Simple ChatAgent** - Single agent task execution
2. **Sequential Workflow** - Round-robin multi-agent collaboration
3. **AgentThread** - Stateful multi-turn conversations
4. **Custom Workflow** - Data-flow with executors and feedback loops
5. **Concurrent Workflow** - Parallel agent processing

---

## Prerequisites

```bash
# Install Microsoft Agent Framework
pip install agent-framework

# Llama Stack should already be running
# Default: http://localhost:8321
```

**Migration Note:** If you're migrating from AutoGen, the main changes are:
- Package: `autogen-*` ‚Üí `agent-framework`
- Client: `OpenAIChatCompletionClient` ‚Üí `OpenAIResponsesClient` or `OpenAIChatClient`
- Client: `AzureOpenAIChatCompletionClient` ‚Üí `AzureOpenAIResponsesClient` or `AzureOpenAIChatClient`
- Agent: `AssistantAgent` ‚Üí `ChatAgent`
- Team: `RoundRobinGroupChat` ‚Üí `SequentialBuilder`

In [1]:
# Imports for Microsoft Agent Framework
import os
import asyncio
from agent_framework import ChatAgent
from agent_framework.openai import OpenAIResponsesClient

print("‚úÖ Microsoft Agent Framework imports successful")
print("Using Microsoft Agent Framework (successor to AutoGen)")

# Check Llama Stack connectivity
import httpx

LLAMA_STACK_URL = "http://localhost:8321"

try:
    response = httpx.get(f"{LLAMA_STACK_URL}/v1/models")
    print(f"‚úÖ Llama Stack is running at {LLAMA_STACK_URL}")
    print(f"Status: {response.status_code}")
except Exception as e:
    print(f"‚ùå Llama Stack not accessible: {e}")
    print("Make sure Llama Stack is running on port 8321")

‚úÖ Microsoft Agent Framework imports successful
Using Microsoft Agent Framework (successor to AutoGen)
‚úÖ Llama Stack is running at http://localhost:8321
Status: 200


## Configuration: Microsoft Agent Framework with Llama Stack

### How It Works

Microsoft Agent Framework uses **OpenAIResponsesClient** to connect to OpenAI-compatible servers like Llama Stack.

**Key Changes from AutoGen:**
- `OpenAIChatCompletionClient` ‚Üí `OpenAIResponsesClient`
- Team-based architecture (similar to AutoGen v0.7.5)
- Async/await pattern for running tasks

In [2]:
# Create OpenAI Responses Client for Llama Stack
# Uses the /responses API (specialized for reasoning models)
chat_client = OpenAIResponsesClient(
    model_id="ollama/llama3.3:70b",  # Choose any other model of your choice
    api_key="not-needed",
    base_url="http://localhost:8321/v1"  # Llama Stack OpenAI-compatible endpoint
)

print("‚úÖ Model client configured for Llama Stack")
print(f"Model: ollama/llama3.3:70b")
print(f"Base URL: http://localhost:8321/v1")
print(f"Client type: OpenAIResponsesClient")

‚úÖ Model client configured for Llama Stack
Model: ollama/llama3.3:70b
Base URL: http://localhost:8321/v1
Client type: OpenAIResponsesClient


## Example 1: Simple Task with ChatAgent

### Pattern: Single Agent Task

Microsoft Agent Framework uses **ChatAgent** to create AI assistants powered by your model.

**ChatAgent Features:**
- Multi-turn by default (keeps calling tools until complete)
- Stateless (use `AgentThread` for conversation history)
- Configured with `instructions` (replaces AutoGen's `system_message`)
- Can be created directly or via client factory method

### Use Case: Solve a Math Problem

In [3]:
import asyncio

# Method 1: Direct creation
assistant = ChatAgent(
    name="MathAssistant",
    chat_client=chat_client,
    instructions="You are a helpful AI assistant that solves math problems. Provide clear explanations and show your work."
)

print("‚úÖ Agent created:", assistant.name)

# Define the task
task = "What is the sum of the first 10 prime numbers? Please calculate it step by step."

# Run the task (Agent Framework uses async)
# Note: ChatAgent is stateless - no conversation history between calls
result = await assistant.run(task)

print("\n" + "="*50)
print("Task Result:")
print(result.text if result.text else "No response")
print("="*50)

‚úÖ Agent created: MathAssistant

Task Result:
To find the sum of the first 10 prime numbers, we need to follow these steps:

**Step 1: Identify the first 10 prime numbers**

A prime number is a positive integer that is divisible only by itself and 1. We will list out the first few prime numbers until we have 10:
2, 3, 5, 7, 11, 13, 17, 19, 23, 29

These are the first 10 prime numbers.

**Step 2: Add up the prime numbers**

Now, we simply need to add these numbers together:
2 + 3 = 5
5 + 5 = 10
10 + 7 = 17
17 + 11 = 28
28 + 13 = 41
41 + 17 = 58
58 + 19 = 77
77 + 23 = 100
100 + 29 = 129

Therefore, the sum of the first 10 prime numbers is **129**.

So, to summarize:
The first 10 prime numbers are: 2, 3, 5, 7, 11, 13, 17, 19, 23, and 29.
Their sum is: 2 + 3 + 5 + 7 + 11 + 13 + 17 + 19 + 23 + 29 = **129**.


## Example 2: Multi-Agent Team Collaboration

### Pattern: Sequential Workflow (Round-Robin Style)

Agent Framework uses **SequentialBuilder** to create workflows where agents take turns.

**Key Concepts:**
- `SequentialBuilder`: Agents process messages sequentially
- Shared conversation history across all agents
- Each agent sees all previous messages

### Use Case: Write a Technical Blog Post

In [4]:
from agent_framework import SequentialBuilder, WorkflowOutputEvent

# Create specialist agents with very strict role separation
researcher = ChatAgent(
    name="Researcher",
    chat_client=chat_client,
    instructions="""You are a researcher. Your ONLY job is to gather facts, statistics, and key information.
    
    DO:
    - Provide bullet points of facts and key information
    - Include relevant statistics if available
    - Keep it concise (50-100 words max)
    
    DO NOT:
    - Write full paragraphs or blog posts
    - Act as a writer or editor
    - Provide any writing beyond factual bullet points
    
    End your response by saying: "Research complete. Passing to Writer."
    """
)

writer = ChatAgent(
    name="Writer",
    chat_client=chat_client,
    instructions="""You are a technical writer. Your ONLY job is to take research and write a blog post.
    
    DO:
    - Use the research provided by the Researcher
    - Write a clear, engaging 200-word blog post
    - Use proper formatting (headers, paragraphs)
    - Focus on benefits and value
    
    DO NOT:
    - Do research yourself
    - Review or critique your own work
    - Act as an editor or critic
    
    End your response by saying: "Draft complete. Passing to Critic."
    """
)

critic = ChatAgent(
    name="Critic",
    chat_client=chat_client,
    instructions="""You are an editor and critic. Your ONLY job is to review the blog post written by the Writer.
    
    DO:
    - Review the blog post for clarity, accuracy, and engagement
    - Provide 3-5 specific, constructive suggestions for improvement
    - Comment on structure, tone, and effectiveness
    - Be constructive but honest
    
    DO NOT:
    - Rewrite the blog post yourself
    - Do research or writing
    - Say "looks good" without providing specific feedback
    
    Provide your review in this format:
    **Review:**
    1. [Suggestion 1]
    2. [Suggestion 2]
    3. [Suggestion 3]
    """
)

print("‚úÖ Team agents created: Researcher, Writer, Critic")

# Create a sequential workflow (round-robin collaboration)
workflow = SequentialBuilder().participants([researcher, writer, critic]).build()

# Simpler task that doesn't list all the steps (to avoid confusion)
task = """Write a 200-word blog post about the benefits of using Llama Stack for LLM applications.

Topic: Benefits of Llama Stack for LLM applications
Target length: 200 words
Audience: Developers and technical decision-makers
"""

print("\n" + "="*50)
print("Running Sequential Workflow:")
print("="*50)

# Run the workflow and display results with agent names
async for event in workflow.run_stream(task):
    if isinstance(event, WorkflowOutputEvent):
        conversation_history = event.data
        
        # Map assistant messages to agent names using position
        agent_names = ["Researcher", "Writer", "Critic"]
        turn = 1
        assistant_count = 0
        
        for msg in conversation_history:
            # Normalize role to string for comparison (msg.role is a Role enum)
            role_str = str(msg.role).lower().strip()
            
            # Determine the speaker label
            if role_str == "user":
                speaker = "user"
            elif role_str == "assistant":
                if assistant_count < len(agent_names):
                    speaker = f"assistant - {agent_names[assistant_count]}"
                else:
                    speaker = "assistant"
                assistant_count += 1
            else:
                speaker = str(msg.role)
            
            # Display the message
            print(f"\nTurn {turn} [{speaker}]:")
            print(msg.text[:1000] + "..." if len(msg.text or "") > 1000 else msg.text)
            turn += 1


‚úÖ Team agents created: Researcher, Writer, Critic

Running Sequential Workflow:

Turn 1 [user]:
Write a 200-word blog post about the benefits of using Llama Stack for LLM applications.

Topic: Benefits of Llama Stack for LLM applications
Target length: 200 words
Audience: Developers and technical decision-makers


Turn 2 [assistant - Researcher]:
* Llama Stack is an open-source framework for building LLM applications
* Improves model performance with optimized algorithms and hyperparameters
* Supports multiple frameworks, including PyTorch and TensorFlow
* Reduces development time by up to 30% with pre-built components
* Enhances scalability with distributed training capabilities
* Provides seamless integration with popular libraries and tools

Research complete. Passing to Writer.

Turn 3 [assistant - Writer]:
 

### Introduction to Llama Stack 
The Llama Stack is designed to simplify the development of LLM applications, providing numerous benefits for developers and technical decis

## Example 3: Multi-Turn Conversations with AgentThread

### Pattern: Stateful Conversations

Unlike AutoGen, `ChatAgent` is **stateless by default**. To maintain conversation history across multiple interactions, use **AgentThread**.

**AgentThread Features:**
- Stores conversation history
- Allows context to carry across multiple `agent.run()` calls

### Use Case: Interactive Analysis

In [5]:
# Create an analyst agent
analyst = ChatAgent(
    name="TechAnalyst",
    chat_client=chat_client,
    instructions="""You are a technical analyst. Analyze technical topics deeply:
    1. Break down complex concepts
    2. Identify pros and cons
    3. Provide recommendations
    """
)

print("‚úÖ Analyst agent created")

# Create a new thread to maintain conversation state
thread = analyst.get_new_thread()

print("\n" + "="*50)
print("Multi-Turn Conversation with Thread:")
print("="*50)

# First interaction
result1 = await analyst.run(
    "Analyze the trade-offs between using local LLMs versus cloud-based APIs.",
    thread=thread
)
print("\n[Turn 1 - Initial Analysis]:")
print(result1.text[:400] + "..." if len(result1.text or "") > 400 else result1.text)

# Second interaction - builds on previous context
result2 = await analyst.run(
    "What about cost implications specifically?",
    thread=thread
)
print("\n[Turn 2 - Follow-up on Cost]:")
print(result2.text[:400] + "..." if len(result2.text or "") > 400 else result2.text)

# Third interaction - continues the conversation
result3 = await analyst.run(
    "Summarize your recommendation in one sentence.",
    thread=thread
)
print("\n[Turn 3 - Summary]:")
print(result3.text)

print("\n" + "="*50)
print(f"Thread maintained context across {3} turns")
print("="*50)

‚úÖ Analyst agent created

Multi-Turn Conversation with Thread:

[Turn 1 - Initial Analysis]:
**Introduction to Local LLMs vs Cloud-Based APIs**

The rise of Large Language Models (LLMs) has revolutionized natural language processing, offering unparalleled capabilities in text generation, comprehension, and analysis. Users can access these powerful models via two primary avenues: local deployment or cloud-based Application Programming Interfaces (APIs). Each approach presents distinct adva...

[Turn 2 - Follow-up on Cost]:
**Cost Implications: Local LLMs vs Cloud-Based APIs**

When evaluating the cost implications of local LLMs versus cloud-based APIs, several factors come into play. These include initial investment, ongoing expenses, scalability costs, and potential savings. Each approach has distinct cost characteristics that can significantly impact an organization's budget and ROI (Return on Investment).

### **...

[Turn 3 - Summary]:
I recommend choosing between local LLM deploym

## Example 4: Advanced Workflow with Custom Executors

### Pattern: Data-Flow Workflow with Code Review Loop

Agent Framework's **Workflow** enables complex orchestration using executors and edges.
Unlike AutoGen's event-driven model, workflows use **data-flow** architecture.

**Key Concepts:**
- `Executor`: Processing units (agents, functions, or sub-workflows)
- `WorkflowBuilder`: Build typed data-flow graphs
- `@executor` decorator: Define custom processing logic
- Edges route messages between executors

### Use Case: Iterative Code Review Until Approved

In [6]:
from agent_framework import WorkflowBuilder, executor, WorkflowContext, WorkflowOutputEvent
from typing_extensions import Never

# Create code review agents with better instructions
code_developer = ChatAgent(
    name="Developer",
    chat_client=chat_client,
    instructions="""You are a developer. When you receive code review feedback:
    - Address ALL issues mentioned
    - Explain your changes briefly
    - Present ONLY the improved code (no extra commentary)

    If no feedback is given, present your initial implementation."""
)

code_reviewer = ChatAgent(
    name="CodeReviewer",
    chat_client=chat_client,
    instructions="""You are a senior code reviewer. Review code for bugs, performance, security, and best practices.

    CRITICAL: Your response MUST start with one of these:

    If code is production-ready:
    "APPROVED: [brief reason why it's good]"

    If code needs changes:
    "NEEDS REVISION: [list specific issues to fix]"

    DO NOT provide fixed code examples.
    DO NOT say LGTM or APPROVED unless the code is truly ready.
    Be constructive but strict."""
)

print("‚úÖ Code review team created with strict instructions")

# Track iterations
review_state = {"iteration": 0, "max_iterations": 4}

# Define custom executors for workflow
@executor(id="developer")
async def developer_executor(task: str, ctx: WorkflowContext[str]) -> None:
    """Developer creates or improves code based on input."""
    review_state["iteration"] += 1
    print(f"\n [Developer - Iteration {review_state['iteration']}]")

    result = await code_developer.run(task)
    print(f"   Code submitted for review (preview): {result.text[:80]}...")
    await ctx.send_message(result.text)

@executor(id="reviewer")
async def reviewer_executor(code: str, ctx: WorkflowContext[str, str]) -> None:
    """Reviewer checks code and either approves or requests changes."""
    print(f"\n [Reviewer - Iteration {review_state['iteration']}]")

    result = await code_reviewer.run(f"Review this code:\n\n{code}")

    # Smart approval detection - check the START of the response
    response_start = result.text[:100].upper()  # First 100 chars only
    is_approved = response_start.startswith("APPROVED")
    needs_revision = "NEEDS REVISION" in response_start

    print(f"   Decision: {'‚úÖ APPROVED' if is_approved else '‚ùå NEEDS REVISION' if needs_revision else '‚ö†Ô∏è UNCLEAR'}")

    if is_approved:
        # Code approved! Output final result
        await ctx.yield_output(
            f"‚úÖ APPROVED after {review_state['iteration']} iteration(s)\n\n"
            f"üìù Review Comments:\n{result.text}\n\n"
            f"üíª Final Code:\n{code}"
        )
    elif review_state["iteration"] >= review_state["max_iterations"]:
        # Hit max iterations - force stop
        await ctx.yield_output(
            f"‚ö†Ô∏è MAX ITERATIONS REACHED ({review_state['max_iterations']})\n\n"
            f"üìù Last Review:\n{result.text}\n\n"
            f"üíª Last Code:\n{code}"
        )
    else:
        # Send feedback back to developer for revision
        print(f"   Sending feedback to developer...")
        await ctx.send_message(
            f"FEEDBACK FROM REVIEWER:\n{result.text}\n\nPrevious code:\n{code}",
            target_id="developer"
        )

# Build workflow: developer ‚Üí reviewer (with feedback loop)
workflow = (
    WorkflowBuilder()
    .add_edge(developer_executor, reviewer_executor)
    .add_edge(reviewer_executor, developer_executor)  # Feedback loop
    .set_start_executor(developer_executor)
    .build()
)

# Use a task that's more likely to need multiple iterations
task = """Implement a Python function to validate email addresses with these requirements:
- Must have @ symbol
- Must have domain with at least one dot
- No spaces allowed
- Handle edge cases
- Include basic error handling
Keep it simple but correct."""

print("\n" + "="*60)
print("Code Review Workflow (with iteration tracking):")
print("="*60)

# Reset state
review_state["iteration"] = 0

# Run workflow with streaming
async for event in workflow.run_stream(task):
    if isinstance(event, WorkflowOutputEvent):
        print("\n" + "="*60)
        print("FINAL RESULT:")
        print("="*60)
        print(event.data)

‚úÖ Code review team created with strict instructions

Code Review Workflow (with iteration tracking):

 [Developer - Iteration 1]
   Code submitted for review (preview): ```python
import re

def validate_email(email: str) -> bool:
    """
    Validat...

 [Reviewer - Iteration 1]
   Decision: ‚ùå NEEDS REVISION
   Sending feedback to developer...

 [Developer - Iteration 2]
   Code submitted for review (preview): ```python
import re

# Define a regular expression pattern for email validation ...

 [Reviewer - Iteration 2]
   Decision: ‚ùå NEEDS REVISION
   Sending feedback to developer...

 [Developer - Iteration 3]
   Code submitted for review (preview): ```python
import re

# Define a regular expression pattern for email validation ...

 [Reviewer - Iteration 3]
   Decision: ‚ùå NEEDS REVISION
   Sending feedback to developer...

 [Developer - Iteration 4]
   Code submitted for review (preview): ```python
import re
import logging

# Define constants for email validation
EMAI...

 [R

## Example 5: Concurrent Workflow Pattern

### Pattern: Parallel Processing

Agent Framework's **ConcurrentBuilder** enables parallel agent execution.
All agents process the input simultaneously and results are aggregated.

### Use Case: Multi-Perspective Analysis

In [7]:
from agent_framework import ConcurrentBuilder, WorkflowOutputEvent

# Create specialized analysts
technical_analyst = ChatAgent(
    name="TechnicalAnalyst",
    chat_client=chat_client,
    instructions="You analyze technical feasibility and implementation complexity."
)

business_analyst = ChatAgent(
    name="BusinessAnalyst",
    chat_client=chat_client,
    instructions="You analyze business value, ROI, and market impact."
)

security_analyst = ChatAgent(
    name="SecurityAnalyst",
    chat_client=chat_client,
    instructions="You analyze security implications, risks, and compliance."
)

print("‚úÖ Analyst team created: Technical, Business, Security")

# Create concurrent workflow - all agents process in parallel
workflow = (
    ConcurrentBuilder()
    .participants([technical_analyst, business_analyst, security_analyst])
    .build()
)

# task = "Evaluate the proposal to deploy Llama Stack for our customer service chatbot."
task = "Evaluate the proposal to deploy a customer service chatbot."

print("\n" + "="*50)
print("Concurrent Analysis (Parallel Processing):")
print("="*50)

# Run workflow - agents work in parallel
async for event in workflow.run_stream(task):
    if isinstance(event, WorkflowOutputEvent):
        # Combined results from all agents
        results = event.data
        for i, result in enumerate(results, 1):
            print(f"\n[Analysis {i}]:")
            print(result.text[:1000] + "..." if len(result.text or "") > 1000 else result.text)
            print("-" * 50)

print("\n" + "="*50)
print("All agents completed in parallel")
print("="*50)

‚úÖ Analyst team created: Technical, Business, Security

Concurrent Analysis (Parallel Processing):

[Analysis 1]:
Evaluate the proposal to deploy a customer service chatbot.
--------------------------------------------------

[Analysis 2]:
**Proposal Evaluation: Customer Service Chatbot Deployment**

**Introduction:**
The proposed project aims to deploy a customer service chatbot to enhance the user experience, reduce support queries, and increase efficiency. This evaluation assesses the technical feasibility and implementation complexity of the proposal.

**Technical Feasibility:**

1. **Platform Compatibility:** The chatbot can be integrated with various platforms, including websites, mobile apps, and social media messaging services.
2. **Natural Language Processing (NLP):** The proposed NLP engine is capable of understanding and processing human language, allowing for effective conversation flow.
3. **Integration with Existing Systems:** The chatbot can be integrated with the compa