# Chat Completion Style History Management Example

This example demonstrates how conversation history is managed when using **Chat Completion style services**, such as `AzureAIAgentClient` paired with `ChatAgent`.

**Key Difference (vs Azure AI Agent service):**
- The **agent framework** is responsible for transmitting the conversation history
- Full conversation history is sent with each request to the underlying model
- The service receives complete context every time (stateless model calls)
- Thread storage may still happen on Azure, but the API pattern is different

**Important Note:**
Even though we use `AzureAIAgentClient`, threads may still be visible in the Azure AI Foundry portal. The key difference is in the **API communication pattern**, not necessarily in the storage location.

**Required Environment Variables:**
- `AZURE_AI_PROJECT_ENDPOINT`: Your Azure AI project endpoint
- `AZURE_AI_MODEL_DEPLOYMENT_NAME`: The name of your model deployment
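
Before running the cells below, it can help to confirm that both variables are actually set, since `os.environ.get` silently returns `None` for a missing name. The following is a minimal sketch; the helper name `missing_env_vars` is hypothetical and not part of the agent framework.

```python
import os

# The two variables this notebook expects
REQUIRED_VARS = ("AZURE_AI_PROJECT_ENDPOINT", "AZURE_AI_MODEL_DEPLOYMENT_NAME")


def missing_env_vars(names=REQUIRED_VARS, environ=None):
    """Return the names that are unset or empty in the given environment.

    Hypothetical helper for illustration only.
    """
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


missing = missing_env_vars()
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
else:
    print("All required environment variables are set.")
```

Passing `environ` explicitly makes the check easy to try against a plain dictionary without touching the real process environment.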

In [1]:
import asyncio
import os
from random import randint
from agent_framework import ChatAgent
from agent_framework.azure import AzureAIAgentClient
from azure.identity.aio import AzureCliCredential

project_endpoint = os.environ.get('AZURE_AI_PROJECT_ENDPOINT')
model_name = os.environ.get('AZURE_AI_MODEL_DEPLOYMENT_NAME')

print(f"Project endpoint: {project_endpoint}")
print(f"Deployment name: {model_name}")

Project endpoint: https://aifoundryaveva.services.ai.azure.com/api/projects/firstProject
Deployment name: gpt-4o


In [2]:
# 🎲 Tool Function: Weather Information
def get_weather_info(city: str) -> str:
    """Get weather information for a city.
    
    Args:
        city (str): The name of the city
        
    Returns:
        str: Weather information for the specified city
    """
    # Simulated weather data
    weather_conditions = ["sunny", "cloudy", "rainy", "partly cloudy", "snowy"]
    temp = randint(15, 35)  # Temperature in Celsius
    condition = weather_conditions[randint(0, len(weather_conditions) - 1)]
    
    return f"The weather in {city} is {condition} with a temperature of {temp}°C."

In [None]:
async def demonstrate_chat_completion_style():
    """
    Demonstrates Chat Completion style conversation management.
    The agent framework sends full conversation history with each request.
    """
    print("=== CHAT COMPLETION STYLE DEMO ===")
    print("In this approach:")
    print("1. Agent framework manages conversation history transmission")
    print("2. Full conversation context is sent to the model with each request")
    print("3. Underlying model receives complete conversation every time")
    print("4. Thread may still be stored on Azure, but API pattern differs\n")
    
    credential = AzureCliCredential()
    chat_client = AzureAIAgentClient(async_credential=credential)
    
    agent = ChatAgent(
        chat_client, 
        name="Weather Assistant",
        instructions="You are a helpful weather assistant. Use the weather tool to provide accurate information.",
        tools=[get_weather_info]
    )
    
    # Create a new thread
    thread = agent.get_new_thread()
    print("📝 Created new thread object")
    print("🌐 Note: Thread may be stored on Azure, but communication style differs")
    print(f"Thread contains: {len(thread.messages) if hasattr(thread, 'messages') else 0} messages initially\n")
    
    # First interaction
    print("🔄 FIRST REQUEST:")
    print("User: What's the weather like in Paris?")
    result1 = await agent.run("What's the weather like in Paris?", thread=thread)
    print(f"Assistant: {result1.text}")
    print(f"📊 Thread now contains: {len(thread.messages) if hasattr(thread, 'messages') else 'multiple'} messages")
    print("💡 Framework sends: Just the current message to start conversation\n")
    
    # Second interaction - builds on previous context
    print("🔄 SECOND REQUEST (with context):")
    print("User: How about in London? Is it warmer there?")
    print("📤 Framework behavior: Sends FULL conversation history to model")
    print("📡 API call includes: [Paris question + answer + London question]")
    result2 = await agent.run("How about in London? Is it warmer there?", thread=thread)
    print(f"Assistant: {result2.text}")
    print(f"📊 Thread now contains: {len(thread.messages) if hasattr(thread, 'messages') else 'even more'} messages\n")
    
    # Third interaction - continues building context
    print("🔄 THIRD REQUEST (with full context):")
    print("User: Which city would be better for a picnic?")
    print("📤 Framework behavior: Sends COMPLETE conversation history")
    print("📡 API call includes: [All previous messages + new picnic question]")
    result3 = await agent.run("Which city would be better for a picnic?", thread=thread)
    print(f"Assistant: {result3.text}")
    
    # Let's check if we can see the thread ID (if it's stored on Azure)
    thread_id = getattr(thread, 'id', 'Not available')
    print("\n📊 THREAD INFORMATION:")
    print(f"Thread ID: {thread_id}")
    print("Thread may be visible in Azure AI Foundry portal")
    
    print("\n" + "="*70)
    print("💡 KEY INSIGHTS (Chat Completion Style):")
    print("• Framework sends COMPLETE conversation history with each request")
    print("• Model receives full context every time (stateless model calls)")
    print("• Network payload grows with conversation length")
    print("• Different from pure Azure AI Agent service API pattern")
    print("• Thread storage location may vary by implementation")
    print("="*70)
    
    # Release network resources held by the client and the async credential
    await chat_client.close()
    await credential.close()

In [None]:
# Run the demonstration
await demonstrate_chat_completion_style()

=== CLIENT-SIDE HISTORY STORAGE DEMO ===
In this approach:
1. Conversation history is stored in the local 'thread' object
2. Full history is sent to the service with each request
3. Service is stateless - no memory of previous calls

📝 Created new thread object (stored locally)
Thread contains: 0 messages initially

🔄 FIRST REQUEST:
User: What's the weather like in Paris?




Assistant: The weather in Paris is currently sunny with a temperature of 17°C.
📊 Thread now contains: multiple messages

🔄 SECOND REQUEST (with context):
User: How about in London? Is it warmer there?
📤 Sending to service: ENTIRE conversation history (Paris + London question)




Assistant: The weather in London is rainy with a temperature of 32°C, making it significantly warmer than Paris at the moment.
📊 Thread now contains: even more messages

🔄 THIRD REQUEST (with full context):
User: Which city would be better for a picnic?
📤 Sending to service: ENTIRE conversation history (Paris + London + picnic question)
Assistant: Paris would be a better choice for a picnic as the weather there is sunny with a pleasant temperature of 17°C. In contrast, London is experiencing rain, which may not be ideal for outdoor activities.

💡 KEY INSIGHTS:
• Each request sends the COMPLETE conversation history
• Network payload grows with conversation length
• Service has no memory - relies on sent context
• Thread object manages all state locally

## How Chat Completion Style History Works

### Data Flow Visualization:

```
Request 1:
Client → [Agent Framework] → [Message: "What's the weather in Paris?"] → Model Service
                                                                          ↓
Client ← [Agent Framework] ← [Response: "Weather in Paris is sunny..."] ← Model Service

Request 2:
Client → [Agent Framework] → [Full History: 
                              - "What's the weather in Paris?"
                              - "Weather in Paris is sunny..."
                              - "How about in London?"] → Model Service
                                                           ↓
Client ← [Agent Framework] ← [Response: "London is cloudy..."] ← Model Service

Request 3:
Client → [Agent Framework] → [Complete History: 
                              - All previous messages
                              - "Which city is better for picnic?"] → Model Service
```
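
The flow in the diagram can be sketched in plain Python without any Azure dependency. This is an illustration under stated assumptions, not the framework's implementation: `fake_model` and `run_turn` are hypothetical stand-ins, and a real run would also add tool-call messages to the history, so actual counts would be higher.

```python
def fake_model(messages):
    """Stand-in for a stateless model call: it only sees what it is sent."""
    return f"(reply based on {len(messages)} message(s))"


def run_turn(history, user_text):
    """Append the user message, send the whole history, record the reply."""
    history.append({"role": "user", "content": user_text})
    payload = list(history)  # copy of everything sent on this call
    reply = fake_model(payload)
    history.append({"role": "assistant", "content": reply})
    return len(payload), reply


history = []  # plays the role of the thread's message list
sizes = []    # number of messages sent on each request

for question in [
    "What's the weather in Paris?",
    "How about in London?",
    "Which city is better for picnic?",
]:
    sent, reply = run_turn(history, question)
    sizes.append(sent)
    print(f"Sent {sent} message(s) -> {reply}")
```

Each request carries two more messages than the previous one (the earlier reply plus the new question), which is why the payload grows with conversation length.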

### The Real Difference:

**Chat Completion Style (AzureAIAgentClient + ChatAgent):**
- Agent framework handles conversation state
- Full conversation history sent to model with each request
- Model receives complete context every call (stateless)
- May still store threads on Azure for persistence

**Azure AI Agent Service (Direct AIProjectClient):**
- Azure service handles conversation state
- Only thread ID + new message sent with each request
- Service retrieves conversation history internally
- Threads are always stored on Azure
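
To make the contrast concrete, here is a small, library-free sketch of the two patterns. `StatelessModel` and `ThreadedService` are hypothetical classes invented for this illustration; they do not mirror the real `AzureAIAgentClient` or `AIProjectClient` APIs.

```python
import itertools


class StatelessModel:
    """Chat Completion style: the caller must send every prior message."""

    def complete(self, messages):
        turns = sum(m["role"] == "user" for m in messages)
        return f"reply #{turns}"


class ThreadedService:
    """Agent Service style: the service keeps history, keyed by thread ID."""

    def __init__(self):
        self._threads = {}
        self._ids = itertools.count(1)

    def create_thread(self):
        thread_id = f"thread_{next(self._ids)}"
        self._threads[thread_id] = []
        return thread_id

    def run(self, thread_id, user_text):
        history = self._threads[thread_id]  # looked up server-side
        history.append({"role": "user", "content": user_text})
        turns = sum(m["role"] == "user" for m in history)
        reply = f"reply #{turns}"
        history.append({"role": "assistant", "content": reply})
        return reply


questions = ["Paris?", "London?", "Picnic?"]

# Pattern 1: the client owns the history and resends all of it.
model = StatelessModel()
local_history, stateless_sent = [], []
for q in questions:
    local_history.append({"role": "user", "content": q})
    stateless_sent.append(len(local_history))
    reply = model.complete(local_history)
    local_history.append({"role": "assistant", "content": reply})

# Pattern 2: the client sends a thread ID plus one new message.
service = ThreadedService()
tid = service.create_thread()
threaded_sent = []
for q in questions:
    service.run(tid, q)
    threaded_sent.append(1)  # only the new message travels with the ID

print("Messages sent per request (stateless):", stateless_sent)
print("Messages sent per request (threaded): ", threaded_sent)
```

In both cases the same six messages end up recorded somewhere; what differs is who holds them and how much travels over the wire on each call.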

### Key Insights:
- **Both approaches** may store threads on Azure AI Foundry
- **The difference** is in the API communication pattern
- **Chat Completion style**: Full history in each API call
- **Agent Service style**: Minimal payload, server manages context
- **Portal visibility**: Threads can appear in the Azure AI Foundry portal in both cases, so their presence there does not indicate which pattern is in use