# Lab 3: Deploy Cohere-based Agents Locally

**Note**: Amazon Bedrock AgentCore Runtime is an AWS-specific service that requires Amazon Bedrock models. Since we're using Cohere's LLM, we'll focus on local deployment and production-ready patterns for Cohere-based agents.

In this lab, you'll learn how to deploy the multi-agent financial advisory system built in Labs 1 and 2 (Cohere versions) for production use, and explore best practices for running Cohere-based agents in production environments.

## Deployment Options for Cohere Agents

1. **Local Development Server** - FastAPI/Flask with async streaming
2. **Docker Containerization** - Package your agents for cloud deployment
3. **Serverless Deployment** - AWS Lambda, Google Cloud Functions, Azure Functions
4. **Kubernetes** - Scalable orchestration for enterprise deployments
5. **Cloud Run / App Engine** - Managed container platforms

## What You Will Learn

You'll learn how to:
- Create production-ready FastAPI endpoints for Cohere agents
- Implement streaming responses for real-time interactions
- Handle authentication and rate limiting
- Containerize your application with Docker
- Deploy to cloud platforms
- Monitor and observe agent performance

In [None]:
# Install required dependencies directly
!pip install -q "strands-agents[openai]==1.7.1" \
             strands-agents-tools==0.2.6 \
             openai==1.59.7 \
             yfinance==0.2.65 \
             matplotlib==3.10.6 \
             pandas==2.3.2 \
             pydantic==2.11.7 \
             fastapi \
             uvicorn

print("✓ All packages installed successfully!")

In [None]:
# Import FastAPI and deployment dependencies
from fastapi import FastAPI, HTTPException
from fastapi.responses import StreamingResponse
from pydantic import BaseModel
import uvicorn
import asyncio
import json

### Step 1: Create Production-Ready FastAPI Application

In [None]:
%%writefile app.py
# Production-ready FastAPI application for Cohere-based agents

from fastapi import FastAPI, HTTPException, Header
from fastapi.responses import StreamingResponse
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel
from typing import Optional
import os

import uvicorn

from strands import Agent, tool
from strands.models.openai import OpenAIModel
from strands.agent.conversation_manager import SummarizingConversationManager

from budget_agent_cohere import FinancialReport, budget_agent
from financial_analysis_agent_cohere import financial_analysis_agent

# Initialize FastAPI app
app = FastAPI(
    title="Financial Advisory Multi-Agent System",
    description="Cohere-powered financial advisory system with budgeting and investment analysis",
    version="1.0.0"
)

# Add CORS middleware
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],  # Wildcard is for local testing only; list explicit origins in production
    allow_credentials=False,  # Credentials cannot be combined with wildcard origins
    allow_methods=["*"],
    allow_headers=["*"],
)

# Request/Response models
class ChatRequest(BaseModel):
    prompt: str
    session_id: Optional[str] = None

class ChatResponse(BaseModel):
    response: str
    session_id: str

# Orchestrator configuration
ORCHESTRATOR_PROMPT = """You are a comprehensive financial advisor orchestrator that coordinates between specialized financial agents to provide complete financial guidance. 

Your specialized agents are:
1. **budget_agent_tool**: Handles budgeting, spending analysis, savings recommendations, and expense tracking
2. **financial_analysis_agent_tool**: Handles investment analysis, stock research, portfolio creation, and performance comparisons

Guidelines for using your agents:
- Use **budget_agent_tool** for questions about: budgets, spending habits, expense tracking, savings goals, debt management
- Use **financial_analysis_agent_tool** for questions about: stocks, investments, portfolios, market analysis, investment recommendations
- You can use both agents together for comprehensive financial planning
- Always provide a cohesive summary that combines insights from multiple agents when applicable
- Maintain a helpful, professional tone and include appropriate disclaimers about financial advice

When a user asks a question:
1. Determine which agent(s) are most appropriate
2. Call the relevant agent(s) with focused queries
3. Synthesize the responses into a coherent, comprehensive answer
4. Provide actionable next steps when possible"""

# Configure Cohere model
cohere_model = OpenAIModel(
    client_args={
        "api_key": os.environ["COHERE_API_KEY"],  # fail fast at startup if the key is unset
        "base_url": "https://api.cohere.ai/compatibility/v1",
    },
    model_id="command-a-03-2025",
    params={
        "temperature": 0.0,
        "stream_options": None
    }
)

# Conversation manager
conversation_manager = SummarizingConversationManager(
    summary_ratio=0.3,
    preserve_recent_messages=5,
)

@tool
def budget_agent_tool(query: str) -> FinancialReport:
    """Generate structured financial reports with budget analysis and recommendations."""
    try:
        structured_response = budget_agent.structured_output(
            output_model=FinancialReport, prompt=query
        )
        return structured_response
    except Exception as e:
        return FinancialReport(
            monthly_income=0.0,
            budget_categories=[],
            recommendations=[f"Error generating report: {str(e)}"],
            financial_health_score=1,
        )

@tool
def financial_analysis_agent_tool(query: str) -> str:
    """Handle investment analysis queries including stock research, portfolio creation, and performance comparisons."""
    try:
        response = financial_analysis_agent(query)
        return str(response)
    except Exception as e:
        return f"❌ Financial analysis error: {str(e)}"

# Initialize orchestrator agent
orchestrator_agent = Agent(
    model=cohere_model,
    system_prompt=ORCHESTRATOR_PROMPT,
    tools=[budget_agent_tool, financial_analysis_agent_tool],
    conversation_manager=conversation_manager,
)

@app.get("/")
async def root():
    return {
        "message": "Financial Advisory Multi-Agent System API",
        "version": "1.0.0",
        "powered_by": "Cohere + Strands Agents"
    }

@app.get("/health")
async def health_check():
    return {"status": "healthy"}

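@app.post("/chat", response_model=ChatResponse)
async def chat(request: ChatRequest):
    """Non-streaming chat endpoint.

    The agent call below is synchronous, so it blocks the event loop until the
    response is ready. All callers share one orchestrator (and its conversation
    history); session_id is echoed back but does not isolate state.
    """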
    try:
        response = orchestrator_agent(request.prompt)
        return ChatResponse(
            response=str(response),
            session_id=request.session_id or "default"
        )
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

@app.post("/chat/stream")
async def chat_stream(request: ChatRequest):
    """Streaming chat endpoint for real-time responses"""
    async def generate():
        try:
            async for event in orchestrator_agent.stream_async(request.prompt):
                if "data" in event:
                    yield f"data: {event['data']}\n\n"
        except Exception as e:
            yield f"data: Error: {str(e)}\n\n"
    
    return StreamingResponse(
        generate(),
        media_type="text/event-stream"
    )

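if __name__ == "__main__":
    # Honor the PORT environment variable (set in the Dockerfile and by some platforms)
    uvicorn.run(app, host="0.0.0.0", port=int(os.environ.get("PORT", "8080")))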

### Step 2: Testing Locally

To test your Agent server locally:

1. **Set your Cohere API key**:
   ```bash
   export COHERE_API_KEY="your-api-key"
   ```

2. **Terminal 1**: Start the Agent server
   ```bash
   python app.py
   ```
   
3. **Terminal 2**: Test the endpoints
   
   **Health Check**:
   ```bash
   curl http://localhost:8080/health
   ```
   
   **Non-streaming Chat**:
   ```bash
   curl -X POST http://localhost:8080/chat \
      -H "Content-Type: application/json" \
      -d '{"prompt": "I make $5000/month, help me create a budget"}'
   ```
   
   **Streaming Chat**:
   ```bash
   curl -X POST http://localhost:8080/chat/stream \
      -H "Content-Type: application/json" \
      -d '{"prompt": "Compare Tesla and Apple stocks"}'
   ```
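
The streaming endpoint emits Server-Sent Events: each chunk arrives as a `data: ...` line followed by a blank line. The sketch below shows one way a client might reassemble those chunks using only the standard library. `parse_sse_data` is a hypothetical helper name, and the sample lines are illustrative stand-ins rather than real server output.

```python
from typing import Iterable, Iterator, List


def parse_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the payload of each Server-Sent Event from an iterable of text lines.

    Consecutive `data:` lines belong to one event and are joined with newlines;
    a blank line ends the event. Other SSE fields (event:, id:, retry:) and
    comment lines are ignored in this sketch.
    """
    buffer: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format strips a single leading space after the colon.
            if value.startswith(" "):
                value = value[1:]
            buffer.append(value)
        elif line == "" and buffer:
            yield "\n".join(buffer)
            buffer = []
    if buffer:  # stream ended without a trailing blank line
        yield "\n".join(buffer)


# Illustrative input shaped like the /chat/stream output
sample = ["data: Tesla", "", "data:  vs Apple", ""]
chunks = list(parse_sse_data(sample))
print("".join(chunks))  # Tesla vs Apple
```

With the `requests` library, the same function could consume `response.iter_lines(decode_unicode=True)` from a `requests.post(..., stream=True)` call. Note that the endpoint interpolates each chunk directly after `data: `, so a chunk containing a newline would be split across lines and a compliant parser would drop the remainder; JSON-encoding each chunk on the server is one way to avoid that.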

### Step 3: Dockerize Your Application

Create a Dockerfile for containerized deployment:

In [None]:
%%writefile Dockerfile
FROM python:3.10-slim

WORKDIR /app

# Copy requirements first for better caching
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
RUN pip install --no-cache-dir fastapi uvicorn python-multipart

# Copy application code
COPY budget_agent_cohere.py .
COPY financial_analysis_agent_cohere.py .
COPY app.py .

# Expose port
EXPOSE 8080

# Set environment variable
ENV PORT=8080

# Run the application
CMD ["python", "app.py"]

In [None]:
%%writefile .dockerignore
*.pyc
__pycache__
.git
.gitignore
*.ipynb
.env
venv/
*.md

### Step 4: Build and Run Docker Container

```bash
# Build the Docker image
docker build -t financial-advisor-cohere .

# Run the container
docker run -p 8080:8080 \
  -e COHERE_API_KEY="your-api-key" \
  financial-advisor-cohere
```
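
The Dockerfile from Step 3 copies a `requirements.txt`, which this lab has not created up to this point, so the build fails unless the file exists. One way to produce it before running `docker build`, assuming the image should carry the same pins as the install cell at the top of this notebook, is sketched below. `render_requirements` and `write_requirements` are hypothetical helper names.

```python
from pathlib import Path

# Pins copied from the install cell at the top of this lab.
PINNED_REQUIREMENTS = [
    "strands-agents[openai]==1.7.1",
    "strands-agents-tools==0.2.6",
    "openai==1.59.7",
    "yfinance==0.2.65",
    "matplotlib==3.10.6",
    "pandas==2.3.2",
    "pydantic==2.11.7",
]


def render_requirements() -> str:
    """Return the requirements file contents, one pinned package per line."""
    return "\n".join(PINNED_REQUIREMENTS) + "\n"


def write_requirements(path: str = "requirements.txt") -> None:
    """Write the pinned packages next to the Dockerfile."""
    Path(path).write_text(render_requirements(), encoding="utf-8")


write_requirements()
print(render_requirements())
```

FastAPI and uvicorn are not listed here because the Dockerfile installs them in a separate `RUN pip install` step.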

### Step 5: Cloud Deployment Options

In [None]:
%%writefile deployment_guide.md
# Deployment Guide for Cohere-based Financial Advisor

## Option 1: Google Cloud Run

```bash
# Build and push to Google Container Registry
gcloud builds submit --tag gcr.io/YOUR_PROJECT_ID/financial-advisor

# Deploy to Cloud Run
gcloud run deploy financial-advisor \
  --image gcr.io/YOUR_PROJECT_ID/financial-advisor \
  --platform managed \
  --region us-central1 \
  --set-env-vars COHERE_API_KEY=your-api-key \
  --allow-unauthenticated
```

## Option 2: AWS ECS/Fargate

```bash
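# Create the repository, authenticate Docker to ECR, then tag and push
aws ecr create-repository --repository-name financial-advisor
aws ecr get-login-password --region REGION | docker login --username AWS --password-stdin AWS_ACCOUNT.dkr.ecr.REGION.amazonaws.com
docker tag financial-advisor-cohere:latest AWS_ACCOUNT.dkr.ecr.REGION.amazonaws.com/financial-advisor:latest
docker push AWS_ACCOUNT.dkr.ecr.REGION.amazonaws.com/financial-advisor:latest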

# Deploy using ECS/Fargate (configure task definition with COHERE_API_KEY)
```

## Option 3: Azure Container Instances

```bash
# Create the registry, authenticate Docker to it, then tag and push
az acr create --resource-group myResourceGroup --name myContainerRegistry --sku Basic
az acr login --name myContainerRegistry
docker tag financial-advisor-cohere myContainerRegistry.azurecr.io/financial-advisor:v1
docker push myContainerRegistry.azurecr.io/financial-advisor:v1

# Deploy to Container Instances
az container create \
  --resource-group myResourceGroup \
  --name financial-advisor \
  --image myContainerRegistry.azurecr.io/financial-advisor:v1 \
  --environment-variables COHERE_API_KEY=your-api-key \
  --dns-name-label financial-advisor \
  --ports 8080
```

## Option 4: Kubernetes (Any Cloud)

Create `k8s-deployment.yaml`:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: financial-advisor
spec:
  replicas: 3
  selector:
    matchLabels:
      app: financial-advisor
  template:
    metadata:
      labels:
        app: financial-advisor
    spec:
      containers:
      - name: financial-advisor
        image: gcr.io/YOUR_PROJECT_ID/financial-advisor:latest
        ports:
        - containerPort: 8080
        env:
        - name: COHERE_API_KEY
          valueFrom:
            secretKeyRef:
              name: cohere-secret
              key: api-key
---
apiVersion: v1
kind: Service
metadata:
  name: financial-advisor
spec:
  type: LoadBalancer
  ports:
  - port: 80
    targetPort: 8080
  selector:
    app: financial-advisor
```

Deploy:
```bash
kubectl create secret generic cohere-secret --from-literal=api-key=YOUR_API_KEY
kubectl apply -f k8s-deployment.yaml
```

## Monitoring and Observability

### Add OpenTelemetry Instrumentation

```python
from opentelemetry import trace
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
from opentelemetry.sdk.trace import TracerProvider

# In app.py
trace.set_tracer_provider(TracerProvider())
FastAPIInstrumentor.instrument_app(app)
```

### Add Logging

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

@app.post("/chat")
async def chat(request: ChatRequest):
    logger.info("Received chat request: %s...", request.prompt[:50])
    # ... rest of code
```
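
Latency is usually the first thing worth tracking for an agent endpoint. As a minimal sketch using only the standard library, a context manager can time any block and log the result. The `timed` helper and the `financial_advisor` logger name are assumptions made for illustration, and `time.sleep` stands in for the agent call.

```python
import logging
import time
from contextlib import contextmanager

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("financial_advisor")  # hypothetical logger name


@contextmanager
def timed(operation: str):
    """Log how long the wrapped block took and expose the result to the caller."""
    timing = {"operation": operation, "elapsed_ms": None}
    start = time.perf_counter()
    try:
        yield timing
    finally:
        # Runs on success and on exceptions, so failed calls are timed too.
        timing["elapsed_ms"] = (time.perf_counter() - start) * 1000
        logger.info("%s finished in %.1f ms", operation, timing["elapsed_ms"])


with timed("chat") as timing:
    time.sleep(0.01)  # stand-in for orchestrator_agent(request.prompt)
```

Inside the `/chat` handler, the `with timed("chat"):` block would wrap the agent call, giving one log line per request that a log aggregator can turn into latency percentiles.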

## Security Best Practices

1. **Use secrets management**: Store COHERE_API_KEY in AWS Secrets Manager, Google Secret Manager, or Azure Key Vault
2. **Add authentication**: Implement JWT or API key authentication
3. **Rate limiting**: Use middleware to prevent abuse
4. **HTTPS only**: Always use TLS in production
5. **CORS configuration**: Restrict origins appropriately
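
Items 2 and 3 above can be sketched with the standard library alone. The example below is a minimal illustration, not a hardened implementation: `is_valid_api_key` and `SlidingWindowLimiter` are hypothetical names, the limiter keeps its state in process memory, and the limits shown are arbitrary.

```python
import hmac
import time
from collections import defaultdict, deque
from typing import Optional


def is_valid_api_key(presented: str, expected: str) -> bool:
    """Compare keys in constant time to avoid leaking information via timing."""
    return hmac.compare_digest(presented.encode(), expected.encode())


class SlidingWindowLimiter:
    """Allow at most `max_requests` per client within the last `window_seconds`."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client id -> timestamps of recent requests

    def allow(self, client_id: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[client_id]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


limiter = SlidingWindowLimiter(max_requests=2, window_seconds=60)
print(is_valid_api_key("secret", "secret"))  # True
print([limiter.allow("client-a", now=t) for t in (0, 1, 2, 61)])  # [True, True, False, True]
```

In `app.py`, these checks could run in a FastAPI dependency that reads an `X-API-Key` header and raises `HTTPException` with status 401 for a bad key or 429 when `allow` returns `False`. Because the limiter state is per process, a deployment with several replicas would need a shared store instead.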

## Performance Optimization

1. **Connection pooling**: Reuse HTTP connections to Cohere API
2. **Caching**: Cache common queries
3. **Horizontal scaling**: Run multiple replicas
4. **Load balancing**: Distribute traffic across instances
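
As an illustration of item 2, the sketch below caches answers for a fixed time-to-live, keyed on a normalized prompt. `TTLCache` and `normalize_prompt` are hypothetical names, `fake_agent` stands in for a real agent call, and the 300-second TTL is arbitrary. This kind of cache only suits prompts whose answers do not depend on the user, the conversation history, or live market data.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Cache values for `ttl_seconds`; expired entries are recomputed on access."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}  # key -> (stored_at, value)

    def get_or_compute(self, key: str, compute: Callable[[], str]) -> str:
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self.ttl_seconds:
            return entry[1]  # still fresh
        value = compute()
        self._store[key] = (now, value)
        return value


def normalize_prompt(prompt: str) -> str:
    """Collapse whitespace and case so trivially different prompts share a key."""
    return " ".join(prompt.lower().split())


calls = []


def fake_agent() -> str:
    calls.append(1)  # count how often the "agent" is actually invoked
    return "answer"


cache = TTLCache(ttl_seconds=300)
cache.get_or_compute(normalize_prompt("  What is an ETF? "), fake_agent)
cache.get_or_compute(normalize_prompt("what is an etf?"), fake_agent)
print(len(calls))  # 1
```

The second lookup hits the cache because both prompts normalize to the same key, so the agent runs only once.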

## Comparison: AWS AgentCore vs Cohere Deployment

| Feature | AWS AgentCore | Cohere + Custom Deployment |
|---------|---------------|---------------------------|
| **Model Provider** | AWS Bedrock only | Cohere (or any OpenAI-compatible API) |
| **Infrastructure** | Fully managed by AWS | Self-managed (containers, serverless, etc.) |
| **Deployment Complexity** | Low (AWS handles everything) | Medium (requires DevOps setup) |
| **Cost** | AWS pricing + Bedrock model costs | Cloud hosting + Cohere API costs |
| **Flexibility** | Limited to AWS ecosystem | Deploy anywhere (multi-cloud) |
| **Authentication** | AWS Cognito integration | Custom (JWT, OAuth, API keys) |
| **Observability** | Built-in AgentCore tracing | Custom (OpenTelemetry, CloudWatch, etc.) |
| **Scaling** | Automatic | Manual or auto-scaling configuration |
| **Vendor Lock-in** | High (AWS-specific) | Low (portable) |

## When to Use Each Approach

**Use AWS AgentCore when**:
- You're committed to AWS ecosystem
- You prefer fully managed services
- You're using AWS Bedrock models
- You need minimal DevOps overhead

**Use Cohere + Custom Deployment when**:
- You want flexibility in model choice
- You need multi-cloud or on-premises deployment
- You want to avoid vendor lock-in
- You have existing DevOps infrastructure
- You prefer Cohere's models over AWS Bedrock