# Azure API Management AI Gateway for MCP Servers

> **Author:** Ozgur Guler | AI Solution Leader, AI Innovation Hub
> **Contact:** [ozgur.guler1@gmail.com](mailto:ozgur.guler1@gmail.com)
> **© 2025 Ozgur Guler. All rights reserved.**

---

## What This Notebook Does

This notebook demonstrates how to put **MCP servers** behind **Azure API Management (APIM)** for enterprise-grade governance of AI agent tool calls. APIM provides rate limiting, authentication, content safety, and monitoring for your agent's external tool access.

### The Key Concept

```
┌──────────────────┐                      ┌──────────────────────┐                      ┌─────────────────┐
│     AI Agent     │    MCP Protocol      │   APIM AI Gateway    │    Backend Request   │   MCP Server    │
│   (Foundry)      │ ─────────────────►  │                      │ ─────────────────►  │  (Logic Apps)   │
│                  │                      │  • Rate Limiting     │                      │                 │
│                  │                      │  • JWT Validation    │                      │  • ServiceNow   │
│                  │                      │  • Content Safety    │                      │  • Salesforce   │
│                  │                      │  • Token Metrics     │                      │  • SAP          │
│                  │ ◄───────────────────│  • Semantic Cache    │ ◄───────────────────│  • SQL Server   │
└──────────────────┘    Response          └──────────────────────┘    Response          └─────────────────┘
```

### Why APIM AI Gateway?

| Feature | Policy | Benefit |
|---------|--------|---------|
| **Rate Limiting** | `rate-limit-by-key` | Prevent abuse, control costs |
| **Token Limiting** | `llm-token-limit` | Control token consumption per consumer |
| **Authentication** | `validate-jwt` | OAuth 2.0 / JWT validation |
| **Content Safety** | `llm-content-safety` | Block prompt injection attacks |
| **Semantic Caching** | `llm-semantic-cache-*` | Reduce costs for similar queries |
| **Token Metrics** | `llm-emit-token-metric` | Track usage in Azure Monitor |
| **Circuit Breaker** | Backend circuit breaker | Handle backend failures gracefully |

---

## APIM AI Gateway Capabilities

Azure API Management provides specialized AI gateway features:

### Supported AI Backend Types

| Backend Type | Description | APIM Support |
|--------------|-------------|--------------|
| **Azure OpenAI** | Native OpenAI API | Full policy support |
| **OpenAI-compatible LLMs** | LLaMA, Mistral, etc. | Import as API |
| **Microsoft Foundry APIs** | Azure AI Foundry models | Native import |
| **Pass-through MCP Server** | Route to existing MCP | Classic, V2, Self-hosted |
| **REST API as MCP Server** | Expose any REST API as MCP | Classic, V2, Self-hosted |
| **A2A Agent** | Agent-to-Agent protocol | V2 tiers only |

### AI-Specific Policies

| Policy | Purpose | Tiers |
|--------|---------|-------|
| `llm-token-limit` | Limit tokens per minute/hour | All except Consumption |
| `llm-emit-token-metric` | Emit token usage metrics | All |
| `llm-content-safety` | Azure AI Content Safety integration | All |
| `llm-semantic-cache-lookup` | Cache lookup for similar prompts | All except Self-hosted |
| `llm-semantic-cache-store` | Store responses in semantic cache | All except Self-hosted |

---

## Architecture: APIM as AI Gateway

```
┌─────────────────────────────────────────────────────────────────────────────────────────┐
│                              AI AGENT REQUEST FLOW                                       │
└─────────────────────────────────────────────────────────────────────────────────────────┘
                                          │
                                          ▼
┌─────────────────────────────────────────────────────────────────────────────────────────┐
│                              AZURE API MANAGEMENT                                        │
│                                                                                          │
│  ┌─────────────────────────────────────────────────────────────────────────────────┐    │
│  │                           INBOUND POLICIES                                       │    │
│  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐ │    │
│  │  │  validate   │  │ rate-limit  │  │ llm-token   │  │   llm-content-safety    │ │    │
│  │  │    -jwt     │  │  -by-key    │  │   -limit    │  │   (prompt injection)    │ │    │
│  │  └─────────────┘  └─────────────┘  └─────────────┘  └─────────────────────────┘ │    │
│  │                                                                                  │    │
│  │  ┌────────────────────────────────────────────────────────────────────────────┐ │    │
│  │  │              llm-semantic-cache-lookup (return cached if similar)           │ │    │
│  │  └────────────────────────────────────────────────────────────────────────────┘ │    │
│  └─────────────────────────────────────────────────────────────────────────────────┘    │
│                                          │                                               │
│                                          ▼                                               │
│  ┌─────────────────────────────────────────────────────────────────────────────────┐    │
│  │                              BACKEND                                             │    │
│  │                    Route to MCP Server / OpenAI / LLM                            │    │
│  └─────────────────────────────────────────────────────────────────────────────────┘    │
│                                          │                                               │
│                                          ▼                                               │
│  ┌─────────────────────────────────────────────────────────────────────────────────┐    │
│  │                           OUTBOUND POLICIES                                      │    │
│  │  ┌────────────────────────────────────────────────────────────────────────────┐ │    │
│  │  │              llm-emit-token-metric (track usage)                            │ │    │
│  │  └────────────────────────────────────────────────────────────────────────────┘ │    │
│  │  ┌────────────────────────────────────────────────────────────────────────────┐ │    │
│  │  │              llm-semantic-cache-store (cache response)                      │ │    │
│  │  └────────────────────────────────────────────────────────────────────────────┘ │    │
│  └─────────────────────────────────────────────────────────────────────────────────┘    │
│                                                                                          │
└─────────────────────────────────────────────────────────────────────────────────────────┘
                                          │
                                          ▼
                                    Response to Agent
```

---

## Setup: Configure APIM as MCP Gateway

### Option 1: Import MCP Server via Portal

1. **APIs** → **Add API** → **MCP Server**
2. Enter backend MCP URL (e.g., Logic Apps MCP endpoint)
3. Configure inbound/outbound policies

### Option 2: Via Bicep/ARM

```bicep
resource mcpApi 'Microsoft.ApiManagement/service/apis@2023-09-01-preview' = {
  name: 'mcp-logic-apps'
  properties: {
    displayName: 'MCP Logic Apps Gateway'
    path: 'mcp'
    protocols: ['https']
    serviceUrl: 'https://my-logic-app.azurewebsites.net/api/mcpservers/ticketing/mcp'
    apiType: 'mcp'  // Assumed MCP API type; verify the property name and API version against the current ARM reference
  }
}
```

### Option 3: Expose REST API as MCP Server

APIM can expose an existing REST API as an MCP server:
1. Import the existing REST API (OpenAPI definition or another supported import format)
2. Select the API operations to expose as MCP tools
3. The agent discovers those tools through the MCP endpoint that APIM publishes
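
The mapping in the steps above can be pictured as turning each REST operation into a tool descriptor. The sketch below is illustrative only: `operation_to_mcp_tool` and its input shape are hypothetical, and APIM's actual transformation is not shown in this notebook. The only firm assumption is that an MCP tool carries a `name`, a `description`, and a JSON Schema `inputSchema`.

```python
import json


def operation_to_mcp_tool(operation_id: str, summary: str, parameters: list) -> dict:
    """Map one OpenAPI-style operation to an MCP-style tool descriptor (hypothetical helper)."""
    # Each parameter becomes a property in the tool's JSON Schema input.
    properties = {p["name"]: {"type": p.get("type", "string")} for p in parameters}
    required = [p["name"] for p in parameters if p.get("required")]
    return {
        "name": operation_id,
        "description": summary,
        "inputSchema": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }


# Example operation, loosely modelled on the ticketing scenario used in this notebook.
tool = operation_to_mcp_tool(
    "createTicket",
    "Create a support ticket",
    [
        {"name": "title", "required": True},
        {"name": "priority", "type": "integer"},
    ],
)
print(json.dumps(tool, indent=2))
```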

### APIM Endpoint Format

```
https://<apim-name>.azure-api.net/<api-path>/<mcp-server>/mcp
```

Example: `https://my-apim.azure-api.net/enterprise/ticketing/mcp`
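
As a small sketch of the format above, the endpoint can be composed from its three parts. `build_apim_mcp_endpoint` is a hypothetical helper written for this notebook, not part of any Azure SDK, and it assumes the default `azure-api.net` gateway hostname rather than a custom domain.

```python
from urllib.parse import quote


def build_apim_mcp_endpoint(apim_name: str, api_path: str, mcp_server: str) -> str:
    """Compose https://<apim-name>.azure-api.net/<api-path>/<mcp-server>/mcp."""
    # Strip stray slashes so "/enterprise/" and "enterprise" give the same URL.
    segments = [api_path.strip("/"), mcp_server.strip("/"), "mcp"]
    path = "/".join(quote(segment, safe="/") for segment in segments if segment)
    return f"https://{apim_name}.azure-api.net/{path}"


print(build_apim_mcp_endpoint("my-apim", "enterprise", "ticketing"))
```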

---

## Section 1: Install Dependencies

In [None]:
# Install required packages
!pip install azure-ai-projects --pre --quiet
!pip install azure-ai-agents --pre --quiet
!pip install azure-identity python-dotenv --quiet

print("Packages installed successfully")

In [None]:
import os
from dotenv import load_dotenv

load_dotenv("../.env")

# Foundry Configuration
PROJECT_ENDPOINT = os.getenv(
    "PROJECT_ENDPOINT",
    "https://ozgurguler-7212-resource.services.ai.azure.com/api/projects/ozgurguler-7212"
)

# Demo Mode Flag
# Set to False when you have APIM configured as your MCP gateway
USE_DEMO_MCP = True

if USE_DEMO_MCP:
    # Microsoft Learn MCP Server - for demo/testing
    # This demonstrates the MCP + governance pattern without requiring APIM setup
    APIM_MCP_ENDPOINT = "https://learn.microsoft.com/api/mcp"
    APIM_SUBSCRIPTION_KEY = ""  # Not needed for demo server
    MCP_SERVER_LABEL = "microsoft_learn"
    print("Using Microsoft Learn MCP Server (demo mode)")
    print("Set USE_DEMO_MCP = False and configure APIM endpoint for production")
else:
    # Your APIM gateway endpoint (fronting your MCP server)
    APIM_MCP_ENDPOINT = os.getenv(
        "APIM_MCP_ENDPOINT",
        "https://your-apim.azure-api.net/mcp/ticketing/mcp"
    )
    # APIM subscription key (if required by your API policy)
    APIM_SUBSCRIPTION_KEY = os.getenv("APIM_SUBSCRIPTION_KEY", "")
    MCP_SERVER_LABEL = "apim_mcp_gateway"
    print("Using APIM AI Gateway")

MODEL = os.getenv("MODEL_DEPLOYMENT_NAME", "gpt-5-nano")

print("\nConfiguration:")
print(f"  Project: {PROJECT_ENDPOINT}")
print(f"  MCP Endpoint: {APIM_MCP_ENDPOINT}")
print(f"  Subscription Key: {'***' if APIM_SUBSCRIPTION_KEY else '(not set)'}")
print(f"  Model: {MODEL}")

---

## Key APIM Policies for MCP

### 1. Rate Limiting (Token-based)

```xml
<inbound>
    <llm-token-limit 
        counter-key="@(context.Subscription.Id)" 
        tokens-per-minute="1000" 
        estimate-prompt-tokens="true"
        remaining-tokens-variable-name="remainingTokens">
    </llm-token-limit>
</inbound>
```
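
To make the idea of a per-key token budget concrete, here is a toy model in Python. It is a sketch under stated assumptions, not APIM's implementation: it uses fixed one-minute windows, and `estimate_prompt_tokens` is a rough characters-per-token heuristic invented for illustration. `TokenBudget` and its methods are hypothetical names.

```python
import time
from collections import defaultdict


def estimate_prompt_tokens(text: str) -> int:
    """Rough heuristic (about 4 characters per token); not APIM's estimator."""
    return max(1, len(text) // 4)


class TokenBudget:
    """Toy per-key token budget over fixed one-minute windows."""

    def __init__(self, tokens_per_minute: int, clock=time.monotonic):
        self.tokens_per_minute = tokens_per_minute
        self.clock = clock  # injectable so the behaviour can be tested
        self.used = defaultdict(int)  # (counter key, window index) -> tokens used

    def _slot(self, counter_key: str):
        # All requests in the same 60-second window share one counter.
        return (counter_key, int(self.clock() // 60))

    def remaining(self, counter_key: str) -> int:
        return self.tokens_per_minute - self.used[self._slot(counter_key)]

    def try_consume(self, counter_key: str, tokens: int) -> bool:
        """Record usage and return True if the request fits, else return False."""
        slot = self._slot(counter_key)
        if self.used[slot] + tokens > self.tokens_per_minute:
            return False  # a gateway would answer 429 here
        self.used[slot] += tokens
        return True


budget = TokenBudget(tokens_per_minute=1000)
prompt = "Create a support ticket for login issues on mobile app"
print(budget.try_consume("subscription-a", estimate_prompt_tokens(prompt)))
print(budget.remaining("subscription-a"))
```

Keying the counter by subscription means one noisy consumer exhausts only its own budget.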

### 2. Request Rate Limiting

```xml
<inbound>
    <rate-limit-by-key 
        calls="10" 
        renewal-period="60" 
        counter-key="@(context.Request.IpAddress)" />
</inbound>
```

### 3. Content Safety

```xml
<inbound>
    <llm-content-safety backend-id="content-safety-backend" shield-prompt="true">
        <blocklists>
            <id>prompt-injection-patterns</id>
        </blocklists>
    </llm-content-safety>
</inbound>
```

### 4. JWT Validation

```xml
<inbound>
    <validate-jwt header-name="Authorization" require-scheme="Bearer">
        <openid-config url="https://login.microsoftonline.com/{tenant}/.well-known/openid-configuration" />
        <audiences>
            <audience>api://your-app-id</audience>
        </audiences>
    </validate-jwt>
</inbound>
```
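
When `validate-jwt` rejects a request, it often helps to inspect the token's `aud` and `roles` claims on the client side. The sketch below decodes the payload without verifying the signature, so it is suitable for debugging only. `decode_jwt_claims`, `audience_matches`, and `_b64url` are hypothetical helpers, and the sample token is unsigned and built purely for demonstration.

```python
import base64
import json


def decode_jwt_claims(token: str) -> dict:
    """Return the JWT payload as a dict WITHOUT verifying the signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected header.payload.signature")
    payload = parts[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64url padding
    return json.loads(base64.urlsafe_b64decode(payload))


def audience_matches(claims: dict, expected_audience: str) -> bool:
    """The aud claim may be a single string or a list of strings."""
    aud = claims.get("aud")
    if isinstance(aud, list):
        return expected_audience in aud
    return aud == expected_audience


def _b64url(data: dict) -> str:
    # Demo-only encoder used to build an unsigned sample token.
    raw = json.dumps(data).encode("utf-8")
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


sample = ".".join([
    _b64url({"alg": "none"}),
    _b64url({"aud": "api://your-app-id", "roles": ["MCP.Tools.Read"]}),
    "signature",
])
claims = decode_jwt_claims(sample)
print(claims["aud"], claims.get("roles"))
print(audience_matches(claims, "api://your-app-id"))
```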

### 5. Token Metrics

```xml
<outbound>
    <llm-emit-token-metric namespace="mcp-metrics">
        <dimension name="Subscription" value="@(context.Subscription.Id)" />
        <dimension name="API" value="@(context.Api.Name)" />
    </llm-emit-token-metric>
</outbound>
```

---

## Use APIM Gateway with Agent

In [None]:
from azure.identity import DefaultAzureCredential
from azure.ai.projects import AIProjectClient
from azure.ai.projects.models import PromptAgentDefinition
from azure.ai.agents.models import McpTool  # McpTool is in azure.ai.agents.models

# Initialize client
client = AIProjectClient(
    endpoint=PROJECT_ENDPOINT,
    credential=DefaultAzureCredential()
)

# MCP tool pointing to APIM gateway (not directly to backend MCP server)
# APIM handles: rate limiting, auth, content safety, metrics
mcp_tool = McpTool(
    server_label=MCP_SERVER_LABEL.replace("-", "_"),  # Must be alphanumeric + underscore
    server_url=APIM_MCP_ENDPOINT,
    allowed_tools=[],  # Empty = allow all tools
)

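# Add APIM subscription key header if configured
if APIM_SUBSCRIPTION_KEY:
    # Ocp-Apim-Subscription-Key is APIM's default subscription key header.
    # update_headers() stores the header on mcp_tool.resources (not on
    # mcp_tool.definitions), so confirm it reaches APIM in your SDK version.
    mcp_tool.update_headers("Ocp-Apim-Subscription-Key", APIM_SUBSCRIPTION_KEY)
    print("APIM subscription key attached to the MCP tool headers")
else:
    print("No APIM subscription key (using anonymous/OAuth access)")

print("\nMCP Tool configured:")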
print(f"  Label: {MCP_SERVER_LABEL}")
print(f"  Endpoint: {APIM_MCP_ENDPOINT}")

In [None]:
# Agent instructions based on mode
if USE_DEMO_MCP:
    AGENT_INSTRUCTIONS = """You are a helpful assistant with access to Microsoft Learn documentation via MCP.

When asked questions about Azure, API Management, or AI gateway topics:
1. Use available tools to search documentation
2. Provide accurate answers based on the retrieved content
3. Explain how APIM AI Gateway features work

This is a demo showing the APIM + MCP pattern. In production, you would have governed access to enterprise tools."""
else:
    AGENT_INSTRUCTIONS = """You are an assistant with governed access to enterprise tools via APIM AI Gateway.

The MCP tools you use are fronted by Azure API Management which provides:
- Rate limiting to prevent abuse
- JWT authentication for security
- Content safety checks
- Token usage tracking

When asked to perform actions:
1. Use the appropriate MCP tool
2. Handle any rate limit responses gracefully
3. Report results to the user"""

# Create agent with APIM-governed MCP tool
AGENT_NAME = "apim-governed-agent"

try:
    agent = client.agents.create_version(
        agent_name=AGENT_NAME,
        definition=PromptAgentDefinition(
            model=MODEL,
            instructions=AGENT_INSTRUCTIONS,
            tools=mcp_tool.definitions,
        )
    )
    print(f"Created governed agent: {agent.name}")
    print(f"  Version: {agent.version}")
    print(f"  MCP via APIM: {APIM_MCP_ENDPOINT}")
except Exception as e:
    print(f"Error creating agent: {e}")
    agent = None

In [None]:
# Get OpenAI client for agent invocation
from typing import Optional

openai_client = client.get_openai_client()

def invoke_governed_agent(user_input: str, agent_name: str) -> Optional[str]:
    """Invoke an agent through APIM-governed MCP endpoint."""
    print(f"\n{'='*60}")
    print(f"User: {user_input}")
    print("="*60)
    
    try:
        # Create a conversation
        conversation = openai_client.conversations.create()
        
        # Send the message with agent reference
        response = openai_client.responses.create(
            input=user_input,
            conversation=conversation.id,
            extra_body={"agent": {"name": agent_name, "type": "agent_reference"}},
        )
        
        # Report the response status. Rate-limit headers are best-effort here:
        # the parsed response object may not expose HTTP headers, and APIM may
        # return them to the service that calls the MCP endpoint instead.
        status_msg = f"Status: {response.status}"
        if hasattr(response, 'headers'):
            remaining = response.headers.get('x-ratelimit-remaining-requests', 'N/A')
            status_msg += f" | Rate Limit Remaining: {remaining}"
        
        print(f"\n{status_msg}")
        print(f"\nAgent Response:\n{response.output_text}")
        
        return response.output_text
        
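    except Exception as e:
        # Prefer the HTTP status code carried by the SDK exception when present;
        # fall back to substring matching for errors surfaced only as text.
        status_code = getattr(e, "status_code", None)
        error_str = str(e)
        # Check for rate limit (429) errors
        if status_code == 429 or "429" in error_str or "rate limit" in error_str.lower():
            print("\nRATE_LIMITED: Too many requests - APIM rate limiting in effect")
            return "RATE_LIMITED"
        # Check for authentication errors (401/403)
        elif status_code in (401, 403) or "401" in error_str or "403" in error_str:
            print("\nAUTH_ERROR: Authentication/authorization failed")
            return "AUTH_ERROR"
        else:
            print(f"\nError: {e}")
            return None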

# Test the governed agent
if agent:
    if USE_DEMO_MCP:
        result = invoke_governed_agent(
            "What is Azure API Management AI Gateway and how does it help with MCP servers?",
            agent.name
        )
    else:
        result = invoke_governed_agent(
            "Create a support ticket for login issues on mobile app",
            agent.name
        )
else:
    print("Agent not available - check configuration")

---

## Section 2: Testing APIM AI Gateway Features

### Rate Limiting Behavior

When APIM rate limiting is configured:
- Requests within limit: Return normally
- Requests exceeding limit: Return `429 Too Many Requests`
- Rate limit headers: `Retry-After` on `429` responses, plus `x-ratelimit-remaining-requests` when `remaining-calls-header-name` is set in the policy
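
A client that receives a `429` usually waits and retries. Below is a minimal sketch of exponential backoff built around the `"RATE_LIMITED"` sentinel that `invoke_governed_agent` returns in this notebook. `call_with_backoff` is a hypothetical helper, and the delays and attempt count are arbitrary illustration values.

```python
import time


def call_with_backoff(call, max_attempts: int = 4, base_delay: float = 1.0, sleep=time.sleep):
    """Retry `call` while it returns "RATE_LIMITED", doubling the delay each time."""
    for attempt in range(max_attempts):
        result = call()
        if result != "RATE_LIMITED":
            return result
        if attempt < max_attempts - 1:
            # Delays grow as base_delay * 1, 2, 4, ...
            sleep(base_delay * (2 ** attempt))
    return "RATE_LIMITED"  # still throttled after the last attempt


# Stand-in for the agent call: throttled once, then succeeds.
responses = iter(["RATE_LIMITED", "ticket created"])
print(call_with_backoff(lambda: next(responses), base_delay=0.01))
```

In the notebook, the callable could be `lambda: invoke_governed_agent(prompt, agent.name)`. If the gateway supplies a `Retry-After` value, honoring it is generally preferable to a fixed schedule.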

### Expected APIM Response Headers

| Header | Description |
|--------|-------------|
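| `x-ratelimit-remaining-requests` | Requests remaining in current window (set via `remaining-calls-header-name`) |
| `x-ratelimit-remaining-tokens` | Tokens remaining (set via `remaining-tokens-header-name`, if token limiting) |
| `Retry-After` | Seconds to wait before retrying, returned with `429` responses |
| `x-apim-request-id` | Request tracking ID (set by the outbound `set-header` policy in Section 3) |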
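
If you call the gateway directly (for example, with an HTTP client while testing policies), the headers can be normalized into a small dictionary. This is a sketch: `parse_rate_limit_headers` is a hypothetical helper, the `x-ratelimit-*` names only appear when the policy is configured to emit them, and `Retry-After` is included on the assumption that the gateway sends the standard HTTP header with throttled responses.

```python
def parse_rate_limit_headers(headers: dict) -> dict:
    """Extract rate-limit values from response headers, tolerating missing or odd values."""
    # HTTP header names are case-insensitive, so compare in lower case.
    lowered = {name.lower(): value for name, value in headers.items()}

    def as_int(name: str):
        try:
            return int(lowered.get(name))
        except (TypeError, ValueError):
            return None  # header absent or not a plain integer

    return {
        "remaining_requests": as_int("x-ratelimit-remaining-requests"),
        "remaining_tokens": as_int("x-ratelimit-remaining-tokens"),
        "retry_after_seconds": as_int("retry-after"),
    }


print(parse_rate_limit_headers({"X-RateLimit-Remaining-Requests": "7", "Retry-After": "30"}))
```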

In [None]:
# Test rate limiting with burst requests
# Note: In demo mode, Microsoft Learn server may not have rate limiting
# In production with APIM, you should see rate limit responses after exceeding limits

import time

if agent:
    print("Testing rate limiting (burst requests)...")
    print("Note: Rate limiting only applies when APIM is configured\n")
    
    results = []
    for i in range(5):
        result = invoke_governed_agent(
            f"Test request {i+1}: What is API Management?",
            agent.name
        )
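        # invoke_governed_agent returns sentinel strings for 429 and 401/403,
        # and None for other failures; only real answers count as OK.
        if result in ("RATE_LIMITED", "AUTH_ERROR"):
            status = result
        elif result is None:
            status = "ERROR"
        else:
            status = "OK"
        results.append(status)
        print(f"\nRequest {i+1}: {status}")
        time.sleep(0.5)  # Small delay between requests
    
    print(f"\n{'='*60}")
    print("Summary:")
    print(f"  Total requests: {len(results)}")
    print(f"  OK: {results.count('OK')}")
    print(f"  Rate Limited: {results.count('RATE_LIMITED')}")
    print(f"  Auth/Other Errors: {results.count('AUTH_ERROR') + results.count('ERROR')}")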
    if not USE_DEMO_MCP:
        print("\nIf rate limiting is configured in APIM, later requests should show RATE_LIMITED")
else:
    print("Agent not available")

In [None]:
# Optional: Cleanup agent
DELETE_AGENT = False  # Set to True to delete

if DELETE_AGENT and agent:
    try:
        client.agents.delete(agent_name=AGENT_NAME)
        print(f"Deleted agent: {AGENT_NAME}")
    except Exception as e:
        print(f"Note: {e}")
else:
    print(f"Agent cleanup skipped (DELETE_AGENT = {DELETE_AGENT})")

---

## Section 3: Complete APIM Policy Reference

### Full Policy Configuration Example

```xml
<policies>
    <inbound>
        <base />
        
        <!-- 1. AUTHENTICATION: Validate JWT from Microsoft Entra ID (formerly Azure AD) -->
        <validate-jwt header-name="Authorization" require-scheme="Bearer"
                      failed-validation-httpcode="401"
                      failed-validation-error-message="Unauthorized">
            <openid-config url="https://login.microsoftonline.com/{tenant-id}/.well-known/openid-configuration" />
            <audiences>
                <audience>api://your-app-registration-id</audience>
            </audiences>
            <required-claims>
                <claim name="roles" match="any">
                    <value>MCP.Tools.Read</value>
                    <value>MCP.Tools.Write</value>
                </claim>
            </required-claims>
        </validate-jwt>
        
        <!-- 2. RATE LIMITING: Limit requests per subscription -->
        <rate-limit-by-key 
            calls="100" 
            renewal-period="60" 
            counter-key="@(context.Subscription.Id)"
            remaining-calls-variable-name="remainingCalls"
            remaining-calls-header-name="x-ratelimit-remaining-requests" />
        
        <!-- 3. TOKEN LIMITING: Limit AI tokens per minute -->
        <llm-token-limit 
            counter-key="@(context.Subscription.Id)" 
            tokens-per-minute="10000"
            estimate-prompt-tokens="true"
            remaining-tokens-variable-name="remainingTokens"
            remaining-tokens-header-name="x-ratelimit-remaining-tokens" />
        
        <!-- 4. CONTENT SAFETY: Block harmful content and prompt attacks -->
        <!-- Threshold 4 corresponds to Medium on the four-level severity scale (0, 2, 4, 6) -->
        <llm-content-safety backend-id="content-safety-backend" shield-prompt="true">
            <categories output-type="FourSeverityLevels">
                <category name="Hate" threshold="4" />
                <category name="Sexual" threshold="4" />
                <category name="Violence" threshold="4" />
                <category name="SelfHarm" threshold="4" />
            </categories>
            <blocklists>
                <id>prompt-injection-patterns</id>
                <id>jailbreak-attempts</id>
            </blocklists>
        </llm-content-safety>
        
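        <!-- 5. SEMANTIC CACHE LOOKUP: Return cached response if similar -->
        <!-- Lower score-threshold values require a closer match; tune for your workload -->
        <llm-semantic-cache-lookup 
            score-threshold="0.05"
            embeddings-backend-id="embeddings-backend"
            embeddings-backend-auth="system-assigned">
            <vary-by>@(context.Subscription.Id)</vary-by>
        </llm-semantic-cache-lookup>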
        
    </inbound>
    
    <backend>
        <base />
        <!-- Forward to MCP server backend -->
    </backend>
    
    <outbound>
        <base />
        
        <!-- 6. EMIT TOKEN METRICS: Track usage in Azure Monitor -->
        <llm-emit-token-metric namespace="MCP-AI-Gateway">
            <dimension name="Subscription" value="@(context.Subscription.Id)" />
            <dimension name="API" value="@(context.Api.Name)" />
            <dimension name="Operation" value="@(context.Operation.Name)" />
            <dimension name="Consumer" value="@(context.User?.Email ?? "anonymous")" />
        </llm-emit-token-metric>
        
        <!-- 7. SEMANTIC CACHE STORE: Cache response for future similar queries -->
        <llm-semantic-cache-store duration="3600" />
        
        <!-- 8. ADD GOVERNANCE HEADERS -->
        <set-header name="x-apim-request-id" exists-action="override">
            <value>@(context.RequestId.ToString())</value>
        </set-header>
        
    </outbound>
    
    <on-error>
        <base />
        <!-- Log errors for monitoring -->
        <trace source="MCP-Error" severity="error">
            <message>@(context.LastError.Message)</message>
            <metadata name="StatusCode" value="@(context.Response.StatusCode.ToString())" />
        </trace>
    </on-error>
</policies>
```

### Individual Policy Breakdown

#### Rate Limiting by Subscription
```xml
<rate-limit-by-key 
    calls="10" 
    renewal-period="60" 
    counter-key="@(context.Subscription.Id)" />
```

#### Token Limiting for AI Workloads
```xml
<llm-token-limit 
    counter-key="@(context.Subscription.Id)" 
    tokens-per-minute="1000" 
    estimate-prompt-tokens="true" />
```

#### Content Safety with Azure AI
```xml
<llm-content-safety backend-id="content-safety-backend" shield-prompt="true">
    <blocklists>
        <id>prompt-injection-patterns</id>
    </blocklists>
</llm-content-safety>
```

#### Semantic Caching for Cost Savings
```xml
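<!-- Lookup (requires an embeddings backend; lower score-threshold means a closer match) -->
<llm-semantic-cache-lookup
    score-threshold="0.05"
    embeddings-backend-id="embeddings-backend"
    embeddings-backend-auth="system-assigned" />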

<!-- Store -->
<llm-semantic-cache-store duration="3600" />
```
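
The lookup and store pair can be pictured with a toy in-memory cache. This is a sketch, not APIM's implementation: `SemanticCache` is a hypothetical class, the two-dimensional vectors stand in for real embeddings, and `min_similarity` is a plain cosine-similarity floor chosen for illustration. How APIM's `score-threshold` maps onto similarity should be checked against the policy reference.

```python
import math


def cosine_similarity(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Toy cache: return a stored response when a new prompt embedding is close enough."""

    def __init__(self, min_similarity: float):
        self.min_similarity = min_similarity
        self.entries = []  # list of (embedding, response) pairs

    def store(self, embedding, response: str) -> None:
        self.entries.append((embedding, response))

    def lookup(self, embedding):
        best_response, best_score = None, -1.0
        for cached_embedding, response in self.entries:
            score = cosine_similarity(embedding, cached_embedding)
            if score > best_score:
                best_response, best_score = response, score
        # Serve from cache only when the closest entry clears the similarity floor.
        return best_response if best_score >= self.min_similarity else None


cache = SemanticCache(min_similarity=0.9)
cache.store([1.0, 0.0], "APIM is Azure's API gateway service.")
print(cache.lookup([0.99, 0.1]))  # near-duplicate prompt: cache hit
print(cache.lookup([0.0, 1.0]))   # unrelated prompt: cache miss
```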

#### Token Metrics for Monitoring
```xml
<llm-emit-token-metric namespace="mcp-metrics">
    <dimension name="Subscription" value="@(context.Subscription.Id)" />
</llm-emit-token-metric>
```
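
Once token metrics are emitted with dimensions, a dashboard query typically sums them per dimension. The sketch below shows that aggregation over exported records. `aggregate_tokens` and the record shape (a `total_tokens` field plus dimension keys) are assumptions made for illustration, not an Azure Monitor API.

```python
from collections import Counter


def aggregate_tokens(records: list, dimension: str) -> dict:
    """Sum total_tokens per value of the given dimension."""
    totals = Counter()
    for record in records:
        # Records missing the dimension are grouped under "unknown".
        totals[record.get(dimension, "unknown")] += record["total_tokens"]
    return dict(totals)


records = [
    {"Subscription": "team-a", "total_tokens": 120},
    {"Subscription": "team-b", "total_tokens": 45},
    {"Subscription": "team-a", "total_tokens": 80},
]
print(aggregate_tokens(records, "Subscription"))
```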

---

## Summary: APIM AI Gateway for MCP

### What We Demonstrated

```
┌─────────────────────────────────────────────────────────────────────────────────┐
│                       APIM AI GATEWAY FOR MCP SERVERS                            │
├─────────────────────────────────────────────────────────────────────────────────┤
│                                                                                  │
│   AI AGENT                    APIM AI GATEWAY                    MCP SERVER      │
│   ┌─────────┐                 ┌─────────────────┐               ┌─────────────┐ │
│   │         │  ── Request ──► │  Rate Limiting  │ ── Forward ─► │ Logic Apps  │ │
│   │ Foundry │                 │  JWT Validation │               │ ServiceNow  │ │
│   │  Agent  │  ◄── Response ─ │  Content Safety │ ◄── Response ─│ Salesforce  │ │
│   │         │                 │  Token Metrics  │               │ SAP         │ │
│   └─────────┘                 │  Semantic Cache │               └─────────────┘ │
│                               └─────────────────┘                                │
│                                                                                  │
│   Benefits:                                                                      │
│   • Centralized governance for all MCP tool calls                               │
│   • Cost control via rate and token limiting                                     │
│   • Security via JWT validation and content safety                               │
│   • Observability via metrics and logging                                        │
│   • Performance via semantic caching                                             │
│                                                                                  │
└─────────────────────────────────────────────────────────────────────────────────┘
```

### AI Gateway Policy Matrix

| Policy | Purpose | Phase | Tier Support |
|--------|---------|-------|--------------|
| `validate-jwt` | Authentication | Inbound | All |
| `rate-limit-by-key` | Request rate control | Inbound | All except Consumption |
| `llm-token-limit` | Token consumption control | Inbound | All except Consumption |
| `llm-content-safety` | Prompt injection protection | Inbound | All |
| `llm-semantic-cache-lookup` | Return cached responses | Inbound | All except Self-hosted |
| `llm-emit-token-metric` | Usage tracking | Outbound | All |
| `llm-semantic-cache-store` | Cache responses | Outbound | All except Self-hosted |

### APIM AI Gateway Features by Tier

| Feature | Classic | V2 | Consumption | Self-hosted |
|---------|---------|-----|-------------|-------------|
| Pass-through MCP Server | ✔️ | ✔️ | ❌ | ✔️ |
| Export REST API as MCP | ✔️ | ✔️ | ❌ | ✔️ |
| A2A Agent Protocol | ❌ | ✔️ | ❌ | ❌ |
| Semantic Caching | ✔️ | ✔️ | ✔️ | ❌ |
| Token Limiting | ✔️ | ✔️ | ❌ | ✔️ |
| Rate Limiting | ✔️ | ✔️ | ❌ | ✔️ |

### Production Checklist

- [ ] Configure APIM with MCP server backend
- [ ] Set up JWT validation with a Microsoft Entra ID (formerly Azure AD) app registration
- [ ] Configure rate limiting policy (requests per minute)
- [ ] Configure token limiting policy (tokens per minute)
- [ ] Enable content safety with blocklists
- [ ] Enable semantic caching for cost optimization
- [ ] Set up Azure Monitor for metrics dashboards
- [ ] Test rate limiting returns `429` on burst
- [ ] Verify content safety blocks injection attempts

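### Monitoring in Azure Monitor

`llm-emit-token-metric` emits `Total Tokens`, `Prompt Tokens`, and `Completion Tokens` as custom metrics. Treat the names in the table below as illustrative dashboard measures rather than built-in metric names.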

| Metric | Description | Dashboard Use |
|--------|-------------|---------------|
| `TokensConsumed` | Total tokens used | Cost tracking |
| `RequestCount` | Total requests | Usage patterns |
| `RateLimitedRequests` | Blocked by rate limit | Capacity planning |
| `CacheHitRatio` | Semantic cache hits | Cost optimization |
| `ContentSafetyBlocked` | Blocked by content safety | Security monitoring |

---

## Next Steps

1. **Deploy APIM** in your subscription (Standard v2 or Premium v2 recommended)
2. **Import your MCP server** as an API backend
3. **Configure AI gateway policies** for governance
4. **Set up Azure Monitor** dashboards for observability
5. **Test with agents** in Azure AI Foundry Playground

Continue to `../10-tool-catalog-registration-in-foundry` for tool catalog integration.

---

<div align="center">

## License & Attribution

This notebook is part of the **Azure AI Foundry Demo Repository**

[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](../LICENSE)

**Original Author:** Ozgur Guler | AI Solution Leader, AI Innovation Hub

**Contact:** [ozgur.guler1@gmail.com](mailto:ozgur.guler1@gmail.com)

---

*If you use, modify, or distribute this work, you must provide appropriate credit to the original author as required by the [Apache License 2.0](../LICENSE).*

**Copyright © 2025 Ozgur Guler. All rights reserved.**

</div>