# APIM AI Gateway for MCP

> **Author:** Ozgur Guler | AI Solution Leader, AI Innovation Hub
> **Contact:** [ozgur.guler1@gmail.com](mailto:ozgur.guler1@gmail.com)
> **© 2025 Ozgur Guler. All rights reserved.**

---

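Put MCP servers behind Azure API Management (APIM) so that rate limiting, authentication, content safety, and monitoring are enforced at one gateway instead of in each server.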

## Why APIM AI Gateway?

| Feature | Benefit |
|---------|--------|
| **Rate Limiting** | Control tokens/requests per consumer |
| **Content Safety** | Block prompt injection attacks |
| **Authentication** | OAuth/JWT validation |
| **Monitoring** | Token metrics, tracing, logging |
| **Caching** | Semantic caching for cost savings |

## Architecture

```
┌──────────┐      ┌─────────────────┐      ┌─────────────────┐
│  Agent   │ ──► │  APIM Gateway   │ ──► │  MCP Server     │
│          │      │  • Rate Limit   │      │  (Logic Apps)   │
│          │      │  • Auth         │      │                 │
│          │      │  • Safety       │      │                 │
└──────────┘      └─────────────────┘      └─────────────────┘
```

---

## Setup: Expose MCP Server via APIM

### Option 1: Via Azure Portal

1. In the APIM instance, select **APIs** → **Add API** → **MCP Server**
2. Enter the backend MCP server URL
3. Configure policies (see "Key APIM Policies for MCP")

### Option 2: Via Bicep/ARM

```bicep
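// Existing APIM instance (placeholder name); the API below is created as its child
resource apim 'Microsoft.ApiManagement/service@2023-09-01-preview' existing = {
  name: 'your-apim'
}

resource mcpApi 'Microsoft.ApiManagement/service/apis@2023-09-01-preview' = {
  parent: apim
  name: 'mcp-logic-apps'
  properties: {
    displayName: 'MCP Logic Apps'
    path: 'mcp'
    protocols: ['https']
    serviceUrl: 'https://my-logic-app.azurewebsites.net/api/mcpservers/ticketing/mcp'
  }
}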
```

In [None]:
!pip install azure-ai-projects azure-ai-agents azure-identity python-dotenv --pre --quiet

In [None]:
import os
from dotenv import load_dotenv

load_dotenv("../.env")

PROJECT_ENDPOINT = os.getenv(
    "PROJECT_ENDPOINT",
    "https://ozgurguler-7212-resource.services.ai.azure.com/api/projects/ozgurguler-7212"
)

# APIM gateway endpoint (fronting your MCP server)
APIM_MCP_ENDPOINT = os.getenv(
    "APIM_MCP_ENDPOINT",
    "https://your-apim.azure-api.net/mcp/ticketing/mcp"
)

# APIM subscription key (if required)
APIM_SUBSCRIPTION_KEY = os.getenv("APIM_SUBSCRIPTION_KEY", "")

MODEL = os.getenv("MODEL_DEPLOYMENT_NAME", "gpt-5-nano")

print(f"APIM Endpoint: {APIM_MCP_ENDPOINT}")

---

## Key APIM Policies for MCP

### 1. Rate Limiting (Token-based)

```xml
<inbound>
    <llm-token-limit 
        counter-key="@(context.Subscription.Id)" 
        tokens-per-minute="1000" 
        estimate-prompt-tokens="true"
        remaining-tokens-variable-name="remainingTokens">
    </llm-token-limit>
</inbound>
```
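
To build intuition for what `tokens-per-minute="1000"` per `counter-key` means, here is a minimal Python sketch of a per-key token budget over fixed 60-second windows. It is an illustration only, not APIM's implementation: the `TokenWindow` class and its method are hypothetical, and the real policy's windowing and prompt-token estimation may behave differently.

```python
from collections import defaultdict


class TokenWindow:
    """Illustrative per-key token budget over fixed 60-second windows."""

    def __init__(self, tokens_per_minute: int):
        self.limit = tokens_per_minute
        # (key, window index) -> tokens consumed in that window
        self.used = defaultdict(int)

    def try_consume(self, key: str, tokens: int, now: float) -> tuple:
        """Return (allowed, remaining tokens) for a request arriving at `now` seconds."""
        window = int(now // 60)
        used = self.used[(key, window)]
        if used + tokens > self.limit:
            # Request rejected; the budget for this window is left unchanged.
            return False, self.limit - used
        self.used[(key, window)] = used + tokens
        return True, self.limit - used - tokens


budget = TokenWindow(tokens_per_minute=1000)
print(budget.try_consume("sub-a", 600, now=0))   # first request fits
print(budget.try_consume("sub-a", 600, now=10))  # would exceed this window's budget
print(budget.try_consume("sub-b", 600, now=10))  # different key, separate budget
print(budget.try_consume("sub-a", 600, now=61))  # next window, budget starts over
```

The "remaining" value in the tuple plays the role that `remaining-tokens-variable-name` plays in the policy: it exposes how much budget is left for the key.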

### 2. Request Rate Limiting

```xml
<inbound>
    <rate-limit-by-key 
        calls="10" 
        renewal-period="60" 
        counter-key="@(context.Request.IpAddress)" />
</inbound>
```
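
When this policy rejects a call, the gateway answers with `429 Too Many Requests`, usually with a `Retry-After` header. A client that calls the gateway directly can honor that delay. The sketch below assumes a hypothetical `send` callable that returns `(status, headers, body)`; it performs no real HTTP, so it can be tried with a simulated gateway.

```python
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str], str]


def call_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 3,
    default_wait: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call `send`, waiting and retrying while the gateway answers 429."""
    response = send()
    for _ in range(max_attempts - 1):
        status, headers, _body = response
        if status != 429:
            break
        # Prefer the server-provided delay; fall back to a fixed wait if the
        # header is missing or not a plain number of seconds.
        try:
            wait = float(headers.get("Retry-After", default_wait))
        except ValueError:
            wait = default_wait
        sleep(wait)
        response = send()
    return response


# Simulated gateway: throttles the first call, then succeeds.
replies = iter([(429, {"Retry-After": "2"}, ""), (200, {}, "ok")])
waits = []
print(call_with_retry(lambda: next(replies), sleep=waits.append))
print(waits)
```

Passing `sleep` as a parameter keeps the example fast to run and easy to test; in real use the default `time.sleep` applies.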

### 3. Content Safety

```xml
<inbound>
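    <!-- shield-prompt checks requests for prompt attacks; blocklists reference custom term lists -->
    <llm-content-safety backend-id="content-safety-backend" shield-prompt="true">
        <blocklists>
            <id>prompt-injection-patterns</id>
        </blocklists>
    </llm-content-safety>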
</inbound>
```

### 4. JWT Validation

```xml
<inbound>
    <validate-jwt header-name="Authorization" require-scheme="Bearer">
        <openid-config url="https://login.microsoftonline.com/{tenant}/.well-known/openid-configuration" />
        <audiences>
            <audience>api://your-app-id</audience>
        </audiences>
    </validate-jwt>
</inbound>
```
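
When the gateway returns `401`, the cause is often an audience mismatch or an expired token. The following sketch decodes a token's claims locally to check those two things before a request is sent. It does not verify the signature, so it is a debugging aid only, never a substitute for `validate-jwt`. The function names and the demo token are hypothetical.

```python
import base64
import json
import time
from typing import Optional


def decode_jwt_claims(token: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying its signature."""
    payload = token.split(".")[1]
    # JWT segments are base64url without padding; restore it before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


def precheck_token(token: str, expected_audience: str, now: Optional[float] = None) -> list:
    """Return a list of problems that the gateway would likely reject."""
    claims = decode_jwt_claims(token)
    now = time.time() if now is None else now
    problems = []
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    if expected_audience not in audiences:
        problems.append(f"audience mismatch: {aud!r}")
    if claims.get("exp", 0) <= now:
        problems.append("token expired")
    return problems


def _b64url(obj: dict) -> str:
    """Encode a dict as an unpadded base64url JSON segment (demo helper)."""
    raw = json.dumps(obj).encode("utf-8")
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


# Unsigned demo token, built only to exercise the checks above.
demo_token = ".".join([
    _b64url({"alg": "none"}),
    _b64url({"aud": "api://your-app-id", "exp": 2_000_000_000}),
    "sig",
])
print(precheck_token(demo_token, "api://wrong-app", now=1_700_000_000))
```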

### 5. Token Metrics

```xml
<!-- llm-emit-token-metric is configured in the inbound section -->
<inbound>
    <llm-emit-token-metric namespace="mcp-metrics">
        <dimension name="Subscription" value="@(context.Subscription.Id)" />
        <dimension name="API" value="@(context.Api.Name)" />
    </llm-emit-token-metric>
</inbound>
```

---

## Use APIM Gateway with Agent

In [None]:
from azure.identity import DefaultAzureCredential
from azure.ai.projects import AIProjectClient
from azure.ai.agents.models import McpTool

client = AIProjectClient(endpoint=PROJECT_ENDPOINT, credential=DefaultAzureCredential())

# MCP tool pointing to APIM gateway (not directly to Logic Apps)
mcp_tool = McpTool(
    server_label="apim-mcp-gateway",
    server_url=APIM_MCP_ENDPOINT,
)

# Add APIM subscription key if required
if APIM_SUBSCRIPTION_KEY:
    mcp_tool.update_headers("Ocp-Apim-Subscription-Key", APIM_SUBSCRIPTION_KEY)

print(f"MCP via APIM: {APIM_MCP_ENDPOINT}")
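
Before wiring the gateway into an agent, it can help to confirm that the endpoint answers at all. The sketch below builds an MCP `initialize` request (JSON-RPC 2.0) and posts it with the standard library. It assumes the backend speaks the Streamable HTTP transport and accepts the protocol version shown; the function names and client info are hypothetical, and the network call is left commented out.

```python
import json
import urllib.error
import urllib.request


def build_initialize_call(subscription_key: str = "") -> tuple:
    """Build headers and JSON-RPC body for an MCP `initialize` request."""
    headers = {
        "Content-Type": "application/json",
        # Streamable HTTP servers may answer with plain JSON or an SSE stream.
        "Accept": "application/json, text/event-stream",
    }
    if subscription_key:
        headers["Ocp-Apim-Subscription-Key"] = subscription_key
    body = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            "protocolVersion": "2025-03-26",  # assumption: match your server
            "capabilities": {},
            "clientInfo": {"name": "apim-smoke-test", "version": "0.1"},
        },
    }
    return headers, json.dumps(body).encode("utf-8")


def smoke_test(endpoint: str, subscription_key: str = "") -> int:
    """POST the initialize request and return the HTTP status code."""
    headers, data = build_initialize_call(subscription_key)
    request = urllib.request.Request(endpoint, data=data, headers=headers, method="POST")
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # For example 401 (authentication) or 429 (rate limit) from the gateway.
        return err.code


# Uncomment to call the real gateway:
# print(smoke_test(APIM_MCP_ENDPOINT, APIM_SUBSCRIPTION_KEY))
```

A `401` or `429` here points at gateway policy rather than at the agent, which narrows down where to look when a run fails later.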

In [None]:
# Create agent using APIM-fronted MCP
agent = client.agents.create_agent(
    model=MODEL,
    name="governed-mcp-agent",
    instructions="You are an assistant with governed access to enterprise tools via APIM.",
    tools=mcp_tool.definitions,
)

print(f"Created agent: {agent.id}")

In [None]:
import time
from azure.ai.agents.models import ListSortOrder, RequiredMcpToolCall, SubmitToolApprovalAction, ToolApproval

def run_governed_agent(agent, client, mcp_tool, message):
    """Run agent through APIM gateway."""
    thread = client.agents.threads.create()
    client.agents.messages.create(thread_id=thread.id, role="user", content=message)
    
    run = client.agents.runs.create(
        thread_id=thread.id, agent_id=agent.id, tool_resources=mcp_tool.resources
    )
    
    # Poll until the run leaves its active states, approving MCP tool calls as they arrive
    while run.status in ["queued", "in_progress", "requires_action"]:
        time.sleep(1)
        run = client.agents.runs.get(thread_id=thread.id, run_id=run.id)
        
        if run.status == "requires_action" and isinstance(run.required_action, SubmitToolApprovalAction):
            approvals = [
                ToolApproval(tool_call_id=tc.id, approve=True, headers=mcp_tool.headers)
                for tc in run.required_action.submit_tool_approval.tool_calls
                if isinstance(tc, RequiredMcpToolCall)
            ]
            if approvals:
                client.agents.runs.submit_tool_outputs(thread_id=thread.id, run_id=run.id, tool_approvals=approvals)
            else:
                # Nothing to approve: cancel so the loop cannot spin on requires_action forever
                client.agents.runs.cancel(thread_id=thread.id, run_id=run.id)
                return "CANCELLED: No MCP tool calls to approve"
    
    if run.status == "failed":
        # Heuristic: a 429 from the gateway is expected to appear in the run's error text
        if "429" in str(run.last_error):
            return "RATE_LIMITED: Too many requests"
        return f"ERROR: {run.last_error}"
    
    messages = client.agents.messages.list(thread_id=thread.id, order=ListSortOrder.DESCENDING)
    for msg in messages:
        if msg.role == "assistant" and msg.text_messages:
            return msg.text_messages[-1].text.value
    return None

# Test the governed agent
response = run_governed_agent(agent, client, mcp_tool, "Create a support ticket for login issues")
print(f"Response: {response}")

---

## Test Rate Limiting

In [None]:
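# Burst test - send several requests in quick succession to try to trigger the rate limit.
# Runs execute one after another, so whether the limit trips depends on the configured
# calls/renewal-period and on how many MCP calls each run makes through the gateway.
print("Testing rate limiting (burst requests)...\n")

for i in range(5):
    response = run_governed_agent(agent, client, mcp_tool, f"Test request {i+1}")
    status = "RATE_LIMITED" if "RATE_LIMITED" in str(response) else "OK"
    print(f"Request {i+1}: {status}")
    
print("\nIf the gateway's limit is exceeded during the burst, later requests should show RATE_LIMITED")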

In [None]:
# Cleanup
client.agents.delete_agent(agent.id)
print("Agent deleted")

---

## Complete APIM Policy Example

```xml
<policies>
    <inbound>
        <base />
        
        <!-- Authentication -->
        <validate-jwt header-name="Authorization" require-scheme="Bearer">
            <openid-config url="https://login.microsoftonline.com/{tenant}/.well-known/openid-configuration" />
            <audiences>
                <audience>api://your-app-id</audience>
            </audiences>
        </validate-jwt>
        
        <!-- Rate Limiting (policy expressions use C# syntax, so string literals need double quotes) -->
        <rate-limit-by-key calls="10" renewal-period="60" 
            counter-key="@(context.Request.Headers.GetValueOrDefault("Authorization",""))" />
        
        <!-- Content Safety -->
        <llm-content-safety backend-id="content-safety" shield-prompt="true">
            <blocklists>
                <id>injection-patterns</id>
            </blocklists>
        </llm-content-safety>
        
        <!-- Emit Metrics (configured in the inbound section) -->
        <llm-emit-token-metric namespace="mcp-gateway">
            <dimension name="Consumer" value="@(context.Subscription.Id)" />
        </llm-emit-token-metric>
        
    </inbound>
    <backend>
        <base />
    </backend>
    <outbound>
        <base />
    </outbound>
</policies>
```

---

## Summary

| APIM Feature | Policy | Purpose |
|--------------|--------|--------|
| Rate Limit | `rate-limit-by-key` | Requests per minute |
| Token Limit | `llm-token-limit` | Tokens per minute |
| Auth | `validate-jwt` | OAuth/JWT validation |
| Safety | `llm-content-safety` | Block prompt injection |
| Metrics | `llm-emit-token-metric` | Track usage |
| Caching | `llm-semantic-cache-*` | Reduce costs |

## Proof Checklist
- [ ] Rate limit returns `429` on burst
- [ ] Injection blocked by content safety
- [ ] Metrics visible in Application Insights

## Next

Continue to `../10-tool-catalog-registration-in-foundry`.

---

<div align="center">

## License & Attribution

This notebook is part of the **Azure AI Foundry Demo Repository**

[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](../LICENSE)

**Original Author:** Ozgur Guler | AI Solution Leader, AI Innovation Hub

**Contact:** [ozgur.guler1@gmail.com](mailto:ozgur.guler1@gmail.com)

---

*If you use, modify, or distribute this work, you must provide appropriate credit to the original author as required by the [Apache License 2.0](../LICENSE).*

**Copyright © 2025 Ozgur Guler. All rights reserved.**

</div>