olivershackley1999/apim-project

APIM AI Gateway

Built with Claude

Azure API Management placed in front of Azure OpenAI, so every call to the AI model goes through one controlled front door.

Project Overview

When you give an app direct access to an AI model, you hand it a key and hope for the best. There is no easy way to see who is calling, to stop one user from running up the bill, or to avoid paying for the same answer twice.

This project puts a gateway in the middle. Clients talk to the gateway, and the gateway talks to the model. Because everything passes through one place, three useful things become possible:

  • Keys you can revoke - Each client gets its own subscription key, and the real connection to the model stays with the gateway
  • Per-client usage limits - A free tier capped at 500 tokens per minute and a premium tier capped at 100,000 tokens per minute
  • Caching - Repeated questions are answered from the gateway's memory, with no second call to the model and no second charge

Request lifecycle through the gateway

A request arrives with a subscription key. The gateway checks it against its tier's token budget, looks for a cached answer, and calls the model only if it has to. On the way back out, the answer is saved to the cache for next time.
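A single client call through the gateway can be sketched with curl. This is a minimal sketch, not the repo's test script: the `/openai` API path and the `Ocp-Apim-Subscription-Key` header (APIM's default subscription-key header) are assumptions, so adjust them to match how the API was imported into your APIM instance. `chat_url` and `ask` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Minimal sketch of one call through the gateway. Assumptions (not taken from
# the repo): the API is exposed under the /openai path, and APIM reads the key
# from its default Ocp-Apim-Subscription-Key header.

# Build the chat-completions URL for a deployment behind the gateway.
chat_url() {
  local gateway="$1" deployment="$2" api_version="$3"
  printf '%s/openai/deployments/%s/chat/completions?api-version=%s\n' \
    "$gateway" "$deployment" "$api_version"
}

# Send one question with an APIM subscription key and print the HTTP status.
# No Azure OpenAI key appears here; the gateway supplies its own credential.
# The question is pasted into the JSON unescaped, so keep it free of quotes.
ask() {
  local key="$1" question="$2" url
  url="$(chat_url "$APIM_GATEWAY" "${APIM_DEPLOYMENT:-gpt-4.1-mini}" \
    "${APIM_API_VERSION:-2025-03-01-preview}")"
  curl -sS -o /dev/null -w '%{http_code}\n' \
    -H "Ocp-Apim-Subscription-Key: $key" \
    -H "Content-Type: application/json" \
    -d "{\"messages\":[{\"role\":\"user\",\"content\":\"$question\"}],\"max_tokens\":50}" \
    "$url"
}
```

For example, `ask "$APIM_FREE_KEY" "what is APIM"` prints 200 while the free budget lasts and 429 once it is spent.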

How It Works

  • One key in, no keys out - The client sends only its APIM subscription key. The gateway authenticates to Azure OpenAI with a managed identity, so no model key is ever written into a script or shared with a client
  • Two tiers, two budgets - Token limits live in APIM policies attached to each product. The policy counts the tokens each call uses and adds them up per subscription, so one tier never eats into the other's budget
  • Exact-match caching - The cache keys on the exact text of the request. Asking "what is APIM" twice returns a cached answer the second time, but changing only the capitalisation, as in "what is apim", counts as a new question
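The bookkeeping behind the last two bullets can be modelled in a few lines of Bash. This is a toy illustration, not how APIM implements its policies: the one-minute window never resets, the call that crosses the limit is still allowed (later calls are refused), and the cache key is assumed to be a SHA-256 hash of the exact request text. The function and variable names are hypothetical. It needs Bash 4 or later for associative arrays and `sha256sum` from GNU coreutils (`shasum -a 256` is the macOS equivalent).

```shell
#!/usr/bin/env bash
# Toy model of per-subscription token budgets and exact-match caching.
# Illustration only: APIM's real policies work differently internally,
# and this model never resets the one-minute window. Needs Bash 4+.

# Per-minute budgets per tier, from the README: free 500, premium 100,000.
declare -A BUDGET=([free]=500 [premium]=100000)
declare -A TOKENS_USED   # running token total, keyed by subscription
declare -A CACHE         # cached answers, keyed by a hash of the request text

# Allow a call if the subscription has budget left, then add its tokens.
# Returns 1 (the gateway would answer 429) once the budget is used up.
charge() {
  local sub="$1" tier="$2" tokens="$3"
  local used="${TOKENS_USED[$sub]:-0}" budget="${BUDGET[$tier]}"
  if (( used >= budget )); then
    return 1
  fi
  TOKENS_USED[$sub]=$(( used + tokens ))
}

# Exact-match key: a hash of the request text byte for byte, so
# "what is APIM" and "what is apim" produce different keys.
cache_key() {
  printf '%s' "$1" | sha256sum | cut -d' ' -f1
}

# On a hit, set ANSWER and return 0; on a miss, return 1.
cache_lookup() {
  local key
  key="$(cache_key "$1")"
  [[ -n "${CACHE[$key]+set}" ]] || return 1
  ANSWER="${CACHE[$key]}"
}

# Save an answer under the exact request text.
cache_store() {
  local key
  key="$(cache_key "$1")"
  CACHE[$key]="$2"
}
```

With 200-token calls on one free subscription, the first three calls are charged (200, 400, then 600 tokens) and the fourth is refused, while a premium subscription keeps its own separate total.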

Managed identity granting the gateway access to Azure OpenAI
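Granting that access usually means giving the gateway's managed identity a role on the Azure OpenAI resource. A hedged CLI sketch follows, assuming a system-assigned identity, both resources in the same resource group, and the built-in Cognitive Services OpenAI User role (the repo does not say which role it grants). `grant_gateway_access` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: let APIM's system-assigned managed identity call Azure OpenAI.
# Assumptions: system-assigned identity, both resources in one resource group,
# and the built-in "Cognitive Services OpenAI User" role.

grant_gateway_access() {
  local apim_name="$1" openai_name="$2" resource_group="$3"
  local principal_id openai_id

  # Object ID of the gateway's managed identity.
  principal_id="$(az apim show --name "$apim_name" \
    --resource-group "$resource_group" \
    --query identity.principalId --output tsv)" || return 1

  # Full resource ID of the Azure OpenAI account, used as the role scope.
  openai_id="$(az cognitiveservices account show --name "$openai_name" \
    --resource-group "$resource_group" --query id --output tsv)" || return 1

  # Assign the role so the identity can call the model without a key.
  az role assignment create \
    --assignee-object-id "$principal_id" \
    --assignee-principal-type ServicePrincipal \
    --role "Cognitive Services OpenAI User" \
    --scope "$openai_id"
}
```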

The test script exercises all three behaviours in one run.

Test script output showing throttling and caching

Technology Stack

  • Gateway: Azure API Management
  • Model: Azure OpenAI running gpt-4.1-mini
  • Infrastructure: Bicep deployed via the Azure CLI
  • Authentication: Managed identity, so the gateway proves who it is without storing a key
  • Testing: Bash and curl

Project Structure

APIM + GenAI/
├── docs/              # Architecture diagram and screenshots
├── infra/
│   └── main.bicep     # Creates the APIM gateway
├── test-gateway.sh    # Drives the gateway to show each feature
└── README.md

Environment Setup

The test script reads the gateway URL and subscription keys from the environment, so no secrets are saved in the file. Set these before running:

export APIM_GATEWAY="https://<your-apim>.azure-api.net"
export APIM_FREE_KEY="<free-test subscription key>"
export APIM_PREMIUM_KEY="<premium-test subscription key>"

The deployment name and api-version default to gpt-4.1-mini and 2025-03-01-preview. Override them with APIM_DEPLOYMENT and APIM_API_VERSION if yours differ.
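A script can read this configuration with Bash parameter expansion and fail early when a required value is missing. The sketch below uses the README's variable names and defaults, but `require_env` and `gateway_url_from_azure` are hypothetical helpers rather than code from `test-gateway.sh`; the second one assumes you know your APIM instance name and reads its `gatewayUrl` with `az apim show`.

```shell
#!/usr/bin/env bash
# Sketch of reading the test configuration from the environment.
# require_env and gateway_url_from_azure are hypothetical helpers.

# Return 1, naming the variable, if any required variable is unset or empty.
require_env() {
  local name
  for name in "$@"; do
    if [[ -z "${!name:-}" ]]; then
      echo "missing environment variable: $name" >&2
      return 1
    fi
  done
}

# Optional settings fall back to the README's defaults when unset or empty.
DEPLOYMENT="${APIM_DEPLOYMENT:-gpt-4.1-mini}"
API_VERSION="${APIM_API_VERSION:-2025-03-01-preview}"

# Print the gateway URL of a deployed APIM instance (requires az login).
gateway_url_from_azure() {
  local name="$1" resource_group="$2"
  az apim show --name "$name" --resource-group "$resource_group" \
    --query gatewayUrl --output tsv
}
```

A typical start of a run would be `require_env APIM_GATEWAY APIM_FREE_KEY APIM_PREMIUM_KEY || exit 1`.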

Running It

You need an Azure subscription and the Azure CLI installed. The test also expects the gateway to have a free product and a premium product set up.

# Deploy the gateway
az deployment group create \
  --resource-group <your-resource-group> \
  --template-file infra/main.bicep

# Run the test (after setting the environment variables above)
./test-gateway.sh

The free tier clears a couple of calls and then returns 429 (too many requests), the premium tier clears all of them, and an identical question comes back from the cache the second time it is asked.
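To check the throttling result by hand, you can fire a short burst of calls with one key and count the status codes. This is a sketch, not the repo's script: it assumes the same `/openai` path and `Ocp-Apim-Subscription-Key` header as above, and it numbers each question so every call misses the exact-match cache and counts against the token limit. `burst` and `tally` are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: send a burst of calls with one key and count the status codes.
# Assumptions: /openai API path and the default APIM subscription-key header.

# Send COUNT calls with KEY, printing one HTTP status code per line.
# Each question is numbered so no call can be served from the exact-match cache.
burst() {
  local key="$1" count="$2" i
  local url="$APIM_GATEWAY/openai/deployments/${APIM_DEPLOYMENT:-gpt-4.1-mini}/chat/completions?api-version=${APIM_API_VERSION:-2025-03-01-preview}"
  for (( i = 1; i <= count; i++ )); do
    curl -sS -o /dev/null -w '%{http_code}\n' \
      -H "Ocp-Apim-Subscription-Key: $key" \
      -H "Content-Type: application/json" \
      -d "{\"messages\":[{\"role\":\"user\",\"content\":\"question $i: what is APIM\"}],\"max_tokens\":50}" \
      "$url"
  done
}

# Summarise status codes read from stdin as "200=N 429=M".
tally() {
  awk '{ n[$1]++ } END { printf "200=%d 429=%d\n", n["200"], n["429"] }'
}
```

Running `burst "$APIM_FREE_KEY" 5 | tally` and then `burst "$APIM_PREMIUM_KEY" 5 | tally` should show some 429s for the free key and none for the premium key; the exact split depends on how many tokens each answer uses.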
