Azure API Management placed in front of Azure OpenAI, so every call to the AI model goes through one controlled front door.
When you give an app direct access to an AI model, you hand it a key and hope for the best. There is no easy way to see who is calling, to stop one user from running up the bill, or to avoid paying for the same answer twice.
This project puts a gateway in the middle. Clients talk to the gateway, and the gateway talks to the model. Because everything passes through one place, three useful things become possible:
- Keys you can revoke - Each client gets its own subscription key, and the real connection to the model stays with the gateway
- Per-client usage limits - A free tier capped at 500 tokens per minute, a premium tier at 100,000
- Caching - Repeated questions are answered from the gateway's memory, with no second call to the model and no second charge
A request arrives with a subscription key. The gateway checks the tier's token budget, looks for a cached answer, and calls the model only if it has to. On the way back out, the answer is saved to the cache for next time.
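The order of those checks can be traced with a small local simulation. This is only a sketch: it makes no Azure calls, `handle_request`, `fake_model` and the `cksum`-based cache key are hypothetical names and choices, and the real token counting is done by the APIM policy, which may differ in detail (for example in when tokens are charged).

```bash
#!/usr/bin/env bash
# Local simulation of the gateway's decision order. No Azure calls are made.
# BUDGET matches the free tier described above; every other name is hypothetical.

BUDGET=500                  # tokens per minute for this tier
used=0                      # tokens charged so far in the current window
cache_dir="$(mktemp -d)"    # stands in for the gateway's cache

# Stand-in for Azure OpenAI: returns a canned answer.
fake_model() {
  printf 'answer to: %s\n' "$1"
}

# handle_request PROMPT TOKEN_COST
# Sets RESULT to one of: rejected-429, cache-hit, model-call.
handle_request() {
  prompt="$1"
  cost="$2"

  # 1. Budget check: reject once this window's budget is spent.
  if [ "$used" -ge "$BUDGET" ]; then
    RESULT="rejected-429"
    return 0
  fi

  # 2. Cache lookup, keyed on the exact prompt text (checksum used as a file name).
  key="$(printf '%s' "$prompt" | cksum | cut -d ' ' -f 1)"
  if [ -f "$cache_dir/$key" ]; then
    RESULT="cache-hit"
    return 0
  fi

  # 3. Cache miss: call the model, charge the tokens, store the answer.
  fake_model "$prompt" > "$cache_dir/$key"
  used=$((used + cost))
  RESULT="model-call"
}

handle_request "what is APIM" 300;  echo "$RESULT"   # model-call
handle_request "what is APIM" 300;  echo "$RESULT"   # cache-hit
handle_request "what is apim" 300;  echo "$RESULT"   # model-call (different text)
handle_request "what is Bicep" 300; echo "$RESULT"   # rejected-429 (600 tokens used)
```

The function reports through `RESULT` rather than standard output so that the `used` counter survives between calls; capturing the output with `$(...)` would run the function in a subshell and lose the update.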
- One key in, no keys out - The client sends only its APIM subscription key. The gateway authenticates to Azure OpenAI with a managed identity, so no model key is ever written into a script or shared with a client
- Two tiers, two budgets - Token limits live in APIM policies attached to each product. The policy counts the tokens each call uses and adds them up per subscription, so one tier never eats into the other's budget
- Exact-match caching - The cache keys on the exact text of the request. Asking "what is APIM" twice returns a cached answer the second time, but changing a single letter to "what is apim" counts as a new question
The test script exercises all three behaviours in one run.
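To see why one tier cannot eat into the other's budget, the per-subscription counting can be modelled locally. The sketch below assumes a simple counter per tier that resets each minute (the reset is not modelled); the limits come from the tiers above, while `charge`, `used_free` and `used_premium` are hypothetical names rather than anything taken from the APIM policy.

```bash
#!/usr/bin/env bash
# Local model of per-subscription token accounting. No Azure calls are made.
# The limits come from the tiers described above; all names are hypothetical.

FREE_LIMIT=500          # tokens per minute
PREMIUM_LIMIT=100000    # tokens per minute
used_free=0
used_premium=0

# charge TIER TOKENS
# Sets STATUS to 200 if the tier still has budget, 429 if it is spent.
charge() {
  case "$1" in
    free)
      if [ "$used_free" -ge "$FREE_LIMIT" ]; then
        STATUS=429
      else
        used_free=$((used_free + $2))
        STATUS=200
      fi
      ;;
    premium)
      if [ "$used_premium" -ge "$PREMIUM_LIMIT" ]; then
        STATUS=429
      else
        used_premium=$((used_premium + $2))
        STATUS=200
      fi
      ;;
  esac
}

charge free 300;    echo "free:    $STATUS"   # 200 (300 used)
charge free 300;    echo "free:    $STATUS"   # 200 (600 used, budget now spent)
charge free 300;    echo "free:    $STATUS"   # 429
charge premium 300; echo "premium: $STATUS"   # 200, unaffected by the free tier
```

Because each tier has its own counter, exhausting the free budget leaves the premium counter untouched.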
- Gateway: Azure API Management
- Model: Azure OpenAI running gpt-4.1-mini
- Infrastructure: Bicep deployed via the Azure CLI
- Authentication: Managed identity, so the gateway proves who it is without storing a key
- Testing: Bash and curl
APIM + GenAI/
├── docs/ # Architecture diagram and screenshots
├── infra/
│ └── main.bicep # Creates the APIM gateway
├── test-gateway.sh # Drives the gateway to show each feature
└── README.md
The test script reads the gateway URL and subscription keys from the environment, so no secrets are saved in the file. Set these before running:
export APIM_GATEWAY="https://<your-apim>.azure-api.net"
export APIM_FREE_KEY="<free-test subscription key>"
export APIM_PREMIUM_KEY="<premium-test subscription key>"
The deployment name and api-version default to gpt-4.1-mini and 2025-03-01-preview. Override them with APIM_DEPLOYMENT and APIM_API_VERSION if yours differ.
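As a rough illustration of how a script might consume these variables, the sketch below applies the documented defaults and builds one request. It is not the contents of test-gateway.sh: `build_url` and `ask` are hypothetical helpers, the `/openai/deployments/...` path assumes the API is exposed under the usual Azure OpenAI route, and the header shown is APIM's default subscription header, which some setups rename (for example to `api-key`).

```bash
#!/usr/bin/env bash
# Sketch only: helper names, URL path and header name are assumptions.

# Fall back to the documented defaults when the overrides are unset.
APIM_DEPLOYMENT="${APIM_DEPLOYMENT:-gpt-4.1-mini}"
APIM_API_VERSION="${APIM_API_VERSION:-2025-03-01-preview}"

# build_url GATEWAY
# Prints the chat-completions URL for the configured deployment.
build_url() {
  # ${1%/} drops a trailing slash so the path is not doubled.
  printf '%s/openai/deployments/%s/chat/completions?api-version=%s\n' \
    "${1%/}" "$APIM_DEPLOYMENT" "$APIM_API_VERSION"
}

# ask KEY PROMPT
# Sends one chat request through the gateway using only the subscription key.
# The prompt is pasted into JSON unescaped, so keep it free of quotes.
ask() {
  : "${APIM_GATEWAY:?set APIM_GATEWAY first}"
  curl -sS "$(build_url "$APIM_GATEWAY")" \
    -H "Ocp-Apim-Subscription-Key: $1" \
    -H "Content-Type: application/json" \
    -d "{\"messages\":[{\"role\":\"user\",\"content\":\"$2\"}]}"
}

# Example (needs a deployed gateway):
#   ask "$APIM_FREE_KEY" "what is APIM"
```

Note that no model key appears anywhere in the request: the subscription key is the only credential the client holds.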
You need an Azure subscription, the Azure CLI installed, and a deployed gateway with a free and premium product set up.
# Deploy the gateway
az deployment group create \
--resource-group <your-resource-group> \
--template-file infra/main.bicep
# Run the test (after setting the environment variables above)
./test-gateway.sh
The free tier clears a couple of calls and then returns 429 (too many requests), the premium tier clears all of them, and an identical question comes back from the cache the second time it is asked.
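To watch the 429 appear by hand, a loop along the following lines could be used. It is a hedged sketch rather than an excerpt from test-gateway.sh: `probe` and `explain_status` are hypothetical helpers, the URL is passed in by the caller, and the header name is APIM's default, which your gateway may have renamed. Each call uses a different prompt so that the cache does not answer on the model's behalf.

```bash
#!/usr/bin/env bash
# Sketch only: helper names and the header name are assumptions.

# explain_status CODE
# Prints a short reading of the HTTP status codes this test cares about.
explain_status() {
  case "$1" in
    200) echo "ok" ;;
    401) echo "bad-or-missing-key" ;;
    429) echo "rate-limited" ;;
    *)   echo "unexpected" ;;
  esac
}

# probe URL KEY COUNT
# Sends COUNT requests with distinct prompts and prints one line per call.
probe() {
  i=1
  while [ "$i" -le "$3" ]; do
    # -o /dev/null discards the body; -w prints only the status code.
    code="$(curl -sS -o /dev/null -w '%{http_code}' "$1" \
      -H "Ocp-Apim-Subscription-Key: $2" \
      -H "Content-Type: application/json" \
      -d "{\"messages\":[{\"role\":\"user\",\"content\":\"ping $i\"}]}")"
    echo "call $i: $code ($(explain_status "$code"))"
    i=$((i + 1))
  done
}

# Example (needs a deployed gateway and the full chat-completions URL):
#   probe "$GATEWAY_CHAT_URL" "$APIM_FREE_KEY" 5
```

Here `GATEWAY_CHAT_URL` is a placeholder for whatever route your gateway exposes; on the free tier the later lines would be expected to show 429 once the 500-token budget is spent.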

