Merged
162 changes: 68 additions & 94 deletions README.md
@@ -1,101 +1,75 @@
# SimpleL7Proxy: Enterprise AI Gateway for Azure #
# SimpleL7Proxy

SimpleL7Proxy is a high-performance, intelligent Layer 7 router engineered to optimize **Large Language Model (LLM)** workloads. Deployed alongside **Azure API Management** and **AI Foundry**, it provides an orchestration layer in front of LLM model providers.
[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
[![.NET](https://img.shields.io/badge/.NET-10-purple)](https://dotnet.microsoft.com)
[![Platform](https://img.shields.io/badge/platform-Azure%20Container%20Apps-0078D4)](https://learn.microsoft.com/en-us/azure/container-apps/overview)

Unlike proprietary gateways, SimpleL7Proxy is a **fully open-source, self-hosted solution,** offering unparalleled customization for data residency, sovereign cloud requirements (GCC High), and bespoke enterprise logic.
An open-source, self-hosted **AI Gateway** for Azure. SimpleL7Proxy is a high-performance Layer 7 router purpose-built for LLM workloads. It provides priority queuing, fair-share governance, circuit breaking, async orchestration, and streaming token telemetry, all running inside your own VNET.

## Core Value Propositions
→ **[Full overview, architecture, and use-case analysis](docs/OVERVIEW.md)**

| Challenge | Enterprise-Grade Solution |
|-----------|--------------------------|
| **Workload Contention** | **Tiered Priority Queuing:** Preemptive scheduling ensures mission-critical AI requests bypass background batch processing. |
| **Resource Monopolization** | **Fair-Share Governance:** Granular user/group throttling prevents "noisy neighbor" scenarios and ensures equitable capacity distribution. |
| **System Instability** | **Self-Healing Resiliency:** Integrated Circuit Breaker and automated Retry patterns prevent backend failures from cascading into outages. |
| **Observability Gaps** | **Streaming Telemetry:** Real-time capture of AI token metrics and consumption data, even across high-velocity streaming responses. |
| **Regional Latency** | **Global Traffic Steering:** Intelligent multi-region load balancing with latency-based routing for optimal response times. |
| **Timeout Constraints** | **Stateful Async Orchestration:** Native support for long-running requests (>30 min) via Azure Service Bus notifications and status tracking. |
| **Compliance Barriers** | **Zero-Trust Connectivity:** Hardened with VNET Injection, Managed Identity, and OAuth2, purpose-built for regulated industries and Gov Cloud. |
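
The tiered priority queuing row above can be illustrated with a minimal Python sketch. It assumes that a lower number means higher priority and that requests within the same tier are served in arrival order. The class and method names are hypothetical and are not taken from SimpleL7Proxy, which is a .NET project.

```python
import heapq
import itertools


class PriorityRequestQueue:
    """Hypothetical sketch: lowest priority number first, FIFO within a tier."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker that preserves arrival order

    def enqueue(self, priority, request):
        heapq.heappush(self._heap, (priority, next(self._seq), request))

    def dequeue(self):
        _priority, _seq, request = heapq.heappop(self._heap)
        return request

    def __len__(self):
        return len(self._heap)


queue = PriorityRequestQueue()
queue.enqueue(3, "batch-embeddings")
queue.enqueue(1, "interactive-chat")
queue.enqueue(3, "batch-summarization")

# The interactive request is served first even though it arrived second.
order = [queue.dequeue() for _ in range(len(queue))]
```

The sequence counter is the design choice worth noting: without it, two requests in the same tier would be compared by payload, which breaks arrival-order fairness.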



## The Open Source Advantage ##

While commercial alternatives like Portkey.ai or Helicone offer managed services, and LiteLLM provides broad provider support, **SimpleL7Proxy** is uniquely optimized for the Azure ecosystem.
By leveraging a self-hosted architecture on Azure Container Apps, organizations maintain complete ownership of the data plane. This eliminates third-party dependency, simplifies Azure API Management policy integration, and allows for deep extensibility that proprietary "black box" gateways cannot match.

## Supported Architectural Scenarios ##

**SimpleL7Proxy** is designed to be the backbone of your AI platform, seamlessly integrating with:

* **[Azure AI Foundry](docs/AI_FOUNDRY_INTEGRATION.md):** Advanced routing and rate-limiting for model endpoints.
* **[Azure API Management (APIM)](https://learn.microsoft.com/en-us/azure/api-management/api-management-key-concepts):** Enhancing the APIM platform with sophisticated queuing and async state management.
* **[Custom APIM Policy](APIM-Policy/readme.md):** A reference policy implementation for high-throughput, resilient backend connectivity.
* **Sovereign & Hybrid Cloud:** Standardizing AI egress and governance across public and government regions.
* **Multi-Cloud Portability:** The Docker-based architecture supports any orchestrator; organizations have successfully run the proxy in **AWS** and **GCP**.

## When to Choose SimpleL7Proxy

### Ideal Use Cases
* **Mixed Workloads**: You need to prevent batch processing (e.g., embeddings, summarization) from blocking interactive users (e.g., chat) using **Preemptive Priority Queuing**.
* **Long-Running Operations**: Your AI tasks exceed standard HTTP timeouts (30+ minutes) and require **Async/Stateful** execution.
* **Strict Compliance**: You require a **fully self-hosted** solution that runs entirely within your VNET (e.g., Gov Cloud) with no data egress to third-party gateways.
* **Cost Management**: You want to maximize efficient use of fixed-capacity (PTU) throughput before spilling over to Pay-As-You-Go.
* **Deep Observability**: You need to capture **Token Usage** metrics from streaming LLM responses for chargeback or auditing.
* **Azure Ecosystem Integration**: You want a proxy that integrates deeply with **Azure Managed Identity**, **APIM**, **ACA**, and **AI Foundry**.

### When to Consider Alternatives
* **Managed Service Preference**: If you prefer a SaaS solution and do not want to manage [Azure Container Apps](https://learn.microsoft.com/en-us/azure/container-apps/overview) infrastructure, consider managed gateways like Portkey.ai or Helicone.
* **Basic Routing**: If you only require simple round-robin load balancing without priority queuing or token inspection, standard [Azure Application Gateway](https://learn.microsoft.com/en-us/azure/application-gateway/overview) is less complex to maintain.
* **Azure API Management (APIM)**: Offers native public/private gateways and streaming token counting. However, it does not support **Priority Queuing**, **User Profiles**, or deep **Stream Inspection/Modification**. *Note: When using APIM as a backend for SimpleL7Proxy, use our [Recommended High-Throughput Policy](APIM-Policy/readme.md).*


## Capabilities

### Security
- **Virtual Network Injection:** Secure mission-critical workloads with native **VNET Integration** and identity-based access through [Microsoft Entra ID (Managed Identity)](https://learn.microsoft.com/en-us/entra/identity/managed-identities-azure-resources/overview) across public and sovereign regions.
- **Identity-Driven Edge Security:** Enforce **Zero Trust** principles with integrated **OAuth2 authentication** and customizable **Header Policy Enforcement** to validate or restrict inbound request structures.
- **Dynamic Access Governance:** Centrally manage user and group permissions via **External Configuration Providers**, enabling real-time suspension or restriction of access without code changes.

### Reliability
- **Global Traffic Steering & Automated Failover:** Ensure business continuity with **Multi-Region Traffic Distribution** and high-speed **DNS Propagation** for instant disaster recovery.
- **Resilient Request Handling:** Mitigate transient failures using automated **Retry Policies** and built-in **[Circuit Breaker](docs/CIRCUIT_BREAKER.md)** patterns to protect backend health during outages.
- **Dedicated Health Probing:** Ensure high availability under load with a dedicated **[Health Probe Sidecar](docs/HEALTH_CHECKING.md)** architecture that isolates Kubernetes probes from request processing.
- **Temporal Request Validation:** Maintain system integrity by automatically expiring stale requests via **TTL (Time-to-Live) Management**, preventing the processing of outdated data.
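
The Circuit Breaker pattern named above can be sketched in a few lines of Python. This is a generic illustration under stated assumptions (consecutive-failure counting, a fixed cool-down, and an injectable clock for testing). The names and thresholds are hypothetical; the proxy's actual behavior is described in docs/CIRCUIT_BREAKER.md.

```python
import time


class CircuitBreaker:
    """Hypothetical sketch: opens after N consecutive failures, then permits
    a trial request once the cool-down period has elapsed."""

    def __init__(self, failure_threshold=3, reset_seconds=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_seconds = reset_seconds
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def allow_request(self):
        if self._opened_at is None:
            return True
        # Half-open: allow a trial request after the cool-down.
        return self._clock() - self._opened_at >= self.reset_seconds

    def record_success(self):
        self._failures = 0
        self._opened_at = None

    def record_failure(self):
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._opened_at = self._clock()


# Usage with a fake clock so the example is deterministic.
now = [0.0]
breaker = CircuitBreaker(failure_threshold=2, reset_seconds=10.0, clock=lambda: now[0])
breaker.record_failure()
breaker.record_failure()
blocked = not breaker.allow_request()    # circuit is open
now[0] = 10.0
trial_allowed = breaker.allow_request()  # cool-down elapsed
```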

### Performance Efficiency
- **Intelligent Traffic Management:** Optimize response times with **Adaptive Load Balancing**, supporting latency-based, weighted round-robin, and randomized routing modes.
- **Tiered Workload Prioritization:** Guarantee performance for critical tasks through **Configurable Priority Levels** and isolated **Dedicated Worker Threads**.
- **Integrated Async/Sync Processing:** Deliver seamless user experiences with native support for both **Real-Time Synchronous** and **Decoupled Asynchronous** messaging patterns.
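
To make the routing modes above concrete, here is a small Python sketch of latency-based and randomized backend selection (weighted round-robin is omitted for brevity). The `Backend` type, the host names, and the averaging strategy are assumptions for illustration only, not the proxy's implementation.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Backend:
    host: str
    latencies_ms: list = field(default_factory=list)

    def average_latency(self):
        # A backend with no samples reports 0.0 so that it gets tried first.
        if not self.latencies_ms:
            return 0.0
        return sum(self.latencies_ms) / len(self.latencies_ms)


def choose_backend(backends, mode="latency", rng=random):
    """Pick a backend according to the selected (hypothetical) mode."""
    if mode == "latency":
        return min(backends, key=lambda b: b.average_latency())
    if mode == "random":
        return rng.choice(backends)
    raise ValueError(f"unknown mode: {mode}")


backends = [
    Backend("eastus.example.com", [120.0, 100.0]),
    Backend("westus.example.com", [80.0, 90.0]),
]
fastest = choose_backend(backends, mode="latency")
```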

### Operational Excellence
- **Advanced AI Observability:** Real-time **[Token Usage Telemetry](docs/OBSERVABILITY.md)** for generative AI workloads, providing precise metric capture even for high-velocity **Streaming Responses**.
- **Automated Resource Safeguards:** Prevent service degradation with **Proactive Throttling** and intelligent **Circuit Breaker rejection** when backend thresholds are exceeded.

### Cost Optimization
- **Fair-Share Resource Governance:** Maximize ROI by preventing resource monopolization through **User-Level Throttling** and **Anti-Starvation** algorithms to ensure equitable allocation.
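
One way to picture user-level throttling is a limiter that caps any single user's share of in-flight requests. The Python sketch below is an assumption-laden illustration: the `FairShareLimiter` name, the `max_share` parameter, and the admission rule are hypothetical and are not drawn from the proxy's code.

```python
from collections import Counter


class FairShareLimiter:
    """Hypothetical sketch: no user may hold more than `max_share` of capacity."""

    def __init__(self, capacity, max_share=0.5):
        self.capacity = capacity
        self.max_share = max_share
        self._in_flight = Counter()

    def try_acquire(self, user_id):
        total = sum(self._in_flight.values())
        if total >= self.capacity:
            return False  # the proxy as a whole is full
        if self._in_flight[user_id] + 1 > self.capacity * self.max_share:
            return False  # this user would exceed their share
        self._in_flight[user_id] += 1
        return True

    def release(self, user_id):
        if self._in_flight[user_id] > 0:
            self._in_flight[user_id] -= 1


limiter = FairShareLimiter(capacity=4, max_share=0.5)
results = [limiter.try_acquire("alice") for _ in range(3)]
bob_admitted = limiter.try_acquire("bob")
```

Here the third request from `alice` is refused while `bob` is still admitted, which is the "noisy neighbor" protection in miniature.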



## APIM Policy Scenarios
* Route high-priority requests to designated backend services.
* Sustain high throughput, exceeding 23M tokens per minute (TPM).
* Control concurrency for each backend independently.
* Enable streaming with real-time token capture.
* Enforce backend timeouts to ensure responsiveness.
* Maximize PTU usage, while choosing which priorities use PayGo.

## Architecture Diagram
![Architecture Diagram](docs/arch.png)

## Local Development Setup

For local development and testing, SimpleL7Proxy includes an interactive setup script (`local-setup.sh`) to configure the proxy environment.

For detailed instructions on local setup, manual configuration, and running mock backends, see [Development and Testing](docs/DEVELOPMENT.md).

## Deployment

SimpleL7Proxy is designed to be deployed to Azure Container Apps (ACA) using the Azure Developer CLI (AZD).

For comprehensive deployment instructions, including standard, high-performance, and VNET-secured scenarios, see [Container Deployment](docs/CONTAINER_DEPLOYMENT.md).
## Prerequisites

- [.NET 10 SDK](https://dotnet.microsoft.com/download)
- [Docker](https://docs.docker.com/get-docker/) (for container builds)
- [Azure Developer CLI (azd)](https://learn.microsoft.com/en-us/azure/developer/azure-developer-cli/install-azd) (for cloud deployment)
- An Azure subscription with Container Apps and (optionally) AI Foundry / APIM

## Quick Start

**Local development** (interactive setup wizard):
```bash
git clone https://github.com/your-org/SimpleL7Proxy.git
cd SimpleL7Proxy
dotnet run --project src/SimpleL7Proxy
```

**Deploy to Azure Container Apps:**
```bash
# Windows
./.azure/setup.ps1

# Linux / macOS
chmod +x ./.azure/setup.sh && ./.azure/setup.sh

azd provision
./.azure/deploy.ps1 # or deploy.sh on Linux/macOS
```

See [Development and Testing](docs/DEVELOPMENT.md) for local mock backends and manual configuration.
See [Container Deployment](docs/CONTAINER_DEPLOYMENT.md) for all deployment scenarios (standard, high-performance, VNET).

## Documentation

| Topic | Document |
|-------|----------|
| Overview & Architecture | [docs/OVERVIEW.md](docs/OVERVIEW.md) |
| Backend Host Configuration | [docs/BACKEND_HOSTS.md](docs/BACKEND_HOSTS.md) |
| Load Balancing | [docs/LOAD_BALANCING.md](docs/LOAD_BALANCING.md) |
| Priority Queuing & User Governance | [docs/ADVANCED_CONFIGURATION.md](docs/ADVANCED_CONFIGURATION.md) |
| Circuit Breaker | [docs/CIRCUIT_BREAKER.md](docs/CIRCUIT_BREAKER.md) |
| Health Checking | [docs/HEALTH_CHECKING.md](docs/HEALTH_CHECKING.md) |
| Async Operations | [docs/AsyncOperation.md](docs/AsyncOperation.md) |
| User Profiles | [docs/USER_PROFILES.md](docs/USER_PROFILES.md) |
| Request Validation | [docs/REQUEST_VALIDATION.md](docs/REQUEST_VALIDATION.md) |
| Observability & Telemetry | [docs/OBSERVABILITY.md](docs/OBSERVABILITY.md) |
| Security | [docs/SECURITY.md](docs/SECURITY.md) |
| Environment Variables | [docs/ENVIRONMENT_VARIABLES.md](docs/ENVIRONMENT_VARIABLES.md) |
| Configuration Settings | [docs/CONFIGURATION_SETTINGS.md](docs/CONFIGURATION_SETTINGS.md) |
| Azure App Configuration | [docs/AZURE_APP_CONFIGURATION.md](docs/AZURE_APP_CONFIGURATION.md) |
| AI Foundry Integration | [docs/AI_FOUNDRY_INTEGRATION.md](docs/AI_FOUNDRY_INTEGRATION.md) |
| APIM Policy | [APIM-Policy/readme.md](APIM-Policy/readme.md) |
| Container Deployment | [docs/CONTAINER_DEPLOYMENT.md](docs/CONTAINER_DEPLOYMENT.md) |
| Development & Testing | [docs/DEVELOPMENT.md](docs/DEVELOPMENT.md) |
| Response Codes | [docs/RESPONSE_CODES.md](docs/RESPONSE_CODES.md) |

## Contributing

Issues and pull requests are welcome. Please open an issue first to discuss significant changes.

## License

MIT — see [LICENSE](LICENSE). Copyright (c) Microsoft Corporation.

3 changes: 3 additions & 0 deletions docs/CONFIGURATION_SETTINGS.md
@@ -212,6 +212,7 @@
| EventLoggers | `EventLoggers` |
| EventHeaders | `EventHeaders` |
| LogFileName | `LogFileName` |
| LogDateTime | `LogDateTime` |

### EventHub - Configured at startup

@@ -221,6 +222,8 @@
| Name | `EventHubName` |
| Namespace | `EventHubNamespace` |
| StartupSeconds | `EventHubStartupSeconds` |
| MaxReconnectAttempts | `EventHubMaxReconnectAttempts` |
| MaxUndrainedEvents | `MaxUndrainedEvents` |

---
