2 changes: 2 additions & 0 deletions APIM-Policy/.gitignore
@@ -0,0 +1,2 @@
# Ignore all files in the 'untracked' directory
untracked/
49 changes: 49 additions & 0 deletions docs/BACKEND_HOSTS.md
@@ -90,3 +90,52 @@ IP1="10.0.1.5"
## Mixing Methods

You can mix methods across different hosts (e.g., `Host1` uses a connection string, `Host2` uses the legacy format), but you should not mix definitions for the *same* host number. If `Host1` is a connection string, `Probe_path1` and `IP1` will be ignored.

---

## Path-Based Routing

The `path` parameter in the connection string controls which requests are routed to each host.

### How Path Matching Works

1. **Specific paths take precedence**: Hosts with explicit paths (e.g., `/api/v1`) are matched before catch-all hosts.
2. **Path prefix is stripped**: When forwarding to a matched host, the matching prefix is removed from the request path.
3. **Catch-all fallback**: Hosts with `path=/` or no path specified handle requests that don't match any specific path.

### Path Matching Examples

**Configuration:**
```bash
Host1="host=https://chat-service.internal;path=/chat"
Host2="host=https://embed-service.internal;path=/embeddings"
Host3="host=https://default-service.internal;path=/"
```

**Request Routing:**

| Incoming Request | Matched Host | Forwarded Path |
|-----------------|--------------|----------------|
| `GET /chat/completions` | Host1 | `GET /completions` |
| `POST /embeddings/create` | Host2 | `POST /create` |
| `GET /models` | Host3 | `GET /models` |
| `GET /chat` | Host1 | `GET /` |
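
The routing table above can be reproduced with a short sketch. This is an illustration of the documented rules, not the proxy's actual code: `Host`, `parse_host`, and `route` are hypothetical names, and two behaviors are assumptions rather than documented facts: prefixes match only on a segment boundary (so `/chatty` would not match `/chat`), and longer prefixes are tried before shorter ones.

```python
from __future__ import annotations

from typing import NamedTuple, Optional


class Host(NamedTuple):
    url: str
    path: str  # routing prefix; "/" means catch-all


def parse_host(conn: str) -> Host:
    """Parse 'host=...;path=...' into a Host; a missing path becomes '/'."""
    fields = dict(part.split("=", 1) for part in conn.split(";") if part)
    return Host(url=fields["host"], path=fields.get("path") or "/")


def route(hosts: list[Host], request_path: str) -> Optional[tuple[Host, str]]:
    """Return (host, forwarded_path), trying specific prefixes before catch-alls."""
    specific = [h for h in hosts if h.path != "/"]
    # Assumption: longest prefix first, so "/api/v1" is tried before "/api".
    for host in sorted(specific, key=lambda h: len(h.path), reverse=True):
        # Assumption: match only on a whole path segment.
        if request_path == host.path or request_path.startswith(host.path + "/"):
            stripped = request_path[len(host.path):]
            return host, stripped or "/"  # "/chat" forwards as "/"
    for host in hosts:
        if host.path == "/":
            return host, request_path  # catch-all: no stripping
    return None  # no catch-all configured and nothing matched
```

Calling `route` with the three hosts from the configuration above and the path `/chat/completions` selects the first host and forwards `/completions`, matching the first row of the table.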

### Path Configuration Options

| Path Value | Behavior |
|------------|----------|
| `/api/v1` | Matches requests starting with `/api/v1`, strips prefix |
| `/api/v1/*` | Same as above (wildcard is implicit) |
| `/` | Catch-all, matches any path, no stripping |
| `/*` | Same as `/` |
| (empty) | Same as `/` |
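
One way to read this table is that every path value collapses to a single canonical prefix before matching. The sketch below shows that normalization under that reading; `normalize_path` is a hypothetical helper, and treating a trailing slash as insignificant is an assumption that the table does not state.

```python
from typing import Optional


def normalize_path(value: Optional[str]) -> str:
    """Collapse the documented path spellings to one canonical prefix."""
    path = (value or "").strip()
    if path.endswith("/*"):
        path = path[:-2]  # "/api/v1/*" -> "/api/v1", "/*" -> ""
    path = path.rstrip("/")  # assumption: a trailing slash is not significant
    return path or "/"  # empty, "/" and "/*" all mean catch-all
```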

### Best Practices

1. **Use specific paths for service isolation**: Route different AI models or API versions to dedicated backends.
2. **Always have a catch-all**: Include at least one host with `path=/` to handle unexpected routes.
3. **Avoid overlapping paths**: Prefixes such as `/api` and `/api/v1` both match requests under `/api/v1`. If you need both, the more specific path (`/api/v1`) should be tried first.

See [LOAD_BALANCING.md](LOAD_BALANCING.md) for details on how hosts are selected after path filtering.

44 changes: 43 additions & 1 deletion docs/CIRCUIT_BREAKER.md
@@ -29,4 +29,46 @@ Control the sensitivity of the circuit breaker using these environment variables
* **Tolerant**: Set `CBErrorThreshold=100`. Useful for "flaky" non-critical backends where you strictly prefer retries over disabling the host.

## Global Safety Net

The proxy monitors the state of **all** circuit breakers. If **all** configured backends are tripped (meaning the entire backend tier is down), the proxy returns a `503 Service Unavailable` to the client immediately, protecting the proxy itself from resource exhaustion.

---

## Integration with Load Balancing

The circuit breaker is checked **per-host** during the backend selection loop. This means:

1. **A single tripped host doesn't block the request** - the proxy simply skips to the next host in the iterator.
2. **Healthy hosts continue receiving traffic** - only the failing host is isolated.
3. **Automatic recovery** - as the circuit closes, traffic resumes without manual intervention.

### Request Flow with Circuit Breaker

```
FOR EACH HOST in load balancer:
  ├─ CheckFailedStatus() ──[OPEN]──► SKIP (log and continue to next host)
  └─[CLOSED]──► Send request to host
       ├─[Success]──► Return response ✓
       └─[Failure]──► Record failure, try next host
                      (may trip circuit if threshold exceeded)
```

### Example Scenario

```
Hosts: [A, B, C]
Circuit Breaker Status: A=OPEN, B=CLOSED, C=CLOSED

Request arrives:
1. Iterator selects Host A → Circuit OPEN → SKIP
2. Iterator selects Host B → Circuit CLOSED → Send request → 200 OK ✓

Result: Request succeeds despite Host A being unhealthy
```
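
The selection loop and the scenario above can be sketched in a few lines. This is a simplified illustration, not the proxy's implementation: `forward`, `is_open`, and `send` are hypothetical names, a status below 500 is assumed to count as success, and returning `503` when every attempted host fails is an assumption (the document only states it for the case where all circuits are open).

```python
from typing import Callable, Iterable, List, Tuple


def forward(
    hosts: Iterable[str],
    is_open: Callable[[str], bool],
    send: Callable[[str], int],
) -> Tuple[int, List[str]]:
    """Try hosts in order, skipping open circuits; return (status, attempted hosts)."""
    attempted: List[str] = []
    for host in hosts:
        if is_open(host):
            continue  # circuit OPEN: skip without sending anything
        attempted.append(host)
        status = send(host)
        if status < 500:  # assumption: anything below 500 is a success
            return status, attempted
        # Failure: a real breaker would record it here and might trip the circuit.
    # Every circuit was open, or every attempt failed.
    return 503, attempted
```

With `A` open and `B` healthy, `forward(["A", "B", "C"], ...)` skips `A` and returns the response from `B` without ever contacting `C`.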

See [LOAD_BALANCING.md](LOAD_BALANCING.md) for details on how hosts are selected and iterated.

222 changes: 222 additions & 0 deletions docs/CONFIGURATION_SETTINGS.md
@@ -0,0 +1,222 @@
# BackendOptions Settings - Organized by Restart Requirement

## Legend

| Tag | Description |
|-----|-------------|
| **[WARM]** | Hot-reloadable (read per-request or periodically refreshed) |
| **[DRAIN]** | Requires draining all workers (stop accepting, wait for in-flight to complete) |
| **[COLD]** | Requires cold restart (read once at startup, configures DI/infrastructure) |
| **[PARTIAL]** | Mixed - some settings WARM, others COLD/DRAIN |

---

## [WARM] Settings - Can be changed without restart

### Async (per-request settings)

| Setting | Property Name |
|---------|---------------|
| Timeout | `AsyncTimeout` |
| TTLSecs | `AsyncTTLSecs` |
| TriggerTimeout | `AsyncTriggerTimeout` |
| ClientRequestHeader | `AsyncClientRequestHeader` |
| ClientConfigFieldName | `AsyncClientConfigFieldName` |

### Logging - Read per-request or per-event

| Setting | Property Name |
|---------|---------------|
| Console | `LogConsole` |
| ConsoleEvent | `LogConsoleEvent` |
| Poller | `LogPoller` |
| Probes | `LogProbes` |
| Headers | `LogHeaders` |
| AllRequestHeaders | `LogAllRequestHeaders` |
| AllRequestHeadersExcept | `LogAllRequestHeadersExcept` |
| AllResponseHeaders | `LogAllResponseHeaders` |
| AllResponseHeadersExcept | `LogAllResponseHeadersExcept` |

### Request - Read per-request

| Setting | Property Name |
|---------|---------------|
| MaxAttempts | `MaxAttempts` |
| TimeoutHeader | `TimeoutHeader` |
| TTLHeader | `TTLHeader` |
| DefaultTTLSecs | `DefaultTTLSecs` |
| RequiredHeaders | `RequiredHeaders` |
| StripHeaders | `StripRequestHeaders` |
| DisallowedHeaders | `DisallowedHeaders` |
| DependencyHeaders | `DependancyHeaders` |

### Response - Read per-response

| Setting | Property Name |
|---------|---------------|
| StripHeaders | `StripResponseHeaders` |

### StatusCodes - Read per-response

| Setting | Property Name |
|---------|---------------|
| Acceptable | `AcceptableStatusCodes` |

### Validation - Read per-request

| Setting | Property Name |
|---------|---------------|
| Headers | `ValidateHeaders` |
| AuthAppID.Enabled | `ValidateAuthAppID` |
| AuthAppID.Url | `ValidateAuthAppIDUrl` |
| AuthAppID.FieldName | `ValidateAuthAppFieldName` |
| AuthAppID.Header | `ValidateAuthAppIDHeader` |

### Server (metadata only)

| Setting | Property Name |
|---------|---------------|
| IDStr | `IDStr` |
| ContainerApp | `ContainerApp` |
| Revision | `Revision` |

---

## [DRAIN] Settings - Require stopping all workers before restart

> ⚠️ These settings affect shared state or external connections, or would cause inconsistency during a rolling update. Drain all in-flight requests before changing them.

### Async - Switching modes or connections with in-flight requests causes data loss

| Setting | Property Name | Reason |
|---------|---------------|--------|
| Enabled | `AsyncModeEnabled` | Mode switch with in-flight requests |
| BlobStorage.ConnectionString | `AsyncBlobStorageConnectionString` | Connection change with pending writes |
| BlobStorage.UseMI | `AsyncBlobStorageUseMI` | Auth change with pending writes |
| BlobStorage.AccountUri | `AsyncBlobStorageAccountUri` | Connection change with pending writes |
| ServiceBus.ConnectionString | `AsyncSBConnectionString` | Connection change with pending messages |
| ServiceBus.Queue | `AsyncSBQueue` | Queue change with pending messages |
| ServiceBus.UseMI | `AsyncSBUseMI` | Auth change with pending messages |
| ServiceBus.Namespace | `AsyncSBNamespace` | Namespace change with pending messages |

### Hosts - Changing backends with in-flight requests causes routing errors

| Setting | Property Name | Reason |
|---------|---------------|--------|
| Hosts | `Hosts` | Backend routing changes |

### LoadBalancing - Changing strategy mid-flight causes uneven distribution

| Setting | Property Name | Reason |
|---------|---------------|--------|
| Mode | `LoadBalanceMode` | Strategy change mid-flight |
| IterationMode | `IterationMode` | Iterator behavior change |
| UseSharedIterators | `UseSharedIterators` | State inconsistency with active iterators |

### OAuth - Changing auth mid-flight causes 401s on in-flight requests

| Setting | Property Name | Reason |
|---------|---------------|--------|
| Enabled | `UseOAuth` | Auth change mid-flight |
| UseGov | `UseOAuthGov` | Endpoint change mid-flight |
| Audience | `OAuthAudience` | Token audience change |

### Server - Infrastructure changes with active queue

| Setting | Property Name | Reason |
|---------|---------------|--------|
| Port | `Port` | Listener stop required |
| Workers | `Workers` | Worker count with active queue |
| MaxQueueLength | `MaxQueueLength` | Queue resize with pending requests |

### Storage - Storage changes with pending writes cause data loss

| Setting | Property Name | Reason |
|---------|---------------|--------|
| DbEnabled | `StorageDbEnabled` | Toggling with pending writes |
| DbContainerName | `StorageDbContainerName` | Container change with pending writes |

---

## [COLD] Settings - Require restart but can use rolling update

### Async.BlobStorage

| Setting | Property Name |
|---------|---------------|
| WorkerCount | `AsyncBlobWorkerCount` |

### CircuitBreaker - Configured at startup

| Setting | Property Name |
|---------|---------------|
| ErrorThreshold | `CircuitBreakerErrorThreshold` |
| Timeslice | `CircuitBreakerTimeslice` |

### HealthProbe - Timer and sidecar client created at startup

| Setting | Property Name |
|---------|---------------|
| Sidecar | `HealthProbeSidecar` |
| SidecarEnabled | `HealthProbeSidecarEnabled` |
| SidecarUrl | `HealthProbeSidecarUrl` |

### Hosts

| Setting | Property Name |
|---------|---------------|
| HostName | `HostName` |

### LoadBalancing.SharedIterator

| Setting | Property Name |
|---------|---------------|
| TTLSeconds | `SharedIteratorTTLSeconds` |
| CleanupIntervalSeconds | `SharedIteratorCleanupIntervalSeconds` |

### Polling - Poller timer configured at startup

| Setting | Property Name |
|---------|---------------|
| Interval | `PollInterval` |
| Timeout | `PollTimeout` |
| SuccessRate | `SuccessRate` |

### Request

| Setting | Property Name |
|---------|---------------|
| Timeout | `Timeout` (HttpClient timeout) |

### Server

| Setting | Property Name |
|---------|---------------|
| TerminationGracePeriodSeconds | `TerminationGracePeriodSeconds` |
| TrackWorkers | `TrackWorkers` |

---

## [PARTIAL] Settings - Mixed restart requirements

### Priority

| Setting | Property Name | Requirement |
|---------|---------------|-------------|
| Default | `DefaultPriority` | [WARM] |
| KeyHeader | `PriorityKeyHeader` | [WARM] |
| Keys | `PriorityKeys` | [WARM] |
| Values | `PriorityValues` | [WARM] |
| Workers | `PriorityWorkers` | [DRAIN] |

### User

| Setting | Property Name | Requirement |
|---------|---------------|-------------|
| IDFieldName | `UserIDFieldName` | [WARM] |
| ProfileHeader | `UserProfileHeader` | [WARM] |
| ConfigUrl | `UserConfigUrl` | [WARM] |
| PriorityThreshold | `UserPriorityThreshold` | [WARM] |
| UniqueHeaders | `UniqueUserHeaders` | [WARM] |
| SuspendedConfigUrl | `SuspendedUserConfigUrl` | [WARM] |
| UseProfiles | `UseProfiles` | [COLD] |
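
When a deployment changes several settings at once, the tags above can be combined by taking the most disruptive one. The sketch below shows the idea with a small excerpt of the tables; `Restart`, `REQUIREMENTS`, and `required_action` are hypothetical names, and both the ordering WARM < COLD < DRAIN and the choice to treat unknown properties as DRAIN are assumptions, not documented behavior.

```python
from enum import IntEnum
from typing import Iterable


class Restart(IntEnum):
    WARM = 0   # hot-reloadable
    COLD = 1   # restart required, rolling update is fine
    DRAIN = 2  # drain all workers before restarting


# A small excerpt of the tables above; a real map would list every property.
REQUIREMENTS = {
    "MaxAttempts": Restart.WARM,
    "LogHeaders": Restart.WARM,
    "PollInterval": Restart.COLD,
    "UseProfiles": Restart.COLD,
    "Hosts": Restart.DRAIN,
    "PriorityWorkers": Restart.DRAIN,
}


def required_action(changed: Iterable[str]) -> Restart:
    """Return the most disruptive requirement among the changed properties."""
    # Assumption: unknown properties are treated as DRAIN, the safest choice.
    return max(
        (REQUIREMENTS.get(name, Restart.DRAIN) for name in changed),
        default=Restart.WARM,  # nothing changed, nothing to restart
    )
```

For example, changing `MaxAttempts` together with `PollInterval` yields `Restart.COLD`, while adding `Hosts` to the same change raises it to `Restart.DRAIN`.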