diff --git a/.azure/README.md b/.azure/README.md index 38a2645c..e91b14a5 100644 --- a/.azure/README.md +++ b/.azure/README.md @@ -6,7 +6,7 @@ This guide helps you deploy the SimpleL7Proxy service to Azure using the Azure D 1. [Azure Developer CLI (AZD)](https://learn.microsoft.com/en-us/azure/developer/azure-developer-cli/install-azd) 2. [Azure CLI](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli) -3. [Docker](https://www.docker.com/products/docker-desktop/) +3. [Docker](https://www.docker.com/products/docker-desktop/)(optional; only needed for local container builds) ## Deployment Steps @@ -58,6 +58,13 @@ This command will deploy the Azure infrastructure defined in the Bicep templates ### 4. Build and Deploy the Application +Important +Current deployment scripts are Docker-based. + +deploy.sh and deploy.ps1 build and push images using local Docker. +deploy.sh expects images already built by Docker-based build scripts. +If Docker is unavailable, use remote ACR build commands from CONTAINER_DEPLOYMENT.md and then update/deploy using the resulting image tags. 
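One way to choose between the local Docker path and the remote ACR path described above is to check for the Docker CLI before running the scripts. The following is a minimal sketch, not part of `deploy.sh` or `deploy.ps1`; the function name `pick_build_path` is hypothetical.

```bash
# Hypothetical helper (not in the repository scripts): report which build
# path applies, based on whether the named CLI is on PATH.
# Checks "docker" when no argument is given.
pick_build_path() {
  local cli="${1:-docker}"
  if command -v "${cli}" >/dev/null 2>&1; then
    echo "local"    # build and push with local Docker
  else
    echo "remote"   # use the remote ACR build workflow instead
  fi
}
```

A caller could branch on the result, for example `[ "$(pick_build_path)" = "remote" ] && echo "See CONTAINER_DEPLOYMENT.md"`.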
+ #### For Windows: ```powershell diff --git a/.azure/local-setup.sh b/.azure/local-setup.sh index 79428f19..ca1394ce 100644 --- a/.azure/local-setup.sh +++ b/.azure/local-setup.sh @@ -111,6 +111,13 @@ echo "# Primary backend host and probe" >> "$local_config_file" # Ensure Host1 is exported here so child processes see it and it won't be overridden by seeded defaults echo "export Host1=$Host1" >> "$local_config_file" echo "export Probe_path1=$Probe_path1" >> "$local_config_file" +# Extract backend port from Host1 +backend1_port=$(printf "%s" "$Host1" | sed -n 's|.*:\([0-9]*\)$|\1|p') +if [ -z "$backend1_port" ]; then + backend1_port="3000" +fi +echo "export BACKEND1_PORT=$backend1_port" >> "$local_config_file" +echo "# Proxy listening port" >> "$local_config_file" # Export port (queried by the script) so the dynamic section contains all interactive values echo "export Port=$proxy_port" >> "$local_config_file" @@ -184,8 +191,8 @@ EOF dotnet_dir="$script_dir/../test/nullserver/dotnet" sed -e "s|{{PY_DIR}}|$py_dir|g" \ -e "s|{{DOTNET_DIR}}|$dotnet_dir|g" \ - -e "s|{{START_CMD_PY}}|source $local_config_file && python3 stream_server.py --port \$BACKEND1_PORT|g" \ - -e "s|{{START_CMD_DOTNET}}|source $local_config_file && dotnet run --urls http://localhost:\$BACKEND1_PORT|g" \ + -e "s|{{START_CMD_PY}}|source $local_config_file \&\& python3 stream_server.py --port \$BACKEND1_PORT|g" \ + -e "s|{{START_CMD_DOTNET}}|source $local_config_file \&\& dotnet run --urls http://localhost:\$BACKEND1_PORT|g" \ "$script_dir/scenarios/null_server.txt" | sed 's/^/ /' # Print proxy run instructions using template sed -e "s|{{LOCAL_CONFIG_FILE}}|$local_config_file|g" "$script_dir/scenarios/proxy_run.txt" | sed 's/^/ /' @@ -197,7 +204,7 @@ cat < See [Development & Testing](docs/DEVELOPMENT.md) for local mock backends. -> See [Container Deployment](docs/CONTAINER_DEPLOYMENT.md) for VNET and high-performance variants. +> No local Docker available? 
Use the remote ACR build workflow in [docs/CONTAINER_DEPLOYMENT.md](docs/CONTAINER_DEPLOYMENT.md). +> See [Getting Started — Local Development](docs/BEGINNER_DEVELOPMENT.md) for the fastest setup paths. +> +--- + +## Local Development Paths + +**Fastest: Port + Backend Only** +```bash +export Port=8080 +export Host1=http://localhost:3000 +dotnet run +``` + +**Second-fastest: Azure App Configuration** +```bash +export AZURE_APPCONFIG_ENDPOINT=https://your-appconfig.azconfig.io +export AZURE_APPCONFIG_LABEL=dev +dotnet run +``` + +→ **Need mock backends?** See [DUMMY_BACKEND.md](docs/DUMMY_BACKEND.md) for null server and Python HTTP server setups. +→ **Need help diagnosing?** See [TroubleshootTOC.md](docs/TroubleshootTOC.md) for issue-driven guidance. --- @@ -89,8 +118,11 @@ chmod +x .azure/setup.sh .azure/deploy.sh | AI Foundry Integration | [docs/AI_FOUNDRY_INTEGRATION.md](docs/AI_FOUNDRY_INTEGRATION.md) | | APIM Policy | [APIM-Policy/readme.md](APIM-Policy/readme.md) | | Container Deployment | [docs/CONTAINER_DEPLOYMENT.md](docs/CONTAINER_DEPLOYMENT.md) | -| Development & Testing | [docs/DEVELOPMENT.md](docs/DEVELOPMENT.md) | +| Getting Started — Local Development | [docs/BEGINNER_DEVELOPMENT.md](docs/BEGINNER_DEVELOPMENT.md) | +| Advanced Development & Tuning | [docs/ADVANCED_DEVELOPMENT.md](docs/ADVANCED_DEVELOPMENT.md) | +| Mock Backends for Testing | [docs/DUMMY_BACKEND.md](docs/DUMMY_BACKEND.md) | | Response Codes | [docs/RESPONSE_CODES.md](docs/RESPONSE_CODES.md) | +| Troubleshooting (Quick Diagnosis TOC) | [docs/TroubleshootTOC.md](docs/TroubleshootTOC.md) | --- diff --git a/ReleaseNotes/version2.2.md b/ReleaseNotes/version2.2.md index 09797679..e1bd4e6f 100644 --- a/ReleaseNotes/version2.2.md +++ b/ReleaseNotes/version2.2.md @@ -1,5 +1,24 @@ # Release Notes # +2.2.11.0 + +Proxy: +* Code restructure for Async work +* Streamline event draining process for SB and file logger +* Refactor blob writer methods +* Load async templates at startup +* Bug fix: don't 
+activate circuit breaker on async shutdown +* Cleanup async shutdown +* Bug fix: possible loss of async Queue message +* Add configurations for AsyncBlobMaxQueue and AsyncStreamingBufferSizeBytes +* Don't trigger Async if AsyncBlobMaxQueue is exceeded +* Bug fix: don't cache the streaming response when writing to a blob +* Optimize the blob writer for throughput for streaming and file writing +* If the templates folder is not readable, turn off async mode +* Bug fix: storage account by connection was not working +Deployment: +* Create a deployment script to configure storage account + 2.2.10.7 Proxy: diff --git a/deployment/AppConfiguration/deploy.sh b/deployment/AppConfiguration/deploy.sh index 9ba9ed8b..74966a8e 100644 --- a/deployment/AppConfiguration/deploy.sh +++ b/deployment/AppConfiguration/deploy.sh @@ -228,7 +228,7 @@ mapfile -t CONFIG_ENTRIES < <( if ($0 ~ /^[[:space:]]*\[/) continue; if ($0 ~ /^[[:space:]]*public[[:space:]]+/) { - if (match($0, /^[[:space:]]*public[[:space:]]+[^ ]+[[:space:]]+([A-Za-z_][A-Za-z0-9_]*)[[:space:]]*\{/, p)) { + if (match($0, /^[[:space:]]*public[[:space:]].*[[:space:]]([A-Za-z_][A-Za-z0-9_]*)[[:space:]]*\{/, p)) { prop = p[1]; # Extract default value from "} = VALUE;" pattern defVal = ""; @@ -278,6 +278,9 @@ DEFAULT_COUNT=0 WARM_COUNT=0 COLD_COUNT=0 +# Accumulate env var → default value mappings for JSON output +declare -a ENV_DEFAULT_ENTRIES + # Build a single JSON file for batch import (all keys, single label). 
IMPORT_JSON_FILE="$(mktemp)" trap 'rm -f "${IMPORT_JSON_FILE}"' EXIT @@ -355,6 +358,9 @@ for entry in "${CONFIG_ENTRIES[@]}"; do if [ "${SOURCE}" = "cs-default" ] || [ "${SOURCE}" = "placeholder" ]; then DEFAULT_COUNT=$((DEFAULT_COUNT + 1)) fi + + # Record env var name and its default value for final JSON output + ENV_DEFAULT_ENTRIES+=("${ENV_NAME}|${CS_DEFAULT}") done # Add Sentinel and RefreshSeconds to the import batch (always Warm) @@ -401,3 +407,31 @@ echo -e "${GREEN}Label: ${APPCONFIG_LABEL:-(none)}${NC}" echo -e "${GREEN}Config keys published: ${SET_COUNT} (Warm: ${WARM_COUNT}, Cold: ${COLD_COUNT})${NC}" echo -e "${GREEN} of which ${DEFAULT_COUNT} used C# default or '${DEFAULT_PLACEHOLDER}' placeholder${NC}" echo -e "${GREEN}======================================${NC}" + +# ---------------------------------------------------------------------------- +# Output JSON mapping: ENVIRONMENT_VARIABLE → default value +# ---------------------------------------------------------------------------- +echo "" +echo -e "${YELLOW}Environment Variable Defaults (JSON):${NC}" +ENV_JSON="{" +ENV_JSON_FIRST=true +for edentry in "${ENV_DEFAULT_ENTRIES[@]}"; do + ED_NAME="$(echo "${edentry}" | cut -d'|' -f1)" + ED_DEFAULT="$(echo "${edentry}" | cut -d'|' -f2-)" + # Use null for empty defaults + if [ -z "${ED_DEFAULT}" ]; then + ED_JSON_VAL="null" + else + # Escape for JSON + ED_ESCAPED="$(printf '%s' "${ED_DEFAULT}" | sed 's/\\/\\\\/g; s/"/\\"/g')" + ED_JSON_VAL="\"${ED_ESCAPED}\"" + fi + if [ "${ENV_JSON_FIRST}" = true ]; then + ENV_JSON_FIRST=false + else + ENV_JSON+="," + fi + ENV_JSON+="$(printf '\n "%s": %s' "${ED_NAME}" "${ED_JSON_VAL}")" +done +ENV_JSON+=$'\n}' +echo "${ENV_JSON}" diff --git a/deployment/BlobStorage/README.md b/deployment/BlobStorage/README.md new file mode 100644 index 00000000..20fe05a1 --- /dev/null +++ b/deployment/BlobStorage/README.md @@ -0,0 +1,157 @@ +# Azure Blob Storage Deployment + +Provisions the Azure Storage account that SimpleL7Proxy uses 
for **async-mode** +artifacts (response/header blobs) and for the **templates** container that +holds the async response templates (`welcome.json`, `notready.json`, +`notauthorized.json`). + +The script focuses on **storage account setup and the Container App's +connection to it** — it does not touch App Configuration, Service Bus, or +the Container App's environment variables. + +## What the Script Does + +1. **Reads the live Container App** to obtain (or create) its + system-assigned managed identity. +2. **Creates the resource group and storage account** if they don't + already exist: + - `kind=StorageV2` + - `--allow-blob-public-access true` + - `--public-network-access Enabled` (required for the Container App to + reach the blob endpoint) + - `--min-tls-version TLS1_2` +3. **Assigns RBAC** to the Container App's managed identity on the storage + account scope. Default role: **`Storage Blob Data Contributor`** — the + proxy both reads templates and writes async result blobs, so Reader is + not sufficient. +4. **Optionally creates blob containers** (`templates`, `simplel7proxy`) + when `CREATE_CONTAINERS=true`. Container creation uses + `--auth-mode login`, so the signed-in user is JIT-granted + `Storage Blob Data Contributor` on the account first (with a 30 s wait + for RBAC propagation). This works whether or not shared-key auth is + enabled on the account. + +The script is **idempotent** — re-running it skips work that's already done. + +## Prerequisites + +| Requirement | Details | +|---|---| +| **Azure CLI** | `az` ≥ 2.50 with the `containerapp` extension | +| **jq** | Used to parse the Container App JSON | +| **Azure login** | `az login` (the script will prompt if needed) | +| **A running Container App** | The script reads its identity and enables it if absent | +| **Bash 4+** | Uses `${VAR,,}` lowercase expansion | + +## Quick Start + +```bash +cd deployment/BlobStorage + +# 1. 
Create your parameters file +cp deploy.parameters.example.sh deploy.parameters.sh + +# 2. Edit deploy.parameters.sh with your values +# (see Parameters section below) + +# 3. Run +./deploy.sh +``` + +## Parameters + +All parameters are set in `deploy.parameters.sh`. + +### Required + +| Parameter | Description | +|---|---| +| `CONTAINER_APP_NAME` | Container App that will read/write blobs using its managed identity | +| `CONTAINER_APP_RESOURCE_GROUP` | Resource group where the Container App lives | +| `RESOURCE_GROUP` | Resource group for the storage account (created if missing) | +| `LOCATION` | Azure region for the storage account | +| `STORAGE_ACCOUNT_NAME` | Globally unique storage account name (3–24 lowercase alphanumeric) | + +### Optional + +| Parameter | Default | Description | +|---|---|---| +| `STORAGE_SKU` | `Standard_LRS` | Storage replication SKU. Short forms (`lrs`, `grs`, `zrs`, `ragrs`) are normalized. | +| `CREATE_CONTAINERS` | `false` | When `true`, creates the containers listed in `BLOB_CONTAINERS`. | +| `BLOB_CONTAINERS` | `templates simplel7proxy` | Space-separated list of containers to create when `CREATE_CONTAINERS=true`. | +| `CA_BLOB_ROLE` | `Storage Blob Data Contributor` | Role assigned to the Container App's managed identity on the storage account. The proxy writes blobs, so Reader is not enough. | + +> **Do not commit `deploy.parameters.sh`** — it contains environment-specific values. +> Only `deploy.parameters.example.sh` is checked in. + +## Containers Used by the Proxy + +| Container | Purpose | +|---|---| +| `templates` | Holds async-mode response JSON templates (`welcome.json`, `notready.json`, `notauthorized.json`). Loaded once at startup by `TemplateLoader`. | +| `simplel7proxy` | Default `data` container for async result/header blobs (override per user via the `async-config` user-profile field). 
| + +After the script creates the `templates` container, you must upload the +template files yourself: + +```bash +az storage blob upload-batch \ + --account-name "${STORAGE_ACCOUNT_NAME}" \ + --destination templates \ + --source ../../src/SimpleL7Proxy/templates \ + --auth-mode login \ + --overwrite +``` + +## How the Proxy Connects + +The proxy reads its blob endpoint from the `AsyncBlobStorageConfig` +setting (in App Configuration or as an env var). Two formats are +accepted: + +1. **Comma-separated** (managed identity): + ``` + blobserviceuri=https://.blob.core.windows.net, useMI=true + ``` +2. **Raw portal connection string** (key-based, treated as + `useMI=false`): + ``` + DefaultEndpointsProtocol=https;AccountName=;AccountKey=...;EndpointSuffix=core.windows.net + ``` + +This script's role assignment makes option (1) work: the Container App's +managed identity gets `Storage Blob Data Contributor` on the storage +account, and the proxy authenticates with `DefaultAzureCredential`. + +## RBAC Notes + +- **Container App MI** — assigned `${CA_BLOB_ROLE}` (default + `Storage Blob Data Contributor`) on the storage account scope. +- **Signed-in user** — only when `CREATE_CONTAINERS=true`, the script JIT + assigns `Storage Blob Data Contributor` to the operator running it so + that `az storage container create --auth-mode login` succeeds. This + assignment is **not removed** by the script; remove it manually if your + org's policy requires it. +- RBAC propagation can take a few minutes. The script sleeps 30 s after + granting the operator role; if subsequent commands still 403, wait and + re-run. + +## Re-running + +The script is idempotent: + +- Existing resource group / storage account / containers are reused. +- Existing role assignments are detected and skipped. +- Safe to run repeatedly to verify state. 
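To check the state that a re-run would detect, the role lookup can be wrapped in a small function. This is a sketch only: it assumes the same `az role assignment list` query that `deploy.sh` uses for its existence check, and `has_role` is a hypothetical name that does not appear in the script.

```bash
# Hypothetical helper (not part of deploy.sh): succeeds when the principal
# already holds the role on the given scope, mirroring the script's own
# "list, then create only if empty" check.
has_role() {
  local principal_id="$1" role="$2" scope="$3"
  local assignment_id
  assignment_id="$(az role assignment list \
    --assignee "${principal_id}" \
    --role "${role}" \
    --scope "${scope}" \
    --query "[0].id" -o tsv 2>/dev/null || true)"
  # A non-empty assignment id means the role is already in place.
  [ -n "${assignment_id}" ]
}
```

For example, `has_role "${CA_PRINCIPAL_ID}" "Storage Blob Data Contributor" "${STORAGE_RESOURCE_ID}" && echo "role present"` prints a message only when the assignment exists, using the variable names from `deploy.sh`.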
+ +## Cleanup + +```bash +az storage account delete \ + --name "${STORAGE_ACCOUNT_NAME}" \ + --resource-group "${RESOURCE_GROUP}" \ + --yes +``` + +Role assignments scoped to the storage account are removed automatically +when the account is deleted. diff --git a/deployment/BlobStorage/deploy.parameters.example.sh b/deployment/BlobStorage/deploy.parameters.example.sh new file mode 100644 index 00000000..c0da7939 --- /dev/null +++ b/deployment/BlobStorage/deploy.parameters.example.sh @@ -0,0 +1,39 @@ +#!/bin/bash + +# Deployment Parameters for SimpleL7Proxy Blob Storage provisioning +# +# 1) Copy this file to deploy.parameters.sh +# 2) Update values for your environment +# 3) Run ./deploy.sh +# +# The script reads the Container App so it can grant its managed +# identity read access to the storage account. +# +# Do not commit deploy.parameters.sh with real values. + +# ============================================================================= +# Container App (consumer that needs read access to the storage account) +# ============================================================================= +export CONTAINER_APP_NAME="myapp" +export CONTAINER_APP_RESOURCE_GROUP="rg-myapp-prod" + +# ============================================================================= +# Storage account +# ============================================================================= +export RESOURCE_GROUP="rg-myapp-storage" +export LOCATION="eastus" +export STORAGE_ACCOUNT_NAME="myappstorage" + +# Storage SKU. Accepted: Standard_LRS, Standard_GRS, Standard_ZRS, Standard_RAGRS +# (short forms "lrs", "grs", "zrs", "ragrs" are also accepted). +export STORAGE_SKU="Standard_LRS" + +# Set to "true" to create the blob containers listed in BLOB_CONTAINERS. +export CREATE_CONTAINERS="true" + +# Space-separated list of blob containers to create when CREATE_CONTAINERS=true. 
+export BLOB_CONTAINERS="templates simplel7proxy" + +# Role granted to the Container App's managed identity on the storage account. +# Use "Storage Blob Data Contributor" if the proxy must also write blobs. +export CA_BLOB_ROLE="Storage Blob Data Contributor" diff --git a/deployment/BlobStorage/deploy.sh b/deployment/BlobStorage/deploy.sh new file mode 100644 index 00000000..ebe1c361 --- /dev/null +++ b/deployment/BlobStorage/deploy.sh @@ -0,0 +1,245 @@ +#!/bin/bash + +# Deploy/Update Azure Storage Account for SimpleL7Proxy. +# +# What this script does: +# 1. Creates the storage account (if it does not already exist) in the +# configured resource group and region. +# 2. Optionally creates the blob containers used by the proxy +# (`templates` and `simplel7proxy`) when CREATE_CONTAINERS=true. +# 3. Grants the Container App's system-assigned managed identity the +# "Storage Blob Data Contributor" role on the storage account so the +# proxy can read and write blobs at runtime using its managed identity. + +set -euo pipefail + +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" + +if [ -f "${SCRIPT_DIR}/deploy.parameters.sh" ]; then + echo "Sourcing deploy.parameters.sh..." + # shellcheck disable=SC1091 + source "${SCRIPT_DIR}/deploy.parameters.sh" +elif [ -f "${SCRIPT_DIR}/deploy.parameters.example.sh" ]; then + echo "deploy.parameters.sh not found." + echo "Copy deploy.parameters.example.sh to deploy.parameters.sh and update values." 
+ echo "Example: cp deploy.parameters.example.sh deploy.parameters.sh" + exit 1 +fi + +# ---------------------------------------------------------------------------- +# Required parameters +# ---------------------------------------------------------------------------- +CONTAINER_APP_NAME="${CONTAINER_APP_NAME:?'CONTAINER_APP_NAME must be set'}" +CONTAINER_APP_RESOURCE_GROUP="${CONTAINER_APP_RESOURCE_GROUP:?'CONTAINER_APP_RESOURCE_GROUP must be set'}" +RESOURCE_GROUP="${RESOURCE_GROUP:?'RESOURCE_GROUP must be set (storage account resource group)'}" +LOCATION="${LOCATION:?'LOCATION must be set (storage account location)'}" +STORAGE_ACCOUNT_NAME="${STORAGE_ACCOUNT_NAME:?'STORAGE_ACCOUNT_NAME must be set'}" + +# ---------------------------------------------------------------------------- +# Optional overrides +# ---------------------------------------------------------------------------- +# Accept STORAGE_SKU but fall back to APPCONFIG_SKU for backwards-compat +# with the existing deploy.parameters.sh template. +STORAGE_SKU="${STORAGE_SKU:-${APPCONFIG_SKU:-Standard_LRS}}" +# Normalize common short forms (e.g. "lrs" -> "Standard_LRS") +case "${STORAGE_SKU,,}" in + lrs) STORAGE_SKU="Standard_LRS" ;; + grs) STORAGE_SKU="Standard_GRS" ;; + zrs) STORAGE_SKU="Standard_ZRS" ;; + ragrs) STORAGE_SKU="Standard_RAGRS" ;; +esac + +CREATE_CONTAINERS="${CREATE_CONTAINERS:-false}" +# Containers to create when CREATE_CONTAINERS=true (space-separated). +BLOB_CONTAINERS="${BLOB_CONTAINERS:-templates simplel7proxy}" + +# Role assigned to the Container App managed identity. The proxy reads +# and writes blobs, so Contributor is required. +CA_BLOB_ROLE="${CA_BLOB_ROLE:-Storage Blob Data Contributor}" + +GREEN='\033[0;32m' +YELLOW='\033[1;33m' +RED='\033[0;31m' +NC='\033[0m' + +# ---------------------------------------------------------------------------- +# Preconditions +# ---------------------------------------------------------------------------- +if ! 
command -v az >/dev/null 2>&1; then + echo -e "${RED}Error: Azure CLI is not installed.${NC}" + exit 1 +fi + +if ! command -v jq >/dev/null 2>&1; then + echo -e "${RED}Error: jq is not installed.${NC}" + exit 1 +fi + +echo -e "${YELLOW}Checking Azure login status...${NC}" +az account show >/dev/null 2>&1 || az login >/dev/null + +SUBSCRIPTION_ID="$(az account show --query id -o tsv)" +echo -e "${GREEN}Using subscription: ${SUBSCRIPTION_ID}${NC}" + +# ---------------------------------------------------------------------------- +# Read the live Container App (for managed identity principal) +# ---------------------------------------------------------------------------- +echo -e "${YELLOW}Reading Container App '${CONTAINER_APP_NAME}' from '${CONTAINER_APP_RESOURCE_GROUP}'...${NC}" +CA_JSON="$(az containerapp show \ + --name "${CONTAINER_APP_NAME}" \ + --resource-group "${CONTAINER_APP_RESOURCE_GROUP}" \ + -o json)" || { echo -e "${RED}Error: Could not read Container App.${NC}"; exit 1; } + +CA_PRINCIPAL_ID="$(echo "${CA_JSON}" | jq -r '.identity.principalId // empty')" +if [ -z "${CA_PRINCIPAL_ID}" ]; then + echo -e "${YELLOW}Container App has no system-assigned managed identity. 
Enabling it...${NC}" + az containerapp identity assign \ + --name "${CONTAINER_APP_NAME}" \ + --resource-group "${CONTAINER_APP_RESOURCE_GROUP}" \ + --system-assigned \ + >/dev/null + CA_PRINCIPAL_ID="$(az containerapp show \ + --name "${CONTAINER_APP_NAME}" \ + --resource-group "${CONTAINER_APP_RESOURCE_GROUP}" \ + --query identity.principalId -o tsv)" +fi +echo -e "${GREEN}Container App principalId: ${CA_PRINCIPAL_ID}${NC}" + +# ---------------------------------------------------------------------------- +# Create or reuse storage account +# ---------------------------------------------------------------------------- +echo -e "${YELLOW}Ensuring resource group '${RESOURCE_GROUP}' exists...${NC}" +az group create --name "${RESOURCE_GROUP}" --location "${LOCATION}" >/dev/null + +EXISTING_STORAGE="$(az storage account show \ + --name "${STORAGE_ACCOUNT_NAME}" \ + --resource-group "${RESOURCE_GROUP}" \ + --query name -o tsv 2>/dev/null || true)" + +if [ -z "${EXISTING_STORAGE}" ]; then + # Confirm the name is not taken globally by someone else + NAME_AVAILABLE="$(az storage account check-name --name "${STORAGE_ACCOUNT_NAME}" --query nameAvailable -o tsv)" + if [ "${NAME_AVAILABLE}" != "true" ]; then + REASON="$(az storage account check-name --name "${STORAGE_ACCOUNT_NAME}" --query message -o tsv)" + echo -e "${RED}Error: storage account name '${STORAGE_ACCOUNT_NAME}' is not available: ${REASON}${NC}" + exit 1 + fi + + echo -e "${YELLOW}Creating storage account '${STORAGE_ACCOUNT_NAME}' (${STORAGE_SKU}) in '${LOCATION}'...${NC}" + az storage account create \ + --name "${STORAGE_ACCOUNT_NAME}" \ + --resource-group "${RESOURCE_GROUP}" \ + --location "${LOCATION}" \ + --sku "${STORAGE_SKU}" \ + --kind StorageV2 \ + --allow-blob-public-access true \ + --public-network-access Enabled \ + --min-tls-version TLS1_2 \ + >/dev/null + echo -e "${GREEN}✓ Storage account created${NC}" +else + echo -e "${GREEN}Using existing storage account: ${STORAGE_ACCOUNT_NAME}${NC}" +fi + 
+STORAGE_RESOURCE_ID="$(az storage account show \ + --name "${STORAGE_ACCOUNT_NAME}" \ + --resource-group "${RESOURCE_GROUP}" \ + --query id -o tsv)" +STORAGE_BLOB_ENDPOINT="$(az storage account show \ + --name "${STORAGE_ACCOUNT_NAME}" \ + --resource-group "${RESOURCE_GROUP}" \ + --query primaryEndpoints.blob -o tsv)" + +# ---------------------------------------------------------------------------- +# Grant the Container App's managed identity read access to the storage account +# ---------------------------------------------------------------------------- +EXISTING_CA_ROLE="$(az role assignment list \ + --assignee "${CA_PRINCIPAL_ID}" \ + --role "${CA_BLOB_ROLE}" \ + --scope "${STORAGE_RESOURCE_ID}" \ + --query "[0].id" -o tsv 2>/dev/null || true)" + +if [ -z "${EXISTING_CA_ROLE}" ]; then + echo -e "${YELLOW}Assigning '${CA_BLOB_ROLE}' role to Container App managed identity (${CA_PRINCIPAL_ID})...${NC}" + az role assignment create \ + --assignee-object-id "${CA_PRINCIPAL_ID}" \ + --assignee-principal-type ServicePrincipal \ + --role "${CA_BLOB_ROLE}" \ + --scope "${STORAGE_RESOURCE_ID}" \ + >/dev/null + echo -e "${GREEN}✓ Role assigned. RBAC propagation may take a few minutes.${NC}" +else + echo -e "${GREEN}Container App managed identity already has '${CA_BLOB_ROLE}' role.${NC}" +fi + +# ---------------------------------------------------------------------------- +# Optionally create blob containers +# ---------------------------------------------------------------------------- +if [ "${CREATE_CONTAINERS,,}" = "true" ]; then + echo -e "${YELLOW}Creating blob containers (CREATE_CONTAINERS=true)...${NC}" + + # Container creation uses Azure AD auth (--auth-mode login) because + # storage accounts may have shared-key auth disabled. The signed-in + # user therefore needs data-plane access on the account. 
+ SIGNED_IN_PRINCIPAL_ID="$(az ad signed-in-user show --query id -o tsv 2>/dev/null || true)" + if [ -n "${SIGNED_IN_PRINCIPAL_ID}" ]; then + EXISTING_USER_ROLE="$(az role assignment list \ + --assignee "${SIGNED_IN_PRINCIPAL_ID}" \ + --role "Storage Blob Data Contributor" \ + --scope "${STORAGE_RESOURCE_ID}" \ + --query "[0].id" -o tsv 2>/dev/null || true)" + + if [ -z "${EXISTING_USER_ROLE}" ]; then + echo -e "${YELLOW}Assigning 'Storage Blob Data Contributor' to current user for container management...${NC}" + az role assignment create \ + --assignee "${SIGNED_IN_PRINCIPAL_ID}" \ + --role "Storage Blob Data Contributor" \ + --scope "${STORAGE_RESOURCE_ID}" \ + >/dev/null + echo -e "${YELLOW}Waiting for RBAC propagation (30s)...${NC}" + sleep 30 + fi + fi + + for CONTAINER in ${BLOB_CONTAINERS}; do + EXISTS="$(az storage container exists \ + --name "${CONTAINER}" \ + --account-name "${STORAGE_ACCOUNT_NAME}" \ + --auth-mode login \ + --query exists -o tsv 2>/dev/null || echo "false")" + + if [ "${EXISTS}" = "true" ]; then + echo -e "${GREEN} ✓ Container '${CONTAINER}' already exists${NC}" + else + echo -e "${YELLOW} Creating container '${CONTAINER}'...${NC}" + az storage container create \ + --name "${CONTAINER}" \ + --account-name "${STORAGE_ACCOUNT_NAME}" \ + --auth-mode login \ + --public-access off \ + >/dev/null + echo -e "${GREEN} ✓ Container '${CONTAINER}' created${NC}" + fi + done +else + echo -e "${GREEN}Skipping container creation (CREATE_CONTAINERS != true).${NC}" +fi + +# ---------------------------------------------------------------------------- +# Summary +# ---------------------------------------------------------------------------- +echo -e "${GREEN}" +echo "==============================================================" +echo " Deployment complete" +echo "==============================================================" +echo " Storage Account : ${STORAGE_ACCOUNT_NAME}" +echo " Resource Group : ${RESOURCE_GROUP}" +echo " Location : ${LOCATION}" 
+echo " SKU : ${STORAGE_SKU}" +echo " Blob endpoint : ${STORAGE_BLOB_ENDPOINT}" +echo " Container App : ${CONTAINER_APP_NAME} (${CA_BLOB_ROLE})" +if [ "${CREATE_CONTAINERS,,}" = "true" ]; then + echo " Containers : ${BLOB_CONTAINERS}" +fi +echo "==============================================================" +echo -e "${NC}" diff --git a/deployment/proxy-with-sidecar/README.md b/deployment/proxy-with-sidecar/README.md index de8c6214..107e2a5c 100644 --- a/deployment/proxy-with-sidecar/README.md +++ b/deployment/proxy-with-sidecar/README.md @@ -6,7 +6,7 @@ Deploy a multi-container Azure Container App with a health probe sidecar pattern - [Azure CLI](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli) installed - Access to an Azure Container Registry (ACR) -- Docker installed (for building images) +- Docker installed (optional; only needed for building images locally) ## Quick Start @@ -41,6 +41,13 @@ export HOST1="host=https://your-api.azure-api.net;mode=apim;path=/;probe=/health ### 2. Build Images +Important +Current deployment scripts are Docker-based. + +deploy.sh and deploy.ps1 build and push images using local Docker. +deploy.sh expects images already built by Docker-based build scripts. +If Docker is unavailable, use remote ACR build commands from CONTAINER_DEPLOYMENT.md and then update/deploy using the resulting image tags. + ```bash cd ../../src/SimpleL7Proxy && ./build.sh cd ../HealthProbe && ./build.sh diff --git a/docs/ADVANCED_DEVELOPMENT.md b/docs/ADVANCED_DEVELOPMENT.md new file mode 100644 index 00000000..abd2b5f4 --- /dev/null +++ b/docs/ADVANCED_DEVELOPMENT.md @@ -0,0 +1,218 @@ +# Advanced Development & Tuning + +For fine-tuning proxy behavior during local development, optimizing throughput, and testing advanced features. + +> **See also:** [BEGINNER_DEVELOPMENT.md](BEGINNER_DEVELOPMENT.md) for basic local setup. This guide covers performance tuning and feature-specific configuration. 
+ +--- + +## Startup Performance Tuning + +Adjust these settings after the proxy is running successfully with default values. + +> **Units used in this doc:** time values are in **milliseconds** unless the setting name ends with `Secs`. + +| Variable | Default | Description | +|----------|---------|-------------| +| `Workers` | `10` | Concurrent worker count — increase for higher throughput, decrease to reduce resource usage | +| `Timeout` | `1200000` ms (20 min) | Per-host request timeout — lower for faster failure detection, raise for slow backends | +| `MaxQueueLength` | `1000` | Max queued requests before returning 429 — raise if backends are slow, lower to fail fast | +| `PollInterval` | `15000` ms (15 s) | Backend health check frequency — lower for faster circuit breaker recovery, raise to reduce overhead | + +### Quick Tuning Guide + +**For high throughput (many concurrent requests):** +```bash +export Workers=20 +export MaxQueueLength=2000 +export PollInterval=10000 +``` + +**For slow backends:** +```bash +export Timeout=3600000 # 1 hour +export MaxQueueLength=5000 +export PollInterval=30000 # 30 seconds +``` + +**For fast failure detection:** +```bash +export Timeout=300000 # 5 minutes +export MaxQueueLength=500 +export PollInterval=5000 # 5 seconds +``` + +--- + +## Health Probes — Advanced Configuration + +Set up internal sidecar health probes for Kubernetes and other orchestration platforms. + +| Variable | Default | Description | +|----------|---------|-------------| +| `HealthProbeSidecar` | `Enabled=false;url=http://localhost:9000` | Sidecar health probe config — format: `Enabled=true/false;url=http://host:port` | +| `PollTimeout` | `3000` ms | Health probe timeout — increase if network is slow | + +### Enabling Sidecar Health Probes + +For Kubernetes or Container Apps with sidecar health checks: + +```bash +export HealthProbeSidecar="Enabled=true;url=http://localhost:9000" +dotnet run +``` + +The proxy will expose: +- `/liveness` — is the proxy running? 
+- `/readiness` — is the proxy ready to accept requests? +- `/startup` — has the proxy completed startup? + +Use in your Kubernetes probes (e.g., `livenessProbe.httpGet.path=/liveness`). + +--- + +## User Profiles — Advanced Setup + +Multi-tenant configuration with user profile enrichment and access control. + +| Variable | Default | Description | +|----------|---------|-------------| +| `UseProfiles` | `false` | Enable user profile enrichment | +| `UserConfigUrl` | `""` | URL for user profile config (file: or http:) | +| `UserProfileHeader` | `X-UserProfile` | Header to inject with profile data | +| `UserIDFieldName` | `userId` | JSON field used as user identifier | + +### Configuration File Format + +Create a `users.json`: + +```json +{ + "user1": { + "userId": "user1", + "tier": "premium", + "quota": 1000 + }, + "user2": { + "userId": "user2", + "tier": "standard", + "quota": 100 + } +} +``` + +### Setup with Local File + +```bash +export UseProfiles=true +export UserConfigUrl="file:users.json" +export UserProfileHeader="X-UserProfile" +dotnet run +``` + +Now each request with `X-UserID: user1` will be enriched with the profile data. + +### Setup with Remote Config + +```bash +export UseProfiles=true +export UserConfigUrl="http://localhost:8080/api/users" +export UserProfileHeader="X-UserProfile" +dotnet run +``` + +The proxy will periodically fetch user configs from the URL. + +--- + +## Event Hub Logging + +Stream request/response events to Azure Event Hub for centralized logging and analytics. + +| Variable | Default | Description | +|----------|---------|-------------| +| `EVENTHUB_CONNECTIONSTRING` | `""` | Event Hub connection string | +| `EVENTHUB_NAME` | `""` | Event Hub name | +| `EVENTHUB_NAMESPACE` | `""` | Event Hub namespace | +| `EVENT_LOGGERS` | `file` | Comma-separated list of event sinks (file, eventhub) | + +### Setup + +```bash +export EVENTHUB_CONNECTIONSTRING="Endpoint=sb://my-namespace.servicebus.windows.net/;..." 
+export EVENTHUB_NAME="proxy-events" +export EVENT_LOGGERS="file,eventhub" +dotnet run +``` + +> [!NOTE] +> Event Hub is optional. The default is to write events to a local file only (`EVENT_LOGGERS=file`). + +--- + +## Mock Backends for Load Testing + +Use the included null server to simulate slow/fast backends without actual HTTP calls. + +```bash +# Terminal 1 — start the included mock backend +cd test/nullserver/Python +python streamserver.py --delay=100 # 100ms delay per request + +# Terminal 2 — start the proxy +export Port=8080 +export Host1=http://localhost:3000 +export Workers=20 +export MaxQueueLength=2000 +dotnet run +``` + +Now send load from a test client and monitor queue depth with the `/debug/queues` endpoint (if available). + +--- + +## Debugging Tips + +### Enable all header logging + +```bash +export LogAllRequestHeaders=true +export LogAllResponseHeaders=true +dotnet run +``` + +### Increase log verbosity + +```bash +export LOG_LEVEL=Debug +dotnet run +``` + +### Check queue depth in real-time + +```bash +# Terminal 1 — start the proxy +dotnet run + +# Terminal 2 — send requests in a loop +for i in {1..100}; do curl http://localhost:8080/api/endpoint & done + +# Terminal 3 — check event log +tail -f eventslog.json | jq '.QueueLength' +``` + +--- + +## Performance Baselines + +These are typical baseline metrics on a 4-core machine with 8GB RAM: + +| Setting | Baseline | Notes | +|---------|----------|-------| +| Max concurrent requests | ~500 | With `Workers=10` and healthy backend | +| Throughput | ~1000 req/s | Depends on backend latency | +| Queue depth (typical) | <10 | With balanced load | +| Health check overhead | <1% | With `PollInterval=15000` ms | + +Adjust `Workers` and `PollInterval` based on your workload profiling. 
+ diff --git a/docs/AZURE_APP_CONFIGURATION.md b/docs/AZURE_APP_CONFIGURATION.md index f6521f89..6e334a0f 100644 --- a/docs/AZURE_APP_CONFIGURATION.md +++ b/docs/AZURE_APP_CONFIGURATION.md @@ -28,14 +28,14 @@ All keys share the `Warm:` prefix (single `Select("Warm:*")` query). The **Label | `AZURE_APPCONFIG_ENDPOINT` | One of these two | — | Managed Identity endpoint (recommended) | | `AZURE_APPCONFIG_CONNECTION_STRING` | One of these two | — | Connection string (dev/fallback) | | `AZURE_APPCONFIG_LABEL` | No | *(none)* | Label filter for Warm settings | -| `AZURE_APPCONFIG_REFRESH_SECONDS` | No | `30` | Sentinel poll interval in seconds | +| `AZURE_APPCONFIG_REFRESH_INTERVAL_SECONDS` | No | `30` | Sentinel poll interval in seconds | --- ## How Refresh Works ``` -Every AZURE_APPCONFIG_REFRESH_SECONDS +Every AZURE_APPCONFIG_REFRESH_INTERVAL_SECONDS │ ▼ Check Warm:Sentinel ──changed?──Yes──► Reload ALL Warm settings → apply live @@ -116,7 +116,7 @@ az containerapp update \ --set-env-vars \ AZURE_APPCONFIG_ENDPOINT=https://appconfig-proxy.azconfig.io \ AZURE_APPCONFIG_LABEL=Production \ - AZURE_APPCONFIG_REFRESH_SECONDS=30 + AZURE_APPCONFIG_REFRESH_INTERVAL_SECONDS=30 ``` > [!WARNING] @@ -124,6 +124,71 @@ az containerapp update \ --- +## Automating Setup with the Deploy Script + +**Instead of manual steps 1–4, use `deployment/AppConfiguration/deploy.sh` to automate everything at once.** + +### What the script does + +The script: +1. **Discovers all publishable settings** — parses `src/SimpleL7Proxy/Config/ProxyConfig.cs` for `[ConfigOption(...)]` decorations +2. **Reads current values** — fetches live env vars from a running Container App (fallback to local shell variables) +3. **Creates Warm and Cold prefixed keys** — all settings appear under `Warm:*` and `Cold:*` prefixes in App Configuration +4. **Seeds the full catalog** — ensures every publishable setting is visible in the portal so operators can see all options at a glance +5. 
**Sets up the sentinel** — initializes `Warm:Sentinel=1` as the refresh trigger +6. **Assigns the role** — grants the Container App's managed identity `App Configuration Data Reader` access + +### When to use it + +- **Initial deployment:** After `azd provision` creates the Container App and App Configuration, run this script to seed all settings at once. +- **Migration:** Moving from environment-variable–only config to App Configuration? This script reads your current Container App env vars and imports them. +- **Catalog sync:** Proxy code added new settings? Re-run the script to discover and publish them automatically. + +### How to run + +```bash +cd deployment/AppConfiguration +cp deploy.parameters.example.sh deploy.parameters.sh +``` + +Edit `deploy.parameters.sh` with your resource details: + +```bash +CONTAINER_APP_NAME="your-proxy-app" +CONTAINER_APP_RESOURCE_GROUP="rg-proxy" +APPCONFIG_NAME="appconfig-proxy" +RESOURCE_GROUP="rg-proxy" +LOCATION="eastus" +APPCONFIG_SKU="standard" +APPCONFIG_LABEL="Production" +AZURE_APPCONFIG_REFRESH_SECONDS=30 +``` + +Run the script: + +```bash +./deploy.sh +``` + +**Output:** +- All keys published to App Configuration under `Warm:*` and `Cold:*` prefixes +- Managed identity role assignment configured +- Container App environment variables (`AZURE_APPCONFIG_ENDPOINT`, `AZURE_APPCONFIG_LABEL`, `AZURE_APPCONFIG_REFRESH_INTERVAL_SECONDS`) set automatically if `UPDATE_CONTAINER_APP_ENV=true` (default) + +> [!NOTE] +> **Container App restart:** The script does NOT restart the Container App automatically. After the first run, either manually restart via `az containerapp update --name ... --force-deploy` or allow the next `.azure/deploy.sh` invocation to pick up the updated environment variables. 
+ +### Worked Example — Automated Setup + +| Step | Command | Result | +|------|---------|--------| +| After `azd provision` | `cd deployment/AppConfiguration && ./deploy.sh` | All proxy settings discovered and seeded; role assigned | +| Check portal | Azure Portal → App Configuration → Configuration Explorer | 30+ keys visible under `Warm:*` and `Cold:*`; `Warm:Sentinel=1` present | +| Restart Container App | `az containerapp update --name your-proxy-app --resource-group rg-proxy --force-deploy` | Container App pulls settings from App Configuration; logs show `✓ Azure App Configuration initialized` | +| Change a Warm setting | Portal: edit `Warm:MaxAttempts=5` → Update `Warm:Sentinel=$(date +%s)` | All instances refresh within 30 s; no restart needed | + +--- + ## Per-Request Override **Rule: To change a Warm setting at runtime, update the key value then bump `Warm:Sentinel` — both steps are required.** @@ -145,7 +210,7 @@ az appconfig kv set \ ``` > [!NOTE] -> All instances pick up the change within `AZURE_APPCONFIG_REFRESH_SECONDS` (default 30 s) — no rolling restart needed. +> All instances pick up the change within `AZURE_APPCONFIG_REFRESH_INTERVAL_SECONDS` (default 30 s) — no rolling restart needed. --- @@ -200,4 +265,4 @@ The proxy emits these log entries around refresh: - [CONFIGURATION_SETTINGS.md](CONFIGURATION_SETTINGS.md) — Full list of all settings and their reload types - [ENVIRONMENT_VARIABLES.md](ENVIRONMENT_VARIABLES.md) — All environment variables -- [DEVELOPMENT.md](DEVELOPMENT.md) — Local development setup +- [BEGINNER_DEVELOPMENT.md](BEGINNER_DEVELOPMENT.md) — Local development setup diff --git a/docs/BEGINNER_DEVELOPMENT.md b/docs/BEGINNER_DEVELOPMENT.md new file mode 100644 index 00000000..53e1db40 --- /dev/null +++ b/docs/BEGINNER_DEVELOPMENT.md @@ -0,0 +1,106 @@ +# Development and Testing + +Purpose: get a local SimpleL7Proxy instance running quickly, validate core request paths, and diagnose common startup issues without deploying to Azure. 
+ +> **TL;DR** +> - **Fastest path:** set only `Port` and `Host1`, then run `dotnet run`. +> - **Second-fastest path:** point the proxy to Azure App Configuration (`AZURE_APPCONFIG_ENDPOINT`) and run. + +> Need issue-driven guidance? Start at [TroubleshootTOC.md](TroubleshootTOC.md). For advanced tuning, see [ADVANCED_DEVELOPMENT.md](ADVANCED_DEVELOPMENT.md). + +--- + +## Reference — Essential Settings by Feature + +> **Units used in this doc:** time values are in **milliseconds** unless the setting name ends with `Secs`. + +### Startup + +| Variable | Default | Description | +|----------|---------|-------------| +| `Port` | `80` | Proxy listen port | +| `Host1` / `Host2` | — | Backend URLs (at least one required) | +| `AZURE_APPCONFIG_ENDPOINT` | — | App Configuration endpoint URL | +| `AZURE_APPCONFIG_LABEL` | *(none)* | Label filter (use `dev` for local work) | + +--- + +## Diagnosis Checklist + +- Confirm the proxy starts and binds the expected port. +- Confirm at least one backend URL in `Host1..Host9` is reachable before startup. +- Confirm one smoke request succeeds before running load tests. +- If using App Configuration, confirm endpoint and label are set. + +--- + +## Setting Up Locally + +> [!NOTE] +> **Prerequisites:** .NET SDK 10.0+, Git. Docker is optional (for containerized testing). + +### Fastest path — set only Port + Host1 + +```bash +export Port=8080 +export Host1=http://localhost:3000 +dotnet run +``` + +### Second-fastest path — use Azure App Configuration + +For detailed setup, role assignment, and configuration seeding instructions, see [AZURE_APP_CONFIGURATION.md](AZURE_APP_CONFIGURATION.md). + +**Quick start:** +```bash +export AZURE_APPCONFIG_ENDPOINT=https://your-appconfig.azconfig.io +export AZURE_APPCONFIG_LABEL=dev +dotnet run +``` + +--- + +## Next Steps + +Point the proxy at real backends, or use a mock backend as described in [DUMMY_BACKEND.md](DUMMY_BACKEND.md). 
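
Part of the diagnosis checklist can be checked before `dotnet run` is ever invoked. The following is a minimal sketch, not something shipped with the repository: `check_proxy_env` is a hypothetical helper name, and it assumes that `Host1` is the backend you intend to use and that `Port`, when set, should be a plain number (it falls back to the documented default of `80` when unset). It does not validate the `Host1` format or test reachability.

```bash
# Hypothetical pre-flight helper (not part of the SimpleL7Proxy repository).
check_proxy_env() {
  local status=0
  # Host1 is treated as required here; its format is not validated.
  if [ -z "${Host1:-}" ]; then
    echo "Host1 is not set" >&2
    status=1
  fi
  # Port is optional (documented default: 80) but must be numeric when set.
  if [ -n "${Port:-}" ]; then
    case "$Port" in
      *[!0-9]*) echo "Port must be numeric, got: $Port" >&2; status=1 ;;
    esac
  fi
  return "$status"
}
```

One way to use it is `check_proxy_env && dotnet run`, so a missing backend setting is reported before the proxy starts rather than through a failed health probe.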
+ +--- + +## IDE Configuration + +Add `.vscode/launch.json` to start the proxy from VS Code with F5: + +```json +{ + "version": "0.2.0", + "configurations": [ + { + "name": ".NET Core Launch (web)", + "type": "coreclr", + "request": "launch", + "preLaunchTask": "build", + "program": "${workspaceFolder}/bin/Debug/net10.0/SimpleL7Proxy.dll", + "args": [], + "cwd": "${workspaceFolder}", + "stopAtEntry": false, + "env": { + "ASPNETCORE_ENVIRONMENT": "Development", + "Port": "8080", + "Host1": "http://localhost:3000", + "Host2": "http://localhost:5000", + "LogAllRequestHeaders": "true" + } + } + ] +} +``` + +--- + +## Related Documentation + +- [CONFIGURATION_SETTINGS.md](CONFIGURATION_SETTINGS.md) — All environment variables and config keys +- [LOAD_BALANCING.md](LOAD_BALANCING.md) — Backend selection and retry settings +- [CIRCUIT_BREAKER.md](CIRCUIT_BREAKER.md) — Health check and failover configuration +- [OBSERVABILITY.md](OBSERVABILITY.md) — Logging, metrics, and tracing +- [CONTAINER_DEPLOYMENT.md](CONTAINER_DEPLOYMENT.md) — Building and deploying Docker images to Azure diff --git a/docs/CONFIGURATION_CATEGORIES.md b/docs/CONFIGURATION_CATEGORIES.md new file mode 100644 index 00000000..7721bd86 --- /dev/null +++ b/docs/CONFIGURATION_CATEGORIES.md @@ -0,0 +1,218 @@ +# Configuration Settings — By Frequency of Use + +This document categorizes all proxy settings into three groups to guide documentation and operator prioritization. + +> **Documentation Rule:** Focus on **Common** settings in general docs. Only include **Essential** and **Advanced** in specialized docs (deployment guides, troubleshooting, performance tuning). + +--- + +## ESSENTIAL — Core settings required in every deployment + +**These settings must be configured for the proxy to function. Most deployments set all of them.** + +### Backends +| Env Var | Property | Default | Purpose | +|---------|----------|---------|---------| +| `Host1` | (Backend URL) | — | **Required.** First backend URL. 
Format: `protocol://host[:port][;probe=/path]` | +| `Host2`–`Host9` | (Backend URL) | — | Optional additional backends (up to 9 total) | + +### Server +| Env Var | Property | Mode | Default | Purpose | +|---------|----------|------|---------|---------| +| `Port` | `Port` | Cold | `80` | Proxy listen port | +| `Workers` | `Workers` | Cold | `10` | Concurrent worker count; tune for your backend throughput | +| `MaxQueueLength` | `MaxQueueLength` | Cold | `1000` | Max queued requests before returning 429 | + +### Request Handling +| Env Var | Property | Mode | Default | Purpose | +|---------|----------|------|---------|---------| +| `DefaultTimeout` | `Timeout` | Warm | `1200000` ms (20 min) | Per-host request timeout; adjust for your SLAs | +| `MaxAttempts` | `MaxAttempts` | Warm | `10` | Max retries per request | +| `DefaultPriority` | `DefaultPriority` | Warm | `2` | Base priority for requests without priority header | + +--- + +## COMMON — Standard configuration for typical deployments + +**These settings fine-tune behavior for typical use cases. 
Set them when initial deployment is running.** + +### Request Processing +| Env Var | Property | Mode | Default | Purpose | +|---------|----------|------|---------|---------| +| `DefaultTTLSecs` | `DefaultTTLSecs` | Warm | `300` s | Request TTL; how long before queue entry expires | +| `AcceptableStatusCodes` | `AcceptableStatusCodes` | Warm | `[200,202,400,401,...]` | Status codes returned without retry | +| `UniqueUserHeaders` | `UniqueUserHeaders` | Warm | `["X-UserID"]` | Headers that identify a unique user for queue tracking | + +### Load Balancing +| Env Var | Property | Mode | Default | Purpose | +|---------|----------|------|---------|---------| +| `LoadBalanceMode` | `LoadBalanceMode` | Warm | `latency` | `roundrobin`, `latency`, or `random` | + +### Circuit Breaker +| Env Var | Property | Mode | Default | Purpose | +|---------|----------|------|---------|---------| +| `CBErrorThreshold` | `CircuitBreakerErrorThreshold` | Warm | `50` % | Error % that opens circuit | +| `CBTimeslice` | `CircuitBreakerTimeslice` | Warm | `60` s | Rolling window for error rate | +| `SuccessRate` | `SuccessRate` | Cold | `80` % | Min success rate to keep circuit closed | + +### Health Checking +| Env Var | Property | Mode | Default | Purpose | +|---------|----------|------|---------|---------| +| `PollInterval` | `PollInterval` | Cold | `15000` ms | Backend health check frequency | +| `PollTimeout` | `PollTimeout` | Cold | `3000` ms | Health check timeout | + +### Logging (Basic) +| Env Var | Property | Mode | Default | Purpose | +|---------|----------|------|---------|---------| +| `LogAllRequestHeaders` | `LogAllRequestHeaders` | Warm | `false` | Log all inbound headers (for debugging) | +| `LogAllResponseHeaders` | `LogAllResponseHeaders` | Warm | `false` | Log all outbound headers (for debugging) | +| `LogToConsole` | `LogToConsole` | Cold | `["*"]` | Event categories written to stdout | +| `LogToEvents` | `LogToEvents` | Cold | `["async","backend","probe",...]` | Event 
categories written to event store | + +### User Profiles (if using) +| Env Var | Property | Mode | Default | Purpose | +|---------|----------|------|---------|---------| +| `UseProfiles` | `UseProfiles` | Warm | `false` | Enable user profile enrichment | +| `UserConfigUrl` | `UserConfigUrl` | Warm | `""` | URL to user config (file: or http:) | +| `UserProfileHeader` | `UserProfileHeader` | Warm | `X-UserProfile` | Header to inject with profile data | + +### Security (Basic) +| Env Var | Property | Mode | Default | Purpose | +|---------|----------|------|---------|---------| +| `IgnoreSSLCert` | `IgnoreSSLCert` | Cold | `false` | Skip TLS verification (dev/test only) | + +### Async (if enabling basic async) +| Env Var | Property | Mode | Default | Purpose | +|---------|----------|------|---------|---------| +| `AsyncModeEnabled` | `AsyncModeEnabled` | Cold | `false` | Enable asynchronous request processing | + +--- + +## ADVANCED — Specialized settings for specific scenarios + +**These settings address async pipelines, multi-tenancy, advanced auth, performance tuning, or high-scale deployments. 
Set only when needed.** + +### Async (Advanced) +| Env Var | Property | Mode | Default | Purpose | +|---------|----------|------|---------|---------| +| `AsyncBlobStorageConfig` | `AsyncBlobStorageConfig` | Cold | `""` | Blob storage connection string + auth method | +| `AsyncSBConfig` | `AsyncSBConfig` | Cold | `""` | Service Bus connection string + queue name | +| `AsyncBlobWorkerCount` | `AsyncBlobWorkerCount` | Cold | `2` | Worker threads for blob uploads | +| `AsyncTimeout` | `AsyncTimeout` | Warm | `1800000` ms (30 min) | Max backend processing time in async mode | +| `AsyncTTLSecs` | `AsyncTTLSecs` | Warm | `86400` s (24 h) | Async result blob retention period | +| `AsyncTriggerTimeout` | `AsyncTriggerTimeout` | Warm | `10000` ms | Delay before queued request converts to async | +| `AsyncClientRequestHeader` | `AsyncClientRequestHeader` | Warm | `S7PAsyncMode` | Header clients use to enable async mode | +| `AsyncClientConfigFieldName` | `AsyncClientConfigFieldName` | Warm | `async-config` | JSON field in async client config | +| `AsyncClassNames` | `AsyncClassNames` | Cold | `""` | Comma-separated class names allowed in async requests | + +### Auth / App ID Validation (if needed) +| Env Var | Property | Mode | Default | Purpose | +|---------|----------|------|---------|---------| +| `ValidateAuthAppID` | `ValidateAuthAppID` | Warm | `false` | Enable app ID validation | +| `ValidateAuthAppIDUrl` | `ValidateAuthAppIDUrl` | Warm | `""` | URL to app ID allowlist (file: or http:) | +| `ValidateAuthAppIDHeader` | `ValidateAuthAppIDHeader` | Warm | `X-MS-CLIENT-PRINCIPAL-ID` | Header containing app ID | +| `ValidateAuthAppFieldName` | `ValidateAuthAppFieldName` | Warm | `authAppID` | JSON field name in allowlist | + +### User Profiles (Advanced) +| Env Var | Property | Mode | Default | Purpose | +|---------|----------|------|---------|---------| +| `UserConfigRequired` | `UserConfigRequired` | Warm | `false` | Reject requests when user config unavailable | +| 
`SuspendedUserConfigUrl` | `SuspendedUserConfigUrl` | Warm | `""` | URL to suspended user list (file: or http:) | +| `UserIDFieldName` | `UserIDFieldName` | Warm | `userId` | JSON field used as user identifier | +| `UserConfigRefreshIntervalSecs` | `UserConfigRefreshIntervalSecs` | Cold | `3600` s | User config reload frequency | +| `UserSoftDeleteTTLMinutes` | `UserSoftDeleteTTLMinutes` | Cold | `360` min | Soft-deleted user record TTL | + +### Request Headers (Advanced) +| Env Var | Property | Mode | Default | Purpose | +|---------|----------|------|---------|---------| +| `RequiredHeaders` | `RequiredHeaders` | Warm | `[]` | Headers that must be present or request rejected | +| `DisallowedHeaders` | `DisallowedHeaders` | Warm | `[]` | Headers that must not be present | +| `StripRequestHeaders` | `StripRequestHeaders` | Warm | `[]` | Headers stripped before forwarding to backend | +| `StripResponseHeaders` | `StripResponseHeaders` | Warm | `[]` | Headers stripped from backend response | +| `ValidateHeaders` | `ValidateHeaders` | Warm | `{}` | Header name → expected value validation map | +| `LogHeaders` | `LogHeaders` | Warm | `[]` | Specific headers to log (if not using LogAll*) | +| `LogAllRequestHeadersExcept` | `LogAllRequestHeadersExcept` | Warm | `["Authorization"]` | Headers excluded from full request logging | +| `LogAllResponseHeadersExcept` | `LogAllResponseHeadersExcept` | Warm | `["Api-Key"]` | Headers excluded from full response logging | +| `DependancyHeaders` | `DependancyHeaders` | Warm | `["Backend-Host",...]` | Headers copied from response into event log | + +### Priority Management (Advanced) +| Env Var | Property | Mode | Default | Purpose | +|---------|----------|------|---------|---------| +| `PriorityKeys` | `PriorityKeys` | Warm | `["12345","234"]` | Known priority key values | +| `PriorityValues` | `PriorityValues` | Warm | `[1,3]` | Priority levels assigned per key | +| `PriorityKeyHeader` | `PriorityKeyHeader` | Warm | `S7PPriorityKey` 
| Header clients use to pass priority key | +| `UserPriorityThreshold` | `UserPriorityThreshold` | Warm | `0.1` | Max fraction of queue a single user may occupy | + +### Health Probe (Advanced) +| Env Var | Property | Mode | Default | Purpose | +|---------|----------|------|---------|---------| +| `HealthProbeSidecar` | `HealthProbeSidecar` | Warm | `Enabled=false;url=http://localhost:9000` | Sidecar health probe config | + +### Load Balancing (Advanced) +| Env Var | Property | Mode | Default | Purpose | +|---------|----------|------|---------|---------| +| `IterationMode` | `IterationMode` | Warm | `SinglePass` | `SinglePass` or `MultiPass` retry mode | + +### OAuth / Security (Advanced) +| Env Var | Property | Mode | Default | Purpose | +|---------|----------|------|---------|---------| +| `UseOAuth` | `UseOAuth` | Cold | `false` | Enable OAuth token validation | +| `OAuthAudience` | `OAuthAudience` | Cold | `""` | Expected OAuth audience claim | + +### Logging (Advanced) +| Env Var | Property | Mode | Default | Purpose | +|---------|----------|------|---------|---------| +| `LogToAI` | `LogToAI` | Warm | `[""]` | Event categories sent to Application Insights | +| `APPINSIGHTS_CONNECTIONSTRING` | `AppInsightsConnectionString` | Cold | `""` | Application Insights connection string | +| `EVENTHUB_CONNECTIONSTRING` | `EventHubConnectionString` | Cold | `""` | Event Hub connection string | +| `EVENTHUB_NAME` | `EventHubName` | Cold | `""` | Event Hub name | +| `EVENTHUB_NAMESPACE` | `EventHubNamespace` | Cold | `""` | Event Hub namespace | +| `EVENTHUB_STARTUP_SECONDS` | `EventHubStartupSeconds` | Cold | `10` s | Delay before Event Hub starts sending | +| `EVENTHUB_MAX_RECONNECT_ATTEMPTS` | `EventHubMaxReconnectAttempts` | Cold | `5` | Max reconnect attempts on failure | +| `EVENTHUB_MAX_UNDRAINED_EVENTS` | `MaxUndrainedEvents` | Cold | `100` | Max buffered events before blocking | +| `EVENT_LOGGERS` | `EventLoggers` | Cold | `file` | Comma-separated list of event 
sinks (file, eventhub, appinsights) | +| `LOGFILE_NAME` | `LogFileName` | Cold | `eventslog.json` | Event log file path | +| `LOGDATETIME` | `LogDateTime` | Cold | `false` | Prefix log entries with timestamp | +| `LOG_LEVEL` | `LogLevel` | Hidden | `Information` | Log level (Debug, Information, Warning, Error) | +| `EVENT_HEADERS` | `EventHeaders` | Cold | `SimpleL7Proxy.Events.CommonEventHeaders` | Event data class name (for custom telemetry) | +| `ReuseEvents` | `ReuseEvents` | Cold | `false` | Reuse event objects across requests (performance optimization) | + +### Server / Performance Tuning (Advanced) +| Env Var | Property | Mode | Default | Purpose | +|---------|----------|------|---------|---------| +| `GC2InternalSecs` | `GC2InternalSecs` | Cold | `300` s | Garbage collection internal cleanup interval | +| `SharedIteratorTTLSeconds` | `SharedIteratorTTLSeconds` | Cold | `60` s | TTL for an unused shared iterator | +| `SharedIteratorCleanupIntervalSeconds` | `SharedIteratorCleanupIntervalSeconds` | Cold | `30` s | Shared iterator cleanup frequency | +| `TERMINATION_GRACE_PERIOD_SECONDS` | `TerminationGracePeriodSeconds` | Cold | `30` s | Graceful shutdown drain window | + +--- + +## Mapping: Where Each Category Appears in Docs + +| Document | Should Discuss | Notes | +|----------|---|---------| +| BEGINNER_DEVELOPMENT.md | Essential | Local setup uses basic config | +| CONTAINER_DEPLOYMENT.md | Essential + Common | Initial deployment checklist | +| AZURE_APP_CONFIGURATION.md | Essential + Common | Seed script outputs both | +| CONFIGURATION_SETTINGS.md | All three (with labels) | Complete reference | +| ADVANCED_CONFIGURATION.md | Advanced only | Deep-dive for specialists | +| Troubleshooting guides | Common + Advanced (context-dependent) | E.g., circuit-breaker guide discusses CB thresholds (Common) and error window (Advanced) | +| HEALTH_CHECKING.md | Common (+ Advanced if sidecar) | PollInterval/Timeout are Common; sidecar config is Advanced | + +--- + +## 
Quick Decision Tree + +**Q: What settings should I change first in a new deployment?** +→ Configure the **Essential** group. Your deployment won't work without them. + +**Q: The proxy is working but I want to fine-tune it.** +→ Adjust the **Common** group. These directly impact throughput, latency, and reliability. + +**Q: I need async requests / multi-tenancy / advanced auth.** +→ Configure the **Advanced** group. Read ADVANCED_CONFIGURATION.md first. + +**Q: Which settings appear in the portal (App Configuration)?** +→ All **Warm** and **Cold** settings (regardless of frequency group). +→ **Hidden** settings are env-var only. + diff --git a/docs/CONFIGURATION_SETTINGS.md b/docs/CONFIGURATION_SETTINGS.md index 1326546a..8e7d57ad 100644 --- a/docs/CONFIGURATION_SETTINGS.md +++ b/docs/CONFIGURATION_SETTINGS.md @@ -32,7 +32,7 @@ Startup | Env Var / Config Name | Property | Default | Description | |----------------------|----------|---------|-------------| -| `AsyncClientRequestHeader` | `AsyncClientRequestHeader` | `AsyncMode` | Request header that enables async mode | +| `AsyncClientRequestHeader` | `AsyncClientRequestHeader` | `S7PAsyncMode` | Request header that enables async mode | | `AsyncClientConfigFieldName` | `AsyncClientConfigFieldName` | `async-config` | JSON field in async client config | | `AsyncTimeout` | `AsyncTimeout` | `1800000` ms (30 min) | Max backend processing time in async mode | | `AsyncTTLSecs` | `AsyncTTLSecs` | `86400` s (24 h) | Async result blob retention | @@ -275,6 +275,6 @@ These are never set via config — the proxy computes them at startup from other ## Related Documentation - [AZURE_APP_CONFIGURATION.md](AZURE_APP_CONFIGURATION.md) — Setting up hot-reload with App Configuration -- [DEVELOPMENT.md](DEVELOPMENT.md) — Local dev setup and minimal required config +- [BEGINNER_DEVELOPMENT.md](BEGINNER_DEVELOPMENT.md) — Local dev setup and minimal required config - [TIMEOUTS.md](TIMEOUTS.md) — How TTL, Timeout, and AsyncTimeout interact - 
[LOAD_BALANCING.md](LOAD_BALANCING.md) — LoadBalanceMode, IterationMode, and retry settings diff --git a/docs/CONTAINER_DEPLOYMENT.md b/docs/CONTAINER_DEPLOYMENT.md index 33419d82..d63b6c02 100644 --- a/docs/CONTAINER_DEPLOYMENT.md +++ b/docs/CONTAINER_DEPLOYMENT.md @@ -5,9 +5,8 @@ Build a Docker image from the `src/` directory and run it locally or deploy it t > **TL;DR** > - **Build from `src/`** — the Dockerfile requires `Shared/` and `SimpleL7Proxy/` side-by-side; build context must be `src/`. > - **Probe paths are in the Host connection string** — use `Host1=host=https://api.example.com;probe=/health` (not separate `Probe_path1=` variables). -> - **Fastest path to Azure:** run `.azure/setup.sh` → `azd provision` → `.azure/deploy.sh`. +> - **Fastest path to Azure:** (1) `.azure/setup.sh` (check prerequisites, choose scenario), (2) `azd provision` (create Container App + App Configuration + ACR), (3) `deployment/AppConfiguration/deploy.sh` (seed App Configuration from your config), (4) `.azure/deploy.sh` (build image, push to ACR, update Container App). ---- ## Container Ports @@ -21,6 +20,107 @@ Build a Docker image from the `src/` directory and run it locally or deploy it t --- +## Deployment Workflow — From Code to Production + +**Full end-to-end path with automated scripts:** + +### Step 1: Run setup and provision infrastructure + +```bash +# From repo root +.azure/setup.sh +``` + +**What it does:** +- Checks prerequisites (azd, Azure CLI) +- Authenticates to Azure and selects subscription +- Guides you through selecting a deployment scenario (local-with-cloud, full-cloud, secure-vnet) +- Initializes AZD environment + +**Output:** `.azure/.env` file with resource names and subscription ID. 
+ +--- + +### Step 2: Provision Azure resources (Container App, App Configuration, ACR) + +```bash +azd provision +``` + +**What it does:** +- Creates resource group, Container App, Azure Container Registry, App Configuration store +- Sets up managed identity for Container App +- Configures networking based on your scenario selection +- Exports deployment variables to `.azure/.env` for downstream scripts + +**Time:** ~5–10 minutes. + +--- + +### Step 3: Seed App Configuration with proxy settings + +```bash +cd deployment/AppConfiguration +cp deploy.parameters.example.sh deploy.parameters.sh +# Edit deploy.parameters.sh with your resource names +./deploy.sh +``` + +**What it does:** +- Discovers all publishable settings from `src/SimpleL7Proxy/Config/ProxyConfig.cs` (marked with `[ConfigOption(...)]`) +- Reads current values from the running Container App (or falls back to local shell variables) +- Seeds App Configuration with all discovered keys in both **Warm** (hot-reload) and **Cold** (restart) modes +- Publishes them under prefixes `Warm:*` and `Cold:*` so you can toggle reload behavior instantly from the portal +- Sets up the `Warm:Sentinel` key as the refresh trigger + +**Parameters required:** +- `CONTAINER_APP_NAME`: deployed Container App name +- `CONTAINER_APP_RESOURCE_GROUP`: resource group containing the Container App +- `APPCONFIG_NAME`: App Configuration store name +- `RESOURCE_GROUP`: resource group for App Configuration (usually same as Container App) +- `LOCATION`: Azure region + +**Output:** All settings now visible in Azure Portal **App Configuration > Configuration Explorer**; operators can modify Warm settings without restarting. 
+ +--- + +### Step 4: Build, push image, and update Container App + +```bash +.azure/deploy.sh +``` + +**What it does:** +- Extracts Docker image name and version from `src/SimpleL7Proxy/Constants.cs` +- Logs in to ACR (from AZD environment) +- Builds Docker image from `src/` directory (correct context) +- Pushes image to ACR with version tags (e.g., `simple-l7-proxy:1.0.0`, `simple-l7-proxy:latest`) +- **Optionally:** applies an environment template (Standard Production, High Performance, Cost Optimized, High Availability) +- Updates running Container App to the new image +- Restarts container with updated configuration + +**Output:** Container App running latest image; all Warm settings hot-reloaded within ~30 seconds; Cold settings require a second restart. + +--- + +## Workflow Summary (for quick reference) + +``` +.azure/setup.sh + ↓ (authenticate, select scenario) +azd provision + ↓ (create resources) +cd deployment/AppConfiguration && ./deploy.sh + ↓ (seed all proxy settings into App Config) +.azure/deploy.sh + ↓ (build, push, deploy) +✅ Proxy running in Azure with App Configuration hot-reload enabled +``` + +**Key takeaway:** After the first deployment, you can change Warm settings in the Azure Portal **without restarting**; just update `Warm:Sentinel` to trigger a refresh cycle. + +--- + ## Building the Image **Rule: Use `src/SimpleL7Proxy/build.sh` — it handles the correct build context, version extraction from `Constants.cs`, ACR login, and push automatically.** @@ -78,6 +178,22 @@ docker build -t simplel7proxy:latest -f SimpleL7Proxy/Dockerfile . 
--- + +## Building without Docker (Remote ACR Build) + +If Docker is not available locally (corporate restrictions, CI/CD runners, etc.), build directly in Azure Container Registry: + +```bash +# From repo root +export ACR=myregistry # Your ACR name, without .azurecr.io +export VERSION=v2.2.11 # Or read from Constants.cs + +# Trailing backslashes keep the command on one logical line +az acr build \ +  --registry "$ACR" \ +  --image "simple-l7-proxy:$VERSION" \ +  --file src/SimpleL7Proxy/Dockerfile \ +  src +``` + ## Running Locally with Docker ### Minimal run @@ -146,7 +262,8 @@ This is the recommended path for provisioning all required Azure resources (ACR, - [Azure CLI](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli) - [Azure Developer CLI (azd)](https://learn.microsoft.com/en-us/azure/developer/azure-developer-cli/install-azd) -- Docker +- Docker (optional; only needed for local image builds) +- For no-Docker environments, use the remote ACR build workflow in [Building without Docker (Remote ACR Build)](#building-without-docker-remote-acr-build) ### Step 1 — Setup diff --git a/docs/DEVELOPMENT.md b/docs/DEVELOPMENT.md deleted file mode 100644 index 98998412..00000000 --- a/docs/DEVELOPMENT.md +++ /dev/null @@ -1,210 +0,0 @@ -# Development and Testing - -Get SimpleL7Proxy running locally in under five minutes using the automated setup script, or configure it manually with the steps below. - -> **TL;DR** -> - **Fastest path:** run `.azure/local-setup.sh` — it generates your environment file interactively. -> - **Minimum config:** set `Host1`, `Port`, and `Workers`; everything else has a working default. -> - **Debugging:** add `LogAllRequestHeaders=true` and `LogAllResponseHeaders=true` to see all headers in the log. 
- ---- - -## Reference — Key Development Settings - -| Variable | Default | Description | -|----------|---------|-------------| -| `Port` | `80` | Proxy listen port | -| `Host1` / `Host2` | — | Backend URLs (at least one required) | -| `Workers` | `10` | Concurrent worker count | -| `Timeout` | `1200000` ms | Per-host request timeout | -| `IgnoreSSLCert` | `false` | Skip TLS verification (dev only) | -| `LogAllRequestHeaders` | `false` | Log every inbound header | -| `LogAllResponseHeaders` | `false` | Log every outbound header | -| `LOGFILE_NAME` | `eventslog.json` | Path for event log output | -| `MaxQueueLength` | `1000` | Max queued requests before 429 | -| `AZURE_APPCONFIG_ENDPOINT` | — | App Configuration endpoint URL | -| `AZURE_APPCONFIG_LABEL` | *(none)* | Label filter (use `dev` for local work) | -| `AZURE_APPCONFIG_REFRESH_SECONDS` | `30` | Sentinel poll interval in seconds | - ---- - -## Setting Up Locally - -**Rule: Use the automated script for the fastest, error-free setup; fall back to manual steps only if the script cannot reach your backends.** - -```bash -cd .azure -./local-setup.sh # interactive wizard → generates .env file -``` - -> [!NOTE] -> **Prerequisites:** .NET SDK 10.0+, Git. Docker is optional (for containerized testing). - -> [!TIP] -> **Troubleshooting:** If the script fails with a permission error, run `chmod +x .azure/local-setup.sh` first. - -### Manual setup - -```bash -export Port=8080 -export Host1=http://localhost:3000 -dotnet run -``` - -### Using Azure App Configuration in dev mode - -**All settings (Warm and Cold) are loaded from App Configuration at startup. 
Warm settings are then re-applied every `AZURE_APPCONFIG_REFRESH_SECONDS` seconds via the sentinel key — no restart needed for those changes.** - -```bash -export AZURE_APPCONFIG_ENDPOINT=https://nvm2-tc26-appcfg.azconfig.io -export AZURE_APPCONFIG_LABEL=dev -export AZURE_APPCONFIG_REFRESH_SECONDS=30 -dotnet run -``` - -Before running, assign both roles to your developer account on the App Configuration resource: - -```bash -APPCONFIG_ID=$(az appconfig show --name nvm2-tc26-appcfg --query id -o tsv) -USER_ID=$(az ad signed-in-user show --query id -o tsv) - -az role assignment create --role "App Configuration Data Reader" \ - --assignee $USER_ID --scope $APPCONFIG_ID - -az role assignment create --role "App Configuration Data Owner" \ - --assignee $USER_ID --scope $APPCONFIG_ID -``` - -> [!NOTE] -> **Data Reader** is sufficient if you only read settings. **Data Owner** is required if you also update keys or bump the sentinel from the CLI during development. - -> [!TIP] -> **Troubleshooting:** If the proxy fails to connect, run `az login` to refresh your developer credentials — the SDK uses the default Azure credential chain. Role assignments can take a few minutes to propagate. - ---- - -## Running with Mock Backends - -**Rule: Use the included null server for the fastest mock backend; it requires only Python and no extra dependencies.** - -```bash -# Terminal 1 — start the included mock backend -cd test/nullserver/Python -python streamserver.py - -# Terminal 2 — start the proxy pointing at it -export Port=8080 -export Host1=http://localhost:3000 -dotnet run -``` - -> [!NOTE] -> `Host1` must be reachable before the proxy starts or the initial health check will mark it as OPEN. - -> [!TIP] -> **Troubleshooting:** Run `curl http://localhost:3000/` to confirm the mock backend is up before starting the proxy. 
- ---- - -## Testing Scenarios - -**Rule: Use targeted `curl` commands to exercise priority, TTL, and async paths individually before running load tests.** - -```bash -# Priority + TTL override -curl -H "S7PPriorityKey: 12345" -H "S7PTTL: 60" http://localhost:8080/test - -# Async mode -curl -H "AsyncMode: true" -H "X-UserID: user1" http://localhost:8080/async-test -``` - -```bash -# Load test (curl loop) -for i in {1..100}; do curl -s http://localhost:8080/test & done; wait -``` - -> [!NOTE] -> Async mode also requires `AsyncBlobStorageConnectionString` and `AsyncSBConnectionString` to be set. - -> [!WARNING] -> **Error:** A `429` response during load testing means `MaxQueueLength` was reached — increase it or reduce concurrency. - ---- - -## Container Development - -**Rule: Build the image locally and inject environment variables at `docker run` time; do not bake secrets into the image.** - -```bash -docker build -t proxy-dev -f Dockerfile . - -docker run -p 8080:443 \ - -e Host1=http://host.docker.internal:3000 \ - -e Host2=http://host.docker.internal:5000 \ - -e LogAllRequestHeaders=true \ - -e Workers=5 \ - proxy-dev -``` - -> [!NOTE] -> Use `host.docker.internal` to reach mock backends running on the host from inside the container. - -> [!TIP] -> **Troubleshooting:** If the container exits immediately, check logs with `docker logs ` — a missing `Host1` value is the most common cause. - ---- - -## Worked Example — Full Local Stack - -> **Goal:** Proxy on port 8080 with two nginx mock backends, header logging enabled, 10-worker pool. 
- -| Step | Command | Expected result | -|------|---------|----------------| -| Start backend 1 | `python -m http.server 3000` | Listening on :3000 | -| Start backend 2 | `python -m http.server 5000` | Listening on :5000 | -| Export config | `export Port=8080 Host1=http://localhost:3000 Host2=http://localhost:5000 Workers=10 LogAllRequestHeaders=true` | — | -| Start proxy | `dotnet run` | `Listening on port 8080` | -| Smoke test | `curl -v http://localhost:8080/` | `200 OK` from backend 1 or 2 | -| Check failover | Stop backend 1; `curl http://localhost:8080/` | `200 OK` routed to backend 2 | - -**Stopping backend 1 while the proxy is running triggers circuit-breaker logic — subsequent requests route automatically to backend 2.** - ---- - -## IDE Configuration - -Add `.vscode/launch.json` to start the proxy from VS Code with F5: - -```json -{ - "version": "0.2.0", - "configurations": [ - { - "name": ".NET Core Launch (web)", - "type": "coreclr", - "request": "launch", - "preLaunchTask": "build", - "program": "${workspaceFolder}/bin/Debug/net10.0/SimpleL7Proxy.dll", - "args": [], - "cwd": "${workspaceFolder}", - "stopAtEntry": false, - "env": { - "ASPNETCORE_ENVIRONMENT": "Development", - "Port": "8080", - "Host1": "http://localhost:3000", - "Host2": "http://localhost:5000", - "LogAllRequestHeaders": "true" - } - } - ] -} -``` - ---- - -## Related Documentation - -- [CONFIGURATION_SETTINGS.md](CONFIGURATION_SETTINGS.md) — All environment variables and config keys -- [LOAD_BALANCING.md](LOAD_BALANCING.md) — Backend selection and retry settings -- [CIRCUIT_BREAKER.md](CIRCUIT_BREAKER.md) — Health check and failover configuration -- [OBSERVABILITY.md](OBSERVABILITY.md) — Logging, metrics, and tracing diff --git a/docs/DUMMY_BACKEND.md b/docs/DUMMY_BACKEND.md new file mode 100644 index 00000000..f184be1c --- /dev/null +++ b/docs/DUMMY_BACKEND.md @@ -0,0 +1,119 @@ +# Dummy / Mock Backends for Local Development + +Purpose: Set up local backend servers for testing the 
proxy without requiring cloud deployments. + +> **TL;DR** +> - **Fastest:** `cd test/nullserver/Python && python streamserver.py` (already in repo) +> - **Standard Python:** `python -m http.server 3000` (built-in) +> - **Full stack example** at bottom of this doc + +--- + +## Included Null Server (Recommended) + +**Rule: Use the included null server for the fastest mock backend; it requires only Python and no extra dependencies.** + +The proxy repository includes a lightweight mock server in `test/nullserver/Python` that responds to all requests with configurable delays. + +```bash +# Terminal 1 — start the included mock backend +cd test/nullserver/Python +python streamserver.py + +# Terminal 2 — start the proxy pointing at it +export Port=8080 +export Host1=http://localhost:3000 +dotnet run +``` + +> [!NOTE] +> `Host1` must be reachable before the proxy starts or the initial health check will mark it as OPEN. + +> [!TIP] +> **Troubleshooting:** Run `curl http://localhost:3000/` to confirm the mock backend is up before starting the proxy. 
+ +--- + +## Standard Python HTTP Server + +For simple testing, use Python's built-in HTTP server: + +```bash +# Terminal 1 +python -m http.server 3000 + +# Terminal 2 +python -m http.server 5000 + +# Terminal 3 +export Port=8080 +export Host1=http://localhost:3000 +export Host2=http://localhost:5000 +dotnet run +``` + +--- + +## Testing Scenarios + +**Rule: Test the proxy with simple requests before running load tests.** + +```bash +# First, verify the backend is serving the file directly (not through proxy) +curl http://localhost:3000/lorem_ipsum.txt + +# Then test through the proxy (should return same content) +curl http://localhost:8080/lorem_ipsum.txt + +# Simple sequential load test (10 requests, shows status) +for i in {1..10}; do curl -w "Request $i: %{http_code}\n" -o /dev/null http://localhost:8080/lorem_ipsum.txt; done + +# Parallel load test with timing (100 requests, 5 concurrent) +time for i in {1..100}; do curl -s http://localhost:8080/lorem_ipsum.txt > /dev/null & [ $((i % 5)) -eq 0 ] && wait; done +``` + +> [!TIP] +> **Verify backend first:** If `curl http://localhost:3000/lorem_ipsum.txt` returns empty, the backend isn't serving files. Ensure you're running `python -m http.server 3000` **from the `test/nullserver/Python` directory** where lorem_ipsum.txt lives. + +> [!WARNING] +> **Error:** A `429` response during load testing means `MaxQueueLength` was reached — increase it or reduce concurrency. + +--- + +## Container Development with Mock Backends + +**Rule: Build the image locally and inject environment variables at `docker run` time; do not bake secrets into the image.** + +```bash +docker build -t proxy-dev -f Dockerfile . + +docker run -p 8080:443 \ + -e Host1=http://host.docker.internal:3000 \ + -e Host2=http://host.docker.internal:5000 \ + -e LogAllRequestHeaders=true \ + -e Workers=5 \ + proxy-dev +``` + +> [!NOTE] +> Use `host.docker.internal` to reach mock backends running on the host from inside the container. 
+ +> [!TIP] +> **Troubleshooting:** If the container exits immediately, check logs with `docker logs ` — a missing `Host1` value is the most common cause. + +--- + +## Canonical Example — Full Local Stack + +> **Goal:** proxy on port 8080 with two mock backends, header logging enabled, and clear reproduce -> inspect -> verify steps. + +| Step | Command | Expected result | +|------|---------|----------------| +| Start backend 1 | `python -m http.server 3000` | Listening on :3000 | +| Start backend 2 | `python -m http.server 5000` | Listening on :5000 | +| Export config | `export Port=8080 Host1=http://localhost:3000 Host2=http://localhost:5000 Workers=10 LogAllRequestHeaders=true` | — | +| Start proxy | `dotnet run` | `Listening on port 8080` | +| Reproduce + inspect | `curl -v http://localhost:8080/lorem_ipsum.txt` | Lorem ipsum text from backend 1 or 2 | +| Apply fault + verify failover | Stop backend 1; `curl http://localhost:8080/` | `200 OK` routed to backend 2 | + +**Stopping backend 1 while the proxy is running triggers circuit-breaker logic — subsequent requests route automatically to backend 2.** diff --git a/docs/ENVIRONMENT_VARIABLES.md b/docs/ENVIRONMENT_VARIABLES.md index 1d715b41..dd6a6e31 100644 --- a/docs/ENVIRONMENT_VARIABLES.md +++ b/docs/ENVIRONMENT_VARIABLES.md @@ -127,7 +127,7 @@ For production deployments, consider also configuring: | **AsyncBlobStorageUseMI** | bool | Use Managed Identity for Blob Storage (parsed from AsyncBlobStorageConfig). | false | | **AsyncBlobWorkerCount** | int | Number of workers for async blob processing. | 2 | | **AsyncClientConfigFieldName** | string | User profile field name that designates the client configuration. It contains enabled, containername, topic, timeout. | async-config | -| **AsyncClientRequestHeader** | string | Header indicating async mode is requested. | AsyncMode | +| **AsyncClientRequestHeader** | string | Header indicating async mode is requested. 
| S7PAsyncMode | | **AsyncModeEnabled** | bool | Enables or disables async processing mode. Requires restart. | false | | **AsyncSBConfig** | string | Composite connection string for Azure Service Bus. Format: `cs=,ns=,q=,mi=`. Parsed into individual SB settings. | cs=example-sb-connection-string,ns=example-namespace,q=requeststatus,mi=false | | **AsyncSBConnectionString** | string | Azure Service Bus connection string (parsed from AsyncSBConfig). | example-sb-connection-string | diff --git a/docs/OVERVIEW.md b/docs/OVERVIEW.md index da95ef88..a5b27701 100644 --- a/docs/OVERVIEW.md +++ b/docs/OVERVIEW.md @@ -4,6 +4,8 @@ SimpleL7Proxy is a high-performance, intelligent Layer 7 router engineered to op Unlike proprietary gateways, SimpleL7Proxy is a **fully open-source, self-hosted solution,** offering unparalleled customization for data residency, sovereign cloud requirements (GCC High), and bespoke enterprise logic. +> Need help diagnosing issues quickly? Start at [TroubleshootTOC.md](TroubleshootTOC.md). + ## Core Value Propositions | Challenge | Enterprise-Grade Solution | diff --git a/docs/SIDECAR_DEPLOYMENT.md b/docs/SIDECAR_DEPLOYMENT.md index 4c94f6f8..f7372a70 100644 --- a/docs/SIDECAR_DEPLOYMENT.md +++ b/docs/SIDECAR_DEPLOYMENT.md @@ -84,6 +84,8 @@ export HOST1="host=https://my-api.azure-api.net;mode=apim;path=/;probe=/status-0 ### Step 2 — Build both images +Option A: Local Docker build (fast local iteration) + Both build scripts read `ACR` from `deploy.parameters.sh` automatically. ```bash @@ -102,6 +104,16 @@ Each script: 3. Builds from `src/` (includes `Shared/`) 4. Pushes `$ACR.azurecr.io/myproxy:` and `$ACR.azurecr.io/healthprobe:` respectively +Option B: Remote ACR build (no Docker required) + +Use Option B in corporate/restricted environments or CI/CD runners where Docker is unavailable. 
+ +```bash +# Run from repo root +az acr build --registry $ACR --image myproxy:latest --file src/SimpleL7Proxy/Dockerfile src +az acr build --registry $ACR --image healthprobe:latest --file src/HealthProbe/Dockerfile src +``` + ### Step 3 — Create the Container App and assign RBAC Run `setup.sh` **once** — it creates the Container App with a placeholder image, enables system-assigned managed identity, and grants `AcrPull` on the ACR. It waits 60 seconds for the role to propagate. diff --git a/docs/TroubleshootTOC.md b/docs/TroubleshootTOC.md new file mode 100644 index 00000000..972030c4 --- /dev/null +++ b/docs/TroubleshootTOC.md @@ -0,0 +1,23 @@ +# Troubleshooting + +> **Start here.** Find your symptom in the table below and follow the link to the dedicated guide. + +## Quick Diagnosis + +| Symptom | Guide | +|---------|-------| +| **App Configuration** settings not loading or not refreshing | [App Configuration not loading](troubleshooting/app-configuration.md) | +| **Async expected** but request returns sync (no `202 Accepted`) | [Async expected but 202 never issued](troubleshooting/async-202-never-issued.md) | +| **Async requests** never completing / blobs empty or missing | [Async requests not completing](troubleshooting/async-requests.md) | +| **Backend hosts** not being picked up at startup | [Backend hosts not healthy](troubleshooting/backend-hosts.md) | +| **Event Hub** — no messages arriving | [Event Hub messages not appearing](troubleshooting/event-hub.md) | +| **Health probes failing** / pod keeps restarting | [Health probe failures](troubleshooting/health-probes.md) | +| A backend host is **stuck as unhealthy** / circuit breaker won't close | [Circuit breaker stuck open](troubleshooting/circuit-breaker.md) | +| Clients receiving **400 Bad Request** (`InvalidTTL`) | [Getting 400 / invalid TTL format](troubleshooting/requests-400-invalid-ttl.md) | +| Clients receiving **412 Precondition Failed** | [Getting 412 / TTL 
expired](troubleshooting/requests-412.md) | +| Clients receiving **429 Too Many Requests** | [Getting 429 responses](troubleshooting/requests-429.md) | +| Clients receiving **503 Service Unavailable** or **502** | [Getting 503 / all backends failing](troubleshooting/requests-503.md) | + + + + diff --git a/docs/troubleshooting/TROUBLESHOOTING_TODO.md b/docs/troubleshooting/TROUBLESHOOTING_TODO.md new file mode 100644 index 00000000..d39d58cb --- /dev/null +++ b/docs/troubleshooting/TROUBLESHOOTING_TODO.md @@ -0,0 +1,27 @@ +# Troubleshooting Backlog + +Purpose: Track recommended troubleshooting guides not yet authored. + +## Prioritized guide backlog + +| Priority | Status | Proposed guide | Primary symptom | Notes | +|----------|--------|----------------|-----------------|-------| +| P1 | DONE | `requests-400-invalid-ttl.md` | `400` with malformed TTL / `InvalidTTL` | Proxy-originated validation failure | +| P1 | TODO | `requests-403-auth-profile.md` | `403` (`DisallowedAppID`, `UnknownProfile`) | AuthAppID / user profile path | +| P1 | TODO | `requests-417-header-validation.md` | `417` missing required header / invalid header | Request validation and header rules | +| P2 | TODO | `requests-408-backend-timeout.md` | `408` backend I/O cancellation / timeout | Distinguish proxy timeout vs backend timeout | +| P2 | TODO | `requests-500-internal-error.md` | `500` unhandled exception / content too large | Include immediate triage signals | +| P2 | TODO | `startup-no-active-hosts.md` | Readiness stays `503`, active hosts `0` | Startup/bootstrap host validation | +| P2 | TODO | `appconfig-label-mismatch.md` | Keys exist but not loaded | `AZURE_APPCONFIG_LABEL` mismatch | +| P3 | TODO | `eventhub-startup-disabled.md` | Event Hub backend silently disabled at startup | File logging continues, Event Hub absent | +| P3 | DONE | `async-202-never-issued.md` | Expected async but always sync | Trigger timeout + header + profile conditions | +| P3 | TODO | 
`high-latency-without-errors.md` | High queue latency without 5xx | Queue pressure and CB delay band | + +## Authoring contract reminder + +All new guides should follow the troubleshooting guide contract in `.github/copilot-instructions.md`: +- failure-first +- runtime behavior explained +- operator audience +- diagnosis checklist +- one canonical reproduce/inspect/fix/verify example diff --git a/docs/troubleshooting/app-configuration.md b/docs/troubleshooting/app-configuration.md new file mode 100644 index 00000000..c96feffc --- /dev/null +++ b/docs/troubleshooting/app-configuration.md @@ -0,0 +1,107 @@ +# App Configuration Not Loading + +> **TL;DR** +> 1. Set `AZURE_APPCONFIG_ENDPOINT` (managed identity) or `AZURE_APPCONFIG_CONNECTION_STRING`. +> 2. Assign the **`App Configuration Data Reader`** role to the proxy identity. +> 3. Warm settings refresh in ~30 s after bumping `Warm:Sentinel`. Cold settings require a restart. + +--- + +## How settings are loaded + +The proxy loads from Azure App Configuration at startup and refreshes Warm settings continuously. The refresh trigger is the `Warm:Sentinel` key — whenever its value changes, all Warm-labeled keys are reloaded within approximately 30 seconds. + +``` +Startup: read all keys (Warm:* + Cold:*) filtered by AZURE_APPCONFIG_LABEL +Runtime: poll Warm:Sentinel every ~30s → if changed, reload all Warm:* keys +``` + +**Key format:** `{Warm|Cold}:{Section}:{SubSection}:{Name}` +- The `Warm:` or `Cold:` prefix determines reload behaviour (hot vs. restart required). +- The **label** is the environment name (e.g., `dev`, `prod`) — set via `AZURE_APPCONFIG_LABEL`. +- The proxy loads only keys whose label matches `AZURE_APPCONFIG_LABEL`. + +Cold settings are **never** refreshed at runtime. To apply a Cold setting change: update the key in App Config and restart the proxy. 
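The startup load and sentinel-driven refresh described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the proxy's actual implementation: `ConfigStore`, `load_all`, and `poll_once` are hypothetical names, and the App Configuration store is modelled as an in-memory dict keyed by `(key, label)`.

```python
# Hypothetical sketch of sentinel-driven refresh; all names are illustrative.
SENTINEL = "Warm:Sentinel"


class ConfigStore:
    """Holds the settings the proxy is currently running with."""

    def __init__(self, remote, label):
        self.remote = remote        # dict: (key, label) -> value, stands in for App Configuration
        self.label = label          # plays the role of AZURE_APPCONFIG_LABEL
        self.settings = {}
        self.last_sentinel = None

    def _matching(self, prefixes):
        # Only keys whose label matches the active label are visible.
        return {
            key: value
            for (key, label), value in self.remote.items()
            if label == self.label and key.startswith(prefixes)
        }

    def load_all(self):
        """Startup: read Warm:* and Cold:* keys for the active label."""
        self.settings = self._matching(("Warm:", "Cold:"))
        self.last_sentinel = self.settings.get(SENTINEL)

    def poll_once(self):
        """Runtime poll: reload Warm:* keys only if the sentinel value changed."""
        current = self.remote.get((SENTINEL, self.label))
        if current == self.last_sentinel:
            return False
        self.settings.update(self._matching(("Warm:",)))
        self.last_sentinel = current
        return True
```

In this sketch, editing a `Warm:` key without bumping `Warm:Sentinel` leaves the running settings unchanged, and `Cold:` keys are never touched by `poll_once`, which mirrors the two most common "setting not applied" symptoms.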
+ +--- + +## Step 1 — Set the connection variable + +Use **one** of the following: + +| Method | Env Var | Notes | +|--------|---------|-------| +| Managed identity (recommended) | `AZURE_APPCONFIG_ENDPOINT=https://.azconfig.io` | Requires RBAC role (see below) | +| Connection string | `AZURE_APPCONFIG_CONNECTION_STRING=` | Simpler; stores credential in plain text | + +If neither is set, the proxy reads settings from environment variables only. + +--- + +## Step 2 — Assign the RBAC role (managed identity) + +```bash +IDENTITY_ID=$(az containerapp show \ + --name \ + --resource-group \ + --query identity.principalId -o tsv) + +APPCONFIG_ID=$(az appconfig show \ + --name \ + --resource-group \ + --query id -o tsv) + +az role assignment create \ + --role "App Configuration Data Reader" \ + --assignee $IDENTITY_ID \ + --scope $APPCONFIG_ID +``` + +> [!NOTE] +> The proxy only needs **Data Reader** access — it never writes to App Configuration. + +--- + +## Step 3 — Verify key format + +All keys use a `Warm:` or `Cold:` prefix followed by the section path. The **label** is the environment name — it must match `AZURE_APPCONFIG_LABEL`. + +| Prefix | Reload behaviour | +|--------|------------------| +| `Warm:` | Reloaded within ~30 s of bumping `Warm:Sentinel` | +| `Cold:` | Loaded at startup only; requires restart to change | + +**Correct key examples:** + +| Key | Label | Value | Notes | +|-----|-------|-------|-------| +| `Cold:Logging:EventLoggers` | `dev` | `eventhub` | Cold — restart required | +| `Warm:CircuitBreaker:ErrorThreshold` | `dev` | `60` | Warm — hot reloaded | +| `Warm:Sentinel` | `dev` | `2` | Bump this value to trigger refresh | + +--- + +## Symptom: settings loaded at startup but not refreshing + +1. Bump `Warm:Sentinel` — change its value to anything different (with label matching `AZURE_APPCONFIG_LABEL`). Refresh happens within ~30 s. +2. Verify the key being changed has the `Warm:` prefix (not `Cold:`). +3. 
Check that `AZURE_APPCONFIG_ENDPOINT` or connection string is still valid. + +## Symptom: proxy ignores App Configuration entirely + +- If neither `AZURE_APPCONFIG_ENDPOINT` nor `AZURE_APPCONFIG_CONNECTION_STRING` is set, App Configuration is not used. The proxy reads only environment variables. +- If the managed identity does not have the `App Configuration Data Reader` role, the connection will fail at startup. Check logs for `[AppConfig]` entries. + +## Symptom: Cold setting change not taking effect + +Cold settings (keys prefixed `Cold:`) are not reloaded at runtime. Bumping `Warm:Sentinel` has no effect on them. + +**Fix:** Update the App Config key and restart the proxy. + +--- + +## Related + +- [AZURE_APP_CONFIGURATION.md](../AZURE_APP_CONFIGURATION.md) — full App Config setup guide +- [CONFIGURATION_SETTINGS.md](../CONFIGURATION_SETTINGS.md) — all settings with Warm/Cold classification +- [ENVIRONMENT_VARIABLES.md](../ENVIRONMENT_VARIABLES.md) — complete environment variable reference diff --git a/docs/troubleshooting/async-202-never-issued.md b/docs/troubleshooting/async-202-never-issued.md new file mode 100644 index 00000000..17a10dc7 --- /dev/null +++ b/docs/troubleshooting/async-202-never-issued.md @@ -0,0 +1,89 @@ +# Async Expected but 202 Never Issued + +Purpose: Diagnose cases where clients expect async behavior but the proxy returns a normal synchronous response instead of `202 Accepted`. + +> **TL;DR** +> 1. Confirm the request includes the configured async opt-in header (`AsyncClientRequestHeader`, default `S7PAsyncMode`). +> 2. Confirm async is enabled at all three gates: proxy (`AsyncModeEnabled=true`), user profile (`enabled=true` in async config), and request header. +> 3. Compare backend completion time to `AsyncTriggerTimeout`: if the backend responds faster than the trigger timeout, sync response is expected behavior. + +--- + +## Runtime behavior + +The proxy does not force all async-tagged requests to return `202`. 
It first starts processing synchronously, then upgrades to async only if processing crosses the trigger window. + +Runtime decision path: + +1. Request arrives. +2. Proxy validates async gates: + - `AsyncModeEnabled=true` (system gate) + - Request has opt-in header (default `S7PAsyncMode`) + - User async profile exists and is enabled +3. If any gate fails, request stays synchronous. +4. If gates pass, proxy starts sync processing and waits up to `AsyncTriggerTimeout`. +5. If backend finishes before `AsyncTriggerTimeout`, request returns sync (200/4xx/5xx as applicable). +6. If processing exceeds `AsyncTriggerTimeout`, proxy returns `202` and continues in async pipeline. + +--- + +## Diagnosis checklist + +- Verify request header name and value: + - Header name is `AsyncClientRequestHeader` (default `S7PAsyncMode`). + - Header must be present on the failing request. +- Verify proxy-level async switch: + - `AsyncModeEnabled=true`. + - If using App Config, key is `Cold:Async:Enabled` and requires restart after change. +- Verify user profile async config: + - `enabled=true`. + - `containername` and `topic` are present. +- Verify trigger behavior: + - `AsyncTriggerTimeout` is set as expected. + - Backend latency is actually above this threshold for the request path being tested. +- Verify no config drift: + - `AsyncClientRequestHeader` on server matches client header name exactly. + - In App Config, `AZURE_APPCONFIG_LABEL` matches the label containing your async keys. 
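The runtime decision path and checklist above can be condensed into a small sketch. It is an illustration only, assuming the three gates and the trigger window behave as described; `response_mode` and its parameters are hypothetical names, and the case-insensitive header match is an assumption based on HTTP header semantics rather than confirmed proxy behavior.

```python
# Hypothetical sketch of the sync-vs-202 decision; not the proxy's actual code.
def response_mode(async_mode_enabled, request_headers, profile_async_config,
                  backend_ms, trigger_timeout_ms, header_name="S7PAsyncMode"):
    """Return "sync" or "async-202" for a single request."""
    # Assumption: header names are compared case-insensitively.
    names = {name.lower() for name in request_headers}
    gates_pass = (
        async_mode_enabled                                   # system gate
        and header_name.lower() in names                     # request opt-in header
        and bool(profile_async_config)                       # profile has async config
        and profile_async_config.get("enabled") is True      # profile gate
    )
    if not gates_pass:
        return "sync"
    # Gates passed: the request only upgrades if it outlives the trigger window.
    if backend_ms <= trigger_timeout_ms:
        return "sync"
    return "async-202"
```

Walking a failing request through this function, one argument at a time, is a quick way to see which gate or timing condition keeps the response synchronous.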
+ +--- + +## Canonical example (reproduce -> inspect -> fix -> verify) + +```bash +# 1) Reproduce (client expects async) +curl -i https:/// -H "S7PAsyncMode: true" + +# 2) Inspect key runtime settings (example env inspection) +echo $AsyncModeEnabled +echo $AsyncClientRequestHeader +echo $AsyncTriggerTimeout + +# 3) Apply one targeted fix (example: lower trigger timeout to force async on slow path) +export AsyncTriggerTimeout=1000 + +# If using App Config instead of env vars: +# Warm:Async:TriggerTimeout = 1000 +# Then bump Warm:Sentinel (same label as active environment) + +# 4) Verify +curl -i https:/// -H "S7PAsyncMode: true" +# Expect: HTTP/1.1 202 Accepted (for requests exceeding new trigger timeout) +``` + +--- + +## Common operator pitfalls + +- `AsyncModeEnabled` changed in App Config but proxy not restarted (Cold key). +- Client sends `AsyncMode: true` while server expects `S7PAsyncMode` (header mismatch). +- Profile async config exists but `enabled=false`. +- Test backend is too fast; request naturally completes before trigger timeout. + +--- + +## Related + +- [async-requests.md](async-requests.md) — broader async troubleshooting +- [../AsyncOperation.md](../AsyncOperation.md) — async config and flow reference +- [../TIMEOUTS.md](../TIMEOUTS.md) — timeout interactions +- [app-configuration.md](app-configuration.md) — App Config label/key troubleshooting diff --git a/docs/troubleshooting/async-requests.md b/docs/troubleshooting/async-requests.md new file mode 100644 index 00000000..76136f3d --- /dev/null +++ b/docs/troubleshooting/async-requests.md @@ -0,0 +1,127 @@ +# Async Requests Not Completing + +> **TL;DR** +> 1. Check that `AsyncModeEnabled=true` and the request sends the `S7PAsyncMode` header. +> 2. Verify the user profile has async configuration with a valid blob container and Service Bus topic. +> 3. If blobs exist but are empty, the critical cause is `OutputStream` not being cleared — see below. 
+ +--- + +## How async mode works + +``` +Client sends request with S7PAsyncMode header + │ + ▼ +Proxy processes normally → if processing exceeds AsyncTriggerTimeout: + │ + ▼ +Proxy returns 202 immediately + │ (blob URI + SB topic in response body) + ▼ +AsyncWorker writes response → Blob Storage + │ + ▼ +Status update written → Service Bus topic +``` + +--- + +## Step 1 — Verify async is enabled + +All three must be true for async to activate: + +| Condition | Setting | Env Var | +|-----------|---------|---------| +| System switch on | `AsyncModeEnabled=true` | `AsyncModeEnabled` | +| Request header present | Header name = `AsyncClientRequestHeader` (default `S7PAsyncMode`) | set on request | +| User profile has async config | Profile field `async-config` contains blob container + SB topic | — | + +--- + +## Step 2 — Check the trigger timeout + +`AsyncTriggerTimeout` (default 10 s) is the time a request must be in flight before async kicks in. If the backend responds in under 10 s, the request is returned **synchronously** — this is normal. + +To force a request to go async sooner, reduce `AsyncTriggerTimeout`: + +| Setting | Env Var | App Config key | +|---------|---------|----------------| +| Trigger timeout (ms) | `AsyncTriggerTimeout=` | `Warm:Async:TriggerTimeout` | + +--- + +## Step 3 — Check blob storage configuration + +`AsyncBlobStorageConfig` is a composite string: + +```bash +AsyncBlobStorageConfig=uri=https://mystorageaccount.blob.core.windows.net,mi=true +``` + +For managed identity (`mi=true`), the proxy identity needs **`Storage Blob Data Contributor`** on the storage account. 
+ +```bash +az role assignment create \ + --assignee \ + --role "Storage Blob Data Contributor" \ + --scope "/subscriptions//resourceGroups//providers/Microsoft.Storage/storageAccounts/" +``` + +--- + +## Step 4 — Check Service Bus configuration + +`AsyncSBConfig` is a composite string: + +```bash +AsyncSBConfig=cs=,ns=,q=requeststatus,mi=true +``` + +For managed identity, the proxy identity needs **`Azure Service Bus Data Sender`** on the namespace or topic. + +--- + +## Symptom: blobs exist but are empty + +> [!WARNING] +> **Critical bug (fixed Feb 2026):** After the proxy sends the 202 response and closes the client connection, `OutputStream` must be explicitly set to `null`. If it is not, `GetOrCreateDataStreamAsync()` returns the already-closed client stream instead of opening a blob stream, and the backend response is written to nothing. + +If you are running a custom build or patched version, verify [AsyncWorker.cs](../../src/SimpleL7Proxy/Proxy/AsyncWorker.cs) contains: + +```csharp +_requestData.Context.Response.Close(); +_requestData.OutputStream = null; // ← CRITICAL — must follow Close() +``` + +This line must appear **after** `Response.Close()`. Without it, the blob will be created but will remain empty. + +--- + +## Symptom: 202 never arrives — request returns synchronously + +- `AsyncTriggerTimeout` may be larger than the backend response time. The backend responded before async was triggered. +- Verify the client is sending the `S7PAsyncMode` header. + +## Symptom: 202 arrives but blob URI is never populated + +- Check Service Bus connectivity — the status message (with the blob URI) may not be reaching the client. +- Check blob container name in the user profile `async-config` field. +- Check `AsyncBlobWorkerCount` — if set to 0 or the workers are all busy, blobs queue up. + +## Symptom: async request times out with no blob + +`AsyncTimeout` (default 30 min) is the maximum async request lifetime. After this, the request is abandoned. 
If the backend takes longer than 30 min, increase `AsyncTimeout`. + +| Setting | Env Var | App Config key | +|---------|---------|----------------| +| Max async lifetime (ms) | `AsyncTimeout=` | `Warm:Async:Timeout` | +| Blob SAS token lifetime | `AsyncTTLSecs=` | `Warm:Async:TTLSecs` | + +--- + +## Related + +- [AsyncOperation.md](../AsyncOperation.md) — full async configuration reference +- [USER_PROFILES.md](../USER_PROFILES.md) — how to configure async per user profile +- [TIMEOUTS.md](../TIMEOUTS.md) — TTL and timeout interactions diff --git a/docs/troubleshooting/backend-hosts.md b/docs/troubleshooting/backend-hosts.md new file mode 100644 index 00000000..02490cf4 --- /dev/null +++ b/docs/troubleshooting/backend-hosts.md @@ -0,0 +1,107 @@ +# Backend Hosts Not Healthy + +> **TL;DR** +> 1. Verify the `host=` URL and `probe=` path are reachable from the proxy. +> 2. Check `SuccessRate` — hosts below the threshold are removed from the active pool. +> 3. For serverless/on-demand backends, use `mode=direct` to skip probing entirely. + +--- + +## How host health works + +The proxy polls every configured backend every `PollInterval` ms. Each poll result is recorded. A host stays in the active pool as long as its recent success rate stays above `SuccessRate` (default 80%). + +``` +Poll result: success → rate increases +Poll result: failure → rate decreases +rate < SuccessRate % → host removed from active pool (still polled) +rate >= SuccessRate % → host added back automatically +``` + +--- + +## Step 1 — Verify the host URL and probe path + +Test the probe directly from the environment where the proxy runs: + +```bash +# Standard probe path +curl -v https:/// + +# Example +curl -v https://api.backend.com/echo/resource?param1=sample +``` + +The probe must return a 2xx response. Any non-2xx is recorded as a failure. + +--- + +## Step 2 — Check host configuration syntax + +The connection string format is recommended. 
An **unrecognised key** causes `UriFormatException` at startup and prevents the proxy from starting. + +```bash +# Correct +Host1="host=https://api.backend.com;probe=/health;path=/api" + +# Wrong — 'url=' is not a valid key +Host1="url=https://api.backend.com;probe=/health" +``` + +| Key | Notes | +|-----|-------| +| `host` | Required. Protocol defaults to `https://` if omitted. | +| `probe` | Default: `echo/resource?param1=sample`. Must return 2xx. | +| `path` | Optional path prefix for routing. Default `/`. | +| `mode=direct` | Disables all probing. Host is always healthy. | + +--- + +## Step 3 — Tune poll settings + +| Setting | Env Var | App Config key | Default | +|---------|---------|----------------|---------| +| Poll interval (ms) | `PollInterval=` | `Cold:Server:PollInterval` | 10000 | +| Probe timeout (ms) | `PollTimeout=` | `Cold:Server:PollTimeout` | 5000 | +| Min success rate (%) | `SuccessRate=` | `Cold:CircuitBreaker:SuccessRate` | 80 | + +> [!TIP] +> If `PollTimeout` is shorter than the backend's warm-up time, every probe fails. For slow-starting backends, increase `PollTimeout` or use `mode=direct` and let the circuit breaker handle failures. + +--- + +## Using `mode=direct` for serverless backends + +Backends that scale to zero (Azure Functions, Container Apps with min replicas = 0) should use `mode=direct`. This prevents the health poller from waking them unnecessarily and guarantees they are always in the active pool. + +```bash +Host3="host=https://my-func.azurewebsites.net;mode=direct;path=/api/v1" +``` + +In direct mode, the circuit breaker still tracks per-request failures — the host is excluded once it breaches `CBErrorThreshold` failures within `CBTimeslice`. + +> [!WARNING] +> In direct mode, there is no readiness check at startup. The first real request is the first probe. Ensure your backend is able to handle a cold-start request. 
+ +--- + +## Authenticated backends (managed identity / OAuth) + +If the backend requires a Bearer token, set `usemi=true` and provide `audience`: + +```bash +Host2="host=https://secure-api.internal;usemi=true;audience=api://my-app-id;probe=/health" +``` + +The proxy acquires a token from the managed identity endpoint. If the token acquisition fails, probe requests will receive `401` and the host will fail health checks. + +**Fix:** Verify the proxy managed identity has the required role on the backend API and that `audience` matches the backend's app registration. + +--- + +## Related + +- [BACKEND_HOSTS.md](../BACKEND_HOSTS.md) — full host configuration reference +- [HEALTH_CHECKING.md](../HEALTH_CHECKING.md) — health endpoint reference +- [circuit-breaker.md](circuit-breaker.md) — circuit breaker troubleshooting +- [health-probes.md](health-probes.md) — Kubernetes probe configuration diff --git a/docs/troubleshooting/circuit-breaker.md b/docs/troubleshooting/circuit-breaker.md new file mode 100644 index 00000000..26d5adc5 --- /dev/null +++ b/docs/troubleshooting/circuit-breaker.md @@ -0,0 +1,93 @@ +# Circuit Breaker Stuck Open + +> **TL;DR** +> The circuit breaker self-heals — it closes automatically once old failures age out of the sliding window. If it stays open, the backends are still actively failing or `CBTimeslice` is set very large. + +--- + +## How the circuit breaker works + +The circuit breaker tracks failure timestamps in a sliding window. It opens when the count inside the window exceeds `CBErrorThreshold`. Once failures age out (older than `CBTimeslice` seconds), the count drops and the circuit closes itself — no manual reset is needed. 
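The sliding-window rule described above can be sketched in a few lines of shell. This is an illustration only, not the proxy's implementation: the function name `circuit_state` and its argument order are hypothetical, and it assumes the circuit opens once the in-window failure count reaches `CBErrorThreshold`.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of the proxy.
# Usage: circuit_state NOW_EPOCH TIMESLICE_SECS THRESHOLD [FAILURE_EPOCH...]
circuit_state() {
  local now="$1" timeslice="$2" threshold="$3"
  shift 3
  local count=0 ts
  for ts in "$@"; do
    # A failure only counts while it is younger than the window width.
    if (( now - ts < timeslice )); then
      count=$(( count + 1 ))
    fi
  done
  # Assumption: reaching the threshold is enough to open the circuit.
  if (( count >= threshold )); then
    echo "OPEN"
  else
    echo "CLOSED"
  fi
}
```

Calling it twice with the same failure timestamps but a later `NOW_EPOCH` shows the self-healing behaviour: once the failures are older than the window, the state returns to `CLOSED` without any reset.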
+ +``` +Failures in last CBTimeslice seconds >= CBErrorThreshold → OPEN (host skipped) +Failures in last CBTimeslice seconds < CBErrorThreshold → CLOSED (host used) +``` + +--- + +## Diagnose + +### Check readiness probe + +```bash +curl -v http:///readiness +# 503 → at least one circuit is open +# 200 → all circuits are closed +``` + +### Check logs + +Search for `[CB-DELAY]` and `Circuit breaker BLOCKING` log entries: + +``` +[CB-DELAY] Circuit breaker is experiencing elevated error rates. Count: 42, Introducing delay: 300ms +[ProxyToBackEnd] ⚠ Circuit breaker BLOCKING host: https://api.backend.com +``` + +The count in `[CB-DELAY]` tells you how close you are to the threshold. + +--- + +## Common causes and fixes + +### Backends are genuinely failing + +The circuit is doing its job. Fix the backend first. + +```bash +# Test the backend probe path directly +curl -v / +``` + +### Threshold too low for normal error rate + +If backends occasionally return 5xx during normal operation (e.g., transient errors), the circuit may open too easily. + +**Fix:** Raise `CBErrorThreshold` or increase `CBTimeslice` so transient bursts don't accumulate enough to trip the circuit. + +| Setting | Env Var | App Config key | +|---------|---------|----------------| +| Failure threshold | `CBErrorThreshold=` | `Warm:CircuitBreaker:ErrorThreshold` | +| Window width (seconds) | `CBTimeslice=` | `Warm:CircuitBreaker:Timeslice` | + +> [!NOTE] +> Both settings have the `Warm:` prefix — update them in App Configuration and bump `Warm:Sentinel`; no restart needed. + +### Status codes counted as failures incorrectly + +By default, any code not in `AcceptableStatusCodes` counts as a failure. If a backend legitimately returns `503` or `500` in normal operation, add it to the acceptable list to stop it triggering the circuit breaker. 
+ +| Setting | Env Var | App Config key | +|---------|---------|----------------| +| Acceptable codes | `AcceptableStatusCodes=[200,202,503]` | `Warm:Response:AcceptableStatusCodes` | + +### Window too large — old failures keeping circuit open + +If `CBTimeslice` is very large (e.g., 3600 s), a burst of failures from an hour ago is still counted. + +**Fix:** Reduce `CBTimeslice` so the window reflects recent behaviour. + +### Progressive delay making requests slow before full open + +As the failure count approaches the threshold, the proxy adds a 100–500 ms artificial delay per request. If you see elevated latency but no 429s, the circuit is in the delay zone (50–99% of threshold). + +This is intentional — it slows traffic to the struggling host before fully blocking it. No action is needed unless the latency is unacceptable, in which case raise `CBErrorThreshold`. + +--- + +## Related + +- [CIRCUIT_BREAKER.md](../CIRCUIT_BREAKER.md) — full circuit breaker reference +- [requests-429.md](requests-429.md) — 429 responses caused by open circuits +- [backend-hosts.md](backend-hosts.md) — backend health and probing diff --git a/docs/troubleshooting/event-hub.md b/docs/troubleshooting/event-hub.md new file mode 100644 index 00000000..c04881fe --- /dev/null +++ b/docs/troubleshooting/event-hub.md @@ -0,0 +1,81 @@ +# Event Hub — Messages Not Appearing + +> **TL;DR** +> 1. Set `EVENT_LOGGERS` to include `eventhub`. +> 2. Provide either a connection string **or** a namespace (managed identity) — not both. +> 3. For managed identity, assign the **`Azure Event Hubs Data Sender`** role. +> 4. All Event Hub settings are **Cold** — a restart is required after any change. + +--- + +## Step 1 — Enable the Event Hub backend + +Set each value using **either** an environment variable **or** an Azure App Configuration key — not both. +Use environment variables for simple deployments; use App Configuration when managing settings centrally across multiple instances. 
+ +| Setting | Env Var | App Config key | +|---------|---------|----------------| +| Enable Event Hub logging | `EVENT_LOGGERS=eventhub` | `Cold:Logging:EventLoggers` | + +To enable both file and Event Hub logging simultaneously, set the value to `file,eventhub`. + +--- + +## Step 2 — Provide connection details (choose one) + +### Option A — Connection string *(simpler, no managed identity required)* + +| Setting | Env Var | App Config key | +|---------|---------|----------------| +| Connection string | `EVENTHUB_CONNECTIONSTRING=` | `Cold:Logging:EventHub:ConnectionString` | +| Hub name | `EVENTHUB_NAME=` | `Cold:Logging:EventHub:Name` | + +### Option B — Managed identity / RBAC *(recommended for production)* + +| Setting | Env Var | App Config key | +|---------|---------|----------------| +| Namespace | `EVENTHUB_NAMESPACE=` | `Cold:Logging:EventHub:Namespace` | +| Hub name | `EVENTHUB_NAME=` | `Cold:Logging:EventHub:Name` | + +`EVENTHUB_NAMESPACE` accepts either the short name (e.g. `myns`) or the full hostname (e.g. `myns.servicebus.windows.net`). + +The identity running the proxy (managed identity, workload identity, or service principal) must have the **`Azure Event Hubs Data Sender`** role on the Event Hub or its parent namespace. + +```bash +az role assignment create \ + --assignee \ + --role "Azure Event Hubs Data Sender" \ + --scope "/subscriptions//resourceGroups//providers/Microsoft.EventHub/namespaces//eventhubs/" +``` + +--- + +## Verifying the connection + +Check logs at startup for `[EVENT HUB]` entries: + +``` +[EVENT HUB] connecting via connection string, eventhubname: +``` + +or, for managed identity: + +``` +[EVENT HUB] connecting via managed identity, namespace: +``` + +If neither appears, `EVENT_LOGGERS` may not include `eventhub`, or the setting change has not taken effect yet (restart required). 
+ +> [!NOTE] +> If the Event Hub connection fails at startup, the backend is **silently disabled** and other configured backends (e.g., `file`) continue unaffected. Verify the connection string or role assignment and restart. + +> [!TIP] +> **Sovereign cloud:** if your namespace ends in `.servicebus.usgovcloudapi.net`, set `EVENTHUB_NAMESPACE` to the full hostname — the proxy uses it as-is and will not append `.servicebus.windows.net`. + +--- + +## Related + +- [OBSERVABILITY.md](../OBSERVABILITY.md) — full Event Hub architecture and custom loggers +- [ENVIRONMENT_VARIABLES.md](../ENVIRONMENT_VARIABLES.md) — all `EVENTHUB_*` variables +- [AZURE_APP_CONFIGURATION.md](../AZURE_APP_CONFIGURATION.md) — how to set Cold settings in App Config diff --git a/docs/troubleshooting/health-probes.md b/docs/troubleshooting/health-probes.md new file mode 100644 index 00000000..35710db2 --- /dev/null +++ b/docs/troubleshooting/health-probes.md @@ -0,0 +1,108 @@ +# Health Probe Failures / Pod Restarts + +> **TL;DR** +> Probe failures under heavy load usually mean ThreadPool starvation. Enable the **Health Probe Sidecar** to isolate probes from application traffic. Under light load, probe failures almost always mean no backends are healthy. + +--- + +## Probe endpoints reference + +| Endpoint | Port | Returns 200 when… | +|----------|------|-------------------| +| `/liveness` | main / 9000 | Process is running | +| `/readiness` | main / 9000 | At least one backend is healthy | +| `/startup` | main / 9000 | Backend poller has completed its first pass | +| `/health` | main only | Always 200 (alias for liveness) | + +--- + +## Symptom: readiness returns 503 + +The probe body will tell you why: + +| Body | Cause | +|------|-------| +| `Not Healthy. Active Hosts: 0` | No backends passed health checks | +| `Not Healthy. 
Failed Hosts: True` | At least one circuit breaker is open | + +**Fix for "Active Hosts: 0":** +- Verify `Host1`…`Host9` are correctly configured with valid URLs and probe paths. +- Test the backend probe path directly: `curl /` +- Check `PollInterval` and `PollTimeout` — if `PollTimeout` is shorter than the backend's response time, every probe times out. + +| Setting | Env Var | App Config key | +|---------|---------|----------------| +| Poll interval (ms) | `PollInterval=` | `Cold:Server:PollInterval` | +| Probe timeout (ms) | `PollTimeout=` | `Cold:Server:PollTimeout` | + +> [!TIP] +> Set `SuccessRate` lower (e.g. `50`) to keep a partially-recovering host in the active pool. Default is `80` (%). + +**Fix for "Failed Hosts: True":** +→ See [circuit-breaker.md](circuit-breaker.md). + +--- + +## Symptom: probes slow or timing out under high load + +Under heavy load (~1000 concurrent requests), async Kestrel handlers compete with proxy workers for ThreadPool threads. Probes can queue for 1–2 seconds, causing Kubernetes to mark the pod unhealthy and restart it. + +**Fix — enable the Health Probe Sidecar:** + +The sidecar is a lightweight Kestrel process on port 9000 that serves probes from memory. The main proxy pushes its health state to it every second. Probes are served synchronously, avoiding any ThreadPool dependency. 
+ +```bash +# Enable sidecar +HealthProbeSidecar=Enabled=true;url=http://localhost:9000 +``` + +Then point your Kubernetes probes to port 9000: + +```yaml +livenessProbe: + httpGet: + path: /liveness + port: 9000 + failureThreshold: 3 + periodSeconds: 5 + +readinessProbe: + httpGet: + path: /readiness + port: 9000 + failureThreshold: 3 + periodSeconds: 5 + +startupProbe: + httpGet: + path: /startup + port: 9000 + failureThreshold: 30 + periodSeconds: 5 +``` + +> [!NOTE] +> If the sidecar does not receive a status update from the main proxy for more than 10 seconds, it automatically fails all probes — protecting against a silently deadlocked main process. + +--- + +## Symptom: startup probe fails before backends are ready + +The startup probe returns 503 until the backend poller completes its first pass. If `failureThreshold × periodSeconds` is shorter than `PollInterval`, the pod restarts before it can become ready. + +**Fix:** Increase `failureThreshold` on the startup probe so it waits at least as long as one full poll cycle. + +```yaml +startupProbe: + failureThreshold: 30 # 30 × 5s = 150s budget + periodSeconds: 5 +``` + +--- + +## Related + +- [HEALTH_CHECKING.md](../HEALTH_CHECKING.md) — full health probe reference +- [SIDECAR_DEPLOYMENT.md](../SIDECAR_DEPLOYMENT.md) — sidecar deployment configuration +- [circuit-breaker.md](circuit-breaker.md) — circuit breaker troubleshooting +- [backend-hosts.md](backend-hosts.md) — backend host troubleshooting diff --git a/docs/troubleshooting/requests-400-invalid-ttl.md b/docs/troubleshooting/requests-400-invalid-ttl.md new file mode 100644 index 00000000..e21aaaa6 --- /dev/null +++ b/docs/troubleshooting/requests-400-invalid-ttl.md @@ -0,0 +1,80 @@ +# Getting 400 Bad Request (Invalid TTL) + +Purpose: Diagnose and fix `400 Bad Request` responses caused by invalid TTL header values. + +> **TL;DR** +> 1. `400 InvalidTTL` happens when the TTL header cannot be parsed. +> 2. 
Use one supported format only: relative seconds (`300`), absolute unix seconds (`+1735689600`), or a parseable datetime. +> 3. If clients cannot guarantee valid TTL formatting, remove the header and rely on `DefaultTTLSecs`. + +--- + +## Runtime behavior + +When the request enters the proxy queue, the proxy computes expiration in `CalculateExpiration(...)` from the configured TTL header (default `S7PTTL`). + +Parsing order: + +1. If header starts with `+` and the rest is an integer, it is treated as absolute unix epoch seconds. +2. Else if it parses as a float, it is treated as relative seconds from enqueue time. +3. Else if it parses as datetime, it is converted to UTC and used directly. +4. Else the proxy throws `InvalidTTL` and returns `400`. + +If the header is missing or empty, proxy uses `DefaultTTLSecs` (or fallback default timeout path) and does not return `400`. + +--- + +## Diagnosis checklist + +- Confirm response is proxy-originated: + - Status code is `400`. + - Message includes `Invalid TTL format` or `InvalidTTL`. +- Confirm header name: + - Check configured `TTLHeader` (default `S7PTTL`). + - Verify client is setting that exact header name. +- Inspect actual header value sent by client: + - Reject values with unit suffixes like `300s` unless your datetime parser format explicitly supports it. + - Reject free text values such as `five-minutes`. +- Validate value against supported formats: + - Relative seconds: `300` or `2.5` + - Absolute epoch seconds: `+1735689600` + - Date/time: `2026-04-29T10:30:00Z` +- If value origin is downstream middleware or APIM policy, inspect transformation step for type/format drift. 
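To pre-check a value before sending it, the parsing order above can be approximated in shell. This is a rough sketch under stated assumptions, not the proxy's `CalculateExpiration(...)` logic: `classify_ttl` is a hypothetical helper, the float check accepts only plain decimals, and the datetime check accepts only ISO 8601 timestamps, which is narrower than what the proxy's datetime parser may accept.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of the proxy.
# Classifies a TTL header value in the same order the proxy is described to use.
classify_ttl() {
  local v="$1"
  local re_abs='^\+[0-9]+$'
  local re_rel='^[0-9]+(\.[0-9]+)?$'
  # Assumption: only ISO 8601 timestamps are treated as datetimes here.
  local re_iso='^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}(Z|[+-][0-9]{2}:[0-9]{2})?$'

  if [[ -z "$v" ]]; then
    echo "default"            # missing or empty: DefaultTTLSecs applies, no 400
  elif [[ $v =~ $re_abs ]]; then
    echo "absolute-epoch"     # leading '+' means absolute unix seconds
  elif [[ $v =~ $re_rel ]]; then
    echo "relative-seconds"   # seconds from enqueue time
  elif [[ $v =~ $re_iso ]]; then
    echo "datetime"
  else
    echo "invalid"            # the proxy would answer 400 InvalidTTL
  fi
}
```

For example, `classify_ttl "300s"` prints `invalid`, while `classify_ttl "+300"` prints `absolute-epoch`, which is worth checking when a client intended a relative value.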
+ +--- + +## Canonical example (reproduce -> inspect -> fix -> verify) + +```bash +# 1) Reproduce a failure +curl -i https:/// -H "S7PTTL: 300s" +# Expect: HTTP/1.1 400 Bad Request + +# 2) Inspect by sending a known-valid TTL format +curl -i https:/// -H "S7PTTL: 300" + +# 3) Apply one targeted fix (client/APIM policy) +# Change TTL value generation from "300s" to "300" + +# 4) Verify +curl -i https:/// -H "S7PTTL: 300" +# Expect: request is accepted and processed (not 400 InvalidTTL) +``` + +--- + +## Common operator pitfalls + +- Sending `S7PTTL` with suffixes (`ms`, `sec`, `s`) from policy templates. +- Setting a custom `TTLHeader` in config but clients still sending `S7PTTL`. +- Treating `+300` as relative; in proxy logic `+...` is absolute unix seconds. +- Timezone ambiguity in non-UTC datetime strings. + +--- + +## Related + +- [../RESPONSE_CODES.md](../RESPONSE_CODES.md) — `400 InvalidTTL` behavior +- [requests-412.md](requests-412.md) — TTL expired after successful parsing +- [../TIMEOUTS.md](../TIMEOUTS.md) — TTL and timeout interactions +- [../REQUEST_VALIDATION.md](../REQUEST_VALIDATION.md) — other request validation failures diff --git a/docs/troubleshooting/requests-412.md b/docs/troubleshooting/requests-412.md new file mode 100644 index 00000000..9abd370f --- /dev/null +++ b/docs/troubleshooting/requests-412.md @@ -0,0 +1,75 @@ +# Getting 412 Precondition Failed + +> **TL;DR** +> `412` means the request's TTL expired while it was waiting in the priority queue — the proxy never sent it to a backend. Shorten queue wait time or increase the TTL. + +--- + +## What causes 412 + +Every request has a time-to-live (TTL). The proxy stamps an `ExpiresAt` on each request when it is enqueued. If the request has not been dispatched to a backend by `ExpiresAt`, the worker discards it with a `412`. 
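As a worked example of the expiry rule, the check reduces to comparing the dispatch time with `ExpiresAt`. The helper below is a hypothetical sketch using epoch seconds; the proxy's exact behaviour at the boundary (dispatch exactly at `ExpiresAt`) is an assumption here.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of the proxy.
# Usage: request_status ENQUEUE_EPOCH TTL_SECS DISPATCH_EPOCH
request_status() {
  local enqueued="$1" ttl="$2" dispatched="$3"
  local expires_at=$(( enqueued + ttl ))
  # Assumption: a request picked up strictly after ExpiresAt is discarded.
  if (( dispatched > expires_at )); then
    echo "412"
  else
    echo "dispatch"
  fi
}
```

With the default TTL of 300 s, a request enqueued at `1000` and picked up at `1301` yields `412`, while one picked up at `1200` is dispatched.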
+ +``` +Enqueue time + TTL = ExpiresAt + │ + ▼ + Request reaches worker after ExpiresAt → 412 +``` + +--- + +## Diagnose + +Check the `x-Request-Queue-Duration` response header — if it equals or exceeds the TTL, the request expired in the queue. + +```bash +curl -i http:///api/... +# Look for: +# HTTP/1.1 412 Precondition Failed +# x-Request-Queue-Duration: 300123.4 ms ← expired after 300 s +``` + +--- + +## Fix options + +### 1 — Increase the default TTL + +The default TTL is 300 s (`DefaultTTLSecs`). Increase it if requests legitimately take longer to process. + +| Setting | Env Var | App Config key | +|---------|---------|----------------| +| Default TTL (seconds) | `DefaultTTLSecs=` | `Warm:Request:DefaultTTLSecs` | + +### 2 — Override TTL per request + +Clients can send a per-request TTL via the `S7PTTL` header (value in seconds): + +```bash +curl -H "S7PTTL: 600" http:///api/... +``` + +> [!WARNING] +> A client-supplied `S7PTTL` **overrides** `DefaultTTLSecs` for that request. If clients send a very small value, they will see 412s regardless of the server default. + +### 3 — Reduce queue wait time + +If the queue is consistently long, workers are not draining fast enough. 
Options: + +- Increase `Workers` count (Cold — restart required) +- Increase `MaxQueueLength` to smooth burst absorption +- Add more backend capacity +- Reduce per-attempt `Timeout` so failed attempts free workers faster + +| Setting | Env Var | App Config key | +|---------|---------|----------------| +| Worker count | `Workers=` | `Cold:Server:Workers` | +| Per-attempt timeout (ms) | `Timeout=` | `Warm:Request:DefaultTimeout` | + +--- + +## Related + +- [TIMEOUTS.md](../TIMEOUTS.md) — TTL, Timeout, and AsyncTimeout interactions +- [RESPONSE_CODES.md](../RESPONSE_CODES.md) — full list of proxy-originated codes +- [requests-429.md](requests-429.md) — queue full (upstream cause of 412 under load) diff --git a/docs/troubleshooting/requests-429.md b/docs/troubleshooting/requests-429.md new file mode 100644 index 00000000..2fd03cd0 --- /dev/null +++ b/docs/troubleshooting/requests-429.md @@ -0,0 +1,105 @@ +# Getting 429 Too Many Requests + +> **TL;DR** +> A `429` from the proxy means it rejected the request before ever sending it to a backend. It is always one of four causes: queue full, all circuit breakers open, no active hosts, or a full event logger buffer. + +--- + +## Diagnose the cause + +The proxy includes a reason in the response body. Inspect it first: + +| Response body contains | Cause | Jump to | +|------------------------|-------|---------| +| `Circuit breaker on` | All backend circuit breakers are open | [Circuit breaker open](#circuit-breaker-open) | +| `Queue full` / `MaxQueueLength` | Incoming rate exceeds worker throughput | [Queue full](#queue-full) | +| `No active hosts` | No backends passed the health check | [No active hosts](#no-active-hosts) | +| `Max events` | Event logger buffer full | [Max undrained events](#max-undrained-events) | + +--- + +## Circuit breaker open + +All backend hosts have exceeded their failure threshold. The proxy will not attempt any backend until the failure window ages out. 
+ +**Immediate check:** + +```bash +curl http:///readiness +# Returns 503 when any circuit is open +``` + +**Fix:** +- Wait for the sliding window (`CBTimeslice`, default 60 s) to age out failures — the circuit self-heals. +- If backends are genuinely down, fix the backend first. +- To tune the threshold so the circuit opens less aggressively, raise `CBErrorThreshold`. + +| Setting | Env Var | App Config key | +|---------|---------|----------------| +| Failure threshold | `CBErrorThreshold=` | `Warm:CircuitBreaker:ErrorThreshold` | +| Window width (seconds) | `CBTimeslice=` | `Warm:CircuitBreaker:Timeslice` | + +> [!TIP] +> See [circuit-breaker.md](circuit-breaker.md) for a full diagnosis guide. + +--- + +## Queue full + +The priority queue has reached `MaxQueueLength`. New requests are rejected with 429 until workers drain the backlog. + +**Fix options:** + +1. **Increase worker count** — more workers drain the queue faster (Cold setting, requires restart). +2. **Increase queue length** — absorbs bursts, but increases memory usage (Cold setting, requires restart). +3. **Add more backend hosts** — higher throughput means faster drain. +4. **Reduce per-request timeout** — shorter timeouts free workers sooner. + +| Setting | Env Var | App Config key | +|---------|---------|----------------| +| Queue size | `MaxQueueLength=` | `Cold:Server:MaxQueueLength` | +| Worker count | `Workers=` | `Cold:Server:Workers` | + +--- + +## No active hosts + +Every configured backend has failed health probes and been removed from the active pool. + +**Check:** + +```bash +curl http:///readiness +# Body: "Not Healthy. Active Hosts: 0" +``` + +**Fix:** +- Verify backend URLs and probe paths are correct. +- Check backend health directly: `curl /`. +- Review `PollInterval` and `PollTimeout` — if they are too aggressive they may mark healthy backends as failed. + +> [!TIP] +> See [backend-hosts.md](backend-hosts.md) for a full diagnosis guide. 
+ +--- + +## Max undrained events + +The Event Hub logger buffer (`EVENTHUB_MAX_UNDRAINED_EVENTS`) is full. This typically means the Event Hub connection is degraded and events are not being flushed. + +**Fix:** +- Check the Event Hub connection. See [event-hub.md](event-hub.md). +- Increase `EVENTHUB_MAX_UNDRAINED_EVENTS` to absorb spikes (Cold setting). + +| Setting | Env Var | App Config key | +|---------|---------|----------------| +| Buffer limit | `EVENTHUB_MAX_UNDRAINED_EVENTS=` | `Cold:Server:MaxUndrainedEvents` | + +--- + +## Related + +- [RESPONSE_CODES.md](../RESPONSE_CODES.md) — full list of proxy-originated codes +- [CIRCUIT_BREAKER.md](../CIRCUIT_BREAKER.md) — circuit breaker reference +- [circuit-breaker.md](circuit-breaker.md) — circuit breaker troubleshooting +- [backend-hosts.md](backend-hosts.md) — backend host troubleshooting diff --git a/docs/troubleshooting/requests-503.md b/docs/troubleshooting/requests-503.md new file mode 100644 index 00000000..7b5e6a4b --- /dev/null +++ b/docs/troubleshooting/requests-503.md @@ -0,0 +1,80 @@ +# Getting 503 Service Unavailable + +> **TL;DR** +> A `503` from the proxy means every backend was tried and every attempt failed. This is different from a `429` (rejected before sending) — a `503` means the proxy exhausted all hosts. + +--- + +## Diagnose the cause + +The proxy includes a JSON error body with per-attempt details. Read it: + +```json +{ + "status": 503, + "message": "All backends failed", + "attempts": [ + { "host": "https://api1.backend.com", "code": 500, "error": "Internal Server Error" }, + { "host": "https://api2.backend.com", "code": 502, "error": "Bad Gateway" } + ] +} +``` + +Response headers also show: +- `x-Request-Process-Duration` — time the proxy spent trying backends +- `BackendHost` — last backend attempted + +--- + +## Common causes + +### All backends returning 5xx + +The backends themselves are failing. 
The proxy retries each host in sequence; when all fail with non-`AcceptableStatusCodes` responses, it returns 503. + +**Fix:** +- Check each backend directly: `curl /` +- If the backend is temporarily overloaded, add it to `AcceptableStatusCodes` to pass its status through instead of retrying. + +| Setting | Env Var | App Config key | +|---------|---------|----------------| +| Acceptable codes | `AcceptableStatusCodes=[200,202,503]` | `Warm:Response:AcceptableStatusCodes` | + +> [!NOTE] +> Adding a code to `AcceptableStatusCodes` means it will be returned directly to the client **and** will not count as a circuit-breaker failure. + +### Circuit breakers blocked all hosts before the request arrived + +If all circuits opened between request enqueue and dequeue, the proxy skips every host. The result is a 503 with no actual backend attempts. + +Check the circuit breaker status: `curl http:///readiness` + +> [!TIP] +> See [circuit-breaker.md](circuit-breaker.md) for recovery steps. + +### Backends returning codes in 3xx or 404 + +Redirects (`3xx`) and `404` cause the proxy to skip to the next host. If all hosts return a redirect or 404, the result is 503. + +**Fix:** Verify the backend URLs and path routing are correct. Check `stripprefix` settings on each host — a stripped prefix may produce the wrong downstream path. + +### All hosts excluded from path routing + +If a request path does not match any configured host `path` prefix, no hosts are eligible and the proxy returns 503 immediately. + +**Fix:** Verify host `path` configuration matches the request paths you are sending. 
+ +```bash +# Example: only requests starting with /api/v1 go to Host1 +Host1="host=https://api.backend.com;path=/api/v1;probe=/health" +``` + +--- + +## Related + +- [RESPONSE_CODES.md](../RESPONSE_CODES.md) — full list of proxy-originated codes +- [BACKEND_HOSTS.md](../BACKEND_HOSTS.md) — host configuration reference +- [CIRCUIT_BREAKER.md](../CIRCUIT_BREAKER.md) — circuit breaker reference +- [circuit-breaker.md](circuit-breaker.md) — circuit breaker troubleshooting +- [backend-hosts.md](backend-hosts.md) — backend host troubleshooting diff --git a/src/SimpleL7Proxy/Async/AsyncFileStore.cs b/src/SimpleL7Proxy/Async/AsyncFileStore.cs new file mode 100644 index 00000000..f366dd01 --- /dev/null +++ b/src/SimpleL7Proxy/Async/AsyncFileStore.cs @@ -0,0 +1,60 @@ +using SimpleL7Proxy.Async.BlobStorage; + +namespace SimpleL7Proxy.Async; + +/// +/// File-style store: small one-shot blobs that flow through the BlobWriteQueue. +/// In async mode is registered as QueuedBlobWriter, so +/// stream-based calls here transparently go through the queue. +/// bypasses the queue for minimum latency on already-materialized payloads. +/// +public sealed class AsyncFileStore : IAsyncFileStore +{ + private readonly IBlobWriter _writer; + + public AsyncFileStore(IBlobWriter writer) + { + _writer = writer ?? 
throw new ArgumentNullException(nameof(writer)); + } + + public Task InitializeClientAsync(string containerName) + => _writer.InitClientAsync(containerName); + + public Task WriteAsync(string containerName, string blobName, ReadOnlyMemory data, CancellationToken cancellationToken = default) + => _writer.UploadBlobAsync(containerName, blobName, data, cancellationToken); + + public Task OpenWriteStreamAsync(string containerName, string blobName) + => _writer.CreateBlobAndGetOutputStreamAsync(containerName, blobName); + + public (string dataBlobUri, string headerBlobUri) GetBlobUriPair( + string containerName, string dataBlobName, string headerBlobName) + => (_writer.GetBlobUri(containerName, dataBlobName), _writer.GetBlobUri(containerName, headerBlobName)); + + // public async Task<(string dataBlobUri, string headerBlobUri)> GenerateSasTokenPairAsync( + // string containerName, string dataBlobName, string headerBlobName, TimeSpan expiry) + // { + // var dataTask = _writer.GenerateSasTokenAsync(containerName, dataBlobName, expiry); + // var headerTask = _writer.GenerateSasTokenAsync(containerName, headerBlobName, expiry); + // await Task.WhenAll(dataTask, headerTask).ConfigureAwait(false); + // return (await dataTask, await headerTask); + // } + + public async Task CompleteWriteStreamAsync(Stream? stream, CancellationToken cancellationToken = default) + { + if (stream == null) return; + await stream.FlushAsync(cancellationToken).ConfigureAwait(false); + // QueuedBlobStream writes are asynchronous; wait for all pending operations to land. 
+ if (stream is QueuedBlobStream qbs) + await qbs.WaitForPendingWritesAsync(cancellationToken).ConfigureAwait(false); + } + + public Task BlobExistsAsync(string containerName, string blobName) + => _writer.BlobExistsAsync(containerName, blobName); + + public Task ReadBlobAsStreamAsync(string containerName, string blobName) + => _writer.ReadBlobAsStreamAsync(containerName, blobName); + + public Task DeleteBlobAsync(string containerName, string blobName) + => _writer.DeleteBlobAsync(containerName, blobName); +} + diff --git a/src/SimpleL7Proxy/Proxy/AsyncHeaders.cs b/src/SimpleL7Proxy/Async/AsyncHeaders.cs similarity index 92% rename from src/SimpleL7Proxy/Proxy/AsyncHeaders.cs rename to src/SimpleL7Proxy/Async/AsyncHeaders.cs index 3dd7af95..86dfe1f4 100644 --- a/src/SimpleL7Proxy/Proxy/AsyncHeaders.cs +++ b/src/SimpleL7Proxy/Async/AsyncHeaders.cs @@ -1,7 +1,7 @@ using System.Net; -namespace SimpleL7Proxy.Proxy +namespace SimpleL7Proxy.Async { /// /// Represents an asynchronous worker that performs a task and disappears after completion. diff --git a/src/SimpleL7Proxy/Proxy/AsyncMessage.cs b/src/SimpleL7Proxy/Async/AsyncMessage.cs similarity index 93% rename from src/SimpleL7Proxy/Proxy/AsyncMessage.cs rename to src/SimpleL7Proxy/Async/AsyncMessage.cs index 2dcd31e8..c6192d98 100644 --- a/src/SimpleL7Proxy/Proxy/AsyncMessage.cs +++ b/src/SimpleL7Proxy/Async/AsyncMessage.cs @@ -1,4 +1,4 @@ -namespace SimpleL7Proxy.Proxy +namespace SimpleL7Proxy.Async { /// /// Represents an asynchronous worker that performs a task and disappears after completion. diff --git a/src/SimpleL7Proxy/Async/AsyncResponseTypeEnum.cs b/src/SimpleL7Proxy/Async/AsyncResponseTypeEnum.cs new file mode 100644 index 00000000..88f61f5e --- /dev/null +++ b/src/SimpleL7Proxy/Async/AsyncResponseTypeEnum.cs @@ -0,0 +1,11 @@ +namespace SimpleL7Proxy.Async; + +/// +/// Identifies a canned message template loaded from the "templates" blob container. 
+/// +public enum AsyncResponseTypeEnum +{ + Welcome, + NotReady, + NotAuthorized, +} \ No newline at end of file diff --git a/src/SimpleL7Proxy/Async/AsyncStreamingStore.cs b/src/SimpleL7Proxy/Async/AsyncStreamingStore.cs new file mode 100644 index 00000000..9a3d0d11 --- /dev/null +++ b/src/SimpleL7Proxy/Async/AsyncStreamingStore.cs @@ -0,0 +1,26 @@ +using SimpleL7Proxy.Async.BlobStorage; + +namespace SimpleL7Proxy.Async; + +/// +/// Streaming store: large/streamed blobs (response bodies up to gigabytes) that bypass the +/// BlobWriteQueue entirely. Holds a dedicated instance from the +/// factory whose CreateBlobAndGetOutputStreamAsync calls +/// BlobClient.OpenWriteAsync directly — the SDK's transfer buffer (~4 MiB by default) +/// is the only memory used regardless of total payload size. +/// +public sealed class AsyncStreamingStore : IAsyncStreamingStore, IDisposable +{ + private readonly IBlobWriter _writer; + + public AsyncStreamingStore(IBlobWriterFactory factory) + { + if (factory == null) throw new ArgumentNullException(nameof(factory)); + _writer = factory.CreateBlobWriter(); + } + + public Task OpenWriteStreamAsync(string containerName, string blobName, CancellationToken cancellationToken = default) + => _writer.CreateBlobAndGetOutputStreamAsync(containerName, blobName, cancellationToken); + + public void Dispose() => _writer?.Dispose(); +} diff --git a/src/SimpleL7Proxy/Async/AsyncWorkerContext.cs b/src/SimpleL7Proxy/Async/AsyncWorkerContext.cs new file mode 100644 index 00000000..93f1e8e2 --- /dev/null +++ b/src/SimpleL7Proxy/Async/AsyncWorkerContext.cs @@ -0,0 +1,44 @@ +using Microsoft.Extensions.Logging; +using Microsoft.Extensions.Options; + +using SimpleL7Proxy.Config; +using SimpleL7Proxy.DTO; +using SimpleL7Proxy.Proxy; + +namespace SimpleL7Proxy.Async; + +/// +/// Singleton bag of construction-time dependencies for . 
+/// +/// Roles: +/// AsyncInitializer — one-shot startup: ensures server-scoped blob container exists, +/// loads templates, wires statics. Runs +/// once before traffic; never participates in per-request flow. +/// AsyncWorkerContext — shared toolbox consumed every time an AsyncWorker is constructed. +/// Carries the file store (queued small blobs), logger, and +/// ProxyConfig. The streaming store is wired into AsyncWorker via +/// its static Initialize(), not through this context. No init +/// logic, no per-request state. +/// </summary> +public sealed class AsyncWorkerContext +{ + public IAsyncFileStore FileStore { get; } + public IRequestDataBackupService BackupService { get; } + public ILogger Logger { get; } + public ProxyConfig Options { get; } + public TemplateLoader Messages { get; } + + public AsyncWorkerContext( + IAsyncFileStore fileStore, + IRequestDataBackupService backupService, + ILogger logger, + IOptions<ProxyConfig> options, + TemplateLoader messages) + { + FileStore = fileStore ?? throw new ArgumentNullException(nameof(fileStore)); + BackupService = backupService ?? throw new ArgumentNullException(nameof(backupService)); + Logger = logger ?? throw new ArgumentNullException(nameof(logger)); + Options = options?.Value ?? throw new ArgumentNullException(nameof(options)); + Messages = messages ?? 
throw new ArgumentNullException(nameof(messages)); + } +} diff --git a/src/SimpleL7Proxy/Async/BlobStorage/AzureBlobWriter.cs b/src/SimpleL7Proxy/Async/BlobStorage/AzureBlobWriter.cs new file mode 100644 index 00000000..083a0025 --- /dev/null +++ b/src/SimpleL7Proxy/Async/BlobStorage/AzureBlobWriter.cs @@ -0,0 +1,400 @@ +using Azure.Storage; +using Azure.Storage.Blobs; +using Azure.Storage.Blobs.Specialized; +using Azure.Storage.Sas; +using Azure; +using Microsoft.Extensions.Hosting; +using Microsoft.Extensions.Logging; +using Microsoft.Extensions.Options; +using System.IO; +using System.Threading.Tasks; +using System.Collections.Concurrent; + +// Review DISPOSAL_ARCHITECTURE.MD in the root for details on disposal flow + +namespace SimpleL7Proxy.Async.BlobStorage +{ + /// <summary> + /// Provides methods for writing to Azure Blob Storage. + /// </summary> + public class AzureBlobWriter : IBlobWriter, IDisposable + { + // Single-flight initialization: Lazy<Task<BlobContainerClient>> ensures only one CreateIfNotExists call + // per (containerName) even under 1000 concurrent first-time inits. + private static readonly ConcurrentDictionary<string, Lazy<Task<BlobContainerClient>>> _containerClients = new(); + + private readonly BlobServiceClient _blobServiceClient; + private readonly ILogger _logger; + private readonly IOptionsMonitor? _optionsMonitor; + + // Cache for the user delegation key used to sign SAS tokens when running under MI. + // Refreshing on every SAS request would add a management-plane round-trip per call. + private static readonly SemaphoreSlim _delegationKeyLock = new(1, 1); + private static Azure.Storage.Blobs.Models.UserDelegationKey _cachedDelegationKey = default!; + private static bool _hasCachedDelegationKey; + private static DateTimeOffset _delegationKeyRefreshAfter = DateTimeOffset.MinValue; + private static readonly TimeSpan DelegationKeyLifetime = TimeSpan.FromHours(1); + // Refresh slightly before expiry to avoid a thundering herd at the boundary. 
+ private static readonly TimeSpan DelegationKeyRefreshSkew = TimeSpan.FromMinutes(10); + + public bool UsesMI { get; set; } + + public bool IsInitialized => _blobServiceClient != null; + private bool _disposed = false; + + + /// <summary> + /// Initializes a new instance of the <see cref="AzureBlobWriter"/> class. + /// </summary> + /// <param name="blobServiceClient">The blob service client.</param> + /// <param name="logger">The logger instance.</param> + /// <param name="optionsMonitor">Optional config monitor used to read tuning settings such as the streaming buffer size.</param> + public AzureBlobWriter(BlobServiceClient blobServiceClient, ILogger logger, IOptionsMonitor? optionsMonitor = null) + { + _blobServiceClient = blobServiceClient ?? throw new ArgumentNullException(nameof(blobServiceClient)); + _logger = logger ?? throw new ArgumentNullException(nameof(logger)); + _optionsMonitor = optionsMonitor; + _logger.LogDebug("Starting BlobWriter service"); + } + + + public async Task<bool> InitClientAsync(string containerName) + { + if (string.IsNullOrEmpty(containerName)) + { + _logger.LogWarning("ContainerName cannot be null or empty"); + return false; + } + + // Single-flight: GetOrAdd guarantees only one Lazy is stored per containerName. Every concurrent + // caller awaits the same Task, so CreateIfNotExistsAsync runs exactly once. + // If the task faults, evict it so a later caller can retry. + var lazy = _containerClients.GetOrAdd(containerName, _ => new Lazy<Task<BlobContainerClient>>( + () => CreateContainerClientAsync(containerName), + LazyThreadSafetyMode.ExecutionAndPublication)); + + try + { + _ = await lazy.Value.ConfigureAwait(false); + _logger.LogDebug("BlobWriter: Client ready for BlobContainerName: {BlobContainerName}", containerName); + return true; + } + catch (Exception ex) + { + // Evict the failed entry so the next caller can retry initialization. 
+ _containerClients.TryRemove(new KeyValuePair<string, Lazy<Task<BlobContainerClient>>>(containerName, lazy)); + + throw new BlobWriterException($"Failed to initialize BlobContainerClient for containerName: {containerName}", ex) + { + Operation = "InitClientAsync: GetBlobContainerClient", + ContainerName = containerName + }; + } + } + + private async Task<BlobContainerClient> CreateContainerClientAsync(string containerName) + { + _logger.LogDebug("BlobWriter: Initializing BlobContainerName: {BlobContainerName}", containerName); + var client = _blobServiceClient.GetBlobContainerClient(containerName); + // Ensure the container exists once at init time, rather than on every write. + await client.CreateIfNotExistsAsync().ConfigureAwait(false); + return client; + } + + // Synchronously resolves an already-initialized container client. Throws if init has not + // completed successfully. Used by hot read/write paths to avoid awaiting on every call. + private BlobContainerClient GetInitializedContainerClient(string containerName, string operation, string? blobName = null) + { + if (_containerClients.TryGetValue(containerName, out var lazy) + && lazy.IsValueCreated + && lazy.Value.IsCompletedSuccessfully) + { + return lazy.Value.Result; + } + + throw new BlobWriterException($"BlobContainerClient not initialized for containerName: {containerName}. Call InitClientAsync first.") + { + Operation = operation, + BlobName = blobName ?? "N/A", + ContainerName = containerName + }; + } + + /// <summary> + /// Creates the blob container if it does not exist and returns an output stream for the specified blob. 
+ /// </summary> + /// <param name="containerName">The name of the blob container.</param> + /// <param name="blobName">The name of the blob.</param> + /// <param name="cancellationToken">Cancellation token plumbed through to the SDK handshake and StageBlock calls.</param> + /// <returns>A writable stream to the blob.</returns> + public async Task<Stream> CreateBlobAndGetOutputStreamAsync(string containerName, string blobName, CancellationToken cancellationToken = default) + { + // Container existence is ensured once in InitClientAsync; do not re-check on every write. 
+ var containerClient = GetInitializedContainerClient(containerName, "CreateBlobAndGetOutputStreamAsync", blobName); + var blobClient = containerClient.GetBlobClient(blobName); + + _logger.LogDebug("BlobWriter: Creating blob {ContainerName}/{BlobName}", containerClient.Name, blobName); + + // The Azure SDK retries transient failures (408/429/5xx) automatically with exponential + // backoff honoring Retry-After. Overwrite mode means no 409 conflicts are expected here, + // so any failure that escapes the SDK's RetryPolicy is propagated to the caller (the + // BlobWriteQueue worker) which records it as a failed operation. + // + // BufferSize tuning: SDK default is ~4 MiB per StageBlock. For multi-GB payloads, + // raising BufferSize (e.g. 8/16 MiB) reduces round trips at the cost of memory per + // concurrent worker. Read from config at call time so it can be tuned without restart. + var bufferBytes = _optionsMonitor?.CurrentValue.AsyncStreamingBufferSizeBytes ?? 0L; + var options = bufferBytes > 0 + ? new global::Azure.Storage.Blobs.Models.BlobOpenWriteOptions { BufferSize = bufferBytes } + : null; + return await blobClient.OpenWriteAsync(overwrite: true, options: options, cancellationToken: cancellationToken).ConfigureAwait(false); + } + + /// <summary> + /// Uploads a fully-materialized payload to the specified blob in a single PUT request + /// using <see cref="BlockBlobClient.UploadAsync"/>. + /// One network round-trip vs. two for the streaming OpenWriteAsync path. 
+ /// </summary> + public async Task UploadBlobAsync(string containerName, string blobName, ReadOnlyMemory<byte> data, CancellationToken cancellationToken = default) + { + var containerClient = GetInitializedContainerClient(containerName, "UploadBlobAsync", blobName); + var blockClient = containerClient.GetBlockBlobClient(blobName); + + _logger.LogDebug("BlobWriter: Uploading blob {ContainerName}/{BlobName} - Size: {Size}B (single-shot)", + containerClient.Name, blobName, data.Length); + + // SDK retry policy handles transient failures. 
BlockBlobClient.UploadAsync + // overwrites unconditionally — no 409 conflicts. + // Avoid an unconditional ToArray() copy when the ReadOnlyMemory is array-backed + // (the common case from BinaryData/ArrayPool buffers): wrap the existing segment. + MemoryStream ms; + if (System.Runtime.InteropServices.MemoryMarshal.TryGetArray(data, out var seg) && seg.Array != null) + { + ms = new MemoryStream(seg.Array, seg.Offset, seg.Count, writable: false); + } + else + { + ms = new MemoryStream(data.ToArray(), writable: false); + } + using (ms) + { + await blockClient.UploadAsync(ms, cancellationToken: cancellationToken).ConfigureAwait(false); + } + } + + public async Task<bool> BlobExistsAsync(string containerName, string blobName) + { + var containerClient = GetInitializedContainerClient(containerName, "BlobExistsAsync", blobName); + var blobClient = containerClient.GetBlobClient(blobName); + return await blobClient.ExistsAsync().ConfigureAwait(false); + } + + + public async Task<Stream> ReadBlobAsStreamAsync(string containerName, string blobName) + { + var containerClient = GetInitializedContainerClient(containerName, "ReadBlobAsStreamAsync", blobName); + + try + { + var blobClient = containerClient.GetBlobClient(blobName); + return await blobClient.OpenReadAsync().ConfigureAwait(false); + } + catch (Exception ex) + { + throw new BlobWriterException($"Failed to read blob as stream for containerName: {containerName}, blobName: {blobName}", ex) + { + Operation = "ReadBlobAsStreamAsync", + BlobName = blobName, + ContainerName = containerName + }; + } + } + + + public async Task<bool> DeleteBlobAsync(string containerName, string blobName) + { + if (string.IsNullOrEmpty(containerName)) + { + _logger.LogWarning("ContainerName cannot be null or empty"); + return false; + } + + if (string.IsNullOrEmpty(blobName)) + { + _logger.LogWarning("BlobName cannot be null or empty for containerName: {ContainerName}", containerName); + return false; + } + + var containerClient = 
GetInitializedContainerClient(containerName, "DeleteBlobAsync", blobName); + var blobClient = containerClient.GetBlobClient(blobName); + return await blobClient.DeleteIfExistsAsync().ConfigureAwait(false); + } + + // /// + // /// Generates a SAS token for the specified blob. + // /// + // /// The blob container name. + // /// The name of the blob. + // /// The expiry time for the SAS token. + // /// The SAS token URL for the blob. + // public async Task GenerateSasTokenAsync(string containerName, string blobName, TimeSpan expiryTime) + // { + // if (string.IsNullOrEmpty(blobName)) + // { + // throw new ArgumentException("BlobName cannot be null or empty", nameof(blobName)); + // } + + // var containerClient = GetInitializedContainerClient(containerName, "GenerateSasTokenAsync", blobName); + + // try + // { + // var blobClient = containerClient.GetBlobClient(blobName); + // var sasBuilder = new BlobSasBuilder + // { + // BlobContainerName = containerClient.Name, + // BlobName = blobName, + // Resource = "b", + // StartsOn = DateTimeOffset.UtcNow.AddMinutes(-5), // Start 5 minutes ago to account for clock skew + // ExpiresOn = DateTimeOffset.UtcNow.Add(expiryTime) + // }; + // sasBuilder.SetPermissions(BlobSasPermissions.Read | BlobSasPermissions.Delete); + + // if (UsesMI) + // { + // // Reuse a cached user delegation key; refreshing per call would add a + // // management-plane round-trip per SAS at high request rates. 
+ // var userDelegationKey = await GetOrRefreshUserDelegationKeyAsync().ConfigureAwait(false); + + // // Generate the SAS token using the user delegation key + // var sasQueryParameters = sasBuilder.ToSasQueryParameters(userDelegationKey, _blobServiceClient.AccountName); + + // // Construct the full SAS URI + // var blobUriBuilder = new BlobUriBuilder(blobClient.Uri) + // { + // Sas = sasQueryParameters + // }; + + // var sasUri = blobUriBuilder.ToUri(); + // _logger.LogDebug("Successfully generated user delegation SAS token for blob {BlobName}", blobName); + // return sasUri.ToString(); + + // } + // else + // { + // // Check if we can use account SAS (when using connection string) + // if (blobClient.CanGenerateSasUri) + // { + // var sasUri = blobClient.GenerateSasUri(sasBuilder); + // _logger.LogDebug("Successfully generated account SAS token for blob {BlobName}", blobName); + // return sasUri.ToString(); + // } + // else + // { + // throw new InvalidOperationException("Cannot generate SAS token. Either enable managed identity (UsesMI=true) or provide a connection string with account keys."); + // } + // } + // } + // catch (Exception ex) + // { + // _logger.LogError(ex, "Failed to generate SAS token for blob {BlobName} in container {ContainerName}", blobName, containerClient.Name); + // throw new BlobWriterException($"Failed to generate SAS token for blob {blobName} in container {containerClient.Name}", ex) + // { + // Operation = "GenerateSasTokenAsync", + // BlobName = blobName, + // ContainerName = containerClient.Name + // }; + // } + // } + + /// + /// Gets the base URI for a blob without SAS token. + /// + /// The blob container name. + /// The name of the blob. + /// The base URI of the blob. 
+ public string GetBlobUri(string containerName, string blobName) + { + if (string.IsNullOrEmpty(blobName)) + { + throw new ArgumentException("BlobName cannot be null or empty", nameof(blobName)); + } + + var containerClient = GetInitializedContainerClient(containerName, "GetBlobUri", blobName); + var blobClient = containerClient.GetBlobClient(blobName); + return blobClient.Uri.ToString(); + } + + // Returns a cached user delegation key, refreshing it shortly before expiry. Single-flight + // protected via SemaphoreSlim so a refresh storm cannot fan out. + private async Task GetOrRefreshUserDelegationKeyAsync() + { + var now = DateTimeOffset.UtcNow; + if (_hasCachedDelegationKey && now < _delegationKeyRefreshAfter) + { + return _cachedDelegationKey; + } + + await _delegationKeyLock.WaitAsync().ConfigureAwait(false); + try + { + now = DateTimeOffset.UtcNow; + if (_hasCachedDelegationKey && now < _delegationKeyRefreshAfter) + { + return _cachedDelegationKey; + } + + _logger.LogDebug("Requesting user delegation key for SAS token generation"); + var start = now.AddMinutes(-5); // tolerate clock skew + var expiry = now.Add(DelegationKeyLifetime); + var response = await _blobServiceClient + .GetUserDelegationKeyAsync(start, expiry) + .ConfigureAwait(false); + + _cachedDelegationKey = response.Value; + _delegationKeyRefreshAfter = expiry - DelegationKeyRefreshSkew; + _hasCachedDelegationKey = true; + return _cachedDelegationKey; + } + finally + { + _delegationKeyLock.Release(); + } + } + + /// + /// Gets connection information for health check and diagnostics. + /// + /// A string describing the blob storage connection configuration. 
+ public string GetConnectionInfo() + { + if (_blobServiceClient == null) + { + return "Not Initialized"; + } + + if (UsesMI) + { + return $"MI: {_blobServiceClient.Uri.Host}"; + } + else + { + return $"ConnectionString: {_blobServiceClient.Uri.Host}"; + } + } + + public void Dispose() + { + Dispose(true); + GC.SuppressFinalize(this); + } + + protected virtual void Dispose(bool disposing) + { + // The BlobServiceClient is owned by BlobWriterFactory (singleton, shared + // across all BlobWriter instances), so we deliberately do not dispose it here. + _disposed = true; + } + + + } +} \ No newline at end of file diff --git a/src/SimpleL7Proxy/Async/BlobStorage/BlobWriteQueue.cs b/src/SimpleL7Proxy/Async/BlobStorage/BlobWorkerPump.cs similarity index 76% rename from src/SimpleL7Proxy/Async/BlobStorage/BlobWriteQueue.cs rename to src/SimpleL7Proxy/Async/BlobStorage/BlobWorkerPump.cs index 88f86707..8b23652d 100644 --- a/src/SimpleL7Proxy/Async/BlobStorage/BlobWriteQueue.cs +++ b/src/SimpleL7Proxy/Async/BlobStorage/BlobWorkerPump.cs @@ -51,8 +51,7 @@ public class BlobWriteOperation /// Data to write. Uses ReadOnlyMemory to avoid copying. /// public ReadOnlyMemory Data { get; init; } - - public int Priority { get; init; } = 0; + public DateTime EnqueuedAt { get; } = DateTime.UtcNow; private readonly TaskCompletionSource _completionSource = new(); @@ -90,31 +89,42 @@ public class BlobWriteResult /// Each worker independently batches operations for the same container. /// Operations for the same blob are routed to the same worker via hashing. 
/// </summary> - public class BlobWriteQueue : IHostedService, IDisposable + public class BlobWorkerPump : IHostedService, IDisposable { private readonly Channel<BlobWriteOperation>[] _workerChannels; private readonly List<Task> _workers; private readonly CancellationTokenSource _shutdownCts; private readonly CancellationTokenSource _metricsLoopCts; - private readonly ILogger<BlobWriteQueue> _logger; + private readonly SemaphoreSlim _lifecycleLock = new(1, 1); + private readonly ILogger<BlobWorkerPump> _logger; private readonly BlobWriteQueueOptions _options; - private readonly BlobWriter _blobWriter; + private readonly IBlobWriter _blobWriter; // Metrics private long _operationsQueued = 0; private long _operationsCompleted = 0; private long _operationsFailed = 0; + private long _operationsDeduplicated = 0; + private long _operationsInFlight = 0; private long _batchesExecuted = 0; private long _totalQueueTimeMs = 0; private long _totalProcessTimeMs = 0; private volatile bool _isShuttingDown = false; + private bool _isStarted = false; + private Task? _stopTask; + + /// <summary> + /// Gets the total queue depth across all worker channels. + /// Can be used by health checks to monitor queue pressure. + /// </summary> + public int QueueDepth => (int)_workerChannels.Sum(ch => ch.Reader.Count); - public BlobWriteQueue( - BlobWriter blobWriter, + public BlobWorkerPump( + IBlobWriterFactory blobWriterFactory, BlobWriteQueueOptions options, - ILogger<BlobWriteQueue> logger) + ILogger<BlobWorkerPump> logger) { - _blobWriter = blobWriter ?? throw new ArgumentNullException(nameof(blobWriter)); + _blobWriter = blobWriterFactory?.CreateBlobWriter() ?? throw new ArgumentNullException(nameof(blobWriterFactory)); _options = options ?? throw new ArgumentNullException(nameof(options)); _logger = logger ?? 
throw new ArgumentNullException(nameof(logger)); _shutdownCts = new CancellationTokenSource(); @@ -149,7 +159,7 @@ public BlobWriteQueue( } _logger.LogInformation( - "[BlobWriteQueue] Initialized - Workers: {Workers}, MaxQueue: {MaxQueue}, Batching: {Batching}, " + + "[BlobWr-Q] Initialized - Workers: {Workers}, MaxQueue: {MaxQueue}, Batching: {Batching}, " + "BatchSize: {BatchSize}, BatchWait: {BatchWait}ms", _options.WorkerCount, _options.MaxQueueSize == 0 ? "Unbounded" : _options.MaxQueueSize.ToString(), @@ -181,59 +191,86 @@ public async Task EnqueueAsync(BlobWriteOperation operation, CancellationT try { var workerId = GetWorkerForBlob(operation.ContainerName, operation.BlobName); - - // Back-pressure: graduated delays based on queue depth to slow down producers - var queueDepth = _workerChannels[workerId].Reader.Count; - int delayMs = queueDepth switch - { - >= 150 => 300, - >= 100 => 200, - >= 50 => 100, - _ => 0 - }; - - if (delayMs > 0) - { - _logger.LogDebug("[BlobWriteQueue] Back-pressure: queue depth {Depth} - delaying {Delay}ms", queueDepth, delayMs); - await Task.Delay(delayMs, cancellationToken).ConfigureAwait(false); - } - await _workerChannels[workerId].Writer.WriteAsync(operation, cancellationToken).ConfigureAwait(false); Interlocked.Increment(ref _operationsQueued); _logger.LogTrace( - "[BlobWriteQueue] Enqueued {OperationId} to Worker-{WorkerId} - Container: {Container}, Blob: {Blob}, Size: {Size}B", + "[BlobWr-Q] Enqueued {OperationId} to Worker-{WorkerId} - Container: {Container}, Blob: {Blob}, Size: {Size}B", operation.OperationId, workerId, operation.ContainerName, operation.BlobName, operation.Data.Length); return true; } catch (Exception ex) { - _logger.LogError(ex, "[BlobWriteQueue] Failed to enqueue operation {OperationId}", operation.OperationId); + _logger.LogError(ex, "[BlobWr-Q] Failed to enqueue operation {OperationId}", operation.OperationId); operation.SetException(ex); return false; } } - public Task StartAsync(CancellationToken 
cancellationToken) + public async Task StartAsync(CancellationToken cancellationToken) { - _logger.LogInformation("[BlobWriteQueue] Starting {WorkerCount} workers", _options.WorkerCount); - - for (int i = 0; i < _options.WorkerCount; i++) + if (_isStarted || _stopTask is not null) { - int workerId = i; - _workers.Add(Task.Run(() => WorkerLoop(workerId, _shutdownCts.Token), _shutdownCts.Token)); + return; } - _workers.Add(Task.Run(() => MetricsLoop(_metricsLoopCts.Token), _metricsLoopCts.Token)); + await _lifecycleLock.WaitAsync(cancellationToken).ConfigureAwait(false); + try + { + if (_isStarted || _stopTask is not null) + { + return; + } + + _logger.LogInformation("[BlobWr-Q] Starting {WorkerCount} workers", _options.WorkerCount); - return Task.CompletedTask; + for (int i = 0; i < _options.WorkerCount; i++) + { + int workerId = i; + _workers.Add(Task.Run(() => WorkerLoop(workerId, _shutdownCts.Token), _shutdownCts.Token)); + } + + _workers.Add(Task.Run(() => MetricsLoop(_metricsLoopCts.Token), _metricsLoopCts.Token)); + _isStarted = true; + } + finally + { + _lifecycleLock.Release(); + } } public async Task StopAsync(CancellationToken cancellationToken) { - _logger.LogInformation("[BlobWriteQueue] Stopping..."); - + Task? 
stopTask; + + await _lifecycleLock.WaitAsync(cancellationToken).ConfigureAwait(false); + try + { + if (_stopTask is not null) + { + stopTask = _stopTask; + } + else if (!_isStarted) + { + return; + } + else + { + _stopTask = StopCoreAsync(); + stopTask = _stopTask; + } + } + finally + { + _lifecycleLock.Release(); + } + + await stopTask.ConfigureAwait(false); + } + + private async Task StopCoreAsync() + { // Signal shutdown to MetricsLoop (will increase frequency) _isShuttingDown = true; @@ -249,7 +286,7 @@ public async Task StopAsync(CancellationToken cancellationToken) // DO NOT cancel _shutdownCts - let ALL blob operations complete var shutdownTimeout = TimeSpan.FromSeconds(60); // Allow time for blob operations to complete - _logger.LogInformation("[BlobWriteQueue] Waiting for workers to complete (timeout: {Timeout}s)...", shutdownTimeout.TotalSeconds); + _logger.LogInformation("[BlobWr-Q] ⏳ Waiting for blob workers to complete (timeout: {Timeout}s)...", shutdownTimeout.TotalSeconds); try { @@ -263,7 +300,7 @@ public async Task StopAsync(CancellationToken cancellationToken) if (completedTask != workerTask) { - _logger.LogWarning("[BlobWriteQueue] Shutdown timeout reached - {Timeout}s", shutdownTimeout.TotalSeconds); + _logger.LogWarning("[BlobWr-Q] ❌ Shutdown timeout reached - {Timeout}s", shutdownTimeout.TotalSeconds); // DO NOT cancel _shutdownCts - we need blob operations to complete // Just wait a bit more for cleanup try @@ -276,7 +313,6 @@ public async Task StopAsync(CancellationToken cancellationToken) { // Workers completed normally await workerTask.ConfigureAwait(false); - _logger.LogDebug("[BlobWriteQueue] All workers completed gracefully"); } // NOW stop MetricsLoop (it was last to run) @@ -291,21 +327,50 @@ public async Task StopAsync(CancellationToken cancellationToken) } catch (OperationCanceledException) { - _logger.LogDebug("[BlobWriteQueue] Shutdown cancelled"); + _logger.LogDebug("[BlobWr-Q] Shutdown cancelled"); } catch (Exception ex) { - 
_logger.LogWarning(ex, "[BlobWriteQueue] Error during shutdown"); + _logger.LogWarning(ex, "[BlobWr-Q] Error during shutdown"); + } + + // Diagnostic: Collect any unflushed blob names still in channels + var unflushedBlobs = new List<string>(); + try + { + foreach (var channel in _workerChannels) + { + while (channel.Reader.TryRead(out var operation)) + { + unflushedBlobs.Add($"{operation.ContainerName}/{operation.BlobName}"); + } + } + } + catch { /* Safely ignore any channel read errors */ } + + // Log diagnostic info about unflushed blobs + if (unflushedBlobs.Count > 0) + { + _logger.LogWarning("[BlobWr-Q] ⚠️ Found {UnflushedCount} unflushed blob operations at shutdown:", unflushedBlobs.Count); + for (int i = 0; i < unflushedBlobs.Count && i < 50; i++) // Limit to first 50 to avoid log spam + { + _logger.LogWarning("[BlobWr-Q] - {BlobName}", unflushedBlobs[i]); + } + if (unflushedBlobs.Count > 50) + { + _logger.LogWarning("[BlobWr-Q] ... and {RemainingCount} more", unflushedBlobs.Count - 50); + } } var avgQueueTime = _operationsCompleted > 0 ? _totalQueueTimeMs / _operationsCompleted : 0; var avgProcessTime = _operationsCompleted > 0 ? 
_totalProcessTimeMs / _operationsCompleted : 0; _logger.LogInformation( - "[BlobWriteQueue] Stopped - Queued: {Queued}, Completed: {Completed}, Failed: {Failed}, " + - "Batches: {Batches}, AvgQueueTime: {AvgQueue}ms, AvgProcessTime: {AvgProcess}ms", - _operationsQueued, _operationsCompleted, _operationsFailed, _batchesExecuted, + "[BlobWr-Q] ⏹ Stopped Σ Q={Queued} C={Completed} D={Dedup} Fail={Failed} B={Batches} ║ avg q/p {AvgQueue}/{AvgProcess} ms", + _operationsQueued, _operationsCompleted, _operationsDeduplicated, _operationsFailed, _batchesExecuted, avgQueueTime, avgProcessTime); + + _isStarted = false; } private async Task WorkerLoop(int workerId, CancellationToken cancellationToken) @@ -319,6 +384,7 @@ private async Task WorkerLoop(int workerId, CancellationToken cancellationToken) await foreach (var operation in _workerChannels[workerId].Reader.ReadAllAsync(cancellationToken)) { + Interlocked.Increment(ref _operationsInFlight); try { if (_options.EnableBatching) @@ -345,6 +411,7 @@ await ProcessSingleOperationAsync(operation, workerId, cancellationToken) }); Interlocked.Increment(ref _operationsFailed); + Interlocked.Decrement(ref _operationsInFlight); } } @@ -372,15 +439,14 @@ private async Task ProcessSingleOperationAsync( try { - // BlobWriter already caches container clients, no need to init - var stream = await _blobWriter.CreateBlobAndGetOutputStreamAsync( + // Queue is small-blob-only: payload is fully materialized in memory, so use + // BlockBlobClient.UploadAsync (1 round-trip). Large/streamed payloads bypass + // the queue via AsyncStreamingStore → BlobClient.OpenWriteAsync. 
+ await _blobWriter.UploadBlobAsync( operation.ContainerName, - operation.BlobName) - .ConfigureAwait(false); - - await stream.WriteAsync(operation.Data, cancellationToken).ConfigureAwait(false); - await stream.FlushAsync(cancellationToken).ConfigureAwait(false); - await stream.DisposeAsync().ConfigureAwait(false); + operation.BlobName, + operation.Data, + cancellationToken).ConfigureAwait(false); sw.Stop(); @@ -392,6 +458,7 @@ private async Task ProcessSingleOperationAsync( }); Interlocked.Increment(ref _operationsCompleted); + Interlocked.Decrement(ref _operationsInFlight); Interlocked.Add(ref _totalQueueTimeMs, (long)queueTime.TotalMilliseconds); Interlocked.Add(ref _totalProcessTimeMs, sw.ElapsedMilliseconds); @@ -415,6 +482,7 @@ private async Task ProcessSingleOperationAsync( }); Interlocked.Increment(ref _operationsFailed); + Interlocked.Decrement(ref _operationsInFlight); } } @@ -448,6 +516,7 @@ private async Task ProcessWithBatchingAsync( { if (_workerChannels[workerId].Reader.TryRead(out var nextOperation)) { + Interlocked.Increment(ref _operationsInFlight); if (nextOperation.ContainerName == containerName) { batchBuffer.Add(nextOperation); @@ -478,6 +547,9 @@ await ProcessSingleOperationAsync(nextOperation, workerId, cancellationToken) // Single operation, no batching benefit await ProcessSingleOperationAsync(batchBuffer[0], workerId, cancellationToken).ConfigureAwait(false); } + + // Clear so the shutdown flush in WorkerLoop doesn't re-execute these already-completed ops + batchBuffer.Clear(); } private async Task ExecuteBatchAsync( @@ -526,7 +598,9 @@ private async Task ExecuteBatchAsync( QueueTime = DateTime.UtcNow - dupOp.EnqueuedAt }); - Interlocked.Increment(ref _operationsCompleted); + // Counted as Dedup only (disjoint from Completed) + Interlocked.Increment(ref _operationsDeduplicated); + Interlocked.Decrement(ref _operationsInFlight); } } } @@ -544,14 +618,12 @@ private async Task ExecuteBatchAsync( try { - var stream = await 
_blobWriter.CreateBlobAndGetOutputStreamAsync( + // Queue is small-blob-only: 1-RT UploadAsync for fully-materialized payloads. + await _blobWriter.UploadBlobAsync( operation.ContainerName, - operation.BlobName) - .ConfigureAwait(false); - - await stream.WriteAsync(operation.Data, cancellationToken).ConfigureAwait(false); - await stream.FlushAsync(cancellationToken).ConfigureAwait(false); - await stream.DisposeAsync().ConfigureAwait(false); + operation.BlobName, + operation.Data, + cancellationToken).ConfigureAwait(false); opSw.Stop(); @@ -563,6 +635,7 @@ private async Task ExecuteBatchAsync( }); Interlocked.Increment(ref _operationsCompleted); + Interlocked.Decrement(ref _operationsInFlight); Interlocked.Add(ref _totalQueueTimeMs, (long)queueTime.TotalMilliseconds); Interlocked.Add(ref _totalProcessTimeMs, opSw.ElapsedMilliseconds); } @@ -585,6 +658,7 @@ private async Task ExecuteBatchAsync( }); Interlocked.Increment(ref _operationsFailed); + Interlocked.Decrement(ref _operationsInFlight); } }); @@ -620,6 +694,7 @@ private async Task ExecuteBatchAsync( }); Interlocked.Increment(ref _operationsFailed); + Interlocked.Decrement(ref _operationsInFlight); } } } @@ -628,6 +703,7 @@ private async Task ExecuteBatchAsync( private long _lastQueued = 0; private long _lastCompleted = 0; private long _lastFailed = 0; + private long _lastDeduplicated = 0; private long _lastBatches = 0; private async Task MetricsLoop(CancellationToken cancellationToken) @@ -638,7 +714,7 @@ private async Task MetricsLoop(CancellationToken cancellationToken) { // During shutdown, log more frequently to show progress var delay = _isShuttingDown - ? TimeSpan.FromSeconds(2) + ? 
TimeSpan.FromSeconds(.5) : TimeSpan.FromSeconds(_options.MetricsIntervalSeconds); await Task.Delay(delay, cancellationToken) @@ -648,43 +724,52 @@ await Task.Delay(delay, cancellationToken) var queued = Interlocked.Read(ref _operationsQueued); var completed = Interlocked.Read(ref _operationsCompleted); var failed = Interlocked.Read(ref _operationsFailed); + var deduplicated = Interlocked.Read(ref _operationsDeduplicated); var batches = Interlocked.Read(ref _batchesExecuted); // Calculate deltas since last report var deltaQueued = queued - _lastQueued; var deltaCompleted = completed - _lastCompleted; var deltaFailed = failed - _lastFailed; + var deltaDeduplicated = deduplicated - _lastDeduplicated; var deltaBatches = batches - _lastBatches; - // Skip if nothing happened since last report - if (deltaQueued == 0 && deltaCompleted == 0 && deltaFailed == 0) + var remaining = _workerChannels.Sum(ch => ch.Reader.Count); + var inFlight = Interlocked.Read(ref _operationsInFlight); + + // Suppress redundant lines: if no counters moved AND nothing is queued or + // in flight, the snapshot is identical to the last one — don't log it. + if (deltaQueued == 0 && deltaCompleted == 0 && deltaFailed == 0 + && deltaDeduplicated == 0 && deltaBatches == 0 + && remaining == 0 && inFlight == 0) + { continue; + } // Update snapshots _lastQueued = queued; _lastCompleted = completed; _lastFailed = failed; + _lastDeduplicated = deduplicated; _lastBatches = batches; - var remaining = _workerChannels.Sum(ch => ch.Reader.Count); var avgQueueTime = completed > 0 ? _totalQueueTimeMs / completed : 0; var avgProcessTime = completed > 0 ? 
_totalProcessTimeMs / completed : 0; + // DIAGNOSTIC: Log queue depth + in-flight (dequeued but not yet completed) if (failed > 0) { _logger.LogWarning( - "[BlobWriteQueue] Δ Queued: +{DeltaQueued}, Completed: +{DeltaCompleted}, Failed: +{DeltaFailed} (total: {TotalFailed}), " + - "Batches: +{DeltaBatches} | Remaining: {Remaining}, AvgQueue: {AvgQueue}ms, AvgProcess: {AvgProcess}ms", - deltaQueued, deltaCompleted, deltaFailed, failed, - deltaBatches, remaining, avgQueueTime, avgProcessTime); + "[BlobWr-Q] Δ Q+{DeltaQueued} C+{DeltaCompleted} Dup+{DeltaDeduplicated} Fail+{DeltaFailed} Bch+{DeltaBatches} (failed total: {TotalFailed}) ║ depth {Remaining} / inflight {InFlight} ║ avg q/p {AvgQueue}/{AvgProcess} ms", + deltaQueued, deltaCompleted, deltaDeduplicated, deltaFailed, deltaBatches, failed, + remaining, inFlight, avgQueueTime, avgProcessTime); } else { _logger.LogInformation( - "[BlobWriteQueue] Δ Queued: +{DeltaQueued}, Completed: +{DeltaCompleted}, " + - "Batches: +{DeltaBatches} | Remaining: {Remaining}, AvgQueue: {AvgQueue}ms, AvgProcess: {AvgProcess}ms", - deltaQueued, deltaCompleted, - deltaBatches, remaining, avgQueueTime, avgProcessTime); + "[BlobWr-Q] Δ Q+{DeltaQueued} C+{DeltaCompleted} Dup+{DeltaDeduplicated} Bch+{DeltaBatches} ║ depth {Remaining} / inflight {InFlight} ║ avg q/p {AvgQueue}/{AvgProcess} ms", + deltaQueued, deltaCompleted, deltaDeduplicated, deltaBatches, + remaining, inFlight, avgQueueTime, avgProcessTime); } } catch (OperationCanceledException) @@ -698,6 +783,7 @@ public void Dispose() { _shutdownCts?.Dispose(); _metricsLoopCts?.Dispose(); + _lifecycleLock.Dispose(); GC.SuppressFinalize(this); } } diff --git a/src/SimpleL7Proxy/Async/BlobStorage/BlobWriter.cs b/src/SimpleL7Proxy/Async/BlobStorage/BlobWriter.cs deleted file mode 100644 index e76d8349..00000000 --- a/src/SimpleL7Proxy/Async/BlobStorage/BlobWriter.cs +++ /dev/null @@ -1,404 +0,0 @@ -using Azure.Storage; -using Azure.Storage.Blobs; -using Azure.Storage.Blobs.Specialized; 
-using Azure.Storage.Sas; -using Azure; -using Microsoft.Extensions.Hosting; -using Microsoft.Extensions.Logging; -using Microsoft.Extensions.Options; -using System.IO; -using System.Threading.Tasks; -using System.Collections.Concurrent; - -// Review DISPOSAL_ARCHITECTURE.MD in the root for details on disposal flow - -namespace SimpleL7Proxy.Async.BlobStorage -{ - /// - /// Provides methods for writing to Azure Blob Storage. - /// - public class BlobWriter : IBlobWriter, IDisposable - { - private static readonly ConcurrentDictionary _containerClients = new(); - //private readonly BlobContainerClient _containerClient = null!; - - private readonly BlobServiceClient _blobServiceClient; - private readonly ILogger _logger; - - public bool UsesMI { get; set; } - - public bool IsInitialized => _blobServiceClient != null; - private bool _disposed = false; - - - /// - /// Initializes a new instance of the class. - /// - /// The blob service client. - /// The logger instance. - public BlobWriter(BlobServiceClient blobServiceClient, ILogger logger) - { - _blobServiceClient = blobServiceClient ?? throw new ArgumentNullException(nameof(blobServiceClient)); - _logger = logger ?? throw new ArgumentNullException(nameof(logger)); - _logger.LogDebug("Starting BlobWriter service"); - } - - - public async Task InitClientAsync(string userId, string containerName) - { - - if (string.IsNullOrEmpty(userId)) - { - _logger.LogWarning("UserId cannot be null or empty"); - return false; - } - - if (string.IsNullOrEmpty(containerName)) - { - _logger.LogWarning("ContainerName cannot be null or empty for userId: {UserId}", userId); - return false; - } - // Check if the client for this userId already exists - // Should we check if the writer is valid ? 
- if (_containerClients.ContainsKey(userId)) - { - // Client already exists, no need to create a new one - _logger.LogDebug("BlobWriter: Client already initialized for UserId: {UserId}, BlobContainerName: {BlobContainerName}", userId, containerName); - return true; - } - _logger.LogDebug("BlobWriter: Initializing for UserId: {UserId}, BlobContainerName: {BlobContainerName}", userId, containerName); - - try - { - var client = _blobServiceClient.GetBlobContainerClient(containerName); - // Ensure container exists - await client.CreateIfNotExistsAsync().ConfigureAwait(false); - - if (_containerClients.TryAdd(userId, client)) - { - // Successfully added the client to the dictionary - return true; - } - } - catch (Exception ex) - { - - throw new BlobWriterException($"Failed to initialize BlobContainerClient for userId: {userId}, containerName: {containerName}", ex) - { - Operation = "InitClientAsync: CreateIfNotExistsAsync", - ContainerName = containerName, - UserId = userId - }; - // Log the exception or handle it as needed - //Console.WriteLine($"Error initializing BlobContainerClient for userId {userId}: {ex.Message}"); - - } - - return false; - } - - /// - /// Creates the blob container if it does not exist and returns an output stream for the specified blob. - /// - /// The name of the blob. - /// A writable stream to the blob. - public async Task CreateBlobAndGetOutputStreamAsync(string userId, string blobName) - { - //_logger.LogTrace($"[BLOB-TRACE] CreateBlobAndGetOutputStreamAsync | Container: {userId} | Blob: {blobName} | Thread: {System.Threading.Thread.CurrentThread.ManagedThreadId} | Time: {DateTime.UtcNow:yyyy-MM-dd HH:mm:ss.fff}"); - - // Get the client for the userId - if (!_containerClients.TryGetValue(userId, out var _containerClient)) - { - throw new BlobWriterException($"BlobContainerClient not initialized for userId: {userId}. 
Call InitializeClientAsync first.") - { - Operation = "CreateBlobAndGetOutputStreamAsync", - BlobName = blobName, - UserId = userId - }; - } - - await _containerClient.CreateIfNotExistsAsync().ConfigureAwait(false); - var blobClient = _containerClient.GetBlobClient(blobName); - - _logger.LogDebug("BlobWriter: Creating blob {ContainerName}/{BlobName} for user {UserId}", _containerClient.Name, blobName, userId); - - // Retry logic for 409 conflicts (concurrent writes) - const int maxRetries = 3; - const int baseDelayMs = 100; - - for (int attempt = 0; attempt < maxRetries; attempt++) - { - try - { - //_logger.LogTrace($"[BLOB-TRACE] WRITE-START | Container: {userId} | Blob: {blobName} | Attempt: {attempt + 1}/{maxRetries} | Thread: {System.Threading.Thread.CurrentThread.ManagedThreadId} | Time: {DateTime.UtcNow:yyyy-MM-dd HH:mm:ss.fff}"); - - // OpenWriteAsync will create the blob if it does not exist and return a writable stream. - var stream = await blobClient.OpenWriteAsync(overwrite: true).ConfigureAwait(false); - - //_logger.LogTrace($"[BLOB-TRACE] WRITE-SUCCESS | Container: {userId} | Blob: {blobName} | Thread: {System.Threading.Thread.CurrentThread.ManagedThreadId} | Time: {DateTime.UtcNow:yyyy-MM-dd HH:mm:ss.fff}"); - return stream; - } - catch (Azure.RequestFailedException ex) when (ex.Status == 409 && attempt < maxRetries - 1) - { - // 409 = Conflict - blob is likely being written by another process - var delay = baseDelayMs * (int)Math.Pow(2, attempt); // Exponential backoff - - _logger.LogWarning($"[BLOB-TRACE] 409-CONFLICT | Container: {userId} | Blob: {blobName} | Attempt: {attempt + 1}/{maxRetries} | ErrorCode: {ex.ErrorCode} | Message: {ex.Message} | Thread: {System.Threading.Thread.CurrentThread.ManagedThreadId} | Time: {DateTime.UtcNow:yyyy-MM-dd HH:mm:ss.fff}"); - _logger.LogWarning($"[BLOB-TRACE] 409-STACK | {ex.StackTrace}"); - - _logger.LogWarning("BlobWriter: Blob conflict (409) for {BlobName}, attempt {Attempt}/{MaxRetries} - retrying in 
{Delay}ms", - blobName, attempt + 1, maxRetries, delay); - - _logger.LogWarning($"[BLOB-TRACE] 409-RETRY | Container: {userId} | Blob: {blobName} | DelayMs: {delay} | Time: {DateTime.UtcNow:yyyy-MM-dd HH:mm:ss.fff}"); - await Task.Delay(delay).ConfigureAwait(false); - } - catch (Exception ex) - { - _logger.LogError($"[BLOB-TRACE] ERROR | Container: {userId} | Blob: {blobName} | Error: {ex.GetType().Name} | Message: {ex.Message} | Thread: {System.Threading.Thread.CurrentThread.ManagedThreadId} | Time: {DateTime.UtcNow:yyyy-MM-dd HH:mm:ss.fff}"); - throw; - } - } - - // If we get here, all retries failed - try one last time and let any exception propagate - _logger.LogWarning($"[BLOB-TRACE] 409-FAILED | Container: {userId} | Blob: {blobName} | AllRetriesExhausted | Time: {DateTime.UtcNow:yyyy-MM-dd HH:mm:ss.fff}"); - return await blobClient.OpenWriteAsync(overwrite: true).ConfigureAwait(false); - } - - public async Task BlobExistsAsync(string userId, string blobName) - { - // Get the client for the userId - if (!_containerClients.TryGetValue(userId, out var _containerClient)) - { - throw new BlobWriterException($"BlobContainerClient not initialized for userId: {userId}. Call InitializeClientAsync first.") - { - Operation = "CreateBlobAndGetOutputStreamAsync", - BlobName = blobName, - UserId = userId - }; - } - var blobClient = _containerClient.GetBlobClient(blobName); - return await blobClient.ExistsAsync().ConfigureAwait(false); - } - - - public async Task ReadBlobAsStreamAsync(string userId, string blobName) - { - //_logger.LogTrace($"[BLOB-TRACE] READ-START | Container: {userId} | Blob: {blobName} | Thread: {System.Threading.Thread.CurrentThread.ManagedThreadId} | Time: {DateTime.UtcNow:yyyy-MM-dd HH:mm:ss.fff}"); - - // Get the client for the userId - if (!_containerClients.TryGetValue(userId, out var _containerClient)) - { - throw new BlobWriterException($"BlobContainerClient not initialized for userId: {userId}. 
Call InitializeClientAsync first.") - { - Operation = "ReadBlobAsStreamAsync", - BlobName = blobName, - UserId = userId - }; - } - - try - { - var blobClient = _containerClient.GetBlobClient(blobName); - var stream = await blobClient.OpenReadAsync().ConfigureAwait(false); - - //_logger.LogTrace($"[BLOB-TRACE] READ-SUCCESS | Container: {userId} | Blob: {blobName} | Thread: {System.Threading.Thread.CurrentThread.ManagedThreadId} | Time: {DateTime.UtcNow:yyyy-MM-dd HH:mm:ss.fff}"); - return stream; - } - catch (Exception ex) - { - _logger.LogError($"[BLOB-TRACE] READ-ERROR | Container: {userId} | Blob: {blobName} | Error: {ex.GetType().Name} | Message: {ex.Message} | Thread: {System.Threading.Thread.CurrentThread.ManagedThreadId} | Time: {DateTime.UtcNow:yyyy-MM-dd HH:mm:ss.fff}"); - - throw new BlobWriterException($"Failed to read blob as stream for userId: {userId}, blobName: {blobName}", ex) - { - Operation = "ReadBlobAsStreamAsync", - BlobName = blobName, - UserId = userId - }; - } - } - - - public async Task DeleteBlobAsync(string userId, string blobName) - { - if (string.IsNullOrEmpty(userId)) - { - _logger.LogWarning("UserId cannot be null or empty"); - return false; - } - - if (string.IsNullOrEmpty(blobName)) - { - _logger.LogWarning("BlobName cannot be null or empty for userId: {UserId}", userId); - return false; - } - - // Get the client for the userId - if (!_containerClients.TryGetValue(userId, out var _containerClient)) - { - throw new InvalidOperationException($"BlobContainerClient not initialized for userId: {userId}. Call InitializeClientAsync first."); - } - - var blobClient = _containerClient.GetBlobClient(blobName); - return await blobClient.DeleteIfExistsAsync().ConfigureAwait(false); - } - - /// - /// Generates a SAS token for the specified blob. - /// - /// The user ID. - /// The name of the blob. - /// The expiry time for the SAS token. - /// The SAS token URL for the blob. 
- public async Task GenerateSasTokenAsync(string userId, string blobName, TimeSpan expiryTime) - { - if (string.IsNullOrEmpty(blobName)) - { - throw new ArgumentException("BlobName cannot be null or empty", nameof(blobName)); - } - - // Get the client for the userId - if (!_containerClients.TryGetValue(userId, out var _containerClient)) - { - throw new BlobWriterException($"BlobContainerClient not initialized for userId: {userId}. Call InitializeClientAsync first.") - { - Operation = "GenerateSasTokenAsync", - BlobName = blobName, - UserId = userId - }; - } - - try - { - var blobClient = _containerClient.GetBlobClient(blobName); - var sasBuilder = new BlobSasBuilder - { - BlobContainerName = _containerClient.Name, - BlobName = blobName, - Resource = "b", - StartsOn = DateTimeOffset.UtcNow.AddMinutes(-5), // Start 5 minutes ago to account for clock skew - ExpiresOn = DateTimeOffset.UtcNow.Add(expiryTime) - }; - sasBuilder.SetPermissions(BlobSasPermissions.Read | BlobSasPermissions.Delete); - - if (UsesMI) - { - // Get a user delegation key for the Blob service that's valid for 1 hour - var delegationKeyStartTime = DateTimeOffset.UtcNow; - var delegationKeyExpiryTime = delegationKeyStartTime.Add(TimeSpan.FromHours(1)); - - _logger.LogDebug("Requesting user delegation key for SAS token generation"); - var userDelegationKey = await _blobServiceClient - .GetUserDelegationKeyAsync(delegationKeyStartTime, delegationKeyExpiryTime) - .ConfigureAwait(false); - - // Generate the SAS token using the user delegation key - var sasQueryParameters = sasBuilder.ToSasQueryParameters(userDelegationKey.Value, _blobServiceClient.AccountName); - - // Construct the full SAS URI - var blobUriBuilder = new BlobUriBuilder(blobClient.Uri) - { - Sas = sasQueryParameters - }; - - var sasUri = blobUriBuilder.ToUri(); - _logger.LogDebug("Successfully generated user delegation SAS token for blob {BlobName}", blobName); - return sasUri.ToString(); - - } - else - { - // Check if we can use account 
SAS (when using connection string) - if (blobClient.CanGenerateSasUri) - { - var sasUri = blobClient.GenerateSasUri(sasBuilder); - _logger.LogDebug("Successfully generated account SAS token for blob {BlobName}", blobName); - return sasUri.ToString(); - } - else - { - throw new InvalidOperationException("Cannot generate SAS token. Either enable managed identity (UsesMI=true) or provide a connection string with account keys."); - } - } - } - catch (Exception ex) - { - _logger.LogError(ex, "Failed to generate SAS token for blob {BlobName} in container {ContainerName}", blobName, _containerClient.Name); - throw new BlobWriterException($"Failed to generate SAS token for blob {blobName} in container {_containerClient.Name}", ex) - { - Operation = "GenerateSasTokenAsync", - BlobName = blobName, - ContainerName = _containerClient.Name, - UserId = userId - }; - } - } - - /// - /// Gets the base URI for a blob without SAS token. - /// - /// The user ID. - /// The name of the blob. - /// The base URI of the blob. - public string GetBlobUri(string userId, string blobName) - { - if (string.IsNullOrEmpty(blobName)) - { - throw new ArgumentException("BlobName cannot be null or empty", nameof(blobName)); - } - - // Get the client for the userId - if (!_containerClients.TryGetValue(userId, out var _containerClient)) - { - throw new BlobWriterException($"BlobContainerClient not initialized for userId: {userId}. Call InitializeClientAsync first.") - { - Operation = "GetBlobUri", - BlobName = blobName, - UserId = userId - }; - } - - var blobClient = _containerClient.GetBlobClient(blobName); - return blobClient.Uri.ToString(); - } - - /// - /// Gets connection information for health check and diagnostics. - /// - /// A string describing the blob storage connection configuration. 
- public string GetConnectionInfo() - { - if (_blobServiceClient == null) - { - return "Not Initialized"; - } - - if (UsesMI) - { - return $"MI: {_blobServiceClient.Uri.Host}"; - } - else - { - return $"ConnectionString: {_blobServiceClient.Uri.Host}"; - } - } - - public void Dispose() - { - Dispose(true); - GC.SuppressFinalize(this); - } - - protected virtual void Dispose(bool disposing) - { - if (!_disposed) - { - if (disposing) - { - } - _disposed = true; - } - } - - - } -} \ No newline at end of file diff --git a/src/SimpleL7Proxy/Async/BlobStorage/BlobWriterException.cs b/src/SimpleL7Proxy/Async/BlobStorage/BlobWriterException.cs index 499257b7..8c8995b2 100644 --- a/src/SimpleL7Proxy/Async/BlobStorage/BlobWriterException.cs +++ b/src/SimpleL7Proxy/Async/BlobStorage/BlobWriterException.cs @@ -17,7 +17,6 @@ public class BlobWriterException : Exception public string Guid { get; set; } = "N/A"; public string MID { get; set; } = "N/A"; public string Operation { get; set; } = "N/A"; - public string UserId { get; set; } = "N/A"; public BlobWriterException(string message) : base(message) { } public BlobWriterException(string message, Exception innerException) : base(message, innerException) { } diff --git a/src/SimpleL7Proxy/Async/BlobStorage/BlobWriterFactory.cs b/src/SimpleL7Proxy/Async/BlobStorage/BlobWriterFactory.cs index 3f52a938..67cbb48d 100644 --- a/src/SimpleL7Proxy/Async/BlobStorage/BlobWriterFactory.cs +++ b/src/SimpleL7Proxy/Async/BlobStorage/BlobWriterFactory.cs @@ -1,4 +1,5 @@ using System.Reflection.Metadata.Ecma335; +using Azure.Core; using Azure.Storage.Blobs; using Microsoft.Extensions.Logging; using Microsoft.Extensions.Options; @@ -20,13 +21,18 @@ public class BlobWriterFactory : IBlobWriterFactory { private readonly DefaultCredential _defaultCredential; private readonly IOptionsMonitor _optionsMonitor; - private readonly ILogger _logger; + private readonly ILogger _logger; private readonly ILogger _nullBlobWriterLogger; + // Shared 
BlobServiceClient — owns the HTTP connection pool. Created once on first call + // so that all IBlobWriter instances (QueuedBlobWriter, BlobWriteQueue, etc.) share the same pool. + private BlobServiceClient? _sharedBlobServiceClient; + private bool _usesMI; + public BlobWriterFactory( DefaultCredential defaultCredential, IOptionsMonitor optionsMonitor, - ILogger logger, + ILogger logger, ILogger nullBlobWriterLogger) { _defaultCredential = defaultCredential; @@ -54,7 +60,7 @@ public IBlobWriter CreateBlobWriter() else { InitStatus = $"MI, {uri}"; - return CreateBlobWriterWithManagedIdentity(uri); + return CreateBlobWriterFromSharedClient(() => CreateBlobServiceClientWithManagedIdentity(uri), useMI: true); } } else @@ -66,11 +72,10 @@ public IBlobWriter CreateBlobWriter() } else { - try { InitStatus = "CS"; - return CreateBlobWriterWithConnectionString(connectionString); + return CreateBlobWriterFromSharedClient(() => CreateBlobServiceClientWithConnectionString(connectionString), useMI: false); } catch (Exception ex) { @@ -82,47 +87,63 @@ public IBlobWriter CreateBlobWriter() return new NullBlobWriter(_nullBlobWriterLogger); } - private IBlobWriter CreateBlobWriterWithManagedIdentity(string storageAccountUri) + // Returns a BlobWriter backed by the shared BlobServiceClient, creating it on first call. 
+ private IBlobWriter CreateBlobWriterFromSharedClient(Func clientFactory, bool useMI) { - try + if (_sharedBlobServiceClient == null) { - Uri blobServiceUri; - - blobServiceUri = new Uri(storageAccountUri); + _sharedBlobServiceClient = clientFactory(); + _usesMI = useMI; + } + var writer = new AzureBlobWriter(_sharedBlobServiceClient, _logger, _optionsMonitor); + writer.UsesMI = _usesMI; + return writer; + } - // Use DefaultAzureCredential for managed identity + private BlobServiceClient CreateBlobServiceClientWithManagedIdentity(string storageAccountUri) + { + try + { + var blobServiceUri = new Uri(storageAccountUri); var credential = _defaultCredential.Credential; - var blobServiceClient = new BlobServiceClient(blobServiceUri, credential); - var blobWriter = new BlobWriter(blobServiceClient, _logger); - blobWriter.UsesMI = true; // Set on BlobWriter, not BlobServiceClient - //_logger.LogInformation("[STARTUP] ✓ BlobServiceClient created successfully with managed identity - URI: {Uri}", storageAccountUri); - - return blobWriter; + return new BlobServiceClient(blobServiceUri, credential, BuildClientOptions()); } catch (Exception ex) { _logger.LogError(ex, $"Failed to create BlobServiceClient with managed identity: {ex.Message}"); InitStatus = $"MI, error: {ex.Message}"; - return new NullBlobWriter(_nullBlobWriterLogger); + throw; } } - private IBlobWriter CreateBlobWriterWithConnectionString(string connectionString) + private BlobServiceClient CreateBlobServiceClientWithConnectionString(string connectionString) { try { - var blobServiceClient = new BlobServiceClient(connectionString); - var blobWriter = new BlobWriter(blobServiceClient, _logger); - blobWriter.UsesMI = false; // Set to false for connection string authentication - return blobWriter; + return new BlobServiceClient(connectionString, BuildClientOptions()); } catch (Exception ex) { _logger.LogError(ex, $"Failed to create BlobServiceClient with connection string: {ex.Message}"); InitStatus = $"CS, error: 
{ex.Message}";
-                return new NullBlobWriter(_nullBlobWriterLogger);
+                throw;
             }
         }
+
+        // Tightened retry / timeout policy for high-throughput small-blob writes.
+        // SDK defaults (3 retries, 800ms initial backoff, 60s max, 100s network timeout) are tuned
+        // for large-blob workloads and add seconds of latency on the first transient error. We
+        // shorten them so a stuck call fails fast and the BlobWriteQueue worker can move on.
+        private static BlobClientOptions BuildClientOptions()
+        {
+            var options = new BlobClientOptions();
+            options.Retry.MaxRetries = 3;
+            options.Retry.Mode = RetryMode.Exponential;
+            options.Retry.Delay = TimeSpan.FromMilliseconds(200);
+            options.Retry.MaxDelay = TimeSpan.FromSeconds(5);
+            options.Retry.NetworkTimeout = TimeSpan.FromSeconds(15);
+            return options;
+        }
     }
 }
\ No newline at end of file
diff --git a/src/SimpleL7Proxy/Async/BlobStorage/IBlobWriter.cs b/src/SimpleL7Proxy/Async/BlobStorage/IBlobWriter.cs
index 4834428b..15a35202 100644
--- a/src/SimpleL7Proxy/Async/BlobStorage/IBlobWriter.cs
+++ b/src/SimpleL7Proxy/Async/BlobStorage/IBlobWriter.cs
@@ -1,17 +1,31 @@
 namespace SimpleL7Proxy.Async.BlobStorage
 {
     /// <summary>
-    /// Interface for blob storage operations.
+    /// Interface for blob storage operations. The storage layer is user-agnostic — callers
+    /// resolve user → container before calling. The container is the unit of isolation.
     /// </summary>
-    public interface IBlobWriter
+    public interface IBlobWriter : IDisposable
     {
-        Task<Stream> CreateBlobAndGetOutputStreamAsync(string userId, string blobName);
-        Task<bool> BlobExistsAsync(string userId, string blobName);
-        Task<Stream> ReadBlobAsStreamAsync(string userId, string blobName);
-        Task<bool> DeleteBlobAsync(string userId, string blobName);
-        Task<string> GenerateSasTokenAsync(string userId, string blobName, TimeSpan expiryTime);
-        string GetBlobUri(string userId, string blobName);
-        Task<bool> InitClientAsync(string userId, string containerName);
+        Task<Stream> CreateBlobAndGetOutputStreamAsync(string containerName, string blobName, CancellationToken cancellationToken = default);
+
+        /// <summary>
+        /// Uploads a fully-materialized payload to the specified blob in a single PUT request
+        /// (1 round-trip). Prefer this over <see cref="CreateBlobAndGetOutputStreamAsync"/> when
+        /// the entire payload is already in memory.
+        /// </summary>
+        Task UploadBlobAsync(string containerName, string blobName, ReadOnlyMemory<byte> data, CancellationToken cancellationToken = default);
+
+        Task<bool> BlobExistsAsync(string containerName, string blobName);
+        Task<Stream> ReadBlobAsStreamAsync(string containerName, string blobName);
+        Task<bool> DeleteBlobAsync(string containerName, string blobName);
+        // Task<string> GenerateSasTokenAsync(string containerName, string blobName, TimeSpan expiryTime);
+        string GetBlobUri(string containerName, string blobName);
+
+        /// <summary>
+        /// Ensures the container exists. Idempotent and single-flight per container name.
+        /// </summary>
+        Task<bool> InitClientAsync(string containerName);
+
         bool IsInitialized { get; }
         string GetConnectionInfo();
     }
diff --git a/src/SimpleL7Proxy/Async/BlobStorage/NullBlobWriter.cs b/src/SimpleL7Proxy/Async/BlobStorage/NullBlobWriter.cs
index 194868e5..d91d1cbb 100644
--- a/src/SimpleL7Proxy/Async/BlobStorage/NullBlobWriter.cs
+++ b/src/SimpleL7Proxy/Async/BlobStorage/NullBlobWriter.cs
@@ -16,26 +16,32 @@ public NullBlobWriter(ILogger logger)
 
         public bool IsInitialized => false;
 
-        public Task<Stream> CreateBlobAndGetOutputStreamAsync(string userId, string blobName)
+        public Task<Stream> CreateBlobAndGetOutputStreamAsync(string containerName, string blobName, CancellationToken cancellationToken = default)
         {
             // Return a no-op stream (Stream.Null) instead of throwing
             // This allows async processing to work even when blob storage is disabled
             return Task.FromResult(Stream.Null);
         }
 
-        public Task<bool> BlobExistsAsync(string userId, string blobName)
+        public Task UploadBlobAsync(string containerName, string blobName, ReadOnlyMemory<byte> data, CancellationToken cancellationToken = default)
+        {
+            // Blob storage disabled: silently succeed
+            return Task.CompletedTask;
+        }
+
+        public Task<bool> BlobExistsAsync(string containerName, string blobName)
         {
             // Blob storage is disabled, so no blobs exist
             return Task.FromResult(false);
         }
 
-        public Task<bool> DeleteBlobAsync(string userId, string blobName)
+        public Task<bool> DeleteBlobAsync(string containerName, string blobName)
         {
             // Blob storage is disabled, deletion is a no-op (success)
             return Task.FromResult(true);
         }
 
-        public async Task<string> GenerateSasTokenAsync(string userId, string blobName, TimeSpan expiryTime)
+        public async Task<string> GenerateSasTokenAsync(string containerName, string blobName, TimeSpan expiryTime)
         {
             await Task.CompletedTask;
             // Return a placeholder SAS token instead of throwing
@@ -43,22 +49,22 @@ public async Task<string> GenerateSasTokenAsync(string userId, string blobName,
             return "null://blob-storage-disabled";
         }
 
-        public string GetBlobUri(string userId, string blobName)
+ public string GetBlobUri(string containerName, string blobName) { // Return a placeholder URI for disabled blob storage return "null://blob-storage-disabled"; } - public async Task InitClientAsync(string userId, string containerName) + public async Task InitClientAsync(string containerName) { - _logger.LogWarning("[BlobWriter:Null] InitClientAsync called - UserId: {UserId}, Container: {ContainerName} (NULL implementation active, blob storage disabled)", - userId, containerName); + _logger.LogWarning("[BlobWriter:Null] InitClientAsync called - Container: {ContainerName} (NULL implementation active, blob storage disabled)", + containerName); // Blob storage is not enabled, but this is a valid no-op implementation. await Task.CompletedTask; return true; // Return true to indicate successful initialization (even though it's a no-op) } - public Task ReadBlobAsStreamAsync(string userId, string blobName) + public Task ReadBlobAsStreamAsync(string containerName, string blobName) { // Return an empty stream instead of throwing return Task.FromResult(Stream.Null); @@ -68,5 +74,10 @@ public string GetConnectionInfo() { return "Disabled (NullBlobWriter)"; } + + public void Dispose() + { + // No-op: NullBlobWriter has no resources to release + } } } diff --git a/src/SimpleL7Proxy/Async/BlobStorage/QueuedBlobWriter.cs b/src/SimpleL7Proxy/Async/BlobStorage/QueuedBlobWriter.cs index 99c93503..ec4d922d 100644 --- a/src/SimpleL7Proxy/Async/BlobStorage/QueuedBlobWriter.cs +++ b/src/SimpleL7Proxy/Async/BlobStorage/QueuedBlobWriter.cs @@ -16,7 +16,7 @@ namespace SimpleL7Proxy.Async.BlobStorage internal class QueuedBlobStream : Stream { private readonly MemoryStream _buffer; - private readonly BlobWriteQueue _queue; + private readonly BlobWorkerPump _queue; private readonly string _containerName; private readonly string _blobName; private readonly ILogger _logger; @@ -24,7 +24,7 @@ internal class QueuedBlobStream : Stream private readonly List> _pendingWrites = new(); public 
QueuedBlobStream( - BlobWriteQueue queue, + BlobWorkerPump queue, string containerName, string blobName, ILogger logger) @@ -46,14 +46,14 @@ public override long Position set => throw new NotSupportedException(); } - public override void Flush() - { - // Synchronous flush - just ensure buffer is flushed - _buffer.Flush(); - } + // Stream.Flush is abstract; this is a no-op because writes go through FlushAsync + // which is the path that actually enqueues the buffered data. + public override void Flush() { } public override async Task FlushAsync(CancellationToken cancellationToken) { + HarvestCompletedPendingWrites(); + if (_disposed || _buffer.Length == 0) return; @@ -65,32 +65,12 @@ public override async Task FlushAsync(CancellationToken cancellationToken) _buffer.SetLength(0); _buffer.Position = 0; -#if TEST_BLOB_SHUTDOWN - // TEST: Enqueue 100 copies to test shutdown flushing behavior - for (int i = 0; i < 100; i++) - { - var operation = new BlobWriteOperation - { - ContainerName = _containerName, - BlobName = $"{_blobName}-{i}", - Data = new ReadOnlyMemory(data), - Priority = 0 - }; - - await _queue.EnqueueAsync(operation, cancellationToken).ConfigureAwait(false); - _pendingWrites.Add(operation.GetResultAsync()); - } - - _logger.LogTrace( - "[QueuedBlobStream] Enqueued 100 copies ({Size}B each) for {Container}/{Blob}", - data.Length, _containerName, _blobName); -#else + var operation = new BlobWriteOperation { ContainerName = _containerName, BlobName = _blobName, Data = new ReadOnlyMemory(data), - Priority = 0 }; await _queue.EnqueueAsync(operation, cancellationToken).ConfigureAwait(false); @@ -99,7 +79,7 @@ public override async Task FlushAsync(CancellationToken cancellationToken) _logger.LogTrace( "[QueuedBlobStream] Enqueued {Size}B for {Container}/{Blob}", data.Length, _containerName, _blobName); -#endif + } public override void Write(byte[] buffer, int offset, int count) @@ -133,6 +113,8 @@ public override async ValueTask WriteAsync(ReadOnlyMemory buffer, 
Cancella /// public async Task WaitForPendingWritesAsync(CancellationToken cancellationToken = default) { + HarvestCompletedPendingWrites(); + if (_pendingWrites.Count == 0) return; @@ -144,6 +126,17 @@ public async Task WaitForPendingWritesAsync(CancellationToken cancellationToken _pendingWrites.Clear(); } + private void HarvestCompletedPendingWrites() + { + for (int i = _pendingWrites.Count - 1; i >= 0; i--) + { + if (_pendingWrites[i].IsCompletedSuccessfully) + { + _pendingWrites.RemoveAt(i); + } + } + } + public override int Read(byte[] buffer, int offset, int count) => throw new NotSupportedException("QueuedBlobStream does not support reading."); @@ -160,11 +153,16 @@ protected override void Dispose(bool disposing) if (disposing) { - // Synchronously flush on dispose - this will queue the operation - // The caller should have called FlushAsync before disposing ideally + // Sync Dispose path: do NOT flush here. Flushing would block on + // _queue.EnqueueAsync via sync-over-async, risking thread-pool + // starvation under load. All production callers go through + // DisposeAsync (await using). Any data still in the buffer at + // this point is dropped — fail-fast for the misuse case. 
if (_buffer.Length > 0) { - FlushAsync(CancellationToken.None).GetAwaiter().GetResult(); + _logger.LogWarning( + "[QueuedBlobStream] Sync Dispose called with {Bytes}B unflushed for {Container}/{Blob}; data discarded — use await using instead.", + _buffer.Length, _containerName, _blobName); } _buffer.Dispose(); } @@ -198,75 +196,69 @@ public override async ValueTask DisposeAsync() public class QueuedBlobWriter : IBlobWriter { private readonly IBlobWriter _underlyingWriter; - private readonly BlobWriteQueue _queue; + private readonly BlobWorkerPump _queue; private readonly ILogger _logger; - private readonly bool _useQueueForWrites; public QueuedBlobWriter( - IBlobWriter underlyingWriter, - BlobWriteQueue queue, - ILogger logger, - bool useQueueForWrites = true) + IBlobWriterFactory blobWriterFactory, + BlobWorkerPump queue, + ILogger logger) { - _underlyingWriter = underlyingWriter ?? throw new ArgumentNullException(nameof(underlyingWriter)); + _underlyingWriter = blobWriterFactory?.CreateBlobWriter() ?? throw new ArgumentNullException(nameof(blobWriterFactory)); _queue = queue ?? throw new ArgumentNullException(nameof(queue)); _logger = logger ?? throw new ArgumentNullException(nameof(logger)); - _useQueueForWrites = useQueueForWrites; _logger.LogInformation( - "[QueuedBlobWriter] Initialized - QueuedWrites: {UseQueue}, Underlying: {UnderlyingType}", - _useQueueForWrites, _underlyingWriter.GetType().Name); + "[QueuedBlobWriter] Initialized - Underlying: {UnderlyingType}", + _underlyingWriter.GetType().Name); } /// - /// Creates a blob output stream. If queuing is enabled, returns a QueuedBlobStream - /// that buffers writes and enqueues them. Otherwise, returns the direct stream. + /// Returns a QueuedBlobStream that buffers writes and enqueues them on FlushAsync. 
/// - public async Task CreateBlobAndGetOutputStreamAsync(string userId, string blobName) + public async Task CreateBlobAndGetOutputStreamAsync(string containerName, string blobName, CancellationToken cancellationToken = default) { - if (_useQueueForWrites) - { - // Ensure container is initialized first - await _underlyingWriter.InitClientAsync(userId, userId).ConfigureAwait(false); + // Ensure container is initialized first + await _underlyingWriter.InitClientAsync(containerName).ConfigureAwait(false); - // Return a queued stream that buffers and enqueues writes - _logger.LogTrace( - "[QueuedBlobWriter] Creating queued stream for {Container}/{Blob}", - userId, blobName); + _logger.LogTrace( + "[QueuedBlobWriter] Creating queued stream for {Container}/{Blob}", + containerName, blobName); - return new QueuedBlobStream(_queue, userId, blobName, _logger); - } - else - { - // Pass through to underlying writer - return await _underlyingWriter.CreateBlobAndGetOutputStreamAsync(userId, blobName) - .ConfigureAwait(false); - } + return new QueuedBlobStream(_queue, containerName, blobName, _logger); } // Pass-through methods - these don't benefit from queuing - public Task BlobExistsAsync(string userId, string blobName) => - _underlyingWriter.BlobExistsAsync(userId, blobName); + public Task UploadBlobAsync(string containerName, string blobName, ReadOnlyMemory data, CancellationToken cancellationToken = default) => + _underlyingWriter.UploadBlobAsync(containerName, blobName, data, cancellationToken); + + public Task BlobExistsAsync(string containerName, string blobName) => + _underlyingWriter.BlobExistsAsync(containerName, blobName); - public Task ReadBlobAsStreamAsync(string userId, string blobName) => - _underlyingWriter.ReadBlobAsStreamAsync(userId, blobName); + public Task ReadBlobAsStreamAsync(string containerName, string blobName) => + _underlyingWriter.ReadBlobAsStreamAsync(containerName, blobName); - public Task DeleteBlobAsync(string userId, string blobName) => - 
_underlyingWriter.DeleteBlobAsync(userId, blobName); + public Task DeleteBlobAsync(string containerName, string blobName) => + _underlyingWriter.DeleteBlobAsync(containerName, blobName); - public Task GenerateSasTokenAsync(string userId, string blobName, TimeSpan expiryTime) => - _underlyingWriter.GenerateSasTokenAsync(userId, blobName, expiryTime); + // public Task GenerateSasTokenAsync(string containerName, string blobName, TimeSpan expiryTime) => + // _underlyingWriter.GenerateSasTokenAsync(containerName, blobName, expiryTime); - public string GetBlobUri(string userId, string blobName) => - _underlyingWriter.GetBlobUri(userId, blobName); + public string GetBlobUri(string containerName, string blobName) => + _underlyingWriter.GetBlobUri(containerName, blobName); - public Task InitClientAsync(string userId, string containerName) => - _underlyingWriter.InitClientAsync(userId, containerName); + public Task InitClientAsync(string containerName) => + _underlyingWriter.InitClientAsync(containerName); public bool IsInitialized => _underlyingWriter.IsInitialized; public string GetConnectionInfo() => _underlyingWriter.GetConnectionInfo() + " (Queued)"; + + public void Dispose() + { + _underlyingWriter?.Dispose(); + } } } diff --git a/src/SimpleL7Proxy/Async/Feeder/AsyncFeeder.cs b/src/SimpleL7Proxy/Async/Feeder/AsyncFeeder.cs index ac3fef26..1b1cbd44 100644 --- a/src/SimpleL7Proxy/Async/Feeder/AsyncFeeder.cs +++ b/src/SimpleL7Proxy/Async/Feeder/AsyncFeeder.cs @@ -124,7 +124,7 @@ public Task StopAsync(CancellationToken cancellationToken) { isShuttingDown = true; _cancellationTokenSource?.Cancel(); - _logger.LogInformation("[SHUTDOWN] ⏹ AsyncFeeder shutting down"); + _logger.LogInformation("[SHUTDOWN] ⏹ AsyncFeeder shutting down"); return readerTask ?? 
Task.CompletedTask; } diff --git a/src/SimpleL7Proxy/Async/Feeder/NormalRequest.cs b/src/SimpleL7Proxy/Async/Feeder/NormalRequest.cs index f1598fdd..896337aa 100644 --- a/src/SimpleL7Proxy/Async/Feeder/NormalRequest.cs +++ b/src/SimpleL7Proxy/Async/Feeder/NormalRequest.cs @@ -27,17 +27,17 @@ public class NormalRequest : IRequestProcessor private readonly ProxyConfig _options; private readonly ILogger _logger; private readonly IRequestDataBackupService _backupService; - private readonly IAsyncWorkerFactory _asyncWorkerFactory; + private readonly AsyncWorkerContext _asyncWorkerContext; public NormalRequest(IOptions options, IRequestDataBackupService backupService, - IAsyncWorkerFactory asyncWorkerFactory, + AsyncWorkerContext asyncWorkerContext, ILogger logger) { _options = options.Value; _backupService = backupService; - _asyncWorkerFactory = asyncWorkerFactory; + _asyncWorkerContext = asyncWorkerContext; _logger = logger; } @@ -66,7 +66,7 @@ private async Task DataFromBlob(RequestData request) _logger.LogDebug("Creating async worker for request {Guid} URL: {FullURL} UserId: {UserID} ", request.Guid, request.FullURL, request.UserID); - request.asyncWorker = await _asyncWorkerFactory.CreateAsync(request, 0).ConfigureAwait(false); + request.asyncWorker = new AsyncWorker(request, 0, _asyncWorkerContext); // let asyncworker restore the blob streams await request.asyncWorker.PrepareResponseStreamsAsync(); diff --git a/src/SimpleL7Proxy/Async/Feeder/OpenAIBackgroundRequest.cs b/src/SimpleL7Proxy/Async/Feeder/OpenAIBackgroundRequest.cs index d2821ba8..06063645 100644 --- a/src/SimpleL7Proxy/Async/Feeder/OpenAIBackgroundRequest.cs +++ b/src/SimpleL7Proxy/Async/Feeder/OpenAIBackgroundRequest.cs @@ -27,17 +27,17 @@ public class OpenAIBackgroundRequest : IRequestProcessor private readonly ProxyConfig _options; private readonly ILogger _logger; private readonly IRequestDataBackupService _backupService; - private readonly IAsyncWorkerFactory _asyncWorkerFactory; + private 
readonly AsyncWorkerContext _asyncWorkerContext; public OpenAIBackgroundRequest(IOptions options, IRequestDataBackupService backupService, - IAsyncWorkerFactory asyncWorkerFactory, + AsyncWorkerContext asyncWorkerContext, ILogger logger) { _options = options.Value; _backupService = backupService; - _asyncWorkerFactory = asyncWorkerFactory; + _asyncWorkerContext = asyncWorkerContext; _logger = logger; } @@ -66,7 +66,7 @@ public async Task HydrateRequestAsync(RequestData request) request.IsBackgroundCheck = true; request.runAsync = true; request.AsyncTriggered = true; - request.asyncWorker = await _asyncWorkerFactory.CreateAsync(request, 0).ConfigureAwait(false); + request.asyncWorker = new AsyncWorker(request, 0, _asyncWorkerContext); // Initialize for background check - blobs will be created lazily when first written to await request.asyncWorker.InitializeForBackgroundCheck(); diff --git a/src/SimpleL7Proxy/Async/IAsyncFileStore.cs b/src/SimpleL7Proxy/Async/IAsyncFileStore.cs new file mode 100644 index 00000000..b448b6f0 --- /dev/null +++ b/src/SimpleL7Proxy/Async/IAsyncFileStore.cs @@ -0,0 +1,47 @@ +using System.IO; + +namespace SimpleL7Proxy.Async; + +/// +/// Store for SMALL one-shot blobs (response headers, status messages, server-side request +/// backups). Writes flow through the BlobWriteQueue and are uploaded with a single PUT +/// (BlockBlobClient.UploadAsync, 1 round-trip). Suitable only for payloads that fit +/// comfortably in memory. +/// +/// The container is the unit of isolation. Callers resolve user/tenant → container name +/// before calling; this layer is user-agnostic. Use for +/// system/server-scoped blobs. +/// +/// For potentially large streamed payloads (response bodies that may be GBs) use +/// instead — it bypasses the queue and streams blocks +/// directly to storage so memory stays bounded. +/// +public interface IAsyncFileStore +{ + /// Ensures the container exists. Idempotent and single-flight per container name. 
+ Task InitializeClientAsync(string containerName); + + (string dataBlobUri, string headerBlobUri) GetBlobUriPair( + string containerName, string dataBlobName, string headerBlobName); + + // Task<(string dataBlobUri, string headerBlobUri)> GenerateSasTokenPairAsync( + // string containerName, string dataBlobName, string headerBlobName, TimeSpan expiry); + + /// + /// Writes a fully-materialized payload to the specified blob in a single PUT + /// (BlockBlobClient.UploadAsync, 1 round-trip). Bypasses the BlobWriteQueue for + /// minimum latency. Prefer this when the bytes are already in memory. + /// + Task WriteAsync(string containerName, string blobName, ReadOnlyMemory data, CancellationToken cancellationToken = default); + + /// Opens a write stream for a small blob. Backed by the BlobWriteQueue. + Task OpenWriteStreamAsync(string containerName, string blobName); + + /// Flushes and waits for any queued writes on the supplied stream to complete. + Task CompleteWriteStreamAsync(Stream? stream, CancellationToken cancellationToken = default); + + Task BlobExistsAsync(string containerName, string blobName); + Task ReadBlobAsStreamAsync(string containerName, string blobName); + Task DeleteBlobAsync(string containerName, string blobName); +} + diff --git a/src/SimpleL7Proxy/Async/IAsyncStreamingStore.cs b/src/SimpleL7Proxy/Async/IAsyncStreamingStore.cs new file mode 100644 index 00000000..39fe7938 --- /dev/null +++ b/src/SimpleL7Proxy/Async/IAsyncStreamingStore.cs @@ -0,0 +1,25 @@ +using System.IO; + +namespace SimpleL7Proxy.Async; + +/// +/// Store for LARGE/streamed blobs (response bodies that may run to gigabytes). Writes go +/// directly to BlobClient.OpenWriteAsync, bypassing the BlobWriteQueue entirely so +/// the SDK's transfer buffer (~4 MiB by default) is the only memory used regardless of +/// total payload size. 
+/// +/// For small one-shot blobs (headers, status messages) use +/// instead — it batches and dedups through the queue and uploads with a single PUT. +/// +/// Container init and SAS/URI generation are owned by ; +/// the streaming store's only responsibility is producing a write stream. +/// +public interface IAsyncStreamingStore +{ + /// + /// Opens a streaming write stream for a blob in the given container. The returned + /// stream stages blocks to storage as data is written; the blob is committed when + /// the stream is disposed. + /// + Task OpenWriteStreamAsync(string containerName, string blobName, CancellationToken cancellationToken = default); +} diff --git a/src/SimpleL7Proxy/Async/ServiceBus/ServiceBusRequestService.cs b/src/SimpleL7Proxy/Async/ServiceBus/ServiceBusRequestService.cs index 0ccb01cf..cbcc91d9 100644 --- a/src/SimpleL7Proxy/Async/ServiceBus/ServiceBusRequestService.cs +++ b/src/SimpleL7Proxy/Async/ServiceBus/ServiceBusRequestService.cs @@ -1,6 +1,5 @@ using System; -using System.IO; -using System.Net.Http; +using System.Linq; using System.Text.Json; using System.Threading.Tasks; using System.Collections.Concurrent; @@ -10,26 +9,25 @@ using Azure.Messaging.ServiceBus; using SimpleL7Proxy.Config; +using SimpleL7Proxy.Messaging; namespace SimpleL7Proxy.Async.ServiceBus { - public class ServiceBusRequestService : IHostedService, IServiceBusRequestService + public class ServiceBusRequestService : IHostedService, IServiceBusRequestService, IBatchMessageTransport { private readonly ProxyConfig _options; private readonly IServiceBusFactory _senderFactory; private readonly ILogger _logger; - public static readonly ConcurrentQueue _statusQueue = new ConcurrentQueue(); - private readonly SemaphoreSlim _queueSignal = new SemaphoreSlim(0); + private readonly IBatchMessageTransport _batchTransport; + private readonly ConcurrentDictionary> _topicPumps = new(StringComparer.OrdinalIgnoreCase); private bool isRunning = false; private bool isShuttingDown 
= false; - private Task? writerTask; - CancellationTokenSource? _cancellationTokenSource; - // Batch tuning - private const int MaxDrainPerCycle = 50; // max messages to drain from queue per cycle - private static readonly TimeSpan FlushIntervalMs = TimeSpan.FromMilliseconds(1000); // small delay to coalesce bursts (when not shutting down) + private const int MaxDrainPerCycle = 50; + private const int FlushCountThreshold = 10; + private static readonly TimeSpan FlushIntervalMs = TimeSpan.FromSeconds(2); // Performance tracking private int _totalMessagesProcessed = 0; @@ -39,7 +37,8 @@ public ServiceBusRequestService(IOptions options, IServiceBusFactor { _options = options.Value; _senderFactory = senderFactory ?? throw new ArgumentNullException(nameof(senderFactory)); - _logger = logger; + _logger = logger ?? throw new ArgumentNullException(nameof(logger)); + _batchTransport = this; } public Task StartAsync(CancellationToken cancellationToken) @@ -47,18 +46,7 @@ public Task StartAsync(CancellationToken cancellationToken) if (_options.AsyncModeEnabled) { _logger.LogInformation("[SERVICE] ✓ ServiceBusRequestService starting..."); - _cancellationTokenSource = new CancellationTokenSource(); - _cancellationTokenSource.Token.Register(() => - { - _logger.LogDebug("[SHUTDOWN] ServiceBusRequestService cancellation token triggered"); - }); - isRunning = true; - - // Start the writer task but DON'T await it - writerTask = Task.Run(() => EventWriter(_cancellationTokenSource.Token), _cancellationTokenSource.Token); - - // Return immediately - let the writer task run in the background } return Task.CompletedTask; @@ -66,261 +54,146 @@ public Task StartAsync(CancellationToken cancellationToken) public bool updateStatus(RequestData message) { + if (!isRunning || isShuttingDown) + { + return false; + } + try { + var topicName = string.IsNullOrWhiteSpace(message.SBTopicName) ? 
"status" : message.SBTopicName; + var pump = _topicPumps.GetOrAdd(topicName, static (key, state) => + new BatchMessagePump( + destination: key, + transport: state.Transport, + createBatchAsync: cancellationToken => state.Transport.CreateBatchAsync(key, cancellationToken), + recoverBatchAsync: cancellationToken => state.Transport.CreateBatchAsync(key, cancellationToken), + options: new BatchMessagePumpOptions + { + MaxBatchItems = MaxDrainPerCycle, + FlushCountThreshold = FlushCountThreshold, + FlushInterval = FlushIntervalMs, + WaitThreshold = state.WaitThreshold, + ShutdownDrainTimeout = TimeSpan.FromSeconds(30), + }), + (Transport: _batchTransport, WaitThreshold: _options.MaxUndrainedEvents / 4)); + + pump.StartAsync(CancellationToken.None).GetAwaiter().GetResult(); + _logger.LogDebug("[ServiceBus:{Guid}] Status update enqueued - UserId: {UserId}, Status: {Status}, Topic: {TopicName}, QueueDepth: {QueueCount}", - message.Guid, message.MID, message.SBStatus, message.SBTopicName, _statusQueue.Count + 1); - _statusQueue.Enqueue(new ServiceBusStatusMessage(message.Guid, message.SBTopicName, message.SBStatus.ToString())); - _queueSignal.Release(); + message.Guid, message.MID, message.SBStatus, topicName, GetQueueDepth() + 1); + + pump.Enqueue(new BatchMessageEnvelope(topicName, JsonSerializer.Serialize(new ServiceBusStatusMessage(message.Guid, topicName, message.SBStatus.ToString())))); return true; // Enqueue succeeded } catch (Exception ex) { - _logger.LogError(ex, "[ServiceBus:{Guid}] Failed to enqueue status update - UserId: {UserId}, Status: {Status}", - message.Guid, message.MID, message.SBStatus); + _logger.LogError(ex, "[ServiceBus:{Guid}] Failed to enqueue status update - UserId: {UserId}, Status: {Status}, Topic: {Topic}, Error: {Error}", + message.Guid, message.MID, message.SBStatus, message.SBTopicName, ex.Message); return false; // Enqueue failed } } - public async Task EventWriter(CancellationToken token) + public (int totalMessages, int totalBatches, 
int queueDepth, bool isEnabled, string? connectionInfo) GetStatistics() { - - _logger.LogInformation("[SERVICE] ✓ ServiceBus writer service starting..."); - - try - { - await Task.Run(() => FeederTask(token), token).ConfigureAwait(false); - } - catch (TaskCanceledException) - { - // Task was canceled, exit gracefully - _logger.LogInformation("ServiceBusRequestService task was canceled."); - } - catch (OperationCanceledException) - { - // Operation was canceled, exit gracefully - } - catch (UnauthorizedAccessException) + string? connectionInfo = null; + + if (_options.AsyncModeEnabled && _senderFactory != null) { - _logger.LogError("ServiceBusRequestService encountered an UnauthorizedAccessException. Check Service Bus connection string and permissions."); + // Get connection info from the factory (namespace endpoint) + connectionInfo = _senderFactory.GetConnectionInfo(); } - catch (Exception ex) + + return ( + totalMessages: _totalMessagesProcessed, + totalBatches: _totalBatchesSent, + queueDepth: GetQueueDepth(), + isEnabled: _options.AsyncModeEnabled && isRunning, + connectionInfo: connectionInfo + ); + } + + public async Task StopAsync(CancellationToken cancellationToken) + { + _ = cancellationToken; + if (!isRunning) { - _logger.LogError(ex, "An error occurred while sending a message to the topic.: " + ex); + return; } - finally - { - // Flush all items in batches - using var ctsSource = new CancellationTokenSource(); - var cts = ctsSource.Token; - var drained = new List(); - while (_statusQueue.TryDequeue(out var statusMessage)) - { - drained.Add(statusMessage); - } + isShuttingDown = true; - if (drained.Count > 0) - { - var byTopic = GroupByTopic(drained); - foreach (var kvp in byTopic) - { - try - { - await SendBatchesForTopicAsync(kvp.Key, kvp.Value, cts).ConfigureAwait(false); - } - catch (Exception e) - { - _logger.LogError(e, "Error while flushing service bus. 
Continuing."); - } - } - } + foreach (var pump in _topicPumps.Values) + { + pump.BeginShutdown(); } - _logger.LogInformation("[SHUTDOWN] ✓ ServiceBusRequestService stopped"); - } - - DateTime _lastDrainTime = DateTime.UtcNow; - - private async Task FeederTask(CancellationToken token) - { - var drained = new List(MaxDrainPerCycle); - while (!isShuttingDown || !_statusQueue.IsEmpty) + foreach (var kvp in _topicPumps) { - // don't repeat this loop more than once every FlushIntervalMs unless we are shutting down - var delta = DateTime.UtcNow - _lastDrainTime; - if (delta < FlushIntervalMs && !isShuttingDown && !token.IsCancellationRequested) + if (kvp.Value.Count > 0) { - delta = FlushIntervalMs - delta; - await Task.Delay(delta, token).ConfigureAwait(false); + _logger.LogInformation("[SHUTDOWN] ⏳ ServiceBusRequestService - topic {TopicName} has {Events} events to flush", kvp.Key, kvp.Value.Count); } - _lastDrainTime = DateTime.UtcNow; - - // Drain all available items before waiting - while (_statusQueue.TryDequeue(out var statusMessage)) - { - // Process item (e.g., add to batch, send, etc.) 
- drained.Add(statusMessage); - if (drained.Count >= MaxDrainPerCycle) - { - var byTopic = GroupByTopic(drained); - - foreach (var kvp in byTopic) - { - try - { - await SendBatchesForTopicAsync(kvp.Key, kvp.Value, token).ConfigureAwait(false); - _totalMessagesProcessed += kvp.Value.Count; - _totalBatchesSent++; - } - catch (ArgumentException ex) - { - _logger.LogError(ex, "[ServiceBus:FeederTask] Error sending batch to topic {TopicName} with {MessageCount} messages", - kvp.Key, kvp.Value.Count); - } - } - drained.Clear(); - } - } - - // If any remain after draining, process them - if (drained.Count > 0) - { - var byTopic = GroupByTopic(drained); - - foreach (var kvp in byTopic) - { - await SendBatchesForTopicAsync(kvp.Key, kvp.Value, token).ConfigureAwait(false); - _totalMessagesProcessed += kvp.Value.Count; - _totalBatchesSent++; - } - drained.Clear(); + } - // Log performance metrics periodically - if (_totalMessagesProcessed % 100 == 0 && _totalMessagesProcessed > 0) - { - _logger.LogInformation("[ServiceBus:Performance] Processed {TotalMessages} messages in {TotalBatches} batches, Current queue: {QueueCount}", - _totalMessagesProcessed, _totalBatchesSent, _statusQueue.Count); - } - } + await Task.WhenAll(_topicPumps.Values.Select(static pump => pump.StopAsync())).ConfigureAwait(false); - // Now wait for a signal before next round - while (_statusQueue.IsEmpty && !token.IsCancellationRequested) - { - await _queueSignal.WaitAsync(token).ConfigureAwait(false); - } - if (token.IsCancellationRequested) - { - _logger.LogDebug("[ServiceBus:FeederTask] Cancellation requested - QueueRemaining: {QueueCount}", _statusQueue.Count); - } - else - { - var timeSinceLastDrain = (DateTime.UtcNow - _lastDrainTime).TotalMilliseconds; - _logger.LogDebug("[ServiceBus:FeederTask] Woke up - Queue: {QueueCount}, IdleTime: {IdleTime}ms", - _statusQueue.Count, timeSinceLastDrain); - } + isRunning = false; + _logger.LogInformation("[SHUTDOWN] ⏹ ServiceBusRequestService stopped"); + } - } 
+ Task IBatchMessageTransport.OpenAsync(CancellationToken cancellationToken) + { + _ = cancellationToken; + return Task.CompletedTask; } - private static Dictionary> GroupByTopic(List items) + ValueTask IBatchMessageTransport.CreateBatchAsync(string destination, CancellationToken cancellationToken) { - var map = new Dictionary>(StringComparer.OrdinalIgnoreCase); - foreach (var item in items) - { - if (!map.TryGetValue(item.topicName, out var list)) - { - list = new List(); - map[item.topicName] = list; - } - list.Add(item); - } - return map; + return _senderFactory.GetSender(destination).CreateMessageBatchAsync(cancellationToken); } - private async Task SendBatchesForTopicAsync(string topicName, List items, CancellationToken token) + bool IBatchMessageTransport.TryAdd(ServiceBusMessageBatch batch, BatchMessageEnvelope message) { - var sender = _senderFactory.GetSender(topicName); - ServiceBusMessageBatch? currentBatch = null; - int batchesSent = 0; - - try + var wasEmpty = batch.Count == 0; + var serviceBusMessage = new ServiceBusMessage(message.Payload); + var added = batch.TryAddMessage(serviceBusMessage); + if (!added && wasEmpty) { - currentBatch = await sender.CreateMessageBatchAsync(token).ConfigureAwait(false); - - foreach (var item in items) - { - var message = new ServiceBusMessage(JsonSerializer.Serialize(item)); - - if (!currentBatch.TryAddMessage(message)) - { - // Send the full batch and start a new one - await sender.SendMessagesAsync(currentBatch, token).ConfigureAwait(false); - batchesSent++; - currentBatch.Dispose(); - currentBatch = await sender.CreateMessageBatchAsync(token).ConfigureAwait(false); + _logger.LogError("[ServiceBus:Batch] Message too large for topic {TopicName}. Dropping message.", message.Destination); + } - if (!currentBatch.TryAddMessage(message)) - { - // Single message too large for an empty batch - _logger.LogError("[ServiceBus:Batch] Message too large for topic {TopicName}, Guid: {Guid}. 
Dropping message.", - topicName, item.RequestGuid); - } - } - } + return added; + } - if (currentBatch.Count > 0) - { - await sender.SendMessagesAsync(currentBatch, token).ConfigureAwait(false); - batchesSent++; - } + int IBatchMessageTransport.GetCount(ServiceBusMessageBatch batch) + { + return batch.Count; + } - _logger.LogTrace("[ServiceBus:Batch] Sent {MessageCount} messages in {BatchCount} batches to topic {TopicName}", - items.Count, batchesSent, topicName); - } - finally - { - currentBatch?.Dispose(); - } + async Task IBatchMessageTransport.SendAsync(string destination, ServiceBusMessageBatch batch, CancellationToken cancellationToken) + { + await _senderFactory.GetSender(destination).SendMessagesAsync(batch, cancellationToken).ConfigureAwait(false); + Interlocked.Add(ref _totalMessagesProcessed, batch.Count); + Interlocked.Increment(ref _totalBatchesSent); + _logger.LogTrace("[ServiceBus:Batch] Sent {MessageCount} messages to topic {TopicName}", batch.Count, destination); } - public (int totalMessages, int totalBatches, int queueDepth, bool isEnabled, string? connectionInfo) GetStatistics() + void IBatchMessageTransport.DisposeBatch(ServiceBusMessageBatch batch) { - string? 
connectionInfo = null; - - if (_options.AsyncModeEnabled && _senderFactory != null) - { - // Get connection info from the factory (namespace endpoint) - connectionInfo = _senderFactory.GetConnectionInfo(); - } - - return ( - totalMessages: _totalMessagesProcessed, - totalBatches: _totalBatchesSent, - queueDepth: _statusQueue.Count, - isEnabled: _options.AsyncModeEnabled && isRunning, - connectionInfo: connectionInfo - ); + batch.Dispose(); } - public async Task StopAsync(CancellationToken cancellationToken) + Task IBatchMessageTransport.CloseAsync(CancellationToken cancellationToken) { - isShuttingDown = true; - if (isRunning) - { - _logger.LogInformation("[SHUTDOWN] ⏹ ServiceBusRequestService stopping"); - _logger.LogInformation("[SHUTDOWN] ⏳ ServiceBusRequestService flushing {events} events before stopping", _statusQueue.Count); - while (isRunning && _statusQueue.Count > 0) - { - await Task.Delay(100).ConfigureAwait(false); - } + _ = cancellationToken; + return Task.CompletedTask; + } - _cancellationTokenSource?.Cancel(); - isRunning = false; - if (writerTask != null) - await writerTask.ConfigureAwait(false); - } + private int GetQueueDepth() + { + return _topicPumps.Values.Sum(static pump => pump.Count); } } } \ No newline at end of file diff --git a/src/SimpleL7Proxy/Async/TemplateLoader.cs b/src/SimpleL7Proxy/Async/TemplateLoader.cs new file mode 100644 index 00000000..e2205b13 --- /dev/null +++ b/src/SimpleL7Proxy/Async/TemplateLoader.cs @@ -0,0 +1,229 @@ +using System.Collections.Frozen; +using System.Text.Json; + +using Microsoft.Extensions.Hosting; +using Microsoft.Extensions.Logging; +using Microsoft.Extensions.Options; + +using SimpleL7Proxy.Async.BackupAPI; +using SimpleL7Proxy.Async.BlobStorage; +using SimpleL7Proxy.Config; +using SimpleL7Proxy.Async.ServiceBus; +using SimpleL7Proxy.Proxy; +using SimpleL7Proxy.User; + +namespace SimpleL7Proxy.Async; + +/// +/// One-shot hosted service that wires up static references +/// for async-mode processing and 
loads canned message templates from blob storage. +/// Runs during the hosted-service startup phase so that all dependencies are ready +/// before Server and WorkerFactory begin accepting traffic. +/// +public sealed class TemplateLoader : IHostedService +{ + private const string TemplatesContainer = "templates"; + + /// + /// Mapping of message kind → blob name within the templates container. + /// + private static readonly FrozenDictionary s_blobNames = + new Dictionary + { + [AsyncResponseTypeEnum.Welcome] = "welcome.json", + [AsyncResponseTypeEnum.NotReady] = "notready.json", + [AsyncResponseTypeEnum.NotAuthorized] = "notauthorized.json", + }.ToFrozenDictionary(); + + private readonly IServiceBusRequestService _serviceBusRequestService; + private readonly IBackupAPIService _backupAPIService; + private readonly IUserPriorityService _userPriorityService; + private readonly IBlobWriter _blobWriter; + private readonly ProxyConfig _options; + private readonly ILogger _logger; + + private readonly Dictionary _templates = new(); + + public TemplateLoader( + IServiceBusRequestService serviceBusRequestService, + IBackupAPIService backupAPIService, + IUserPriorityService userPriorityService, + IBlobWriter blobWriter, + IOptions options, + ILogger logger) + { + _serviceBusRequestService = serviceBusRequestService; + _backupAPIService = backupAPIService; + _userPriorityService = userPriorityService; + _blobWriter = blobWriter; + _options = options.Value; + _logger = logger; + } + + private static readonly JsonSerializerOptions s_jsonOptions = new() + { + PropertyNameCaseInsensitive = true, + }; + + /// + /// Returns a rendered for . + /// Placeholder substitution (%GUID%, %MID%, %TIMESTAMP%, + /// %USERID%, %DATA_BLOB_URI%, %HEADER_BLOB_URI%) is applied + /// only to the field. All other fields are + /// taken from the caller-supplied values when provided, otherwise from the + /// loaded template literal. 
+ /// + public AsyncMessage GetMergedMessage( + AsyncResponseTypeEnum kind, + string guid, + string mid, + string? userId = null, + string? dataBlobUri = null, + string? headerBlobUri = null) + { + if (!_templates.TryGetValue(kind, out var template)) + throw new InvalidOperationException($"Template for {kind} was not loaded."); + + return new AsyncMessage + { + Message = SubstituteMessage(template.Message, guid, mid, userId, dataBlobUri, headerBlobUri), + UserId = userId ?? template.UserId ?? string.Empty, + MID = mid ?? template.MID ?? string.Empty, + Guid = guid ?? template.Guid ?? string.Empty, + Status = template.Status, + Timestamp = DateTime.UtcNow, + DataBlobUri = dataBlobUri ?? template.DataBlobUri ?? string.Empty, + HeaderBlobUri = headerBlobUri ?? template.HeaderBlobUri ?? string.Empty, + }; + } + + private static string SubstituteMessage( + string message, + string? guid, string? mid, string? userId, + string? dataBlobUri, string? headerBlobUri) + { + if (string.IsNullOrEmpty(message)) return string.Empty; + if (message.IndexOf('%') < 0) return message; // fast path + + var sb = new System.Text.StringBuilder(message.Length + 64); + sb.Append(message); + + if (guid is { Length: > 0 } && message.Contains("%GUID%")) sb.Replace("%GUID%", guid); + if (mid is { Length: > 0 } && message.Contains("%MID%")) sb.Replace("%MID%", mid); + if (userId is { Length: > 0 } && message.Contains("%USERID%")) sb.Replace("%USERID%", userId); + if (dataBlobUri is { Length: > 0 } && message.Contains("%DATA_BLOB_URI%")) sb.Replace("%DATA_BLOB_URI%", dataBlobUri); + if (headerBlobUri is { Length: > 0 } && message.Contains("%HEADER_BLOB_URI%")) sb.Replace("%HEADER_BLOB_URI%", headerBlobUri); + if (message.Contains("%TIMESTAMP%")) sb.Replace("%TIMESTAMP%", DateTime.UtcNow.ToString("o")); + + return sb.ToString(); + } + + public async Task StartAsync(CancellationToken cancellationToken) + { + RequestData.InitializeServiceBusRequestService( + _serviceBusRequestService, + 
_backupAPIService, + _userPriorityService, + _options); + + _logger.LogInformation("[STARTUP] ✓ RequestData async statics initialized"); + + await LoadAllTemplatesAsync(cancellationToken).ConfigureAwait(false); + } + + public Task StopAsync(CancellationToken cancellationToken) => Task.CompletedTask; + + private async Task LoadAllTemplatesAsync(CancellationToken cancellationToken) + { + if (!_blobWriter.IsInitialized) + { + _logger.LogWarning("[STARTUP] BlobWriter not initialized; skipping load of '{Container}' templates and disabling async mode", + TemplatesContainer); + _options.AsyncModeEnabled = false; + return; + } + + try + { + await _blobWriter.InitClientAsync(TemplatesContainer).ConfigureAwait(false); + } + catch (Exception ex) + { + _logger.LogError("[STARTUP] Failed to initialize templates container '{Container}': {Message}. Disabling async mode.", + TemplatesContainer, ex.Message); + _options.AsyncModeEnabled = false; + return; + } + + var loadTasks = s_blobNames + .Select(kvp => LoadTemplateAsync(kvp.Key, kvp.Value, cancellationToken)) + .ToList(); + var results = await Task.WhenAll(loadTasks).ConfigureAwait(false); + var successfulTemplates = results.Where(r => r.success).Select(r => r.name).ToList(); + + _logger.LogInformation("[STARTUP] ✓ Loaded {Count} templates from '{Container}': {Templates}", + successfulTemplates.Count, TemplatesContainer, string.Join(", ", successfulTemplates)); + } + + private async Task<(string name, bool success)> LoadTemplateAsync(AsyncResponseTypeEnum kind, string blobName, CancellationToken cancellationToken) + { + try + { + string? 
body = null; + + if (!await _blobWriter.BlobExistsAsync(TemplatesContainer, blobName).ConfigureAwait(false)) + { + _logger.LogWarning("[STARTUP] Template blob '{Container}/{Blob}' ({Kind}) not found, attempting to load from templates folder", + TemplatesContainer, blobName, kind); + + // Fallback to reading from templates folder + var templatePath = Path.Combine("templates", blobName); + if (File.Exists(templatePath)) + { + body = await File.ReadAllTextAsync(templatePath, cancellationToken).ConfigureAwait(false); + _logger.LogInformation("[STARTUP] Loaded template {Kind} from file system: {Path}", + kind, templatePath); + } + else + { + _logger.LogWarning("[STARTUP] Template file not found at '{Path}' ({Kind})", + templatePath, kind); + return (blobName, false); + } + } + else + { + using var stream = await _blobWriter.ReadBlobAsStreamAsync(TemplatesContainer, blobName).ConfigureAwait(false); + using var reader = new StreamReader(stream); + body = await reader.ReadToEndAsync(cancellationToken).ConfigureAwait(false); + } + + try + { + var parsed = JsonSerializer.Deserialize(body, s_jsonOptions); + if (parsed == null) + { + _logger.LogError("[STARTUP] Template {Kind} from '{Container}/{Blob}' deserialized to null", + kind, TemplatesContainer, blobName); + return (blobName, false); + } + + _templates[kind] = parsed; + return (blobName, true); + } + catch (JsonException ex) + { + _logger.LogError(ex, "[STARTUP] Failed to deserialize template {Kind} from '{Container}/{Blob}'", + kind, TemplatesContainer, blobName); + _logger.LogInformation(ex.StackTrace); + return (blobName, false); + } + } + catch (Exception ex) + { + _logger.LogError(ex, "[STARTUP] Failed to load template {Kind} from '{Container}/{Blob}'", + kind, TemplatesContainer, blobName); + return (blobName, false); + } + } +} diff --git a/src/SimpleL7Proxy/Backend/EndpointMonitorService.cs b/src/SimpleL7Proxy/Backend/EndpointMonitorService.cs index 509aa53e..f6f28f3b 100644 --- 
a/src/SimpleL7Proxy/Backend/EndpointMonitorService.cs +++ b/src/SimpleL7Proxy/Backend/EndpointMonitorService.cs @@ -103,7 +103,7 @@ public EndpointMonitorService( public Task Stop() { - _logger.LogInformation("[SHUTDOWN] ⏹ Backend health poller stopping"); + _logger.LogInformation("[SHUTDOWN] ⏹ Backend health poller stopping"); _cancellationTokenSource.Cancel(); return ExecuteTask ?? Task.CompletedTask; @@ -177,7 +177,7 @@ private async Task Run() } catch (OperationCanceledException) { - _logger.LogInformation("[SHUTDOWN] ⏹ Backend health poller cancelled — draining"); + _logger.LogInformation("[SHUTDOWN] ⏹ Backend health poller cancelled — draining"); break; } catch (Exception e) @@ -186,7 +186,7 @@ private async Task Run() } } - _logger.LogInformation("[SHUTDOWN] ✓ Backend health poller stopped"); + _logger.LogInformation("[SHUTDOWN] ⏹ Backend health poller stopped"); } catch (Exception ex) { diff --git a/src/SimpleL7Proxy/Config/ConfigParser.cs b/src/SimpleL7Proxy/Config/ConfigParser.cs index 33bfcaa9..b2abb4ad 100644 --- a/src/SimpleL7Proxy/Config/ConfigParser.cs +++ b/src/SimpleL7Proxy/Config/ConfigParser.cs @@ -14,6 +14,8 @@ public static class ConfigParser private static readonly (string envVar, string property)[] SimpleFields = [ ("AsyncBlobWorkerCount", "AsyncBlobWorkerCount"), + ("AsyncBlobMaxQueue", "AsyncBlobMaxQueue"), + ("AsyncStreamingBufferSizeBytes", "AsyncStreamingBufferSizeBytes"), ("AsyncClassNames", "AsyncClassNames"), ("AsyncClientConfigFieldName", "AsyncClientConfigFieldName"), ("AsyncClientRequestHeader", "AsyncClientRequestHeader"), @@ -678,6 +680,17 @@ private static (string connectionString, string accountUri, bool useMI) ParseBlo return (connectionString, accountUri, useMI); } + // Accept a raw Azure Storage connection string copied from the portal, e.g. + // "DefaultEndpointsProtocol=https;AccountName=acct;AccountKey=...;EndpointSuffix=core.windows.net". 
+ // These use ';' as separators (no commas), so the cs/uri/mi composite parser would + // otherwise treat the whole thing as a single positional value. + if (LooksLikeRawConnectionString(config)) + { + connectionString = config.Trim(); + accountUri = TryDeriveBlobEndpointFromConnectionString(connectionString) ?? accountUri; + return (connectionString, accountUri, useMI: false); + } + var parts = config.Split(',').Select(p => p.Trim()).ToArray(); var keyAliases = new Dictionary { @@ -704,6 +717,52 @@ private static (string connectionString, string accountUri, bool useMI) ParseBlo return (connectionString, accountUri, useMI); } + private static bool LooksLikeRawConnectionString(string value) + { + // Heuristic: a raw storage connection string contains semicolon-delimited + // key=value pairs and at least one of the well-known keys. + if (value.IndexOf(';') < 0) return false; + return value.Contains("AccountName=", StringComparison.OrdinalIgnoreCase) + || value.Contains("DefaultEndpointsProtocol=", StringComparison.OrdinalIgnoreCase) + || value.Contains("BlobEndpoint=", StringComparison.OrdinalIgnoreCase) + || value.Contains("SharedAccessSignature=", StringComparison.OrdinalIgnoreCase); + } + + private static string? TryDeriveBlobEndpointFromConnectionString(string connectionString) + { + string? accountName = null; + string? endpointSuffix = null; + string? protocol = null; + string? 
blobEndpoint = null; + + foreach (var raw in connectionString.Split(';', StringSplitOptions.RemoveEmptyEntries)) + { + var idx = raw.IndexOf('='); + if (idx <= 0) continue; + var key = raw[..idx].Trim(); + var val = raw[(idx + 1)..].Trim(); + + if (key.Equals("AccountName", StringComparison.OrdinalIgnoreCase)) accountName = val; + else if (key.Equals("EndpointSuffix", StringComparison.OrdinalIgnoreCase)) endpointSuffix = val; + else if (key.Equals("DefaultEndpointsProtocol", StringComparison.OrdinalIgnoreCase)) protocol = val; + else if (key.Equals("BlobEndpoint", StringComparison.OrdinalIgnoreCase)) blobEndpoint = val; + } + + if (!string.IsNullOrEmpty(blobEndpoint)) + { + return blobEndpoint!.EndsWith('/') ? blobEndpoint : blobEndpoint + "/"; + } + + if (!string.IsNullOrEmpty(accountName)) + { + var scheme = string.IsNullOrEmpty(protocol) ? "https" : protocol; + var suffix = string.IsNullOrEmpty(endpointSuffix) ? "core.windows.net" : endpointSuffix; + return $"{scheme}://{accountName}.blob.{suffix}/"; + } + + return null; + } + /// /// Creates and assigns an on this /// instance, configured from the transport-related properties (keep-alive, HTTP/2, SSL). 
diff --git a/src/SimpleL7Proxy/Config/ProxyConfig.cs b/src/SimpleL7Proxy/Config/ProxyConfig.cs index 922d4e57..1f5ffdbd 100644 --- a/src/SimpleL7Proxy/Config/ProxyConfig.cs +++ b/src/SimpleL7Proxy/Config/ProxyConfig.cs @@ -136,9 +136,19 @@ public class ProxyConfig // ── Async ── [ConfigOption("Async:Storage:BlobConfig", ConfigName = "AsyncBlobStorageConfig", Mode = ConfigMode.Cold)] - public string AsyncBlobStorageConfig { get; set; } = "uri=https://mystorageaccount.blob.core.windows.net,mi=true"; + public string AsyncBlobStorageConfig { get; set; } = "";//"uri=https://mystorageaccount.blob.core.windows.net,mi=true"; [ConfigOption("Async:Storage:Workers", ConfigName = "AsyncBlobWorkerCount", Mode = ConfigMode.Cold)] public int AsyncBlobWorkerCount { get; set; } = 2; + [ConfigOption("Async:Storage:MaxQueue", ConfigName = "AsyncBlobMaxQueue", Mode = ConfigMode.Cold)] + public int AsyncBlobMaxQueue { get; set; } = 200; + /// + /// Transfer buffer size used by BlobClient.OpenWriteAsync in the streaming-store + /// path (large/streamed response bodies). Larger buffer = fewer StageBlock round trips + /// but more memory per concurrent worker. 0 = use SDK default (~4 MiB). Typical tuning: + /// 8388608 (8 MiB) or 16777216 (16 MiB) for gigabyte-scale payloads. 
+ /// + [ConfigOption("Async:Storage:StreamingBufferSizeBytes", ConfigName = "AsyncStreamingBufferSizeBytes", Mode = ConfigMode.Cold)] + public long AsyncStreamingBufferSizeBytes { get; set; } = 0; [ConfigOption("Async:ClassNames", ConfigName = "AsyncClassNames", Mode = ConfigMode.Cold)] public string AsyncClassNames { get; set; } = ""; [ConfigOption("Async:Enabled", ConfigName = "AsyncModeEnabled", Mode = ConfigMode.Cold)] diff --git a/src/SimpleL7Proxy/Constants.cs b/src/SimpleL7Proxy/Constants.cs index a3cbf901..6a7dc35c 100644 --- a/src/SimpleL7Proxy/Constants.cs +++ b/src/SimpleL7Proxy/Constants.cs @@ -14,7 +14,7 @@ public static class Constants public const string RoundRobin = "roundrobin"; public const string Random = "random"; public const string Server = "simplel7proxy"; - public const string VERSION = "2.2.10.7"; + public const string VERSION = "2.2.11.0"; public const int AnyPriority = -1; diff --git a/src/SimpleL7Proxy/CoordinatedShutdownService.cs b/src/SimpleL7Proxy/CoordinatedShutdownService.cs index 05f154a5..95e244b7 100644 --- a/src/SimpleL7Proxy/CoordinatedShutdownService.cs +++ b/src/SimpleL7Proxy/CoordinatedShutdownService.cs @@ -35,8 +35,7 @@ public class CoordinatedShutdownService : IHostedService private readonly IEndpointMonitorService _backends; private readonly IAsyncFeeder _asyncFeeder; private readonly IRequeueWorker _requeueWorker; - private readonly BlobWriteQueue? _blobWriteQueue; - private readonly BlobWriter? _blobWriter; + private readonly BlobWorkerPump? 
_blobWriteQueue; private readonly IEnumerable _shutdownParticipants; private readonly ProbeServer _probeServer; private readonly CompositeEventClient _compositeEventClient; @@ -69,8 +68,7 @@ public CoordinatedShutdownService(IHostApplicationLifetime appLifetime, _asyncFeeder = asyncFeeder; _backendTokenProvider = backendTokenProvider; _requeueWorker = requeueWorker; - _blobWriteQueue = serviceProvider.GetService(); - _blobWriter = serviceProvider.GetService(); + _blobWriteQueue = serviceProvider.GetService(); _shutdownParticipants = serviceProvider.GetServices(); _probeServer = probeServer; _options = backendOptions.Value; @@ -145,7 +143,7 @@ public async Task StopAsync(CancellationToken cancellationToken) } else { - _logger.LogInformation("[SHUTDOWN] ✓ All tasks completed"); + _logger.LogInformation("[SHUTDOWN] ⏹ All tasks completed"); } await _queue!.StopAsync().ConfigureAwait(false); @@ -168,7 +166,7 @@ public async Task StopAsync(CancellationToken cancellationToken) // Same pattern as IHostedService — register as IShutdownParticipant in DI, get discovered here. 
foreach (var participant in _shutdownParticipants.OrderBy(p => p.ShutdownOrder)) { - _logger.LogInformation("[SHUTDOWN] ⏹ Shutting down {Service} (order {Order})", + _logger.LogInformation("[SHUTDOWN] ⏹ Shutting down {Service} (order {Order})", participant.GetType().Name, participant.ShutdownOrder); await participant.ShutdownAsync(cancellationToken).ConfigureAwait(false); } @@ -189,10 +187,10 @@ public async Task StopAsync(CancellationToken cancellationToken) // are guaranteed to be done at this point, so no more enqueues will happen if (_blobWriteQueue != null) { - _logger.LogInformation("[SHUTDOWN] ⏹ Stopping BlobWriteQueue (final flush)"); + _logger.LogInformation("[SHUTDOWN] ⏹ Stopping BlobWriteQueue (final flush)"); await _blobWriteQueue.StopAsync(CancellationToken.None).ConfigureAwait(false); - // Dispose underlying BlobWriter after the queue has flushed - _blobWriter?.Dispose(); + // Underlying BlobWriter instances (in QueuedBlobWriter and AsyncStreamingStore) + // are DI singletons — the host disposes them on container shutdown. 
} // Health probes are stopped at the VERY END so the container orchestrator @@ -205,6 +203,7 @@ public async Task StopAsync(CancellationToken cancellationToken) catch (Exception ex) { _logger.LogError(ex, "[SHUTDOWN] ❌ Shutdown failed"); + _logger.LogInformation(ex.StackTrace); } finally { diff --git a/src/SimpleL7Proxy/DTO/RequestDataBackupService.cs b/src/SimpleL7Proxy/DTO/RequestDataBackupService.cs index 687a401b..317a41e6 100644 --- a/src/SimpleL7Proxy/DTO/RequestDataBackupService.cs +++ b/src/SimpleL7Proxy/DTO/RequestDataBackupService.cs @@ -3,24 +3,47 @@ using System.Text.Json.Serialization; using Microsoft.Extensions.Logging; +using SimpleL7Proxy.Async; using SimpleL7Proxy.Async.BlobStorage; namespace SimpleL7Proxy.DTO { public class RequestDataBackupService : IRequestDataBackupService { - private readonly IBlobWriter _blobWriter; + private readonly IAsyncFileStore _requestStore; private readonly ILogger _logger; - public RequestDataBackupService(IBlobWriter blobWriter, ILogger logger) + // Single-flight init for the Server backup container. The storage layer is + // user-agnostic, so this service owns initialization of Constants.Server. + private Task? 
_serverInitTask; + private readonly object _serverInitLock = new(); + + public RequestDataBackupService(IAsyncFileStore requestStore, ILogger logger) { _logger = logger; _logger.LogDebug("[STARTUP] BackupAPI Service starting"); - _blobWriter = blobWriter; + _requestStore = requestStore; + } + + private Task EnsureServerContainerInitializedAsync() + { + var existing = _serverInitTask; + if (existing != null && !existing.IsFaulted && !existing.IsCanceled) + return existing; + + lock (_serverInitLock) + { + if (_serverInitTask == null || _serverInitTask.IsFaulted || _serverInitTask.IsCanceled) + { + _serverInitTask = _requestStore.InitializeClientAsync(Constants.Server); + } + return _serverInitTask; + } } public async Task RestoreIntoAsync(RequestData rdata) { + await EnsureServerContainerInitializedAsync().ConfigureAwait(false); string blobname = rdata.Guid.ToString(); _logger.LogTrace($"[BLOB-TRACE] BackupService.RestoreIntoAsync | Action: Start | Guid: {rdata.Guid} | Container: {Constants.Server} | Blob: {blobname} | Time: {DateTime.UtcNow:yyyy-MM-dd HH:mm:ss.fff}"); @@ -28,7 +51,7 @@ public async Task RestoreIntoAsync(RequestData rdata) try { // Console.WriteLine("RequestDataBackupService: Reading blob from " + Constants.Server + " with name " + blobname); - using Stream stream = await _blobWriter.ReadBlobAsStreamAsync(Constants.Server, blobname); + using Stream stream = await _requestStore.ReadBlobAsStreamAsync(Constants.Server, blobname); var streamReader = new StreamReader(stream); var json = await streamReader.ReadToEndAsync(); var data = RequestDataConverter.DeserializeWithVersionHandling(json); @@ -46,10 +69,10 @@ public async Task RestoreIntoAsync(RequestData rdata) // read body bytes if present var bodyBlobName = blobname + ".body"; - if (await _blobWriter.BlobExistsAsync(Constants.Server, bodyBlobName)) + if (await _requestStore.BlobExistsAsync(Constants.Server, bodyBlobName)) { //_logger.LogTrace($"[BLOB-TRACE] BackupService.RestoreIntoAsync | Action: 
ReadBody | Guid: {rdata.Guid} | Container: {Constants.Server} | Blob: {bodyBlobName} | Time: {DateTime.UtcNow:yyyy-MM-dd HH:mm:ss.fff}"); - using Stream bodyStream = await _blobWriter.ReadBlobAsStreamAsync(Constants.Server, bodyBlobName); + using Stream bodyStream = await _requestStore.ReadBlobAsStreamAsync(Constants.Server, bodyBlobName); var bodyStreamReader = new StreamReader(bodyStream); var datastr = await bodyStreamReader.ReadToEndAsync(); rdata.setBody(Encoding.UTF8.GetBytes(datastr)); @@ -86,6 +109,7 @@ public async Task RestoreIntoAsync(RequestData rdata) public async Task BackupAsync(RequestData requestData) { + await EnsureServerContainerInitializedAsync().ConfigureAwait(false); var operation = "Creating blob"; //_logger.LogTrace($"[BLOB-TRACE] BackupService.BackupAsync | Action: Start | Guid: {requestData.Guid} | Container: {Constants.Server} | Blob: {requestData.Guid} | Time: {DateTime.UtcNow:yyyy-MM-dd HH:mm:ss.fff}"); @@ -104,19 +128,14 @@ public async Task BackupAsync(RequestData requestData) var jsonBytes = System.Text.Encoding.UTF8.GetBytes(json); operation = "Writing to blob"; - await using (var stream = await _blobWriter.CreateBlobAndGetOutputStreamAsync(Constants.Server, requestData.Guid.ToString())) - await using (var writer = new BufferedStream(stream)) - { - await writer.WriteAsync(jsonBytes, 0, jsonBytes.Length); - await writer.FlushAsync(); - _logger.LogTrace($"[BLOB-TRACE] BackupService.BackupAsync | Action: Written | Guid: {requestData.Guid} | Time: {DateTime.UtcNow:yyyy-MM-dd HH:mm:ss.fff}"); - } + await _requestStore.WriteAsync(Constants.Server, requestData.Guid.ToString(), jsonBytes).ConfigureAwait(false); + _logger.LogTrace($"[BLOB-TRACE] BackupService.BackupAsync | Action: Written | Guid: {requestData.Guid} | Time: {DateTime.UtcNow:yyyy-MM-dd HH:mm:ss.fff}"); // Only write out the body bytes blob the first time.. 
The body does not change on retries if (requestData.BodyBytes != null) { var bodyBlobName = requestData.Guid.ToString() + ".body"; - var exists = await _blobWriter.BlobExistsAsync(Constants.Server, bodyBlobName); + var exists = await _requestStore.BlobExistsAsync(Constants.Server, bodyBlobName); if (exists) { _logger.LogTrace($"[BLOB-TRACE] BackupService.BackupAsync | Action: BodyExists-Skip | Guid: {requestData.Guid} | Blob: {bodyBlobName} | Time: {DateTime.UtcNow:yyyy-MM-dd HH:mm:ss.fff}"); @@ -125,13 +144,8 @@ public async Task BackupAsync(RequestData requestData) } _logger.LogTrace($"[BLOB-TRACE] BackupService.BackupAsync | Action: WriteBody | Guid: {requestData.Guid} | Container: {Constants.Server} | Blob: {bodyBlobName} | Time: {DateTime.UtcNow:yyyy-MM-dd HH:mm:ss.fff}"); - await using (var stream = await _blobWriter.CreateBlobAndGetOutputStreamAsync(Constants.Server, bodyBlobName)) - await using (var writer = new BufferedStream(stream)) - { - await writer.WriteAsync(requestData.BodyBytes, 0, requestData.BodyBytes.Length); - await writer.FlushAsync(); - _logger.LogTrace($"[BLOB-TRACE] BackupService.BackupAsync | Action: WriteBody-Complete | Guid: {requestData.Guid} | Time: {DateTime.UtcNow:yyyy-MM-dd HH:mm:ss.fff}"); - } + await _requestStore.WriteAsync(Constants.Server, bodyBlobName, requestData.BodyBytes).ConfigureAwait(false); + _logger.LogTrace($"[BLOB-TRACE] BackupService.BackupAsync | Action: WriteBody-Complete | Guid: {requestData.Guid} | Time: {DateTime.UtcNow:yyyy-MM-dd HH:mm:ss.fff}"); } @@ -150,8 +164,9 @@ public async Task DeleteBackupAsync(string blobname) { try { + await EnsureServerContainerInitializedAsync().ConfigureAwait(false); _logger.LogCritical($"RequestDataBackupService: Deleting backup for blob {blobname}"); - await _blobWriter.DeleteBlobAsync(Constants.Server, blobname); + await _requestStore.DeleteBlobAsync(Constants.Server, blobname); return true; } catch (Exception ex) diff --git a/src/SimpleL7Proxy/Events/EventHubBatchTransport.cs 
b/src/SimpleL7Proxy/Events/EventHubBatchTransport.cs new file mode 100644 index 00000000..4c7592b8 --- /dev/null +++ b/src/SimpleL7Proxy/Events/EventHubBatchTransport.cs @@ -0,0 +1,101 @@ +using Azure.Messaging.EventHubs; +using Azure.Messaging.EventHubs.Producer; +using System.Text; + +using SimpleL7Proxy.Config; +using SimpleL7Proxy.Messaging; + +namespace SimpleL7Proxy.Events; + +internal sealed class EventHubBatchTransport : IBatchMessageTransport +{ + private readonly EventHubConfig _config; + private readonly DefaultCredential _defaultCredential; + private readonly ILogger _logger; + private EventHubProducerClient? _producerClient; + + public EventHubBatchTransport(EventHubConfig config, DefaultCredential defaultCredential, ILogger logger) + { + _config = config ?? throw new ArgumentNullException(nameof(config)); + _defaultCredential = defaultCredential ?? throw new ArgumentNullException(nameof(defaultCredential)); + _logger = logger ?? throw new ArgumentNullException(nameof(logger)); + } + + public Task OpenAsync(CancellationToken cancellationToken) + { + if (_producerClient is not null) + { + return Task.CompletedTask; + } + + if (!string.IsNullOrEmpty(_config.ConnectionString)) + { + _logger.LogInformation("[EVENT HUB] connecting via connection string, eventhubname :{EventHubName}", _config.EventHubName); + _producerClient = new EventHubProducerClient(_config.ConnectionString, _config.EventHubName); + return Task.CompletedTask; + } + + if (!string.IsNullOrEmpty(_config.EventHubNamespace)) + { + var credential = _defaultCredential.Credential; + var fullyQualifiedNamespace = _config.EventHubNamespace; + if (!fullyQualifiedNamespace.EndsWith(".servicebus.windows.net") && + !fullyQualifiedNamespace.EndsWith(".servicebus.usgovcloudapi.net")) + { + fullyQualifiedNamespace = $"{_config.EventHubNamespace}.servicebus.windows.net"; + } + + _producerClient = new EventHubProducerClient(fullyQualifiedNamespace, _config.EventHubName, credential); + return 
Task.CompletedTask; + } + + throw new InvalidOperationException("Event Hub connection details are not configured."); + } + + public ValueTask CreateBatchAsync(string destination, CancellationToken cancellationToken) + { + if (_producerClient is not { } producerClient) + { + throw new InvalidOperationException("Event Hub producer is not initialized."); + } + + return producerClient.CreateBatchAsync(cancellationToken); + } + + public bool TryAdd(EventDataBatch batch, BatchMessageEnvelope message) + { + return batch.TryAdd(new EventData(Encoding.UTF8.GetBytes(message.Payload))); + } + + public int GetCount(EventDataBatch batch) + { + return batch.Count; + } + + public Task SendAsync(string destination, EventDataBatch batch, CancellationToken cancellationToken) + { + if (_producerClient is not { } producerClient) + { + throw new InvalidOperationException("Event Hub producer is not initialized."); + } + + return producerClient.SendAsync(batch, cancellationToken); + } + + public void DisposeBatch(EventDataBatch batch) + { + batch.Dispose(); + } + + public async Task CloseAsync(CancellationToken cancellationToken) + { + if (_producerClient is null) + { + return; + } + + var producerClient = _producerClient; + _producerClient = null; + await producerClient.CloseAsync(cancellationToken).ConfigureAwait(false); + } +} \ No newline at end of file diff --git a/src/SimpleL7Proxy/Events/EventHubClient.cs b/src/SimpleL7Proxy/Events/EventHubClient.cs index ef3e9c17..ad1b02f1 100644 --- a/src/SimpleL7Proxy/Events/EventHubClient.cs +++ b/src/SimpleL7Proxy/Events/EventHubClient.cs @@ -1,14 +1,10 @@ -using Azure.Messaging.EventHubs; using Azure.Messaging.EventHubs.Producer; -using System.Collections.Concurrent; -using System.Text; -using System.Text.Json; using Microsoft.Extensions.Hosting; using Microsoft.Extensions.Logging; using Microsoft.Extensions.Options; -using System.Threading.Tasks; using SimpleL7Proxy.Config; +using SimpleL7Proxy.Messaging; namespace SimpleL7Proxy.Events; @@ 
-17,28 +13,13 @@ public class EventHubClient : IEventClient, IHostedService, IDisposable private bool _disposed = false; private readonly EventHubConfig? _config; - private readonly DefaultCredential _defaultCredential; - private EventHubProducerClient? _producerClient; - private EventDataBatch? _batchData; + private readonly IBatchMessageTransport? _transport; + private readonly BatchMessagePump? _pump; private readonly ILogger _logger; private readonly CompositeEventClient _composite; - private readonly CancellationTokenSource cancellationTokenSource = new(); - private CancellationToken workerCancelToken; - private volatile bool isRunning = false; - private volatile bool isShuttingDown = false; - private volatile bool beginShutdown = false; - private Task? writerTask; - private readonly ConcurrentQueue _logBuffer = new(); - // Connection parameters retained for reconnection + private const string DefaultDestination = "eventhub"; - private static int entryCount = 0; public static int ReconnectCount = 0; - private readonly int _eventThreshold; - private int _flushedThisMinute; - private int _flushedLastMinute; - private long _currentMinuteTicks; - - //public EventHubClient(string connectionString, string eventHubName, ILogger? logger = null) public EventHubClient(CompositeEventClient composite, IOptions options, @@ -46,7 +27,7 @@ public EventHubClient(CompositeEventClient composite, DefaultCredential defaultCredential) { var BackendOptions = options?.Value ?? throw new ArgumentNullException(nameof(options)); - _defaultCredential = defaultCredential ?? 
throw new ArgumentNullException(nameof(defaultCredential)); + ArgumentNullException.ThrowIfNull(defaultCredential); try { _config = new EventHubConfig(BackendOptions); @@ -57,382 +38,137 @@ public EventHubClient(CompositeEventClient composite, _config = null; } - - _eventThreshold = BackendOptions.MaxUndrainedEvents / 4; // Start flushing more aggressively at 25% capacity to avoid hitting the max and dropping events _composite = composite ?? throw new ArgumentNullException(nameof(composite)); _logger = logger ?? throw new ArgumentNullException(nameof(logger)); - // All initialization happens in StartAsync + + if (_config is not null) + { + var transport = new EventHubBatchTransport(_config, defaultCredential, _logger); + _transport = transport; + _pump = new BatchMessagePump( + destination: DefaultDestination, + transport: transport, + createBatchAsync: cancellationToken => transport.CreateBatchAsync(DefaultDestination, cancellationToken), + recoverBatchAsync: RecoverBatchAsync, + options: new BatchMessagePumpOptions + { + FlushCountThreshold = 10, + FlushInterval = TimeSpan.FromSeconds(2), + WaitThreshold = BackendOptions.MaxUndrainedEvents / 4, + ShutdownDrainTimeout = TimeSpan.FromSeconds(30), + }); + } } - public int Count => _logBuffer.Count; - public int FlushedLastMinute => Volatile.Read(ref _flushedLastMinute); - public string ClientType => isRunning ? "EventHub" : "EventHub (Disabled)"; + public int Count => _pump?.Count ?? 0; + public int FlushedLastMinute => _pump?.FlushedLastMinute ?? 0; + public string ClientType => _pump?.IsRunning == true ? 
"EventHub" : "EventHub (Disabled)"; public bool IsHealthy() { - return isRunning && ReconnectCount == 0 && !isShuttingDown; + return _pump is not null && _pump.IsRunning && ReconnectCount == 0 && !_pump.IsShuttingDown; } public void BeginShutdown() { - beginShutdown = true; + _pump?.BeginShutdown(); } public async Task StartAsync(CancellationToken cancellationToken) { - // If config failed to initialize (constructor threw), skip startup gracefully if (_config == null) { _logger.LogInformation("EventHubClient configuration is null. EventHub will not be started."); return; } + if (_pump == null) + { + _logger.LogInformation("EventHubClient pump is null. EventHub will not be started."); + return; + } + using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(_config.StartupSeconds)); try { - - await ReconnectAsync(cancellationToken: cts.Token).ConfigureAwait(false); - isRunning = true; - workerCancelToken = cancellationTokenSource.Token; + await _pump.StartAsync(cts.Token).ConfigureAwait(false); _composite.Add(this); var ConnString = string.IsNullOrEmpty(_config.ConnectionString) ? "Not Set" : "Set"; _logger.LogInformation("[EVENTHB] ✓ EventHub Client started: ConnectionString: {ConnString}, Name: {EventHubName}, Namespace: {EventHubNamespace}", ConnString, _config.EventHubName, _config.EventHubNamespace); - - writerTask = Task.Run(() => EventWriter(workerCancelToken), workerCancelToken); } catch (OperationCanceledException) { _logger.LogError("EventHubClient setup timed out after {Seconds} seconds. EventHub logging will be disabled.", _config.StartupSeconds); - // Don't throw — other event clients (e.g. LogFileEventClient) should continue running } catch (Exception ex) { _logger.LogError(ex, "Failed to setup EventHubClient. EventHub logging will be disabled."); - // Don't throw — other event clients (e.g. 
LogFileEventClient) should continue running } } - public async Task StopAsync(CancellationToken cancellationToken) + public Task StopAsync(CancellationToken cancellationToken) { - await StopTimerAsync().ConfigureAwait(false); + _ = cancellationToken; + // Shutdown is owned by CompositeEventClient to preserve ordering during coordinated stop. + return Task.CompletedTask; } public async Task StopTimerAsync() { - isShuttingDown = true; - var drainDeadline = DateTime.UtcNow.AddSeconds(30); - while (isRunning && _logBuffer.Count > 0 && DateTime.UtcNow < drainDeadline) - { - await Task.Delay(100).ConfigureAwait(false); - } - - if (_logBuffer.Count > 0) - _logger.LogWarning("[SHUTDOWN] EventHubClient stopped with {Count} items still in queue.", _logBuffer.Count); - - cancellationTokenSource.Cancel(); - isRunning = false; - if (writerTask != null) - await writerTask.ConfigureAwait(false); - } - - public async Task EventWriter(CancellationToken lifetimeToken) - { - var pendingTasks = new List<(Task Task, List Items, int Count, EventDataBatch Batch)>(); - var pendingItems = new List(); - using var timer = new PeriodicTimer(TimeSpan.FromMilliseconds(500)); - var lastSendTime = DateTime.UtcNow; - - if (_batchData is null || _producerClient is null) + if (_pump == null) { - isRunning = false; return; } - // Phase 1: Normal processing until cancelled - while (!lifetimeToken.IsCancellationRequested) - { - HarvestCompletedSends(pendingTasks); - GetNextBatch(99, pendingItems); - - var elapsed = DateTime.UtcNow - lastSendTime; - var shouldFlush = _batchData.Count >= 10 || (elapsed.TotalSeconds >= 2 && _batchData.Count > 0); + await _pump.StopAsync().ConfigureAwait(false); - if (shouldFlush) - { - var (success, newItems) = await FlushBatchAsync(pendingTasks, pendingItems).ConfigureAwait(false); - pendingItems = newItems; - lastSendTime = DateTime.UtcNow; - if (!success) - { - await ReconnectAsync(throwOnFailure: true).ConfigureAwait(false); - continue; - } - } - - if (!beginShutdown 
&& entryCount <= _eventThreshold) - { - try - { - await timer.WaitForNextTickAsync(lifetimeToken).ConfigureAwait(false); - } - catch (OperationCanceledException) - { - break; // exit, we're shutting down - } - } - } - - await DrainAndCloseAsync(pendingTasks, pendingItems).ConfigureAwait(false); - } - - private async Task DrainAndCloseAsync(List<(Task Task, List Items, int Count, EventDataBatch Batch)> pendingTasks, List pendingItems) - { - _logger.LogInformation("[SHUTDOWN] ✓ EventHubClient draining remaining items"); - isRunning = false; - - while (true) + if (_pump.Count > 0) { - HarvestCompletedSends(pendingTasks); - GetNextBatch(99, pendingItems); - - if (_batchData is not null && _batchData.Count > 0) - { - var (success, newItems) = await FlushBatchAsync(pendingTasks, pendingItems).ConfigureAwait(false); - pendingItems = newItems; - if (!success) - { - try { await ReconnectAsync(throwOnFailure: false).ConfigureAwait(false); } - catch { break; } - } - } - else if (pendingTasks.Count > 0) - { - // Nothing to batch — wait for all in-flight sends to settle - foreach (var (task, items, count, batch) in pendingTasks) - { - try - { - await task.ConfigureAwait(false); - } - catch (Exception ex) - { - _logger.LogWarning(ex, "EventHubClient: SendAsync failed during shutdown, re-enqueuing {Count} items.", items.Count); - ReEnqueueItems(items); - } - finally - { - batch.Dispose(); - } - } - pendingTasks.Clear(); - // Loop back — failures may have re-enqueued items to batch - } - else - { - // Nothing in batch, nothing in-flight — done - break; - } + _logger.LogWarning("[SHUTDOWN] EventHubClient stopped with {Count} items still in queue.", _pump.Count); } - - if (_producerClient is not null) - await _producerClient.CloseAsync().ConfigureAwait(false); } - - private void ReEnqueueItems(List items) + public void SendData(string? 
value) { - foreach (var item in items) + if (value != null) { - _logBuffer.Enqueue(item); - Interlocked.Increment(ref entryCount); + _pump?.Enqueue(new BatchMessageEnvelope(DefaultDestination, value)); } } - private async Task<(bool Success, List NewPendingItems)> FlushBatchAsync( - List<(Task Task, List Items, int Count, EventDataBatch Batch)> pendingTasks, - List pendingItems) + private async ValueTask RecoverBatchAsync(CancellationToken cancellationToken) { - var flushedCount = _batchData!.Count; - var sentBatch = _batchData; - _batchData = null; // Detach — sentBatch is now exclusively owned by the in-flight send - - try + if (_transport == null || _config == null) { - var sendTask = _producerClient!.SendAsync(sentBatch, CancellationToken.None); - pendingTasks.Add((sendTask, pendingItems, flushedCount, sentBatch)); - } - catch (Exception ex) - { - // SendAsync threw synchronously — send never started, safe to re-enqueue and dispose - _logger.LogWarning(ex, "EventHubClient: SendAsync failed synchronously, reconnecting."); - sentBatch.Dispose(); - ReEnqueueItems(pendingItems); - return (false, new List()); + throw new InvalidOperationException("EventHub transport is not initialized."); } - // Send is in-flight and tracked in pendingTasks. - // HarvestCompletedSends will re-enqueue on failure — do NOT re-enqueue here. 
try { - _batchData = await _producerClient!.CreateBatchAsync(CancellationToken.None).ConfigureAwait(false); - return (true, new List()); + await _transport.CloseAsync(cancellationToken).ConfigureAwait(false); } - catch (Exception ex) + catch { - _logger.LogWarning(ex, "EventHubClient: CreateBatchAsync failed after send, reconnecting."); - return (false, new List()); } - } - - // Check pending send tasks for completion, update flush counts, and re-enqueue items for any failed sends - private void HarvestCompletedSends(List<(Task Task, List Items, int Count, EventDataBatch Batch)> pendingTasks) - { - for (int i = pendingTasks.Count - 1; i >= 0; i--) - { - var (task, items, count, batch) = pendingTasks[i]; - if (!task.IsCompleted) continue; - - pendingTasks.RemoveAt(i); - batch.Dispose(); - if (task.IsCompletedSuccessfully) - { - var now = DateTime.UtcNow; - var nowMinute = now.Ticks / TimeSpan.TicksPerMinute; - if (nowMinute != _currentMinuteTicks) - { - _flushedLastMinute = _flushedThisMinute; - _flushedThisMinute = count; - _currentMinuteTicks = nowMinute; - } - else - { - _flushedThisMinute += count; - } - } - else - { - _logger.LogWarning(task.Exception?.InnerException, "EventHubClient: SendAsync failed, re-enqueuing {Count} items.", items.Count); - ReEnqueueItems(items); - } - } - } - - private async Task ReconnectAsync(bool throwOnFailure = true, CancellationToken cancellationToken = default) - { - async Task ConnectAsync() - { - if (!string.IsNullOrEmpty(_config!.ConnectionString)) - { - _logger.LogInformation("[EVENT HUB] connecting via connection string, eventhubname :" + _config.EventHubName); - _producerClient = new EventHubProducerClient(_config.ConnectionString, _config.EventHubName); - } - else if (!string.IsNullOrEmpty(_config.EventHubNamespace)) - { - var credential = _defaultCredential.Credential; - var fullyQualifiedNamespace = _config.EventHubNamespace; - if (!fullyQualifiedNamespace.EndsWith(".servicebus.windows.net") && - 
!fullyQualifiedNamespace.EndsWith(".servicebus.usgovcloudapi.net")) - fullyQualifiedNamespace = $"{_config.EventHubNamespace}.servicebus.windows.net"; - _producerClient = new EventHubProducerClient(fullyQualifiedNamespace, _config.EventHubName, credential); - } - }; - - try - { - if (_producerClient is not null) - await _producerClient.CloseAsync().ConfigureAwait(false); - } - catch { /* best effort close */ } Interlocked.Exchange(ref ReconnectCount, 0); - for (int attempt = 1; attempt <= _config!.MaxReconnectAttempts; attempt++) + for (int attempt = 1; attempt <= _config.MaxReconnectAttempts; attempt++) { Interlocked.Increment(ref ReconnectCount); try { - await ConnectAsync().ConfigureAwait(false); - _batchData?.Dispose(); - _batchData = await _producerClient!.CreateBatchAsync().ConfigureAwait(false); + await _transport.OpenAsync(cancellationToken).ConfigureAwait(false); + var batch = await _transport.CreateBatchAsync(DefaultDestination, cancellationToken).ConfigureAwait(false); Interlocked.Exchange(ref ReconnectCount, 0); - return; - } - catch (Exception ex) - { - Console.WriteLine($"EventHubClient: Reconnect failed: {ex.Message}"); - await Task.Delay(500 * attempt, cancellationToken).ConfigureAwait(false); // Wait for attempt/2 seconds before retrying - } - } - - if ( throwOnFailure) - throw new Exception("EventHubClient: Failed to reconnect after multiple attempts."); - } - - // Add the log to the batch up to count number at a time - private int GetNextBatch(int count, List pendingItems) - { - if (_batchData is null) - return 0; - - int initialCount = count; - - for (int i = 0; i < initialCount; i++) - { - if (!_logBuffer.TryDequeue(out string? log)) - { - break; - } - - EventData eventData; - try - { - eventData = new EventData(Encoding.UTF8.GetBytes(log)); + return batch; } catch (Exception ex) { - // Drop the item — it cannot be encoded; re-enqueuing would loop forever. 
- Interlocked.Decrement(ref entryCount); - Console.WriteLine($"EventHubClient: Failed to encode log entry, dropping: {ex.Message}"); - continue; - } - - if (_batchData.TryAdd(eventData)) - { - Interlocked.Decrement(ref entryCount); - pendingItems.Add(log); - } - else - { - if (_batchData.Count == 0) - { - // Batch is empty and still can't fit — item is genuinely too large. Drop it. - Interlocked.Decrement(ref entryCount); - _logger.LogError("EventHubClient: Log entry too large for batch, dropping ({Bytes} bytes).", Encoding.UTF8.GetByteCount(log)); - } - else - { - // Batch is full — put the item back and send what we have - _logBuffer.Enqueue(log); - } - break; + _logger.LogWarning(ex, "EventHubClient: Reconnect attempt {Attempt} failed.", attempt); + await Task.Delay(500 * attempt, cancellationToken).ConfigureAwait(false); } } - return _batchData.Count; - } - - // bool nop=true; - public void SendData(string? value) - { - if (!isRunning || isShuttingDown) return; - - if (value == null) return; - - // if ( nop) return; - - if (value.StartsWith("\n\n")) - value = value.Substring(2); - - Interlocked.Increment(ref entryCount); - _logBuffer.Enqueue(value); + throw new Exception("EventHubClient: Failed to reconnect after multiple attempts."); } protected virtual void Dispose(bool disposing) @@ -441,7 +177,7 @@ protected virtual void Dispose(bool disposing) { if (disposing) { - cancellationTokenSource.Dispose(); + _pump?.Dispose(); } _disposed = true; } diff --git a/src/SimpleL7Proxy/Events/LogFileEventClient.cs b/src/SimpleL7Proxy/Events/LogFileEventClient.cs index 71cf683f..1068733e 100644 --- a/src/SimpleL7Proxy/Events/LogFileEventClient.cs +++ b/src/SimpleL7Proxy/Events/LogFileEventClient.cs @@ -1,194 +1,158 @@ -using Azure.Messaging.EventHubs; -using Azure.Messaging.EventHubs.Producer; -using System.Collections.Concurrent; using System.Text; -using System.Text.Json; using Microsoft.Extensions.Hosting; -using Microsoft.Extensions.Logging; using 
Microsoft.Extensions.Options; using SimpleL7Proxy.Config; +using SimpleL7Proxy.Messaging; namespace SimpleL7Proxy.Events; -public class LogFileEventClient : IEventClient, IHostedService +public class LogFileEventClient : IEventClient, IHostedService, IBatchMessageTransport, BatchMessageEnvelope> { + private IBatchMessageTransport, BatchMessageEnvelope> BatchTransport => this; - private static CancellationTokenSource cancellationTokenSource = new CancellationTokenSource(); - private CancellationToken workerCancelToken; private bool isRunning = false; - private bool isShuttingDown = false; - private bool beginShutdown = false; - private Task? writerTask; - private ConcurrentQueue _logBuffer = new ConcurrentQueue(); - - public bool IsRunning { get => isRunning; set => isRunning = value; } - public int GetEntryCount() => entryCount; - private static int entryCount = 0; - private int _flushedThisMinute; - private int _flushedLastMinute; - private long _currentMinuteTicks; + private readonly BatchMessagePump, BatchMessageEnvelope> _pump; + private const string DefaultDestination = "file"; + + public bool IsRunning { get => _pump.IsRunning || isRunning; set => isRunning = value; } + public int GetEntryCount() => _pump.EntryCount; private readonly CompositeEventClient _composite; private readonly StringBuilder _sb = new(); private static Stream log = null!; private static StreamWriter writer = null!; - - public LogFileEventClient(string filename, CompositeEventClient composite, IOptions options ) + + public LogFileEventClient(string filename, CompositeEventClient composite, IOptions options) { + var proxyOptions = options?.Value ?? throw new ArgumentNullException(nameof(options)); _composite = composite ?? 
throw new ArgumentNullException(nameof(composite)); - // create file stream to a log file + log = new FileStream(filename, FileMode.OpenOrCreate, FileAccess.Write); writer = new StreamWriter(log) { - AutoFlush = true + AutoFlush = true, }; - workerCancelToken = cancellationTokenSource.Token; - - - return; + _pump = new BatchMessagePump, BatchMessageEnvelope>( + destination: DefaultDestination, + transport: this, + createBatchAsync: cancellationToken => BatchTransport.CreateBatchAsync(DefaultDestination, cancellationToken), + recoverBatchAsync: cancellationToken => BatchTransport.CreateBatchAsync(DefaultDestination, cancellationToken), + options: new BatchMessagePumpOptions + { + FlushCountThreshold = 10, + FlushInterval = TimeSpan.FromSeconds(2), + WaitThreshold = proxyOptions.MaxUndrainedEvents / 4, + ShutdownDrainTimeout = TimeSpan.FromSeconds(30), + }); } - public int Count => _logBuffer.Count; - public int FlushedLastMinute => Volatile.Read(ref _flushedLastMinute); + public int Count => _pump.Count; + public int FlushedLastMinute => _pump.FlushedLastMinute; public string ClientType => "LogFile"; public bool IsHealthy() { - return isRunning && !isShuttingDown; + return _pump.IsRunning && !_pump.IsShuttingDown; } - - public Task StartAsync(CancellationToken cancellationToken) + public async Task StartAsync(CancellationToken cancellationToken) { Console.WriteLine("[STARTUP] ✓ Local File Logger starting"); - workerCancelToken = cancellationTokenSource.Token; - if (!isRunning) + if (!_pump.IsRunning) { + await _pump.StartAsync(cancellationToken).ConfigureAwait(false); _composite.Add(this); - writerTask = Task.Run(() => EventWriter(workerCancelToken)); + isRunning = true; } - return Task.CompletedTask; } public void BeginShutdown() { - beginShutdown = true; + _pump.BeginShutdown(); } - public async Task StopAsync(CancellationToken cancellationToken) + public Task StopAsync(CancellationToken cancellationToken) { - await StopTimerAsync().ConfigureAwait(false); + _ = 
cancellationToken; + // Shutdown is owned by CompositeEventClient to preserve ordering during coordinated stop. + return Task.CompletedTask; } - - public async Task EventWriter(CancellationToken token) + public async Task StopTimerAsync() { - isRunning = true; - try + if (!_pump.IsRunning && !isRunning) { - using var timer = new PeriodicTimer(TimeSpan.FromMilliseconds(500)); - while (!token.IsCancellationRequested) - { - LogNextBatch(99); + Console.WriteLine("LogFileEventClient: StopTimerAsync called but the logger is already stopped."); + return; + } - if (!beginShutdown) - { - await timer.WaitForNextTickAsync(token).ConfigureAwait(false); - } - } - Console.WriteLine("[SHUTDOWN] ✓ LogFileEventClient exiting"); + await _pump.StopAsync().ConfigureAwait(false); - } - catch (TaskCanceledException) + isRunning = false; + + if (_pump.Count > 0) { - // Ignore + Console.WriteLine($"[SHUTDOWN] LogFileEventClient stopped with {_pump.Count} items still in queue."); } - finally - { - while (true) - { - if (LogNextBatch(99) == 0) - break; - } - - await Task.Delay(500).ConfigureAwait(false); // Wait for 1/2 second - // make sure event hub client is closed + } - writer.Flush(); - writer.Dispose(); - log?.Close(); - log?.Dispose(); + public void SendData(string? value) + { + if (value != null) + { + _pump.Enqueue(new BatchMessageEnvelope(DefaultDestination, value)); } } - // Add the log to the batch up to count number at a time - private int LogNextBatch(int count) + Task IBatchMessageTransport, BatchMessageEnvelope>.OpenAsync(CancellationToken cancellationToken) { - _sb.Clear(); - int drained = 0; + return Task.CompletedTask; + } - while (drained < count && _logBuffer.TryDequeue(out string? 
line)) - { - _sb.AppendLine(line); - drained++; - } + ValueTask> IBatchMessageTransport, BatchMessageEnvelope>.CreateBatchAsync(string destination, CancellationToken cancellationToken) + { + return ValueTask.FromResult(new List()); + } - if (drained > 0) - { - writer.Write(_sb); - writer.Flush(); - Interlocked.Add(ref entryCount, -drained); + bool IBatchMessageTransport, BatchMessageEnvelope>.TryAdd(List batch, BatchMessageEnvelope message) + { + batch.Add(message); + return true; + } - // Track events flushed per wall-clock minute - var nowMinute = DateTime.UtcNow.Ticks / TimeSpan.TicksPerMinute; - if (nowMinute != _currentMinuteTicks) - { - _flushedLastMinute = _flushedThisMinute; - _flushedThisMinute = drained; - _currentMinuteTicks = nowMinute; - } - else - { - _flushedThisMinute += drained; - } - } - return drained; + int IBatchMessageTransport, BatchMessageEnvelope>.GetCount(List batch) + { + return batch.Count; } - public async Task StopTimerAsync() + Task IBatchMessageTransport, BatchMessageEnvelope>.SendAsync(string destination, List batch, CancellationToken cancellationToken) { - if (writerTask == null) - { - Console.WriteLine("LogFileEventClient: StopTimerAsync called but writerTask is null"); - return; - } - isShuttingDown = true; - while (isRunning && _logBuffer.Count > 0) + _sb.Clear(); + foreach (var message in batch) { - await Task.Delay(100).ConfigureAwait(false); + _sb.AppendLine(message.Payload); } - cancellationTokenSource.Cancel(); - - await writerTask.ConfigureAwait(false); - isRunning = false; + writer.Write(_sb); + writer.Flush(); + return Task.CompletedTask; } - public void SendData(string? 
value) + void IBatchMessageTransport, BatchMessageEnvelope>.DisposeBatch(List batch) { - if (!isRunning || isShuttingDown) return; - - if (value == null) return; - - if (value.StartsWith("\n\n")) - value = value.Substring(2); - - Interlocked.Increment(ref entryCount); + batch.Clear(); + } - _logBuffer.Enqueue(value); + Task IBatchMessageTransport, BatchMessageEnvelope>.CloseAsync(CancellationToken cancellationToken) + { + writer.Flush(); + writer.Dispose(); + log?.Close(); + log?.Dispose(); + return Task.CompletedTask; } // public void SendData(Dictionary eventData) diff --git a/src/SimpleL7Proxy/HealthCheckService.cs b/src/SimpleL7Proxy/HealthCheckService.cs index 73685fb0..92a636be 100644 --- a/src/SimpleL7Proxy/HealthCheckService.cs +++ b/src/SimpleL7Proxy/HealthCheckService.cs @@ -32,6 +32,7 @@ public class HealthCheckService private readonly IBackupAPIService? _backupAPIService; private readonly IServiceBusRequestService? _serviceBusRequestService; private readonly IBlobWriter? _blobWriter; + private readonly BlobWorkerPump? _blobWriteQueue; private readonly IUserProfileService? _userProfileService; private readonly AppConfigService _appConfigService; private readonly Func _getWorkerState; @@ -71,7 +72,6 @@ public class HealthCheckService private DateTime _lastFinalizerDrain = DateTime.UtcNow; private static TimeSpan s_finalizerDrainInterval; - public HealthCheckService( IEndpointMonitorService backends, IOptions options, @@ -82,6 +82,7 @@ public HealthCheckService( AppConfigService appConfigService, IServiceBusRequestService? serviceBusRequestService = null, IBlobWriter? blobWriter = null, + BlobWorkerPump? blobWriteQueue = null, IBackupAPIService? backupAPIService = null, IUserProfileService? 
userProfileService = null) { @@ -94,6 +95,7 @@ public HealthCheckService( _eventClient = eventClient; _serviceBusRequestService = serviceBusRequestService; _blobWriter = blobWriter; + _blobWriteQueue = blobWriteQueue; _backupAPIService = backupAPIService; _getWorkerState = GetWorkerState; _logger = logger; @@ -335,13 +337,14 @@ public void BuildHealthResponse(string path, int hostCount, bool hasFailedHosts, .Append(' ').Append(shared).Append('\n'); // Probes - var (startupStatus, readinessStatus, undrainedEvents) = GetStatus(); + var (startupStatus, readinessStatus, undrainedEvents, blobQueueDepth) = GetStatus(); _stringBuilder .Append('\n') .Append("─── Probes ────────────────────────────────────────────────────\n") .Append(" /startup : ").Append(startupStatus == HealthStatusEnum.StartupReady ? "200 OK" : "503 " + startupStatus).Append('\n') .Append(" /readiness : ").Append(readinessStatus == HealthStatusEnum.ReadinessReady ? "200 OK" : "503 " + readinessStatus).Append('\n') - .Append(" Undrained : ").Append(undrainedEvents).Append(" / ").Append(_options.MaxUndrainedEvents).Append('\n'); + .Append(" Undrained : ").Append(undrainedEvents).Append(" / ").Append(_options.MaxUndrainedEvents).Append('\n') + .Append(" Blob Queue : ").Append(blobQueueDepth).Append(" / ").Append(_options.AsyncBlobMaxQueue).Append('\n'); // Workers _stringBuilder @@ -580,7 +583,7 @@ public void RunPeriodicGC() // Method to get overall health status for probes, used by ProbeServer // Returns a tuple of (startupStatus, readinessStatus, activeUndrainedEvents) for more detailed monitoring - public (HealthStatusEnum, HealthStatusEnum, int) GetStatus() + public (HealthStatusEnum, HealthStatusEnum, int, int) GetStatus() { int hostCount = _backends.ActiveHostCount(); bool hasFailed = _backends.CheckFailedStatusAsync(true).Result; @@ -589,7 +592,9 @@ public void RunPeriodicGC() int activeEvents = _eventClient?.Count ?? 
0; bool tooManyEvents = activeEvents > _options.MaxUndrainedEvents; bool eventsAreHealthy = _eventClient?.IsHealthy() == true; - var isReady = IsReadyToWork && backendsStarted && profilesReady && !tooManyEvents; + int blobQueueDepth = _blobWriteQueue?.QueueDepth ?? 0; + bool blobQueueHealthy = blobQueueDepth <= _options.AsyncBlobMaxQueue; + var isReady = IsReadyToWork && backendsStarted && profilesReady && !tooManyEvents && blobQueueHealthy; if (isReady && firstHealthCheck) { @@ -602,24 +607,24 @@ public void RunPeriodicGC() if (!isReady) { - return (HealthStatusEnum.StartupZeroHosts, HealthStatusEnum.ReadinessZeroHosts, activeEvents); + return (HealthStatusEnum.StartupZeroHosts, HealthStatusEnum.ReadinessZeroHosts, activeEvents, blobQueueDepth); } if (hostCount == 0) { - return (HealthStatusEnum.StartupZeroHosts, HealthStatusEnum.ReadinessZeroHosts, activeEvents); + return (HealthStatusEnum.StartupZeroHosts, HealthStatusEnum.ReadinessZeroHosts, activeEvents, blobQueueDepth); } if (hasFailed) { - return (HealthStatusEnum.StartupFailedHosts, HealthStatusEnum.ReadinessFailedHosts, activeEvents); + return (HealthStatusEnum.StartupFailedHosts, HealthStatusEnum.ReadinessFailedHosts, activeEvents, blobQueueDepth); } if (!eventsAreHealthy) { - return (HealthStatusEnum.StartupFailedHosts, HealthStatusEnum.ReadinessFailedHosts, activeEvents); + return (HealthStatusEnum.StartupFailedHosts, HealthStatusEnum.ReadinessFailedHosts, activeEvents, blobQueueDepth); } - return (HealthStatusEnum.StartupReady, HealthStatusEnum.ReadinessReady, activeEvents); + return (HealthStatusEnum.StartupReady, HealthStatusEnum.ReadinessReady, activeEvents, blobQueueDepth); } } diff --git a/src/SimpleL7Proxy/Messaging/BatchMessagePump.cs b/src/SimpleL7Proxy/Messaging/BatchMessagePump.cs new file mode 100644 index 00000000..0a626f9c --- /dev/null +++ b/src/SimpleL7Proxy/Messaging/BatchMessagePump.cs @@ -0,0 +1,382 @@ +using System.Collections.Concurrent; + +namespace SimpleL7Proxy.Messaging; + 
+internal sealed class BatchMessagePumpOptions +{ + public int MaxBatchItems { get; init; } = 99; + public int FlushCountThreshold { get; init; } = 1; + public TimeSpan FlushInterval { get; init; } = TimeSpan.FromMilliseconds(500); + public int WaitThreshold { get; init; } = int.MaxValue; + public TimeSpan ShutdownDrainTimeout { get; init; } = TimeSpan.FromSeconds(30); +} + +internal sealed class BatchMessagePump : IDisposable + where TBatch : class +{ + private static readonly TimeSpan NormalWaitInterval = TimeSpan.FromMilliseconds(500); + + private readonly string _destination; + private readonly IBatchMessageTransport _transport; + private readonly Func> _createBatchAsync; + private readonly Func> _recoverBatchAsync; + private readonly BatchMessagePumpOptions _options; + private readonly ConcurrentQueue _queue = new(); + private readonly CancellationTokenSource _cancellationTokenSource = new(); + private readonly SemaphoreSlim _startLock = new(1, 1); + + private Task? _writerTask; + private TBatch? _batch; + private volatile bool _isRunning; + private volatile bool _isShuttingDown; + private volatile bool _beginShutdown; + private int _entryCount; + private int _flushedThisMinute; + private int _flushedLastMinute; + private long _currentMinuteTicks; + private bool _disposed; + + public BatchMessagePump( + string destination, + IBatchMessageTransport transport, + Func> createBatchAsync, + Func> recoverBatchAsync, + BatchMessagePumpOptions? options = null) + { + _destination = destination ?? throw new ArgumentNullException(nameof(destination)); + _transport = transport ?? throw new ArgumentNullException(nameof(transport)); + _createBatchAsync = createBatchAsync ?? throw new ArgumentNullException(nameof(createBatchAsync)); + _recoverBatchAsync = recoverBatchAsync ?? throw new ArgumentNullException(nameof(recoverBatchAsync)); + _options = options ?? 
new BatchMessagePumpOptions(); + } + + public int Count => _queue.Count; + + public int EntryCount => Volatile.Read(ref _entryCount); + + public int FlushedLastMinute => Volatile.Read(ref _flushedLastMinute); + + public bool IsRunning => _isRunning; + + public bool IsShuttingDown => _isShuttingDown; + + public void BeginShutdown() + { + _beginShutdown = true; + } + + public void Enqueue(TItem? item) + { + if (item == null || !_isRunning || _isShuttingDown) + { + return; + } + + // For string items, trim leading newlines (backward compatibility) + if (item is string str) + { + if (str.StartsWith("\n\n", StringComparison.Ordinal)) + { + item = (TItem)(object)str.Substring(2); + } + } + + Interlocked.Increment(ref _entryCount); + _queue.Enqueue(item); + } + + public async Task StartAsync(CancellationToken cancellationToken) + { + if (_isRunning) + { + return; + } + + await _startLock.WaitAsync(cancellationToken).ConfigureAwait(false); + try + { + if (_isRunning) + { + return; + } + + _batch = await _recoverBatchAsync(cancellationToken).ConfigureAwait(false); + _isRunning = true; + _writerTask = Task.Run(() => RunAsync(_cancellationTokenSource.Token), _cancellationTokenSource.Token); + } + finally + { + _startLock.Release(); + } + } + + public async Task StopAsync() + { + if (_writerTask == null) + { + return; + } + + _isShuttingDown = true; + + var drainDeadline = DateTime.UtcNow.Add(_options.ShutdownDrainTimeout); + while (_isRunning && Count > 0 && DateTime.UtcNow < drainDeadline) + { + await Task.Delay(100).ConfigureAwait(false); + } + + _cancellationTokenSource.Cancel(); + await _writerTask.ConfigureAwait(false); + _isRunning = false; + } + + private async Task RunAsync(CancellationToken lifetimeToken) + { + var pendingTasks = new List<(Task Task, List Items, int Count, TBatch Batch)>(); + var pendingItems = new List(); + using var timer = new PeriodicTimer(NormalWaitInterval); + var lastSendTime = DateTime.UtcNow; + + try + { + while 
(!lifetimeToken.IsCancellationRequested) + { + HarvestCompletedSends(pendingTasks); + FillCurrentBatch(_options.MaxBatchItems, pendingItems); + + var currentBatch = _batch; + if (currentBatch is not null) + { + var batchCount = _transport.GetCount(currentBatch); + if (ShouldFlush(batchCount, DateTime.UtcNow - lastSendTime)) + { + var (success, newItems) = await FlushBatchAsync(pendingTasks, pendingItems).ConfigureAwait(false); + pendingItems = newItems; + lastSendTime = DateTime.UtcNow; + if (!success) + { + _batch = await _recoverBatchAsync(CancellationToken.None).ConfigureAwait(false); + continue; + } + } + } + + if (!_beginShutdown && EntryCount <= _options.WaitThreshold) + { + try + { + await timer.WaitForNextTickAsync(lifetimeToken).ConfigureAwait(false); + } + catch (OperationCanceledException) + { + break; + } + } + } + } + catch (TaskCanceledException) + { + } + finally + { + await DrainAndCloseAsync(pendingTasks, pendingItems).ConfigureAwait(false); + _isRunning = false; + } + } + + private async Task DrainAndCloseAsync( + List<(Task Task, List Items, int Count, TBatch Batch)> pendingTasks, + List pendingItems) + { + while (true) + { + HarvestCompletedSends(pendingTasks); + FillCurrentBatch(_options.MaxBatchItems, pendingItems); + + var currentBatch = _batch; + if (currentBatch is not null && _transport.GetCount(currentBatch) > 0) + { + var (success, newItems) = await FlushBatchAsync(pendingTasks, pendingItems).ConfigureAwait(false); + pendingItems = newItems; + if (!success) + { + _batch = await _recoverBatchAsync(CancellationToken.None).ConfigureAwait(false); + } + } + else if (pendingTasks.Count > 0) + { + foreach (var (task, items, _, batch) in pendingTasks) + { + try + { + await task.ConfigureAwait(false); + } + catch + { + ReEnqueueItems(items); + } + finally + { + _transport.DisposeBatch(batch); + } + } + + pendingTasks.Clear(); + } + else + { + break; + } + } + + if (_batch is not null) + { + _transport.DisposeBatch(_batch); + _batch = null; + } + 
+ try + { + await _transport.CloseAsync(CancellationToken.None).ConfigureAwait(false); + } + catch + { + } + } + + private async Task<(bool Success, List NewPendingItems)> FlushBatchAsync( + List<(Task Task, List Items, int Count, TBatch Batch)> pendingTasks, + List pendingItems) + { + if (_batch is not { } sentBatch) + { + return (true, pendingItems); + } + + var flushedCount = _transport.GetCount(sentBatch); + _batch = null; + + try + { + var sendTask = _transport.SendAsync(_destination, sentBatch, CancellationToken.None); + pendingTasks.Add((sendTask, pendingItems, flushedCount, sentBatch)); + } + catch + { + _transport.DisposeBatch(sentBatch); + ReEnqueueItems(pendingItems); + return (false, new List()); + } + + _batch = await _createBatchAsync(CancellationToken.None).ConfigureAwait(false); + return (true, new List()); + } + + private void HarvestCompletedSends(List<(Task Task, List Items, int Count, TBatch Batch)> pendingTasks) + { + for (int i = pendingTasks.Count - 1; i >= 0; i--) + { + var (task, items, count, batch) = pendingTasks[i]; + if (!task.IsCompleted) + { + continue; + } + + pendingTasks.RemoveAt(i); + _transport.DisposeBatch(batch); + if (task.IsCompletedSuccessfully) + { + UpdateFlushMetrics(count); + } + else + { + ReEnqueueItems(items); + } + } + } + + private void FillCurrentBatch(int count, List pendingItems) + { + if (_batch is not { } currentBatch) + { + return; + } + + for (int i = 0; i < count; i++) + { + if (!_queue.TryDequeue(out var message)) + { + break; + } + + if (_transport.TryAdd(currentBatch, message)) + { + Interlocked.Decrement(ref _entryCount); + pendingItems.Add(message); + continue; + } + + if (_transport.GetCount(currentBatch) == 0) + { + Interlocked.Decrement(ref _entryCount); + } + else + { + _queue.Enqueue(message); + } + + break; + } + } + + private void ReEnqueueItems(List items) + { + foreach (var item in items) + { + _queue.Enqueue(item); + Interlocked.Increment(ref _entryCount); + } + } + + private void 
UpdateFlushMetrics(int count) + { + var nowMinute = DateTime.UtcNow.Ticks / TimeSpan.TicksPerMinute; + if (nowMinute != _currentMinuteTicks) + { + _flushedLastMinute = _flushedThisMinute; + _flushedThisMinute = count; + _currentMinuteTicks = nowMinute; + } + else + { + _flushedThisMinute += count; + } + } + + private bool ShouldFlush(int batchCount, TimeSpan elapsed) + { + if (batchCount == 0) + { + return false; + } + + if (batchCount >= _options.FlushCountThreshold) + { + return true; + } + + return elapsed >= _options.FlushInterval; + } + + public void Dispose() + { + if (_disposed) + { + return; + } + + _cancellationTokenSource.Dispose(); + _startLock.Dispose(); + _disposed = true; + } +} \ No newline at end of file diff --git a/src/SimpleL7Proxy/Messaging/IBatchMessageTransport.cs b/src/SimpleL7Proxy/Messaging/IBatchMessageTransport.cs new file mode 100644 index 00000000..b6b57adb --- /dev/null +++ b/src/SimpleL7Proxy/Messaging/IBatchMessageTransport.cs @@ -0,0 +1,17 @@ +using System.Threading; +using System.Threading.Tasks; + +namespace SimpleL7Proxy.Messaging; + +internal readonly record struct BatchMessageEnvelope(string Destination, string Payload); + +internal interface IBatchMessageTransport +{ + Task OpenAsync(CancellationToken cancellationToken); + ValueTask CreateBatchAsync(string destination, CancellationToken cancellationToken); + bool TryAdd(TBatch batch, TItem message); + int GetCount(TBatch batch); + Task SendAsync(string destination, TBatch batch, CancellationToken cancellationToken); + void DisposeBatch(TBatch batch); + Task CloseAsync(CancellationToken cancellationToken); +} \ No newline at end of file diff --git a/src/SimpleL7Proxy/ProbeServer.cs b/src/SimpleL7Proxy/ProbeServer.cs index 5a24ca73..91dfc38b 100644 --- a/src/SimpleL7Proxy/ProbeServer.cs +++ b/src/SimpleL7Proxy/ProbeServer.cs @@ -34,11 +34,15 @@ public class ProbeServer : BackgroundService, IConfigChangeSubscriber private static HealthStatusEnum _readinessStatus = 
HealthStatusEnum.ReadinessZeroHosts; private static HealthStatusEnum _startupStatus = HealthStatusEnum.StartupZeroHosts; private static int _activeUndrainedEvents = 0; - - + private static int _blobQueueDepth = 0; + // Active snapshots published to readers (use Volatile.Read/Write for memory ordering) private Timer? _probeTimer; + private int _tickCounter = 0; + private int _tickCounter2 = 0; + private const int SidecarPushTickInterval = 5; // push every 5 timer ticks (5s when timer interval is 1s) + private const int GCTickInterval = 10; // run the periodic GC check every 10 timer ticks (10s when timer interval is 1s) private readonly ProxyConfig _backendOptions; private HttpClient? _selfCheckClient; private IEventClient? _eventClient; @@ -99,18 +103,23 @@ private void StartProbeServer() // Single timer for status updates and optional sidecar push _probeTimer = new Timer(_ => { - (_startupStatus, _readinessStatus, _activeUndrainedEvents) = _healthService.GetStatus(); + (_startupStatus, _readinessStatus, _activeUndrainedEvents, _blobQueueDepth) = _healthService.GetStatus(); - // Push to sidecar if enabled (fire-and-forget async to avoid blocking threadpool) + // Push to sidecar if enabled, throttled to once per SidecarPushTickInterval ticks var client = _selfCheckClient; - if (client != null) + if (client != null && ++_tickCounter >= SidecarPushTickInterval) { + _tickCounter = 0; _ = PushStatusToSidecarAsync(client); } - _healthService.RunPeriodicGC(); + // Run periodic GC check every GCTickInterval ticks + if (++_tickCounter2 >= GCTickInterval) { + _tickCounter2 = 0; + _healthService.RunPeriodicGC(); + } - }, null, TimeSpan.FromSeconds(3), TimeSpan.FromSeconds(10)); // initial delay, interval + }, null, TimeSpan.FromSeconds(3), TimeSpan.FromSeconds(1)); // initial delay, interval FailedAttempts = 0; } @@ -156,6 +165,7 @@ private async Task PushStatusToSidecarAsync(HttpClient selfCheckClient) } } + public int BlobQueueDepth => _blobQueueDepth; public int EventCount => _activeUndrainedEvents; // TODO: no need for
stopwatch any longer diff --git a/src/SimpleL7Proxy/Program.cs b/src/SimpleL7Proxy/Program.cs index df2f153c..3c21eaa8 100644 --- a/src/SimpleL7Proxy/Program.cs +++ b/src/SimpleL7Proxy/Program.cs @@ -13,6 +13,7 @@ using Azure.Messaging.ServiceBus; +using SimpleL7Proxy.Async; using SimpleL7Proxy.Config; using SimpleL7Proxy.Events; using SimpleL7Proxy.Proxy; @@ -152,6 +153,7 @@ private static ILogger InitializeRuntime(IServiceProvider serviceProvider, AppCo serviceProvider.GetRequiredService(); serviceProvider.GetRequiredService(); + HostConfig.Initialize(backendTokenProvider, logger, serviceProvider); var hostCollection = serviceProvider.GetRequiredService(); @@ -160,6 +162,17 @@ private static ILogger InitializeRuntime(IServiceProvider serviceProvider, AppCo var healthService = serviceProvider.GetRequiredService(); Task healthCheck = healthService.BeginStartupMonitoring(); + // Initialize AsyncWorker static dependencies (only if async mode is enabled) + if (options.Value.AsyncModeEnabled) + { + var fileStore = serviceProvider.GetRequiredService(); + var streamingStore = serviceProvider.GetRequiredService(); + var asyncWorkerLogger = loggerFactory.CreateLogger(); + var probeService = serviceProvider.GetRequiredService(); + var messages = serviceProvider.GetRequiredService(); + AsyncWorker.Initialize(fileStore, streamingStore, asyncWorkerLogger, messages, options.Value, probeService); + } + var appLifetime = serviceProvider.GetRequiredService(); appLifetime.ApplicationStarted.Register(async () => { @@ -232,10 +245,12 @@ private static void ConfigureDI(IServiceCollection services, ILoggerFactory star if (backendOptions.AsyncModeEnabled) RegisterAsyncDI(services, startupLogger, backendOptions); else { - services.AddTransient(); services.AddSingleton(); services.AddSingleton(); services.AddSingleton(); + // AsyncWorkerContext, IAsyncRequestStore, and TemplateLoader are intentionally + // not registered when async mode is disabled — WorkerContext.AsyncWorkerContext + // 
resolves to null and is never read because request.runAsync stays false. } services.AddSingleton(); @@ -302,11 +317,28 @@ private static void ConfigureDI(IServiceCollection services, ILoggerFactory star private static void RegisterAsyncDI(IServiceCollection services, ILogger startupLogger, ProxyConfig backendOptions) { + // ───────────────────────────────────────────────────────────────────────────── + // Async DI map (interface → implementation), resolved by reflection below. + // + // IBlobWriter → QueuedBlobWriter is the canonical binding for async mode: + // every consumer that takes IBlobWriter (TemplateLoader, Server, + // HealthCheckService, AsyncFileStore, …) gets the queued decorator, so all + // small-blob writes flow through the BlobWriteQueue automatically. + // + // The non-async branch in ConfigureDI binds IBlobWriter → NullBlobWriter. + // These are the ONLY two bindings of IBlobWriter in the app — keep it that + // way. Adding a second binding here will silently shadow the queued path + // for whichever consumer DI happens to resolve last. + // + // AsyncStreamingStore is the deliberate exception: it does NOT take + // IBlobWriter. It calls IBlobWriterFactory.CreateBlobWriter() to get a raw + // BlobWriter so multi-GB response bodies bypass the queue. 
+ // ───────────────────────────────────────────────────────────────────────────── const string asyncClassesRaw = "IServiceBusFactory:ServiceBusFactory, IServiceBusRequestService:ServiceBusRequestService, " + - "IBackupAPIService:BackupAPIService, IBlobWriterFactory:BlobWriterFactory"; + "IBackupAPIService:BackupAPIService, IBlobWriterFactory:BlobWriterFactory, IBlobWriter:QueuedBlobWriter"; - // "IBlobWriter:QueuedBlobWriter, IAsyncFeeder:AsyncFeeder, " + + // "IAsyncFeeder:AsyncFeeder, " + // "IRequestProcessor:NormalRequest, IRequestProcessor:OpenAIBackgroundRequest"; var assembly = typeof(Program).Assembly; @@ -352,22 +384,12 @@ private static void RegisterAsyncDI(IServiceCollection services, ILogger startup services.AddSingleton(iType, cType); } - services.AddSingleton(); services.AddSingleton(); services.AddSingleton(); services.AddSingleton(); - // BlobWriter infrastructure — QueuedBlobWriter wraps BlobWriter (circular IBlobWriter dep), - // so these must remain as explicit factory registrations and override the dictionary entry. - services.AddSingleton(provider => - { - var factory = provider.GetRequiredService(); - var blobWriter = factory.CreateBlobWriter() as BlobWriter; - var logger = provider.GetRequiredService>(); - logger.LogInformation("[STARTUP] ✓ BlobWriter initialized: {BlobWriterType} (Status: {Status})", blobWriter?.GetType().Name ?? "Unknown", factory.InitStatus); - return blobWriter!; - }); - + // BlobWriteQueue tuning (worker count, batch size, dedup) — consumed by + // BlobWriteQueue's ctor below. services.AddSingleton(provider => { return new BlobWriteQueueOptions @@ -382,35 +404,26 @@ private static void RegisterAsyncDI(IServiceCollection services, ILogger startup }; }); - services.AddSingleton(); + services.AddSingleton(); - // Override dictionary's simple IBlobWriter registration — QueuedBlobWriter needs - // the concrete BlobWriter (not IBlobWriter) to avoid circular dependency. 
- services.AddSingleton(provider => - { - var underlyingWriter = provider.GetRequiredService(); - var queue = provider.GetRequiredService(); - var logger = provider.GetRequiredService>(); + // Two-store split (both backed by the existing IBlobWriter → QueuedBlobWriter mapping above): + // AsyncFileStore → small one-shot blobs through the BlobWriteQueue (headers, + // status messages, server-scope request snapshots). 1-RT + // UploadAsync per item. + // AsyncStreamingStore → large/streamed response bodies bypassing the queue. Owns + // a dedicated raw BlobWriter (from IBlobWriterFactory) whose + // write path is BlobClient.OpenWriteAsync (~4 MiB transfer + // buffer, no full-payload buffering — safe for multi-GB). + services.AddSingleton(); + services.AddSingleton(); - var queuedWriter = new QueuedBlobWriter(underlyingWriter, queue, logger, useQueueForWrites: true); - - var programLogger = provider.GetRequiredService>(); - programLogger.LogInformation("[STARTUP] ✓ QueuedBlobWriter initialized (wrapping {UnderlyingType})", - underlyingWriter.GetType().Name); - - return queuedWriter; - }); + services.AddSingleton(); + services.AddSingleton(); + services.AddSingleton(); + services.AddHostedService(sp => sp.GetRequiredService()); - services.AddTransient(); if (asyncClasses.ContainsKey("IAsyncFeeder")) services.AddHostedService(sp => (AsyncFeeder)sp.GetRequiredService()); - - services.AddSingleton(); - - // Initialize RequestData static references once all async singletons are resolvable. - // This runs at first resolution time via a hosted-service initializer that fires before - // the proxy starts accepting traffic (Server/WorkerFactory come after this in registration order). 
- // services.AddHostedService(); } private static void RegisterEventHeaders(IServiceCollection services, ILogger startupLogger, ProxyConfig backendOptions) diff --git a/src/SimpleL7Proxy/Proxy/AsyncWorker.cs b/src/SimpleL7Proxy/Proxy/AsyncWorker.cs index aa0ce0b2..f53297a4 100644 --- a/src/SimpleL7Proxy/Proxy/AsyncWorker.cs +++ b/src/SimpleL7Proxy/Proxy/AsyncWorker.cs @@ -10,7 +10,7 @@ using Microsoft.Extensions.Logging; using SimpleL7Proxy; -using SimpleL7Proxy.Async.BlobStorage; +using SimpleL7Proxy.Async; using SimpleL7Proxy.Config; using SimpleL7Proxy.Events; using SimpleL7Proxy.DTO; @@ -18,8 +18,6 @@ using Shared.RequestAPI.Models; // using SimpleL7Proxy.BackupAPI; -using System.Data.Common; - namespace SimpleL7Proxy.Proxy { /// @@ -37,46 +35,58 @@ public class AsyncWorker : IAsyncDisposable private string _dataBlobUri { get; set; } = ""; private Stream? _hos { get; set; } = null!; private string _userId { get; set; } = ""; - private readonly IBlobWriter _blobWriter; - private readonly ILogger _logger; - private readonly IRequestDataBackupService _requestBackupService; - private readonly ProxyConfig _options; - // private readonly IBackupAPIService _backupAPIService; + private IRequestDataBackupService? _backupService; public bool ShouldReprocess { get; set; } = false; public string ErrorMessage { get; set; } = ""; string dataBlobName = ""; string headerBlobName = ""; private int AsyncTimeout; private readonly bool _generateSasTokens; + private static TemplateLoader _messages = null!; + private static IAsyncFileStore _fileStore = null!; + private static IAsyncStreamingStore _streamingStore = null!; + private static ILogger _logger = null!; + private static ProbeServer _probeServer = null!; + private static ProxyConfig _options = null!; + private static JsonSerializerOptions SerializeOptions=null!; - // Track raw QueuedBlobStream references so we can await their completion - // before sending "Completed" status to the client - private QueuedBlobStream? 
_dataQueuedStream; - private QueuedBlobStream? _headerQueuedStream; - private static readonly JsonSerializerOptions SerializeOptions = new() + /// + /// Initializes static dependencies. Call this once at application startup before creating instances. + /// Two stores are required: handles small one-shot blobs through + /// the BlobWriteQueue (headers, status); handles potentially- + /// gigabyte response bodies by streaming directly to storage (bypassing the queue). + /// + public static void Initialize(IAsyncFileStore fileStore, + IAsyncStreamingStore streamingStore, + ILogger logger, + TemplateLoader messages, + ProxyConfig options, + ProbeServer probeService) { - WriteIndented = true, - DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull, - Encoder = JavaScriptEncoder.UnsafeRelaxedJsonEscaping // This prevents URL encoding of & characters + _fileStore = fileStore ?? throw new ArgumentNullException(nameof(fileStore)); + _streamingStore = streamingStore ?? throw new ArgumentNullException(nameof(streamingStore)); + _logger = logger ?? throw new ArgumentNullException(nameof(logger)); + _messages = messages ?? throw new ArgumentNullException(nameof(messages)); + _options = options ?? throw new ArgumentNullException(nameof(options)); + _probeServer = probeService ?? throw new ArgumentNullException(nameof(probeService)); - }; + SerializeOptions = new() + { + WriteIndented = true, + DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull, + Encoder = JavaScriptEncoder.UnsafeRelaxedJsonEscaping // This prevents URL encoding of & characters + }; + } /// /// Initializes a new instance of the class. /// /// The request data. - /// The blob writer instance. + /// The async request store instance. /// The logger instance. 
- public AsyncWorker(RequestData data, int AsyncTriggerTimeout, - IBlobWriter blobWriter, - ILogger logger, - IRequestDataBackupService requestBackupService, ProxyConfig backendOptions) + public AsyncWorker(RequestData data, int AsyncTriggerTimeout) { - _logger = logger ?? throw new ArgumentNullException(nameof(logger)); _requestData = data ?? throw new ArgumentNullException(nameof(data)); - _blobWriter = blobWriter ?? throw new ArgumentNullException(nameof(blobWriter)); - _requestBackupService = requestBackupService ?? throw new ArgumentNullException(nameof(requestBackupService)); - _options = backendOptions ?? throw new ArgumentNullException(nameof(backendOptions)); // _backupAPIService = backupAPIService ?? throw new ArgumentNullException(nameof(backupAPIService)); _userId = data.profileUserId; AsyncTimeout = AsyncTriggerTimeout; @@ -95,13 +105,27 @@ public AsyncWorker(RequestData data, int AsyncTriggerTimeout, } + /// + /// Convenience constructor that pulls construction-time dependencies from a shared + /// . Preferred over the multi-arg ctor — callers + /// inject the singleton context once and forward it on each new AsyncWorker. + /// + public AsyncWorker(RequestData data, int AsyncTriggerTimeout, AsyncWorkerContext context) + : this(data, AsyncTriggerTimeout) + { + _messages = context.Messages; + _backupService = context.BackupService; + } + /// /// Initializes the blob client asynchronously. This must be called after construction. /// /// A task that represents the asynchronous initialization operation. public async Task InitializeAsync() { - var result = await _blobWriter.InitClientAsync(_userId, _requestData.BlobContainerName).ConfigureAwait(false); + // BlobWriter cache (BlobWriter._containerClients) is static, so initializing on either + // store warms the container client for the streaming store too. 
+ var result = await _fileStore.InitializeClientAsync(_requestData.BlobContainerName).ConfigureAwait(false); if (!result) { ErrorMessage = "Failed to initialize BlobWriter for AsyncWorker."; @@ -142,8 +166,7 @@ public async Task PrepareResponseStreamsAsync(bool isBackground = false) SetBlobNames(isBackground); // Generate base blob URIs (OAuth will handle authentication - no SAS tokens) - _dataBlobUri = _blobWriter.GetBlobUri(_userId, dataBlobName); - _headerBlobUri = _blobWriter.GetBlobUri(_userId, headerBlobName); + (_dataBlobUri, _headerBlobUri) = _fileStore.GetBlobUriPair(_requestData.BlobContainerName, dataBlobName, headerBlobName); _logger.LogDebug("[AsyncWorker:{Guid}] Base blob URIs configured - OAuth authentication required", _requestData.Guid); @@ -193,8 +216,7 @@ public async Task InitializeForBackgroundCheck() SetBlobNames(isBackground: true); // Always use OAuth (consistent with StartAsync and PrepareResponseStreamsAsync) - _dataBlobUri = _blobWriter.GetBlobUri(_userId, dataBlobName); - _headerBlobUri = _blobWriter.GetBlobUri(_userId, headerBlobName); + (_dataBlobUri, _headerBlobUri) = _fileStore.GetBlobUriPair(_requestData.BlobContainerName, dataBlobName, headerBlobName); _logger.LogDebug("[AsyncWorker:{Guid}] Base blob URIs configured - OAuth authentication required", _requestData.Guid); @@ -227,37 +249,6 @@ private void SetBlobNames(bool isBackground = false) _requestData.Guid, dataBlobName, headerBlobName); } - /// - /// Creates the user data and header blobs for async response storage. - /// - /// Whether this is for background response (uses different blob naming). - /// A tuple containing the data stream and header stream. 
- private async Task<(Stream dataStream, Stream headerStream)> CreateUserBlobsAsync(bool isBackground = false) - { - dataBlobName = _requestData.Guid.ToString(); - if (isBackground) - { - dataBlobName += "-BackgroundResponse"; - } - headerBlobName = dataBlobName + "-Headers"; - - //_logger.LogInformation("[BLOB-TRACE] AsyncWorker.CreateUserBlobs | Action: CreateBlobs | Guid: {Guid} | UserId: {UserId} | DataBlob: {DataBlob} | HeaderBlob: {HeaderBlob} | IsBackground: {IsBackground}", - // _requestData.Guid, _userId, dataBlobName, headerBlobName, isBackground); - - // Create both blobs in parallel - var dataStreamTask = _blobWriter.CreateBlobAndGetOutputStreamAsync(_userId, dataBlobName); - var headerStreamTask = _blobWriter.CreateBlobAndGetOutputStreamAsync(_userId, headerBlobName); - - await Task.WhenAll(dataStreamTask, headerStreamTask).ConfigureAwait(false); - - var dataStream = await dataStreamTask; - var headerStream = await headerStreamTask; - - //_logger.LogInformation("[BLOB-TRACE] AsyncWorker.CreateUserBlobs | Action: CreateBlobs-Complete | Guid: {Guid}", _requestData.Guid); - - return (dataStream, headerStream); - } - /// /// Generates and configures SAS URIs or base blob URIs for the created blobs. /// Optionally adds headers to the HTTP response context. @@ -265,60 +256,63 @@ private void SetBlobNames(bool isBackground = false) /// Whether to add the URIs to the HTTP response headers. 
private async Task ConfigureBlobUrisAsync(bool addToResponseHeaders = false) { - if (_generateSasTokens) - { - try - { - _logger.LogDebug("[AsyncWorker:{Guid}] Generating SAS tokens for blobs", _requestData.Guid); - _dataBlobUri = await _blobWriter.GenerateSasTokenAsync(_userId, dataBlobName, TimeSpan.FromSeconds(_requestData.AsyncBlobAccessTimeoutSecs)); - _headerBlobUri = await _blobWriter.GenerateSasTokenAsync(_userId, headerBlobName, TimeSpan.FromSeconds(_requestData.AsyncBlobAccessTimeoutSecs)); - _logger.LogTrace("[AsyncWorker:{Guid}] SAS tokens generated successfully", _requestData.Guid); + // if (_generateSasTokens) + // { + // try + // { + // _logger.LogDebug("[AsyncWorker:{Guid}] Generating SAS tokens for blobs", _requestData.Guid); + // (_dataBlobUri, _headerBlobUri) = await _fileStore.GenerateSasTokenPairAsync(_requestData.BlobContainerName, dataBlobName, headerBlobName, TimeSpan.FromSeconds(_requestData.AsyncBlobAccessTimeoutSecs)).ConfigureAwait(false); + // _logger.LogTrace("[AsyncWorker:{Guid}] SAS tokens generated successfully", _requestData.Guid); - if (addToResponseHeaders && _requestData.Context != null) - { - _requestData.Context.Response.Headers.Add("x-Data-Blob-SAS-URI", _dataBlobUri); - _requestData.Context.Response.Headers.Add("x-Header-Blob-SAS-URI", _headerBlobUri); - } - } - catch (Exception sasEx) - { - _logger.LogError(sasEx, "[AsyncWorker:{Guid}] Failed to create SAS token", _requestData.Guid); - ErrorMessage = "Failed to create SAS token: " + sasEx.Message; - throw; - } - } - else - { - _logger.LogDebug("[AsyncWorker:{Guid}] SAS token generation skipped - providing base blob URIs", _requestData.Guid); - _dataBlobUri = _blobWriter.GetBlobUri(_userId, dataBlobName); - _headerBlobUri = _blobWriter.GetBlobUri(_userId, headerBlobName); + // if (addToResponseHeaders && _requestData.Context != null) + // { + // _requestData.Context.Response.Headers.Add("x-Data-Blob-SAS-URI", _dataBlobUri); + // 
_requestData.Context.Response.Headers.Add("x-Header-Blob-SAS-URI", _headerBlobUri); + // } + // } + // catch (Exception sasEx) + // { + // _logger.LogError(sasEx, "[AsyncWorker:{Guid}] Failed to create SAS token", _requestData.Guid); + // ErrorMessage = "Failed to create SAS token: " + sasEx.Message; + // throw; + // } + // } + // else + // { + // _logger.LogDebug("[AsyncWorker:{Guid}] SAS token generation skipped - providing base blob URIs", _requestData.Guid); + (_dataBlobUri, _headerBlobUri) = _fileStore.GetBlobUriPair(_requestData.BlobContainerName, dataBlobName, headerBlobName); if (addToResponseHeaders && _requestData.Context != null) { _requestData.Context.Response.Headers.Add("x-Data-Blob-URI", _dataBlobUri); _requestData.Context.Response.Headers.Add("x-Header-Blob-URI", _headerBlobUri); } - } + //} } /// /// Gets or creates the data output stream lazily. Only creates the blob when first accessed. /// /// The output stream for writing response data. - public async Task GetOrCreateDataStreamAsync() + public async Task GetResponseDataStreamAsync() { if (_requestData.OutputStream == null) { - //_logger.LogInformation("[BLOB-TRACE] AsyncWorker.GetOrCreateDataStream | Action: LazyCreate | Guid: {Guid} | DataBlob: {DataBlob}", _requestData.Guid, dataBlobName); + //_logger.LogInformation("[BLOB-TRACE] AsyncWorker.GetResponseDataStream | Action: LazyCreate | Guid: {Guid} | DataBlob: {DataBlob}", _requestData.Guid, dataBlobName); try { - var dataStream = await _blobWriter.CreateBlobAndGetOutputStreamAsync(_userId, dataBlobName); - if (dataStream is QueuedBlobStream qbs) - _dataQueuedStream = qbs; - _requestData.OutputStream = new BufferedStream(dataStream); + // Data path is potentially gigabytes — go straight to BlobClient.OpenWriteAsync. + // The SDK transfer buffer (~4 MiB by default, tunable via AsyncStreamingBufferSizeBytes) + // caps memory regardless of total size. 
+ // + // Intentionally do NOT pass the worker CTS token: cancelling an in-flight blob + // upload would leave a partial/empty blob and break correctness for the client + // that already received the 202 with this blob URI. Writes must run to completion + // even during shutdown. + _requestData.OutputStream = await _streamingStore.OpenWriteStreamAsync(_requestData.BlobContainerName, dataBlobName, CancellationToken.None); - //_logger.LogInformation("[BLOB-TRACE] AsyncWorker.GetOrCreateDataStream | Action: Created | Guid: {Guid}", _requestData.Guid); + //_logger.LogInformation("[BLOB-TRACE] AsyncWorker.GetResponseDataStream | Action: Created | Guid: {Guid}", _requestData.Guid); } catch (Exception ex) { @@ -351,7 +345,7 @@ public async Task StartAsync() //_logger.LogInformation($"AsyncWorker: Delayed for {AsyncTimeout} ms"); // Atomically set to running (1) only if not started (0) [ ITETCOBO: aboted or ACTIVE !! ] - if (Interlocked.CompareExchange(ref _beginStartup, 1, 0) == 0) + if ( _probeServer.BlobQueueDepth < _options.AsyncBlobMaxQueue && Interlocked.CompareExchange(ref _beginStartup, 1, 0) == 0) { _requestData.SBStatus = ServiceBusMessageStatusEnum.AsyncProcessing; @@ -373,23 +367,20 @@ public async Task StartAsync() { _requestData.RequestAPIStatus = RequestAPIStatusEnum.New; - _logger.LogTrace("[AsyncWorker:{Guid}] Calling InitializeAsync", _requestData.Guid); await InitializeAsync().ConfigureAwait(false); - _logger.LogTrace("[AsyncWorker:{Guid}] InitializeAsync completed", _requestData.Guid); operation = "Set Blob Names"; // Only set blob names, don't create blobs yet (lazy creation for better performance) SetBlobNames(isBackground: false); // Generate base blob URIs (OAuth will handle authentication - no SAS tokens) - _dataBlobUri = _blobWriter.GetBlobUri(_userId, dataBlobName); - _headerBlobUri = _blobWriter.GetBlobUri(_userId, headerBlobName); + (_dataBlobUri, _headerBlobUri) = _fileStore.GetBlobUriPair(_requestData.BlobContainerName, dataBlobName, 
headerBlobName); _logger.LogDebug("[AsyncWorker:{Guid}] Base blob URIs configured - OAuth authentication required", _requestData.Guid); operation = "Backup Request"; // Backup the request data - await UpdateBackup().ConfigureAwait(false); + await PersistRequestStateAsync().ConfigureAwait(false); } catch (Exception ex) @@ -403,6 +394,7 @@ public async Task StartAsync() Type = EventType.Exception, ["Error"] = ErrorMessage, ["Operation"] = operation, + ["StackTrace"] = ex.StackTrace ?? string.Empty, Exception = ex }; @@ -412,17 +404,16 @@ public async Task StartAsync() return; } - AsyncMessage Statusmessage = new() - { - Status = 202, - Message = "Your request has been accepted for async processing. The final result will be available at the blob URIs. Use OAuth for authentication.", - MID = _requestData.MID, - UserId = _requestData.UserID, - Guid = _requestData.Guid.ToString(), - DataBlobUri = _dataBlobUri, - HeaderBlobUri = _headerBlobUri, - Timestamp = DateTime.UtcNow - }; + AsyncMessage Statusmessage = _messages.GetMergedMessage( + AsyncResponseTypeEnum.Welcome, + _requestData.Guid.ToString(), + _requestData.MID, + _requestData.UserID, + _dataBlobUri, + _headerBlobUri); + + // Timestamp is always "now" — overwrite whatever the template carried. + Statusmessage.Timestamp = DateTime.UtcNow; try { @@ -438,7 +429,7 @@ public async Task StartAsync() // CRITICAL: Clear the OutputStream after sending 202 response // The client connection is now closed, so the original OutputStream is invalid. - // GetOrCreateDataStreamAsync() checks if OutputStream is null to decide whether + // GetResponseDataStreamAsync() checks if OutputStream is null to decide whether // to create a new blob stream. Without this, it would return the closed client // stream instead of creating a blob stream, causing data to be lost. 
_requestData.OutputStream = null; @@ -462,7 +453,8 @@ public async Task StartAsync() } else { - _logger.LogDebug("[AsyncWorker:{Guid}] Startup already in progress or completed", _requestData.Guid); + _requestData.runAsync = false; + _logger.LogDebug("[AsyncWorker:{Guid}] Startup already in progress or completed or blob queue depth exceeds maximum threshold", _requestData.Guid); // Worker has already started, do nothing } @@ -485,9 +477,9 @@ public async Task StartAsync() } - public Task UpdateBackup() + public Task PersistRequestStateAsync() { - return _requestBackupService.BackupAsync(_requestData); + return _backupService?.BackupAsync(_requestData) ?? Task.CompletedTask; } /// @@ -497,128 +489,75 @@ public Task UpdateBackup() /// public async Task WaitForBlobWritesAsync(CancellationToken cancellationToken = default) { - // Data stream - if (_dataQueuedStream != null) + // Data stream is a raw SDK OpenWriteAsync stream — staged blocks are committed on + // Dispose, so we must dispose here (before sending Completed) to ensure the blob is + // visible. Subsequent dispose attempts in cleanup paths are no-ops via the catch. + if (_requestData?.OutputStream != null) { - await _dataQueuedStream.WaitForPendingWritesAsync(cancellationToken).ConfigureAwait(false); + try + { + await _requestData.OutputStream.FlushAsync(cancellationToken).ConfigureAwait(false); + await _requestData.OutputStream.DisposeAsync().ConfigureAwait(false); + } + catch (ObjectDisposedException) { } + _requestData.OutputStream = null; } - // Header stream - if (_headerQueuedStream != null) - { - await _headerQueuedStream.WaitForPendingWritesAsync(cancellationToken).ConfigureAwait(false); - } + // Header stream goes through the BlobWriteQueue — wait for the enqueued operation + // to land in storage. + await _fileStore.CompleteWriteStreamAsync(_hos, cancellationToken).ConfigureAwait(false); } /// - /// Writes HTTP headers to the blob storage asynchronously with retry logic. 
+ /// Writes HTTP headers to the blob storage. The underlying QueuedBlobStream buffers + /// the payload and enqueues it on FlushAsync; the BlobWriteQueue worker performs the + /// actual upload with SDK-level retry on transient failures. No local retry needed. /// /// The HTTP status code to write. /// The HTTP headers to write. - /// True if headers were successfully written; otherwise, false. - public async Task WriteHeaders(HttpStatusCode status, WebHeaderCollection headers) + /// True if the payload was successfully enqueued; otherwise, false. + public async Task SaveResponseHeadersAsync(HttpStatusCode status, WebHeaderCollection headers) { - const int MaxRetryAttempts = 5; - const int BaseRetryDelayMs = 500; - - for (int attempt = 0; attempt < MaxRetryAttempts; attempt++) + try { - try + if (_hos == null) { - // Create or recreate the stream if needed - if (_hos == null) - { - //_logger.LogInformation("[BLOB-TRACE] AsyncWorker.WriteHeaders | Action: RecreateStream | Guid: {Guid} | UserId: {UserId} | HeaderBlob: {HeaderBlob} | Attempt: {Attempt}/{MaxAttempts}", - // _requestData.Guid, _userId, headerBlobName, attempt + 1, MaxRetryAttempts); - - var stream = await _blobWriter.CreateBlobAndGetOutputStreamAsync(_userId, headerBlobName) - .ConfigureAwait(false); - - if (stream == null) - { - _logger.LogError("Failed to create header stream on attempt {Attempt}", attempt + 1); - await Task.Delay(GetBackoffDelay(attempt, BaseRetryDelayMs)).ConfigureAwait(false); - continue; - } - - _hos = stream; - if (stream is QueuedBlobStream qbs) - _headerQueuedStream = qbs; - //_logger.LogInformation("[BLOB-TRACE] AsyncWorker.WriteHeaders | Action: RecreateStream-Complete | Guid: {Guid}", _requestData.Guid); - } - - // Convert WebHeaderCollection to Dictionary for proper JSON serialization - var headersDictionary = new Dictionary(); - foreach (string headerName in headers.AllKeys) - { - headersDictionary[headerName] = headers[headerName] ?? 
""; - } - - // Prepare the message - var headerMessage = new AsyncHeaders - { - Status = status.ToString(), - // Serialize headers to a string - Headers = headersDictionary, - UserId = _requestData.UserID, - MID = _requestData.MID, - Guid = _requestData.Guid.ToString(), - Timestamp = DateTime.UtcNow, - BlobUri = _dataBlobUri - }; - - // Serialize the message - byte[] serializedMessage = Encoding.UTF8.GetBytes( - JsonSerializer.Serialize(headerMessage, SerializeOptions) + "\n"); - - // Write to the stream - using (var bufferStream = new BufferedStream(_hos)) - { - await bufferStream.WriteAsync(serializedMessage).ConfigureAwait(false); - await bufferStream.FlushAsync().ConfigureAwait(false); - return true; - } + // Headers are small and one-shot — go through the queued/UploadAsync path. + _hos = await _fileStore.OpenWriteStreamAsync(_requestData.BlobContainerName, headerBlobName) + .ConfigureAwait(false); } - catch (OutOfMemoryException e) + + var headersDictionary = new Dictionary(headers.Count); + foreach (string headerName in headers.AllKeys) { - _logger.LogError(e, "[AsyncWorker:{Guid}] Out of memory while writing headers (attempt {Attempt}/{Max}) - Blob: {HeaderBlob}", - _requestData.Guid, attempt + 1, MaxRetryAttempts, headerBlobName); - GC.Collect(); - GC.WaitForPendingFinalizers(); - - // Exponential backoff for memory issues - await Task.Delay(GetBackoffDelay(attempt, BaseRetryDelayMs, true)).ConfigureAwait(false); - await ResetStreamAsync().ConfigureAwait(false); + headersDictionary[headerName] = headers[headerName] ?? 
""; } - catch (IOException e) - { - _logger.LogError(e, "[AsyncWorker:{Guid}] IO error while writing headers (attempt {Attempt}/{Max}) - Blob: {HeaderBlob}", - _requestData.Guid, attempt + 1, MaxRetryAttempts, headerBlobName); - - if (e.InnerException is ObjectDisposedException) - { - _logger.LogWarning("[AsyncWorker:{Guid}] Stream was disposed, will recreate on next attempt - InnerException: {InnerException}", - _requestData.Guid, e.InnerException.Message); - } - await Task.Delay(GetBackoffDelay(attempt, BaseRetryDelayMs)).ConfigureAwait(false); - await ResetStreamAsync().ConfigureAwait(false); - } - catch (Exception ex) + var headerMessage = new AsyncHeaders { - _logger.LogError(ex, "[AsyncWorker:{Guid}] Failed to write headers (attempt {Attempt}/{Max}) - Blob: {HeaderBlob} - Type: {ExceptionType}", - _requestData.Guid, attempt + 1, MaxRetryAttempts, headerBlobName, ex.GetType().FullName); - _logger.LogDebug("[AsyncWorker:{Guid}] Exception stack trace: {StackTrace}", _requestData.Guid, ex.StackTrace); + Status = status.ToString(), + Headers = headersDictionary, + UserId = _requestData.UserID, + MID = _requestData.MID, + Guid = _requestData.Guid.ToString(), + Timestamp = DateTime.UtcNow, + BlobUri = _dataBlobUri + }; - // Don't retry for general exceptions - await ResetStreamAsync().ConfigureAwait(false); - return false; - } - } + byte[] serializedMessage = Encoding.UTF8.GetBytes( + JsonSerializer.Serialize(headerMessage, SerializeOptions) + "\n"); - _logger.LogError("[AsyncWorker:{Guid}] Failed to write headers after {MaxAttempts} retry attempts - Blob: {HeaderBlob}", - _requestData.Guid, MaxRetryAttempts, headerBlobName); - return false; + await _hos.WriteAsync(serializedMessage).ConfigureAwait(false); + await _hos.FlushAsync().ConfigureAwait(false); + return true; + } + catch (Exception ex) + { + _logger.LogError(ex, "[AsyncWorker:{Guid}] Failed to write headers - Blob: {HeaderBlob} - Type: {ExceptionType}", + _requestData.Guid, headerBlobName, 
ex.GetType().FullName); + await ResetStreamAsync().ConfigureAwait(false); + return false; + } } /// @@ -649,7 +588,6 @@ private async Task ResetStreamAsync() finally { _hos = null; - _headerQueuedStream = null; } } @@ -680,21 +618,6 @@ private async Task ResetStreamAsync() } } - /// - /// Calculates the appropriate backoff delay for retries. - /// - /// The current attempt number (0-based). - /// The base delay in milliseconds. - /// Whether to use exponential backoff instead of linear. - /// The delay time in milliseconds. - private static int GetBackoffDelay(int attempt, int baseDelayMs, bool useExponential = false) - { - return useExponential - ? (int)Math.Pow(2, attempt) * baseDelayMs - : baseDelayMs * (attempt + 1); - } - - /// /// Synchronizes with the worker's lifecycle by either terminating it before startup or waiting for completion. /// If the worker hasn't started yet, this method will cancel it. If it has already started, @@ -708,6 +631,7 @@ public async Task Synchronize() if (Interlocked.CompareExchange(ref _beginStartup, -1, 0) == 0) { _cancellationTokenSource?.Cancel(); + _requestData.runAsync = false; // Async Worker has not started, Terminate it return true; // Worker was not started, so we terminated it @@ -719,6 +643,7 @@ public async Task Synchronize() if (!_requestData.AsyncTriggered) { await DisposeAsync().ConfigureAwait(false); + _requestData.runAsync = false; return false; // Worker failed to start } @@ -758,7 +683,7 @@ public async Task AbortAsync() // _logger.LogError("Worker was started but no RequestAPIDocument was found to update."); // } - await UpdateBackup(); + await PersistRequestStateAsync(); await DisposeAsync().ConfigureAwait(false); } diff --git a/src/SimpleL7Proxy/Proxy/AsyncWorkerFactory.cs b/src/SimpleL7Proxy/Proxy/AsyncWorkerFactory.cs deleted file mode 100644 index 8017f7ce..00000000 --- a/src/SimpleL7Proxy/Proxy/AsyncWorkerFactory.cs +++ /dev/null @@ -1,68 +0,0 @@ -using Microsoft.Extensions.Logging; -using 
Microsoft.Extensions.Options; - -using SimpleL7Proxy.Async.BlobStorage; -using SimpleL7Proxy.DTO; -using SimpleL7Proxy.Config; -using SimpleL7Proxy.Async.BackupAPI; - -namespace SimpleL7Proxy.Proxy -{ - public class AsyncWorkerFactory : IAsyncWorkerFactory - { - private readonly IBlobWriter _blobWriter; - private readonly ILogger _logger; - private readonly IRequestDataBackupService _requestBackupService; - private readonly IBackupAPIService _backupAPIService; - - private readonly ProxyConfig _backendOptions; - private readonly SemaphoreSlim _initLock = new(1, 1); - private bool _initialized; - - public AsyncWorkerFactory(IBlobWriter blobWriter, - ILogger logger, - IRequestDataBackupService requestBackupService, - IOptions backendOptions, - IBackupAPIService backupAPIService) - { - _blobWriter = blobWriter; - _logger = logger; - _requestBackupService = requestBackupService; - _backendOptions = backendOptions.Value; - _backupAPIService = backupAPIService; - } - - private async Task EnsureInitializedAsync() - { - if (_initialized) return; - - await _initLock.WaitAsync().ConfigureAwait(false); - try - { - if (_initialized) return; - await _blobWriter.InitClientAsync(Constants.Server, Constants.Server).ConfigureAwait(false); - _initialized = true; - } - catch (BlobWriterException ex) - { - _backendOptions.AsyncModeEnabled = false; - _logger.LogError(ex, "Failed to initialize BlobWriter in AsyncWorkerFactory, disabling Async mode"); - } - finally - { - _initLock.Release(); - } - } - - public async Task CreateAsync(RequestData requestData, int AsyncTriggerTimeout) - { - // Ensure blob client is initialized (lazy, thread-safe, one-time) - await EnsureInitializedAsync().ConfigureAwait(false); - - _logger.LogDebug("[AsyncWorkerFactory] Creating AsyncWorker for request {Guid} with timeout {Timeout}s", - requestData.Guid, AsyncTriggerTimeout); - - return new AsyncWorker(requestData, AsyncTriggerTimeout, _blobWriter, _logger, _requestBackupService, _backendOptions); - } - } -} 
\ No newline at end of file diff --git a/src/SimpleL7Proxy/Proxy/IAsyncWorkerFactory.cs b/src/SimpleL7Proxy/Proxy/IAsyncWorkerFactory.cs deleted file mode 100644 index 5b402d06..00000000 --- a/src/SimpleL7Proxy/Proxy/IAsyncWorkerFactory.cs +++ /dev/null @@ -1,5 +0,0 @@ - namespace SimpleL7Proxy.Proxy; - public interface IAsyncWorkerFactory - { - Task CreateAsync(RequestData requestData, int AsyncTriggerTimeout); - } \ No newline at end of file diff --git a/src/SimpleL7Proxy/Proxy/NullAsyncWorkerFactory.cs b/src/SimpleL7Proxy/Proxy/NullAsyncWorkerFactory.cs deleted file mode 100644 index 491d4e80..00000000 --- a/src/SimpleL7Proxy/Proxy/NullAsyncWorkerFactory.cs +++ /dev/null @@ -1,10 +0,0 @@ -namespace SimpleL7Proxy.Proxy; - -public class NullAsyncWorkerFactory: IAsyncWorkerFactory -{ - public Task CreateAsync(RequestData requestData, int AsyncTriggerTimeout) - { - //NOP - return Task.FromResult(null!); - } -} \ No newline at end of file diff --git a/src/SimpleL7Proxy/Proxy/ProxyWorker.cs b/src/SimpleL7Proxy/Proxy/ProxyWorker.cs index 474d9fae..7b8df4a4 100644 --- a/src/SimpleL7Proxy/Proxy/ProxyWorker.cs +++ b/src/SimpleL7Proxy/Proxy/ProxyWorker.cs @@ -154,7 +154,7 @@ public Task OnConfigChangedAsync( /// │ │ └─ WriteResponseAsync() ──► StreamResponseAsync() │ │ /// │ │ │ │ /// │ │ 7. 
FINALIZE │ │ - /// │ │ └─ FinalizeStatus() + asyncWorker?.UpdateBackup() │ │ + /// │ │ └─ FinalizeStatus() + asyncWorker?.PersistRequestStateAsync() │ │ /// │ └───────────────────────────────────────────────────────────────────────────┘ │ /// │ │ │ /// │ EXCEPTION HANDLERS: │ │ @@ -373,7 +373,10 @@ public async Task TaskRunnerAsync() } _lifecycleManager.FinalizeStatus(incomingRequest, isSuccessfulResponse); - incomingRequest.asyncWorker?.UpdateBackup(); + if (incomingRequest.asyncWorker != null) + { + await incomingRequest.asyncWorker.PersistRequestStateAsync().ConfigureAwait(false); + } } // Background check requests skip ShouldFinalize but still need // Completed status after blob writes confirm @@ -383,7 +386,7 @@ public async Task TaskRunnerAsync() { await incomingRequest.asyncWorker.WaitForBlobWritesAsync().ConfigureAwait(false); _lifecycleManager.FinalizeBackgroundCheckStatus(incomingRequest); - incomingRequest.asyncWorker?.UpdateBackup(); + await incomingRequest.asyncWorker.PersistRequestStateAsync().ConfigureAwait(false); } } @@ -576,7 +579,7 @@ public async Task TaskRunnerAsync() HealthCheckService.DecrementActiveWorkers(_id); - _logger.LogInformation("[SHUTDOWN] ✓ Worker {IdStr} stopped", _idStr); + //_logger.LogInformation("[SHUTDOWN] ⏹ Worker {IdStr} stopped", _idStr); } @@ -924,6 +927,29 @@ public async Task ProxyToBackEndAsync(RequestData request) // Read the body stream once and reuse it byte[] bodyBytes = await request.CacheBodyAsync().ConfigureAwait(false); + if (request.runAsync && + !request.AsyncTriggered && + !request.Requeued && + request.BackendAttempts == 1) + { + requestState = "Persist Request Before Send"; + + // Persist the request as soon as the body has been materialized so + // rehydration still has the original payload if the process stops + // after the first backend send but before the async trigger fires. 
+ var preSendAsyncWorker = request.asyncWorker; + if (preSendAsyncWorker == null) + { + var timeLeft = _options.AsyncTriggerTimeout - (int)(DateTime.UtcNow - request.EnqueueTime).TotalMilliseconds; + timeLeft = Math.Max(1, timeLeft); + preSendAsyncWorker = new AsyncWorker(request, timeLeft, _wrkCntxt.AsyncWorkerContext!); + request.asyncWorker = preSendAsyncWorker; + _ = preSendAsyncWorker.StartAsync(); + } + + await preSendAsyncWorker.PersistRequestStateAsync().ConfigureAwait(false); + } + requestState = "Create Backend Request"; using (ByteArrayContent bodyContent = new(bodyBytes)) @@ -1164,7 +1190,7 @@ public async Task ProxyToBackEndAsync(RequestData request) catch (TaskCanceledException) when (_isEvictingAsyncRequest) { TriggerHostCB = false; - _logger.LogWarning("[Worker:{Id}] Request {Guid} was intentionally expelled to prioritize a new async request.", _id, request.Guid); + _logger.LogWarning("[Worker:{Id}] Request {Guid} was intentionally expelled.", _id, request.Guid); // Handle async expel case - request being evicted from memory if (request.asyncWorker != null) { @@ -1244,7 +1270,7 @@ public async Task ProxyToBackEndAsync(RequestData request) hostIterator?.RecordResult(host, SuccessfulRequest); // Track host status for circuit breaker - if (intCode != 412 && intCode != 429) + if (intCode != 412 && intCode != 429 && !_isEvictingAsyncRequest) host.Config.TrackStatus(intCode, TriggerHostCB, "Attempt-" + request.BackendAttempts); if (!SuccessfulRequest) @@ -1404,7 +1430,7 @@ private async Task CaptureResponseStream(HttpResponseMessage proxyResponse, Requ if (request.AsyncTriggered && !request.IsBackgroundCheck) { _logger.LogDebug("[GetProxyResponseAsync:{Guid}] Writing headers to AsyncWorker blob", request.Guid); - if (!await request.asyncWorker!.WriteHeaders(proxyResponse.StatusCode, pr.Headers)) + if (!await request.asyncWorker!.SaveResponseHeadersAsync(proxyResponse.StatusCode, pr.Headers)) { throw new 
ProxyErrorException(ProxyErrorException.ErrorType.AsyncWorkerError, HttpStatusCode.InternalServerError, "Failed to write headers to async worker"); @@ -1606,12 +1632,12 @@ private async Task WriteExhaustedHostsErrorAsync(RequestData request, HttpStatus ["Attempts"] = request.BackendAttempts.ToString() }; - await request.asyncWorker.WriteHeaders(statusCode, errorHeaders); + await request.asyncWorker.SaveResponseHeadersAsync(statusCode, errorHeaders); var errorBytes = Encoding.UTF8.GetBytes(errorBody); if (request.IsBackgroundCheck) { - var outputStream = await request.asyncWorker.GetOrCreateDataStreamAsync(); + var outputStream = await request.asyncWorker.GetResponseDataStreamAsync(); await outputStream.WriteAsync(errorBytes).ConfigureAwait(false); await outputStream.FlushAsync().ConfigureAwait(false); } @@ -1749,7 +1775,7 @@ private async Task StreamResponseAsync(RequestData request, ProxyData pr) { destinationType = "async blob"; needsFlush = true; // QueuedBlobStream requires FlushAsync to enqueue data - destination = await request.asyncWorker.GetOrCreateDataStreamAsync().ConfigureAwait(false); + destination = await request.asyncWorker.GetResponseDataStreamAsync().ConfigureAwait(false); } else if (request.OutputStream != null) { @@ -1841,12 +1867,12 @@ private async Task HandleBackgroundCheckResultAsync( ProxyHelperUtils.CopyResponseHeaders(proxyResponse, pr); if (pr.Headers != null && request.asyncWorker != null) { - await request.asyncWorker.WriteHeaders(proxyResponse.StatusCode!, pr.Headers); + await request.asyncWorker.SaveResponseHeadersAsync(proxyResponse.StatusCode!, pr.Headers); } if (request.asyncWorker != null) { - var outputStream = await request.asyncWorker.GetOrCreateDataStreamAsync(); + var outputStream = await request.asyncWorker.GetResponseDataStreamAsync(); memoryBuffer.Position = 0; await memoryBuffer.CopyToAsync(outputStream).ConfigureAwait(false); await outputStream.FlushAsync().ConfigureAwait(false); @@ -1871,7 +1897,7 @@ private async Task 
HandleBackgroundCheckResultAsync( { var timeLeft = _options.AsyncTriggerTimeout - (int)(DateTime.UtcNow - request.EnqueueTime).TotalMilliseconds; timeLeft = Math.Max(1, timeLeft); - request.asyncWorker = await _wrkCntxt.AsyncWorkerFactory.CreateAsync(request, timeLeft).ConfigureAwait(false); + request.asyncWorker = new AsyncWorker(request, timeLeft, _wrkCntxt.AsyncWorkerContext!); _ = request.asyncWorker.StartAsync(); } diff --git a/src/SimpleL7Proxy/Proxy/RequestLifecycleManager.cs b/src/SimpleL7Proxy/Proxy/RequestLifecycleManager.cs index fcc7f0a7..dd494919 100644 --- a/src/SimpleL7Proxy/Proxy/RequestLifecycleManager.cs +++ b/src/SimpleL7Proxy/Proxy/RequestLifecycleManager.cs @@ -219,7 +219,7 @@ public async Task HandleBackgroundRequestLifecycle(RequestData request, SimpleL7 // Mark as BackgroundProcessing to trigger periodic polling _logger.LogDebug("Updating async worker for background GUID: {Guid}, BackgroundRequestId: {BackgroundRequestId}", request.Guid, processor.BackgroundRequestId); - await request.asyncWorker.UpdateBackup().ConfigureAwait(false); + await request.asyncWorker.PersistRequestStateAsync().ConfigureAwait(false); request.BackgroundRequestCompleted = false; } else if (processor.BackgroundCompleted) diff --git a/src/SimpleL7Proxy/Proxy/WorkerContext.cs b/src/SimpleL7Proxy/Proxy/WorkerContext.cs index 1987155d..775d0b77 100644 --- a/src/SimpleL7Proxy/Proxy/WorkerContext.cs +++ b/src/SimpleL7Proxy/Proxy/WorkerContext.cs @@ -1,5 +1,6 @@ namespace SimpleL7Proxy.Proxy; +using SimpleL7Proxy.Async; using SimpleL7Proxy.User; using SimpleL7Proxy.Events; using SimpleL7Proxy.Queue; @@ -17,7 +18,7 @@ public class WorkerContext public ConfigChangeNotifier ConfigChangeNotifier { get; } public EventDataBuilder EventDataBuilder { get; } public HealthCheckService HealthCheckService { get; } - public IAsyncWorkerFactory AsyncWorkerFactory { get; } + public AsyncWorkerContext? 
AsyncWorkerContext { get; } public IEndpointMonitorService Backends { get; } public IConcurrentPriQueue Queue { get; } public IEventClient EventClient { get; } @@ -38,12 +39,12 @@ public WorkerContext( IRequeueWorker requeueWorker, IEventClient eventClient, ILogger logger, - IAsyncWorkerFactory asyncWorkerFactory, StreamProcessorFactory streamProcessorFactory, RequestLifecycleManager lifecycleManager, EventDataBuilder eventDataBuilder, HealthCheckService healthCheckService, ConfigChangeNotifier configChangeNotifier, + AsyncWorkerContext? asyncWorkerContext = null, ISharedIteratorRegistry? sharedIteratorRegistry = null) { @@ -55,7 +56,6 @@ public WorkerContext( ArgumentNullException.ThrowIfNull(requeueWorker); ArgumentNullException.ThrowIfNull(eventClient); ArgumentNullException.ThrowIfNull(logger); - ArgumentNullException.ThrowIfNull(asyncWorkerFactory); ArgumentNullException.ThrowIfNull(streamProcessorFactory); ArgumentNullException.ThrowIfNull(lifecycleManager); ArgumentNullException.ThrowIfNull(eventDataBuilder); @@ -71,7 +71,7 @@ public WorkerContext( RequeueWorker = requeueWorker; EventClient = eventClient; Logger = logger; - AsyncWorkerFactory = asyncWorkerFactory; + AsyncWorkerContext = asyncWorkerContext; StreamProcessorFactory = streamProcessorFactory; LifecycleManager = lifecycleManager; EventDataBuilder = eventDataBuilder; diff --git a/src/SimpleL7Proxy/Proxy/WorkerFactory.cs b/src/SimpleL7Proxy/Proxy/WorkerFactory.cs index 2f2b60fc..5bdbdc6a 100644 --- a/src/SimpleL7Proxy/Proxy/WorkerFactory.cs +++ b/src/SimpleL7Proxy/Proxy/WorkerFactory.cs @@ -63,13 +63,30 @@ protected override async Task ExecuteAsync(CancellationToken cancellationToken) _workers.Add(new(wrkrNum, workerPriority, _context, _internalCancellationTokenSource.Token)); } + _logger.LogInformation($"[WORKER] ✓ Total: {_workers.Count} | Priority distribution: {string.Join(",", workerPriorities)}"); foreach (var pw in _workers) _tasks.Add(Task.Run(() => pw.TaskRunnerAsync(), cancellationToken)); 
- await Task.WhenAll(_tasks).ConfigureAwait(false); - - _logger.LogInformation($"[WORKER] ✓ Total: {_workers.Count} | Priority distribution: {string.Join(",", workerPriorities)}"); - + await _shutdownSemaphore.WaitAsync(cancellationToken).ConfigureAwait(false); + + // Wait for all workers to complete with periodic logging + var allTasksCompletion = Task.WhenAll(_tasks); + var logInterval = TimeSpan.FromSeconds(.5); + + while (!allTasksCompletion.IsCompleted) + { + var completedTask = await Task.WhenAny(allTasksCompletion, Task.Delay(logInterval, cancellationToken)) + .ConfigureAwait(false); + + if (completedTask != allTasksCompletion) + { + _logger.LogInformation("[WORKER] ⏳ Waiting for {count} workers to complete...", _tasks.Count(t => !t.IsCompleted)); + } + } + + await allTasksCompletion.ConfigureAwait(false); + + _logger.LogInformation("[WORKER] ✓ All {count} workers have completed.", _tasks.Count); return; } @@ -81,9 +98,12 @@ public static void ExpelAsyncRequests() } } + private static readonly SemaphoreSlim _shutdownSemaphore = new(0,1); public static void RequestWorkerShutdown() { _internalCancellationTokenSource.Cancel(); + _shutdownSemaphore.Release(); + } public static List GetAllTasks() diff --git a/src/SimpleL7Proxy/User/UserProfile.cs b/src/SimpleL7Proxy/User/UserProfile.cs index eda32ae8..e05042c2 100644 --- a/src/SimpleL7Proxy/User/UserProfile.cs +++ b/src/SimpleL7Proxy/User/UserProfile.cs @@ -147,7 +147,7 @@ protected override async Task ExecuteAsync(CancellationToken stoppingToken) _cancellationTokenSource = CancellationTokenSource.CreateLinkedTokenSource(stoppingToken); stoppingToken.Register(() => { - _logger.LogInformation("[SHUTDOWN] ⏹ User Profile Reader stopping"); + _logger.LogInformation("[SHUTDOWN] ⏹ User Profile Reader stopping"); }); // // Initialize User Profiles @@ -258,7 +258,7 @@ protected override async Task ExecuteAsync(CancellationToken stoppingToken) else await 
_ErrorTimer.WaitForNextTickAsync(stoppingToken).ConfigureAwait(false); } - catch (TaskCanceledException) when (stoppingToken.IsCancellationRequested) + catch (OperationCanceledException) when (stoppingToken.IsCancellationRequested || _cancellationTokenSource.IsCancellationRequested) { break; } diff --git a/src/SimpleL7Proxy/c.txt b/src/SimpleL7Proxy/c.txt deleted file mode 100644 index edc9115c..00000000 --- a/src/SimpleL7Proxy/c.txt +++ /dev/null @@ -1,551 +0,0 @@ -Experiment: #__ 1 _ -Config Change: __ log to file instead of eventhub _ -Time since change: 30 min - -2 MINS AFTER STARTUP: - Total Managed Memory: 12.88 MB - Working Set: 126.33 MB - Private Memory: 199.30 MB - Heap Size: 0.00 MB - Fragmented: 0.00 MB - Gen0 Collections: 0 - Gen1 Collections: 0 - Gen2 Collections: 0 - High Memory Load: 0.00 MB - -BEFORE GC: - Total Managed Memory: 10.71 MB - Working Set: 164.34 MB - Private Memory: 233.82 MB - Heap Size: 4.47 MB - Fragmented: 1.24 MB - Gen0 Collections: 4 - Gen1 Collections: 2 - Gen2 Collections: 2 - High Memory Load: 92.16 MB - -AFTER GC (forcegc + 2min wait): - Total Managed Memory: 7.11 MB - Working Set: 143.39 MB - Private Memory: 210.88 MB - Heap Size: 3.69 MB - Fragmented: 0.74 MB - Gen0 Collections: 6 - Gen1 Collections: 4 - Gen2 Collections: 4 - High Memory Load: 71.68 MB - -Delta from baseline (Private): ___ MB - -==== OBSERVATIONS ==== -- First experiment; establishes file-log baseline: 199 MB startup, 211 MB post-GC -- GC reclaimed ~23 MB private (234→211), indicating significant managed/finalizable churn at idle -- Managed memory dropped only 3.6 MB (10.7→7.1) yet private dropped 23 MB — most reclaimable memory is native -- Gen counts 6/4/4 after forced GC — moderate collection activity for 30 min idle -====================== - -------------------------- - -Experiment: #__ 2 _ -Config Change: __ turn off app insights, re-enable eventhub _ -Time since change: 30 min - -2 MINS AFTER STARTUP: - Total Managed Memory: 8.82 MB - Working Set: 
113.16 MB - Private Memory: 177.08 MB - Heap Size: 0.00 MB - Fragmented: 0.00 MB - Gen0 Collections: 0 - Gen1 Collections: 0 - Gen2 Collections: 0 - High Memory Load: 0.00 MB - -BEFORE GC: - Total Managed Memory: 7.06 MB - Working Set: 148.66 MB - Private Memory: 215.71 MB - Heap Size: 3.78 MB - Fragmented: 1.74 MB - Gen0 Collections: 4 - Gen1 Collections: 2 - Gen2 Collections: 2 - High Memory Load: 81.92 MB - -AFTER GC (forcegc + 5min wait): - Total Managed Memory: 11.86 MB - Working Set: 133.17 MB - Private Memory: 199.29 MB - Heap Size: 2.42 MB - Fragmented: 0.73 MB - Gen0 Collections: 6 - Gen1 Collections: 4 - Gen2 Collections: 4 - High Memory Load: 61.44 MB - -Delta from baseline (Private): ___ MB - -==== OBSERVATIONS ==== -- Startup dropped 22 MB vs Exp 1 (199→177) — removing App Insights is a large savings -- Post-GC dropped 12 MB (211→199) — AI holds ~12 MB of non-reclaimable native state -- Pre-GC growth similar (~39 MB vs ~35 MB from startup) — underlying churn rate unchanged without AI -- EventHub alone (still enabled) adds only ~2-3 MB vs the fully-stripped floor seen later -====================== - -------------------------- - -Experiment: #__ 3 _ -Config Change: __re-enable app insights, set poller to 300 seconds _ -Time since change: 30 min - -2 MINS AFTER STARTUP: - Total Managed Memory: 13.32 MB - Working Set: 135.14 MB - Private Memory: 199.98 MB - Heap Size: 0.00 MB - Fragmented: 0.00 MB - Gen0 Collections: 0 - Gen1 Collections: 0 - Gen2 Collections: 0 - High Memory Load: 0.00 MB - -BEFORE GC: - Total Managed Memory: 21.79 MB - Working Set: 166.27 MB - Private Memory: 232.50 MB - Heap Size: 4.83 MB - Fragmented: 1.61 MB - Gen0 Collections: 4 - Gen1 Collections: 2 - Gen2 Collections: 2 - High Memory Load: 92.16 MB - -AFTER GC (forcegc + 2min wait): - Total Managed Memory: 10.38 MB - Working Set: 147.56 MB - Private Memory: 213.15 MB - Heap Size: 3.79 MB - Fragmented: 0.74 MB - Gen0 Collections: 6 - Gen1 Collections: 4 - Gen2 Collections: 4 - High 
Memory Load: 71.68 MB - -Delta from baseline (Private): ___ MB - -==== OBSERVATIONS ==== -- Startup identical to Exp 1 (199→200 MB) — AI and EH both re-enabled, as expected -- Post-GC nearly identical to Exp 1 (211→213) — slowing poller from 15s→300s made no difference -- Managed before-GC highest yet (21.79 MB) — something is accumulating managed objects aggressively -- Poller frequency doesn't matter; cost appears to be connection pool maintenance, not per-request -====================== - -------------------------- - -Experiment: #__ 4 _ -Config Change: __ reset poller, disble sidecar health probe _ -Time since change: 30 min - -2 MINS AFTER STARTUP: - Total Managed Memory: 12.88 MB - Working Set: 126.33 MB - Private Memory: 199.30 MB - Heap Size: 0.00 MB - Fragmented: 0.00 MB - Gen0 Collections: 0 - Gen1 Collections: 0 - Gen2 Collections: 0 - High Memory Load: 0.00 MB - -BEFORE GC: - Total Managed Memory: 18.03 MB - Working Set: 173.69 MB - Private Memory: 235.98 MB - Heap Size: 4.93 MB - Fragmented: 1.59 MB - Gen0 Collections: 4 - Gen1 Collections: 2 - Gen2 Collections: 2 - High Memory Load: 102.40 MB - -AFTER GC (forcegc + 2min wait): - Managed: ___ MB - Working: ___ MB - Private: ___ MB - Heap: ___ MB - -Delta from baseline (Private): ___ MB - -==== OBSERVATIONS ==== -- Startup identical to Exp 1 (199 MB) — removing sidecar has no startup cost -- Before-GC rose to 236 MB — highest seen; disabling sidecar didn't reduce pressure -- Post-GC not captured, limiting comparison value -- Surprising: disabling sidecar didn't help — the problem may be in disposal, not connection count -====================== - -------------------------- - -Experiment: #__ 5 _ -Config Change: __disable eventhub, appinsights and sidecar _ -Time since change: 30 min - -2 MINS AFTER STARTUP: - Total Managed Memory: 8.39 MB - Working Set: 104.73 MB - Private Memory: 172.09 MB - Heap Size: 0.00 MB - Fragmented: 0.00 MB - Gen0 Collections: 0 - Gen1 Collections: 0 - Gen2 Collections: 0 - High 
Memory Load: 0.00 MB - -BEFORE GC: ( at 1:31 ) - Total Managed Memory: 5.60 MB - Working Set: 136.09 MB - Private Memory: 215.32 MB - Heap Size: 4.10 MB - Fragmented: 1.73 MB - Gen0 Collections: 3 - Gen1 Collections: 2 - Gen2 Collections: 2 - High Memory Load: 71.68 MB - -AFTER GC (forcegc + 6min wait): - Total Managed Memory: 9.78 MB - Working Set: 121.18 MB - Private Memory: 196.57 MB - Heap Size: 2.28 MB - Fragmented: 0.72 MB - Gen0 Collections: 5 - Gen1 Collections: 4 - Gen2 Collections: 4 - High Memory Load: 51.20 MB - -Delta from baseline (Private): ___ MB - -==== OBSERVATIONS ==== -- Lowest startup yet (172 MB) — stripping EH, AI, and sidecar removes ~27 MB from Exp 1 baseline -- Vs Exp 2 (AI-only removed, 177 MB), another 5 MB saved — sidecar + EH native cost combined -- Growth is WORST at +24.5 MB — paradox: fewer services yet more growth? -- Possible explanation: AI/EH background activity was triggering GC in prior exps, masking accumulation -- Only the poller (15s) and idle workers remain — they alone drive significant native growth -====================== - -------------------------- - -Experiment: #__ 6 _ -Config Change: __re-enable sidecar with 30s updates, re-enable app insights, re-enable poller to 15 s, re-enable eventhub _ -Time since change: 30 min - -2 MINS AFTER STARTUP: - Total Managed Memory: 13.31 MB - Working Set: 133.81 MB - Private Memory: 201.43 MB - Heap Size: 0.00 MB - Fragmented: 0.00 MB - Gen0 Collections: 0 - Gen1 Collections: 0 - Gen2 Collections: 0 - High Memory Load: 0.00 MB - -BEFORE GC: - Total Managed Memory: 6.78 MB - Working Set: 175.99 MB - Private Memory: 235.43 MB - Heap Size: 4.57 MB - Fragmented: 1.37 MB - Gen0 Collections: 4 - Gen1 Collections: 2 - Gen2 Collections: 2 - High Memory Load: 102.40 MB - -AFTER GC (forcegc + 3min wait): - Total Managed Memory: 9.25 MB - Working Set: 154.68 MB - Private Memory: 214.25 MB - Heap Size: 3.85 MB - Fragmented: 0.75 MB - Gen0 Collections: 6 - Gen1 Collections: 4 - Gen2 
Collections: 4 - High Memory Load: 81.92 MB - -Delta from baseline (Private): ___ MB - -==== OBSERVATIONS ==== -- Startup rose 29 MB vs Exp 5 (172→201) — re-enabling all services (AI ~22, EH ~3, sidecar ~4) -- Post-GC rose 18 MB (197→214) — services contribute ~18 MB of non-reclaimable native state -- Growth LOWER than Exp 5 (+12.8 vs +24.5) — confirms AI's background flushing triggers GC, masking growth -- Establishes the "all-on" baseline: 201 startup, 214 post-GC, +12.8 MB growth in 30 min -====================== - -------------------------- - -Experiment: #__ 7 _ -Config Change: __disabple sidecar, no sidecar, poller=1hr, hosts=direct, eventhub-return-early _ -Time since change: 30 min - -2 MINS AFTER STARTUP: - Total Managed Memory: 8.02 MB - Working Set: 106.34 MB - Private Memory: 172.63 MB - Heap Size: 0.00 MB - Fragmented: 0.00 MB - Gen0 Collections: 0 - Gen1 Collections: 0 - Gen2 Collections: 0 - High Memory Load: 0.00 MB - -BEFORE GC: - Total Managed Memory: 23.94 MB - Working Set: 146.31 MB - Private Memory: 211.46 MB - Heap Size: 4.28 MB - Fragmented: 0.73 MB - Gen0 Collections: 2 - Gen1 Collections: 1 - Gen2 Collections: 0 - High Memory Load: 71.68 MB - -AFTER GC (forcegc + 2min wait): - Total Managed Memory: 5.59 MB - Working Set: 124.86 MB - Private Memory: 186.68 MB - Heap Size: 2.34 MB - Fragmented: 0.74 MB - Gen0 Collections: 4 - Gen1 Collections: 3 - Gen2 Collections: 2 - High Memory Load: 51.20 MB - -Delta from baseline (Private): ___ MB - -==== OBSERVATIONS ==== -- Startup identical to Exp 5 (172→173) — same minimal config, just poller pushed to 1hr -- Post-GC dropped 10 MB vs Exp 5 (197→187) — poller TLS connection pool costs ~10 MB native -- Managed before-GC spiked to 24 MB — highest managed reading in any experiment -- Something in the idle workers is allocating heavily; poller was barely active at 1hr interval -- Gen counts 2/1/0 before GC — very few collections, so managed objects just accumulated -====================== - 
-------------------------- - -Experiment: #__ 8 _ -Config Change: _ signalWorker => 4000 _ -Time since change: 30 min - -5 MINS AFTER STARTUP: - Total Managed Memory: 5.42 MB - Working Set: 103.98 MB - Private Memory: 171.95 MB - Heap Size: 0.00 MB - Fragmented: 0.00 MB - Gen0 Collections: 0 - Gen1 Collections: 0 - Gen2 Collections: 0 - High Memory Load: 0.00 MB - -BEFORE GC: - Total Managed Memory: 4.28 MB - Working Set: 129.33 MB - Private Memory: 201.37 MB - Heap Size: 5.04 MB - Fragmented: 3.37 MB - Gen0 Collections: 1 - Gen1 Collections: 1 - Gen2 Collections: 1 - High Memory Load: 61.44 MB - -AFTER GC (forcegc + 2min wait): - Total Managed Memory: 2.44 MB - Working Set: 120.99 MB - Private Memory: 186.34 MB - Heap Size: 2.35 MB - Fragmented: 0.74 MB - Gen0 Collections: 3 - Gen1 Collections: 3 - Gen2 Collections: 3 - High Memory Load: 51.20 MB - -Delta from baseline (Private): ___ MB - -==== OBSERVATIONS ==== -- Managed before-GC plummeted from 24 MB (Exp 7) to 4 MB — single variable change (40ms→4000ms) -- Confirms the 24 MB managed spike was entirely timer allocation churn from SemaphoreSlim.WaitAsync(TimeSpan) -- Pre-GC private dropped 10 MB (211→201) — fewer live managed objects = less native overhead -- Post-GC identical to Exp 7 (187→186) — timer objects were transient, not a permanent leak -- Gen counts dropped to 1/1/1 from 2/1/0 — dramatically less GC pressure -====================== - -Experiment: #__ 9 _ -Config Change: _ make signalworker without delay, added agressive GC to dockerfile _ -Time since change: 30 min - -3 MINS AFTER STARTUP: - Total Managed Memory: 5.65 MB - Working Set: 105.39 MB - Private Memory: 170.30 MB - Heap Size: 0.00 MB - Fragmented: 0.00 MB - Gen0 Collections: 0 - Gen1 Collections: 0 - Gen2 Collections: 0 - High Memory Load: 0.00 MB - -Before GC: - Total Managed Memory: 8.58 MB - Working Set: 127.01 MB - Private Memory: 192.44 MB - Heap Size: 5.09 MB - Fragmented: 3.62 MB - Gen0 Collections: 1 - Gen1 Collections: 1 - Gen2 
Collections: 1 - High Memory Load: 51.20 MB - -AFTER GC (forcegc + 2min wait): - Total Managed Memory: 2.25 MB - Working Set: 118.36 MB - Private Memory: 181.79 MB - Heap Size: 2.30 MB - Fragmented: 0.74 MB - Gen0 Collections: 3 - Gen1 Collections: 3 - Gen2 Collections: 3 - High Memory Load: 51.20 MB - -==== OBSERVATIONS ==== -- Post-GC dropped 5 MB vs Exp 8 (186→182) — Workstation GC decommits pages more eagerly -- Pre-GC dropped 9 MB (201→192) — pure signal (no timeout) generates zero timer overhead -- Lowest post-GC floor yet (182 MB) — likely near the irreducible minimum for this app -- Two variables changed (signal + WS GC); combined savings ~9 MB post-GC vs Exp 7 -- Gen counts identical to Exp 8 (1/1/1) — no change in collection frequency -====================== - -Experiment: #__ 10 _ -Config Change: _ eventlogs, apim mode, poller=15s, appinsights, sidecar _ -Time since change: overnight - -2 MINS AFTER STARTUP: 17:24:37 - Total Managed Memory: 11.16 MB - Working Set: 129.34 MB - Private Memory: 198.27 MB - Heap Size: 0.00 MB - Fragmented: 0.00 MB - Gen0 Collections: 0 - Gen1 Collections: 0 - Gen2 Collections: 0 - High Memory Load: 0.00 MB - -Next morning: 8:53:23 AM - Total Managed Memory: 12.74 MB - Working Set: 171.14 MB - Private Memory: 235.27 MB - Heap Size: 4.56 MB - Fragmented: 1.73 MB - Gen0 Collections: 61 - Gen1 Collections: 61 - Gen2 Collections: 61 - High Memory Load: 102.40 MB - -==== OBSERVATIONS ==== -- Overnight growth: +37 MB in 15.5 hrs = ~2.4 MB/hr with all services on -- Signal fix (Exp 8/9) reduced short-term churn, but long-term growth persists -- GC 61:61:61 = every collection is full Gen2 — GCConserveMemory=9 working as intended -- Despite aggressive Gen2 collection, native memory still grew — remaining source is not managed -- Managed stayed flat (11→13 MB) — confirms the growth is purely native/unmanaged -====================== - -Experiment: #__ 11 _ -Config Change: _ apply using fixes on poller, updated poller to 1s _ -Time since 
change: 30 mins - -2 MINS AFTER STARTUP: - Total Managed Memory: 14.98 MB - Working Set: 135.44 MB - Private Memory: 202.44 MB - Heap Size: 0.00 MB - Fragmented: 0.00 MB - Gen0 Collections: 0 - Gen1 Collections: 0 - Gen2 Collections: 0 - High Memory Load: 0.00 MB - -Before GC: - Total Managed Memory: 10.89 MB - Working Set: 160.45 MB - Private Memory: 233.92 MB - Heap Size: 5.39 MB - Fragmented: 2.24 MB - Gen0 Collections: 5 - Gen1 Collections: 3 - Gen2 Collections: 2 - High Memory Load: 92.16 MB - -AFTER GC (forcegc + 2min wait): - Total Managed Memory: 9.56 MB - Working Set: 138.89 MB - Private Memory: 212.15 MB - Heap Size: 3.94 MB - Fragmented: 0.73 MB - Gen0 Collections: 7 - Gen1 Collections: 5 - Gen2 Collections: 4 - High Memory Load: 61.44 MB - -==== OBSERVATIONS ==== -- Despite 60x more polling (1s vs 15s), post-GC growth only +9.7 MB — LESS than Exp 6 baseline (+12.8) -- `using var request` fix is working — more requests yet less growth -- Before-GC pressure similar to Exp 6 (234 vs 235) — Task.Delay timers at 1s fill the gap left by fix -- Gen counts 7/5/4 vs 6/4/4 (Exp 6) — slightly more GC from higher allocation rate -- Remaining growth likely from Task.Delay creating ~1,800 TimerQueueTimer objects in 30 min -====================== - -Experiment: #__ 12 _ -Config Change: _ refactor poller timer to use interval _ -Time since change: 70 mins - -2 MINS AFTER STARTUP: - Total Managed Memory: 15.58 MB - Working Set: 134.52 MB - Private Memory: 203.48 MB - Heap Size: 0.00 MB - Fragmented: 0.00 MB - Gen0 Collections: 0 - Gen1 Collections: 0 - Gen2 Collections: 0 - High Memory Load: 0.00 MB - - -Before GC: - Total Managed Memory: 15.14 MB - Working Set: 165.99 MB - Private Memory: 240.44 MB - Heap Size: 5.61 MB - Fragmented: 0.86 MB - Gen0 Collections: 13 - Gen1 Collections: 6 - Gen2 Collections: 5 - High Memory Load: 92.16 MB - -AFTER GC (forcegc + 3min wait): - Total Managed Memory: 12.56 MB - Working Set: 148.33 MB - Private Memory: 221.37 MB - Heap Size: 
4.03 MB - Fragmented: 0.73 MB - Gen0 Collections: 15 - Gen1 Collections: 8 - Gen2 Collections: 7 - High Memory Load: 71.68 MB - -==== OBSERVATIONS ==== -- 70-min run (vs 30 min standard) — longer observation window -- Startup comparable to Exp 11 (203 vs 202 MB) — PeriodicTimer has no startup cost difference -- Before-GC: 240 vs 234 (Exp 11) — higher, but 70 min vs 30 min; rate is ~0.53 MB/min vs 1.05 MB/min -- Post-GC: 221 vs 212 (Exp 11) — +18 MB growth vs +9.7 MB, but over 2.3x the duration -- Growth rate per minute: 0.26 MB/min vs 0.32 MB/min (Exp 11) — slight improvement -- Gen counts 15/8/7 vs 7/5/4 (Exp 11) — proportional to longer runtime -- PeriodicTimer eliminated per-tick timer allocations; residual growth may be Azure SDK pipeline state -====================== diff --git a/src/SimpleL7Proxy/server.cs b/src/SimpleL7Proxy/server.cs index 542fdea2..d4e7b575 100644 --- a/src/SimpleL7Proxy/server.cs +++ b/src/SimpleL7Proxy/server.cs @@ -181,7 +181,7 @@ public async Task StopListening(CancellationToken cancellationToken) { _isShuttingDown = true; _cancellationTokenSource?.Cancel(); - _logger.LogInformation("[SHUTDOWN] ⏹ Server stopped accepting new requests (probes still active)"); + _logger.LogInformation("[SHUTDOWN] ⏹ Server stopped accepting new requests (probes still active)"); } /// @@ -192,7 +192,7 @@ public async Task StopListening(CancellationToken cancellationToken) public async Task StopProbes(CancellationToken cancellationToken) { _probesCts?.Cancel(); - _logger.LogInformation("[SHUTDOWN] ⏹ Health probe serving stopped"); + _logger.LogInformation("[SHUTDOWN] ⏹ Health probe serving stopped"); // Wait for the Run() loop to actually exit if (ExecuteTask != null) @@ -200,6 +200,19 @@ public async Task StopProbes(CancellationToken cancellationToken) try { await ExecuteTask.ConfigureAwait(false); } catch (OperationCanceledException) { /* expected */ } } + + // Explicitly release the listener so the port is available on the next startup. 
+ try + { + if (_httpListener.IsListening) + { + _httpListener.Stop(); + } + } + finally + { + _httpListener.Close(); + } } // public ConcurrentPriQueue Queue() { @@ -262,6 +275,8 @@ public async Task Run(CancellationToken cancellationToken) _probe.Type = EventType.Probe; + _logger.LogInformation("SERVER --- ASYNC MODE IS " + (doAsync ? "ENABLED" : "DISABLED") + " --- BlobWriter: " + _blobWriter.GetType().Name); + // Hoist TCS + cancellation registration outside the loop — the TCS stays // incomplete until cancellation fires, so one instance serves all iterations. var tcs = new TaskCompletionSource(TaskCreationOptions.RunContinuationsAsynchronously); @@ -568,9 +583,11 @@ public async Task Run(CancellationToken cancellationToken) // ASYNC: Determine if the request is allowed async operation if (doAsync && bool.TryParse(rd.Headers[_options.AsyncClientRequestHeader], out var asyncEnabled) && asyncEnabled) { + // Console.WriteLine($"[ASYNC] Request {rd.MID} has async header enabled, checking user profile for async config...------"); var clientInfo = _userProfile.GetAsyncParams(rd.profileUserId); if (clientInfo != null) { + // Console.WriteLine($"[ASYNC] Async config found for user {rd.profileUserId}: Container={clientInfo.ContainerName}, Topic={clientInfo.SBTopicName}, Timeout={clientInfo.AsyncBlobAccessTimeoutSecs}s, GenerateSAS={clientInfo.GenerateSasTokens} -----"); rd.runAsync = true; rd.AsyncBlobAccessTimeoutSecs = clientInfo.AsyncBlobAccessTimeoutSecs; rd.BlobContainerName = clientInfo.ContainerName; @@ -629,6 +646,12 @@ public async Task Run(CancellationToken cancellationToken) rd.CalculateExpiration(_options.DefaultTTLSecs, _options.TTLHeader); ed["DefaultTimeout"] = rd.defaultTimeout.ToString(); + // Publish Queued before handing the request to workers so it cannot race with Processing. 
+ if (rd.runAsync) + { + rd.SBStatus = ServiceBusMessageStatusEnum.Queued; + } + // Enqueue the request if (!_requestsQueue.Enqueue(rd, priority, userPriorityBoost, rd.EnqueueTime)) { @@ -637,12 +660,11 @@ public async Task Run(CancellationToken cancellationToken) retrymsg = ed["Message"] = "Failed to enqueue request"; logmsg = "Failed to enqueue request => 429:"; - } - // ASYNC: If the request is allowed to run async, set the status - if (!notEnqued && doAsync) - { - rd.SBStatus = ServiceBusMessageStatusEnum.Queued; + if (rd.runAsync) + { + rd.SBStatus = ServiceBusMessageStatusEnum.Failed; + } } } diff --git a/src/SimpleL7Proxy/templates/notauthorized.json b/src/SimpleL7Proxy/templates/notauthorized.json new file mode 100644 index 00000000..237cb2f2 --- /dev/null +++ b/src/SimpleL7Proxy/templates/notauthorized.json @@ -0,0 +1,9 @@ +{ + "Message": "Sorry, you do not have authorization to access this resource.", + "UserId": "%USERID%", + "MID": "%MID%", + "Guid": "%GUID%", + "Status": 425, + "DataBlobUri": "%BLOBURI%/%BLOBCONTAINER%/%GUID%", + "HeaderBlobUri": "%BLOBURI%/%BLOBCONTAINER%/%GUID%-Headers" +} \ No newline at end of file diff --git a/src/SimpleL7Proxy/templates/notready.json b/src/SimpleL7Proxy/templates/notready.json new file mode 100644 index 00000000..04ef07ee --- /dev/null +++ b/src/SimpleL7Proxy/templates/notready.json @@ -0,0 +1,9 @@ +{ + "Message": "Your request is still processing. Intentional delay added to response: %DELAY_S% seconds", + "UserId": "%USERID%", + "MID": "%MID%", + "Guid": "%GUID%", + "Status": 425, + "DataBlobUri": "%BLOBURI%/%BLOBCONTAINER%/%GUID%", + "HeaderBlobUri": "%BLOBURI%/%BLOBCONTAINER%/%GUID%-Headers" +} \ No newline at end of file diff --git a/src/SimpleL7Proxy/templates/welcome.json b/src/SimpleL7Proxy/templates/welcome.json new file mode 100644 index 00000000..f4e6dc79 --- /dev/null +++ b/src/SimpleL7Proxy/templates/welcome.json @@ -0,0 +1,9 @@ +{ + "Message": "Your request has been accepted for async processing. 
The final result will be available at the blob URIs. Use OAuth for authentication.", + "UserId": "%USERID%", + "MID": "%MID%", + "Guid": "%GUID%", + "Status": 202, + "DataBlobUri": "%BLOBURI%/%BLOBCONTAINER%/%GUID%", + "HeaderBlobUri": "%BLOBURI%/%BLOBCONTAINER%/%GUID%-Headers" +} \ No newline at end of file diff --git a/test/ProxyWorkerTests/Tests.csproj b/test/ProxyWorkerTests/Tests.csproj index 2f0215f9..f787ff7d 100644 --- a/test/ProxyWorkerTests/Tests.csproj +++ b/test/ProxyWorkerTests/Tests.csproj @@ -1,6 +1,6 @@ - net9.0 + net10.0 enable enable diff --git a/test/StorageBlob/Program.cs b/test/StorageBlob/Program.cs new file mode 100644 index 00000000..887afef1 --- /dev/null +++ b/test/StorageBlob/Program.cs @@ -0,0 +1,54 @@ +using Azure.Identity; +using Azure.Storage.Blobs; +using Azure.Storage.Blobs.Models; +using System.Text; + +// Configuration — set via environment variables or edit inline +var storageAccountName = Environment.GetEnvironmentVariable("STORAGE_ACCOUNT_NAME") ?? "mystorageaccount"; +var containerName = Environment.GetEnvironmentVariable("STORAGE_CONTAINER_NAME") ?? "sample-container"; +var connectionString = Environment.GetEnvironmentVariable("STORAGE_CONNECTION_STRING"); + +Console.WriteLine($"[CONFIG] Storage Account: {storageAccountName}"); +Console.WriteLine($"[CONFIG] Container Name: {containerName}"); +Console.WriteLine($"[CONFIG] Connection String: {(string.IsNullOrEmpty(connectionString) ? "NOT SET" : "SET")}"); + +BlobServiceClient blobServiceClient; + +if (!string.IsNullOrEmpty(connectionString)) { + // Use connection string (local dev / emulator) + blobServiceClient = new BlobServiceClient(connectionString); + Console.WriteLine("[INIT] Using connection string authentication"); +} else { + // Use DefaultAzureCredential (managed identity, az cli, etc.) 
+ var uri = new Uri($"https://{storageAccountName}.blob.core.windows.net"); + blobServiceClient = new BlobServiceClient(uri, new DefaultAzureCredential()); + Console.WriteLine($"[INIT] Using DefaultAzureCredential for {storageAccountName}"); +} + +// Create the container if it doesn't exist +var containerClient = blobServiceClient.GetBlobContainerClient(containerName); +await containerClient.CreateIfNotExistsAsync(PublicAccessType.None); +Console.WriteLine($"[CONTAINER] Ensured container '{containerName}' exists"); + +// Upload a sample blob +var blobName = $"sample-{DateTime.UtcNow:yyyyMMdd-HHmmss}.txt"; +var blobClient = containerClient.GetBlobClient(blobName); + +var content = $"Hello from StorageBlob test at {DateTime.UtcNow:O}"; +using var stream = new MemoryStream(Encoding.UTF8.GetBytes(content)); +await blobClient.UploadAsync(stream, overwrite: true); +Console.WriteLine($"[UPLOAD] Uploaded blob '{blobName}' ({content.Length} bytes)"); + +// Verify by reading it back +BlobDownloadInfo download = await blobClient.DownloadAsync(); +using var reader = new StreamReader(download.Content); +var downloaded = await reader.ReadToEndAsync(); +Console.WriteLine($"[DOWNLOAD] Read back: {downloaded}"); + +// List blobs in the container +Console.WriteLine($"[LIST] Blobs in '{containerName}':"); +await foreach (BlobItem item in containerClient.GetBlobsAsync()) { + Console.WriteLine($" - {item.Name} ({item.Properties.ContentLength} bytes)"); +} + +Console.WriteLine("[DONE] Storage blob sample complete"); diff --git a/test/StorageBlob/StorageBlob.csproj b/test/StorageBlob/StorageBlob.csproj new file mode 100644 index 00000000..65d65134 --- /dev/null +++ b/test/StorageBlob/StorageBlob.csproj @@ -0,0 +1,15 @@ + + + + Exe + net10.0 + enable + enable + + + + + + + + diff --git a/test/curl/triggerAsync.sh b/test/curl/triggerAsync.sh new file mode 100644 index 00000000..b4cb3056 --- /dev/null +++ b/test/curl/triggerAsync.sh @@ -0,0 +1 @@ +curl 
http://localhost:8000/api/delay?delay=4000 -H "X-UserProfile: 1" -H "S7PAsyncMode: true" diff --git a/test/generator/generator_one/appsettings.json b/test/generator/generator_one/appsettings.json index 501123d7..96c2c14d 100644 --- a/test/generator/generator_one/appsettings.json +++ b/test/generator/generator_one/appsettings.json @@ -1,8 +1,11 @@ { - "test_endpoint": "http://localhost:8000", - "test_endpoint_ACA": "https://nvm2-tc26.purpledesert-d46de6cb.eastus.azurecontainerapps.io", + "test_endpoint_local": "http://localhost:8000", + "test_endpoint_azure": "https://nvm2-tc26.purpledesert-d46de6cb.eastus.azurecontainerapps.io", + "test_endpoint_l7dev": "https://simplel7dev.wittybeach-67bb528b.eastus.azurecontainerapps.io", + "test_endpoint": "http://localhost:8000/api/delay?delay=4000", + "duration_seconds": "100s", - "concurrency": 3000, + "concurrency": 5, "interrun_delay": "0ms", //"Entra_audience": "http://localhost:", //"Entra_clientID": "12345", @@ -11,18 +14,21 @@ { "name": "test1", "method": "GET", - "path": "/file/openAI.txt", - //"path": "/api/delay?delay=0", + //"path": "/file/claude-3.5-haiku.txt", + "path": "/api/delay?delay=0", "timeout": "300s", "headers": { "X-TokenProcessor": "DefaultStream", "X-DelaySecs": "1", "X-Streaming": "false", + "S7PPriorityKey": "1", //"X-TokenProcessor": "AllUsage-2", //"X-TokenProcessor": "MultiLineAllUsage", "test": "x", "xx" : "Value1", - "x-userprofile": "123456" + "X-UserProfile": "highpriority", + "S7PTTL": "200", + "S7PAsyncMode": "true" } } ] diff --git a/test/nullserver/Python/lorem_ipsum.txt b/test/nullserver/Python/lorem_ipsum.txt new file mode 100644 index 00000000..14b1ab0c --- /dev/null +++ b/test/nullserver/Python/lorem_ipsum.txt @@ -0,0 +1,21 @@ +Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. 
Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. + +Sed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium doloremque laudantium, totam rem aperiam, eaque ipsa quae ab illo inventore veritatis et quasi architecto beatae vitae dicta sunt explicabo. Nemo enim ipsam voluptatem quia voluptas sit aspernatur aut odit aut fugit, sed quia consequuntur magni dolores eos qui ratione voluptatem sequi nesciunt. + +Neque porro quisquam est, qui dolorem ipsum quia dolor sit amet, consectetur, adipisci velit, sed quia non numquam eius modi tempora incidunt ut labore et dolore magnam aliquam quaerat voluptatem. Ut enim ad minima veniam, quis nostrum exercitationem ullam corporis suscipit laboriosam. + +At vero eos et accusamus et iusto odio dignissimos ducimus qui blanditiis praesentium voluptatum deleniti atque corrupti quos dolores et quas molestias excepturi sint occaecati cupiditate non provident, similique sunt in culpa qui officia deserunt mollitia animi, id est laborum et dolorum fuga. Et harum quidem rerum facilis est et expedita distinctio. Nam libero tempore, cum soluta nobis est eligendi optio cumque nihil impedit quo minus id quod maxime placeat facere possimus, omnis voluptas assumenda est, omnis dolor repellendus. + +Temporibus autem quibusdam et aut officiis debitis aut rerum necessitatibus saepe eveniet ut et voluptates repudiandae sint et molestiae non recusandae itaque earum rerum hic tenetur a sapiente delectus. Ut aut reiciendis voluptatibus maiores alias consequatur aut perferendis doloribus asperiores repellat. + +Sed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium doloremque laudantium, totam rem aperiam, eaque ipsa quae ab illo inventore veritatis et quasi architecto beatae vitae dicta sunt explicabo. 
Nemo enim ipsam voluptatem quia voluptas sit aspernatur aut odit aut fugit, sed quia consequuntur magni dolores eos qui ratione voluptatem sequi nesciunt. Neque porro quisquam est, qui dolorem ipsum quia dolor sit amet, consectetur, adipisci velit, sed quia non numquam eius modi tempora incidunt ut labore et dolore magnam aliquam quaerat voluptatem. + +Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Quis ipsum suspendisse ultrices gravida. Risus commodo viverra maecenas accumsan lacus vel facilisis volutpat. Consectetur adipiscing elit pellentesque pulvinar pellentesque. Pellentesque pulvinar pellentesque nisl nulla non metus auctor fringilla. Donec sollicitudin molestie malesuada. Praesent sapien massa, convallis a pellentesque nec, egestas non nisi. Vestibulum ante ipsum primis in faucibus orci luctus et ultrices posuere cubilia curae; Donec velit neque, auctor sit amet aliquam vel, ullamcorper sit amet ligula. + +Curabitur arcu erat, accumsan sit amet justo vitae, eleifend ac erat. Donec rutrum congue leo, eu scelerisque magna interdum. Nulla facilisi. Donec sollicitudin molestie malesuada. Mauris blandit aliquet elit, eget tincidunt nibh pulvinar a. Cras ultricies ligula sed magna dictumst vestibulum. Praesent sapien massa, convallis a pellentesque nec, egestas non nisi. Donec sollicitudin molestie malesuada. + +Mauris blandit aliquet elit, eget tincidunt nibh pulvinar a. Cras ultricies ligula sed magna dictumst vestibulum. Nulla porttitor accumsan tincidunt. Vivamus suscipit tortor eget felis porttitor volutpat. Donec sollicitudin molestie malesuada. Nulla facilisi. Donec sollicitudin molestie malesuada. Praesent sapien massa, convallis a pellentesque nec, egestas non nisi. + +Pellentesque pulvinar pellentesque nisl. Nulla porttitor accumsan tincidunt. Vivamus suscipit tortor eget felis porttitor volutpat. Donec sollicitudin molestie malesuada. 
Mauris blandit aliquet elit, eget tincidunt nibh pulvinar a. Cras ultricies ligula sed magna dictumst vestibulum. Praesent sapien massa, convallis a pellentesque nec, egestas non nisi. + +Quisque velit nisi, pretium ut lacinia in, elementum id enim. Donec sollicitudin molestie malesuada. Mauris blandit aliquet elit, eget tincidunt nibh pulvinar a. Cras ultricies ligula sed magna dictumst vestibulum. Nulla porttitor accumsan tincidunt. Vivamus suscipit tortor eget felis porttitor volutpat. Donec sollicitudin molestie malesuada. diff --git a/test/openai/curltest.sh b/test/openai/curltest.sh new file mode 100644 index 00000000..df5706cb --- /dev/null +++ b/test/openai/curltest.sh @@ -0,0 +1 @@ +curl -i -H "test: x" -H "xx: Value1" -H "x-userprofile: 123456" http://localhost:8000/file-nodelay/claude-3.5-haiku.txt -H "X-TokenProcessor: AllUsage-2"
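
Note on the new JSON templates (`welcome.json`, `notready.json`, `notauthorized.json`): they rely on `%TOKEN%` placeholders such as `%USERID%`, `%GUID%`, `%BLOBURI%` and `%BLOBCONTAINER%`. The diff does not show how the proxy expands them, so the following is only a minimal shell sketch of the substitution idea, using `sed` in the same spirit as `local-setup.sh`. The function name `render_template` and all sample values are hypothetical, and this is not the proxy's actual C# implementation.

```shell
# Hypothetical helper: expand %TOKEN% placeholders in a template read from stdin.
# Arguments: user id, MID, GUID, blob base URI, blob container.
# Assumption: values contain no '|', '&' or '\' characters, which would need
# escaping in a sed replacement.
render_template() {
  sed -e "s|%USERID%|$1|g" \
      -e "s|%MID%|$2|g" \
      -e "s|%GUID%|$3|g" \
      -e "s|%BLOBURI%|$4|g" \
      -e "s|%BLOBCONTAINER%|$5|g"
}

# Shortened, single-line stand-in for welcome.json (not the full template).
template='{"Guid": "%GUID%", "Status": 202, "DataBlobUri": "%BLOBURI%/%BLOBCONTAINER%/%GUID%"}'

rendered=$(printf '%s\n' "$template" |
  render_template "user1" "mid-1" "abc123" "https://example.blob.core.windows.net" "results")

printf '%s\n' "$rendered"
```

With these sample values, `DataBlobUri` resolves to the blob base URI, the container name, and the request GUID joined by `/`, which is the shape a client would poll for the async result.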