docs(litellm-gateway): how-to + working example for the LiteLLM AI Gateway in front of OCI by fede-kamel · Pull Request #268 · oracle-samples/locus

fede-kamel · 2026-05-25T06:30:32Z

TL;DR

Replaces the closed PR #266 (in-process LiteLLMModel wrapper).

The LiteLLM-idiomatic integration is the gateway, not a Python library wrapper. This PR ships:

a deployment-grade how-to (docs/how-to/litellm-gateway.md),
a working local sample (examples/litellm-gateway/ — config.yaml, docker-compose.yml, helm-values.yaml, README.md),
a runnable companion notebook (examples/notebook_71_litellm_gateway.py),
27 unit tests + 7 live-OCI integration tests,
an SVG architecture diagram,
a cross-link from docs/how-to/oci-models.md.

Locus's existing OpenAIModel(base_url=...) is the LiteLLM-compatible client — no new Python class, no new dependency (litellm stays out of pyproject.toml).

Live verification — real OCI Generative AI, `us-chicago-1`

Every claim below was driven end-to-end against a Locus-owned OCI tenancy.

Capability	Status	Evidence
`docker compose up` (gateway + Postgres)	✅ both healthy	`docker ps` shows `locus-litellm-gateway` + `locus-litellm-db`
Gateway `/v1/models` lists all 6 OCI aliases	✅	`oci-cohere-command`, `oci-grok`, `oci-gpt5-mini`, `oci-llama-4-maverick`, `oci-gemini-2.5-flash`, `oci-cohere-embed`
Locus `OpenAIModel(base_url=...)` → OCI completion	✅	"Tokyo." / "Paris." / "Berlin." / "Rome."
Integration suite — 7 live tests	✅ 7/7	basic completion, multi-turn + system, streaming, tool-call, full Agent loop, `/v1/models` lookup, unauthenticated-call rejection
Unit tests — 27 (parse `config.yaml` / compose / Helm)	✅ 27/27	alias/docs parity, OCI env wiring on every entry, strict env-var form, master-key env-sourced, Postgres `db` service shape, gateway `depends_on: condition: service_healthy`, helm pod hardening
`/key/generate` issues a virtual key (Postgres-backed)	✅	with model allowlist + budget + expiry + metadata
Virtual key on its allowed model	✅	`oci-cohere-command` → "Paris."
Virtual key trying a model not on its allowlist	✅ rejected	exact error: `key not allowed to access model. This key can only access models=['oci-cohere-command']. Tried to access oci-gpt5-mini`
Cost tracking — `/spend/logs` per-request	✅	token counts + USD cost per call; rejected calls logged at `cost=$0.000000`
Cost tracking — `/global/spend/keys` aggregate	✅	rolled up per virtual key
Fallback chain — primary 5xx → secondary serves	✅	broken alias targeting `oci/xai.grok-NONEXISTENT-9999` with fallback `oci-cohere-command` — response served as `cohere.command-latest` with content "Rome."
`litellm` not in `pyproject.toml`	✅	grep returns empty; no Python dep added
`mkdocs --strict` build	✅	clean
Pre-commit hooks (ruff, mypy, codespell, gitleaks, commitizen, markdownlint, YAML formatter)	✅	all green

Why this shape (not the closed in-process wrapper)

Reviewers on PR #266 raised real concerns — silently-dropped params, custom tool-arg sentinels, "every provider works" overclaim, and the wrapper's permanent lag behind the gateway's feature surface. A second look at how LiteLLM is designed to be consumed clinched it:

LiteLLM's product is the gateway. The Python function litellm.acompletion() is internal scaffolding; the platform-grade pieces — virtual keys, budgets, fallbacks, callbacks, observability, audit, cost reporting, caching, guardrails — live in the proxy. A library wrapper would always re-implement a subset of that and lag behind it.

OpenAIModel(base_url=...) already speaks the gateway's contract, so the right integration is one config file telling the gateway how to reach OCI. That's this PR.

Net diff vs PR #266: ~−2,000 lines of code + tests + CI removed, ~+900 lines of docs / sample / test added. No new Python class. No new dep.

Locus agent
   │  OpenAIModel(model="oci-cohere-command",
   │              base_url="http://litellm-gateway:4000",
   │              api_key="<virtual-key>")
   ▼
LiteLLM AI Gateway  (config.yaml: every provider + virtual keys + fallbacks + callbacks)
   │  OCI Signature v1 RSA-SHA256 signing happens HERE — never in Locus
   ▼
Oracle Generative AI Infrastructure   (Cohere · Grok · Llama · Gemini · gpt-5)

What's in this PR

Documentation

docs/how-to/litellm-gateway.md — deployment guide. Sections: when to use the gateway vs. the direct OCI providers, an explicit "Scope" admonition (the gateway covers /20231130/actions/chat only; OCI's V1 shim and Responses API stay with the direct providers), local Docker quickstart, OKE quickstart, issuing per-team virtual keys, cost tracking with /spend/logs and /global/spend/keys, auth-boundary diagram (gateway holds OCI creds, Locus holds virtual keys), notebook-run-via-gateway recipe.
docs/img/litellm-gateway-architecture.svg — three-tier SVG (Locus → Gateway → OCI). Tier-2 panel itemises every platform-grade feature so reviewers see what the gateway carries that a library wrapper couldn't.
docs/notebooks/notebook_71_litellm_gateway.md — notebook md stub with the SVG embedded.
docs/how-to/oci-models.md — admonition at the top pointing to the gateway page as the recommended path for multi-tenant / cross-provider / centralised-observability deployments.
mkdocs.yml — nav entries (Guides → LiteLLM AI Gateway; Notebooks → 71 · LiteLLM AI Gateway).

Working sample (`examples/litellm-gateway/`)

config.yaml — 6 OCI model aliases (Cohere Command + Embed, Grok 4.20, gpt-5-mini, Llama 4 Maverick, Gemini 2.5 Flash) wired to OCI_* env vars via os.environ/..., drop_params: true, fallback chains across the catalog, master-key from LITELLM_MASTER_KEY env (never inlined).
docker-compose.yml — gateway + Postgres-17 sidecar. Gateway depends_on: db: condition: service_healthy, so the first /key/generate doesn't race past Prisma migrations. All required env vars use ${VAR:?...} strict form. OCI key mounted read-only at /oci-keys/key.pem.
helm-values.yaml — official litellm-helm chart values. ClusterIP-only Service (never expose publicly — the gateway holds OCI signing material), envFrom Kubernetes Secrets, OKE Workload Identity placeholder (gateway pod's identity replaces the long-lived signing key), pod hardening (runAsNonRoot, read-only root FS, allowPrivilegeEscalation: false, all caps dropped), external Postgres pointer.
README.md — side-by-side local + OKE quickstarts.

Companion notebook

examples/notebook_71_litellm_gateway.py — runnable end-to-end demo. Health-checks the gateway, builds an Agent(OpenAIModel(base_url=...)), runs blocking + streaming prompts, prints token counts. Self-skips with a wiring banner when LITELLM_GATEWAY_URL / LITELLM_GATEWAY_KEY aren't set (same UX as the Oracle ADB notebooks).

Tests

tests/unit/test_litellm_gateway_example.py — 27 tests, zero network. Parses the sample config.yaml, docker-compose.yml, helm-values.yaml and asserts every documented invariant: alias / docs parity, OCI env wiring on every entry, fallback chains reference declared aliases, compose uses ${VAR:?...} strict form, OCI key mounted read-only, Postgres db service shape, gateway depends_on with condition: service_healthy in long-form mapping (not the short list which doesn't wait), DATABASE_URL uses in-network host + strict env-var form, helm Service is ClusterIP-only, pod hardened.
tests/integration/test_litellm_gateway_live.py — 7 tests, gated on LITELLM_GATEWAY_URL / LITELLM_GATEWAY_KEY. /v1/models lookup, negative-path unauthenticated rejection, basic completion, multi-turn + system message, streaming, tool calling, full Agent loop. Auto-skipped without the env vars; runs from the parent reusable workflow when they're present.

Auth boundary — what changes

Without the gateway	With the gateway
Locus → OCI directly. Each Locus service carries the OCI signing key (or uses OKE Workload Identity at the Locus pod).	Locus → gateway with a virtual key. Gateway → OCI with the OCI signing key (or its own OKE Workload Identity at the gateway pod).

So Locus services no longer need OCI credentials at all — only the gateway does. Different agents / teams / customers each get their own virtual key with their own budget + model allowlist + audit trail. On OKE, the gateway pod can use Workload Identity so the OCI signing key never lands on disk anywhere.

Honest caveats

The docs list features the gateway supports but where I haven't live-demoed the integration in this PR:

Observability callbacks — Langfuse / OpenTelemetry / Datadog / Helicone. success_callback: ["langfuse"] is referenced in config.yaml as a commented-out hook. Wiring it end-to-end requires a backing service, follow-up PR with a Langfuse-cloud or local Langfuse demo.
Cache passthrough — Redis / S3 / Qdrant. Same shape: commented in config.yaml, documented, not live-demoed.
Guardrails — Lakera / Aporia / Presidio / Bedrock Guardrails. Listed in the SVG, not configured in the sample.
OKE helm install against a real cluster — helm-values.yaml ships and helm template lints fine, but I have not run a real install. Local Docker validates the same artifacts (config + image + env-var contract).

These are accurate-but-unverified claims. Tracked as follow-up PRs in #269 — one PR per capability, each with its own live demo + integration test. The docs read as a deployment guide — "the gateway provides X; see LiteLLM's docs for X-specific config" — and point at the upstream documentation for each. Each is a clean follow-up PR with its own live demo.

The four live-verified pieces (OCI native, virtual keys, cost tracking, fallback chains) are the core platform-grade value-add over the direct OCI providers, and they're all working today.

What was not changed

No removal of OCIChatCompletionsModel / OCIResponsesModel / OCIModel. Those remain the recommended primary path for single-tenant production, dev/CI, and on-OKE workload identity. The gateway is a parallel option for the multi-tenant / cross-provider / centralised-observability case.
No new Python code in Locus. Intentional.
No new Locus dep. pyproject.toml diff is empty.

Commits

5ae7c40 docs(litellm-gateway): Postgres sidecar, virtual keys, cost tracking, fallback verified
5111402 docs(litellm-gateway): rebuild SVG without text overlay; drop 'Why no new Locus class' section
8f75cd6 docs(litellm-gateway): simplify notebook 71 nav label to 'LiteLLM AI Gateway'
72bef4c docs(litellm-gateway): notebook 71 + SVG architecture diagram + unit & integration tests
1433dc8 docs(litellm-gateway): how-to + working example for the LiteLLM AI Gateway in front of OCI

Supersedes

Closes feat(models): native LiteLLMModel with three OCI transports + cross-provider routing #266 (in-process LiteLLMModel wrapper — closed in favour of this shape).
Closes Add 'litellm' option to LOCUS_MODEL_PROVIDER so all 70 example notebooks can run via LiteLLMModel #267 (notebook migration via LOCUS_MODEL_PROVIDER=litellm — superseded; the gateway path uses LOCUS_MODEL_PROVIDER=openai + OPENAI_BASE_URL which already works through examples/config.py).

…teway in front of OCI Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>

…& integration tests - examples/notebook_71_litellm_gateway.py — runnable companion to the how-to. Health-checks the gateway, builds an Agent around OpenAIModel(base_url=...), runs blocking + streaming prompts. Self-skips with a wiring banner when LITELLM_GATEWAY_URL / LITELLM_GATEWAY_KEY aren't set. - docs/img/litellm-gateway-architecture.svg — three-tier SVG flow (Locus → LiteLLM Gateway → OCI Generative AI). The middle panel itemises every gateway feature so reviewers can see what the proxy carries that an in-process wrapper doesn't. - docs/notebooks/notebook_71_litellm_gateway.md — notebook md stub with the SVG embedded. - mkdocs.yml — notebook nav entry next to notebook 70. - docs/how-to/litellm-gateway.md — SVG embedded at the top. - tests/unit/test_litellm_gateway_example.py — 20 tests, no network. Parses config.yaml / docker-compose.yml / helm-values.yaml and asserts the documented invariants: alias / docs parity, OCI_* env wiring on every upstream entry, drop_params=True, master_key env sourced, fallback chains reference declared aliases, compose uses ${VAR:?…} strict form, OCI key mounted read-only, helm Service is ClusterIP-only, pod hardened (non-root, read-only root, caps dropped), README cross-references the artifacts. - tests/integration/test_litellm_gateway_live.py — drives the live gateway end-to-end through Locus's OpenAIModel: /v1/models health check, negative-path unauthenticated rejection, basic completion, multi-turn with system message, streaming, tool calling, full Agent loop. Auto-skipped when LITELLM_GATEWAY_URL / LITELLM_GATEWAY_KEY aren't set; runs from the existing _litellm_integration workflow. Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>

…Gateway' Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>

… new Locus class' section Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>

… fallback verified Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>

… patterns Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>

Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>

…ts as deployment-validation Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>

The PR #268 work lands under [Unreleased] following the same shape as the b21 entries — leading summary paragraph, then enumerated detail of what ships + what's verified + what's tracked as follow-up in #269. No version bump in pyproject.toml — that's a release-manager call when b22 cuts. Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>

* docs(litellm-gateway): how-to + working example for the LiteLLM AI Gateway in front of OCI Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com> * docs(litellm-gateway): notebook 71 + SVG architecture diagram + unit & integration tests - examples/notebook_71_litellm_gateway.py — runnable companion to the how-to. Health-checks the gateway, builds an Agent around OpenAIModel(base_url=...), runs blocking + streaming prompts. Self-skips with a wiring banner when LITELLM_GATEWAY_URL / LITELLM_GATEWAY_KEY aren't set. - docs/img/litellm-gateway-architecture.svg — three-tier SVG flow (Locus → LiteLLM Gateway → OCI Generative AI). The middle panel itemises every gateway feature so reviewers can see what the proxy carries that an in-process wrapper doesn't. - docs/notebooks/notebook_71_litellm_gateway.md — notebook md stub with the SVG embedded. - mkdocs.yml — notebook nav entry next to notebook 70. - docs/how-to/litellm-gateway.md — SVG embedded at the top. - tests/unit/test_litellm_gateway_example.py — 20 tests, no network. Parses config.yaml / docker-compose.yml / helm-values.yaml and asserts the documented invariants: alias / docs parity, OCI_* env wiring on every upstream entry, drop_params=True, master_key env sourced, fallback chains reference declared aliases, compose uses ${VAR:?…} strict form, OCI key mounted read-only, helm Service is ClusterIP-only, pod hardened (non-root, read-only root, caps dropped), README cross-references the artifacts. - tests/integration/test_litellm_gateway_live.py — drives the live gateway end-to-end through Locus's OpenAIModel: /v1/models health check, negative-path unauthenticated rejection, basic completion, multi-turn with system message, streaming, tool calling, full Agent loop. Auto-skipped when LITELLM_GATEWAY_URL / LITELLM_GATEWAY_KEY aren't set; runs from the existing _litellm_integration workflow. Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com> * docs(litellm-gateway): simplify notebook 71 nav label to 'LiteLLM AI Gateway' Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com> * docs(litellm-gateway): rebuild SVG without text overlay; drop 'Why no new Locus class' section Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com> * docs(litellm-gateway): Postgres sidecar, virtual keys, cost tracking, fallback verified Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com> * docs(litellm-gateway): cost-tracking suite + notebook 72 + enterprise patterns Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com> * ci(litellm-gateway): kill alias drift + corporate-proxy override Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com> * docs(litellm-gateway): compress enterprise section + reframe cost tests as deployment-validation Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com> * docs(changelog): add LiteLLM AI Gateway integration entry to Unreleased The PR #268 work lands under [Unreleased] following the same shape as the b21 entries — leading summary paragraph, then enumerated detail of what ships + what's verified + what's tracked as follow-up in #269. No version bump in pyproject.toml — that's a release-manager call when b22 cuts. Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com> * chore(release): v0.2.0b22 — LiteLLM AI Gateway integration One PR landed since b21 (#268): Locus is now documented + sampled + tested as a first-class consumer of the LiteLLM AI Gateway in front of Oracle Generative AI Infrastructure. Zero new Python code in Locus, no new dependency added to pyproject.toml — the integration is a deployment guide + working sample + tests. Live-verified against real OCI us-chicago-1 (LUIGI_FRA_API tenancy): - 7/7 live gateway integration tests - 7/7 cost-tracking deployment-validation tests - 29/29 unit tests over the shipped sample - Fallback chain validated with a broken-on-purpose primary - DCO sign-off on every commit - mkdocs --strict clean Four follow-up gateway capabilities (Langfuse observability, Redis cache, Lakera/Presidio guardrails, OKE helm install) are tracked in #269 — each becomes its own focused PR with its own live demo. See CHANGELOG.md for the full breakdown of what ships. Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com> --------- Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>

oracle-contributor-agreement Bot added the OCA Verified All contributors have signed the Oracle Contributor Agreement. label May 25, 2026

fede-kamel added 8 commits May 25, 2026 12:12

docs(litellm-gateway): how-to + working example for the LiteLLM AI Ga…

bd189d2

…teway in front of OCI Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>

docs(litellm-gateway): simplify notebook 71 nav label to 'LiteLLM AI …

1b32728

…Gateway' Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>

docs(litellm-gateway): rebuild SVG without text overlay; drop 'Why no…

377e9d1

… new Locus class' section Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>

docs(litellm-gateway): Postgres sidecar, virtual keys, cost tracking,…

856a7b2

… fallback verified Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>

docs(litellm-gateway): cost-tracking suite + notebook 72 + enterprise…

1292efd

… patterns Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>

ci(litellm-gateway): kill alias drift + corporate-proxy override

ebeeb83

Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>

docs(litellm-gateway): compress enterprise section + reframe cost tes…

b8a48ed

…ts as deployment-validation Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>

fede-kamel force-pushed the docs/litellm-gateway branch from 8c52652 to b8a48ed Compare May 25, 2026 16:12

fede-kamel mentioned this pull request May 25, 2026

Follow-up: live-verify LiteLLM AI Gateway features deferred from PR #268 #269

Open

15 tasks

fede-kamel merged commit 586daaa into main May 25, 2026
10 checks passed

fede-kamel mentioned this pull request May 25, 2026

chore(release): v0.2.0b22 — LiteLLM AI Gateway integration #270

Merged

6 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

docs(litellm-gateway): how-to + working example for the LiteLLM AI Gateway in front of OCI#268

docs(litellm-gateway): how-to + working example for the LiteLLM AI Gateway in front of OCI#268
fede-kamel merged 8 commits into
mainfrom
docs/litellm-gateway

fede-kamel commented May 25, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

fede-kamel commented May 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

TL;DR

Live verification — real OCI Generative AI, us-chicago-1

Why this shape (not the closed in-process wrapper)

What's in this PR

Documentation

Working sample (examples/litellm-gateway/)

Companion notebook

Tests

Auth boundary — what changes

Honest caveats

What was not changed

Commits

Related

Supersedes

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

fede-kamel commented May 25, 2026 •

edited

Loading

Live verification — real OCI Generative AI, `us-chicago-1`

Working sample (`examples/litellm-gateway/`)