docs(litellm-gateway): how-to + working example for the LiteLLM AI Gateway in front of OCI #268
Merged
…teway in front of OCI
Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
…& integration tests
- examples/notebook_71_litellm_gateway.py — runnable companion to
the how-to. Health-checks the gateway, builds an Agent around
OpenAIModel(base_url=...), runs blocking + streaming prompts.
Self-skips with a wiring banner when LITELLM_GATEWAY_URL /
LITELLM_GATEWAY_KEY aren't set.
- docs/img/litellm-gateway-architecture.svg — three-tier SVG flow
(Locus → LiteLLM Gateway → OCI Generative AI). The middle panel
itemises every gateway feature so reviewers can see what the
proxy carries that an in-process wrapper doesn't.
- docs/notebooks/notebook_71_litellm_gateway.md — notebook md stub
with the SVG embedded.
- mkdocs.yml — notebook nav entry next to notebook 70.
- docs/how-to/litellm-gateway.md — SVG embedded at the top.
- tests/unit/test_litellm_gateway_example.py — 20 tests, no network.
Parses config.yaml / docker-compose.yml / helm-values.yaml and
asserts the documented invariants: alias / docs parity, OCI_* env
wiring on every upstream entry, drop_params=True, master_key env
sourced, fallback chains reference declared aliases, compose uses
${VAR:?…} strict form, OCI key mounted read-only, helm Service is
ClusterIP-only, pod hardened (non-root, read-only root, caps
dropped), README cross-references the artifacts.
- tests/integration/test_litellm_gateway_live.py — drives the live
gateway end-to-end through Locus's OpenAIModel: /v1/models health
check, negative-path unauthenticated rejection, basic completion,
multi-turn with system message, streaming, tool calling, full
Agent loop. Auto-skipped when LITELLM_GATEWAY_URL /
LITELLM_GATEWAY_KEY aren't set; runs from the existing
_litellm_integration workflow.
Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
…Gateway'
Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
… new Locus class' section
Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
… fallback verified
Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
… patterns
Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
…ts as deployment-validation
Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
8c52652 to b8a48ed
fede-kamel added a commit that referenced this pull request on May 25, 2026
The PR #268 work lands under [Unreleased] following the same shape as the b21 entries — leading summary paragraph, then enumerated detail of what ships + what's verified + what's tracked as follow-up in #269. No version bump in pyproject.toml — that's a release-manager call when b22 cuts. Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
fede-kamel added a commit that referenced this pull request on May 25, 2026
* docs(litellm-gateway): how-to + working example for the LiteLLM AI Gateway in front of OCI
Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
* docs(litellm-gateway): notebook 71 + SVG architecture diagram + unit & integration tests
- examples/notebook_71_litellm_gateway.py — runnable companion to
the how-to. Health-checks the gateway, builds an Agent around
OpenAIModel(base_url=...), runs blocking + streaming prompts.
Self-skips with a wiring banner when LITELLM_GATEWAY_URL /
LITELLM_GATEWAY_KEY aren't set.
- docs/img/litellm-gateway-architecture.svg — three-tier SVG flow
(Locus → LiteLLM Gateway → OCI Generative AI). The middle panel
itemises every gateway feature so reviewers can see what the
proxy carries that an in-process wrapper doesn't.
- docs/notebooks/notebook_71_litellm_gateway.md — notebook md stub
with the SVG embedded.
- mkdocs.yml — notebook nav entry next to notebook 70.
- docs/how-to/litellm-gateway.md — SVG embedded at the top.
- tests/unit/test_litellm_gateway_example.py — 20 tests, no network.
Parses config.yaml / docker-compose.yml / helm-values.yaml and
asserts the documented invariants: alias / docs parity, OCI_* env
wiring on every upstream entry, drop_params=True, master_key env
sourced, fallback chains reference declared aliases, compose uses
${VAR:?…} strict form, OCI key mounted read-only, helm Service is
ClusterIP-only, pod hardened (non-root, read-only root, caps
dropped), README cross-references the artifacts.
- tests/integration/test_litellm_gateway_live.py — drives the live
gateway end-to-end through Locus's OpenAIModel: /v1/models health
check, negative-path unauthenticated rejection, basic completion,
multi-turn with system message, streaming, tool calling, full
Agent loop. Auto-skipped when LITELLM_GATEWAY_URL /
LITELLM_GATEWAY_KEY aren't set; runs from the existing
_litellm_integration workflow.
Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
* docs(litellm-gateway): simplify notebook 71 nav label to 'LiteLLM AI Gateway'
Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
* docs(litellm-gateway): rebuild SVG without text overlay; drop 'Why no new Locus class' section
Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
* docs(litellm-gateway): Postgres sidecar, virtual keys, cost tracking, fallback verified
Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
* docs(litellm-gateway): cost-tracking suite + notebook 72 + enterprise patterns
Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
* ci(litellm-gateway): kill alias drift + corporate-proxy override
Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
* docs(litellm-gateway): compress enterprise section + reframe cost tests as deployment-validation
Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
* docs(changelog): add LiteLLM AI Gateway integration entry to Unreleased
The PR #268 work lands under [Unreleased] following the same shape
as the b21 entries — leading summary paragraph, then enumerated
detail of what ships + what's verified + what's tracked as
follow-up in #269.
No version bump in pyproject.toml — that's a release-manager call
when b22 cuts.
Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
* chore(release): v0.2.0b22 — LiteLLM AI Gateway integration
One PR landed since b21 (#268): Locus is now documented + sampled +
tested as a first-class consumer of the LiteLLM AI Gateway in front
of Oracle Generative AI Infrastructure. Zero new Python code in
Locus, no new dependency added to pyproject.toml — the integration
is a deployment guide + working sample + tests.
Live-verified against real OCI us-chicago-1 (LUIGI_FRA_API tenancy):
- 7/7 live gateway integration tests
- 7/7 cost-tracking deployment-validation tests
- 29/29 unit tests over the shipped sample
- Fallback chain validated with a broken-on-purpose primary
- DCO sign-off on every commit
- mkdocs --strict clean
Four follow-up gateway capabilities (Langfuse observability, Redis
cache, Lakera/Presidio guardrails, OKE helm install) are tracked
in #269 — each becomes its own focused PR with its own live demo.
See CHANGELOG.md for the full breakdown of what ships.
Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
---------
Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
TL;DR
Replaces the closed PR #266 (in-process LiteLLMModel wrapper).
The LiteLLM-idiomatic integration is the gateway, not a Python library wrapper. This PR ships:
- a deployment guide (docs/how-to/litellm-gateway.md),
- a working sample (examples/litellm-gateway/ — config.yaml, docker-compose.yml, helm-values.yaml, README.md),
- a companion notebook (examples/notebook_71_litellm_gateway.py),
- an admonition in docs/how-to/oci-models.md.
Locus's existing OpenAIModel(base_url=...) is the LiteLLM-compatible client — no new Python class, no new dependency (litellm stays out of pyproject.toml).
Live verification — real OCI Generative AI, us-chicago-1
Every claim below was driven end-to-end against a Locus-owned OCI tenancy.
- docker compose up (gateway + Postgres): docker ps shows locus-litellm-gateway + locus-litellm-db
- /v1/models lists all 6 OCI aliases: oci-cohere-command, oci-grok, oci-gpt5-mini, oci-llama-4-maverick, oci-gemini-2.5-flash, oci-cohere-embed
- OpenAIModel(base_url=...) → OCI completion
- … /v1/models lookup, unauthenticated-call rejection
- … config.yaml / compose / Helm): db service shape, gateway depends_on: condition: service_healthy, helm pod hardening
- /key/generate issues a virtual key (Postgres-backed)
- … oci-cohere-command → "Paris."
- key not allowed to access model. This key can only access models=['oci-cohere-command']. Tried to access oci-gpt5-mini
- /spend/logs per-request cost=$0.000000
- /global/spend/keys aggregate
- oci/xai.grok-NONEXISTENT-9999 with fallback oci-cohere-command — response served as cohere.command-latest with content "Rome."
- litellm not in pyproject.toml
- mkdocs --strict build
Why this shape (not the closed in-process wrapper)
Reviewers on PR #266 raised real concerns: silently dropped params, custom tool-arg sentinels, an "every provider works" overclaim, and the wrapper's permanent lag behind the gateway's feature surface. A second look at how LiteLLM is designed to be consumed clinched it: OpenAIModel(base_url=...) already speaks the gateway's contract, so the right integration is one config file telling the gateway how to reach OCI. That's this PR.
Net diff vs PR #266: ~−2,000 lines of code + tests + CI removed, ~+900 lines of docs / sample / test added. No new Python class. No new dep.
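To make "speaks the gateway's contract" concrete, here is a minimal standard-library sketch of the OpenAI-compatible model listing that a client such as OpenAIModel(base_url=...) relies on. It is not the notebook's code: the function names are hypothetical, and it assumes LITELLM_GATEWAY_URL holds the gateway root without a trailing /v1 and that the gateway returns the usual OpenAI list shape ({"data": [{"id": ...}]}).

```python
import json
import os
import urllib.request


def build_models_request(base_url: str, api_key: str) -> urllib.request.Request:
    """Build a GET request for the OpenAI-compatible /v1/models listing."""
    # Assumption: base_url is the gateway root, so /v1/models is appended here.
    url = base_url.rstrip("/") + "/v1/models"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})


def list_gateway_models(env=os.environ, timeout: float = 10.0):
    """Return the model ids the gateway exposes, or None when it is not configured."""
    base_url = env.get("LITELLM_GATEWAY_URL")
    api_key = env.get("LITELLM_GATEWAY_KEY")
    if not base_url or not api_key:
        # Mirrors the self-skip behaviour described for the notebook and live tests.
        print("Skipping: set LITELLM_GATEWAY_URL and LITELLM_GATEWAY_KEY to run.")
        return None
    request = build_models_request(base_url, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    return [entry["id"] for entry in payload.get("data", [])]
```

Calling list_gateway_models() without the two variables set prints the skip banner and returns None; with them set it performs one authenticated GET and returns the alias names.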
What's in this PR
Documentation
- docs/how-to/litellm-gateway.md — deployment guide. Sections: when to use the gateway vs. the direct OCI providers, an explicit "Scope" admonition (the gateway covers /20231130/actions/chat only; OCI's V1 shim and Responses API stay with the direct providers), local Docker quickstart, OKE quickstart, issuing per-team virtual keys, cost tracking with /spend/logs and /global/spend/keys, auth-boundary diagram (gateway holds OCI creds, Locus holds virtual keys), notebook-run-via-gateway recipe.
- docs/img/litellm-gateway-architecture.svg — three-tier SVG (Locus → Gateway → OCI). Tier-2 panel itemises every platform-grade feature so reviewers see what the gateway carries that a library wrapper couldn't.
- docs/notebooks/notebook_71_litellm_gateway.md — notebook md stub with the SVG embedded.
- docs/how-to/oci-models.md — admonition at the top pointing to the gateway page as the recommended path for multi-tenant / cross-provider / centralised-observability deployments.
- mkdocs.yml — nav entries (Guides → LiteLLM AI Gateway; Notebooks → 71 · LiteLLM AI Gateway).
Working sample (examples/litellm-gateway/)
- config.yaml — 6 OCI model aliases (Cohere Command + Embed, Grok 4.20, gpt-5-mini, Llama 4 Maverick, Gemini 2.5 Flash) wired to OCI_* env vars via os.environ/..., drop_params: true, fallback chains across the catalog, master key from LITELLM_MASTER_KEY env (never inlined).
- docker-compose.yml — gateway + Postgres-17 sidecar. Gateway depends_on: db: condition: service_healthy, so the first /key/generate doesn't race past Prisma migrations. All required env vars use ${VAR:?...} strict form. OCI key mounted read-only at /oci-keys/key.pem.
- helm-values.yaml — official litellm-helm chart values. ClusterIP-only Service (never expose publicly — the gateway holds OCI signing material), envFrom Kubernetes Secrets, OKE Workload Identity placeholder (gateway pod's identity replaces the long-lived signing key), pod hardening (runAsNonRoot, read-only root FS, allowPrivilegeEscalation: false, all caps dropped), external Postgres pointer.
- README.md — side-by-side local + OKE quickstarts.
Companion notebook
- examples/notebook_71_litellm_gateway.py — runnable end-to-end demo. Health-checks the gateway, builds an Agent(OpenAIModel(base_url=...)), runs blocking + streaming prompts, prints token counts. Self-skips with a wiring banner when LITELLM_GATEWAY_URL / LITELLM_GATEWAY_KEY aren't set (same UX as the Oracle ADB notebooks).
Tests
- tests/unit/test_litellm_gateway_example.py — 27 tests, zero network. Parses the sample config.yaml, docker-compose.yml, helm-values.yaml and asserts every documented invariant: alias / docs parity, OCI env wiring on every entry, fallback chains reference declared aliases, compose uses ${VAR:?...} strict form, OCI key mounted read-only, Postgres db service shape, gateway depends_on with condition: service_healthy in long-form mapping (not the short list, which doesn't wait), DATABASE_URL uses in-network host + strict env-var form, helm Service is ClusterIP-only, pod hardened.
- tests/integration/test_litellm_gateway_live.py — 7 tests, gated on LITELLM_GATEWAY_URL / LITELLM_GATEWAY_KEY. /v1/models lookup, negative-path unauthenticated rejection, basic completion, multi-turn + system message, streaming, tool calling, full Agent loop. Auto-skipped without the env vars; runs from the parent reusable workflow when they're present.
Auth boundary — what changes
So Locus services no longer need OCI credentials at all — only the gateway does. Different agents / teams / customers each get their own virtual key with their own budget + model allowlist + audit trail. On OKE, the gateway pod can use Workload Identity so the OCI signing key never lands on disk anywhere.
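As an illustration of the per-team virtual keys described above, the sketch below builds (but does not send) a request to the gateway's /key/generate endpoint with a model allowlist and an optional budget. The helper name and the example values are hypothetical; the models and max_budget body fields follow LiteLLM's documented key-management API as I understand it, so check the upstream docs before relying on them.

```python
import json
import urllib.request


def build_key_request(gateway_url, master_key, models, max_budget=None):
    """Build a POST to /key/generate for a virtual key limited to `models`."""
    body = {"models": list(models)}  # allowlist of gateway aliases for this key
    if max_budget is not None:
        body["max_budget"] = max_budget  # assumed to be a spend cap in USD
    return urllib.request.Request(
        gateway_url.rstrip("/") + "/key/generate",
        data=json.dumps(body).encode("utf-8"),
        headers={
            # Only the gateway operator holds the master key; Locus services
            # receive the virtual key that this call returns.
            "Authorization": f"Bearer {master_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the request to urllib.request.urlopen would issue the key; a key built with models=["oci-cohere-command"] is the kind that produced the "key not allowed to access model" rejection for oci-gpt5-mini in the verification list.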
Honest caveats
The docs list features the gateway supports but where I haven't live-demoed the integration in this PR:
- success_callback: ["langfuse"] is referenced in config.yaml as a commented-out hook. Wiring it end-to-end requires a backing service, so it is left to a follow-up PR with a Langfuse-cloud or local Langfuse demo.
- … config.yaml, documented, not live-demoed.
- helm install against a real cluster — helm-values.yaml ships and helm template lints fine, but I have not run a real install. Local Docker validates the same artifacts (config + image + env-var contract).
These are accurate-but-unverified claims. Tracked as follow-up PRs in #269 — one PR per capability, each with its own live demo + integration test. The docs read as a deployment guide — "the gateway provides X; see LiteLLM's docs for X-specific config" — and point at the upstream documentation for each.
The four live-verified pieces (OCI native, virtual keys, cost tracking, fallback chains) are the core platform-grade value-add over the direct OCI providers, and they're all working today.
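The fallback-chain and master-key invariants can also be checked offline, in the spirit of the unit tests described earlier. The following is a minimal sketch, not the shipped test file: the function names are hypothetical, the config is an already-parsed dict rather than the real config.yaml, the upstream model strings are placeholders, and it assumes fallbacks live under litellm_settings as a list of {alias: [backup aliases]} mappings.

```python
def undeclared_fallback_aliases(config: dict) -> set:
    """Return aliases named in fallback chains that model_list never declares."""
    declared = {entry["model_name"] for entry in config.get("model_list", [])}
    referenced = set()
    for chain in config.get("litellm_settings", {}).get("fallbacks", []):
        for primary, backups in chain.items():
            referenced.add(primary)
            referenced.update(backups)
    return referenced - declared


def master_key_is_env_sourced(config: dict) -> bool:
    """True when master_key uses the os.environ/... form instead of an inline secret."""
    value = config.get("general_settings", {}).get("master_key", "")
    return isinstance(value, str) and value.startswith("os.environ/")


# Hypothetical excerpt: alias names come from the PR, model strings are placeholders.
SAMPLE_CONFIG = {
    "model_list": [
        {"model_name": "oci-grok", "litellm_params": {"model": "oci/placeholder-primary"}},
        {"model_name": "oci-cohere-command", "litellm_params": {"model": "oci/placeholder-backup"}},
    ],
    "litellm_settings": {
        "drop_params": True,
        "fallbacks": [{"oci-grok": ["oci-cohere-command"]}],
    },
    "general_settings": {"master_key": "os.environ/LITELLM_MASTER_KEY"},
}
```

An empty set from undeclared_fallback_aliases(SAMPLE_CONFIG) means every chain points at a declared alias; a typo in a backup alias would show up in the returned set.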
What was not changed
- OCIChatCompletionsModel / OCIResponsesModel / OCIModel. Those remain the recommended primary path for single-tenant production, dev/CI, and on-OKE workload identity. The gateway is a parallel option for the multi-tenant / cross-provider / centralised-observability case.
- pyproject.toml diff is empty.
Commits
- 5ae7c40 docs(litellm-gateway): Postgres sidecar, virtual keys, cost tracking, fallback verified
- 5111402 docs(litellm-gateway): rebuild SVG without text overlay; drop 'Why no new Locus class' section
- 8f75cd6 docs(litellm-gateway): simplify notebook 71 nav label to 'LiteLLM AI Gateway'
- 72bef4c docs(litellm-gateway): notebook 71 + SVG architecture diagram + unit & integration tests
- 1433dc8 docs(litellm-gateway): how-to + working example for the LiteLLM AI Gateway in front of OCI
Related
Supersedes
- … LiteLLMModel wrapper — closed in favour of this shape).
- … LOCUS_MODEL_PROVIDER=litellm — superseded; the gateway path uses LOCUS_MODEL_PROVIDER=openai + OPENAI_BASE_URL, which already works through examples/config.py).