Skip to content

docs(litellm-gateway): how-to + working example for the LiteLLM AI Gateway in front of OCI#268

Merged
fede-kamel merged 8 commits into
mainfrom
docs/litellm-gateway
May 25, 2026
Merged

docs(litellm-gateway): how-to + working example for the LiteLLM AI Gateway in front of OCI#268
fede-kamel merged 8 commits into
mainfrom
docs/litellm-gateway

Conversation

@fede-kamel
Copy link
Copy Markdown
Contributor

@fede-kamel fede-kamel commented May 25, 2026

TL;DR

Replaces the closed PR #266 (in-process LiteLLMModel wrapper).

The LiteLLM-idiomatic integration is the gateway, not a Python library wrapper. This PR ships:

  • a deployment-grade how-to (docs/how-to/litellm-gateway.md),
  • a working local sample (examples/litellm-gateway/config.yaml, docker-compose.yml, helm-values.yaml, README.md),
  • a runnable companion notebook (examples/notebook_71_litellm_gateway.py),
  • 27 unit tests + 7 live-OCI integration tests,
  • an SVG architecture diagram,
  • a cross-link from docs/how-to/oci-models.md.

Locus's existing OpenAIModel(base_url=...) is the LiteLLM-compatible client — no new Python class, no new dependency (litellm stays out of pyproject.toml).


Live verification — real OCI Generative AI, us-chicago-1

Every claim below was driven end-to-end against a Locus-owned OCI tenancy.

Capability Status Evidence
docker compose up (gateway + Postgres) ✅ both healthy docker ps shows locus-litellm-gateway + locus-litellm-db
Gateway /v1/models lists all 6 OCI aliases oci-cohere-command, oci-grok, oci-gpt5-mini, oci-llama-4-maverick, oci-gemini-2.5-flash, oci-cohere-embed
Locus OpenAIModel(base_url=...) → OCI completion "Tokyo." / "Paris." / "Berlin." / "Rome."
Integration suite — 7 live tests ✅ 7/7 basic completion, multi-turn + system, streaming, tool-call, full Agent loop, /v1/models lookup, unauthenticated-call rejection
Unit tests — 27 (parse config.yaml / compose / Helm) ✅ 27/27 alias/docs parity, OCI env wiring on every entry, strict env-var form, master-key env-sourced, Postgres db service shape, gateway depends_on: condition: service_healthy, helm pod hardening
/key/generate issues a virtual key (Postgres-backed) with model allowlist + budget + expiry + metadata
Virtual key on its allowed model oci-cohere-command → "Paris."
Virtual key trying a model not on its allowlist ✅ rejected exact error: key not allowed to access model. This key can only access models=['oci-cohere-command']. Tried to access oci-gpt5-mini
Cost tracking/spend/logs per-request token counts + USD cost per call; rejected calls logged at cost=$0.000000
Cost tracking/global/spend/keys aggregate rolled up per virtual key
Fallback chain — primary 5xx → secondary serves broken alias targeting oci/xai.grok-NONEXISTENT-9999 with fallback oci-cohere-command — response served as cohere.command-latest with content "Rome."
litellm not in pyproject.toml grep returns empty; no Python dep added
mkdocs --strict build clean
Pre-commit hooks (ruff, mypy, codespell, gitleaks, commitizen, markdownlint, YAML formatter) all green

Why this shape (not the closed in-process wrapper)

Reviewers on PR #266 raised real concerns — silently-dropped params, custom tool-arg sentinels, "every provider works" overclaim, and the wrapper's permanent lag behind the gateway's feature surface. A second look at how LiteLLM is designed to be consumed clinched it:

LiteLLM's product is the gateway. The Python function litellm.acompletion() is internal scaffolding; the platform-grade pieces — virtual keys, budgets, fallbacks, callbacks, observability, audit, cost reporting, caching, guardrails — live in the proxy. A library wrapper would always re-implement a subset of that and lag behind it.

OpenAIModel(base_url=...) already speaks the gateway's contract, so the right integration is one config file telling the gateway how to reach OCI. That's this PR.

Net diff vs PR #266: ~−2,000 lines of code + tests + CI removed, ~+900 lines of docs / sample / test added. No new Python class. No new dep.

Locus agent
   │  OpenAIModel(model="oci-cohere-command",
   │              base_url="http://litellm-gateway:4000",
   │              api_key="<virtual-key>")
   ▼
LiteLLM AI Gateway  (config.yaml: every provider + virtual keys + fallbacks + callbacks)
   │  OCI Signature v1 RSA-SHA256 signing happens HERE — never in Locus
   ▼
Oracle Generative AI Infrastructure   (Cohere · Grok · Llama · Gemini · gpt-5)

What's in this PR

Documentation

  • docs/how-to/litellm-gateway.md — deployment guide. Sections: when to use the gateway vs. the direct OCI providers, an explicit "Scope" admonition (the gateway covers /20231130/actions/chat only; OCI's V1 shim and Responses API stay with the direct providers), local Docker quickstart, OKE quickstart, issuing per-team virtual keys, cost tracking with /spend/logs and /global/spend/keys, auth-boundary diagram (gateway holds OCI creds, Locus holds virtual keys), notebook-run-via-gateway recipe.
  • docs/img/litellm-gateway-architecture.svg — three-tier SVG (Locus → Gateway → OCI). Tier-2 panel itemises every platform-grade feature so reviewers see what the gateway carries that a library wrapper couldn't.
  • docs/notebooks/notebook_71_litellm_gateway.md — notebook md stub with the SVG embedded.
  • docs/how-to/oci-models.md — admonition at the top pointing to the gateway page as the recommended path for multi-tenant / cross-provider / centralised-observability deployments.
  • mkdocs.yml — nav entries (Guides → LiteLLM AI Gateway; Notebooks → 71 · LiteLLM AI Gateway).

Working sample (examples/litellm-gateway/)

  • config.yaml — 6 OCI model aliases (Cohere Command + Embed, Grok 4.20, gpt-5-mini, Llama 4 Maverick, Gemini 2.5 Flash) wired to OCI_* env vars via os.environ/..., drop_params: true, fallback chains across the catalog, master-key from LITELLM_MASTER_KEY env (never inlined).
  • docker-compose.yml — gateway + Postgres-17 sidecar. Gateway depends_on: db: condition: service_healthy, so the first /key/generate doesn't race past Prisma migrations. All required env vars use ${VAR:?...} strict form. OCI key mounted read-only at /oci-keys/key.pem.
  • helm-values.yaml — official litellm-helm chart values. ClusterIP-only Service (never expose publicly — the gateway holds OCI signing material), envFrom Kubernetes Secrets, OKE Workload Identity placeholder (gateway pod's identity replaces the long-lived signing key), pod hardening (runAsNonRoot, read-only root FS, allowPrivilegeEscalation: false, all caps dropped), external Postgres pointer.
  • README.md — side-by-side local + OKE quickstarts.

Companion notebook

  • examples/notebook_71_litellm_gateway.py — runnable end-to-end demo. Health-checks the gateway, builds an Agent(OpenAIModel(base_url=...)), runs blocking + streaming prompts, prints token counts. Self-skips with a wiring banner when LITELLM_GATEWAY_URL / LITELLM_GATEWAY_KEY aren't set (same UX as the Oracle ADB notebooks).

Tests

  • tests/unit/test_litellm_gateway_example.py — 27 tests, zero network. Parses the sample config.yaml, docker-compose.yml, helm-values.yaml and asserts every documented invariant: alias / docs parity, OCI env wiring on every entry, fallback chains reference declared aliases, compose uses ${VAR:?...} strict form, OCI key mounted read-only, Postgres db service shape, gateway depends_on with condition: service_healthy in long-form mapping (not the short list which doesn't wait), DATABASE_URL uses in-network host + strict env-var form, helm Service is ClusterIP-only, pod hardened.
  • tests/integration/test_litellm_gateway_live.py — 7 tests, gated on LITELLM_GATEWAY_URL / LITELLM_GATEWAY_KEY. /v1/models lookup, negative-path unauthenticated rejection, basic completion, multi-turn + system message, streaming, tool calling, full Agent loop. Auto-skipped without the env vars; runs from the parent reusable workflow when they're present.

Auth boundary — what changes

Without the gateway With the gateway
Locus → OCI directly. Each Locus service carries the OCI signing key (or uses OKE Workload Identity at the Locus pod). Locus → gateway with a virtual key. Gateway → OCI with the OCI signing key (or its own OKE Workload Identity at the gateway pod).

So Locus services no longer need OCI credentials at all — only the gateway does. Different agents / teams / customers each get their own virtual key with their own budget + model allowlist + audit trail. On OKE, the gateway pod can use Workload Identity so the OCI signing key never lands on disk anywhere.


Honest caveats

The docs list features the gateway supports but where I haven't live-demoed the integration in this PR:

  • Observability callbacks — Langfuse / OpenTelemetry / Datadog / Helicone. success_callback: ["langfuse"] is referenced in config.yaml as a commented-out hook. Wiring it end-to-end requires a backing service, follow-up PR with a Langfuse-cloud or local Langfuse demo.
  • Cache passthrough — Redis / S3 / Qdrant. Same shape: commented in config.yaml, documented, not live-demoed.
  • Guardrails — Lakera / Aporia / Presidio / Bedrock Guardrails. Listed in the SVG, not configured in the sample.
  • OKE helm install against a real clusterhelm-values.yaml ships and helm template lints fine, but I have not run a real install. Local Docker validates the same artifacts (config + image + env-var contract).

These are accurate-but-unverified claims. Tracked as follow-up PRs in #269 — one PR per capability, each with its own live demo + integration test. The docs read as a deployment guide"the gateway provides X; see LiteLLM's docs for X-specific config" — and point at the upstream documentation for each. Each is a clean follow-up PR with its own live demo.

The four live-verified pieces (OCI native, virtual keys, cost tracking, fallback chains) are the core platform-grade value-add over the direct OCI providers, and they're all working today.


What was not changed

  • No removal of OCIChatCompletionsModel / OCIResponsesModel / OCIModel. Those remain the recommended primary path for single-tenant production, dev/CI, and on-OKE workload identity. The gateway is a parallel option for the multi-tenant / cross-provider / centralised-observability case.
  • No new Python code in Locus. Intentional.
  • No new Locus dep. pyproject.toml diff is empty.

Commits

5ae7c40 docs(litellm-gateway): Postgres sidecar, virtual keys, cost tracking, fallback verified
5111402 docs(litellm-gateway): rebuild SVG without text overlay; drop 'Why no new Locus class' section
8f75cd6 docs(litellm-gateway): simplify notebook 71 nav label to 'LiteLLM AI Gateway'
72bef4c docs(litellm-gateway): notebook 71 + SVG architecture diagram + unit & integration tests
1433dc8 docs(litellm-gateway): how-to + working example for the LiteLLM AI Gateway in front of OCI


Related

Supersedes

@oracle-contributor-agreement oracle-contributor-agreement Bot added the OCA Verified All contributors have signed the Oracle Contributor Agreement. label May 25, 2026
…teway in front of OCI

Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
…& integration tests

  - examples/notebook_71_litellm_gateway.py — runnable companion to
    the how-to. Health-checks the gateway, builds an Agent around
    OpenAIModel(base_url=...), runs blocking + streaming prompts.
    Self-skips with a wiring banner when LITELLM_GATEWAY_URL /
    LITELLM_GATEWAY_KEY aren't set.

  - docs/img/litellm-gateway-architecture.svg — three-tier SVG flow
    (Locus → LiteLLM Gateway → OCI Generative AI). The middle panel
    itemises every gateway feature so reviewers can see what the
    proxy carries that an in-process wrapper doesn't.

  - docs/notebooks/notebook_71_litellm_gateway.md — notebook md stub
    with the SVG embedded.

  - mkdocs.yml — notebook nav entry next to notebook 70.

  - docs/how-to/litellm-gateway.md — SVG embedded at the top.

  - tests/unit/test_litellm_gateway_example.py — 20 tests, no network.
    Parses config.yaml / docker-compose.yml / helm-values.yaml and
    asserts the documented invariants: alias / docs parity, OCI_* env
    wiring on every upstream entry, drop_params=True, master_key env
    sourced, fallback chains reference declared aliases, compose uses
    ${VAR:?…} strict form, OCI key mounted read-only, helm Service is
    ClusterIP-only, pod hardened (non-root, read-only root, caps
    dropped), README cross-references the artifacts.

  - tests/integration/test_litellm_gateway_live.py — drives the live
    gateway end-to-end through Locus's OpenAIModel: /v1/models health
    check, negative-path unauthenticated rejection, basic completion,
    multi-turn with system message, streaming, tool calling, full
    Agent loop. Auto-skipped when LITELLM_GATEWAY_URL /
    LITELLM_GATEWAY_KEY aren't set; runs from the existing
    _litellm_integration workflow.

Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
…Gateway'

Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
… new Locus class' section

Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
… fallback verified

Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
… patterns

Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
…ts as deployment-validation

Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
@fede-kamel fede-kamel force-pushed the docs/litellm-gateway branch from 8c52652 to b8a48ed Compare May 25, 2026 16:12
@fede-kamel fede-kamel merged commit 586daaa into main May 25, 2026
10 checks passed
fede-kamel added a commit that referenced this pull request May 25, 2026
The PR #268 work lands under [Unreleased] following the same shape
as the b21 entries — leading summary paragraph, then enumerated
detail of what ships + what's verified + what's tracked as
follow-up in #269.

No version bump in pyproject.toml — that's a release-manager call
when b22 cuts.

Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
fede-kamel added a commit that referenced this pull request May 25, 2026
* docs(litellm-gateway): how-to + working example for the LiteLLM AI Gateway in front of OCI

Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>

* docs(litellm-gateway): notebook 71 + SVG architecture diagram + unit & integration tests

  - examples/notebook_71_litellm_gateway.py — runnable companion to
    the how-to. Health-checks the gateway, builds an Agent around
    OpenAIModel(base_url=...), runs blocking + streaming prompts.
    Self-skips with a wiring banner when LITELLM_GATEWAY_URL /
    LITELLM_GATEWAY_KEY aren't set.

  - docs/img/litellm-gateway-architecture.svg — three-tier SVG flow
    (Locus → LiteLLM Gateway → OCI Generative AI). The middle panel
    itemises every gateway feature so reviewers can see what the
    proxy carries that an in-process wrapper doesn't.

  - docs/notebooks/notebook_71_litellm_gateway.md — notebook md stub
    with the SVG embedded.

  - mkdocs.yml — notebook nav entry next to notebook 70.

  - docs/how-to/litellm-gateway.md — SVG embedded at the top.

  - tests/unit/test_litellm_gateway_example.py — 20 tests, no network.
    Parses config.yaml / docker-compose.yml / helm-values.yaml and
    asserts the documented invariants: alias / docs parity, OCI_* env
    wiring on every upstream entry, drop_params=True, master_key env
    sourced, fallback chains reference declared aliases, compose uses
    ${VAR:?…} strict form, OCI key mounted read-only, helm Service is
    ClusterIP-only, pod hardened (non-root, read-only root, caps
    dropped), README cross-references the artifacts.

  - tests/integration/test_litellm_gateway_live.py — drives the live
    gateway end-to-end through Locus's OpenAIModel: /v1/models health
    check, negative-path unauthenticated rejection, basic completion,
    multi-turn with system message, streaming, tool calling, full
    Agent loop. Auto-skipped when LITELLM_GATEWAY_URL /
    LITELLM_GATEWAY_KEY aren't set; runs from the existing
    _litellm_integration workflow.

Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>

* docs(litellm-gateway): simplify notebook 71 nav label to 'LiteLLM AI Gateway'

Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>

* docs(litellm-gateway): rebuild SVG without text overlay; drop 'Why no new Locus class' section

Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>

* docs(litellm-gateway): Postgres sidecar, virtual keys, cost tracking, fallback verified

Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>

* docs(litellm-gateway): cost-tracking suite + notebook 72 + enterprise patterns

Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>

* ci(litellm-gateway): kill alias drift + corporate-proxy override

Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>

* docs(litellm-gateway): compress enterprise section + reframe cost tests as deployment-validation

Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>

* docs(changelog): add LiteLLM AI Gateway integration entry to Unreleased

The PR #268 work lands under [Unreleased] following the same shape
as the b21 entries — leading summary paragraph, then enumerated
detail of what ships + what's verified + what's tracked as
follow-up in #269.

No version bump in pyproject.toml — that's a release-manager call
when b22 cuts.

Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>

* chore(release): v0.2.0b22 — LiteLLM AI Gateway integration

One PR landed since b21 (#268): Locus is now documented + sampled +
tested as a first-class consumer of the LiteLLM AI Gateway in front
of Oracle Generative AI Infrastructure. Zero new Python code in
Locus, no new dependency added to pyproject.toml — the integration
is a deployment guide + working sample + tests.

Live-verified against real OCI us-chicago-1 (LUIGI_FRA_API tenancy):
  - 7/7 live gateway integration tests
  - 7/7 cost-tracking deployment-validation tests
  - 29/29 unit tests over the shipped sample
  - Fallback chain validated with a broken-on-purpose primary
  - DCO sign-off on every commit
  - mkdocs --strict clean

Four follow-up gateway capabilities (Langfuse observability, Redis
cache, Lakera/Presidio guardrails, OKE helm install) are tracked
in #269 — each becomes its own focused PR with its own live demo.

See CHANGELOG.md for the full breakdown of what ships.

Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>

---------

Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

OCA Verified All contributors have signed the Oracle Contributor Agreement.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add 'litellm' option to LOCUS_MODEL_PROVIDER so all 70 example notebooks can run via LiteLLMModel

1 participant