
Add Models list/get, CountTokens, and legacy completions routes #9

Merged
harshaneel merged 3 commits into main from hg/sdk-feature-pr1-2026-05 on May 18, 2026

@harshaneel (Owner)

Summary

Adds routes for the six most-used SDK calls that previously fell through to the proxy's 404 fallback, all backed by endpoints llama.cpp already exposes:

| Route | Used by | Notes |
| --- | --- | --- |
| `POST /v1/completions` | OpenAI legacy `Completions.New` | Passthrough |
| `GET /v1/models` | OpenAI `Models.List` | Passthrough |
| `GET /v1/models/{id}` | OpenAI `Models.Retrieve` | Passthrough |
| `GET /v1beta/models` | genai `Models.List` | Translated from upstream `/v1/models` |
| `GET /v1beta/models/{name}` | genai `Models.Get` | Translated |
| `POST /v1beta/models/{m}:countTokens` | genai `Models.CountTokens` | Calls llama.cpp `/tokenize`, returns `{totalTokens}` |
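The countTokens row is the one new route that does more than relay or reshape a model list. Its translation can be sketched as follows, assuming the Gemini body carries text in `contents[].parts[].text` and that llama.cpp's `/tokenize` accepts `{"content": "..."}` and answers `{"tokens": [...]}`. The helper names are hypothetical, not localaik's code, and joining parts with a newline is an assumption: a handler that applied a chat template first would report a different count.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"strings"
)

// geminiCountTokensRequest holds only the fields this sketch reads from a
// Gemini :countTokens body (assumed shape: contents[].parts[].text).
type geminiCountTokensRequest struct {
	Contents []struct {
		Parts []struct {
			Text string `json:"text"`
		} `json:"parts"`
	} `json:"contents"`
}

// tokenizeBodyFromGemini (hypothetical) flattens every text part into one
// newline-joined string and wraps it as a llama.cpp /tokenize request body.
func tokenizeBodyFromGemini(body []byte) ([]byte, error) {
	var req geminiCountTokensRequest
	if err := json.Unmarshal(body, &req); err != nil {
		return nil, fmt.Errorf("decode countTokens request: %w", err)
	}
	var texts []string
	for _, c := range req.Contents {
		for _, p := range c.Parts {
			if p.Text != "" {
				texts = append(texts, p.Text)
			}
		}
	}
	return json.Marshal(map[string]string{"content": strings.Join(texts, "\n")})
}

// geminiCountFromTokenize (hypothetical) converts a /tokenize response of the
// form {"tokens":[...]} into Gemini's {"totalTokens": N}. Tokens are kept as
// raw JSON so the count works whether entries are ids or objects.
func geminiCountFromTokenize(body []byte) ([]byte, error) {
	var tok struct {
		Tokens []json.RawMessage `json:"tokens"`
	}
	if err := json.Unmarshal(body, &tok); err != nil {
		return nil, fmt.Errorf("decode /tokenize response: %w", err)
	}
	return json.Marshal(map[string]int{"totalTokens": len(tok.Tokens)})
}

// countTokens (hypothetical) wires the two steps together against a
// llama.cpp base URL such as "http://127.0.0.1:8080".
func countTokens(ctx context.Context, client *http.Client, llamaBase string, geminiBody []byte) ([]byte, error) {
	payload, err := tokenizeBodyFromGemini(geminiBody)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, llamaBase+"/tokenize", bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := client.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("llama.cpp /tokenize returned %s", resp.Status)
	}
	raw, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	return geminiCountFromTokenize(raw)
}
```

Keeping the two JSON conversions as pure functions leaves only a thin HTTP shell around them, so the translation can be tested without a running llama.cpp.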

The OpenAI handler was refactored from handleOpenAIChatCompletions into a generic handleOpenAIPassthrough(w, r, upstreamURL) that forwards the incoming r.Method, so one handler serves both the GET and POST passthrough routes. A new fetchUpstreamJSON helper handles GET-and-decode for the Gemini-side translations.

What's intentionally not here

  • ComputeTokens: the genai SDK gates this behind BackendVertexAI; calls on BackendGeminiAPI (what localaik targets) fail inside the SDK before any HTTP request is issued. Documenting this in the README.
  • Embeddings: deferred to a separate PR, since they require a second llama.cpp instance in the container running an embedding model.

Drive-by

examples/go/gemini-structured/main.go had pre-existing gofmt drift that broke make lint on main. Auto-formatted it to get CI green (a one-line whitespace fix).

Test plan

  • make lint clean
  • make test-unit green
  • go test ./integration green — new SDK contract tests cover Models.List, Models.Get, Models.CountTokens (genai) and Models.List, Models.Get, Completions.New (openai-go)
  • Both code-reviewer agents run; actionable findings addressed (dropped unused GeminiModelNameFromPath helper, improved fetchUpstreamJSON docstring)
  • CI runs make test-integration against the live docker image

🤖 Generated with Claude Code

harshaneel and others added 3 commits May 18, 2026 10:41
Closes the most-used SDK surface gaps for client init and tokenizer
checks. Six new routes, all backed by existing llama.cpp endpoints:

- POST /v1/completions             (OpenAI legacy passthrough)
- GET  /v1/models                  (OpenAI passthrough)
- GET  /v1/models/{id}             (OpenAI passthrough)
- GET  /v1beta/models              (Gemini, translated from /v1/models)
- GET  /v1beta/models/{name}       (Gemini, translated)
- POST /v1beta/models/{m}:countTokens (Gemini, calls llama.cpp /tokenize)

Adds SDK contract tests exercising genai's Models.List / Models.Get /
Models.CountTokens and openai-go's Models.List / Models.Get /
Completions.New against the new routes. Also includes an unrelated
gofmt drive-by on examples/go/gemini-structured/main.go which was
breaking make lint on main.

ComputeTokens is intentionally not implemented: the genai SDK gates it
behind BackendVertexAI and never reaches a Gemini Developer API proxy.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes five test gaps identified in review:

- Upstream-error → Gemini-error translation for /v1beta/models and
  :countTokens (TestServer*UpstreamError)
- Method confusion: POST /v1/models, PUT /v1beta/models, GET on POST
  routes all return 404 instead of being silently misrouted
- OpenAI model-get path-traversal guard (was Gemini-only)
- Gemini action-verb collision: GET /v1beta/models/foo:bar must 404,
  not route to handleGeminiModelGet
- Auth header (Authorization, X-Goog-Api-Key) stripping on the new
  passthrough routes (legacy completions, models list, model get)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Move closure vars (seenAuth, seenGoogKey, upstream) inside the
  TestServerPassthroughStripsAuthHeaders sub-test so each case is
  fully isolated and the test is safe under future t.Parallel()
- Use nil body on GET sub-cases (was bytes.NewBufferString("{}")) to
  match the convention used elsewhere in the file
- Add POST /v1beta/models to TestServerMethodConfusion — covers the
  prefix-route regression target alongside the existing PUT case

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
harshaneel merged commit 0a5de1e into main on May 18, 2026
3 checks passed
