Add Models list/get, CountTokens, and legacy completions routes by harshaneel · Pull Request #9 · harshaneel/localaik

harshaneel · 2026-05-18T17:42:09Z

Summary

Adds the six most-used SDK call sites that previously hit the proxy's 404 fallback, all backed by endpoints llama.cpp already exposes:

| Route | Used by | Notes |
| --- | --- | --- |
| `POST /v1/completions` | OpenAI legacy `Completions.New` | Passthrough |
| `GET /v1/models` | OpenAI `Models.List` | Passthrough |
| `GET /v1/models/{id}` | OpenAI `Models.Retrieve` | Passthrough |
| `GET /v1beta/models` | genai `Models.List` | Translated from upstream `/v1/models` |
| `GET /v1beta/models/{name}` | genai `Models.Get` | Translated |
| `POST /v1beta/models/{m}:countTokens` | genai `Models.CountTokens` | Calls llama.cpp `/tokenize`, returns `{totalTokens}` |

The OpenAI handler was refactored from `handleOpenAIChatCompletions` into a generic `handleOpenAIPassthrough(w, r, upstreamURL)` that uses the incoming `r.Method`. A new `fetchUpstreamJSON` helper handles GET-and-decode for the Gemini-side translations.

What's intentionally not here

- `ComputeTokens`: the genai SDK gates this behind `BackendVertexAI`; calls on `BackendGeminiAPI` (what localaik targets) error inside the SDK before ever issuing an HTTP request. Documenting in README.
- Embeddings: separate PR; requires a second llama.cpp instance in the container running an embed model.

Drive-by

`examples/go/gemini-structured/main.go` had pre-existing gofmt drift that broke `make lint` on main. Auto-formatted to get CI green; one-line whitespace fix.

Test plan

- `make lint` clean
- `make test-unit` green
- `go test ./integration` green: new SDK contract tests cover `Models.List`, `Models.Get`, `Models.CountTokens` (genai) and `Models.List`, `Models.Get`, `Completions.New` (openai-go)
- Both code-reviewer agents run; actionable findings addressed (dropped unused `GeminiModelNameFromPath` helper, improved `fetchUpstreamJSON` docstring)
- CI runs `make test-integration` against the live docker image

🤖 Generated with Claude Code

Closes the most-used SDK surface gaps for client init and tokenizer checks. Six new routes, all backed by existing llama.cpp endpoints:

- POST /v1/completions (OpenAI legacy passthrough)
- GET /v1/models (OpenAI passthrough)
- GET /v1/models/{id} (OpenAI passthrough)
- GET /v1beta/models (Gemini, translated from /v1/models)
- GET /v1beta/models/{name} (Gemini, translated)
- POST /v1beta/models/{m}:countTokens (Gemini, calls llama.cpp /tokenize)

Adds SDK contract tests exercising genai's Models.List / Models.Get / Models.CountTokens and openai-go's Models.List / Models.Get / Completions.New against the new routes.

Also includes an unrelated gofmt drive-by on examples/go/gemini-structured/main.go which was breaking make lint on main.

ComputeTokens is intentionally not implemented: the genai SDK gates it behind BackendVertexAI and never reaches a Gemini-Developer-API proxy.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Closes five test gaps identified in review:

- Upstream-error → Gemini-error translation for /v1beta/models and :countTokens (TestServer*UpstreamError)
- Method confusion: POST /v1/models, PUT /v1beta/models, GET on POST routes all return 404 instead of being silently misrouted
- OpenAI model-get path-traversal guard (was Gemini-only)
- Gemini action-verb collision: GET /v1beta/models/foo:bar must 404, not route to handleGeminiModelGet
- Auth header (Authorization, X-Goog-Api-Key) stripping on the new passthrough routes (legacy completions, models list, model get)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
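The method-confusion, path-traversal, and action-verb cases listed above can be pictured as a pure routing decision over method and path. The sketch below is an illustration only: `classify`, `validModelID`, and the route labels are hypothetical names, and the exact traversal rule (rejecting empty ids, `/`, and `..`) is an assumption rather than the guard the PR ships.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// validModelID rejects empty ids and anything that could escape the
// models/ path segment (assumed rule, for illustration).
func validModelID(id string) bool {
	return id != "" && !strings.Contains(id, "/") && !strings.Contains(id, "..")
}

// classify returns a route label for a method and path pair,
// or "" when the request should fall through to 404.
func classify(method, path string) string {
	switch {
	case method == http.MethodPost && path == "/v1/completions":
		return "openai-completions"
	case method == http.MethodGet && path == "/v1/models":
		return "openai-models-list"
	case method == http.MethodGet && path == "/v1beta/models":
		return "gemini-models-list"
	}
	if id, ok := strings.CutPrefix(path, "/v1/models/"); ok {
		if method == http.MethodGet && validModelID(id) {
			return "openai-model-get"
		}
		return ""
	}
	if rest, ok := strings.CutPrefix(path, "/v1beta/models/"); ok {
		// A colon separates the model name from an action verb.
		name, action, hasAction := strings.Cut(rest, ":")
		switch {
		case !validModelID(name):
			return ""
		case !hasAction && method == http.MethodGet:
			return "gemini-model-get"
		case hasAction && action == "countTokens" && method == http.MethodPost:
			return "gemini-count-tokens"
		}
	}
	return ""
}

func main() {
	cases := [][2]string{
		{"GET", "/v1beta/models/foo"},
		{"GET", "/v1beta/models/foo:bar"},
		{"POST", "/v1beta/models/foo:countTokens"},
		{"POST", "/v1/models"},
	}
	for _, c := range cases {
		fmt.Printf("%s %s -> %q\n", c[0], c[1], classify(c[0], c[1]))
	}
}
```

Matching on method and path together, and treating any unknown action verb as a miss, is what keeps `GET /v1beta/models/foo:bar` from being read as a model name.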

- Move closure vars (seenAuth, seenGoogKey, upstream) inside the TestServerPassthroughStripsAuthHeaders sub-test so each case is fully isolated and the test is safe under future t.Parallel()
- Use nil body on GET sub-cases (was bytes.NewBufferString("{}")) to match the convention used elsewhere in the file
- Add POST /v1beta/models to TestServerMethodConfusion: covers the prefix-route regression target alongside the existing PUT case

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

harshaneel and others added 3 commits May 18, 2026 10:41

harshaneel merged commit 0a5de1e into main May 18, 2026
3 checks passed

harshaneel merged 3 commits into main from hg/sdk-feature-pr1-2026-05

1 participant