Releases: lordbasilaiassistant-sudo/keylessai
v0.4.0 — tool calling end-to-end
Tool calling
The tools field is no longer silently dropped. tool_calls round-trips end to end on the public Worker (https://keylessai.thryx.workers.dev/v1) and the local proxy.
Added
- OpenAI tool calling: `POST /v1/chat/completions` with a `tools` array now returns `message.tool_calls` (non-stream) or `delta.tool_calls` SSE deltas (stream). `tool_choice`, `parallel_tool_calls`, and `role: "tool"` reply messages are all forwarded through the pipeline. `finish_reason` correctly reports `"tool_calls"` when the model emits a call.
- Provider capability flags: every provider exports `capabilities = { tools: bool }`. Pollinations + ApiAirforce both `true`; Pollinations-GET + Yqcloud `false`. Custom providers registered via `registerProvider()` default to `false` for safety.
- Tool-aware failover: when a request includes `tools`, the router filters `FAILOVER_ORDER` to providers that advertise `capabilities.tools`. If none qualify, throws `ToolsUnsupportedError` (mapped to a 400 with `code: "tool_calls_unsupported"`) instead of silently degrading to a non-tool provider.
- `providerSupportsTools(id)` + `ToolsUnsupportedError` exported from the package surface.
- Tool schema validation (`src/server/validate.js`): `tools.length` ≤ 128, `function.name` ≤ 64 chars + charset `[a-zA-Z0-9_-]+`, `tool_choice` shape (`"auto" | "none" | "required" | {type, function}`), `parallel_tool_calls` boolean.
- `examples/tool-calling.js`: runnable two-turn round-trip with the OpenAI Node SDK.
- 27 new tests: 9 happy-path (`test/tools.test.mjs`) + 18 adversarial (`test/tools.extreme.test.mjs`), covering char-by-char streaming, parallel tool calls, prototype-pollution payloads, mid-stream errors, cache poisoning attempts, all-providers-circuit-open. 127/127 passing.
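The tool-aware failover rule above can be sketched roughly as follows. This is an illustration under stated assumptions, not the package's router: the `providers` registry, its ids, the order of `FAILOVER_ORDER`, and the `selectProviders` helper are hypothetical. Only the names `FAILOVER_ORDER`, `capabilities.tools`, `providerSupportsTools`, `ToolsUnsupportedError`, and the `tool_calls_unsupported` code come from the notes.

```javascript
// Hypothetical provider registry; ids and shape are assumptions for illustration.
const providers = {
  pollinations: { capabilities: { tools: true } },
  airforce: { capabilities: { tools: true } },
  "pollinations-get": { capabilities: { tools: false } },
  yqcloud: { capabilities: { tools: false } },
};

// Assumed ordering; the real FAILOVER_ORDER is not shown in the release notes.
const FAILOVER_ORDER = ["pollinations-get", "pollinations", "yqcloud", "airforce"];

class ToolsUnsupportedError extends Error {
  constructor() {
    super("No available provider supports tool calling");
    this.name = "ToolsUnsupportedError";
    this.status = 400; // mapped to an HTTP 400 by the server layer
    this.code = "tool_calls_unsupported";
  }
}

// A missing capability flag counts as false, mirroring the "default to false" rule.
function providerSupportsTools(id) {
  return providers[id]?.capabilities?.tools === true;
}

function selectProviders(body, order = FAILOVER_ORDER) {
  // Assumption: an empty `tools` array is treated like a request without tools.
  const wantsTools = Array.isArray(body.tools) && body.tools.length > 0;
  if (!wantsTools) return [...order];

  const eligible = order.filter(providerSupportsTools);
  if (eligible.length === 0) throw new ToolsUnsupportedError();
  return eligible;
}
```

Filtering before dispatch, rather than catching a failure afterwards, is what keeps a tool request from landing on a provider that would quietly ignore the `tools` field.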
Changed
- Cache bypass for tool-bearing requests: proxy + Worker skip `defaultCache` entirely when `body.tools` is present. Tool-call payloads are inherently non-idempotent (each `call_id` participates in a turn-by-turn round trip with the client).
- Worker version bumped to 0.4.0, visible at `GET /health`.
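A minimal sketch of the cache-bypass rule, assuming a plain `Map` in place of the real `defaultCache` and a hypothetical synchronous `callUpstream` stand-in for the provider call (the actual proxy is asynchronous):

```javascript
// Stand-in for the proxy cache; the real defaultCache is an LRU with a TTL.
const cache = new Map();

// Any request that carries a `tools` field skips the cache entirely.
function shouldBypassCache(body) {
  return body != null && body.tools !== undefined;
}

// `callUpstream` is a hypothetical synchronous stand-in for the provider call.
function handle(body, callUpstream) {
  if (shouldBypassCache(body)) return callUpstream(body);

  const key = JSON.stringify(body);
  if (cache.has(key)) return cache.get(key);

  const reply = callUpstream(body);
  cache.set(key, reply);
  return reply;
}
```

With this shape, two identical plain requests reach the upstream once, while two identical tool-bearing requests reach it twice.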
Fixed
- Pollinations + ApiAirforce streamers previously parsed `delta.tool_calls` from the upstream SSE but discarded it. They now emit `{type: "tool_call_delta", index, id?, name?, arguments?}` chunks that propagate through the router untouched.
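For consumers of the raw chunk stream, the `tool_call_delta` chunks can be folded back into complete calls along these lines. The `assembleToolCalls` helper is hypothetical, and the sketch assumes that `id` and `name` arrive whole while `arguments` arrives as string fragments to be concatenated, as in OpenAI-style streaming.

```javascript
// Fold {type: "tool_call_delta", index, id?, name?, arguments?} chunks
// into OpenAI-shaped tool_calls entries, grouped by `index`.
function assembleToolCalls(chunks) {
  const byIndex = new Map();

  for (const chunk of chunks) {
    if (chunk.type !== "tool_call_delta") continue; // ignore text and other chunk types

    let call = byIndex.get(chunk.index);
    if (!call) {
      call = { id: undefined, type: "function", function: { name: "", arguments: "" } };
      byIndex.set(chunk.index, call);
    }

    if (chunk.id) call.id = chunk.id;
    if (chunk.name) call.function.name = chunk.name; // assumed to arrive whole
    if (chunk.arguments) call.function.arguments += chunk.arguments; // JSON text fragments
  }

  // Return calls ordered by their stream index.
  return [...byIndex.entries()].sort((a, b) => a[0] - b[0]).map(([, call]) => call);
}
```

The accumulated `arguments` string should only be passed to `JSON.parse` once the stream has finished, since intermediate fragments are not valid JSON on their own.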
Drop-in usage
import OpenAI from "openai";
const client = new OpenAI({
baseURL: "https://keylessai.thryx.workers.dev/v1",
apiKey: "not-needed",
});
const res = await client.chat.completions.create({
model: "openai-fast",
messages: [{ role: "user", content: "What's the weather in NYC?" }],
tools: [{ type: "function", function: { name: "get_weather", parameters: { type: "object", properties: { city: { type: "string" } } } } }],
});
console.log(res.choices[0].message.tool_calls);

Full diff: v0.3.0...v0.4.0
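The drop-in usage for v0.4.0 covers only the first turn of a tool-calling exchange. A possible second turn, sketched under assumptions: `localTools` and `buildSecondTurn` are hypothetical helpers, and the weather data is a placeholder. The `role: "tool"` and `tool_call_id` fields follow the OpenAI chat message format.

```javascript
// Hypothetical local tool implementations keyed by function name.
const localTools = {
  get_weather: ({ city }) => ({ city, forecast: "sunny" }), // placeholder data
};

// Run each requested tool and build the message list for the follow-up request.
function buildSecondTurn(messages, assistantMessage, tools = localTools) {
  const replies = (assistantMessage.tool_calls ?? []).map((call) => {
    let result;
    try {
      const fn = tools[call.function.name];
      if (!fn) throw new Error(`unknown tool: ${call.function.name}`);
      result = fn(JSON.parse(call.function.arguments || "{}"));
    } catch (err) {
      // Report failures back to the model instead of aborting the turn.
      result = { error: String(err.message) };
    }
    return { role: "tool", tool_call_id: call.id, content: JSON.stringify(result) };
  });

  // The assistant message carrying tool_calls must precede its tool replies.
  return [...messages, assistantMessage, ...replies];
}
```

The returned array would then be passed as `messages` to a second `client.chat.completions.create` call with the same `tools` definition, so the model can produce its final answer.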
v0.2.1 — dogfood + reliability
Reliability + dogfood pass. No breaking changes. Upgrade is safe for anyone on 0.2.0.
Added
- Dogfood test harness (`dogfood/`): runnable end-to-end tests that actually exercise real third-party SDKs (OpenAI Node, OpenAI Python, LangChain) + raw curl + `/v1/completions` + `/v1/embeddings`. Nightly CI workflow runs them and uploads 30-day transcript artifacts.
- Stream watchdog (`src/core/stream.js`): heartbeat (45s) + overall deadline (180s) + fetch-level timeout (30s). Providers' `streamChat` now uses `readWithWatchdog()` so upstream silent-hangs trigger failover instead of hanging forever.
- Circuit breaker (`src/core/circuit.js`): 5 consecutive failures opens the circuit per provider, 30s cooldown, then half-open. Router skips open providers instantly.
- Per-provider latency metrics (`src/core/metrics.js`): rolling p50/p95 TTFB + success rate over the last 100 samples per provider. Exposed on `/health`.
- Graceful shutdown: `server.drain(graceMs)` waits for in-flight requests to finish before exiting; CLI SIGINT/SIGTERM uses it.
- Request body cap: 1 MiB limit on POST bodies with a clean 413 response. Prevents OOM from hostile/broken clients.
- `/v1/completions` legacy endpoint (wraps `prompt` as a chat message).
- `/v1/embeddings` 501 stub with a helpful error pointing users at self-hosted options.
- CLI `doctor` subcommand: provider health + latency + model list + end-to-end smoke test.
- CHANGELOG.md and SECURITY.md.
- CONTRIBUTING.md, issue templates, PR template.
- CSP, Referrer-Policy, X-Content-Type-Options, Permissions-Policy meta tags.
- `:focus-visible` ring, ARIA labels, mobile 44px tap targets, `prefers-reduced-motion` support.
- Aggregator pool stats strip on hero (honest live-verified vs upstream-tracked counts).
- Chat history persistence across page reloads (localStorage, capped at 50 turns).
- Markdown rendering in chat with safe code-block copy + XSS-tested link-scheme filter.
- Retry + switch-provider actions on error bubbles; copy + regenerate actions on assistant messages.
- Thinking indicator while model emits reasoning tokens before real content.
- Auto-deploy to GitHub Pages via Actions.
- Daily provider catalog sync from `Free-AI-Things/g4f-working` (183 tracked models across 13 upstream providers).
- Test suite grew from 0 → 60 tests across 8 modules, gated in CI.
- JSDoc on all public exports.
- `.gitattributes`, `.nvmrc`, `.vscode/settings.json` for contributor consistency.
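The circuit-breaker behaviour listed above (5 consecutive failures, 30s cooldown, then half-open) can be sketched as a small state machine. This is an illustrative sketch, not the contents of `src/core/circuit.js`: the class name, method names, and injectable `now` clock are assumptions.

```javascript
class CircuitBreaker {
  constructor({ threshold = 5, cooldownMs = 30_000, now = Date.now } = {}) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
    this.now = now; // injectable clock, convenient for tests
    this.failures = 0; // consecutive failures since the last success
    this.openedAt = null; // timestamp at which the circuit last opened
  }

  state() {
    if (this.openedAt === null) return "closed";
    return this.now() - this.openedAt >= this.cooldownMs ? "half-open" : "open";
  }

  // The router would skip this provider while the circuit is open.
  canRequest() {
    return this.state() !== "open";
  }

  recordSuccess() {
    this.failures = 0;
    this.openedAt = null;
  }

  recordFailure() {
    if (this.state() === "half-open") {
      this.openedAt = this.now(); // failed probe: reopen and restart the cooldown
      return;
    }
    this.failures += 1;
    if (this.failures >= this.threshold) this.openedAt = this.now();
  }
}
```

One breaker instance per provider gives the per-provider isolation the notes describe: a failing upstream is skipped without affecting the others.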
Changed
- Organized into `src/{ui,core,server}`: clean browser / shared / Node-only split.
- Split 518-line `app.js` into focused modules: `storage`, `suggestions`, `messages`, `pool-stats`.
- Router auto mode is now fast-fail: no health-check round-trip, no same-provider retry, instant failover. Pinned mode keeps modest retry budgets.
- Cache key now includes `temperature`, `top_p`, `tools`, `response_format`. This fixes a silent correctness bug where high-temperature callers got deterministic cached replies.
- Extracted notice detection to `src/core/notices.js` with 9 tests covering real-world samples.
- Extracted drawer endpoint data to `src/ui/drawer-endpoints.js` for easier editing.
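To illustrate the cache-key change, a key that covers the sampling parameters might be derived as below. The `canonical` and `cacheKey` helpers are hypothetical; the real implementation may hash the result or include additional fields.

```javascript
// Recursively sort object keys so logically equal bodies serialize identically.
function canonical(value) {
  if (Array.isArray(value)) return value.map(canonical);
  if (value && typeof value === "object") {
    return Object.fromEntries(
      Object.keys(value).sort().map((k) => [k, canonical(value[k])])
    );
  }
  return value;
}

// Build a key from every field that can change the model's reply.
function cacheKey(body) {
  const { model, messages, temperature, top_p, tools, response_format } = body;
  // JSON.stringify omits undefined properties, so absent fields do not affect the key.
  return JSON.stringify(
    canonical({ model, messages, temperature, top_p, tools, response_format })
  );
}
```

With `temperature` in the key, a request at `temperature: 1.2` no longer collides with a cached reply produced at `temperature: 0`.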
Fixed
- CSS specificity: inline-code styling was leaking into fenced code blocks.
- Flaky CI queue test (`.unref()` let event loop exit before timer fired).
- Send button was `disabled` during streaming, so clicking it couldn't abort (only Enter worked).
v0.2.0 — multi-provider aggregator + production proxy
Second shipload. Focus: real multi-provider aggregation, production-ready
proxy, and repo maturity for collaborators and forks.
Added
- Second real provider: ApiAirforce (`providers/airforce.js`) with 8 free-tier models (`grok-4.1-mini:free`, `step-3.5-flash:free`, etc.), inline `<think>` stripping, CORS-open
- Proxy prompt cache (`src/core/cache.js`): LRU + 5-min TTL, 15× speedup on identical repeat calls, exposed stats on `/health`
- Client-side single-flight queue (`src/core/queue.js`): serializes parallel callers to stay under Pollinations' 1-concurrent-per-IP limit; eliminates "Queue full" 429s
- Notice / ad injection detection + retry in the router: auto-retries with exponential backoff when providers return promo URLs or deprecation notices instead of real responses
- Model name aliasing in the proxy: your code can send `gpt-4o`, `gpt-4o-mini`, `claude-3-5-sonnet-latest`, etc. and the proxy transparently routes to the current anonymous-tier model
- CLI `doctor` subcommand: provider health checks with latency, live model lists, slot gate + cache stats, end-to-end smoke test
- Auto-deploy to GitHub Pages via Actions (ships in ~30s)
- Daily provider catalog sync from Free-AI-Things/g4f-working: auto-commits an updated `providers/_catalog.json` when upstream changes
- Test suite (32 tests): markdown renderer (XSS safety, code blocks, safe link schemes), LRU cache (TTL, eviction, stats), slot gate (serialization, timeouts, overflow), storage (localStorage round-tripping, corruption recovery)
- CI gate on every push and PR (`.github/workflows/test.yml`)
- Aggregator stats strip on the hero: honest live-verified counts vs upstream-tracked counts
- Chat persistence across page reloads (localStorage, capped at 50 turns)
- Retry + switch-provider actions on error bubbles
- Copy + regenerate actions on assistant messages
- Markdown rendering in chat (safe, XSS-tested, code blocks with copy button, lists, headings)
- Suggestion chips on empty state
- CSP, Referrer-Policy, X-Content-Type-Options, Permissions-Policy meta tags
- `:focus-visible` ring, ARIA labels, mobile 44px tap targets, `prefers-reduced-motion` support
- CONTRIBUTING.md, SECURITY.md, issue + PR templates
- JSDoc on all public exports (`src/index.js`, `core/router.js`, `core/cache.js`, `core/queue.js`)
- `.gitattributes` normalizing line endings (no more CRLF warnings)
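The proxy prompt cache in this list combines LRU eviction with a 5-minute TTL. A minimal sketch of that combination, relying on the insertion order of a JavaScript `Map`, might look as follows. The class name, option names, default capacity of 100 entries, and `stats()` shape are assumptions, not the API of `src/core/cache.js`.

```javascript
class LruTtlCache {
  constructor({ maxEntries = 100, ttlMs = 5 * 60_000, now = Date.now } = {}) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock, convenient for tests
    this.map = new Map(); // Map iterates in insertion order: first key = least recently used
    this.hits = 0;
    this.misses = 0;
  }

  get(key) {
    const entry = this.map.get(key);
    if (!entry || this.now() >= entry.expiresAt) {
      if (entry) this.map.delete(key); // drop the expired entry
      this.misses += 1;
      return undefined;
    }
    // Re-insert so the key becomes the most recently used.
    this.map.delete(key);
    this.map.set(key, entry);
    this.hits += 1;
    return entry.value;
  }

  set(key, value) {
    this.map.delete(key);
    this.map.set(key, { value, expiresAt: this.now() + this.ttlMs });
    if (this.map.size > this.maxEntries) {
      // Evict the least recently used entry (the first key in iteration order).
      this.map.delete(this.map.keys().next().value);
    }
  }

  stats() {
    return { size: this.map.size, hits: this.hits, misses: this.misses };
  }
}
```

A `stats()` method of this kind is the sort of data a `/health` endpoint could expose to make the cache hit rate observable.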
Changed
- Reorganized into `src/{ui,core,server}`: browser-only / shared-runtime / Node-only are now separate concerns
- Split the 518-line `app.js` god file into focused modules: storage, suggestions, messages, pool-stats
- Hero copy rewritten to lead with direct URL swap path (no local compute required)
- README reframed around the aggregation + spam-filtering value over raw Pollinations
- Router serializes through the slot gate, detects spam, and auto-fails over across the pool
Removed
- WebLLM / Ollama / LM Studio providers: they required local compute on the user's machine, which violates the "zero user compute" thesis. Purged from every public asset (README, AGENTS.md, llms.txt, llms-full.txt, index.html meta + JSON-LD, og-image.svg, system prompt, repo description, topics)
Fixed
- CSS specificity leak: inline-code styling was bleeding into fenced code blocks
- Flaky queue timeout test on CI (`.unref()` allowed event loop to exit before timer fired)
- Pollinations deprecation notice was sometimes served instead of real response; now detected and retried
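The detect-and-retry fix for notices can be sketched as below. Everything here is an assumption for illustration: the patterns in `NOTICE_PATTERNS` are placeholders (the real rules are not shown in these notes), as are the length cutoff, the backoff constants, and the `completeWithRetry` helper with its `callProvider` argument.

```javascript
// Placeholder patterns; the project's actual detection rules are not listed in the notes.
const NOTICE_PATTERNS = [/deprecat/i, /\bsponsor(ed)?\b/i];

// Heuristic assumption: notices are short and match at least one pattern.
function looksLikeNotice(text) {
  const trimmed = text.trim();
  return trimmed.length < 400 && NOTICE_PATTERNS.some((re) => re.test(trimmed));
}

// Exponential backoff: baseMs, 2x, 4x, ... capped at maxMs.
function backoffDelay(attempt, baseMs = 250, maxMs = 4000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// `callProvider` is a hypothetical async function returning the reply text.
async function completeWithRetry(callProvider, { maxAttempts = 3, sleep = defaultSleep } = {}) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const text = await callProvider();
    if (!looksLikeNotice(text)) return text;
    // Wait before the next attempt, but not after the final one.
    if (attempt < maxAttempts - 1) await sleep(backoffDelay(attempt));
  }
  throw new Error("provider kept returning notices instead of a real response");
}
```

Throwing after the final attempt gives the caller a clear signal to fail over to another provider rather than surface the notice text as a model reply.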