An OpenAI-compatible API endpoint for the Apple Foundation Models CLI (fm).
macOS 27.0 ships a beta fm CLI to run Apple Foundation Models
on-device (system) and on Apple's Private Cloud Compute (pcc). It includes a Chat Completions local server (fm serve), but tool-calling schema support is limited and token-usage reporting is broken.
fm-proxy sits in front of fm serve to give you an OpenAI-compatible Chat Completions endpoint. Point any OpenAI client at the local URL and use Apple's models with no code changes.
⚠️ Beta. This is built on macOS 27.0 beta so expect breaking changes with system updates.
- Chat completions: streaming and non-streaming.
- Fixed tool / function calling: accepts standard OpenAI `tools`, including rich nested schemas, and flattens them to the subset `fm serve` accepts (nested parameters are losslessly round-tripped as JSON).
- Fixed token counts: `fm serve` reports `prompt_tokens` as `0`; the proxy fills in real token counts so context window gauges work.
- Added vision support: standard `image_url` content parts with base64 data URLs.
- Enabled CORS: browser-based clients can connect directly.
- OpenAI-shaped errors: failures come back typed (`rate_limit_exceeded`, `service_unavailable`) so clients can branch on the cause. A mid-stream safety filter abort ends the completion as `finish_reason: "content_filter"` with any partial output preserved (no exception thrown).
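The flattening idea in the tool-calling bullet can be pictured with a small sketch. This is not the proxy's actual code (AGENTS.md covers the real schema flattening). It assumes, purely for illustration, that each nested object or array parameter is replaced by a string parameter carrying JSON, and decoded again when the model's arguments come back. The helper names `flatten_parameters` and `restore_arguments` are hypothetical.

```python
import json

SCALAR_TYPES = {"string", "number", "integer", "boolean"}


def flatten_parameters(schema):
    """Return (flat_schema, nested_keys) for an OpenAI-style `parameters` object.

    Hypothetical helper: non-scalar properties become JSON-carrying strings.
    """
    flat_props = {}
    nested_keys = []
    for name, prop in schema.get("properties", {}).items():
        if prop.get("type") in SCALAR_TYPES:
            flat_props[name] = prop
        else:
            nested_keys.append(name)
            flat_props[name] = {
                "type": "string",
                "description": "JSON-encoded value matching: "
                + json.dumps(prop, sort_keys=True),
            }
    flat = {"type": "object", "properties": flat_props}
    if "required" in schema:
        flat["required"] = list(schema["required"])
    return flat, nested_keys


def restore_arguments(arguments, nested_keys):
    """Decode the JSON-carrying strings back into nested values."""
    restored = dict(arguments)
    for key in nested_keys:
        if isinstance(restored.get(key), str):
            restored[key] = json.loads(restored[key])
    return restored


# Illustrative tool schema with one nested parameter.
schema = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "filters": {
            "type": "object",
            "properties": {"tags": {"type": "array", "items": {"type": "string"}}},
        },
    },
    "required": ["city"],
}
flat_schema, nested = flatten_parameters(schema)

# Arguments as a model might return them against the flat schema.
model_args = {"city": "Oslo", "filters": '{"tags": ["museum", "free"]}'}
restored = restore_arguments(model_args, nested)
```

The round trip is lossless only as long as the model emits valid JSON in the string field; a real implementation would need to handle the case where it does not.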
Includes the native GET /v1/models and GET /health endpoints as straight passthroughs to fm serve.
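For the vision support listed above, a minimal sketch of building an `image_url` content part with a base64 data URL, in the standard OpenAI message shape. `image_part` is a hypothetical helper, and the bytes below are a placeholder rather than a real image; which image formats the models accept is not covered here.

```python
import base64


def image_part(image_bytes, mime_type="image/png"):
    """Build an OpenAI-style image_url content part from raw bytes."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }


# Placeholder bytes for illustration; in practice read them from a file,
# e.g. open("photo.png", "rb").read().
fake_png = b"\x89PNG\r\n\x1a\n"

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            image_part(fake_png),
        ],
    }
]
```

The resulting `messages` list can be passed to `client.chat.completions.create(...)` like any text-only conversation.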
- macOS 27.0 (ships with the `fm` CLI baked in).
- Signed in with your Apple Account, with Apple Intelligence enabled.
  - The `pcc` model runs on Private Cloud Compute and requires you to be signed in.
  - The `system` model is available locally.
- Node.js (v18+). The proxy uses only Node's standard library, so there is no `npm install` step.
Starts the proxy, then runs `fm serve` in the foreground:

    ./fm-launch.sh

When it prints `stack up — OpenAI base URL: http://127.0.0.1:1977/v1`, you're good to go.
`fm serve` must run in the foreground. macOS only grants PCC (Private Cloud Compute) attribution to a foreground, TTY-attached `fm serve`. Backgrounding it, whether under node or with a shell `&`, makes every `pcc` request fail with `"not available in this context"` (HTTP 503), while `system` keeps working.

The launcher runs `fm serve` in the foreground (blocking the terminal it was launched in) and the proxy as a backgrounded child. Use Ctrl-C to stop (it reaps the proxy); don't use Ctrl-Z, because a suspended `fm serve` won't be cleaned up and will strand the port.
Point any OpenAI client at:
- Base URL: `http://127.0.0.1:1977/v1`
- API key: required but ignored, so use any dummy key (e.g. `sk-7777777`)
- Models: `system` (on-device) or `pcc` (Private Cloud Compute)
For example, with curl:

    curl http://127.0.0.1:1977/v1/chat/completions \
      -H "Authorization: Bearer sk-local" \
      -H "Content-Type: application/json" \
      -d '{"model":"pcc","messages":[{"role":"user","content":"Say hello in one word."}]}'

The same request with the OpenAI Python client:

    from openai import OpenAI

    client = OpenAI(base_url="http://127.0.0.1:1977/v1", api_key="sk-local")
    print(client.chat.completions.create(
        model="pcc",
        messages=[{"role": "user", "content": "Say hello in one word."}],
    ).choices[0].message.content)

If you'd rather run the two processes yourself:

    /usr/bin/fm serve --port 1976   # Apple's engine (keep it in the FOREGROUND; see below)
    node fm-proxy.js                # the proxy (listens on :1977 → :1976)

Run `fm serve` in its own terminal, in the foreground. Backgrounding it (or running it under another process) loses attribution, and `pcc` will return 503 errors. The launcher (`fm-launch.sh`) handles this for you automatically; the manual form is the same thing, just split across two terminals.
Launcher options:

    ./fm-launch.sh [options]
      -v, --verbose           show the proxy's per-request [assembled] telemetry
                              (errors/warnings are always shown, even without this)
      --fm-port <n>           fm serve port (default 1976)
      --proxy-port <n>        proxy port clients use (default 1977)
      --fm-bin <path>         fm binary (default /usr/bin/fm)
      --health-timeout <ms>   how long to wait for fm serve (default 20000)
      -h, --help

The `FM_PORT` and `PROXY_PORT` environment variables are also accepted as alternatives to `--fm-port` / `--proxy-port`.
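If the launcher is started with a non-default `PROXY_PORT`, a client running in the same environment can derive its base URL from the same variable. The sketch below assumes the client shares the launcher's environment; `proxy_base_url` is a hypothetical helper, not part of this repo.

```python
import os


def proxy_base_url(env=None):
    """Build the client base URL, honoring PROXY_PORT if it is set."""
    env = os.environ if env is None else env
    port = int(env.get("PROXY_PORT", "1977"))  # 1977 is the documented default
    return f"http://127.0.0.1:{port}/v1"


default_url = proxy_base_url({})
custom_url = proxy_base_url({"PROXY_PORT": "8080"})
```

Calling `proxy_base_url()` with no argument reads the real process environment, and the result can be passed as `base_url` to the OpenAI client.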
Run the tests with:

    node --test

Consider this an experimental beta that is not deeply tested.

I've seen distinct mid-stream failure modes on `pcc`:
- The model emits valid output, then `fm serve` interrupts with a safety-guardrail abort (surfaces as `finish_reason: "content_filter"`).
- Random transient rate-limiting (`fm-proxy` has built-in retries, and surfaces `rate_limit_exceeded`).
- When PCC attribution is missing (e.g. `fm serve` got backgrounded), a `service_unavailable` 503.
Exactly what triggers each is unverified. Apple's error messaging is generic, and these are what I've been able to deduce after testing. The proxy classifies the errors so clients can react appropriately instead of guessing or erroring out.
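On the client side, branching on these classified outcomes might look like the sketch below. It assumes OpenAI-style response bodies and, since the exact field is not documented here, checks both `type` and `code` for the error label. `classify_outcome`, the returned action names, and the sample bodies are all hypothetical.

```python
def classify_outcome(body):
    """Map one parsed response body to a suggested client reaction.

    Assumed shapes (OpenAI-style, not confirmed against fm-proxy):
      failure: {"error": {"type": ..., "code": ..., "message": ...}}
      success: {"choices": [{"finish_reason": ..., "message": {...}}]}
    """
    error = body.get("error")
    if error:
        # The label could sit in `type` or `code`; check both (assumption).
        labels = {error.get("type"), error.get("code")}
        if "rate_limit_exceeded" in labels:
            return "back_off_and_retry"
        if "service_unavailable" in labels:
            return "check_fm_serve_is_foreground"
        return "report_error"
    finish_reason = body["choices"][0].get("finish_reason")
    if finish_reason == "content_filter":
        # Partial output is preserved, so keep it rather than discarding.
        return "keep_partial_output"
    return "ok"


# Illustrative bodies only; real messages from the proxy will differ.
samples = {
    "rate": {"error": {"type": "rate_limit_exceeded", "message": "slow down"}},
    "pcc": {"error": {"code": "service_unavailable", "message": "not available"}},
    "filtered": {"choices": [{"finish_reason": "content_filter",
                              "message": {"role": "assistant", "content": "Partial"}}]},
    "fine": {"choices": [{"finish_reason": "stop",
                          "message": {"role": "assistant", "content": "Hello"}}]},
}
outcomes = {name: classify_outcome(body) for name, body in samples.items()}
```

Because the proxy already retries transient rate limits, a client that sees `rate_limit_exceeded` would probably want a longer pause before trying again.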
Because `fm serve` is part of the macOS 27.0 beta, its request/response behavior, schema support, and error semantics may change between builds, which can change how this proxy works. Expect to update the proxy as the betas evolve.
Known limits: nested structured output is approximated rather than strictly enforced; n > 1 isn't supported; sampling parameters are passed through as-is.
See AGENTS.md for the deeper technical notes (schema flattening,
token accounting, the PCC context ceiling, and the structured-output situation).
| Path | What |
|---|---|
| `fm-proxy.js` | The proxy (the app). |
| `fm-launch.sh` | One-command launcher: runs `fm serve` in the foreground (required for PCC) plus the proxy. |
| `fm-proxy.test.js` | Unit + integration tests. |
| `AGENTS.md` | Deep technical notes / runbook. |
| `docs/` | `fm` CLI reference and PCC findings. |
| `tools/` | Dev utilities: `gen-fm-docs.py` regenerates `docs/fm-reference.md` from the installed binary. |
MIT.