feat: self-contained API keys (read from gbrain's own config, not just env) #121
Open
vinsew wants to merge 1 commit into garrytan:master
Today embedding.ts and expansion.ts call `new OpenAI()` / `new Anthropic()`
with no arguments, which makes the SDKs read OPENAI_API_KEY /
ANTHROPIC_API_KEY from the process's env. That puts the burden on every
caller — shells, cron jobs, agent subprocesses, daemons — to propagate
those env vars correctly. When a caller's env doesn't have them (e.g.
launchd-spawned daemons, agent terminal tools with sanitized env), the
caller silently gets empty results from `gbrain query` / `gbrain embed`
because the SDK falls back to anonymous API calls that fail.
GBrain already has `openai_api_key` and `anthropic_api_key` fields in
its GBrainConfig schema (src/core/config.ts) and stores them in
~/.gbrain/config.json, but none of the runtime code actually reads
those fields — the config is populated but never consulted. This PR
connects that wiring so gbrain becomes self-contained: callers just
run `gbrain ...` and gbrain finds its own keys.
Changes:
- config.ts: merge ANTHROPIC_API_KEY from env into loaded config
(was silently dropped — only OPENAI_API_KEY was being merged)
- embedding.ts: read openai_api_key from loadConfig() and pass to
`new OpenAI({ apiKey })`. Falls back to SDK's env-default behavior
when config has no key (preserves current behavior for users who
rely on env vars).
- expansion.ts: same pattern for Anthropic.
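The fallback described for embedding.ts and expansion.ts might look roughly like the sketch below. It is not the PR's actual code: `KeyConfig` and `clientOptions` are hypothetical names, and the SDK constructors are only mentioned in comments so the snippet runs without either SDK installed.

```typescript
// Hypothetical subset of GBrainConfig; the PR only confirms these two fields exist.
interface KeyConfig {
  openai_api_key?: string;
  anthropic_api_key?: string;
}

// Build constructor options for an SDK client. An empty object means
// "let the SDK apply its own env-var default", which keeps the pre-PR
// behavior for users who rely on env vars.
function clientOptions(key: string | undefined): { apiKey?: string } {
  return key ? { apiKey: key } : {};
}

// Assumed call sites (not executed here):
//   embedding.ts: new OpenAI(clientOptions(cfg.openai_api_key))
//   expansion.ts: new Anthropic(clientOptions(cfg.anthropic_api_key))
const withKey: KeyConfig = { openai_api_key: "sk-example" };
const withoutKey: KeyConfig = {};

console.log(JSON.stringify(clientOptions(withKey.openai_api_key)));    // {"apiKey":"sk-example"}
console.log(JSON.stringify(clientOptions(withoutKey.openai_api_key))); // {}
```

Returning an empty options object rather than `{ apiKey: undefined }` is a deliberate choice in this sketch: it leaves the decision about env defaults entirely to the SDK.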
Usage for callers:
# One-time setup: put the keys in gbrain's own config file by adding
# these fields to the JSON object in ~/.gbrain/config.json
{"openai_api_key": "sk-...", "anthropic_api_key": "sk-ant-..."}
$ chmod 600 ~/.gbrain/config.json
# Then from ANY caller, no env vars needed:
$ gbrain query "..." # just works
$ gbrain embed --stale # just works
This is especially valuable for:
- Cron jobs run under launchd/systemd (which don't inherit shell env)
- Agent terminal tools with env sanitization
- Subprocess calls from Python/Node agents without env passthrough
- Docker containers without explicit env forwarding
Precedence (preserved from existing code):
env var > config file
So users who want to override per-process still can.
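As an illustration of that precedence, here is a minimal sketch of an env-over-file merge. `mergeKeys`, `KeyConfig`, and `Env` are hypothetical names, and the real `loadConfig()` may be structured differently; the sketch only assumes that a set, non-empty env var replaces the value read from the file.

```typescript
// Hypothetical subset of GBrainConfig holding only the two key fields.
interface KeyConfig {
  openai_api_key?: string;
  anthropic_api_key?: string;
}

type Env = Record<string, string | undefined>;

// Merge env over file: a set, non-empty env var wins; otherwise the
// value loaded from the config file is kept unchanged.
function mergeKeys(file: KeyConfig, env: Env): KeyConfig {
  const merged: KeyConfig = { ...file };
  if (env.OPENAI_API_KEY) merged.openai_api_key = env.OPENAI_API_KEY;
  if (env.ANTHROPIC_API_KEY) merged.anthropic_api_key = env.ANTHROPIC_API_KEY;
  return merged;
}

const fromFile: KeyConfig = {
  openai_api_key: "sk-file",
  anthropic_api_key: "sk-ant-file",
};

// Only ANTHROPIC_API_KEY is set in this simulated env, so the OpenAI
// key comes from the file and the Anthropic key comes from the env.
const result = mergeKeys(fromFile, { ANTHROPIC_API_KEY: "sk-ant-env" });
console.log(JSON.stringify(result));
```

Passing the env in as a parameter, instead of reading `process.env` inside the function, keeps the merge easy to test with simulated environments.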
Impact:
- 4 files changed, +67 / -5 lines
- Zero behavior change for users who already have env vars set
- Callers without env vars in their subprocess context now work IF
the keys are written to ~/.gbrain/config.json
Tests:
- 4 new tests in test/config.test.ts cover: OPENAI env merge,
ANTHROPIC env merge (regression — was missing), both together,
and absence-when-neither.
- All 12 config tests pass; no pre-existing regressions.
98cd225 to 83d3851
vinsew added a commit to vinsew/gbrain that referenced this pull request Apr 14, 2026
GBrain stores internal cross-page references in slug form (e.g. `[Alice](./alice)`) because the slug is the canonical identifier in the DB. That works inside GBrain's own resolution layer. But when those pages are exported as `.md` files on disk and opened in standard markdown viewers (Obsidian, VS Code preview, GitHub web view, typical mkdocs/jekyll renderers), the viewers look for a literal file at `./alice`, which doesn't exist. The actual file is `./alice.md`.
Result: every internal link in an exported brain is silently broken on disk. The user clicks `[小龙]` in `龙虾群.md`, sees a 404 / empty page, and cannot navigate the brain outside of GBrain itself. This defeats half the value of having the brain stored as portable markdown.
Fix:
Add `normalizeInternalLinks(content)` that runs over each page's serialized markdown right before `writeFileSync` and rewrites slug-form internal links to filename-form by appending `.md`:
  [Alice](./alice) -> [Alice](./alice.md)
  [Alice](alice) -> [Alice](alice.md)
  [Alice](../people/alice) -> [Alice](../people/alice.md)
  [小龙](../people/小龙) -> [小龙](../people/小龙.md)
Conservative: leaves untouched anything that looks external or already extended:
- URL schemes (http:, https:, mailto:, ftp:, file:, tel:, ...): skip
- Anchors (#section): skip
- Empty targets: skip
- Trailing slash (directory references): skip
- Already has any extension (.md, .png, .pdf, .MD, ...): skip
- Preserves query strings and anchors when appending:
  [Section](./alice#bio) -> [Section](./alice.md#bio)
  [Search](./alice?q=t) -> [Search](./alice.md?q=t)
The DB content stays slug-form (GBrain's internal convention is unchanged). Only the on-disk export gets the `.md` annotation, so the exported markdown is viewable as-is by any standard renderer.
Real-world reproduction this fix addresses:
$ gbrain put 龙虾群 < <(echo '[小龙](./小龙)')
$ gbrain export --dir /tmp/out
$ cat /tmp/out/龙虾群.md
# before this PR: contains [小龙](./小龙), clicking 404s
# after this PR: contains [小龙](./小龙.md), clicking opens the file
Impact:
- 2 files changed, +149 / -1 lines (1 line of helper invocation + ~40 lines of helper + comment + 26 tests)
- Zero behavior change for external URLs, anchors, or already-extended links
- DB content unchanged; only the on-disk export representation gains the `.md` annotation
- Existing exports remain valid (re-running export on an already-exported brain is idempotent because already-extended links are skipped)
Tests:
- 26 new tests covering: same-dir slug, parent-dir slug, deep nesting, CJK slugs, multiple links per line, multi-line markdown, all 6 external schemes (http/https/mailto/file/ftp/tel), all 4 extension cases (md/png/pdf/uppercase), anchor preservation, query preservation, empty/trailing-slash/no-link edge cases.
- All 26 tests pass.
- Full suite: 612 pass / no new regressions (4 pre-existing PGLiteEngine failures are unrelated and exist on master).
Fifth in a series of practical PRs from a real Chinese-speaking deploy. Companion to:
- garrytan#114 (chunker CJK)
- garrytan#115 (slugify CJK)
- garrytan#119 (sync git quotepath CJK)
- garrytan#121 (self-contained API keys)
Same theme: GBrain is meaningfully more useful when the markdown export is a first-class deliverable, not a half-broken side-effect.
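The rewrite rules listed in the referenced commit could be implemented along the lines of the sketch below. Only the function name `normalizeInternalLinks` comes from the commit message; the regular expressions and the control flow are assumptions made for illustration, not the helper that was actually committed. The sketch deliberately ignores links whose target contains whitespace (for example, links with a title), leaving them untouched.

```typescript
// Inline markdown link: [text](target), where target has no spaces or ")".
const LINK = /\[([^\]]*)\]\(([^)\s]*)\)/g;
// URL scheme prefix such as http:, mailto:, tel:
const SCHEME = /^[a-zA-Z][a-zA-Z0-9+.-]*:/;
// A file extension at the end of the last path segment.
const EXTENSION = /\.[A-Za-z0-9]+$/;

function normalizeInternalLinks(content: string): string {
  return content.replace(LINK, (whole: string, text: string, target: string) => {
    // Skip empty targets, pure anchors, and anything with a URL scheme.
    if (target === "" || target.startsWith("#") || SCHEME.test(target)) return whole;

    // Split off a trailing ?query or #anchor so it can be re-attached after ".md".
    const cut = target.search(/[?#]/);
    const path = cut === -1 ? target : target.slice(0, cut);
    const suffix = cut === -1 ? "" : target.slice(cut);

    // Skip directory references and bare "." / ".." segments.
    if (path === "" || path.endsWith("/")) return whole;
    const lastSegment = path.slice(path.lastIndexOf("/") + 1);
    if (lastSegment === "." || lastSegment === "..") return whole;

    // Skip targets that already carry any extension (.md, .png, .MD, ...).
    if (EXTENSION.test(lastSegment)) return whole;

    return `[${text}](${path}.md${suffix})`;
  });
}

console.log(normalizeInternalLinks("[小龙](../people/小龙) and [Bio](./alice#bio)"));
console.log(normalizeInternalLinks("[Site](https://example.com) and [Pic](./a.png)"));
```

Because already-extended targets are skipped, running the function twice over the same text gives the same result as running it once, which matches the idempotency claim in the commit message.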
Summary
Today `src/core/embedding.ts` and `src/core/search/expansion.ts` call `new OpenAI()` / `new Anthropic()` with no arguments, which makes the SDKs read `OPENAI_API_KEY` / `ANTHROPIC_API_KEY` from the process's env. That puts the burden on every caller (shells, cron jobs, agent subprocess tools, daemons, containers) to propagate those env vars correctly.
When a caller's subprocess env doesn't have them (common for launchd-spawned daemons, agent terminal tools with env sanitization, or any subprocess without explicit env forwarding), the caller silently gets empty results from `gbrain query` / `gbrain embed` because the SDK falls back to anonymous API calls that fail with 401 (then throw, then get caught by gbrain's retry/fallback logic, then return "No results").
The debugging experience is painful: the user sees "No results" in their agent's output, assumes the brain is empty, doesn't realize the subprocess env is missing keys. I hit this personally running `gbrain query` from a launchd-managed agent's terminal tool: the .zshrc-sourced keys were in my interactive shell but not in the daemon's env, and there's no surface-level error telling you that.
The existing schema is almost there, just unconnected
`GBrainConfig` (`src/core/config.ts:10-16`) already defines `openai_api_key` and `anthropic_api_key` fields, and `saveConfig()` writes them to `~/.gbrain/config.json` with 0600 perms. But none of the runtime code actually reads those fields: the config is populated but never consulted. This PR just connects the wiring.
Also fixed: `loadConfig()` was silently dropping `ANTHROPIC_API_KEY` from the env merge (only `OPENAI_API_KEY` was being merged on line 43). Minor but present bug.
Fix
Three small changes:
- `config.ts`: merge `ANTHROPIC_API_KEY` env var into loaded config alongside `OPENAI_API_KEY` (was silently dropped).
- `embedding.ts`: read `openai_api_key` from `loadConfig()` and pass to `new OpenAI({ apiKey })`. Falls back to `new OpenAI()` (SDK env-default) when config has no key, which preserves current behavior for users who rely on env vars.
- `expansion.ts`: same pattern for Anthropic.
Precedence preserved from existing code: `env var > config file` (because `loadConfig()` merges env over file). Users who want to override per-process still can.
Usage for callers (new capability)
One-time setup:
Then from any caller — no env vars needed:
Especially valuable for:
- Docker containers without `-e OPENAI_API_KEY` passthrough
Impact
`~/.gbrain/config.json`
`saveConfig`)
Test plan
- `test/config.test.ts`:
- `OPENAI_API_KEY` env merges into config
- `ANTHROPIC_API_KEY` env merges into config (regression, was missing)
- `bun test` suite: no new regressions (the 4 pre-existing `PGLiteEngine` failures are unrelated and exist on `master`)
Context
Fourth in a series of PRs from real-user setup on a Chinese-speaking deployment: #114 (chunker CJK), #115 (slugify CJK), #119 (sync CJK via core.quotepath), and now this one (key portability). Each addresses a distinct anti-pattern surfaced by running gbrain outside the "English-speaker, interactive-shell, env-vars-in-.zshrc" default assumption.
This PR is the only one that's a feature, not a bug fix — but it makes gbrain meaningfully more embeddable as the knowledge backbone for cron-driven agents and daemons, which is the direction the SKILLPACK itself advocates.