Skip to content

feat(storage): PgDeepLakeTransport — direct self-hosted Postgres (pg_deeplake) backend#130

Open
chrisl10 wants to merge 2 commits into
legioncodeinc:mainfrom
chrisl10:feat/pg-deeplake-transport
Open

feat(storage): PgDeepLakeTransport — direct self-hosted Postgres (pg_deeplake) backend#130
chrisl10 wants to merge 2 commits into
legioncodeinc:mainfrom
chrisl10:feat/pg-deeplake-transport

Conversation

@chrisl10

@chrisl10 chrisl10 commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

What

Adds a second DeepLakeTransport implementation, PgDeepLakeTransport, that connects honeycomb directly to a self-hosted Activeloop pg_deeplake Postgres backend (postgres://...) instead of an HTTP DeepLake gateway. It sits behind the existing transport seam in createStorageClient, so the storage client, heal, write, and vector layers are identical regardless of backend. Selection is automatic: a postgres:// / postgresql:// endpoint wires the pg transport; anything else keeps HttpDeepLakeTransport.

Why

pg_deeplake (the open-source Postgres extension, quay.io/activeloopai/pg-deeplake) speaks honeycomb's SQL dialect natively (USING deeplake, float4[768], <#>, deeplake_index), so honeycomb can run fully self-hosted with no per-query cloud cost. Today the only transport is the HTTP gateway; this lets an operator point straight at their own Postgres, no gateway in the middle.

Contract baked in (so nobody re-discovers it)

Two non-obvious requirements, validated against a live pg_deeplake, are encoded in the transport with comments:

  • (G1) Raw error text. heal.ts classifyFailure regex-matches raw pg messages like relation "x" does not exist. The transport forwards err.message unmodified into TransportError("query", ...). JSON-wrapping would escape the quotes (relation \"x\") and silently break schema-heal.
  • (G2) Workspace = Postgres schema. honeycomb introspects information_schema.columns WHERE table_schema = '<workspace>' and uses unqualified table names, so the transport does CREATE SCHEMA IF NOT EXISTS + SET search_path per workspace (with safe identifier quoting). This also gives per-workspace isolation for free.

Details

  • pg is an optionalDependency, lazy-imported (await import("pg")) only when a postgres:// endpoint selects this backend, and externalized in the daemon esbuild target, so cloud bundles stay lean and HTTP-only users never need pg on disk.
  • The pool factory is injectable for testing (no live Postgres needed).
  • Honors the client's per-statement abort signal (races the query, destroys the connection on timeout so an in-flight statement cannot linger on a pooled connection, and handles the already-aborted-during-setup edge case).

Testing

tests/daemon/storage/pg-transport.test.ts (fake pool): schema/search_path ordering, per-workspace schema caching, row passthrough, raw-error preservation (asserts classifyFailure still returns missing-table), workspace identifier quote-escaping (injection attempt), abort-destroys-connection, already-aborted-no-connect, and connect-failure to connection error. npm run typecheck, npm run dup, and the storage suite are green.

Open questions

  • Bundle externalization: I added pg to the daemon target external list; happy to match your preferred convention.
  • Pairs with the login --endpoint PR for setup UX, but is independent.
  • Context: validated end to end (real honeycomb to self-hosted pg_deeplake over a private network) before submitting.

Summary by CodeRabbit

  • New Features

    • Added support for self-hosted Postgres storage endpoints via a native Postgres transport.
    • Improved storage client startup with lazy initialization (client is created on first use).
  • Bug Fixes

    • Prevented the daemon bundle from embedding the optional Postgres driver.
    • Improved timeout, connection, and query error handling for Postgres-backed storage.
    • Strengthened workspace isolation (schema creation and search_path) and safer SQL handling.
  • Tests

    • Added a comprehensive test suite for Postgres transport behavior, including abort handling and error mapping.

…ckend

Add a second DeepLakeTransport behind the existing pluggable seam that
connects DIRECTLY to an Activeloop pg_deeplake Postgres URL, with no HTTP
gateway in between. This makes self-hosting honeycomb's storage backend a
first-class option.

The transport encodes the reverse-engineered contract proved end to end:

- Workspace maps to its own Postgres schema. honeycomb introspects
  information_schema.columns WHERE table_schema = '<workspace>' and then
  uses unqualified table names, so the transport runs CREATE SCHEMA IF NOT
  EXISTS (cached per workspace) and SET search_path on every checkout.
- Raw error text is passed through unmodified into TransportError("query",
  ...). heal.classifyFailure regex-matches the raw pg message (for example
  relation "x" does not exist), so the message is never JSON-wrapped (that
  would escape the quotes and break schema-heal).
- pg_deeplake speaks honeycomb's SQL dialect natively (USING deeplake,
  float4[768], the <#> cosine operator, deeplake_index BM25), so the
  transport is a pure passthrough that forwards req.sql verbatim.
- Identifiers are quote-escaped via a quoteIdent helper, never concatenated
  raw, so a workspace name cannot inject SQL.
- req.signal is honored: an already-aborted signal throws a timeout before a
  connection is opened; otherwise the query races the abort, the in-flight
  statement is destroyed on abort (release with a truthy arg) so it does not
  linger on a pooled connection, and the losing promise is settled to avoid
  an unhandled rejection.

Selection happens in createDefaultTransport: a postgres:// or postgresql://
endpoint picks PgDeepLakeTransport, anything else keeps HttpDeepLakeTransport.
The helper is reused by both the eager and lazy client factories.

pg is added as an optionalDependency and dynamic-imported only when this
backend is used, with @types/pg in devDependencies, and is externalized in
the daemon esbuild target so cloud bundles stay lean.

Tests inject a fake pool via the poolFactory param and cover schema setup
ordering, row passthrough, raw error preservation (heal still classifies),
identifier escaping, and the abort and connection mappings.
@github-actions

github-actions Bot commented Jun 26, 2026

Copy link
Copy Markdown

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

@coderabbitai

coderabbitai Bot commented Jun 26, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: c0fb711d-c6b5-47be-a0ec-4207f56472ad

📥 Commits

Reviewing files that changed from the base of the PR and between 29f94c1 and b67e8de.

📒 Files selected for processing (1)
  • src/daemon/storage/pg-transport.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/daemon/storage/pg-transport.ts

📝 Walkthrough

Walkthrough

The daemon storage path now supports optional pg-backed Postgres endpoints, with bundling and dependency updates to keep pg external. The storage client selects Postgres or HTTP transport from the endpoint, and a new lazy client defers creation until first use.

Changes

Self-hosted Postgres storage

Layer / File(s) Summary
Bundle support for pg
esbuild.config.mjs, package.json
pg is added as an optional runtime dependency, @types/pg is added for development, and the daemon bundle marks pg as external.
Transport selection wiring
src/daemon/storage/index.ts
PgDeepLakeTransport and PgPoolFactory are re-exported, createDefaultTransport() selects Postgres or HTTP by endpoint, createStorageClient() uses that helper, and createLazyStorageClient() defers real client creation until first use.
Pg query path
src/daemon/storage/pg-transport.ts
PgDeepLakeTransport adds the dynamic pg pool factory, schema quoting/caching, search_path setup, passthrough SQL execution, and abort/error handling.
Pg transport tests
tests/daemon/storage/pg-transport.test.ts
Tests cover schema ordering, row passthrough, raw error text, identifier escaping, abort behavior, and connect() failures using fake pg infrastructure.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

A bunny hopped through pg in a flash,
With schemas and queries that never would crash.
search_path was set,
And I nibbled with glee,
While Postgres and carrots both danced with me 🐇

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly identifies the main change: a new PgDeepLakeTransport for direct self-hosted Postgres storage.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands.

@chrisl10

Copy link
Copy Markdown
Contributor Author

I have read the CLA Document and I hereby sign the CLA

github-actions Bot added a commit that referenced this pull request Jun 26, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/daemon/storage/index.ts (2)

271-281: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Narrow the deferred-build catch to missing credentials only.

Line 276 catches every createStorageClient failure, so malformed endpoints or construction bugs are reported as “no DeepLake credential resolved yet.” Only swallow the known missing-credential StorageConfigError; preserve or rethrow other failures with their original message.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/daemon/storage/index.ts` around lines 271 - 281, The deferred build logic
in tryBuild currently swallows every createStorageClient failure, which
incorrectly treats non-credential problems as missing credentials. Update the
catch in tryBuild inside the storage client builder to only handle
StorageConfigError as the deferred “no credential yet” case, and rethrow or
preserve all other errors so malformed endpoints or construction bugs surface
with their original message.

289-293: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Keep endpoint side-effect free.

Line 292 initializes the real client from a property read, which breaks the lazy “first query/connect” contract and contradicts the comment about not reading credentials here. Return the already-built endpoint only.

Suggested fix
 		get endpoint(): string {
 			// Before the first successful build there is no resolved endpoint; report a stable
 			// placeholder (no secret, diagnostics-only) rather than reading creds eagerly here.
-			return tryBuild()?.endpoint ?? "pending-credentials";
+			return built?.endpoint ?? "pending-credentials";
 		},
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/daemon/storage/index.ts` around lines 289 - 293, The endpoint getter is
triggering client initialization via tryBuild(), which makes it side-effectful
and violates the lazy “first query/connect” contract. Update the storage proxy’s
endpoint property to only return the endpoint from an already-built client
instance, and do not call any build or credential-reading logic inside the
getter. Use the endpoint accessor in src/daemon/storage/index.ts and keep it as
a pure read of existing state.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/daemon/storage/pg-transport.ts`:
- Around line 125-145: The checkout/setup path in pg-transport still ignores
aborts until after pool.connect(), ensureSchema(), and the SET search_path query
complete, so a timeout can leave the request blocked and the client unreleased.
Update the connection setup flow in the transport method to race these awaits
against the same abort handling already used for req.sql, and ensure the abort
path drives the existing destroy/release logic for the checked-out pg client.

In `@tests/daemon/storage/pg-transport.test.ts`:
- Around line 185-191: The abort in this pg transport test can happen before the
in-flight SELECT is actually running, so the test may miss the
connection-destruction path. Update the test around transport.query, makeReq,
and the FakeClient/FakePool setup so abort is triggered only after a
SELECT-start signal from blockFor (or an equivalent marker) confirms
client.query("SELECT pg_sleep(60)") has begun. This should ensure the pending
query is truly active before controller.abort() is called and the error is
asserted.

---

Outside diff comments:
In `@src/daemon/storage/index.ts`:
- Around line 271-281: The deferred build logic in tryBuild currently swallows
every createStorageClient failure, which incorrectly treats non-credential
problems as missing credentials. Update the catch in tryBuild inside the storage
client builder to only handle StorageConfigError as the deferred “no credential
yet” case, and rethrow or preserve all other errors so malformed endpoints or
construction bugs surface with their original message.
- Around line 289-293: The endpoint getter is triggering client initialization
via tryBuild(), which makes it side-effectful and violates the lazy “first
query/connect” contract. Update the storage proxy’s endpoint property to only
return the endpoint from an already-built client instance, and do not call any
build or credential-reading logic inside the getter. Use the endpoint accessor
in src/daemon/storage/index.ts and keep it as a pure read of existing state.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: a0163f41-2b65-48be-8eba-03de0fd0032f

📥 Commits

Reviewing files that changed from the base of the PR and between 0cd392b and 29f94c1.

⛔ Files ignored due to path filters (1)
  • package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (5)
  • esbuild.config.mjs
  • package.json
  • src/daemon/storage/index.ts
  • src/daemon/storage/pg-transport.ts
  • tests/daemon/storage/pg-transport.test.ts

Comment thread src/daemon/storage/pg-transport.ts Outdated
Comment on lines +185 to +191
const pool = new FakePool(() => new FakeClient({ blockFor: (sql) => sql.startsWith("SELECT") }));
const transport = transportWith(pool);
const { req, controller } = makeReq("SELECT pg_sleep(60)", "ws");

const pending = transport.query(req);
controller.abort();
const err = await pending.catch((e) => e);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Locate the test file and related transport implementation.
git ls-files | rg '^(tests/daemon/storage/pg-transport\.test\.ts|src/.*pg-transport.*|src/.*transport.*|tests/.*fake.*)$'

echo '--- test file outline ---'
ast-grep outline tests/daemon/storage/pg-transport.test.ts --view expanded || true

echo '--- relevant transport references ---'
rg -n "query\(req\)|blockFor|FakePool|FakeClient|pg_sleep|controller\.abort|SELECT" tests/daemon/storage/pg-transport.test.ts src tests -g '!**/node_modules/**' | sed -n '1,220p'

Repository: legioncodeinc/honeycomb

Length of output: 27486


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Show the exact test section around the mid-query abort case.
sed -n '150,230p' tests/daemon/storage/pg-transport.test.ts | cat -n

# Inspect the fake pool/client helpers if they are in this test file or nearby.
rg -n "class FakePool|class FakeClient|function FakePool|function FakeClient|blockFor" tests/daemon/storage -g '!**/node_modules/**' | sed -n '1,220p'

Repository: legioncodeinc/honeycomb

Length of output: 4653


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Inspect the transport implementation and the fake client helper around the abort case.
sed -n '1,260p' src/daemon/storage/pg-transport.ts | cat -n
echo '---'
sed -n '1,120p' tests/daemon/storage/pg-transport.test.ts | cat -n

Repository: legioncodeinc/honeycomb

Length of output: 16758


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Inspect the transport implementation and the fake client helper around the abort case.
sed -n '1,260p' src/daemon/storage/pg-transport.ts | cat -n
echo '---'
sed -n '1,120p' tests/daemon/storage/pg-transport.test.ts | cat -n

Repository: legioncodeinc/honeycomb

Length of output: 16758


Wait for the SELECT to start before aborting. The current test can abort during schema setup, before client.query("SELECT pg_sleep(60)") is reached, so it doesn’t guarantee the in-flight query path. Gate the abort on a signal from blockFor (or another SELECT-start marker) so this case actually exercises connection destruction.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/daemon/storage/pg-transport.test.ts` around lines 185 - 191, The abort
in this pg transport test can happen before the in-flight SELECT is actually
running, so the test may miss the connection-destruction path. Update the test
around transport.query, makeReq, and the FakeClient/FakePool setup so abort is
triggered only after a SELECT-start signal from blockFor (or an equivalent
marker) confirms client.query("SELECT pg_sleep(60)") has begun. This should
ensure the pending query is truly active before controller.abort() is called and
the error is asserted.

Race the CREATE SCHEMA and SET search_path setup statements against the
abort signal too, not just the final query, so a per-statement timeout
landing during checkout/setup is honored instead of blocking until the
setup awaits settle. Addresses CodeRabbit review on legioncodeinc#130.
@chrisl10

Copy link
Copy Markdown
Contributor Author

Addressed the abort finding in b67e8de — the schema-setup statements (CREATE SCHEMA / SET search_path) now race the abort signal too, so a per-statement timeout during checkout/setup is honored rather than blocking until those awaits settle.

On the static-analyzer SQL injection flags for SET search_path TO ${quoteIdent(...)}: these are false positives. SQL identifiers (schema names) cannot be bound as query parameters, so the correct mitigation is identifier quoting, which quoteIdent() does (doubling embedded quotes per the Postgres rule). The workspace value is also already constrained upstream. The repo's own npm run audit:sql gate passes this code. Happy to add an allowlist on the workspace name as defense-in-depth if you'd prefer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant