
Bulldozer DB#1285

Closed
N2D4 wants to merge 42 commits into dev from bulldozer-db

Conversation

N2D4 (Contributor) commented Mar 24, 2026

Note

High Risk
High risk due to new database tables/migrations (including PL/pgSQL worker + pg_cron scheduling) and a large new dev-only HTTP server script that mutates data via raw SQL. Also touches passkey registration verification and cron job startup timing, which can affect auth and background processing behavior.

Overview
Introduces Bulldozer DB persistence by adding Prisma models and migrations for BulldozerStorageEngine (hierarchical jsonb[] key paths with generated parent links) and a timefold queue (BulldozerTimeFoldQueue/BulldozerTimeFoldMetadata). A bulldozer_timefold_process_queue() worker replays reducers, emits rows/state into storage, and reschedules itself via pg_cron.
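The storage model described above, hierarchical key paths whose parent links are generated from the path itself, can be sketched in miniature. This is illustrative TypeScript, not the actual schema; the real engine stores jsonb[] paths in Postgres:

```typescript
// Miniature sketch of the "generated parent link" idea behind
// BulldozerStorageEngine: every node is addressed by a key path, and its
// parent path is derived by dropping the last segment. Names here are
// hypothetical; the real engine keeps jsonb[] paths in a Postgres table.
type KeyPath = string[];

function parentOf(keyPath: KeyPath): KeyPath | null {
  // The seeded root [] has no parent; every other node links to its prefix.
  return keyPath.length === 0 ? null : keyPath.slice(0, -1);
}

// Toy in-memory analogue of the storage table.
const storage = new Map<string, unknown>();
const put = (path: KeyPath, value: unknown) =>
  storage.set(JSON.stringify(path), value);

put([], null);                                    // seeded root
put(["table"], null);                             // table namespace root
put(["table", "t1", "metadata"], { version: 1 });

console.log(parentOf(["table", "t1", "metadata"])); // → [ 'table', 't1' ]
console.log(parentOf([]));                          // → null
```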

Adds Bulldozer Studio as a backend dev tool: a local-only HTTP UI (run-bulldozer-studio.ts) to visualize table graphs, inspect/edit raw storage, and run init/delete/set-row actions; wired into pnpm dev and backed by a new dependency (elkjs).

Also includes small robustness tweaks: delay starting cron job polling to allow server startup, harden passkey registration options handling around empty hints, and assert registrationInfo is present after passkey verification. Migration tests are added to validate storage hierarchy behavior and timefold queue processing.

Reviewed by Cursor Bugbot for commit e3c3865.

Summary by CodeRabbit

  • New Features

    • Added Bulldozer Studio — interactive web UI to browse table graphs, inspect/edit raw hierarchical storage, and run table-level actions.
  • Infrastructure

    • Introduced persistent hierarchical storage with a DB migration and server API to expose schema/table details.
  • Tests

    • Added extensive integration, fuzz and performance test suites for the new table system.
  • Chores

    • Updated dev start scripts and added a backend dependency to support the studio.
  • Documentation

    • Clarified lint usage and added guidance to document tradeoffs.

vercel Bot commented Mar 24, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project | Status | Actions | Updated (UTC)
stack-auth-hosted-components | Ready | Preview, Comment | Apr 13, 2026 5:55pm
stack-backend | Error | (none) | Apr 13, 2026 5:55pm
stack-dashboard | Ready | Preview, Comment | Apr 13, 2026 5:55pm
stack-demo | Ready | Preview, Comment | Apr 13, 2026 5:55pm
stack-docs | Error | (none) | Apr 13, 2026 5:55pm
stack-preview-backend | Error | (none) | Apr 13, 2026 5:55pm
stack-preview-dashboard | Ready | Preview, Comment | Apr 13, 2026 5:55pm

coderabbitai Bot commented Mar 24, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Adds a Bulldozer feature suite: new storage engine table and migration, extensive Bulldozer table operators and SQL helpers, a standalone "Bulldozer Studio" dev HTTP server + UI, fuzz/perf tests, and supporting scripts/packaging and minor editor/docs tweaks.

Changes

Cohort / File(s) Summary
Editor / Docs
.vscode/settings.json, AGENTS.md, claude/CLAUDE-KNOWLEDGE.md
Spell-check dictionary edits; lint/dev guidance extended; added Bulldozer/Postgres operational Q&A.
Backend dev tooling
apps/backend/package.json, apps/dev-launchpad/public/index.html, apps/backend/scripts/run-cron-jobs.ts
Added run-bulldozer-studio script and elkjs dep; launchpad UI entry for Bulldozer Studio; cron workers delay first run by 30s.
Database migration & schema
apps/backend/prisma/migrations/.../migration.sql, apps/backend/prisma/migrations/.../tests/ltree-queries.ts, apps/backend/prisma/schema.prisma
New BulldozerStorageEngine table (jsonb[] keyPath, generated parent, unique constraint, FK, index), seeded roots; migration tests validating behavior and constraints; Prisma model added (with unsupported jsonb[] field).
Standalone Studio server
apps/backend/scripts/run-bulldozer-studio.ts, apps/dev-launchpad/public/index.html
New TypeScript HTTP server serving SPA and JSON API endpoints for schema, table details, raw storage inspect/upsert/delete, and table mutations (init/delete/set-row/delete-row).
Bulldozer core SQL helpers
apps/backend/src/lib/bulldozer/db/bulldozer-sort-helpers-sql.ts, .../utilities.ts, .../index.ts
Added large SQL helper blob for pg_temp sort helpers and a typed SQL-construction utilities module; new central index exposing Table types, toExecutableSql* helpers, and re-exports of table factories.
Bulldozer table operators
apps/backend/src/lib/bulldozer/db/tables/* (stored-, group-by, map, flat-map, filter, concat, limit, sort, l-fold, left-join)
Added many new table factory implementations (declareStoredTable, declareGroupByTable, declareMapTable, declareFlatMapTable, declareFilterTable, declareConcatTable, declareLimitTable, declareSortTable, declareLFoldTable, declareLeftJoinTable) implementing init/delete/isInitialized/list/registerRowChangeTrigger and incremental change propagation SQL.
Example schema & tests
apps/backend/src/lib/bulldozer/db/example-schema.ts, .../index.fuzz.test.ts, .../index.perf.test.ts
Added an example fungible ledger schema and large fuzz + perf test suites that create ephemeral DBs, provision storage engine, run randomized mutations and measure/validate correctness and performance.
Prisma client change
apps/backend/src/prisma-client.tsx
Replaced static import with dynamic await import("@/stack") in connection-string resolution to defer module load.

Sequence Diagram(s)

sequenceDiagram
  participant Browser
  participant StudioServer as "Bulldozer Studio\n(HTTP Server)"
  participant BulldozerLib as "Bulldozer Module\n(SQL helpers & Table ops)"
  participant Postgres

  Browser->>StudioServer: GET / (SPA) / API requests
  StudioServer->>BulldozerLib: toExecutableSqlTransaction / handler (init/list/set/delete)
  BulldozerLib->>Postgres: Execute SQL (storage engine reads/writes, temp helpers)
  Postgres-->>BulldozerLib: Rows / success / errors
  BulldozerLib-->>StudioServer: Results / errors
  StudioServer-->>Browser: JSON response / HTML

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes


Suggested reviewers

  • BilalG1

Poem

"I hopped into code at break of day,
Keys clacking like carrots in a fray.
New tables dug, and trees arranged,
Studio lights flicker, graphs exchanged.
Tiny paws applaud the dev bouquet 🐇"

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Docstring Coverage (⚠️ Warning): Docstring coverage is 2.06%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Title check (✅ Passed): The title 'Bulldozer DB' is concise and clearly describes the main feature being added: a persistent database storage layer for Bulldozer with associated tooling and tests.
Description check (✅ Passed): The pull request provides a comprehensive description with risk assessment, feature overview, implementation details, and migration/testing context.


nams1570 self-requested a review April 6, 2026 16:53

nams1570 (Collaborator) left a comment


Partial review: some bugs in the ConcatTable implementation.

(gen_random_uuid(), ${getStorageEnginePath(options.tableId, [])}::jsonb[], 'null'::jsonb),
(gen_random_uuid(), ${getStorageEnginePath(options.tableId, ["metadata"])}::jsonb[], '{ "version": 1 }'::jsonb)
`],
delete: () => [sqlStatement`

Potential bugs: I see the recursive CTE here deletes the Bulldozer engine objects, but the triggers aren't deleted, right?

What if, hypothetically, I create a ConcatTable c_table_a, and two downstream tables p and q have it as an input table? That means p and q would call c_table_a's registerRowChangeTrigger, right?
Now, let's delete c_table_a. Would anything break on p and q?
What about if you create a new ConcatTable c_table_a (reusing the same name)? Would p and q now listen for changes to the new c_table_a?

From my limited testing, I see that the above case will happen.

Also, let's say c_table_a had sourceA as an input table, so sourceA -> c_table_a -> p. Naturally, changes to sourceA would affect p via triggers. If you delete c_table_a and then create another c_table_a that DOESN'T take sourceA as an input, changes to sourceA would still affect the v2 of c_table_a AND p.
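The scenario above can be illustrated with a toy in-memory pub/sub registry (hypothetical names; the real mechanism is Postgres triggers registered via registerRowChangeTrigger):

```typescript
// Toy registry illustrating the stale-trigger scenario: deleting a
// table's data without unregistering its listeners leaves the old
// subscriptions live, so events for a recreated table with the same
// name (or from its old inputs) keep firing them.
const listeners = new Map<string, Array<(row: string) => void>>();

function registerRowChangeTrigger(tableId: string, cb: (row: string) => void) {
  const list = listeners.get(tableId) ?? [];
  list.push(cb);
  listeners.set(tableId, list);
}

function emitChange(tableId: string, row: string) {
  for (const cb of listeners.get(tableId) ?? []) cb(row);
}

// Deleting only the table's data (as the recursive CTE does) leaves the
// registrations behind; a full delete would also drop them.
function deleteTableDataOnly(_tableId: string) { /* rows gone, listeners remain */ }
function deleteTableFully(tableId: string) { listeners.delete(tableId); }

const seen: string[] = [];
registerRowChangeTrigger("c_table_a", (row) => seen.push(`p saw ${row}`));
deleteTableDataOnly("c_table_a");
emitChange("c_table_a", "r1"); // stale listener from the deleted v1 still fires
console.log(seen); // → [ 'p saw r1' ]
```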


The same bug applies across all tables.

Comment on lines +33 to +43
const referenceCompareSortKeysSql = firstTable.compareSortKeys(sqlExpression`$1`, sqlExpression`$2`).sql;
for (const table of tables) {
  const compareGroupKeysSql = table.compareGroupKeys(sqlExpression`$1`, sqlExpression`$2`).sql;
  const compareSortKeysSql = table.compareSortKeys(sqlExpression`$1`, sqlExpression`$2`).sql;
  if (compareGroupKeysSql !== referenceCompareGroupKeysSql || compareSortKeysSql !== referenceCompareSortKeysSql) {
    throw new StackAssertionError("declareConcatTable requires comparator-compatible input tables", {
      tableId: options.tableId,
      tableDebugId: tableIdToDebugString(table.tableId),
    });
  }
}

Discussion: isn't the compareSortKeys check here a little too strict? As I see it, later in the file you set the rows' sortKeys to null anyway, and differing sorting upstream isn't going to affect a Union's output, right?

RD extends RowData,
>(options: {
tableId: TableId,
tables: Table<GK, any, RD>[],

Potential bug: if the same table is entered twice here, wouldn't there be duplicated rows?

Also, the same change event would be subscribed to twice, right? You'd have two callbacks run for the same change to the same table.

Comment thread apps/backend/src/lib/bulldozer/db/tables/concat-table.ts
Comment on lines +69 to +79
  "sourceRows"."groupkey" AS "groupKey",
  ${createConcatenatedRowIdentifierSql(tableIndex, `"sourceRows"."rowidentifier"`)} AS "rowIdentifier",
  'null'::jsonb AS "rowSortKey",
  "sourceRows"."rowdata" AS "rowData"
FROM (${table.listRowsInGroup({
  start: "start",
  end: "end",
  startInclusive: true,
  endInclusive: true,
}).sql}) AS "sourceRows"
WHERE ${getInputInitializedSql(table)}

Potential bug: here, we read sourceRows from each input table's listRowsInGroup. But the listRowsInGroup signature defined in index.ts doesn't require it to return the groupKey. The type looks like this:

listRowsInGroup(options: { groupKey?: SqlExpression<GK>, start: SqlExpression<SK> | "start", end: SqlExpression<SK> | "end", startInclusive: boolean, endInclusive: boolean }): SqlQuery<Iterable<{ rowIdentifier: RowIdentifier, rowSortKey: SK, rowData: RD }>>,

For example, the StoredTable implementation of listRowsInGroup doesn't return a groupkey column.
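A simplified sketch of the mismatch, with groupKey added to the declared row shape as one possible fix (an assumption, not the PR's actual change; the real SqlQuery/SqlExpression wrappers are omitted):

```typescript
// The declared listRowsInGroup row shape has no groupKey, yet the
// concat-table SQL selects "sourceRows"."groupkey". One way to close
// the gap, shown here as an assumption, is to include groupKey in the
// declared row shape so every implementation must emit it.
type DeclaredRow<SK, RD> = { rowIdentifier: string, rowSortKey: SK, rowData: RD };
type RowWithGroupKey<GK, SK, RD> = DeclaredRow<SK, RD> & { groupKey: GK };

// A toy implementation honoring the extended contract: it must return
// the group key alongside each row, not just identifier/sortKey/data.
function listRowsInGroup<GK, SK, RD>(
  rows: Array<RowWithGroupKey<GK, SK, RD>>,
  groupKey: GK,
): RowWithGroupKey<GK, SK, RD>[] {
  return rows.filter((row) => row.groupKey === groupKey);
}

const rows = listRowsInGroup(
  [
    { groupKey: "g1", rowIdentifier: "r1", rowSortKey: 0, rowData: {} },
    { groupKey: "g2", rowIdentifier: "r2", rowSortKey: 1, rowData: {} },
  ],
  "g1",
);
console.log(rows.map((row) => row.rowIdentifier)); // → [ 'r1' ]
```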

Comment thread apps/backend/src/lib/bulldozer/db/tables/stored-table.ts Outdated
model BulldozerStorageEngine {
  id            String @id @default(uuid()) @db.Uuid
  keyPath       Json[]
  keyPathParent Unsupported("jsonb[]")

nit: if this is nullable, shouldn't it be Unsupported("jsonb[]")? (i.e. with Prisma's optional ? marker)?

Comment thread apps/backend/scripts/run-bulldozer-studio.ts
if (current === "\"") inDoubleQuote = false;
index++;
continue;
}

splitSqlStatements mishandles escaped double-quote identifiers

Low Severity

The splitSqlStatements parser treats every " as toggling inDoubleQuote mode, but PostgreSQL allows "" inside double-quoted identifiers to represent a literal quote character. A SQL identifier like "column""name" would be incorrectly parsed as two separate quoted regions, causing any semicolons following the premature quote-exit to be treated as statement terminators. No currently generated SQL triggers this, but it breaks the parser contract if future SQL uses escaped identifiers.
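A standalone sketch of a splitter that treats "" inside double-quoted identifiers (and '' inside string literals) as an escaped quote, per the report above. This is not the PR's splitSqlStatements, and it deliberately ignores dollar-quoting and comments:

```typescript
// Minimal statement splitter handling PostgreSQL quote escapes: ""
// inside a double-quoted identifier and '' inside a string literal are
// literal quote characters, not region terminators. Dollar-quoting and
// SQL comments are intentionally out of scope for this sketch.
function splitSqlStatements(sql: string): string[] {
  const statements: string[] = [];
  let current = "";
  let inSingleQuote = false;
  let inDoubleQuote = false;
  for (let index = 0; index < sql.length; index++) {
    const ch = sql[index];
    if (inDoubleQuote && ch === '"') {
      // "" is an escaped quote inside an identifier, not a close.
      if (sql[index + 1] === '"') { current += '""'; index++; continue; }
      inDoubleQuote = false;
    } else if (inSingleQuote && ch === "'") {
      // '' is an escaped quote inside a string literal.
      if (sql[index + 1] === "'") { current += "''"; index++; continue; }
      inSingleQuote = false;
    } else if (!inSingleQuote && !inDoubleQuote) {
      if (ch === '"') inDoubleQuote = true;
      else if (ch === "'") inSingleQuote = true;
      else if (ch === ";") { statements.push(current.trim()); current = ""; continue; }
    }
    current += ch;
  }
  if (current.trim() !== "") statements.push(current.trim());
  return statements;
}

console.log(splitSqlStatements('SELECT "a""; b" FROM t; SELECT 2'));
// → [ 'SELECT "a""; b" FROM t', 'SELECT 2' ]
```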


Reviewed by Cursor Bugbot for commit 5b69a18.

`);
});
sendJson(response, 200, { ok: true });
return;

Raw upsert lacks protection for reserved root paths

Medium Severity

The /api/raw/upsert endpoint allows modifying the value of reserved root paths like [] and ["table"], while the /api/raw/delete endpoint explicitly protects them. This inconsistency means a user could accidentally overwrite the root or table-namespace node values, potentially disrupting the hierarchical storage structure that all bulldozer tables depend on.
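A sketch of the guard the upsert handler could apply, mirroring the protection the delete endpoint already has. The reserved-path list here is an assumption based on the seeded roots mentioned in this PR:

```typescript
// Hypothetical guard for /api/raw/upsert: reject writes to the reserved
// root paths that the storage hierarchy depends on. The concrete list
// ([] and ["table"]) is assumed from the seeded roots, not taken from
// the PR's actual code.
const RESERVED_ROOT_PATHS: unknown[][] = [[], ["table"]];

function isReservedPath(keyPath: unknown[]): boolean {
  return RESERVED_ROOT_PATHS.some(
    (reserved) => JSON.stringify(reserved) === JSON.stringify(keyPath),
  );
}

function guardUpsert(keyPath: unknown[]): void {
  if (isReservedPath(keyPath)) {
    throw new Error(`refusing to upsert reserved root path ${JSON.stringify(keyPath)}`);
  }
}

guardUpsert(["table", "t1", "metadata"]); // ok, not a reserved root
try { guardUpsert([]); } catch (e) { console.log((e as Error).message); }
// → refusing to upsert reserved root path []
```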


Reviewed by Cursor Bugbot for commit a7f999f.

Comment thread apps/backend/src/lib/bulldozer/db/tables/concat-table.ts
cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 3 total unresolved issues (including 2 from previous reviews).


Reviewed by Cursor Bugbot for commit 9cb5f5b.

queued_row."reducerSql"
)
INTO next_state, next_rows_data, next_timestamp
USING current_state, queued_row."rowData", current_timestamp_value;

SQL injection via format %s with reducerSql

Medium Severity

The bulldozer_timefold_process_queue function uses EXECUTE format(... %s ..., queued_row."reducerSql") to run SQL stored in the BulldozerTimeFoldQueue table's reducerSql TEXT column. This directly interpolates unparameterized SQL from a database column into executable code. While the column is currently only populated by application code, any path that allows writing to BulldozerTimeFoldQueue (including the studio's raw upsert endpoint) could inject arbitrary SQL that runs with the function's privileges.

Additional Locations (1)

Reviewed by Cursor Bugbot for commit 9cb5f5b.
