feat: new app database shape #2445

Merged
mogery merged 26 commits into main from mog/new-app-db-shape
Dec 3, 2025

Conversation

@mogery
Member

@mogery mogery commented Nov 26, 2025

  • removes PostHog for good
  • removes BigQuery data sending in favor of the data pipeline
  • uses the new database shape of requests -> sub-units

@mogery mogery changed the title feat: rework job logging to new schema (draft impl) feat: new app database shape Nov 26, 2025
@mogery mogery force-pushed the mog/new-app-db-shape branch 2 times, most recently from 7cdaa79 to f745f14 Compare November 27, 2025 08:35
@mogery mogery force-pushed the mog/new-app-db-shape branch 2 times, most recently from 8d4811b to 0a249a3 Compare November 27, 2025 09:29
@mogery
Member Author

mogery commented Nov 27, 2025

@cubic-dev-ai review pls

@cubic-dev-ai
Copy link
Copy Markdown
Contributor

cubic-dev-ai Bot commented Nov 27, 2025

@cubic-dev-ai review pls

@mogery I've started the AI code review. It'll take a few minutes to complete.

Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment


6 issues found across 51 files

Prompt for AI agents (all 6 issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="apps/api/src/lib/gcs-jobs.ts">

<violation number="1" location="apps/api/src/lib/gcs-jobs.ts:215">
Map results upload retries swallow every error, allowing all attempts to fail without logging or throwing, so the data is never written but the caller still sees success.</violation>
</file>

<file name="apps/api/src/controllers/v2/map.ts">

<violation number="1" location="apps/api/src/controllers/v2/map.ts:58">
logRequest no longer stores `req.body.integration`, so integration-originated map requests cannot be attributed in the new requests table.</violation>
</file>

<file name="apps/api/src/controllers/v1/map.ts">

<violation number="1" location="apps/api/src/controllers/v1/map.ts:465">
logMap returns a promise and is never awaited here, so any failure inside logMap (e.g., saveMapToGCS rejecting) becomes an unhandled rejection and the map log may be lost. Await or explicitly handle the promise.</violation>
</file>

<file name="apps/api/src/services/worker/crawl-logic.ts">

<violation number="1" location="apps/api/src/services/worker/crawl-logic.ts:157">
Batch scrapes now record `credits_cost = 0` whenever `credits_billed` is still null, so missing billing data is indistinguishable from a free job. Preserve the null (or defer logging) instead of forcing zero to avoid corrupting billing records.</violation>
</file>

<file name="apps/api/src/lib/extract/extraction-service.ts">

<violation number="1" location="apps/api/src/lib/extract/extraction-service.ts:682">
Persist the actual exception message when logging the multi-entity transform failure so the stored extract record contains actionable diagnostics instead of a generic string.</violation>

<violation number="2" location="apps/api/src/lib/extract/extraction-service.ts:792">
Log the actual scrape exception message instead of a generic string so extract failures remain diagnosable.</violation>
</file>

Reply to cubic to teach it or ask questions. Re-run a review with @cubic-dev-ai review this PR
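The first violation above (retries in `gcs-jobs.ts` swallowing every error) can be addressed with a retry wrapper that surfaces the final failure. This is a hypothetical sketch, not the PR's actual code; `uploadWithRetry` and its parameters are illustrative names:

```typescript
// Sketch: retry an upload a fixed number of times, but rethrow the last
// error after the final attempt instead of silently swallowing it, so
// the caller never sees a false success when nothing was written.
async function uploadWithRetry(
  upload: () => Promise<void>,
  attempts = 3,
): Promise<void> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      await upload();
      return; // success: stop retrying
    } catch (error) {
      lastError = error; // remember the most recent failure
    }
  }
  // All attempts failed: surface the error to the caller.
  throw lastError;
}
```

With this shape the caller can decide whether an upload failure is fatal or merely logged, rather than the retry loop making that decision implicitly.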

Comment thread apps/api/src/lib/gcs-jobs.ts Outdated
Comment thread apps/api/src/controllers/v2/map.ts
Comment thread apps/api/src/controllers/v1/map.ts
Comment thread apps/api/src/services/worker/crawl-logic.ts
Comment thread apps/api/src/lib/extract/extraction-service.ts
Comment thread apps/api/src/lib/extract/extraction-service.ts
@mogery mogery marked this pull request as ready for review November 28, 2025 12:39
@mogery mogery requested a review from nickscamara as a code owner November 28, 2025 12:39
Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment


4 issues found across 68 files

Prompt for AI agents (all 4 issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="apps/api/src/controllers/v0/admin/crawl-check.ts">

<violation number="1" location="apps/api/src/controllers/v0/admin/crawl-check.ts:173">
`status` is now always `finished` because the code treats the mere presence of a `requests` row as completion, but those rows are created when the crawl is enqueued, so active crawls no longer report their real status.</violation>
</file>

<file name="apps/api/src/controllers/v1/crawl-status.ts">

<violation number="1" location="apps/api/src/controllers/v1/crawl-status.ts:71">
`getJobs` no longer provides any persisted document payload when the queue job or GCS blob is missing, so websocket consumers stop receiving completed crawl documents even though the scrapes still exist in Supabase.</violation>
</file>

<file name="apps/api/src/controllers/v0/search.ts">

<violation number="1" location="apps/api/src/controllers/v0/search.ts:245">
`logSearch` is async and can throw, but the controller fire-and-forgets it, so any logging/GCS failure produces an unhandled promise rejection. Await the call (or at least add a `.catch`) so failures are captured instead of crashing the process.</violation>
</file>

<file name="apps/api/src/controllers/v0/admin/zdrcleaner.ts">

<violation number="1" location="apps/api/src/controllers/v0/admin/zdrcleaner.ts:141">
Requests are removed from the cleanup schedule after the first successful blob deletion, so blobs that failed to delete are never retried. Defer clearing `dr_clean_by` until all blobs for a request delete successfully.</violation>
</file>

Reply to cubic to teach it or ask questions. Re-run a review with @cubic-dev-ai review this PR

Comment thread apps/api/src/controllers/v0/admin/crawl-check.ts Outdated
Comment thread apps/api/src/controllers/v1/crawl-status.ts
Comment thread apps/api/src/controllers/v0/search.ts Outdated
batch.map(async (row: { request_id: string; blob_id: string }) => {
try {
await cleanUpJob(row.blob_id);
cleanedUpRequestIds.add(row.request_id);
Contributor

@cubic-dev-ai cubic-dev-ai Bot Nov 28, 2025


Requests are removed from the cleanup schedule after the first successful blob deletion, so blobs that failed to delete are never retried. Defer clearing dr_clean_by until all blobs for a request delete successfully.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At apps/api/src/controllers/v0/admin/zdrcleaner.ts, line 141:

<comment>Requests are removed from the cleanup schedule after the first successful blob deletion, so blobs that failed to delete are never retried. Defer clearing `dr_clean_by` until all blobs for a request delete successfully.</comment>

<file context>
@@ -97,13 +100,102 @@ async function cleanUp(specificTeamId: string | null, _logger: Logger) {
+          batch.map(async (row: { request_id: string; blob_id: string }) => {
+            try {
+              await cleanUpJob(row.blob_id);
+              cleanedUpRequestIds.add(row.request_id);
+            } catch (error) {
+              logger.error(`Error cleaning up blob`, {
</file context>

✅ Addressed in 6af8ba8
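The fix shape the reviewer asks for — only clearing `dr_clean_by` once every blob of a request deletes — can be sketched as follows. The row type and `cleanUpJob` mirror the snippet above, but this is an illustrative reconstruction, not the code from commit 6af8ba8:

```typescript
type Row = { request_id: string; blob_id: string };

// Sketch: delete all blobs in a batch, and return only the request ids
// whose blobs ALL deleted successfully. A single failed blob keeps its
// request on the cleanup schedule so it is retried on the next pass.
async function cleanBatch(
  batch: Row[],
  cleanUpJob: (blobId: string) => Promise<void>,
): Promise<Set<string>> {
  const seenRequestIds = new Set<string>();
  const failedRequestIds = new Set<string>();

  await Promise.all(
    batch.map(async row => {
      seenRequestIds.add(row.request_id);
      try {
        await cleanUpJob(row.blob_id);
      } catch {
        // Any failed blob poisons the whole request.
        failedRequestIds.add(row.request_id);
      }
    }),
  );

  // Only fully-cleaned requests may have dr_clean_by cleared.
  return new Set(
    [...seenRequestIds].filter(id => !failedRequestIds.has(id)),
  );
}
```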

Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment


1 issue found across 1 file (reviewed changes from recent commits).

Prompt for AI agents (all 1 issue)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="apps/api/src/controllers/v0/search.ts">

<violation number="1" location="apps/api/src/controllers/v0/search.ts:245">
P2: Awaiting `logSearch` makes successful searches fail whenever the analytics insert/upload throws, so a logging hiccup now returns HTTP 500 to the client. Keep the logging best-effort instead of blocking the response.</violation>
</file>

Reply to cubic to teach it or ask questions. Re-run a review with @cubic-dev-ai review this PR
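The two reviews together pin down the desired behavior for `logSearch`: not awaited (so a logging hiccup cannot return HTTP 500), but not bare either (so a rejection is observed rather than unhandled). A minimal sketch of that best-effort pattern, with illustrative names:

```typescript
// Sketch: run an async side task without blocking the caller, but
// attach a catch handler so a rejection is recorded instead of
// becoming an unhandled promise rejection.
function fireAndLog(
  task: () => Promise<void>,
  onError: (error: unknown) => void,
): void {
  // Intentionally not awaited: the response proceeds regardless of
  // whether the logging task succeeds.
  task().catch(onError);
}
```

A call site would look like `fireAndLog(() => logSearch(...), err => logger.error("logSearch failed", { err }))`, keeping analytics strictly off the request's critical path.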

Comment thread apps/api/src/controllers/v0/search.ts
@mogery mogery force-pushed the mog/new-app-db-shape branch 2 times, most recently from 794fbdc to 3209e88 Compare December 2, 2025 14:26
@mogery mogery force-pushed the mog/new-app-db-shape branch from 0be3b16 to b82bee6 Compare December 3, 2025 09:41
@mogery
Member Author

mogery commented Dec 3, 2025

@cubic-dev-ai please re-review

@cubic-dev-ai
Contributor

cubic-dev-ai Bot commented Dec 3, 2025

@cubic-dev-ai please re-review

@mogery I've started the AI code review. It'll take a few minutes to complete.

@mogery mogery merged commit 3086a27 into main Dec 3, 2025
5 of 8 checks passed
Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment


4 issues found across 72 files

Prompt for AI agents (all 4 issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="apps/api/src/controllers/v0/admin/crawl-migration.ts">

<violation number="1" location="apps/api/src/controllers/v0/admin/crawl-migration.ts:36">
P1: Using .single() here throws for the normal “not found” case, so every new crawl fails to migrate. Use maybeSingle() (or handle the error) so missing rows don’t abort the migration.</violation>
</file>

<file name="apps/api/src/controllers/v1/map.ts">

<violation number="1" location="apps/api/src/controllers/v1/map.ts:466">
P2: Handle the `logMap` promise so failures writing map logs don’t become unhandled rejections.</violation>
</file>

<file name="apps/api/src/controllers/v0/admin/zdrcleaner.ts">

<violation number="1" location="apps/api/src/controllers/v0/admin/zdrcleaner.ts:141">
P1: Requests are marked cleaned even when some of their blobs fail to delete, so `dr_clean_by` is cleared and the failed blobs are never retried.</violation>
</file>

<file name="apps/api/src/__tests__/snips/zdr-helpers.ts">

<violation number="1" location="apps/api/src/__tests__/snips/zdr-helpers.ts:117">
P2: `batch_scrapes` rows never contain a url column, so this expectation always throws and the helper can’t verify cleanup.</violation>
</file>

Reply to cubic to teach it or ask questions. Re-run a review with @cubic-dev-ai review this PR

.from("requests")
.select("id")
.eq("id", crawlId)
.single();
Contributor

@cubic-dev-ai cubic-dev-ai Bot Dec 3, 2025


P1: Using .single() here throws for the normal “not found” case, so every new crawl fails to migrate. Use maybeSingle() (or handle the error) so missing rows don’t abort the migration.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At apps/api/src/controllers/v0/admin/crawl-migration.ts, line 36:

<comment>Using .single() here throws for the normal “not found” case, so every new crawl fails to migrate. Use maybeSingle() (or handle the error) so missing rows don’t abort the migration.</comment>

<file context>
@@ -0,0 +1,118 @@
+        .from("requests")
+        .select("id")
+        .eq("id", crawlId)
+        .single();
+
+      if (existingRequest) {
</file context>
Suggested change
.single();
.maybeSingle();
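The distinction behind the suggested change, assuming supabase-js semantics: `.single()` reports zero matching rows as an error, while `.maybeSingle()` returns `data: null` with no error for the normal "not found" case. A hypothetical helper (types and names are illustrative, not the PR's code) makes the branch explicit:

```typescript
// Shape of a maybeSingle() result, simplified for illustration.
type MaybeSingleResult<T> = {
  data: T | null;
  error: { message: string } | null;
};

// Sketch: treat a null row as "not yet migrated" rather than a failure,
// and only throw on a genuine query error.
function alreadyMigrated<T>(result: MaybeSingleResult<T>): boolean {
  if (result.error) {
    // A real query failure: surface it instead of silently skipping.
    throw new Error(result.error.message);
  }
  return result.data !== null; // null data simply means "not found"
}
```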

Comment on lines +466 to 484
logMap({
id: result.job_id,
request_id: result.job_id,
url: req.body.url,
crawlerOptions: {},
scrapeOptions: {},
origin: req.body.origin ?? "api",
integration: req.body.integration,
num_tokens: 0,
credits_billed: 1,
team_id: req.auth.team_id,
options: {
search: req.body.search,
limit: req.body.limit,
ignoreSitemap: req.body.ignoreSitemap,
includeSubdomains: req.body.includeSubdomains,
filterByPath: req.body.filterByPath !== false,
useIndex: req.body.useIndex,
timeout: req.body.timeout,
location: req.body.location,
},
results: result.links,
credits_cost: 1,
zeroDataRetention: false, // not supported
});
Contributor

@cubic-dev-ai cubic-dev-ai Bot Dec 3, 2025


P2: Handle the logMap promise so failures writing map logs don’t become unhandled rejections.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At apps/api/src/controllers/v1/map.ts, line 466:

<comment>Handle the `logMap` promise so failures writing map logs don’t become unhandled rejections.</comment>

<file context>
@@ -446,22 +463,23 @@ export async function mapController(
-    time_taken: result.time_taken,
-    team_id: req.auth.team_id,
-    mode: "map",
+  logMap({
+    id: result.job_id,
+    request_id: result.job_id,
</file context>
Suggested change
logMap({
id: result.job_id,
request_id: result.job_id,
url: req.body.url,
crawlerOptions: {},
scrapeOptions: {},
origin: req.body.origin ?? "api",
integration: req.body.integration,
num_tokens: 0,
credits_billed: 1,
team_id: req.auth.team_id,
options: {
search: req.body.search,
limit: req.body.limit,
ignoreSitemap: req.body.ignoreSitemap,
includeSubdomains: req.body.includeSubdomains,
filterByPath: req.body.filterByPath !== false,
useIndex: req.body.useIndex,
timeout: req.body.timeout,
location: req.body.location,
},
results: result.links,
credits_cost: 1,
zeroDataRetention: false, // not supported
});
logMap({
id: result.job_id,
request_id: result.job_id,
url: req.body.url,
team_id: req.auth.team_id,
options: {
search: req.body.search,
limit: req.body.limit,
ignoreSitemap: req.body.ignoreSitemap,
includeSubdomains: req.body.includeSubdomains,
filterByPath: req.body.filterByPath !== false,
useIndex: req.body.useIndex,
timeout: req.body.timeout,
location: req.body.location,
},
results: result.links,
credits_cost: 1,
zeroDataRetention: false, // not supported
}).catch(error => {
logger.error(`Failed to log job for team ${req.auth.team_id}: ${error}`);
});

batch.map(async (row: { request_id: string; blob_id: string }) => {
try {
await cleanUpJob(row.blob_id);
cleanedUpRequestIds.add(row.request_id);
Contributor

@cubic-dev-ai cubic-dev-ai Bot Dec 3, 2025


P1: Requests are marked cleaned even when some of their blobs fail to delete, so dr_clean_by is cleared and the failed blobs are never retried.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At apps/api/src/controllers/v0/admin/zdrcleaner.ts, line 141:

<comment>Requests are marked cleaned even when some of their blobs fail to delete, so `dr_clean_by` is cleared and the failed blobs are never retried.</comment>

<file context>
@@ -97,13 +100,102 @@ async function cleanUp(specificTeamId: string | null, _logger: Logger) {
+          batch.map(async (row: { request_id: string; blob_id: string }) => {
+            try {
+              await cleanUpJob(row.blob_id);
+              cleanedUpRequestIds.add(row.request_id);
+            } catch (error) {
+              logger.error(`Error cleaning up blob`, {
</file context>


if (requestData && requestData.length === 1) {
const record = requestData[0];
expect(record.url).not.toContain("://"); // no url stored
Contributor

@cubic-dev-ai cubic-dev-ai Bot Dec 3, 2025


P2: batch_scrapes rows never contain a url column, so this expectation always throws and the helper can’t verify cleanup.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At apps/api/src/__tests__/snips/zdr-helpers.ts, line 117:

<comment>`batch_scrapes` rows never contain a url column, so this expectation always throws and the helper can’t verify cleanup.</comment>

<file context>
@@ -76,70 +85,130 @@ export async function getLogs() {
+  
+  if (requestData && requestData.length === 1) {
+    const record = requestData[0];
+    expect(record.url).not.toContain("://"); // no url stored
+    expect(record.kind).toBe("crawl");
+    expect(record.dr_clean_by).not.toBeNull();
</file context>
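Since `batch_scrapes` rows have no `url` column, the helper's fix is to assert the property's absence rather than match against its contents. A minimal sketch of that assertion, with jest-free plain throws and an illustrative name:

```typescript
// Sketch: verify that a cleaned-up row carries no stored URL at all,
// instead of string-matching a column that never exists on the row.
function assertNoUrlStored(record: Record<string, unknown>): void {
  if ("url" in record && record.url != null) {
    throw new Error("expected no url to be stored on the row");
  }
}
```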
