
[Fix]: Investigate Memory Leak on Verify Data Integrity #1269

Merged
nams1570 merged 6 commits into dev from inspect-verify-data-memory-leak
Mar 23, 2026

Conversation

@nams1570
Collaborator

@nams1570 nams1570 commented Mar 18, 2026

Context

We encountered an out-of-memory error when running verify-data-integrity against the prod database:
FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
This was one of the issues preventing verify-data-integrity from running successfully in prod.

Summary of Changes

Local stress testing with a constrained heap and memory telemetry showed that used heap memory grew in direct proportion to the number of API calls. Investigation revealed that the currentOutputData array grew with every API call and, because it was still being appended to, was never eligible for garbage collection. We refactored the script to drop it entirely, and reworked the two flags that depended on it so they no longer need it: --save-output now streams responses to disk as JSONL, and --verify-output now compares each response immediately and discards it.
We also noted a potential future memory leak in the allUsers array, which was populated in memory for every project. We refactored it to paginate instead. Note that this did not cause a memory leak locally; it is a preventive measure.
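The streaming half of the refactor can be sketched roughly as follows — a minimal illustration with hypothetical names (recordOutput, countByEndpoint); the real helpers live in apps/backend/scripts/verify-data-integrity/api.ts:

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Instead of pushing every API response into an ever-growing in-memory
// array, append one JSONL line per response and keep only a per-endpoint
// counter. Heap usage stays bounded no matter how many calls run.
const tmpFilePath = path.join(os.tmpdir(), "verify-output.jsonl.tmp");
fs.writeFileSync(tmpFilePath, "");
const countByEndpoint = new Map<string, number>();

function recordOutput(endpoint: string, output: unknown): void {
  fs.appendFileSync(tmpFilePath, JSON.stringify({ endpoint, output }) + "\n");
  countByEndpoint.set(endpoint, (countByEndpoint.get(endpoint) ?? 0) + 1);
}

recordOutput("/api/v1/users", { status: 200 });
recordOutput("/api/v1/users", { status: 200 });
recordOutput("/api/v1/teams", { status: 200 });
```

For --verify-output the same shape lets each response be compared against the loaded target the moment it arrives and then discarded, rather than deep-comparing two full datasets at the end.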

Out of Scope

Fetching all transactions in the payments section of the script is another potential concern, but since that section will be refactored soon, we defer the discussion.

Summary by CodeRabbit

  • New Features

    • Streaming save with safe temporary-file finalization and a completeness verification step.
    • Ability to load previously saved output data for verification.
  • Bug Fixes & Improvements

    • Memory-efficient paginated processing to avoid loading all items at once.
    • Count-based output tracking for more reliable verification.
    • Improved progress reporting and mutual-exclusion checks for save/verify operations.

@vercel

vercel Bot commented Mar 18, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
stack-auth-hosted-components Ready Ready Preview, Comment Mar 18, 2026 11:28pm
stack-backend Ready Ready Preview, Comment Mar 18, 2026 11:28pm
stack-dashboard Ready Ready Preview, Comment Mar 18, 2026 11:28pm
stack-demo Ready Ready Preview, Comment Mar 18, 2026 11:28pm
stack-docs Ready Ready Preview, Comment Mar 18, 2026 11:28pm

@coderabbitai
Contributor

coderabbitai Bot commented Mar 18, 2026

📝 Walkthrough

Walkthrough

Replace in-memory OutputData with a Map and streaming JSONL output; add loadOutputData(filePath) that reads JSONL or legacy JSON; update createApiHelpers to support streaming via outputFilePath, expose verifyOutputCompleteness and finalizeOutput; index.ts changed to use paginated user processing and the new helpers.

Changes

Cohort / File(s) Summary
Output Data Streaming & API
apps/backend/scripts/verify-data-integrity/api.ts
Convert OutputData to Map<string, EndpointOutput[]>. Add loadOutputData(filePath) that supports JSONL and legacy JSON. Replace per-endpoint arrays with counters, add outputFilePath option to createApiHelpers, stream appended JSONL lines to a temp file, and expose verifyOutputCompleteness() and finalizeOutput().
Integration, Pagination & Exports
apps/backend/scripts/verify-data-integrity/index.ts
Use loadOutputData() to load target outputs and remove deepPlainEquals. Pass outputFilePath to helpers; call verifyOutputCompleteness() and finalizeOutput() instead of deep-compare/write. Replace all-users fetch with paginated per-page/per-user processing and adapt verification to count-based logic. Also export loadOutputData.

Sequence Diagram(s)

sequenceDiagram
    participant Script as Script
    participant Helpers as createApiHelpers
    participant TempFile as TempFile (JSONL)
    participant TargetData as TargetOutputData (Map)

    Script->>Helpers: createApiHelpers({ targetOutputData, outputFilePath })
    Helpers->>TempFile: create temp file (if streaming)
    Script->>Helpers: expectStatusCode(endpoint, output)
    Helpers->>TempFile: append JSONL line `{ endpoint, output }`
    Helpers->>Helpers: increment outputCountByEndpoint[endpoint]
    Script->>Helpers: verifyOutputCompleteness()
    Helpers->>TargetData: compare observed counts vs targetOutputData counts
    Script->>Helpers: finalizeOutput()
    Helpers->>TempFile: rename temp -> final outputFilePath
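The rename step at the end of the diagram is what makes finalization safe: lines are written to a .tmp sibling, and a single rename publishes the file, so readers never observe a half-written output. A sketch with illustrative paths:

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

const finalPath = path.join(os.tmpdir(), "verify-output.jsonl");
const tmpPath = finalPath + ".tmp";

// All streaming appends target the .tmp file.
fs.writeFileSync(
  tmpPath,
  JSON.stringify({ endpoint: "/api/v1/users", output: { status: 200 } }) + "\n",
);

function finalizeOutput(): void {
  // rename() within the same filesystem is atomic: either the old final
  // file (or nothing) exists, or the complete new one does.
  fs.renameSync(tmpPath, finalPath);
}

finalizeOutput();
```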

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I nibbled lines into JSONL rows,
Counters twitch where the carrot grows,
Pages hop, users fetched with care,
Temp file renamed — completeness there,
A rabbit cheers — outputs all in rows ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 16.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title '[Fix]: Investigate Memory Leak on Verify Data Integrity' directly reflects the primary change: fixing a memory leak issue in the verify-data-integrity script through refactoring.
Description check ✅ Passed The pull request description is comprehensive and well-structured, covering context (the OOM error), summary of changes (removing currentOutputData, streaming outputs, pagination refactoring), and scope boundaries.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


This kept slowly growing the old heap, leading to a fatal error. I removed it but kept the same functionality for the two flags that depended on it, via checkpointed writes and comparisons.
We used to read allUsers for each project into memory.
For a project with many users, this array could grow unbounded.
@nams1570 nams1570 force-pushed the inspect-verify-data-memory-leak branch from caa8cfb to ad43185 on March 18, 2026 at 22:47
@nams1570 nams1570 marked this pull request as ready for review March 18, 2026 22:48
Copilot AI review requested due to automatic review settings March 18, 2026 22:48
@nams1570 nams1570 requested a review from N2D4 March 18, 2026 22:51
Contributor

Copilot AI left a comment


Pull request overview

This PR addresses an out-of-memory failure in verify-data-integrity by removing in-memory accumulation of API outputs and switching to streamed output recording and incremental verification.

Changes:

  • Replaced currentOutputData accumulation with per-endpoint counters plus optional JSONL streaming to disk for --save-output.
  • Updated --verify-output to validate outputs incrementally during execution and added a final completeness check.
  • Refactored user iteration to paginate and avoid holding all users in memory per project.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.

File Description
apps/backend/scripts/verify-data-integrity/index.ts Switches to new API helper contract, paginates user processing, and verifies output completeness at end.
apps/backend/scripts/verify-data-integrity/api.ts Adds JSONL/legacy loader, streams outputs to disk, compares outputs incrementally, and verifies expected call counts.


@greptile-apps
Contributor

greptile-apps Bot commented Mar 18, 2026

Greptile Summary

This PR resolves a JavaScript heap OOM error in the verify-data-integrity script by eliminating the in-memory accumulation of all API responses. The two main changes are: (1) JSONL streaming — instead of growing a global currentOutputData object, each API response is appended to a .tmp file as it arrives and the file is atomically finalized on success; and (2) user pagination — instead of pre-fetching all users into memory before processing, users are fetched and processed one page at a time. OutputData is now correctly typed as Map<string, EndpointOutput[]> (per the project's prototype-pollution convention), and --save-output/--verify-output are now mutually exclusive.

Key points:

  • Memory growth is now O(page) rather than O(total API calls), which addresses the reported OOM.
  • The loadOutputData helper handles both the old pretty-printed JSON format and the new JSONL format for backward compatibility.
  • A minor regression was introduced: the progress display denominator (userCount) reflects the total DB count rather than the effective cap from --max-users-per-project, which can show misleading progress (e.g., [user 5/1000000]) when the flag is set.
  • fs.appendFileSync is called synchronously on every API response in --save-output mode; for large production databases this could add measurable latency due to blocking system calls on each write.
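One possible mitigation for the blocking-write concern — not implemented in this PR, shown only as a sketch — is to batch lines in a small bounded buffer and flush every N responses, keeping memory bounded while cutting the syscall count roughly by a factor of N:

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

const outPath = path.join(os.tmpdir(), "batched-output.jsonl");
fs.writeFileSync(outPath, "");

const BATCH_SIZE = 100; // bounded buffer: at most 100 lines in memory
let buffer: string[] = [];

function record(endpoint: string, output: unknown): void {
  buffer.push(JSON.stringify({ endpoint, output }));
  if (buffer.length >= BATCH_SIZE) flush();
}

function flush(): void {
  if (buffer.length === 0) return;
  // One blocking append per batch instead of one per response.
  fs.appendFileSync(outPath, buffer.join("\n") + "\n");
  buffer = [];
}

for (let i = 0; i < 250; i++) {
  record("/api/v1/users", { i });
}
flush(); // flush the final partial batch
```

An fs.createWriteStream would achieve similar batching with kernel-level buffering, at the cost of having to await the stream's finish before the atomic rename.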

Confidence Score: 4/5

  • Safe to merge — no correctness regressions for the core verification logic; two minor issues with display accuracy and I/O performance.
  • The memory-leak fix is well-reasoned and the streaming + pagination approach is correct. The Map-based OutputData type, mutual-exclusion guard, and atomic .tmp rename all improve correctness compared to the previous code. The two flagged issues (progress denominator cap and synchronous appendFileSync) are non-blocking but worth addressing.
  • No files require special attention — both changed files are scripts with no impact on production API paths.

Important Files Changed

Filename Overview
apps/backend/scripts/verify-data-integrity/api.ts Core memory-leak fix: replaces the in-memory currentOutputData array with count-based tracking and JSONL streaming to disk. OutputData is now correctly typed as Map<string, EndpointOutput[]> per the project convention. New loadOutputData handles both legacy JSON and new JSONL format. Synchronous fs.appendFileSync per API call is a minor performance concern for large runs.
apps/backend/scripts/verify-data-integrity/index.ts Removes global currentOutputData accumulation, adds mutual-exclusion guard for --save-output/--verify-output, and converts user loading to per-page streaming. Introduces a minor regression: the progress display denominator (userCount) shows total DB users rather than the effective cap set by --max-users-per-project.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A([main]) --> B{mutually\nexclusive\ncheck}
    B -- both flags --> ERR([throw Error])
    B -- ok --> C[loadOutputData\nif --verify-output]
    C --> D[createApiHelpers\noutputFilePath if --save-output]
    D --> E[for each project]
    E --> F[fetch 3 project endpoints\nin Promise.all]
    F --> G{skipUsers?}
    G -- yes --> H[skip user loop]
    G -- no --> I[DB count for display\nuserCount]
    I --> J[while hasMore AND\nusersProcessed < max]
    J --> K[fetch page of users\nPAGE_LIMIT=1000]
    K --> L[for each user in page]
    L --> M[expectStatusCode\nappendOutputData]
    M --> N{tmpFilePath?}
    N -- yes --> O[fs.appendFileSync\none JSONL line]
    N -- no --> P[count only]
    O --> Q{targetOutputData?}
    P --> Q
    Q -- yes --> R[compare vs\ntargetOutputData count-1]
    Q -- no --> S[continue]
    R -- mismatch --> THROW([StackAssertionError])
    R -- match --> S
    L --> L2{more users\nin page?}
    L2 -- yes --> L
    L2 -- no --> J2{hasMore?}
    J2 -- yes --> J
    J2 -- no --> H
    H --> E2{more\nprojects?}
    E2 -- yes --> E
    E2 -- no --> VC[verifyOutputCompleteness\ncheck all endpoint counts]
    VC --> FO{shouldSaveOutput?}
    FO -- yes --> RENAME[finalizeOutput\nrename .tmp to final path]
    FO -- no --> END([done])
    RENAME --> END

Last reviewed commit: "chore: defensive typ..."

Comment thread apps/backend/scripts/verify-data-integrity/api.ts Outdated
Comment thread apps/backend/scripts/verify-data-integrity/api.ts Outdated
Comment thread apps/backend/scripts/verify-data-integrity/index.ts
Comment thread apps/backend/scripts/verify-data-integrity/index.ts Outdated
Can't pass --verify-output and --save-output together, else the file will be overwritten.
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (1)
apps/backend/scripts/verify-data-integrity/index.ts (1)

247-285: Encode the dynamic user/team IDs in these URLs.

Line 247, Line 256, Line 274, and Line 285 still interpolate user.id/team.id directly. The cursor is encoded already, but these values are not; reserved characters would break the request, and this bypasses the repo URL rule.

Suggested fix
-              await expectStatusCode(200, `/api/v1/users/${user.id}`, {
+              await expectStatusCode(200, `/api/v1/users/${encodeURIComponent(user.id)}`, {
@@
-              const projectPermissions = await expectStatusCode(200, `/api/v1/project-permissions?user_id=${user.id}`, {
+              const projectPermissions = await expectStatusCode(200, `/api/v1/project-permissions?user_id=${encodeURIComponent(user.id)}`, {
@@
-              const teams = await expectStatusCode(200, `/api/v1/teams?user_id=${user.id}`, {
+              const teams = await expectStatusCode(200, `/api/v1/teams?user_id=${encodeURIComponent(user.id)}`, {
@@
-                  const teamPermissions = await expectStatusCode(200, `/api/v1/team-permissions?team_id=${team.id}`, {
+                  const teamPermissions = await expectStatusCode(200, `/api/v1/team-permissions?team_id=${encodeURIComponent(team.id)}`, {

As per coding guidelines, "Use `urlString` or `encodeURIComponent()` instead of normal string interpolation for URLs for consistency."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/backend/scripts/verify-data-integrity/index.ts` around lines 247 - 285,
The URLs that interpolate dynamic IDs (e.g., `/api/v1/users/${user.id}`,
`/api/v1/project-permissions?user_id=${user.id}`,
`/api/v1/teams?user_id=${user.id}`,
`/api/v1/team-permissions?team_id=${team.id}` used with expectStatusCode) must
encode those values to avoid breaking on reserved characters; replace the plain
interpolation with the repo-standard encoder (use urlString(...) or
encodeURIComponent(...) around user.id and team.id) so the constructed request
paths are safe and conform to the URL rule while keeping the same calls to
expectStatusCode and the surrounding logic intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/backend/scripts/verify-data-integrity/api.ts`:
- Around line 29-44: Narrow the catch to only handle SyntaxError and add
defensive type checks before using the `in` operator: after parsing `firstLine`
with `JSON.parse(firstLine)` validate `parsed` is a non-null object (e.g.,
typeof parsed === "object" && parsed !== null) and that `endpoint` and `output`
exist as expected; similarly, when iterating lines parse each line into
`parsedLine`, validate it's an object and contains the `endpoint` and `output`
fields before accessing them and pushing into the `data` OutputData map; only
treat JSON parse failures (SyntaxError) as a JSONL fallback—let other errors
propagate so true bugs surface.

In `@apps/backend/scripts/verify-data-integrity/index.ts`:
- Around line 322-323: The pagination loop is using
usersPage.pagination?.next_cursor which is always undefined because the shared
CRUD pagination exposes the cursor as next_n; update the loop to read
usersPage.pagination?.next_n instead, i.e. set hasMore =
!!usersPage.pagination?.next_n and userCursor = usersPage.pagination?.next_n ??
undefined so subsequent pages are fetched past PAGE_LIMIT when more results
exist.
- Around line 218-221: Replace the non-null assertion on the result of
getPrismaClientForTenancy(tenancy) with an explicit invariant check: call
getPrismaClientForTenancy(tenancy), assign to a local prisma variable, then if
prisma is undefined throw a clear error (e.g., via throwErr(...) or an explicit
Error) that states the expectation (Prisma client must exist for tenancy.id)
before using prisma to compute userCount; update the code around the prisma and
userCount references so the count call only runs after the explicit check.

---

Nitpick comments:
In `@apps/backend/scripts/verify-data-integrity/index.ts`:
- Around line 247-285: The URLs that interpolate dynamic IDs (e.g.,
`/api/v1/users/${user.id}`, `/api/v1/project-permissions?user_id=${user.id}`,
`/api/v1/teams?user_id=${user.id}`,
`/api/v1/team-permissions?team_id=${team.id}` used with expectStatusCode) must
encode those values to avoid breaking on reserved characters; replace the plain
interpolation with the repo-standard encoder (use urlString(...) or
encodeURIComponent(...) around user.id and team.id) so the constructed request
paths are safe and conform to the URL rule while keeping the same calls to
expectStatusCode and the surrounding logic intact.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: cd8c91eb-776c-4c0e-9802-a4bdbe928dca

📥 Commits

Reviewing files that changed from the base of the PR and between 7335040 and ad43185.

📒 Files selected for processing (2)
  • apps/backend/scripts/verify-data-integrity/api.ts
  • apps/backend/scripts/verify-data-integrity/index.ts

@nams1570
Collaborator Author

@greptileai reread PR desc and see new commits? Update summary and rereview please

Contributor

@coderabbitai coderabbitai Bot left a comment


♻️ Duplicate comments (1)
apps/backend/scripts/verify-data-integrity/api.ts (1)

32-45: ⚠️ Potential issue | 🟡 Minor

Narrow the catch block and add defensive type validation for inner parsing.

This partially addresses the previous review comment but still has issues:

  1. Line 43's bare catch {} still catches all errors, not just SyntaxError. Per coding guidelines, "NEVER try-catch-all".
  2. Line 37's inner JSON.parse(line) can throw SyntaxError for malformed lines, but this would be swallowed and misclassified as "not JSONL".
  3. Line 39 uses ! non-null assertion after data.get(endpoint) - prefer defensive coding.
Suggested fix
   try {
     const parsed = JSON.parse(firstLine);
     if (typeof parsed === "object" && parsed !== null && "endpoint" in parsed && "output" in parsed) {
       for (const line of lines) {
         if (!line.trim()) continue;
         const { endpoint, output } = JSON.parse(line);
         if (!data.has(endpoint)) data.set(endpoint, []);
-        data.get(endpoint)!.push(output);
+        const arr = data.get(endpoint);
+        if (arr == null) {
+          throw new StackAssertionError(`Expected endpoint ${endpoint} to exist in data after set`);
+        }
+        arr.push(output);
       }
       return data;
     }
-  } catch {
-    // Not JSONL — fall through to legacy parse
+  } catch (error) {
+    if (!(error instanceof SyntaxError)) throw error;
+    // Not valid JSON on first line — fall through to legacy parse
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/backend/scripts/verify-data-integrity/api.ts` around lines 32 - 45, The
try-catch should only catch JSON parsing errors and you must defensively
validate each inner parsed line and avoid the non-null assertion: change the
outer catch to only handle SyntaxError from JSON.parse(firstLine); inside the
loop wrap JSON.parse(line) in its own try-catch that catches SyntaxError and
continues on malformed lines; after parsing each line, check that the result is
a non-null object and that endpoint is a string (or expected type) and output
exists before using them; replace data.get(endpoint)! usage with a safe pattern
(e.g., const bucket = data.get(endpoint) ?? []; bucket.push(output);
data.set(endpoint, bucket)) so you never rely on the non-null assertion.
🧹 Nitpick comments (1)
apps/backend/scripts/verify-data-integrity/api.ts (1)

64-70: Consider cleaning up temp file on unhandled errors.

The temp file is created at line 69 but only renamed to the final path in finalizeOutput(). If the script crashes between these points, the .tmp file will be orphaned. This is a minor operational concern.

Optional: Register cleanup handler
+  function cleanupTempFile() {
+    if (tmpFilePath && fs.existsSync(tmpFilePath)) {
+      try { fs.unlinkSync(tmpFilePath); } catch { /* ignore */ }
+    }
+  }
+
   if (tmpFilePath) {
     fs.writeFileSync(tmpFilePath, "");
+    process.on("exit", cleanupTempFile);
+    process.on("SIGINT", () => { cleanupTempFile(); process.exit(1); });
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/backend/scripts/verify-data-integrity/api.ts` around lines 64 - 70,
tmpFilePath is created but may be left behind if the script crashes before
finalizeOutput() renames it; register cleanup handlers (process.on for 'exit',
'uncaughtException', 'unhandledRejection', and SIGINT) that check tmpFilePath
and remove the file (fs.unlinkSync within try/catch) to delete the orphaned
.tmp, and ensure finalizeOutput() removes those handlers after successful rename
so you don't attempt double-deletes; reference the tmpFilePath variable and
finalizeOutput() when adding and removing the cleanup logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@apps/backend/scripts/verify-data-integrity/api.ts`:
- Around line 32-45: The try-catch should only catch JSON parsing errors and you
must defensively validate each inner parsed line and avoid the non-null
assertion: change the outer catch to only handle SyntaxError from
JSON.parse(firstLine); inside the loop wrap JSON.parse(line) in its own
try-catch that catches SyntaxError and continues on malformed lines; after
parsing each line, check that the result is a non-null object and that endpoint
is a string (or expected type) and output exists before using them; replace
data.get(endpoint)! usage with a safe pattern (e.g., const bucket =
data.get(endpoint) ?? []; bucket.push(output); data.set(endpoint, bucket)) so
you never rely on the non-null assertion.

---

Nitpick comments:
In `@apps/backend/scripts/verify-data-integrity/api.ts`:
- Around line 64-70: tmpFilePath is created but may be left behind if the script
crashes before finalizeOutput() renames it; register cleanup handlers
(process.on for 'exit', 'uncaughtException', 'unhandledRejection', and SIGINT)
that check tmpFilePath and remove the file (fs.unlinkSync within try/catch) to
delete the orphaned .tmp, and ensure finalizeOutput() removes those handlers
after successful rename so you don't attempt double-deletes; reference the
tmpFilePath variable and finalizeOutput() when adding and removing the cleanup
logic.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 40e190ea-4e5d-4cea-a8df-d8e7a78c4f30

📥 Commits

Reviewing files that changed from the base of the PR and between d45f88a and 671e337.

📒 Files selected for processing (2)
  • apps/backend/scripts/verify-data-integrity/api.ts
  • apps/backend/scripts/verify-data-integrity/index.ts

Contributor

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (3)
apps/backend/scripts/verify-data-integrity/index.ts (3)

250-251: Encode dynamic URL IDs consistently.

Dynamic IDs are interpolated directly into path/query strings in these calls. Please encode them for consistency and safety.

🔧 Proposed change
-              await expectStatusCode(200, `/api/v1/users/${user.id}`, {
+              await expectStatusCode(200, `/api/v1/users/${encodeURIComponent(user.id)}`, {
@@
-              const projectPermissions = await expectStatusCode(200, `/api/v1/project-permissions?user_id=${user.id}`, {
+              const projectPermissions = await expectStatusCode(200, `/api/v1/project-permissions?user_id=${encodeURIComponent(user.id)}`, {
@@
-              const teams = await expectStatusCode(200, `/api/v1/teams?user_id=${user.id}`, {
+              const teams = await expectStatusCode(200, `/api/v1/teams?user_id=${encodeURIComponent(user.id)}`, {
@@
-                  const teamPermissions = await expectStatusCode(200, `/api/v1/team-permissions?team_id=${team.id}`, {
+                  const teamPermissions = await expectStatusCode(200, `/api/v1/team-permissions?team_id=${encodeURIComponent(team.id)}`, {

As per coding guidelines, "Use `urlString` or `encodeURIComponent()` instead of normal string interpolation for URLs for consistency."

Also applies to: 259-260, 277-278, 288-289

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/backend/scripts/verify-data-integrity/index.ts` around lines 250 - 251,
The test calls that interpolate dynamic IDs (e.g., the expectStatusCode calls
that build `/api/v1/users/${user.id}` and the similar calls at the other noted
locations) should encode those IDs before inserting into the path; update the
string construction to use a safe encoder (e.g., wrap the ID with
encodeURIComponent or use the project's urlString helper) so the endpoint path
becomes `/api/v1/users/${encodeURIComponent(user.id)}` (and similarly for the
other calls at 259-260, 277-278, 288-289) to ensure consistent, safe URL
encoding when calling expectStatusCode.

236-237: Prefer explicit null checks for cursor state transitions.

The cursor checks currently rely on truthiness/boolean coercion. Using != null keeps the intent explicit and avoids conflating nullability with empty-string truthiness.

🔍 Proposed change
-          const cursorParam: string = userCursor ? `&cursor=${encodeURIComponent(userCursor)}` : "";
+          const cursorParam: string = userCursor != null ? `&cursor=${encodeURIComponent(userCursor)}` : "";
@@
-          hasMore = !!usersPage.pagination?.next_cursor;
+          hasMore = usersPage.pagination?.next_cursor != null;
           userCursor = usersPage.pagination?.next_cursor ?? undefined;

As per coding guidelines, "Unless very clearly equivalent from types, prefer explicit null/undefinedness checks over boolean checks, e.g., foo == null instead of !foo."

Also applies to: 325-326

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/backend/scripts/verify-data-integrity/index.ts` around lines 236 - 237,
The cursor construction and conditional requests use truthiness on userCursor
which can conflate empty string with null/undefined; update the checks to use
explicit null/undefined comparisons (e.g., userCursor != null) when building
cursorParam and where userCursor is tested later (references: cursorParam,
userCursor and the call to expectStatusCode(`/api/v1/users?...`)), so that only
null/undefined are treated as “no cursor” and empty strings are preserved if
they are valid values.

267-300: Avoid repeated linear permission-definition scans in hot loops.

Both permission checks currently run .some(...) for every permission item. Building Sets once per project gives O(1) membership checks and reduces work on large projects.

♻️ Proposed refactor
+      const projectPermissionDefinitionIds = new Set(
+        projectPermissionDefinitions.items.map((p: any) => p.id),
+      );
+      const teamPermissionDefinitionIds = new Set(
+        teamPermissionDefinitions.items.map((p: any) => p.id),
+      );

               const projectPermissions = await expectStatusCode(200, `/api/v1/project-permissions?user_id=${user.id}`, {
@@
-              for (const projectPermission of projectPermissions.items) {
-                // `any` because these endpoint response types aren't imported here,
-                // and this script is intentionally tolerant of response shape changes.
-                if (!projectPermissionDefinitions.items.some((p: any) => p.id === projectPermission.id)) {
+              for (const projectPermission of projectPermissions.items) {
+                if (!projectPermissionDefinitionIds.has(projectPermission.id)) {
                   throw new StackAssertionError(deindent`
                       Project permission ${projectPermission.id} not found in project permission definitions.
                     `);
                 }
               }
@@
-                  for (const teamPermission of teamPermissions.items) {
-                    // `any` because these endpoint response types aren't imported here,
-                    // and this script is intentionally tolerant of response shape changes.
-                    if (!teamPermissionDefinitions.items.some((p: any) => p.id === teamPermission.id)) {
+                  for (const teamPermission of teamPermissions.items) {
+                    if (!teamPermissionDefinitionIds.has(teamPermission.id)) {
                       throw new StackAssertionError(deindent`
                           Team permission ${teamPermission.id} not found in team permission definitions.
                         `);
                     }
                   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/backend/scripts/verify-data-integrity/index.ts` around lines 267 - 300,
The permission checks inside the project and team loops currently call
projectPermissionDefinitions.items.some(...) and
teamPermissionDefinitions.items.some(...) for every permission, causing repeated
linear scans; fix this by creating Sets of definition ids once before the loops
(e.g., const projectPermissionDefIds = new
Set(projectPermissionDefinitions.items.map((p:any) => p.id)) and const
teamPermissionDefIds = new Set(teamPermissionDefinitions.items.map((p:any) =>
p.id))) and then replace the per-item .some checks with O(1) membership checks
(projectPermissionDefIds.has(projectPermission.id) and
teamPermissionDefIds.has(teamPermission.id)) in the loops that iterate
projectPermissions.items and teamPermissions.items respectively.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@apps/backend/scripts/verify-data-integrity/index.ts`:
- Around line 250-251: The test calls that interpolate dynamic IDs (e.g., the
expectStatusCode calls that build `/api/v1/users/${user.id}` and the similar
calls at the other noted locations) should encode those IDs before inserting
into the path; update the string construction to use a safe encoder (e.g., wrap
the ID with encodeURIComponent or use the project's urlString helper) so the
endpoint path becomes `/api/v1/users/${encodeURIComponent(user.id)}` (and
similarly for the other calls at 259-260, 277-278, 288-289) to ensure
consistent, safe URL encoding when calling expectStatusCode.
- Around line 236-237: The cursor construction and conditional requests use
truthiness on userCursor which can conflate empty string with null/undefined;
update the checks to use explicit null/undefined comparisons (e.g., userCursor
!= null) when building cursorParam and where userCursor is tested later
(references: cursorParam, userCursor and the call to
expectStatusCode(`/api/v1/users?...`)), so that only null/undefined are treated
as “no cursor” and empty strings are preserved if they are valid values.
- Around line 267-300: The permission checks inside the project and team loops
currently call projectPermissionDefinitions.items.some(...) and
teamPermissionDefinitions.items.some(...) for every permission, causing repeated
linear scans; fix this by creating Sets of definition ids once before the loops
(e.g., const projectPermissionDefIds = new
Set(projectPermissionDefinitions.items.map((p:any) => p.id)) and const
teamPermissionDefIds = new Set(teamPermissionDefinitions.items.map((p:any) =>
p.id))) and then replace the per-item .some checks with O(1) membership checks
(projectPermissionDefIds.has(projectPermission.id) and
teamPermissionDefIds.has(teamPermission.id)) in the loops that iterate
projectPermissions.items and teamPermissions.items respectively.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e61c688c-3b16-4ed1-9ea2-f57352b941aa

📥 Commits

Reviewing files that changed from the base of the PR and between 671e337 and 8773d61.

📒 Files selected for processing (1)
  • apps/backend/scripts/verify-data-integrity/index.ts

@nams1570 nams1570 merged commit 1d00ed2 into dev Mar 23, 2026
36 checks passed
@nams1570 nams1570 deleted the inspect-verify-data-memory-leak branch March 23, 2026 15:55