
feat(api): migrate GET /api/apify/runs/{runId} #463

Merged
sweetmantech merged 7 commits into test from feat/migrate-apify-scraper
Apr 23, 2026

Conversation

@arpitgupta1214 (Collaborator) commented Apr 21, 2026

Ports the Apify run-status endpoint to GET /api/apify/runs/{runId} using the Apify SDK; response renames datasetId to dataset_id (snake_case). Errors now surface as 500 rather than being masked as RUNNING with empty dataset. Auth required; no per-account access check since runId is not an account-scoped resource.

Test plan

  • Preview: GET /api/apify/runs/{runId} with x-api-key returns 200 with status and dataset_id
  • Preview: no auth header returns 401
  • Preview: FAILED / ABORTED / missing dataset returns 500

Summary by CodeRabbit

  • New Features
    • New API endpoint for retrieving scraper run status and dataset results
    • Request validation and authentication checks included
    • CORS support for cross-origin API calls
    • Returns run status and dataset items upon successful completion

Ports the Apify run-status endpoint from the legacy Express service into
mono/api as a RESTful Next.js route. Uses the Apify SDK (not raw fetch)
to match sibling start-scrape helpers. Wire format renames
datasetId -> dataset_id (snake_case). Auth is required via
validateAuthContext; no per-account access check (runId is an
Apify-scoped identifier, not user-scoped). Does not preserve the legacy
silent-error-to-RUNNING behaviour; errors propagate to a clean 500.

Row 27 of the Agent API migration.
@vercel bot (Contributor) commented Apr 21, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
api Ready Ready Preview Apr 23, 2026 10:12pm


@coderabbitai (Bot) commented Apr 21, 2026

Warning

Rate limit exceeded

@sweetmantech has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 57 minutes and 50 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 57 minutes and 50 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c4059bbd-fe37-42c9-a8af-f506a1b17c30

📥 Commits

Reviewing files that changed from the base of the PR and between beada44 and 27c470d.

⛔ Files ignored due to path filters (1)
  • lib/apify/__tests__/validateGetScraperResultsRequest.test.ts is excluded by !**/*.test.*, !**/__tests__/** and included by lib/**
📒 Files selected for processing (2)
  • lib/apify/getScraperResultsHandler.ts
  • lib/apify/validateGetScraperResultsRequest.ts
📝 Walkthrough


A new GET API endpoint is introduced at /api/apify/runs/[runId] to retrieve Apify scraper run status and dataset results. The implementation includes request validation with authentication checks, CORS support, and conditional response handling based on run status.

Changes

Cohort / File(s) Summary
API Route Handler
app/api/apify/runs/[runId]/route.ts
Establishes a dynamic Next.js route with 30-second max duration, implements OPTIONS handler for CORS preflight, and delegates GET requests to handler business logic via route parameter extraction.
Request Validation
lib/apify/validateGetScraperResultsRequest.ts
Introduces Zod-based schema validation for runId parameter and enforces authentication via validateAuthContext, returning typed results or error responses with CORS headers.
Response Handler
lib/apify/getScraperResultsHandler.ts
Implements core business logic to fetch Apify run data, conditionally retrieve dataset items based on status, and construct typed JSON responses with appropriate HTTP status codes and error handling.

Sequence Diagram

sequenceDiagram
    actor Client
    participant Route as Route Handler<br/>/api/apify/runs/[runId]
    participant Validator as validateGetScraperResultsRequest
    participant Handler as getScraperResultsHandler
    participant Apify as Apify Client
    
    Client->>Route: GET /api/apify/runs/[runId]
    Route->>Validator: validateGetScraperResultsRequest(request, runId)
    Validator->>Validator: Parse & validate runId (Zod)
    Validator->>Validator: validateAuthContext(request)
    alt Validation or Auth Failed
        Validator-->>Route: NextResponse (400/403)
    else Success
        Validator-->>Route: { runId }
    end
    
    alt Validation Passed
        Route->>Handler: getScraperResultsHandler(request, runId)
        Handler->>Apify: apifyClient.run(runId).get()
        Apify-->>Handler: Run data (status, defaultDatasetId)
        
        alt Status is SUCCEEDED & dataset_id exists
            Handler->>Apify: apifyClient.dataset(dataset_id).listItems()
            Apify-->>Handler: Dataset items
            Handler-->>Route: 200 { status, dataset_id, data }
        else Status is FAILED or ABORTED
            Handler-->>Route: 500 { status, dataset_id }
        else Status is SUCCEEDED (no dataset)
            Handler-->>Route: 500 { status, dataset_id }
        else Any other status
            Handler-->>Route: 200 { status, dataset_id }
        end
    end
    
    Route-->>Client: JSON Response + CORS Headers
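The status branching in the diagram above can be sketched as a small pure helper. The helper name (`mapRunToResponse`) and the plain-object return shape are illustrative stand-ins, not the PR's actual code, which returns `NextResponse` objects with CORS headers:

```typescript
// Illustrative sketch of the branching shown in the sequence diagram.
// mapRunToResponse and the { httpStatus, body } shape are assumptions;
// the real handler builds NextResponse objects with CORS headers.
type RunInfo = { status: string; dataset_id: string | null };

function mapRunToResponse(
  run: RunInfo,
  data?: unknown[],
): { httpStatus: number; body: Record<string, unknown> } {
  const { status, dataset_id } = run;
  if (status === "SUCCEEDED" && dataset_id && data) {
    // Happy path: run finished and dataset items were fetched.
    return { httpStatus: 200, body: { status, dataset_id, data } };
  }
  if (status === "FAILED" || status === "ABORTED") {
    return { httpStatus: 500, body: { status, dataset_id } };
  }
  if (status === "SUCCEEDED") {
    // Succeeded but no dataset to return: surfaced as a server error.
    return { httpStatus: 500, body: { status, dataset_id } };
  }
  // Any other status (e.g. RUNNING) is reported with a 200.
  return { httpStatus: 200, body: { status, dataset_id } };
}
```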

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • sweetmantech

Poem

🔗 New endpoint takes its stage,
Apify runs on every page,
Status checks with graceful care,
CORS headers floating in the air—
Validation guards the gate so tight,
Results flow forth, a shining light! ✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Solid & Clean Code ✅ Passed Pull request demonstrates strong adherence to SOLID principles with clear separation of concerns across routing, validation, and business logic layers.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/migrate-apify-scraper

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@arpitgupta1214 (Collaborator, Author) commented:

Preview smoke: api-git-feat-migrate-apify-scraper-recoupable-ad724970.vercel.app

Request Expected Got
GET /api/apify/runs/bogus-runid-123 (no header) 401 401 {"status":"error","error":"Exactly one of x-api-key or Authorization must be provided"}
GET /api/apify/runs/bogus-runid-123 with x-api-key: $RECOUP_TEST_API_KEY 200 snake_case dataset_id 200 {"status":"UNKNOWN","dataset_id":null}

dataset_id is snake_case as specified. UNKNOWN is the SDK's response for a non-existent run (apifyClient.run(bogus).get() resolves to undefined, which the helper maps to {status: "UNKNOWN", dataset_id: null}).

The recoup-api-*.vercel.app alias is not wired for this preview branch (404 DEPLOYMENT_NOT_FOUND); only the api-*.vercel.app preview host exists at PR time. The post-merge promotion to test will republish to both production aliases.

@coderabbitai (Bot) left a comment

Actionable comments posted: 5

🧹 Nitpick comments (2)
lib/apify/getScraperResultsHandler.ts (1)

27-63: Split the status response branching into a small helper.

getScraperResultsHandler exceeds the 20-line guideline and currently handles validation, Apify orchestration, dataset fetch, and response mapping in one function. Extracting the status-to-response branch would keep the route orchestration easier to maintain.

As per coding guidelines, **/*.{js,ts,tsx,jsx,py,java,cs,go,rb,php}: “Flag functions longer than 20 lines”.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/apify/getScraperResultsHandler.ts` around lines 27 - 63, The function
getScraperResultsHandler is doing both orchestration and response-mapping;
extract the status-to-response branching into a small helper (e.g.,
mapActorStatusToResponse or buildScraperResponse) that takes the actor status
object ({ status, dataset_id }), plus optional data (from getDataset) and
getCorsHeaders(), and returns the proper NextResponse for each branch (SUCCEEDED
with/without dataset_id or data, FAILED/ABORTED -> 500, other -> 200). In
practice keep validateGetScraperResultsRequest, getActorStatus and getDataset
calls in getScraperResultsHandler but delegate all conditional logic that
inspects status and dataset_id to the new helper (reference symbols:
getScraperResultsHandler, validateGetScraperResultsRequest, getActorStatus,
getDataset, getCorsHeaders); ensure the new helper returns NextResponse and
replace the inline branching with a single call to it.
lib/apify/validateGetScraperResultsRequest.ts (1)

6-32: Export the actual Zod schema, not just the shape.

getScraperResultsParamsSchema is named/exported as a schema, but Line 6 exports only the object shape. This makes the exported API less reusable and forces Line 32 to recreate the schema. Export the z.object(...) directly and infer from it.

♻️ Proposed cleanup
-export const getScraperResultsParamsSchema = {
+export const getScraperResultsParamsSchema = z.object({
   runId: z.string().min(1).describe("The Apify run identifier from the URL path."),
-};
+});
 
-export type GetScraperResultsParams = z.infer<z.ZodObject<typeof getScraperResultsParamsSchema>>;
+export type GetScraperResultsParams = z.infer<typeof getScraperResultsParamsSchema>;
@@
-  const parsed = z.object(getScraperResultsParamsSchema).safeParse({ runId });
+  const parsed = getScraperResultsParamsSchema.safeParse({ runId });

As per coding guidelines, lib/**/validate*.ts: “Create validate functions in validate<EndpointName>Body.ts or validate<EndpointName>Query.ts files that export both the schema and inferred TypeScript type”.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/apify/validateGetScraperResultsRequest.ts` around lines 6 - 32,
getScraperResultsParamsSchema currently exports a plain object shape instead of
a Zod schema which forces recreate of the schema in
validateGetScraperResultsRequest; change getScraperResultsParamsSchema to export
the actual Zod object (e.g. const getScraperResultsParamsSchema = z.object({
runId: z.string().min(1).describe(...) })) and update GetScraperResultsParams to
infer from z.infer<typeof getScraperResultsParamsSchema>, then in
validateGetScraperResultsRequest use getScraperResultsParamsSchema.safeParse({
runId }) (and remove the inline z.object(...) there) so the single exported
schema is reused across the file and by callers.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `app/api/apify/runs/[runId]/route.ts`:
- Around line 13-36: The route lacks route-level tests for the OPTIONS preflight
and the async GET params resolution; add tests that call the exported OPTIONS
function and assert it returns status 200 with headers from getCorsHeaders(),
and add a test that invokes the exported GET function with a mock NextRequest
and a promise-based params object (resolving to { runId }) to ensure the async
params are awaited and that GET delegates to getScraperResultsHandler with the
resolved runId; reference the exported functions OPTIONS and GET and the handler
getScraperResultsHandler (and helper getCorsHeaders) when locating the code to
test.

In `@lib/apify/getActorStatus.ts`:
- Around line 20-25: The getActorStatus function silently returns a fallback
"UNKNOWN" and null dataset when apifyClient.run(...).get() returns undefined;
update getActorStatus to guard for a missing run (the local variable run from
apifyClient.run(runId).get()) and throw an Error (or propagate a descriptive
error) instead of returning a default status, removing the ?? "UNKNOWN" and ??
null fallbacks; update callers/tests by adding a unit/integration test that
simulates apifyClient.run(...).get() returning undefined and asserts that
getActorStatus throws so this regression cannot recur.

In `@lib/apify/getDataset.ts`:
- Around line 12-15: getDataset currently calls
apifyClient.dataset(datasetId).listItems() once and returns only the first page
(default 1000 items); change getDataset to page through results by calling
listItems repeatedly with a page limit (e.g., 1000) and an increasing offset (or
using the API's pagination token) until you've collected result.total items (or
a page returns no items). Accumulate items into an array and return the full
array (or null if initial call fails); update references to
apifyClient.dataset(...).listItems and the getDataset function to implement this
loop and respect result.total and per-page result.items.

In `@lib/apify/getScraperResultsHandler.ts`:
- Around line 64-65: The catch block inside getScraperResultsHandler currently
logs the raw caught value (error); change it to log a sanitized representation
instead: extract and log only safe fields such as error.name, error.message, and
a truncated error.stack (or omit stack in production), and if the error looks
like an HTTP/axios error (presence of config/headers/request/response), remove
or redact sensitive subfields (headers, authorization tokens, cookies, and full
request config) before logging; ensure the symbol getScraperResultsHandler's
catch uses this sanitized object and avoid logging the original error variable
directly.

In `@lib/apify/validateGetScraperResultsRequest.ts`:
- Around line 16-19: Update the validator comment to use account-scoped
terminology: replace "user-" with "account-" (and any other "user"/"entity"
occurrences) so it reads that a `runId` is an Apify-scoped identifier, not an
account- or artist-scoped resource; ensure the doc block in
validateGetScraperResultsRequest.ts consistently uses "account" (or specific
terms like "artist", "workspace", "organization" if applicable) to follow repo
guidelines.

---

Nitpick comments:
In `@lib/apify/getScraperResultsHandler.ts`:
- Around line 27-63: The function getScraperResultsHandler is doing both
orchestration and response-mapping; extract the status-to-response branching
into a small helper (e.g., mapActorStatusToResponse or buildScraperResponse)
that takes the actor status object ({ status, dataset_id }), plus optional data
(from getDataset) and getCorsHeaders(), and returns the proper NextResponse for
each branch (SUCCEEDED with/without dataset_id or data, FAILED/ABORTED -> 500,
other -> 200). In practice keep validateGetScraperResultsRequest, getActorStatus
and getDataset calls in getScraperResultsHandler but delegate all conditional
logic that inspects status and dataset_id to the new helper (reference symbols:
getScraperResultsHandler, validateGetScraperResultsRequest, getActorStatus,
getDataset, getCorsHeaders); ensure the new helper returns NextResponse and
replace the inline branching with a single call to it.

In `@lib/apify/validateGetScraperResultsRequest.ts`:
- Around line 6-32: getScraperResultsParamsSchema currently exports a plain
object shape instead of a Zod schema which forces recreate of the schema in
validateGetScraperResultsRequest; change getScraperResultsParamsSchema to export
the actual Zod object (e.g. const getScraperResultsParamsSchema = z.object({
runId: z.string().min(1).describe(...) })) and update GetScraperResultsParams to
infer from z.infer<typeof getScraperResultsParamsSchema>, then in
validateGetScraperResultsRequest use getScraperResultsParamsSchema.safeParse({
runId }) (and remove the inline z.object(...) there) so the single exported
schema is reused across the file and by callers.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f0b9d110-1bc1-4f1e-b469-a905e0195ace

📥 Commits

Reviewing files that changed from the base of the PR and between f276a5a and 37d05a6.

⛔ Files ignored due to path filters (4)
  • lib/apify/__tests__/getActorStatus.test.ts is excluded by !**/*.test.*, !**/__tests__/** and included by lib/**
  • lib/apify/__tests__/getDataset.test.ts is excluded by !**/*.test.*, !**/__tests__/** and included by lib/**
  • lib/apify/__tests__/getScraperResultsHandler.test.ts is excluded by !**/*.test.*, !**/__tests__/** and included by lib/**
  • lib/apify/__tests__/validateGetScraperResultsRequest.test.ts is excluded by !**/*.test.*, !**/__tests__/** and included by lib/**
📒 Files selected for processing (5)
  • app/api/apify/runs/[runId]/route.ts
  • lib/apify/getActorStatus.ts
  • lib/apify/getDataset.ts
  • lib/apify/getScraperResultsHandler.ts
  • lib/apify/validateGetScraperResultsRequest.ts

Comment on lines +13 to +36
export async function OPTIONS() {
  return new NextResponse(null, {
    status: 200,
    headers: getCorsHeaders(),
  });
}

/**
 * GET /api/apify/runs/{runId}
 *
 * Returns the status (and, on SUCCEEDED, the dataset items) of an Apify actor
 * run. Authentication is required via `x-api-key` or `Authorization: Bearer`.
 *
 * @param request - The incoming request.
 * @param options - Route options containing params.
 * @param options.params - Route params containing the Apify `runId`.
 * @returns A NextResponse with `{ status, dataset_id, data? }` shape.
 */
export async function GET(
  request: NextRequest,
  { params }: { params: Promise<{ runId: string }> },
) {
  const { runId } = await params;
  return getScraperResultsHandler(request, runId);

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Search for tests that exercise the new /api/apify/runs route surface.
# Expectation: Tests should cover OPTIONS and GET delegation/async params behavior.

rg -nP 'api/apify/runs|\bOPTIONS\s*\(|\bGET\s*\(' --glob '*.{test,spec}.{ts,tsx}' -C 3

Repository: recoupable/api

Length of output: 9747


🏁 Script executed:

# Check if route-level test file exists for the apify runs endpoint
find . -path "*/api/apify/runs*/__tests__/route*" -o -path "*/api/apify/runs*/*.test.ts" 2>/dev/null | head -20

Repository: recoupable/api

Length of output: 40


🏁 Script executed:

# Also check the directory structure of the apify runs endpoint
ls -la app/api/apify/runs/\[runId\]/ 2>/dev/null || find app/api/apify -type d -name "runs" | xargs ls -la 2>/dev/null | head -30

Repository: recoupable/api

Length of output: 226


Add route-level tests for OPTIONS and async GET parameter handling.

This endpoint lacks route-level test coverage. Handler tests cover the business logic, but the route surface—CORS preflight via OPTIONS and async parameter resolution—should have dedicated tests per project pattern. Similar routes (e.g., app/api/admins/sandboxes) include tests for both. Add tests covering the OPTIONS response status and CORS headers, and verify the async params resolution is exercised before delegation to the handler.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `app/api/apify/runs/[runId]/route.ts` around lines 13 - 36, The route lacks
route-level tests for the OPTIONS preflight and the async GET params resolution;
add tests that call the exported OPTIONS function and assert it returns status
200 with headers from getCorsHeaders(), and add a test that invokes the exported
GET function with a mock NextRequest and a promise-based params object
(resolving to { runId }) to ensure the async params are awaited and that GET
delegates to getScraperResultsHandler with the resolved runId; reference the
exported functions OPTIONS and GET and the handler getScraperResultsHandler (and
helper getCorsHeaders) when locating the code to test.
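As a rough shape for that kind of route-level test, the checks could look like the following. The `OPTIONS`/`GET` stand-ins here are hypothetical inline mirrors of the route's surface; a real test would import them from the route module and mock `getScraperResultsHandler`:

```typescript
// Hypothetical stand-ins mirroring the route's surface. A real test would
// import OPTIONS and GET from app/api/apify/runs/[runId]/route.ts instead.
const getCorsHeaders = () => ({ "Access-Control-Allow-Origin": "*" });

async function OPTIONS() {
  return { status: 200, headers: getCorsHeaders() };
}

async function GET(
  _request: unknown,
  { params }: { params: Promise<{ runId: string }> },
) {
  // Next.js 15-style async params must be awaited before use.
  const { runId } = await params;
  // Delegation stands in for getScraperResultsHandler(request, runId).
  return { status: 200, body: { runId } };
}

async function runRouteChecks() {
  const preflight = await OPTIONS();
  if (preflight.status !== 200) throw new Error("OPTIONS should return 200");
  if (!preflight.headers["Access-Control-Allow-Origin"]) {
    throw new Error("OPTIONS should carry CORS headers");
  }
  const res = await GET(undefined, {
    params: Promise.resolve({ runId: "run-1" }),
  });
  if (res.body.runId !== "run-1") {
    throw new Error("GET should await params and pass runId through");
  }
}
```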

Comment thread lib/apify/getActorStatus.ts Outdated
Comment on lines +20 to +25
export async function getActorStatus(runId: string) {
  const run = await apifyClient.run(runId).get();
  return {
    status: run?.status ?? "UNKNOWN",
    dataset_id: run?.defaultDatasetId ?? null,
  };

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

Apify JavaScript client run(runId).get() behavior when the run ID does not exist: does it return undefined/null or throw?

💡 Result:

The Apify JavaScript client method run(runId).get returns undefined when the run ID does not exist, rather than throwing an error.



🏁 Script executed:

# Check if there are other related files that handle errors from apifyClient calls
rg "apifyClient" --type ts --type tsx -B 2 -A 2 | head -50

Repository: recoupable/api

Length of output: 85


🏁 Script executed:

# Search for how apifyClient is used and error handling patterns
rg "apifyClient" -A 3 -B 1

Repository: recoupable/api

Length of output: 8389


🏁 Script executed:

# Check the test file to see expected behavior
cat -n lib/apify/__tests__/getActorStatus.test.ts

Repository: recoupable/api

Length of output: 1494


🏁 Script executed:

# Check the current implementation of getActorStatus.ts
cat -n lib/apify/getActorStatus.ts

Repository: recoupable/api

Length of output: 1339


Align with documented error-propagation design: throw on missing Apify runs.

Lines 23-24 return "UNKNOWN" for undefined runs, contradicting the JSDoc (lines 8-14) which explicitly states errors should propagate so handlers respond 500 cleanly. The SDK returns undefined for nonexistent run IDs; silently converting that to "UNKNOWN" masks missing runs as valid states, defeating the design goal of distinguishing real statuses from upstream outages.

Add a guard to throw when the run is missing, and remove the fallback:

Proposed fix
 export async function getActorStatus(runId: string) {
   const run = await apifyClient.run(runId).get();
+  if (!run) {
+    throw new Error(`Apify run not found: ${runId}`);
+  }
   return {
-    status: run?.status ?? "UNKNOWN",
-    dataset_id: run?.defaultDatasetId ?? null,
+    status: run.status,
+    dataset_id: run.defaultDatasetId ?? null,
   };
 }

Also add a test case for the undefined run scenario to prevent regression.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
export async function getActorStatus(runId: string) {
  const run = await apifyClient.run(runId).get();
  if (!run) {
    throw new Error(`Apify run not found: ${runId}`);
  }
  return {
    status: run.status,
    dataset_id: run.defaultDatasetId ?? null,
  };
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/apify/getActorStatus.ts` around lines 20 - 25, The getActorStatus
function silently returns a fallback "UNKNOWN" and null dataset when
apifyClient.run(...).get() returns undefined; update getActorStatus to guard for
a missing run (the local variable run from apifyClient.run(runId).get()) and
throw an Error (or propagate a descriptive error) instead of returning a default
status, removing the ?? "UNKNOWN" and ?? null fallbacks; update callers/tests by
adding a unit/integration test that simulates apifyClient.run(...).get()
returning undefined and asserts that getActorStatus throws so this regression
cannot recur.

Comment thread lib/apify/getDataset.ts Outdated
Comment on lines +12 to +15
export async function getDataset(datasetId: string): Promise<unknown[] | null> {
  const result = await apifyClient.dataset(datasetId).listItems();
  if (!result) return null;
  return result.items as unknown[];

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

Apify JavaScript client DatasetClient listItems default limit pagination result total offset limit

💡 Result:

The Apify JavaScript API client DatasetClient.listItems method lists items from a dataset using offset-based pagination. It returns a PaginatedList (or PaginatedIterator) object with these key properties: - items: Array of dataset items - total: Total number of items available in the dataset - offset: Starting position of the current page (default: 0) - count: Number of items returned in this response - limit: Number of items requested per page Parameters (from DatasetClientListItemOptions): - offset (number, optional): Position of first returned entry (default: 0) - limit (number, optional): Maximum number of entries requested. Default/maximum not explicitly stated in docs but API endpoint limits to 1000. - chunkSize (number, optional): For async iteration, max items per API call (respects API limits) - desc (boolean): Sort in descending order - Other filters: fields, clean, skipEmpty, etc. Examples: const { items, total } = await datasetClient.listItems({ limit: 100 }); const { items } = await datasetClient.listItems({ offset: 100, limit: 50, desc: true }); For full iteration: for await (const item of datasetClient.listItems({ limit: 1000 })) { // process item } Default limit is unspecified but practically 1000 max per call per API docs. Use offset/limit for pagination, total for knowing when to stop (offset + count >= total).



🏁 Script executed:

# Find and check package.json for Apify SDK version
fd -e package.json | head -5 | xargs -I {} grep -l "apify" {} 2>/dev/null || grep -r "apify" package.json 2>/dev/null | head -10

Repository: recoupable/api

Length of output: 40


🏁 Script executed:

# Also check if package-lock.json exists for exact version
git ls-files | grep -E "package(-lock)?\.json" | head -3

Repository: recoupable/api

Length of output: 70


🏁 Script executed:

# Search for usage of getDataset throughout the codebase
rg "getDataset" --type ts --type tsx -B 2 -A 2

Repository: recoupable/api

Length of output: 85


🏁 Script executed:

# Check if there are other pagination patterns in lib/apify/
fd . lib/apify/ -e ts -e tsx | head -10 | xargs -I {} sh -c 'echo "=== {} ===" && cat -n {}'

Repository: recoupable/api

Length of output: 16869


🏁 Script executed:

cat package.json | grep -A 2 -B 2 "apify"

Repository: recoupable/api

Length of output: 204


Add pagination loop to fetch all dataset items beyond the 1000-item default limit.

The Apify SDK's listItems() defaults to limit 1000, returning only the first page. For datasets with >1000 items, this silently truncates results without error. The response includes a total field indicating available items—checking this reveals the truncation risk.

Implement pagination to accumulate all items across pages:

♻️ Pagination fix
 export async function getDataset(datasetId: string): Promise<unknown[] | null> {
-  const result = await apifyClient.dataset(datasetId).listItems();
-  if (!result) return null;
-  return result.items as unknown[];
+  const items: unknown[] = [];
+  const limit = 1000;
+  let offset = 0;
+
+  while (true) {
+    const result = await apifyClient.dataset(datasetId).listItems({ offset, limit });
+    if (!result) return null;
+
+    items.push(...(result.items as unknown[]));
+
+    offset += result.items.length;
+    if (result.items.length === 0 || offset >= result.total) break;
+  }
+
+  return items;
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/apify/getDataset.ts` around lines 12 - 15, getDataset currently calls
apifyClient.dataset(datasetId).listItems() once and returns only the first page
(default 1000 items); change getDataset to page through results by calling
listItems repeatedly with a page limit (e.g., 1000) and an increasing offset (or
using the API's pagination token) until you've collected result.total items (or
a page returns no items). Accumulate items into an array and return the full
array (or null if initial call fails); update references to
apifyClient.dataset(...).listItems and the getDataset function to implement this
loop and respect result.total and per-page result.items.
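Run against a fake paginated client (the 2-item page size and `fakeDatasetClient` are illustrative, not the real SDK), the loop described above behaves like this:

```typescript
// Fake dataset client paging a 5-item dataset in pages of 2, mimicking the
// items/total/offset shape of the Apify SDK's listItems() result.
const allItems = ["a", "b", "c", "d", "e"];

const fakeDatasetClient = {
  listItems: async ({ offset, limit }: { offset: number; limit: number }) => ({
    items: allItems.slice(offset, offset + limit),
    total: allItems.length,
    offset,
  }),
};

// Pagination loop per the review suggestion: accumulate pages until the
// offset reaches total or a page comes back empty.
async function getAllItems(limit = 2): Promise<unknown[]> {
  const items: unknown[] = [];
  let offset = 0;
  while (true) {
    const page = await fakeDatasetClient.listItems({ offset, limit });
    items.push(...page.items);
    offset += page.items.length;
    if (page.items.length === 0 || offset >= page.total) break;
  }
  return items;
}
```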

Comment thread lib/apify/getScraperResultsHandler.ts Outdated
Comment on lines +64 to +65
  } catch (error) {
    console.error("[ERROR] getScraperResultsHandler error:", error);

⚠️ Potential issue | 🟠 Major

Avoid logging raw caught errors.

Line 65 logs the full thrown value. Upstream/auth/client errors can carry headers, tokens, request config, or other sensitive metadata. Log sanitized fields instead.

🛡️ Proposed sanitized logging
   } catch (error) {
-    console.error("[ERROR] getScraperResultsHandler error:", error);
+    const message = error instanceof Error ? error.message : String(error);
+    console.error("[ERROR] getScraperResultsHandler error:", { message });
     return NextResponse.json(
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
  } catch (error) {
    const message = error instanceof Error ? error.message : String(error);
    console.error("[ERROR] getScraperResultsHandler error:", { message });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/apify/getScraperResultsHandler.ts` around lines 64 - 65, The catch block
inside getScraperResultsHandler currently logs the raw caught value (error);
change it to log a sanitized representation instead: extract and log only safe
fields such as error.name, error.message, and a truncated error.stack (or omit
stack in production), and if the error looks like an HTTP/axios error (presence
of config/headers/request/response), remove or redact sensitive subfields
(headers, authorization tokens, cookies, and full request config) before
logging; ensure the symbol getScraperResultsHandler's catch uses this sanitized
object and avoid logging the original error variable directly.
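A minimal sketch of such a sanitizer, assuming only `Error`-derived and primitive thrown values need handling (a fuller version would also truncate stacks and redact HTTP-client fields like `config`, `headers`, and `response`):

```typescript
// Extract only safe fields from an unknown caught value; everything else
// (headers, tokens, request config) is deliberately dropped before logging.
function sanitizeError(error: unknown): { name: string; message: string } {
  if (error instanceof Error) {
    return { name: error.name, message: error.message };
  }
  return { name: "UnknownError", message: String(error) };
}
```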

Comment thread lib/apify/validateGetScraperResultsRequest.ts Outdated

@cubic-dev-ai cubic-dev-ai Bot left a comment


3 issues found across 9 files

Confidence score: 3/5

  • There is concrete API behavior risk in lib/apify/getActorStatus.ts: missing Apify runs can fall through to UNKNOWN and return HTTP 200 for nonexistent runIds, which can mislead clients and mask errors.
  • lib/apify/getScraperResultsHandler.ts is flagged for missing rate-limiting on a scraping request path, creating operational/abuse risk and inconsistency with the project’s API rules.
  • The app/api/apify/runs/[runId]/route.ts multi-export finding may be partly convention/framework-driven, but with severity 7/10 items present, this sits in a moderate-risk range rather than a clearly safe merge.
  • Pay close attention to lib/apify/getActorStatus.ts, lib/apify/getScraperResultsHandler.ts, app/api/apify/runs/[runId]/route.ts - correct status handling for missing runs, enforce rate limiting, and validate route export conventions.
Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="app/api/apify/runs/[runId]/route.ts">

<violation number="1" location="app/api/apify/runs/[runId]/route.ts:13">
P1: Custom agent: **Module should export a single primary function whose name matches the filename**

Module violates single-primary-export rule by exporting multiple top-level functions (`OPTIONS`, `GET`) and none matches filename basename `route`.</violation>
</file>

<file name="lib/apify/getScraperResultsHandler.ts">

<violation number="1" location="lib/apify/getScraperResultsHandler.ts:32">
P1: Custom agent: **API Design Consistency and Maintainability**

Scraping results endpoint is missing rate-limiting enforcement in its request path, violating the rule requiring rate limiting for scraping APIs.</violation>
</file>

<file name="lib/apify/getActorStatus.ts">

<violation number="1" location="lib/apify/getActorStatus.ts:21">
P1: Handle missing Apify runs explicitly instead of defaulting to `UNKNOWN`; otherwise nonexistent `runId`s are returned as HTTP 200.</violation>
</file>
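The missing-run finding above could be addressed by mapping the lookup result to an HTTP status explicitly, so a nonexistent runId yields 404 instead of 200 with UNKNOWN. A framework-free sketch (the run shape, the `NOT_FOUND` label, and the response type are assumptions based on fields mentioned in this PR):

```typescript
// Sketch: decide the HTTP status for a run lookup explicitly, so a missing
// run surfaces as 404 rather than being masked as a 200.

interface ApifyRunLike {
  status: string; // e.g. "RUNNING", "SUCCEEDED", "FAILED", "ABORTED"
  defaultDatasetId?: string | null;
}

interface RunStatusResult {
  httpStatus: number;
  body: { status: string; dataset_id: string | null };
}

function toRunStatusResponse(run: ApifyRunLike | undefined): RunStatusResult {
  if (!run) {
    // Nonexistent runId: explicit 404 instead of 200 + "UNKNOWN".
    return { httpStatus: 404, body: { status: "NOT_FOUND", dataset_id: null } };
  }
  const body = { status: run.status, dataset_id: run.defaultDatasetId ?? null };
  if (run.status === "FAILED" || run.status === "ABORTED") {
    return { httpStatus: 500, body };
  }
  return { httpStatus: 200, body };
}
```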
Architecture diagram
sequenceDiagram
    participant Client
    participant API as "mono/api (Next.js)"
    participant Auth as "Auth Service"
    participant Apify as "Apify SDK/API"

    Note over Client,Apify: NEW: migrated route GET /api/apify/runs/{runId}

    Client->>API: GET /api/apify/runs/{runId}
    
    API->>API: Validate runId (Zod)
    
    API->>Auth: validateAuthContext(request)
    alt Auth Failed
        Auth-->>Client: 401 Unauthorized
    end
    Auth-->>API: AuthContext (Account/Org)

    Note over API,Apify: Interaction via Apify SDK (CHANGED from raw fetch)
    
    API->>Apify: getActorStatus(runId)
    Apify-->>API: { status, defaultDatasetId }

    alt status == "SUCCEEDED"
        opt has dataset_id
            API->>Apify: getDataset(dataset_id)
            Apify-->>API: { items }
        end
        
        alt dataset found
            API-->>Client: 200 OK { status, dataset_id, data: items }
        else dataset missing/null
            API-->>Client: 500 Internal Server Error
        end

    else status == "RUNNING" | "READY"
        API-->>Client: 200 OK { status, dataset_id }

    else status == "FAILED" | "ABORTED"
        API-->>Client: CHANGED: 500 Internal Server Error { status, dataset_id }

    else SDK/Network Error
        API-->>Client: CHANGED: 500 Internal Server Error (No longer masks as RUNNING)
    end
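The branching in the diagram can be condensed into a pure decision function (a sketch; names and shapes are illustrative, not the PR's actual handler):

```typescript
// Sketch of the response branching shown in the sequence diagram above.

type Decision =
  | { kind: "ok"; includeData: boolean } // 200
  | { kind: "error" };                   // 500

function decideResponse(status: string, hasDataset: boolean): Decision {
  if (status === "SUCCEEDED") {
    // SUCCEEDED with a missing dataset is treated as an error, per the diagram.
    return hasDataset ? { kind: "ok", includeData: true } : { kind: "error" };
  }
  if (status === "FAILED" || status === "ABORTED") {
    return { kind: "error" };
  }
  // RUNNING / READY (and other non-terminal statuses): 200 without data.
  return { kind: "ok", includeData: false };
}
```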

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

*
* @returns A NextResponse with CORS headers.
*/
export async function OPTIONS() {

@cubic-dev-ai cubic-dev-ai Bot Apr 21, 2026


P1: Custom agent: Module should export a single primary function whose name matches the filename

Module violates single-primary-export rule by exporting multiple top-level functions (OPTIONS, GET) and none matches filename basename route.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At app/api/apify/runs/[runId]/route.ts, line 13:

<comment>Module violates single-primary-export rule by exporting multiple top-level functions (`OPTIONS`, `GET`) and none matches filename basename `route`.</comment>

<file context>
@@ -0,0 +1,37 @@
+ *
+ * @returns A NextResponse with CORS headers.
+ */
+export async function OPTIONS() {
+  return new NextResponse(null, {
+    status: 200,
</file context>
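Note that Next.js route files conventionally export one function per HTTP method (`GET`, `OPTIONS`, ...), so the multi-export finding is likely convention-driven rather than a defect. Framework-free, the headers such an OPTIONS handler might return could be built like this (header values are assumptions, not copied from this PR):

```typescript
// Sketch: CORS headers an OPTIONS (preflight) handler might return.
// Allowed origin, methods, and headers are illustrative assumptions.

function buildCorsHeaders(allowedOrigin = "*"): Record<string, string> {
  return {
    "Access-Control-Allow-Origin": allowedOrigin,
    "Access-Control-Allow-Methods": "GET, OPTIONS",
    "Access-Control-Allow-Headers": "Content-Type, x-api-key, Authorization",
  };
}
```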

runId: string,
): Promise<NextResponse> {
try {
const validated = await validateGetScraperResultsRequest(request, runId);

@cubic-dev-ai cubic-dev-ai Bot Apr 21, 2026


P1: Custom agent: API Design Consistency and Maintainability

Scraping results endpoint is missing rate-limiting enforcement in its request path, violating the rule requiring rate limiting for scraping APIs.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At lib/apify/getScraperResultsHandler.ts, line 32:

<comment>Scraping results endpoint is missing rate-limiting enforcement in its request path, violating the rule requiring rate limiting for scraping APIs.</comment>

<file context>
@@ -0,0 +1,73 @@
+  runId: string,
+): Promise<NextResponse> {
+  try {
+    const validated = await validateGetScraperResultsRequest(request, runId);
+    if (validated instanceof NextResponse) {
+      return validated;
</file context>
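If the rate-limiting finding is acted on, a minimal enforcement point might look like the sketch below (purely illustrative; a real deployment behind multiple instances would use a shared store such as Redis, and the limit/window values are assumptions):

```typescript
// Sketch: in-memory fixed-window rate limiter keyed by caller identity
// (e.g. the api key or account id from the auth context).

class FixedWindowLimiter {
  private counts = new Map<string, { windowStart: number; count: number }>();

  constructor(
    private limit: number,
    private windowMs: number,
  ) {}

  // Returns true if the request is allowed, false if the caller is over limit.
  allow(key: string, now: number = Date.now()): boolean {
    const entry = this.counts.get(key);
    if (!entry || now - entry.windowStart >= this.windowMs) {
      this.counts.set(key, { windowStart: now, count: 1 });
      return true;
    }
    if (entry.count >= this.limit) return false;
    entry.count += 1;
    return true;
  }
}
```

The handler would call `allow(accountId)` before hitting Apify and return 429 when it reports false.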

Comment thread lib/apify/getActorStatus.ts Outdated
@arpitgupta1214 arpitgupta1214 changed the title feat: migrate GET /api/apify/scraper to GET /api/apify/runs/{runId} feat(api): migrate GET /api/apify/runs/{runId} Apr 22, 2026
- Drop lib/apify/getActorStatus.ts and lib/apify/getDataset.ts helpers;
  call the SDK directly from the handler.
- Flatten getScraperResultsHandler branching into a single success path
  plus a shared status-code fallback.
- Declare getScraperResultsParamsSchema as a z.object directly.
- Trim jsdoc across the PR.

@cubic-dev-ai cubic-dev-ai Bot left a comment


1 issue found across 8 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="lib/apify/getScraperResultsHandler.ts">

<violation number="1" location="lib/apify/getScraperResultsHandler.ts:22">
P2: Handle `undefined` from `apifyClient.run(...).get()` explicitly; otherwise nonexistent runs return 200 with `status: "UNKNOWN"` and mask an error condition.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Comment on lines +22 to +23
const status = run?.status ?? "UNKNOWN";
const dataset_id = run?.defaultDatasetId ?? null;

@cubic-dev-ai cubic-dev-ai Bot Apr 23, 2026


P2: Handle undefined from apifyClient.run(...).get() explicitly; otherwise nonexistent runs return 200 with status: "UNKNOWN" and mask an error condition.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At lib/apify/getScraperResultsHandler.ts, line 22:

<comment>Handle `undefined` from `apifyClient.run(...).get()` explicitly; otherwise nonexistent runs return 200 with `status: "UNKNOWN"` and mask an error condition.</comment>

<file context>
@@ -1,72 +1,39 @@
 
-    const { status, dataset_id } = await getActorStatus(validated.runId);
+    const run = await apifyClient.run(validated.runId).get();
+    const status = run?.status ?? "UNKNOWN";
+    const dataset_id = run?.defaultDatasetId ?? null;
 
</file context>
Suggested change
const status = run?.status ?? "UNKNOWN";
const dataset_id = run?.defaultDatasetId ?? null;
if (!run) {
throw new Error("Apify run not found");
}
const status = run.status;
const dataset_id = run.defaultDatasetId ?? null;


@cubic-dev-ai cubic-dev-ai Bot left a comment


0 issues found across 2 files (changes from recent commits).

Requires human review: Auto-approval blocked by 3 unresolved issues from previous reviews.

- Switch validator from validateAuthContext to validateAdminAuth so only
  admin accounts can poll Apify run status.
- Run auth before the runId schema check — an unauthenticated request
  should never reveal param-level errors.

@cubic-dev-ai cubic-dev-ai Bot left a comment


0 issues found across 2 files (changes from recent commits).

Requires human review: Auto-approval blocked by 3 unresolved issues from previous reviews.

Comment thread lib/apify/validateGetScraperResultsRequest.ts
@sweetmantech
Contributor

Preview smoke test

Against preview https://api-git-feat-migrate-apify-scraper-recoupable-ad724970.vercel.app at commit 27c470d1 (synced with test).

Results

| # | Case | Expected | Got |
|---|------|----------|-----|
| 1 | No auth header | 401 | ✅ 401 "Exactly one of x-api-key or Authorization must be provided" |
| 2 | Admin auth + nonexistent runId | 200 UNKNOWN | ✅ 200 {"status":"UNKNOWN","dataset_id":null} |
| 3 | Legacy GET /api/apify/scraper?runId=… | 404 (clean cut) | ✅ 404 — app/api/apify/scraper/route.ts no longer exists |
| 4 | Admin auth + real SUCCEEDED runId | 200 + populated data | ✅ 200 with full Apify dataset payload |

End-to-end chain exercised

POST /api/socials/{socialId}/scrape        → returns {runId, datasetId}
GET  /api/apify/runs/{runId}               → returns {status, dataset_id, data}
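This chain could be driven by a small polling helper along these lines (a sketch; the endpoint path matches this PR, while the injected getRun function, the terminal-status set, and the retry cap are assumptions):

```typescript
// Sketch: poll the run-status endpoint until a terminal status is reached.
// getRun is injected so the loop can be tested without a network.

interface RunStatus {
  status: string;
  dataset_id: string | null;
  data?: unknown[];
}

async function pollRun(
  getRun: (runId: string) => Promise<RunStatus>,
  runId: string,
  maxAttempts = 10,
): Promise<RunStatus> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const result = await getRun(runId);
    // FAILED/ABORTED/SUCCEEDED are terminal; RUNNING/READY/UNKNOWN keep polling.
    if (["SUCCEEDED", "FAILED", "ABORTED"].includes(result.status)) {
      return result;
    }
    // A real poller would sleep between attempts; omitted here for brevity.
  }
  throw new Error(`run ${runId} not terminal after ${maxAttempts} polls`);
}
```

With a real client, getRun might issue `fetch(`${base}/api/apify/runs/${runId}`, { headers: { "x-api-key": key } })` and parse the JSON body.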

Triggered scrape on PinkPantheress's Instagram social profile (social_id=02061320-978a-4394-a2c1-6062272683a8):

// POST /api/socials/02061320-978a-4394-a2c1-6062272683a8/scrape → 200
{ "runId": "VpqICClParjRjKNCf", "datasetId": "c4t9gsY5fAWbX0GNu" }

Polling the new endpoint returned SUCCEEDED on the first call with populated data:

// GET /api/apify/runs/VpqICClParjRjKNCf → 200
{
  "status": "SUCCEEDED",
  "dataset_id": "c4t9gsY5fAWbX0GNu",
  "data": [
    {
      "inputUrl": "https://www.instagram.com/pinkpantheress",
      "id": "39559476848",
      "username": "pinkpantheress",
      "fullName": "🫀",
      "biography": "",
      "externalUrls": [
        { "title": "VISIT MY STORE 💋🤭🤓❤️", "url": "..." }
      ]
      // ...rest of the Instagram profile scrape
    }
  ]
}

Findings

  • ✅ Admin-only auth gate works (validateAdminAuth) — no auth → 401; my admin key → passes the check.
  • ✅ Handler's branching matches design: SUCCEEDED with items → 200 + data; nonexistent run → 200 with UNKNOWN (not in FAILED/ABORTED/SUCCEEDED set, correctly non-terminal).
  • ✅ Legacy /api/apify/scraper fully removed — migration is a clean cut rather than dual-path.
  • ✅ Response shape ({status, dataset_id, data?}) matches JSDoc on getScraperResultsHandler and handler logic.

Not directly exercised (no fixtures available)

  • 403 for a non-admin caller — I only have an admin key.
  • 500 for FAILED/ABORTED runs — would need a broken Apify run. Handler logic covers it (FAILED or ABORTED, or SUCCEEDED with no dataset, → 500).

🤖 Tested with Claude Code

@sweetmantech sweetmantech merged commit 306340b into test Apr 23, 2026
6 checks passed
