
feat(api): migrate GET /api/apify/runs/{runId} #463

Merged
sweetmantech merged 7 commits into test from feat/migrate-apify-scraper
Apr 23, 2026

Conversation

@arpitgupta1214 (Collaborator) commented Apr 21, 2026

Ports the Apify run-status endpoint to GET /api/apify/runs/{runId} using the Apify SDK; response renames datasetId to dataset_id (snake_case). Errors now surface as 500 rather than being masked as RUNNING with empty dataset. Auth required; no per-account access check since runId is not an account-scoped resource.

Test plan

  • Preview: GET /api/apify/runs/{runId} with x-api-key returns 200 with status and dataset_id
  • Preview: no auth header returns 401
  • Preview: FAILED / ABORTED / missing dataset returns 500

Summary by CodeRabbit

  • New Features
    • New API endpoint for retrieving scraper run status and dataset results
    • Request validation and authentication checks included
    • CORS support for cross-origin API calls
    • Returns run status and dataset items upon successful completion

Ports the Apify run-status endpoint from the legacy Express service into
mono/api as a RESTful Next.js route. Uses the Apify SDK (not raw fetch)
to match sibling start-scrape helpers. Wire format renames
datasetId -> dataset_id (snake_case). Auth is required via
validateAuthContext; no per-account access check (runId is an
Apify-scoped identifier, not user-scoped). Does not preserve the legacy
silent-error-to-RUNNING behaviour; errors propagate to a clean 500.

Row 27 of the Agent API migration.
@vercel bot (Contributor) commented Apr 21, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
api Ready Ready Preview Apr 23, 2026 10:12pm


@coderabbitai (Bot) commented Apr 21, 2026

Warning

Rate limit exceeded

@sweetmantech has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 57 minutes and 50 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 57 minutes and 50 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c4059bbd-fe37-42c9-a8af-f506a1b17c30

📥 Commits

Reviewing files that changed from the base of the PR and between beada44 and 27c470d.

⛔ Files ignored due to path filters (1)
  • lib/apify/__tests__/validateGetScraperResultsRequest.test.ts is excluded by !**/*.test.*, !**/__tests__/** and included by lib/**
📒 Files selected for processing (2)
  • lib/apify/getScraperResultsHandler.ts
  • lib/apify/validateGetScraperResultsRequest.ts
📝 Walkthrough


A new GET API endpoint is introduced at /api/apify/runs/[runId] to retrieve Apify scraper run status and dataset results. The implementation includes request validation with authentication checks, CORS support, and conditional response handling based on run status.

Changes

Cohort / File(s) Summary
API Route Handler
app/api/apify/runs/[runId]/route.ts
Establishes a dynamic Next.js route with 30-second max duration, implements OPTIONS handler for CORS preflight, and delegates GET requests to handler business logic via route parameter extraction.
Request Validation
lib/apify/validateGetScraperResultsRequest.ts
Introduces Zod-based schema validation for runId parameter and enforces authentication via validateAuthContext, returning typed results or error responses with CORS headers.
Response Handler
lib/apify/getScraperResultsHandler.ts
Implements core business logic to fetch Apify run data, conditionally retrieve dataset items based on status, and construct typed JSON responses with appropriate HTTP status codes and error handling.

Sequence Diagram

sequenceDiagram
    actor Client
    participant Route as Route Handler<br/>/api/apify/runs/[runId]
    participant Validator as validateGetScraperResultsRequest
    participant Handler as getScraperResultsHandler
    participant Apify as Apify Client
    
    Client->>Route: GET /api/apify/runs/[runId]
    Route->>Validator: validateGetScraperResultsRequest(request, runId)
    Validator->>Validator: Parse & validate runId (Zod)
    Validator->>Validator: validateAuthContext(request)
    alt Validation or Auth Failed
        Validator-->>Route: NextResponse (400/403)
    else Success
        Validator-->>Route: { runId }
    end
    
    alt Validation Passed
        Route->>Handler: getScraperResultsHandler(request, runId)
        Handler->>Apify: apifyClient.run(runId).get()
        Apify-->>Handler: Run data (status, defaultDatasetId)
        
        alt Status is SUCCEEDED & dataset_id exists
            Handler->>Apify: apifyClient.dataset(dataset_id).listItems()
            Apify-->>Handler: Dataset items
            Handler-->>Route: 200 { status, dataset_id, data }
        else Status is FAILED or ABORTED
            Handler-->>Route: 500 { status, dataset_id }
        else Status is SUCCEEDED (no dataset)
            Handler-->>Route: 500 { status, dataset_id }
        else Any other status
            Handler-->>Route: 200 { status, dataset_id }
        end
    end
    
    Route-->>Client: JSON Response + CORS Headers
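The status branching in the diagram above can be sketched as a small pure helper. The helper name (`mapRunToResponse`) and the plain-object return shape are illustrative stand-ins, not the PR's actual code, which returns `NextResponse` objects with CORS headers:

```typescript
// Illustrative sketch of the branching shown in the sequence diagram.
// mapRunToResponse and the { httpStatus, body } shape are assumptions;
// the real handler builds NextResponse objects with CORS headers.
type RunInfo = { status: string; dataset_id: string | null };

function mapRunToResponse(
  run: RunInfo,
  data?: unknown[],
): { httpStatus: number; body: Record<string, unknown> } {
  const { status, dataset_id } = run;
  if (status === "SUCCEEDED" && dataset_id && data) {
    // Happy path: run finished and dataset items were fetched.
    return { httpStatus: 200, body: { status, dataset_id, data } };
  }
  if (status === "FAILED" || status === "ABORTED") {
    return { httpStatus: 500, body: { status, dataset_id } };
  }
  if (status === "SUCCEEDED") {
    // Succeeded but no dataset to return: surfaced as a server error.
    return { httpStatus: 500, body: { status, dataset_id } };
  }
  // Any other status (e.g. RUNNING) is reported with a 200.
  return { httpStatus: 200, body: { status, dataset_id } };
}
```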

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • sweetmantech

Poem

🔗 New endpoint takes its stage,
Apify runs on every page,
Status checks with graceful care,
CORS headers floating in the air—
Validation guards the gate so tight,
Results flow forth, a shining light! ✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Solid & Clean Code ✅ Passed Pull request demonstrates strong adherence to SOLID principles with clear separation of concerns across routing, validation, and business logic layers.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/migrate-apify-scraper

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@arpitgupta1214 (Collaborator, Author) commented:

Preview smoke: api-git-feat-migrate-apify-scraper-recoupable-ad724970.vercel.app

Request Expected Got
GET /api/apify/runs/bogus-runid-123 (no header) 401 401 {"status":"error","error":"Exactly one of x-api-key or Authorization must be provided"}
GET /api/apify/runs/bogus-runid-123 with x-api-key: $RECOUP_TEST_API_KEY 200 snake_case dataset_id 200 {"status":"UNKNOWN","dataset_id":null}

dataset_id is snake_case as specified. UNKNOWN is the SDK's response for a non-existent run (apifyClient.run(bogus).get() resolves to undefined, which the helper maps to {status: "UNKNOWN", dataset_id: null}).

The recoup-api-*.vercel.app alias is not wired for this preview branch (404 DEPLOYMENT_NOT_FOUND); only the api-*.vercel.app preview host exists at PR time. The post-merge promotion to test will republish to both production aliases.

@coderabbitai (Bot) left a comment

Actionable comments posted: 5

🧹 Nitpick comments (2)
lib/apify/getScraperResultsHandler.ts (1)

27-63: Split the status response branching into a small helper.

getScraperResultsHandler exceeds the 20-line guideline and currently handles validation, Apify orchestration, dataset fetch, and response mapping in one function. Extracting the status-to-response branch would keep the route orchestration easier to maintain.

As per coding guidelines, **/*.{js,ts,tsx,jsx,py,java,cs,go,rb,php}: “Flag functions longer than 20 lines”.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/apify/getScraperResultsHandler.ts` around lines 27 - 63, The function
getScraperResultsHandler is doing both orchestration and response-mapping;
extract the status-to-response branching into a small helper (e.g.,
mapActorStatusToResponse or buildScraperResponse) that takes the actor status
object ({ status, dataset_id }), plus optional data (from getDataset) and
getCorsHeaders(), and returns the proper NextResponse for each branch (SUCCEEDED
with/without dataset_id or data, FAILED/ABORTED -> 500, other -> 200). In
practice keep validateGetScraperResultsRequest, getActorStatus and getDataset
calls in getScraperResultsHandler but delegate all conditional logic that
inspects status and dataset_id to the new helper (reference symbols:
getScraperResultsHandler, validateGetScraperResultsRequest, getActorStatus,
getDataset, getCorsHeaders); ensure the new helper returns NextResponse and
replace the inline branching with a single call to it.
lib/apify/validateGetScraperResultsRequest.ts (1)

6-32: Export the actual Zod schema, not just the shape.

getScraperResultsParamsSchema is named/exported as a schema, but Line 6 exports only the object shape. This makes the exported API less reusable and forces Line 32 to recreate the schema. Export the z.object(...) directly and infer from it.

♻️ Proposed cleanup
-export const getScraperResultsParamsSchema = {
+export const getScraperResultsParamsSchema = z.object({
   runId: z.string().min(1).describe("The Apify run identifier from the URL path."),
-};
+});
 
-export type GetScraperResultsParams = z.infer<z.ZodObject<typeof getScraperResultsParamsSchema>>;
+export type GetScraperResultsParams = z.infer<typeof getScraperResultsParamsSchema>;
@@
-  const parsed = z.object(getScraperResultsParamsSchema).safeParse({ runId });
+  const parsed = getScraperResultsParamsSchema.safeParse({ runId });

As per coding guidelines, lib/**/validate*.ts: “Create validate functions in validate<EndpointName>Body.ts or validate<EndpointName>Query.ts files that export both the schema and inferred TypeScript type”.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/apify/validateGetScraperResultsRequest.ts` around lines 6 - 32,
getScraperResultsParamsSchema currently exports a plain object shape instead of
a Zod schema which forces recreate of the schema in
validateGetScraperResultsRequest; change getScraperResultsParamsSchema to export
the actual Zod object (e.g. const getScraperResultsParamsSchema = z.object({
runId: z.string().min(1).describe(...) })) and update GetScraperResultsParams to
infer from z.infer<typeof getScraperResultsParamsSchema>, then in
validateGetScraperResultsRequest use getScraperResultsParamsSchema.safeParse({
runId }) (and remove the inline z.object(...) there) so the single exported
schema is reused across the file and by callers.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `app/api/apify/runs/[runId]/route.ts`:
- Around line 13-36: The route lacks route-level tests for the OPTIONS preflight
and the async GET params resolution; add tests that call the exported OPTIONS
function and assert it returns status 200 with headers from getCorsHeaders(),
and add a test that invokes the exported GET function with a mock NextRequest
and a promise-based params object (resolving to { runId }) to ensure the async
params are awaited and that GET delegates to getScraperResultsHandler with the
resolved runId; reference the exported functions OPTIONS and GET and the handler
getScraperResultsHandler (and helper getCorsHeaders) when locating the code to
test.

In `@lib/apify/getActorStatus.ts`:
- Around line 20-25: The getActorStatus function silently returns a fallback
"UNKNOWN" and null dataset when apifyClient.run(...).get() returns undefined;
update getActorStatus to guard for a missing run (the local variable run from
apifyClient.run(runId).get()) and throw an Error (or propagate a descriptive
error) instead of returning a default status, removing the ?? "UNKNOWN" and ??
null fallbacks; update callers/tests by adding a unit/integration test that
simulates apifyClient.run(...).get() returning undefined and asserts that
getActorStatus throws so this regression cannot recur.

In `@lib/apify/getDataset.ts`:
- Around line 12-15: getDataset currently calls
apifyClient.dataset(datasetId).listItems() once and returns only the first page
(default 1000 items); change getDataset to page through results by calling
listItems repeatedly with a page limit (e.g., 1000) and an increasing offset (or
using the API's pagination token) until you've collected result.total items (or
a page returns no items). Accumulate items into an array and return the full
array (or null if initial call fails); update references to
apifyClient.dataset(...).listItems and the getDataset function to implement this
loop and respect result.total and per-page result.items.

In `@lib/apify/getScraperResultsHandler.ts`:
- Around line 64-65: The catch block inside getScraperResultsHandler currently
logs the raw caught value (error); change it to log a sanitized representation
instead: extract and log only safe fields such as error.name, error.message, and
a truncated error.stack (or omit stack in production), and if the error looks
like an HTTP/axios error (presence of config/headers/request/response), remove
or redact sensitive subfields (headers, authorization tokens, cookies, and full
request config) before logging; ensure the symbol getScraperResultsHandler's
catch uses this sanitized object and avoid logging the original error variable
directly.

In `@lib/apify/validateGetScraperResultsRequest.ts`:
- Around line 16-19: Update the validator comment to use account-scoped
terminology: replace "user-" with "account-" (and any other "user"/"entity"
occurrences) so it reads that a `runId` is an Apify-scoped identifier, not an
account- or artist-scoped resource; ensure the doc block in
validateGetScraperResultsRequest.ts consistently uses "account" (or specific
terms like "artist", "workspace", "organization" if applicable) to follow repo
guidelines.

---

Nitpick comments:
In `@lib/apify/getScraperResultsHandler.ts`:
- Around line 27-63: The function getScraperResultsHandler is doing both
orchestration and response-mapping; extract the status-to-response branching
into a small helper (e.g., mapActorStatusToResponse or buildScraperResponse)
that takes the actor status object ({ status, dataset_id }), plus optional data
(from getDataset) and getCorsHeaders(), and returns the proper NextResponse for
each branch (SUCCEEDED with/without dataset_id or data, FAILED/ABORTED -> 500,
other -> 200). In practice keep validateGetScraperResultsRequest, getActorStatus
and getDataset calls in getScraperResultsHandler but delegate all conditional
logic that inspects status and dataset_id to the new helper (reference symbols:
getScraperResultsHandler, validateGetScraperResultsRequest, getActorStatus,
getDataset, getCorsHeaders); ensure the new helper returns NextResponse and
replace the inline branching with a single call to it.

In `@lib/apify/validateGetScraperResultsRequest.ts`:
- Around line 6-32: getScraperResultsParamsSchema currently exports a plain
object shape instead of a Zod schema which forces recreate of the schema in
validateGetScraperResultsRequest; change getScraperResultsParamsSchema to export
the actual Zod object (e.g. const getScraperResultsParamsSchema = z.object({
runId: z.string().min(1).describe(...) })) and update GetScraperResultsParams to
infer from z.infer<typeof getScraperResultsParamsSchema>, then in
validateGetScraperResultsRequest use getScraperResultsParamsSchema.safeParse({
runId }) (and remove the inline z.object(...) there) so the single exported
schema is reused across the file and by callers.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f0b9d110-1bc1-4f1e-b469-a905e0195ace

📥 Commits

Reviewing files that changed from the base of the PR and between f276a5a and 37d05a6.

⛔ Files ignored due to path filters (4)
  • lib/apify/__tests__/getActorStatus.test.ts is excluded by !**/*.test.*, !**/__tests__/** and included by lib/**
  • lib/apify/__tests__/getDataset.test.ts is excluded by !**/*.test.*, !**/__tests__/** and included by lib/**
  • lib/apify/__tests__/getScraperResultsHandler.test.ts is excluded by !**/*.test.*, !**/__tests__/** and included by lib/**
  • lib/apify/__tests__/validateGetScraperResultsRequest.test.ts is excluded by !**/*.test.*, !**/__tests__/** and included by lib/**
📒 Files selected for processing (5)
  • app/api/apify/runs/[runId]/route.ts
  • lib/apify/getActorStatus.ts
  • lib/apify/getDataset.ts
  • lib/apify/getScraperResultsHandler.ts
  • lib/apify/validateGetScraperResultsRequest.ts

Comment on lines +13 to +36
export async function OPTIONS() {
  return new NextResponse(null, {
    status: 200,
    headers: getCorsHeaders(),
  });
}

/**
 * GET /api/apify/runs/{runId}
 *
 * Returns the status (and, on SUCCEEDED, the dataset items) of an Apify actor
 * run. Authentication is required via `x-api-key` or `Authorization: Bearer`.
 *
 * @param request - The incoming request.
 * @param options - Route options containing params.
 * @param options.params - Route params containing the Apify `runId`.
 * @returns A NextResponse with `{ status, dataset_id, data? }` shape.
 */
export async function GET(
  request: NextRequest,
  { params }: { params: Promise<{ runId: string }> },
) {
  const { runId } = await params;
  return getScraperResultsHandler(request, runId);

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Search for tests that exercise the new /api/apify/runs route surface.
# Expectation: Tests should cover OPTIONS and GET delegation/async params behavior.

rg -nP 'api/apify/runs|\bOPTIONS\s*\(|\bGET\s*\(' --glob '*.{test,spec}.{ts,tsx}' -C 3

Repository: recoupable/api

Length of output: 9747


🏁 Script executed:

# Check if route-level test file exists for the apify runs endpoint
find . -path "*/api/apify/runs*/__tests__/route*" -o -path "*/api/apify/runs*/*.test.ts" 2>/dev/null | head -20

Repository: recoupable/api

Length of output: 40


🏁 Script executed:

# Also check the directory structure of the apify runs endpoint
ls -la app/api/apify/runs/\[runId\]/ 2>/dev/null || find app/api/apify -type d -name "runs" | xargs ls -la 2>/dev/null | head -30

Repository: recoupable/api

Length of output: 226


Add route-level tests for OPTIONS and async GET parameter handling.

This endpoint lacks route-level test coverage. Handler tests cover the business logic, but the route surface—CORS preflight via OPTIONS and async parameter resolution—should have dedicated tests per project pattern. Similar routes (e.g., app/api/admins/sandboxes) include tests for both. Add tests covering the OPTIONS response status and CORS headers, and verify the async params resolution is exercised before delegation to the handler.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `app/api/apify/runs/[runId]/route.ts` around lines 13 - 36, The route lacks
route-level tests for the OPTIONS preflight and the async GET params resolution;
add tests that call the exported OPTIONS function and assert it returns status
200 with headers from getCorsHeaders(), and add a test that invokes the exported
GET function with a mock NextRequest and a promise-based params object
(resolving to { runId }) to ensure the async params are awaited and that GET
delegates to getScraperResultsHandler with the resolved runId; reference the
exported functions OPTIONS and GET and the handler getScraperResultsHandler (and
helper getCorsHeaders) when locating the code to test.
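As a rough shape for that kind of route-level test, the checks could look like the following. The `OPTIONS`/`GET` stand-ins here are hypothetical inline mirrors of the route's surface; a real test would import them from the route module and mock `getScraperResultsHandler`:

```typescript
// Hypothetical stand-ins mirroring the route's surface. A real test would
// import OPTIONS and GET from app/api/apify/runs/[runId]/route.ts instead.
const getCorsHeaders = () => ({ "Access-Control-Allow-Origin": "*" });

async function OPTIONS() {
  return { status: 200, headers: getCorsHeaders() };
}

async function GET(
  _request: unknown,
  { params }: { params: Promise<{ runId: string }> },
) {
  // Next.js 15-style async params must be awaited before use.
  const { runId } = await params;
  // Delegation stands in for getScraperResultsHandler(request, runId).
  return { status: 200, body: { runId } };
}

async function runRouteChecks() {
  const preflight = await OPTIONS();
  if (preflight.status !== 200) throw new Error("OPTIONS should return 200");
  if (!preflight.headers["Access-Control-Allow-Origin"]) {
    throw new Error("OPTIONS should carry CORS headers");
  }
  const res = await GET(undefined, {
    params: Promise.resolve({ runId: "run-1" }),
  });
  if (res.body.runId !== "run-1") {
    throw new Error("GET should await params and pass runId through");
  }
}
```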

Comment thread lib/apify/getActorStatus.ts Outdated
Comment on lines +20 to +25
export async function getActorStatus(runId: string) {
  const run = await apifyClient.run(runId).get();
  return {
    status: run?.status ?? "UNKNOWN",
    dataset_id: run?.defaultDatasetId ?? null,
  };

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

Apify JavaScript client run(runId).get() behavior when the run ID does not exist: does it return undefined/null or throw?

💡 Result:

The Apify JavaScript client method run(runId).get returns undefined when the run ID does not exist, rather than throwing an error.



🏁 Script executed:

# Check if there are other related files that handle errors from apifyClient calls
rg "apifyClient" --type ts --type tsx -B 2 -A 2 | head -50

Repository: recoupable/api

Length of output: 85


🏁 Script executed:

# Search for how apifyClient is used and error handling patterns
rg "apifyClient" -A 3 -B 1

Repository: recoupable/api

Length of output: 8389


🏁 Script executed:

# Check the test file to see expected behavior
cat -n lib/apify/__tests__/getActorStatus.test.ts

Repository: recoupable/api

Length of output: 1494


🏁 Script executed:

# Check the current implementation of getActorStatus.ts
cat -n lib/apify/getActorStatus.ts

Repository: recoupable/api

Length of output: 1339


Align with documented error-propagation design: throw on missing Apify runs.

Lines 23-24 return "UNKNOWN" for undefined runs, contradicting the JSDoc (lines 8-14) which explicitly states errors should propagate so handlers respond 500 cleanly. The SDK returns undefined for nonexistent run IDs; silently converting that to "UNKNOWN" masks missing runs as valid states, defeating the design goal of distinguishing real statuses from upstream outages.

Add a guard to throw when the run is missing, and remove the fallback:

Proposed fix
 export async function getActorStatus(runId: string) {
   const run = await apifyClient.run(runId).get();
+  if (!run) {
+    throw new Error(`Apify run not found: ${runId}`);
+  }
   return {
-    status: run?.status ?? "UNKNOWN",
-    dataset_id: run?.defaultDatasetId ?? null,
+    status: run.status,
+    dataset_id: run.defaultDatasetId ?? null,
   };
 }

Also add a test case for the undefined run scenario to prevent regression.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
export async function getActorStatus(runId: string) {
  const run = await apifyClient.run(runId).get();
  if (!run) {
    throw new Error(`Apify run not found: ${runId}`);
  }
  return {
    status: run.status,
    dataset_id: run.defaultDatasetId ?? null,
  };
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/apify/getActorStatus.ts` around lines 20 - 25, The getActorStatus
function silently returns a fallback "UNKNOWN" and null dataset when
apifyClient.run(...).get() returns undefined; update getActorStatus to guard for
a missing run (the local variable run from apifyClient.run(runId).get()) and
throw an Error (or propagate a descriptive error) instead of returning a default
status, removing the ?? "UNKNOWN" and ?? null fallbacks; update callers/tests by
adding a unit/integration test that simulates apifyClient.run(...).get()
returning undefined and asserts that getActorStatus throws so this regression
cannot recur.

Comment thread lib/apify/getDataset.ts Outdated
Comment on lines +12 to +15
export async function getDataset(datasetId: string): Promise<unknown[] | null> {
  const result = await apifyClient.dataset(datasetId).listItems();
  if (!result) return null;
  return result.items as unknown[];

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

Apify JavaScript client DatasetClient listItems default limit pagination result total offset limit

💡 Result:

The Apify JavaScript API client DatasetClient.listItems method lists items from a dataset using offset-based pagination. It returns a PaginatedList (or PaginatedIterator) object with these key properties: - items: Array of dataset items - total: Total number of items available in the dataset - offset: Starting position of the current page (default: 0) - count: Number of items returned in this response - limit: Number of items requested per page Parameters (from DatasetClientListItemOptions): - offset (number, optional): Position of first returned entry (default: 0) - limit (number, optional): Maximum number of entries requested. Default/maximum not explicitly stated in docs but API endpoint limits to 1000. - chunkSize (number, optional): For async iteration, max items per API call (respects API limits) - desc (boolean): Sort in descending order - Other filters: fields, clean, skipEmpty, etc. Examples: const { items, total } = await datasetClient.listItems({ limit: 100 }); const { items } = await datasetClient.listItems({ offset: 100, limit: 50, desc: true }); For full iteration: for await (const item of datasetClient.listItems({ limit: 1000 })) { // process item } Default limit is unspecified but practically 1000 max per call per API docs. Use offset/limit for pagination, total for knowing when to stop (offset + count >= total).



🏁 Script executed:

# Find and check package.json for Apify SDK version
fd -e package.json | head -5 | xargs -I {} grep -l "apify" {} 2>/dev/null || grep -r "apify" package.json 2>/dev/null | head -10

Repository: recoupable/api

Length of output: 40


🏁 Script executed:

# Also check if package-lock.json exists for exact version
git ls-files | grep -E "package(-lock)?\.json" | head -3

Repository: recoupable/api

Length of output: 70


🏁 Script executed:

# Search for usage of getDataset throughout the codebase
rg "getDataset" --type ts --type tsx -B 2 -A 2

Repository: recoupable/api

Length of output: 85


🏁 Script executed:

# Check if there are other pagination patterns in lib/apify/
fd . lib/apify/ -e ts -e tsx | head -10 | xargs -I {} sh -c 'echo "=== {} ===" && cat -n {}'

Repository: recoupable/api

Length of output: 16869


🏁 Script executed:

cat package.json | grep -A 2 -B 2 "apify"

Repository: recoupable/api

Length of output: 204


Add pagination loop to fetch all dataset items beyond the 1000-item default limit.

The Apify SDK's listItems() defaults to limit 1000, returning only the first page. For datasets with >1000 items, this silently truncates results without error. The response includes a total field indicating available items—checking this reveals the truncation risk.

Implement pagination to accumulate all items across pages:

♻️ Pagination fix
 export async function getDataset(datasetId: string): Promise<unknown[] | null> {
-  const result = await apifyClient.dataset(datasetId).listItems();
-  if (!result) return null;
-  return result.items as unknown[];
+  const items: unknown[] = [];
+  const limit = 1000;
+  let offset = 0;
+
+  while (true) {
+    const result = await apifyClient.dataset(datasetId).listItems({ offset, limit });
+    if (!result) return null;
+
+    items.push(...(result.items as unknown[]));
+
+    offset += result.items.length;
+    if (result.items.length === 0 || offset >= result.total) break;
+  }
+
+  return items;
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/apify/getDataset.ts` around lines 12 - 15, getDataset currently calls
apifyClient.dataset(datasetId).listItems() once and returns only the first page
(default 1000 items); change getDataset to page through results by calling
listItems repeatedly with a page limit (e.g., 1000) and an increasing offset (or
using the API's pagination token) until you've collected result.total items (or
a page returns no items). Accumulate items into an array and return the full
array (or null if initial call fails); update references to
apifyClient.dataset(...).listItems and the getDataset function to implement this
loop and respect result.total and per-page result.items.
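Run against a fake paginated client (the 2-item page size and `fakeDatasetClient` are illustrative, not the real SDK), the loop described above behaves like this:

```typescript
// Fake dataset client paging a 5-item dataset in pages of 2, mimicking the
// items/total/offset shape of the Apify SDK's listItems() result.
const allItems = ["a", "b", "c", "d", "e"];

const fakeDatasetClient = {
  listItems: async ({ offset, limit }: { offset: number; limit: number }) => ({
    items: allItems.slice(offset, offset + limit),
    total: allItems.length,
    offset,
  }),
};

// Pagination loop per the review suggestion: accumulate pages until the
// offset reaches total or a page comes back empty.
async function getAllItems(limit = 2): Promise<unknown[]> {
  const items: unknown[] = [];
  let offset = 0;
  while (true) {
    const page = await fakeDatasetClient.listItems({ offset, limit });
    items.push(...page.items);
    offset += page.items.length;
    if (page.items.length === 0 || offset >= page.total) break;
  }
  return items;
}
```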

Comment thread lib/apify/getScraperResultsHandler.ts Outdated
Comment on lines +64 to +65
  } catch (error) {
    console.error("[ERROR] getScraperResultsHandler error:", error);

⚠️ Potential issue | 🟠 Major

Avoid logging raw caught errors.

Line 65 logs the full thrown value. Upstream/auth/client errors can carry headers, tokens, request config, or other sensitive metadata. Log sanitized fields instead.

🛡️ Proposed sanitized logging
   } catch (error) {
-    console.error("[ERROR] getScraperResultsHandler error:", error);
+    const message = error instanceof Error ? error.message : String(error);
+    console.error("[ERROR] getScraperResultsHandler error:", { message });
     return NextResponse.json(
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
  } catch (error) {
    const message = error instanceof Error ? error.message : String(error);
    console.error("[ERROR] getScraperResultsHandler error:", { message });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/apify/getScraperResultsHandler.ts` around lines 64 - 65, The catch block
inside getScraperResultsHandler currently logs the raw caught value (error);
change it to log a sanitized representation instead: extract and log only safe
fields such as error.name, error.message, and a truncated error.stack (or omit
stack in production), and if the error looks like an HTTP/axios error (presence
of config/headers/request/response), remove or redact sensitive subfields
(headers, authorization tokens, cookies, and full request config) before
logging; ensure the symbol getScraperResultsHandler's catch uses this sanitized
object and avoid logging the original error variable directly.
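A minimal sketch of such a sanitizer, assuming only `Error`-derived and primitive thrown values need handling (a fuller version would also truncate stacks and redact HTTP-client fields like `config`, `headers`, and `response`):

```typescript
// Extract only safe fields from an unknown caught value; everything else
// (headers, tokens, request config) is deliberately dropped before logging.
function sanitizeError(error: unknown): { name: string; message: string } {
  if (error instanceof Error) {
    return { name: error.name, message: error.message };
  }
  return { name: "UnknownError", message: String(error) };
}
```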

Comment thread lib/apify/validateGetScraperResultsRequest.ts Outdated

@cubic-dev-ai cubic-dev-ai Bot left a comment


3 issues found across 9 files

Confidence score: 3/5

  • There is concrete API behavior risk in lib/apify/getActorStatus.ts: missing Apify runs can fall through to UNKNOWN and return HTTP 200 for nonexistent runIds, which can mislead clients and mask errors.
  • lib/apify/getScraperResultsHandler.ts is flagged for missing rate-limiting on a scraping request path, creating operational/abuse risk and inconsistency with the project’s API rules.
  • The app/api/apify/runs/[runId]/route.ts multi-export finding may be partly convention/framework-driven, but with severity 7/10 items present, this sits in a moderate-risk range rather than a clearly safe merge.
  • Pay close attention to lib/apify/getActorStatus.ts, lib/apify/getScraperResultsHandler.ts, app/api/apify/runs/[runId]/route.ts - correct status handling for missing runs, enforce rate limiting, and validate route export conventions.
Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="app/api/apify/runs/[runId]/route.ts">

<violation number="1" location="app/api/apify/runs/[runId]/route.ts:13">
P1: Custom agent: **Module should export a single primary function whose name matches the filename**

Module violates single-primary-export rule by exporting multiple top-level functions (`OPTIONS`, `GET`) and none matches filename basename `route`.</violation>
</file>

<file name="lib/apify/getScraperResultsHandler.ts">

<violation number="1" location="lib/apify/getScraperResultsHandler.ts:32">
P1: Custom agent: **API Design Consistency and Maintainability**

Scraping results endpoint is missing rate-limiting enforcement in its request path, violating the rule requiring rate limiting for scraping APIs.</violation>
</file>

<file name="lib/apify/getActorStatus.ts">

<violation number="1" location="lib/apify/getActorStatus.ts:21">
P1: Handle missing Apify runs explicitly instead of defaulting to `UNKNOWN`; otherwise nonexistent `runId`s are returned as HTTP 200.</violation>
</file>
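The missing-run finding above could be addressed by mapping the lookup result to an HTTP status explicitly, so a nonexistent runId yields 404 instead of 200 with UNKNOWN. A framework-free sketch (the run shape, the `NOT_FOUND` label, and the response type are assumptions based on fields mentioned in this PR):

```typescript
// Sketch: decide the HTTP status for a run lookup explicitly, so a missing
// run surfaces as 404 rather than being masked as a 200.

interface ApifyRunLike {
  status: string; // e.g. "RUNNING", "SUCCEEDED", "FAILED", "ABORTED"
  defaultDatasetId?: string | null;
}

interface RunStatusResult {
  httpStatus: number;
  body: { status: string; dataset_id: string | null };
}

function toRunStatusResponse(run: ApifyRunLike | undefined): RunStatusResult {
  if (!run) {
    // Nonexistent runId: explicit 404 instead of 200 + "UNKNOWN".
    return { httpStatus: 404, body: { status: "NOT_FOUND", dataset_id: null } };
  }
  const body = { status: run.status, dataset_id: run.defaultDatasetId ?? null };
  if (run.status === "FAILED" || run.status === "ABORTED") {
    return { httpStatus: 500, body };
  }
  return { httpStatus: 200, body };
}
```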
Architecture diagram
sequenceDiagram
    participant Client
    participant API as "mono/api (Next.js)"
    participant Auth as "Auth Service"
    participant Apify as "Apify SDK/API"

    Note over Client,Apify: NEW: migrated route GET /api/apify/runs/{runId}

    Client->>API: GET /api/apify/runs/{runId}
    
    API->>API: Validate runId (Zod)
    
    API->>Auth: validateAuthContext(request)
    alt Auth Failed
        Auth-->>Client: 401 Unauthorized
    end
    Auth-->>API: AuthContext (Account/Org)

    Note over API,Apify: Interaction via Apify SDK (CHANGED from raw fetch)
    
    API->>Apify: getActorStatus(runId)
    Apify-->>API: { status, defaultDatasetId }

    alt status == "SUCCEEDED"
        opt has dataset_id
            API->>Apify: getDataset(dataset_id)
            Apify-->>API: { items }
        end
        
        alt dataset found
            API-->>Client: 200 OK { status, dataset_id, data: items }
        else dataset missing/null
            API-->>Client: 500 Internal Server Error
        end

    else status == "RUNNING" | "READY"
        API-->>Client: 200 OK { status, dataset_id }

    else status == "FAILED" | "ABORTED"
        API-->>Client: CHANGED: 500 Internal Server Error { status, dataset_id }

    else SDK/Network Error
        API-->>Client: CHANGED: 500 Internal Server Error (No longer masks as RUNNING)
    end
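The branching in the diagram can be condensed into a pure decision function (a sketch; names and shapes are illustrative, not the PR's actual handler):

```typescript
// Sketch of the response branching shown in the sequence diagram above.

type Decision =
  | { kind: "ok"; includeData: boolean } // 200
  | { kind: "error" };                   // 500

function decideResponse(status: string, hasDataset: boolean): Decision {
  if (status === "SUCCEEDED") {
    // SUCCEEDED with a missing dataset is treated as an error, per the diagram.
    return hasDataset ? { kind: "ok", includeData: true } : { kind: "error" };
  }
  if (status === "FAILED" || status === "ABORTED") {
    return { kind: "error" };
  }
  // RUNNING / READY (and other non-terminal statuses): 200 without data.
  return { kind: "ok", includeData: false };
}
```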

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

*
* @returns A NextResponse with CORS headers.
*/
export async function OPTIONS() {

@cubic-dev-ai cubic-dev-ai Bot Apr 21, 2026


P1: Custom agent: Module should export a single primary function whose name matches the filename

Module violates single-primary-export rule by exporting multiple top-level functions (OPTIONS, GET) and none matches filename basename route.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At app/api/apify/runs/[runId]/route.ts, line 13:

<comment>Module violates single-primary-export rule by exporting multiple top-level functions (`OPTIONS`, `GET`) and none matches filename basename `route`.</comment>

<file context>
@@ -0,0 +1,37 @@
+ *
+ * @returns A NextResponse with CORS headers.
+ */
+export async function OPTIONS() {
+  return new NextResponse(null, {
+    status: 200,
</file context>
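Note that Next.js route files conventionally export one function per HTTP method (`GET`, `OPTIONS`, ...), so the multi-export finding is likely convention-driven rather than a defect. Framework-free, the headers such an OPTIONS handler might return could be built like this (header values are assumptions, not copied from this PR):

```typescript
// Sketch: CORS headers an OPTIONS (preflight) handler might return.
// Allowed origin, methods, and headers are illustrative assumptions.

function buildCorsHeaders(allowedOrigin = "*"): Record<string, string> {
  return {
    "Access-Control-Allow-Origin": allowedOrigin,
    "Access-Control-Allow-Methods": "GET, OPTIONS",
    "Access-Control-Allow-Headers": "Content-Type, x-api-key, Authorization",
  };
}
```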

runId: string,
): Promise<NextResponse> {
try {
const validated = await validateGetScraperResultsRequest(request, runId);

@cubic-dev-ai cubic-dev-ai Bot Apr 21, 2026


P1: Custom agent: API Design Consistency and Maintainability

Scraping results endpoint is missing rate-limiting enforcement in its request path, violating the rule requiring rate limiting for scraping APIs.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At lib/apify/getScraperResultsHandler.ts, line 32:

<comment>Scraping results endpoint is missing rate-limiting enforcement in its request path, violating the rule requiring rate limiting for scraping APIs.</comment>

<file context>
@@ -0,0 +1,73 @@
+  runId: string,
+): Promise<NextResponse> {
+  try {
+    const validated = await validateGetScraperResultsRequest(request, runId);
+    if (validated instanceof NextResponse) {
+      return validated;
</file context>
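If the rate-limiting finding is acted on, a minimal enforcement point might look like the sketch below (purely illustrative; a real deployment behind multiple instances would use a shared store such as Redis, and the limit/window values are assumptions):

```typescript
// Sketch: in-memory fixed-window rate limiter keyed by caller identity
// (e.g. the api key or account id from the auth context).

class FixedWindowLimiter {
  private counts = new Map<string, { windowStart: number; count: number }>();

  constructor(
    private limit: number,
    private windowMs: number,
  ) {}

  // Returns true if the request is allowed, false if the caller is over limit.
  allow(key: string, now: number = Date.now()): boolean {
    const entry = this.counts.get(key);
    if (!entry || now - entry.windowStart >= this.windowMs) {
      this.counts.set(key, { windowStart: now, count: 1 });
      return true;
    }
    if (entry.count >= this.limit) return false;
    entry.count += 1;
    return true;
  }
}
```

The handler would call `allow(accountId)` before hitting Apify and return 429 when it reports false.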

Comment thread lib/apify/getActorStatus.ts Outdated
@arpitgupta1214 arpitgupta1214 changed the title feat: migrate GET /api/apify/scraper to GET /api/apify/runs/{runId} feat(api): migrate GET /api/apify/runs/{runId} Apr 22, 2026
- Drop lib/apify/getActorStatus.ts and lib/apify/getDataset.ts helpers;
  call the SDK directly from the handler.
- Flatten getScraperResultsHandler branching into a single success path
  plus a shared status-code fallback.
- Declare getScraperResultsParamsSchema as a z.object directly.
- Trim jsdoc across the PR.

@cubic-dev-ai cubic-dev-ai Bot left a comment


1 issue found across 8 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="lib/apify/getScraperResultsHandler.ts">

<violation number="1" location="lib/apify/getScraperResultsHandler.ts:22">
P2: Handle `undefined` from `apifyClient.run(...).get()` explicitly; otherwise nonexistent runs return 200 with `status: "UNKNOWN"` and mask an error condition.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Comment on lines +22 to +23
const status = run?.status ?? "UNKNOWN";
const dataset_id = run?.defaultDatasetId ?? null;

@cubic-dev-ai cubic-dev-ai Bot Apr 23, 2026


P2: Handle undefined from apifyClient.run(...).get() explicitly; otherwise nonexistent runs return 200 with status: "UNKNOWN" and mask an error condition.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At lib/apify/getScraperResultsHandler.ts, line 22:

<comment>Handle `undefined` from `apifyClient.run(...).get()` explicitly; otherwise nonexistent runs return 200 with `status: "UNKNOWN"` and mask an error condition.</comment>

<file context>
@@ -1,72 +1,39 @@
 
-    const { status, dataset_id } = await getActorStatus(validated.runId);
+    const run = await apifyClient.run(validated.runId).get();
+    const status = run?.status ?? "UNKNOWN";
+    const dataset_id = run?.defaultDatasetId ?? null;
 
</file context>
Suggested change
const status = run?.status ?? "UNKNOWN";
const dataset_id = run?.defaultDatasetId ?? null;
if (!run) {
throw new Error("Apify run not found");
}
const status = run.status;
const dataset_id = run.defaultDatasetId ?? null;


@cubic-dev-ai cubic-dev-ai Bot left a comment


0 issues found across 2 files (changes from recent commits).

Requires human review: Auto-approval blocked by 3 unresolved issues from previous reviews.

- Switch validator from validateAuthContext to validateAdminAuth so only
  admin accounts can poll Apify run status.
- Run auth before the runId schema check — an unauthenticated request
  should never reveal param-level errors.

@cubic-dev-ai cubic-dev-ai Bot left a comment


0 issues found across 2 files (changes from recent commits).

Requires human review: Auto-approval blocked by 3 unresolved issues from previous reviews.

Comment thread lib/apify/validateGetScraperResultsRequest.ts
@sweetmantech
Contributor

Preview smoke test

Against preview https://api-git-feat-migrate-apify-scraper-recoupable-ad724970.vercel.app at commit 27c470d1 (synced with test).

Results

| # | Case | Expected | Got |
|---|------|----------|-----|
| 1 | No auth header | 401 | ✅ 401 "Exactly one of x-api-key or Authorization must be provided" |
| 2 | Admin auth + nonexistent runId | 200 UNKNOWN | ✅ 200 {"status":"UNKNOWN","dataset_id":null} |
| 3 | Legacy GET /api/apify/scraper?runId=… | 404 (clean cut) | ✅ 404 — app/api/apify/scraper/route.ts no longer exists |
| 4 | Admin auth + real SUCCEEDED runId | 200 + populated data | ✅ 200 with full Apify dataset payload |

End-to-end chain exercised

POST /api/socials/{socialId}/scrape        → returns {runId, datasetId}
GET  /api/apify/runs/{runId}               → returns {status, dataset_id, data}
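This chain could be driven by a small polling helper along these lines (a sketch; the endpoint path matches this PR, while the injected getRun function, the terminal-status set, and the retry cap are assumptions):

```typescript
// Sketch: poll the run-status endpoint until a terminal status is reached.
// getRun is injected so the loop can be tested without a network.

interface RunStatus {
  status: string;
  dataset_id: string | null;
  data?: unknown[];
}

async function pollRun(
  getRun: (runId: string) => Promise<RunStatus>,
  runId: string,
  maxAttempts = 10,
): Promise<RunStatus> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const result = await getRun(runId);
    // FAILED/ABORTED/SUCCEEDED are terminal; RUNNING/READY/UNKNOWN keep polling.
    if (["SUCCEEDED", "FAILED", "ABORTED"].includes(result.status)) {
      return result;
    }
    // A real poller would sleep between attempts; omitted here for brevity.
  }
  throw new Error(`run ${runId} not terminal after ${maxAttempts} polls`);
}
```

With a real client, getRun might issue `fetch(`${base}/api/apify/runs/${runId}`, { headers: { "x-api-key": key } })` and parse the JSON body.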

Triggered scrape on PinkPantheress's Instagram social profile (social_id=02061320-978a-4394-a2c1-6062272683a8):

// POST /api/socials/02061320-978a-4394-a2c1-6062272683a8/scrape → 200
{ "runId": "VpqICClParjRjKNCf", "datasetId": "c4t9gsY5fAWbX0GNu" }

Polling the new endpoint returned SUCCEEDED on the first call with populated data:

// GET /api/apify/runs/VpqICClParjRjKNCf → 200
{
  "status": "SUCCEEDED",
  "dataset_id": "c4t9gsY5fAWbX0GNu",
  "data": [
    {
      "inputUrl": "https://www.instagram.com/pinkpantheress",
      "id": "39559476848",
      "username": "pinkpantheress",
      "fullName": "🫀",
      "biography": "",
      "externalUrls": [
        { "title": "VISIT MY STORE 💋🤭🤓❤️", "url": "..." }
      ]
      // ...rest of the Instagram profile scrape
    }
  ]
}

Findings

  • ✅ Admin-only auth gate works (validateAdminAuth) — no auth → 401; my admin key → passes the check.
  • ✅ Handler's branching matches design: SUCCEEDED with items → 200 + data; nonexistent run → 200 with UNKNOWN (not in FAILED/ABORTED/SUCCEEDED set, correctly non-terminal).
  • ✅ Legacy /api/apify/scraper fully removed — migration is a clean cut rather than dual-path.
  • ✅ Response shape ({status, dataset_id, data?}) matches JSDoc on getScraperResultsHandler and handler logic.

Not directly exercised (no fixtures available)

  • 403 for a non-admin caller — I only have an admin key.
  • 500 for FAILED/ABORTED runs — would need a broken Apify run. Handler logic covers it (FAILED or ABORTED, or SUCCEEDED with no dataset, → 500).

🤖 Tested with Claude Code

@sweetmantech sweetmantech merged commit 306340b into test Apr 23, 2026
6 checks passed
