Conversation

@amhsirak amhsirak commented Nov 28, 2025

Closes #898

Summary by CodeRabbit

  • New Features

    • Added browser fingerprinting support for improved automation resilience
    • Added pagination validation testing capability
  • Bug Fixes

    • Improved lifecycle/timeouts and cleanup for browsers and runs
    • Enhanced webhook notifications, socket recovery broadcasts, and run failure reporting
    • Added circuit-breaker and retry safeguards for DB, Airtable, and Google Sheets integrations
  • Chores

    • Tuned background job processing, concurrency and DB pooling for reliability


@amhsirak amhsirak marked this pull request as draft November 28, 2025 10:34
@amhsirak amhsirak changed the title from "fix: db leaks" to "fix: db leaks + socket cleanup" Nov 28, 2025
coderabbitai bot commented Nov 28, 2025

Walkthrough

Adds fingerprinting libs and integrates enhanced fingerprint injection; hardens browser lifecycle with timeouts, retries, and memory cleanup; replaces direct PgBoss usage with a pgBossClient singleton and queued-task helpers; introduces bounded integration task queues, circuit breaker for queued runs, extensive socket cleanup, and multiple timeout guards.

Changes

  • Package Dependencies (package.json): Added fingerprint-generator ^2.1.77, fingerprint-injector ^2.1.77, and fortawesome ^0.0.1-security.
  • RemoteBrowser & Pool (server/src/browser-management/classes/RemoteBrowser.ts, server/src/browser-management/classes/BrowserPool.ts): Added enhanced fingerprinting (imports plus injector application), consolidated initialization with overall timeouts and retries, improved memory-management intervals, centralized socket listener removal, removed deprecated public helpers, and removed the legacy reserveBrowserSlot wrapper.
  • Browser Controller (server/src/browser-management/controller.ts): Switched to atomic slot reservation for run mode; added guarded initialization with retry/backoff and BROWSER_INIT_TIMEOUT; robust destroy pipeline with DESTROY_TIMEOUT and namespace cleanup.
  • Input Handlers & Socket Connection (server/src/browser-management/inputHandlers.ts, server/src/socket-connection/connection.ts): Added onTestPaginationScroll and removeInputHandlers; moved to named exports; the connection disconnect handler now calls removeInputHandlers.
  • Run Execution / API (server/src/api/record.ts, server/src/workflow-management/scheduler/index.ts): Replaced in-place integration task mutations with addGoogleSheetUpdateTask/addAirtableUpdateTask; introduced a generic withTimeout; enforced SCRAPE_TIMEOUT and INTERPRETATION_TIMEOUT; added webhook dispatches and run-started/run-completed socket events; changed readyForRunHandler to require a socket and added cleanupSocketConnection.
  • PgBoss & Workers (server/src/storage/pgboss.ts, server/src/pgboss-worker.ts, server/src/schedule-worker.ts): Introduced a pgBossClient singleton with startPgBossClient/stopPgBossClient; refactored worker registration and exports (pgBoss is no longer exported); added a withTimeout helper and applied tighter timeouts for browser init, page ops, scrape, and interpretation; removed schedule helpers from schedule-worker.
  • Routes / Storage (server/src/routes/record.ts, server/src/routes/storage.ts): Switched queue interactions to pgBossClient; added a processQueuedRuns export; implemented a circuit breaker for queued-run processing (error counter, cooldown/open-until behavior).
  • Integration Task Queues (server/src/workflow-management/integrations/airtable.ts, server/src/workflow-management/integrations/gsheet.ts): Added bounded queues (MAX_QUEUE_SIZE) with a drop-oldest policy; added addAirtableUpdateTask and addGoogleSheetUpdateTask; added isProcessing guards and safer merge/typing changes (gsheet: scrapeSchema/scrapeListRecord<string, any[]>).
  • Generator & Interpreter Lifecycle (server/src/workflow-management/classes/Generator.ts, server/src/workflow-management/classes/Interpreter.ts): Added pageCloseListeners and socket listener removal; new cleanup() in Generator; introduced a persistence retry timer with MAX_PERSISTENCE_RETRIES and removePausingListeners() in Interpreter; ensured timers and listeners are cleared on cleanup.
  • Server & DB Startup/Shutdown (server/src/server.ts, server/src/storage/db.ts): Added Sequelize pool and dialectOptions (statement_timeout); reworked server startup to start pgBossClient, recover orphans, and start workers; introduced a recentRecoveries map; improved Socket.IO instantiation/options and managed serverIntervals; expanded graceful shutdown steps.
  • Scheduler Storage Helpers (server/src/storage/schedule.ts): New scheduling helpers scheduleWorkflow and cancelScheduledWorkflow using pgBossClient.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Areas requiring extra attention:

  • socket lifecycle and namespace cleanup (multiple cleanup paths across api/record.ts, controller.ts, server.ts)
  • coordinated timeouts (browser init, page ops, scrape, interpretation, integration) to avoid unintended races
  • pgBossClient singleton init/stop ordering and all call-site migrations
  • circuit breaker correctness and interaction with run recovery
  • bounded integration queues drop-oldest behavior and task loss implications
  • persistence retry timers and listener removal to prevent leaks

Suggested labels

Type: Bug, Type: Infra, Scope: Infra

Poem

🐰 I scrubbed the sockets, tuned the clocks,
I lined up fingerprints in clever flocks.
Queues now bounded, timeouts in place,
I hop and tidy every namespace.
Bravo — a clean, well-timed codebase race!

Pre-merge checks

❌ Failed checks (1 warning)
  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 3.70%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve coverage.
✅ Passed checks (2 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title 'fix: db leaks + socket cleanup' accurately summarizes the main changes: addressing database connection pooling/lifecycle issues and adding comprehensive socket cleanup mechanisms throughout the codebase.


@amhsirak amhsirak requested a review from RohitR311 November 28, 2025 10:36

@RohitR311 RohitR311 left a comment


@amhsirak Ready for review

@amhsirak amhsirak marked this pull request as ready for review November 30, 2025 13:15
@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 12

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (6)
server/src/workflow-management/integrations/airtable.ts (1)

184-196: Leftover loop is unreachable code.

The loops at lines 185-196 will never execute because maxLength is already Math.max(schemaData.length, listData.length, screenshotData.length), so i < schemaData.length after starting at maxLength is always false.

This dead code should be removed:

-  // Push leftovers
-  for (let i = maxLength; i < schemaData.length; i++) {
-    allRecords.push({ Label: schemaData[i].Field, Value: schemaData[i].Value });
-  }
-  for (let i = maxLength; i < listData.length; i++) {
-    allRecords.push(listData[i]);
-  }
-  for (let i = maxLength; i < screenshotData.length; i++) {
-    allRecords.push({
-      Key: screenshotData[i].key,
-      Screenshot: screenshotData[i].url,
-    });
-  }
server/src/browser-management/controller.ts (1)

290-298: Replace switchOff() with interpreter.stopInterpretation() to stop only the workflow execution.

Calling browserSession?.switchOff() closes the entire browser session including intervals and DOM streaming, whereas stopRunningInterpretation should only halt the current workflow execution. Use browserSession?.interpreter.stopInterpretation() instead to properly stop the interpretation workflow without terminating the browser session.

server/src/workflow-management/scheduler/index.ts (2)

116-140: Fire-and-forget pattern for integration updates may mask failures.

The withTimeout calls on lines 132-136 use .catch() to handle errors, but the function doesn't await these promises. This is intentional fire-and-forget, but if a task is added and the process immediately fails or restarts, the update may never complete. This is acceptable given the retry logic in the processors, but worth noting.

Also, retries: 5 is used as the initial value, but the processors check task.retries < MAX_RETRIES and increment on failure, so a task that starts at retries: 5 is already at the limit before its first attempt.

     addGoogleSheetUpdateTask(runId, {
       robotId: robotMetaId,
       runId: runId,
       status: 'pending',
-      retries: 5,
+      retries: 0,
     });

     addAirtableUpdateTask(runId, {
       robotId: robotMetaId,
       runId: runId,
       status: 'pending',
-      retries: 5,
+      retries: 0,
     });

649-652: resetRecordingState has no effect.

This function reassigns local parameters rather than any external state. The assignments browserId = '' and id = '' modify only the local copies of these values, not any external variables.

If this function is intended to reset some global or module-level state, the implementation needs to be changed. If it's dead code, consider removing it.

server/src/pgboss-worker.ts (1)

98-122: Same retries: 5 issue as in scheduler/index.ts.

The initial retries value of 5 will cause tasks to immediately exceed the retry limit.

Apply the same fix as in scheduler/index.ts:

     addGoogleSheetUpdateTask(runId, {
       robotId: robotMetaId,
       runId: runId,
       status: 'pending',
-      retries: 5,
+      retries: 0,
     });

     addAirtableUpdateTask(runId, {
       robotId: robotMetaId,
       runId: runId,
       status: 'pending',
-      retries: 5,
+      retries: 0,
     });
server/src/api/record.ts (1)

620-623: resetRecordingState has no effect here either.

Same issue as in scheduler/index.ts - this function only modifies local parameter copies.

🧹 Nitpick comments (15)
server/src/workflow-management/integrations/gsheet.ts (2)

25-38: Bounded queue implementation has potential ordering issues.

The queue eviction uses Object.keys()[0] to find the "oldest" task, but object key order is insertion order only for non-integer-like string keys; integer-like keys (which run IDs could resemble) are iterated in ascending numeric order, so the first key is not guaranteed to be the oldest entry.

Consider using a Map which guarantees insertion order, or track timestamps explicitly:

-export let googleSheetUpdateTasks: { [runId: string]: GoogleSheetUpdateTask } = {};
+export let googleSheetUpdateTasks: Map<string, GoogleSheetUpdateTask> = new Map();

Then update the eviction logic accordingly:

-  const oldestKey = Object.keys(googleSheetUpdateTasks)[0];
-  if (oldestKey) {
-    delete googleSheetUpdateTasks[oldestKey];
-  }
+  const oldestKey = googleSheetUpdateTasks.keys().next().value;
+  if (oldestKey) {
+    googleSheetUpdateTasks.delete(oldestKey);
+  }

312-364: Concurrency guard and timeout are well-implemented.

The isProcessingGoogleSheets flag with a finally block ensures proper cleanup. The 60-second timeout prevents indefinite processing.

However, note the inconsistent indentation within the while loop body (lines 325-359): the inner code mixes indentation levels, which hurts readability.

Consider aligning the indentation inside the while loop for consistency.

server/src/browser-management/classes/RemoteBrowser.ts (1)

420-440: Redundant nested try-catch blocks.

The inner try-catch (lines 425-435) handles all fingerprint injection errors, making the outer try-catch (lines 424, 436-439) redundant since it can never catch anything.

  private async applyEnhancedFingerprinting(context: BrowserContext): Promise<void> {
    try {
-      try {
-        const fingerprintGenerator = new FingerprintGenerator();
-        const fingerprint = fingerprintGenerator.getFingerprint();
-        const fingerprintInjector = new FingerprintInjector();
-
-        await fingerprintInjector.attachFingerprintToPlaywright(context as any, fingerprint);
-
-        logger.info("Enhanced fingerprinting applied successfully");
-      } catch (fingerprintError: any) {
-        logger.warn(`Modern fingerprint injection failed: ${fingerprintError.message}. Using existing protection.`);
-      }
+      const fingerprintGenerator = new FingerprintGenerator();
+      const fingerprint = fingerprintGenerator.getFingerprint();
+      const fingerprintInjector = new FingerprintInjector();
+
+      await fingerprintInjector.attachFingerprintToPlaywright(context as any, fingerprint);
+
+      logger.info("Enhanced fingerprinting applied successfully");
    } catch (error: any) {
-      logger.error(`Enhanced fingerprinting failed: ${error.message}`);
-      // Don't throw - fallback to basic functionality
+      logger.warn(`Enhanced fingerprinting failed: ${error.message}. Using existing protection.`);
    }
  }
server/src/routes/record.ts (1)

82-84: Queue creation pattern is idempotent but potentially inefficient.

createQueue is called before every send operation. While this is safe (idempotent), it adds unnecessary overhead on every request.

Consider creating queues once during application startup instead of on every request:

+// At module level or in app initialization
+const QUEUES_INITIALIZED = pgBossClient.start().then(async () => {
+  await pgBossClient.createQueue('initialize-browser-recording');
+  await pgBossClient.createQueue('destroy-browser');
+  await pgBossClient.createQueue('interpret-workflow');
+  await pgBossClient.createQueue('stop-interpretation');
+});

Then in handlers, just call send directly after ensuring initialization is complete.
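A handler under that scheme would then be (a sketch; QUEUES_INITIALIZED is the hypothetical promise from the snippet above, and the queue name and payload are illustrative):

// Wait once for queue creation, then enqueue directly.
await QUEUES_INITIALIZED;
const jobId = await pgBossClient.send('initialize-browser-recording', { userId: req.user.id });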

server/src/storage/pgboss.ts (1)

27-30: Consider adding connection timeout and retry options.

The PgBoss client is configured with only max: 3 connections. For production resilience, consider adding timeout and retry configurations to handle transient database connection issues.

 export const pgBossClient = new PgBoss({
   connectionString: pgBossConnectionString,
   max: 3,
+  connectionTimeoutSeconds: 30,
+  retryLimit: 3,
+  retryDelay: 1000,
 });
server/src/workflow-management/classes/Generator.ts (1)

907-908: Inconsistent logger API usage.

Lines 908 and 995 use logger.warn() and logger.error() directly, while the rest of the codebase uses logger.log('warn', ...) and logger.log('error', ...).

-      logger.warn(`Error removing Generator socket listeners: ${error.message}`);
+      logger.log('warn', `Error removing Generator socket listeners: ${error.message}`);
-      logger.error(`Error during Generator cleanup: ${error.message}`);
+      logger.log('error', `Error during Generator cleanup: ${error.message}`);

Also applies to: 994-995

server/src/storage/schedule.ts (1)

44-67: Consider returning the count of cancelled jobs for better visibility.

The function always returns true even when no matching jobs are found. This could mask issues where cancellation was expected but the job didn't exist.

-export async function cancelScheduledWorkflow(robotId: string) {
+export async function cancelScheduledWorkflow(robotId: string): Promise<number> {
   try {
     const jobs = await pgBossClient.getSchedules();

     const matchingJobs = jobs.filter((job: any) => {
       try {
         const data = job.data;
         return data && data.id === robotId;
       } catch {
         return false;
       }
     });

     for (const job of matchingJobs) {
       logger.log('info', `Cancelling scheduled job ${job.name} for robot ${robotId}`);
       await pgBossClient.unschedule(job.name);
     }

-    return true;
+    if (matchingJobs.length === 0) {
+      logger.log('debug', `No scheduled jobs found for robot ${robotId}`);
+    }
+    return matchingJobs.length;
   } catch (error: unknown) {
server/src/routes/storage.ts (1)

1151-1159: Circuit breaker may trip on non-database errors.

The error handler increments consecutiveDbErrors for any error in processQueuedRuns, including PgBoss queue errors or browser creation failures. Consider distinguishing database errors from other errors if you want the breaker to specifically protect against DB issues.

If intentional to protect against all errors (not just DB), consider renaming to clarify:

-// Circuit breaker for database connection issues
-let consecutiveDbErrors = 0;
+// Circuit breaker for processQueuedRuns failures
+let consecutiveErrors = 0;
server/src/browser-management/inputHandlers.ts (1)

865-1014: Consider extracting duplicated evaluateSelector function.

The evaluateSelector function is defined identically twice within onTestPaginationScroll (lines 906-932 and 958-983). This duplicates ~25 lines of code.

Extract to a shared function outside the page.evaluate calls:

+const evaluateSelectorScript = `
+  function evaluateSelector(sel) {
+    try {
+      const isXPath = sel.startsWith('//') || sel.startsWith('(//');
+      if (isXPath) {
+        const result = document.evaluate(sel, document, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
+        const elements = [];
+        for (let i = 0; i < result.snapshotLength; i++) {
+          const node = result.snapshotItem(i);
+          if (node && node.nodeType === Node.ELEMENT_NODE) {
+            elements.push(node);
+          }
+        }
+        return elements;
+      } else {
+        return Array.from(document.querySelectorAll(sel));
+      }
+    } catch (err) {
+      return [];
+    }
+  }
+`;

 const onTestPaginationScroll = async (...) => {
   // ...
-  const initialCount = await currentPage.evaluate((selector) => {
-    function evaluateSelector(sel: string): Element[] { ... }
-    return evaluateSelector(selector).length;
-  }, listSelector);
+  const initialCount = await currentPage.evaluate((selector) => {
+    ${evaluateSelectorScript}
+    return evaluateSelector(selector).length;
+  }, listSelector);
server/src/workflow-management/integrations/airtable.ts (1)

26-39: Object key order for "oldest" task may not be reliable.

Using Object.keys(airtableUpdateTasks)[0] to find the "oldest" task relies on object property insertion order, which only holds for non-integer-like string keys. A Map guarantees insertion order and has clearer semantics.

Consider using a Map instead:

-export let airtableUpdateTasks: { [runId: string]: AirtableUpdateTask } = {};
+export let airtableUpdateTasks: Map<string, AirtableUpdateTask> = new Map();

 export function addAirtableUpdateTask(runId: string, task: AirtableUpdateTask): boolean {
-  const currentSize = Object.keys(airtableUpdateTasks).length;
+  const currentSize = airtableUpdateTasks.size;

   if (currentSize >= MAX_QUEUE_SIZE) {
     logger.log('warn', `Airtable task queue full (${currentSize}/${MAX_QUEUE_SIZE}), dropping oldest task`);
-    const oldestKey = Object.keys(airtableUpdateTasks)[0];
-    if (oldestKey) {
-      delete airtableUpdateTasks[oldestKey];
-    }
+    const oldestKey = airtableUpdateTasks.keys().next().value;
+    if (oldestKey) airtableUpdateTasks.delete(oldestKey);
   }

-  airtableUpdateTasks[runId] = task;
+  airtableUpdateTasks.set(runId, task);
   return true;
 }
server/src/workflow-management/scheduler/index.ts (1)

696-716: Comprehensive socket cleanup implementation.

The cleanupSocketConnection function properly:

  1. Removes all listeners from the client socket
  2. Disconnects the client socket
  3. Cleans up the server-side namespace (removes listeners, disconnects sockets, deletes from _nsps map)

One consideration: directly accessing (serverIo as any)._nsps is fragile as it relies on Socket.IO internals that could change between versions.

The internal _nsps map access is a known pattern for namespace cleanup but may break in future Socket.IO versions. Consider wrapping this in a more defensive check or documenting the Socket.IO version dependency.

server/src/pgboss-worker.ts (1)

89-96: Duplicate withTimeout utility across files.

This function is duplicated in scheduler/index.ts (line 107-114) and api/record.ts. Consider extracting to a shared utility module.

// server/src/utils/timeout.ts
export function withTimeout<T>(promise: Promise<T>, timeoutMs: number, operation: string): Promise<T> {
  return Promise.race([
    promise,
    new Promise<T>((_, reject) =>
      setTimeout(() => reject(new Error(`${operation} timed out after ${timeoutMs}ms`)), timeoutMs)
    )
  ]);
}
server/src/api/record.ts (2)

560-567: Third copy of withTimeout utility.

This is now duplicated in three files. Strongly recommend extracting to a shared module.


1145-1165: cleanupSocketConnection duplicated from scheduler.

This is identical to the implementation in scheduler/index.ts. Consider extracting to a shared module.

Extract to a shared utility:

// server/src/utils/socket-cleanup.ts
import { Socket } from "socket.io-client";
import { Server } from "socket.io";
import logger from "../logger";

export function cleanupSocketConnection(socket: Socket, serverIo: Server, browserId: string, id: string) {
  // ... implementation
}
server/src/server.ts (1)

401-419: Unhandled rejection and uncaught exception handlers.

These handlers log errors and exit in production. The delay before exit (1s for rejections, 5s for exceptions) allows logs to flush.

Note: In production, exiting on unhandled rejections is the correct approach, but you may want to consider a more graceful shutdown that attempts resource cleanup.

Consider calling the SIGINT handler logic instead of raw process.exit() to ensure proper cleanup:

process.on('unhandledRejection', (reason, promise) => {
  console.error('Unhandled promise rejection:', reason);
  if (process.env.NODE_ENV === 'production') {
    // Trigger graceful shutdown instead of immediate exit
    process.emit('SIGINT');
  }
});
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ad8df66 and 926db0a.

📒 Files selected for processing (20)
  • package.json (1 hunks)
  • server/src/api/record.ts (10 hunks)
  • server/src/browser-management/classes/BrowserPool.ts (0 hunks)
  • server/src/browser-management/classes/RemoteBrowser.ts (17 hunks)
  • server/src/browser-management/controller.ts (6 hunks)
  • server/src/browser-management/inputHandlers.ts (2 hunks)
  • server/src/pgboss-worker.ts (13 hunks)
  • server/src/routes/record.ts (6 hunks)
  • server/src/routes/storage.ts (9 hunks)
  • server/src/schedule-worker.ts (3 hunks)
  • server/src/server.ts (8 hunks)
  • server/src/socket-connection/connection.ts (2 hunks)
  • server/src/storage/db.ts (1 hunks)
  • server/src/storage/pgboss.ts (1 hunks)
  • server/src/storage/schedule.ts (1 hunks)
  • server/src/workflow-management/classes/Generator.ts (3 hunks)
  • server/src/workflow-management/classes/Interpreter.ts (4 hunks)
  • server/src/workflow-management/integrations/airtable.ts (8 hunks)
  • server/src/workflow-management/integrations/gsheet.ts (5 hunks)
  • server/src/workflow-management/scheduler/index.ts (10 hunks)
💤 Files with no reviewable changes (1)
  • server/src/browser-management/classes/BrowserPool.ts
🧰 Additional context used
🧬 Code graph analysis (9)
server/src/storage/pgboss.ts (1)
server/src/db/config/database.js (1)
  • dotenv (1-1)
server/src/routes/record.ts (1)
server/src/storage/pgboss.ts (1)
  • pgBossClient (27-30)
server/src/storage/schedule.ts (1)
server/src/storage/pgboss.ts (1)
  • pgBossClient (27-30)
server/src/socket-connection/connection.ts (1)
server/src/browser-management/inputHandlers.ts (1)
  • removeInputHandlers (1086-1086)
server/src/routes/storage.ts (1)
server/src/storage/pgboss.ts (1)
  • pgBossClient (27-30)
server/src/pgboss-worker.ts (3)
server/src/workflow-management/integrations/gsheet.ts (2)
  • addGoogleSheetUpdateTask (25-38)
  • processGoogleSheetUpdates (312-365)
server/src/workflow-management/integrations/airtable.ts (1)
  • addAirtableUpdateTask (26-39)
server/src/markdownify/scrape.ts (2)
  • convertPageToMarkdown (8-57)
  • convertPageToHTML (63-111)
server/src/browser-management/inputHandlers.ts (2)
server/src/server.ts (1)
  • browserPool (99-99)
maxun-core/src/browserSide/scraper.js (2)
  • selector (28-28)
  • node (664-664)
server/src/workflow-management/scheduler/index.ts (7)
server/src/workflow-management/integrations/gsheet.ts (2)
  • addGoogleSheetUpdateTask (25-38)
  • processGoogleSheetUpdates (312-365)
server/src/workflow-management/integrations/airtable.ts (2)
  • addAirtableUpdateTask (26-39)
  • processAirtableUpdates (514-569)
server/src/markdownify/scrape.ts (2)
  • convertPageToMarkdown (8-57)
  • convertPageToHTML (63-111)
server/src/routes/index.ts (1)
  • workflow (11-11)
server/src/browser-management/controller.ts (1)
  • destroyRemoteBrowser (124-200)
maxun-core/src/interpret.ts (1)
  • run (1638-1674)
server/src/server.ts (1)
  • io (87-94)
server/src/browser-management/classes/RemoteBrowser.ts (1)
server/src/routes/proxy.ts (1)
  • getDecryptedProxyConfig (165-183)
🔇 Additional comments (50)
server/src/storage/db.ts (1)

20-29: Good connection pooling configuration for leak prevention.

The pool and timeout settings are well-configured:

  • max: 10 caps concurrent connections to prevent exhaustion
  • min: 0 allows the pool to release all idle connections
  • evict: 1000 aggressively cleans up stale connections
  • statement_timeout: 60000 prevents runaway queries

This aligns well with the PR objective of fixing DB leaks.
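For reference, those values map onto a Sequelize constructor roughly like this (a sketch assembled from the settings listed above; the connection string is a placeholder):

import { Sequelize } from 'sequelize';

const sequelize = new Sequelize(process.env.DATABASE_URL ?? '', {
  dialect: 'postgres',
  pool: {
    max: 10,     // cap concurrent connections to prevent exhaustion
    min: 0,      // let the pool drain completely when idle
    evict: 1000, // sweep stale connections every second
  },
  dialectOptions: {
    statement_timeout: 60000, // abort runaway queries after 60s
  },
});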

server/src/workflow-management/integrations/gsheet.ts (1)

240-250: Good use of .once() for token listener.

Switching from .on() to .once() prevents listener accumulation when tokens are refreshed, which helps address the socket/listener cleanup objectives of this PR.
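As a minimal illustration (hypothetical emitter and handler, not the PR's code), .once() removes its handler after the first event, so repeated refresh cycles cannot stack listeners:

import { EventEmitter } from 'events';

const authClient = new EventEmitter();

// Re-registering this on every refresh is safe: the handler
// detaches itself after it fires once.
authClient.once('tokens', (tokens) => {
  console.log('persisting refreshed tokens', tokens);
});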

server/src/browser-management/classes/RemoteBrowser.ts (5)

18-19: Fingerprinting imports added.

The imports for FingerprintInjector and FingerprintGenerator support the new browser fingerprinting functionality used in applyEnhancedFingerprinting.


460-613: Well-structured initialization with timeouts and cleanup.

The initialization flow now has:

  • Overall 120s timeout via Promise.race
  • Per-context creation 15s timeout
  • Retry logic (3 attempts) with proper browser cleanup between attempts
  • Fingerprinting applied to new contexts

This is a significant improvement for handling hung browser launches.


672-689: Good centralized socket listener cleanup.

The removeAllSocketListeners() method provides a single point for cleaning up socket listeners, called both before re-registration and during shutdown. This directly addresses the socket cleanup objectives of the PR.


896-1028: Comprehensive and robust shutdown sequence.

The switchOff method now:

  • Clears all timers/intervals
  • Removes all listeners (socket, page)
  • Cleans up generator and interpreter
  • Uses individual 5-second timeouts for CDP detach, page close, context close, and browser close
  • Each operation has its own try-catch with finally blocks to null references

This thorough cleanup addresses the DB leaks and socket cleanup objectives effectively.


285-292: Good defensive listener cleanup with closed-page check.

Checking !this.currentPage.isClosed() before calling removeAllListeners prevents errors when operating on already-closed pages. The try-catch ensures robustness.

server/src/routes/record.ts (2)

20-20: Migrated to centralized pgBossClient.

The import change from local pgBoss instance to the shared pgBossClient from storage/pgboss aligns with the PR's goal of centralizing database connection management.


29-61: Job polling implementation is functional.

The waitForJobCompletion function correctly handles job state transitions and timeout. The 200ms polling interval is reasonable for the expected job durations.
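The general shape of such a poller, for reference (a sketch; fetchState is a hypothetical callback standing in for the actual PgBoss job lookup):

async function waitForCompletion(
  jobId: string,
  fetchState: (id: string) => Promise<'created' | 'active' | 'completed' | 'failed'>,
  timeoutMs = 30_000,
  pollMs = 200,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const state = await fetchState(jobId);
    if (state === 'completed') return true;  // terminal success
    if (state === 'failed') return false;    // terminal failure
    await new Promise((r) => setTimeout(r, pollMs));
  }
  throw new Error(`Job ${jobId} did not complete within ${timeoutMs}ms`);
}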

server/src/storage/pgboss.ts (1)

1-80: Good singleton pattern for centralizing PgBoss job submission.

The module correctly separates job submission from job processing (handled by pgboss-worker.ts), preventing duplicate connection pools. The idempotent start/stop pattern and environment variable validation are well implemented.
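An idempotent start/stop wrapper of the kind described here looks roughly like this (a sketch using PgBoss's public start()/stop(); names mirror the review, not the file's exact contents):

import PgBoss from 'pg-boss';

export const pgBossClient = new PgBoss({
  connectionString: process.env.DB_CONNECTION_STRING ?? '',
  max: 3,
});

let started = false;

export async function startPgBossClient(): Promise<PgBoss> {
  if (!started) {
    await pgBossClient.start(); // guarded so repeat calls are no-ops
    started = true;
  }
  return pgBossClient;
}

export async function stopPgBossClient(): Promise<void> {
  if (started) {
    await pgBossClient.stop();
    started = false;
  }
}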

server/src/socket-connection/connection.ts (1)

20-24: Good socket cleanup implementation.

The disconnect handler now properly removes input handlers for sockets that had them registered, preventing memory leaks from accumulated event listeners.

server/src/workflow-management/classes/Generator.ts (1)

889-910: Socket listener cleanup is well-structured.

The explicit removal of specific event listeners prevents memory leaks when generators are cleaned up. The try-catch ensures cleanup continues even if individual removals fail.

server/src/routes/storage.ts (3)

1079-1083: Good circuit breaker implementation for resilience.

The circuit breaker pattern protects against cascading failures during database issues. The configuration (3 errors, 30s cooldown) provides reasonable protection while allowing recovery.
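For readers unfamiliar with the pattern, a breaker with those parameters reduces to something like this (a sketch with illustrative names, not the PR's code):

const MAX_CONSECUTIVE_ERRORS = 3;
const COOLDOWN_MS = 30_000;

let consecutiveErrors = 0;
let openUntil = 0;

async function guardedProcess(processQueuedRuns: () => Promise<void>): Promise<void> {
  if (Date.now() < openUntil) return; // breaker open: skip this cycle
  try {
    await processQueuedRuns();
    consecutiveErrors = 0; // any success closes the breaker
  } catch (err) {
    consecutiveErrors++;
    if (consecutiveErrors >= MAX_CONSECUTIVE_ERRORS) {
      openUntil = Date.now() + COOLDOWN_MS; // trip and cool down
      consecutiveErrors = 0;
    }
  }
}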


667-673: Correct migration to centralized pgBossClient.

Queue creation and job submission now use the shared pgBossClient singleton, preventing connection pool leaks that would occur with multiple PgBoss instances.


1090-1093: Good addition of ORDER BY for deterministic queue processing.

The order: [['startedAt', 'ASC']] ensures FIFO processing of queued runs, which is important for predictable behavior.

server/src/browser-management/inputHandlers.ts (1)

1028-1050: Good addition of socket listener cleanup to prevent memory leaks.

The removeInputHandlers function properly removes all registered listeners. This is essential for preventing memory leaks when sockets disconnect.

Also applies to: 1059-1084

server/src/workflow-management/integrations/airtable.ts (1)

514-568: Good concurrency guard and timeout implementation.

The isProcessingAirtable flag prevents duplicate processing, and the 60-second timeout with finally block ensures the flag is always reset. The bounded processing loop is a solid pattern.

server/src/schedule-worker.ts (1)

16-20: PgBoss configuration looks reasonable.

The max: 3 connection limit and expireInHours: 23 are sensible defaults. The 23-hour expiration prevents job queue buildup while allowing daily jobs to complete.

server/src/workflow-management/classes/Interpreter.ts (2)

387-410: Good cleanup of timers and listeners to prevent resource leaks.

The clearState method now properly:

  1. Clears persistenceRetryTimer (lines 387-390)
  2. Calls interpreter cleanup if available (lines 400-403)
  3. Removes pausing listeners (line 409)

This ensures no dangling timers or listeners remain after cleanup.


851-866: Good retry logic with proper timer cleanup.

The persistence retry now:

  1. Clears existing retry timer before scheduling new one (lines 857-859)
  2. Sets timer reference to null when it fires (line 862)
  3. Uses exponential backoff with max delay cap

This prevents timer accumulation on repeated failures.
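The underlying pattern, in outline (a sketch with made-up names; the real logic lives in Interpreter):

let persistenceRetryTimer: NodeJS.Timeout | null = null;

function scheduleRetry(attempt: number, maxRetries: number, persist: () => Promise<void>): void {
  if (attempt >= maxRetries) return;
  if (persistenceRetryTimer) clearTimeout(persistenceRetryTimer); // never stack timers
  const delay = Math.min(1000 * 2 ** attempt, 30_000); // exponential backoff with a cap
  persistenceRetryTimer = setTimeout(() => {
    persistenceRetryTimer = null; // drop the reference once the timer fires
    persist().catch(() => scheduleRetry(attempt + 1, maxRetries, persist));
  }, delay);
}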

server/src/browser-management/controller.ts (2)

124-200: Good timeout protection for browser destruction.

The DESTROY_TIMEOUT with Promise.race ensures destruction doesn't hang indefinitely. The fallback to force-delete on timeout is appropriate for resource cleanup.


42-77: Good error handling for browser initialization failures.

The try/catch properly:

  1. Notifies frontend via socket events
  2. Waits briefly for events to propagate
  3. Cleans up the failed session with switchOff()
  4. Returns early to prevent partial initialization
server/src/workflow-management/scheduler/index.ts (6)

107-114: Utility function looks correct.

The withTimeout helper cleanly races a promise against a timeout. Implementation is straightforward and reusable.


265-284: Timeout logic for scrape operations is well-implemented.

The per-format timeout using Promise.race provides good protection against hanging conversions. The 120-second timeout is reasonable given the 100-second networkidle timeout in the underlying convertPageToMarkdown/convertPageToHTML functions.


430-441: Interpretation timeout enforcement looks good.

The 600-second (10-minute) timeout for workflow interpretation provides a reasonable upper bound for long-running workflows while preventing indefinite hangs.


557-566: Good addition of browser cleanup in error handler.

The try-catch around interpreter state clearing and browser destruction ensures cleanup is attempted even if one step fails. This helps prevent resource leaks.


628-647: readyForRunHandler now properly cleans up socket in finally block.

The addition of the socket parameter and the finally block ensures socket cleanup happens regardless of success or failure. This is a solid improvement for resource management.


654-693: Socket lifecycle handling in handleRunRecording is improved.

Good additions:

  • CONNECTION_TIMEOUT of 30 seconds
  • connect_error handler that triggers cleanup
  • disconnect handler that triggers cleanup
  • Error path cleanup when socket exists

However, the socket is created but there's no timeout handling if the ready-for-run event never fires. The socket would remain open indefinitely until the browser emits the event or disconnects.

Consider adding a connection establishment timeout that triggers cleanup if ready-for-run isn't received within a reasonable timeframe.
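One possible shape (a sketch; the event name comes from the discussion above, the rest is illustrative):

const READY_TIMEOUT_MS = 30_000;

const readyTimer = setTimeout(() => {
  // ready-for-run never arrived: tear the socket down instead of leaking it
  socket.removeAllListeners();
  socket.disconnect();
}, READY_TIMEOUT_MS);

socket.once('ready-for-run', () => {
  clearTimeout(readyTimer); // the event arrived in time, cancel the guard
  // ...proceed with the run...
});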

server/src/pgboss-worker.ts (8)

59-63: PgBoss configuration with max: 5 limits concurrent connections.

The max: 5 setting limits the connection pool size for PgBoss. This is reasonable for controlling resource usage, but ensure this aligns with expected concurrent job processing needs.


127-129: Reduced browser timeouts may cause issues on slower systems.

BROWSER_INIT_TIMEOUT reduced to 30s and BROWSER_PAGE_TIMEOUT to 15s. The summary mentions these were reduced from 60s and 45s respectively. On systems under load or with slower network conditions, these tighter timeouts could cause premature failures.

Verify these timeout values are sufficient for your production environment. Consider making them configurable via environment variables for different deployment scenarios.
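For example (a sketch; the environment variable names are hypothetical):

const BROWSER_INIT_TIMEOUT =
  Number(process.env.BROWSER_INIT_TIMEOUT_MS) || 30_000; // falls back to the PR's default
const BROWSER_PAGE_TIMEOUT =
  Number(process.env.BROWSER_PAGE_TIMEOUT_MS) || 15_000;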


416-421: Good interpreter state cleanup on abort.

The try-catch around browser.interpreter.clearState() ensures the abort flow continues even if state clearing fails.


668-675: Good interpreter state cleanup on execution error.

This mirrors the abort cleanup pattern, ensuring resources are released even on failures.


846-847: Worker intervals properly tracked for cleanup.

The workerIntervals array and its use in registerRunExecutionWorker ensures intervals can be cleared during shutdown.

Also applies to: 920-927


956-963: Consistent interval tracking for abort queue checker.

Same pattern applied to the abort queue interval checker.


1068-1087: Graceful shutdown properly clears intervals.

Both SIGTERM and SIGINT handlers now clear the worker intervals before stopping PgBoss. This prevents orphaned interval callbacks after shutdown begins.


1089-1089: Export reduced to only startWorkers.

No external dependencies found on pgBoss, registerWorkerForQueue, or registerAbortWorkerForQueue. This reduction in export surface is safe.

server/src/api/record.ts (6)

595-617: readyForRunHandler properly cleans up socket in finally block.

Consistent with the scheduler implementation. Good.


705-723: Scrape timeout logic consistent with scheduler.

Same implementation pattern, good consistency.


879-890: Interpretation timeout consistent with scheduler.

Same 600-second timeout implementation.


894-896: Interpreter state cleared after successful interpretation.

Good addition to clean up interpreter state before destroying the browser.


1021-1030: Comprehensive cleanup in error handler.

The try-catch around interpreter state clearing and browser destruction is consistent with the pattern in other files.


1167-1189: waitForRunCompletion has reasonable 3-hour timeout.

The 180-minute maximum wait time prevents indefinite polling. The function also properly handles aborted/aborting states.

One consideration: the polling interval of 2000ms means up to 5400 database queries for a 3-hour wait. This is likely acceptable but worth noting for high-volume scenarios.

server/src/server.ts (8)

41-46: PostgreSQL pool configuration is reasonable.

max: 10, min: 0 with appropriate timeout settings. The allowExitOnIdle: true helps with graceful shutdown.


87-94: Socket.IO server initialized with comprehensive options.

Good configuration:

  • cleanupEmptyChildNamespaces: true helps prevent namespace leaks
  • pingTimeout: 60000 and pingInterval: 25000 are reasonable for detecting stale connections
  • maxHttpBufferSize: 1e8 (100MB) is generous but may be needed for large payloads
  • transports: ['websocket', 'polling'] with allowEIO3: true provides broad compatibility

101-102: recentRecoveries Map for storing recovery notifications.

This pattern allows recovery notifications to be stored and sent to users when they reconnect. Good for ensuring users are notified of recovered runs even if they weren't connected when recovery happened.
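In outline (a sketch with illustrative names):

const recentRecoveries = new Map<string, { runId: string; recoveredAt: number }[]>();

// On recovery, stash the notification keyed by user.
function recordRecovery(userId: string, runId: string): void {
  const list = recentRecoveries.get(userId) ?? [];
  list.push({ runId, recoveredAt: Date.now() });
  recentRecoveries.set(userId, list);
}

// On reconnect, flush and delete so nothing is emitted twice.
function flushRecoveries(userId: string, emit: (runId: string) => void): void {
  for (const entry of recentRecoveries.get(userId) ?? []) {
    emit(entry.runId);
  }
  recentRecoveries.delete(userId);
}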


152-166: Server intervals properly tracked for cleanup.

Both processQueuedRuns and browserPoolCleanupInterval are tracked in serverIntervals for proper cleanup during shutdown.


154-160: Good error handling around processQueuedRuns.

The try-catch ensures a single failure doesn't stop the interval from running again.


176-181: Startup sequence is well-ordered.

The sequence: recover orphaned runs → start PgBoss client → start workers ensures the system is ready before processing new work.


191-198: Recovery notification emission on client connection is well-implemented.

Users connecting to /queued-run receive any stored recovery notifications, then the data is cleaned up to prevent duplicate emissions.


266-399: Comprehensive SIGINT shutdown handler.

The shutdown sequence is thorough:

  1. Wait 2 seconds for in-flight operations
  2. Preserve partial data from running browsers
  3. Clear server intervals
  4. Clean up all browsers
  5. Terminate worker processes (dev mode)
  6. Close Socket.IO server
  7. Close HTTP server
  8. Close PostgreSQL pool
  9. Stop PgBoss client
  10. Close Sequelize connection
  11. Exit with appropriate code

The shutdownSuccessful flag tracking is good for debugging.

Comment on lines +41 to +42
"fingerprint-generator": "^2.1.77",
"fingerprint-injector": "^2.1.77",

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

fingerprint-generator npm latest version

💡 Result:

The npm package "fingerprint-generator" is published as version 2.1.72 on npmjs.com (published 4 days ago). [1] jsDelivr shows a later registry entry at 2.1.76. [2]

Sources:

  • npm package page: fingerprint-generator — version 2.1.72. [1]
  • jsDelivr package listing: fingerprint-generator — version 2.1.76. [2]

🌐 Web query:

fingerprint-injector npm latest version

💡 Result:

The latest published version on npm is 2.1.72 (published Nov 27, 2025). (npmjs.com)



Version mismatch: fingerprinting dependencies specify non-existent versions.

The fingerprint-generator and fingerprint-injector packages are pinned to ^2.1.77, but this version does not exist on npm. The latest available versions are 2.1.72 (published Nov 27, 2025) and 2.1.76 (via jsDelivr). Update both dependencies to use actual published versions, such as ^2.1.72.

🤖 Prompt for AI Agents
In package.json around lines 41-42, the fingerprint-generator and
fingerprint-injector dependencies are pinned to a non-existent version ^2.1.77;
update both entries to a published version (for example ^2.1.72) or any other
valid released version available on npm/jsDelivr, save the file, and run npm
install (or yarn) to update lockfile and verify installs succeed.

Comment on lines 569 to 593
async function triggerIntegrationUpdates(runId: string, robotMetaId: string): Promise<void> {
  try {
-    googleSheetUpdateTasks[runId] = {
+    addGoogleSheetUpdateTask(runId, {
      robotId: robotMetaId,
      runId: runId,
      status: 'pending',
      retries: 5,
-    };
+    });

-    airtableUpdateTasks[runId] = {
+    addAirtableUpdateTask(runId, {
      robotId: robotMetaId,
      runId: runId,
      status: 'pending',
      retries: 5,
-    };
+    });

-    processAirtableUpdates().catch(err => logger.log('error', `Airtable update error: ${err.message}`));
-    processGoogleSheetUpdates().catch(err => logger.log('error', `Google Sheets update error: ${err.message}`));
+    withTimeout(processAirtableUpdates(), 65000, 'Airtable update')
+      .catch(err => logger.log('error', `Airtable update error: ${err.message}`));
+
+    withTimeout(processGoogleSheetUpdates(), 65000, 'Google Sheets update')
+      .catch(err => logger.log('error', `Google Sheets update error: ${err.message}`));
  } catch (err: any) {
    logger.log('error', `Failed to update integrations for run: ${runId}: ${err.message}`);
  }
}

⚠️ Potential issue | 🟠 Major

Same retries: 5 issue as in other files.

This is the third occurrence of the same bug.

     addGoogleSheetUpdateTask(runId, {
       robotId: robotMetaId,
       runId: runId,
       status: 'pending',
-      retries: 5,
+      retries: 0,
     });

     addAirtableUpdateTask(runId, {
       robotId: robotMetaId,
       runId: runId,
       status: 'pending',
-      retries: 5,
+      retries: 0,
     });
📝 Committable suggestion


Suggested change

async function triggerIntegrationUpdates(runId: string, robotMetaId: string): Promise<void> {
  try {
    addGoogleSheetUpdateTask(runId, {
      robotId: robotMetaId,
      runId: runId,
      status: 'pending',
      retries: 0,
    });

    addAirtableUpdateTask(runId, {
      robotId: robotMetaId,
      runId: runId,
      status: 'pending',
      retries: 0,
    });

    withTimeout(processAirtableUpdates(), 65000, 'Airtable update')
      .catch(err => logger.log('error', `Airtable update error: ${err.message}`));

    withTimeout(processGoogleSheetUpdates(), 65000, 'Google Sheets update')
      .catch(err => logger.log('error', `Google Sheets update error: ${err.message}`));
  } catch (err: any) {
    logger.log('error', `Failed to update integrations for run: ${runId}: ${err.message}`);
  }
}

Comment on lines 157 to 178
private initializeMemoryManagement(): void {
-  setInterval(() => {
+  this.memoryManagementInterval = setInterval(() => {
    const memoryUsage = process.memoryUsage();
    const heapUsageRatio = memoryUsage.heapUsed / MEMORY_CONFIG.maxHeapSize;

    if (heapUsageRatio > MEMORY_CONFIG.heapUsageThreshold * 1.2) {
-      logger.warn('Critical memory pressure detected, triggering emergency cleanup');
+      logger.warn(
+        "Critical memory pressure detected, triggering emergency cleanup"
+      );
      this.performMemoryCleanup();
    } else if (heapUsageRatio > MEMORY_CONFIG.heapUsageThreshold) {
-      logger.warn('High memory usage detected, triggering cleanup');
-
-      if (this.screenshotQueue.length > 0) {
-        this.screenshotQueue = [];
-        logger.info('Screenshot queue cleared due to memory pressure');
-      }
+      logger.warn("High memory usage detected, triggering cleanup");

-      if (global.gc && heapUsageRatio > MEMORY_CONFIG.heapUsageThreshold * 1.1) {
+      if (
+        global.gc &&
+        heapUsageRatio > MEMORY_CONFIG.heapUsageThreshold * 1.1
+      ) {
        global.gc();
      }
    }
-
-    if (this.screenshotQueue.length > SCREENCAST_CONFIG.maxQueueSize) {
-      this.screenshotQueue = this.screenshotQueue.slice(-SCREENCAST_CONFIG.maxQueueSize);
-    }
  }, MEMORY_CONFIG.gcInterval);
}

⚠️ Potential issue | 🟡 Minor

initializeMemoryManagement() is never called.

The method sets up a memory management interval but is never invoked during browser initialization or elsewhere in this class. The interval will never run, making this dead code.

Either call this method during initialization (e.g., at the end of initialize()), or remove it if memory management is not yet needed:

// At the end of the initialize method, after success = true:
+              this.initializeMemoryManagement();
               success = true;

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In server/src/browser-management/classes/RemoteBrowser.ts around lines 157 to
178, initializeMemoryManagement() is defined but never called so its interval
never runs; to fix, invoke this.initializeMemoryManagement() at the end of the
class initialization flow (for example at the end of the initialize() method) so
the interval is started when the browser is created, or if memory management is
not desired remove the method and associated MEMORY_CONFIG usage and tests;
ensure any call is guarded so it only runs once and that teardown/clearInterval
is handled in the browser shutdown path.

Comment on lines +155 to +164
const nsps = (io as any)._nsps;
if (nsps && nsps.has(`/${id}`)) {
const ns = nsps.get(`/${id}`);
if (ns && ns.sockets && ns.sockets.size === 0) {
nsps.delete(`/${id}`);
logger.log('debug', `Deleted empty namespace /${id} from io._nsps Map`);
} else {
logger.log('warn', `Namespace /${id} still has ${ns?.sockets?.size || 0} sockets, skipping manual deletion`);
}
}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

Socket.IO v4 dynamic namespace cleanup best practices public API

💡 Result:

Short answer — use the built‑in public APIs introduced in v4 (and v4.6.0) rather than poking internal fields.

Recommended best practices (with public APIs)

  • Enable automatic cleanup for dynamic child namespaces (v4.6.0): new Server option cleanupEmptyChildNamespaces: true — Socket.IO will remove a child namespace and close its adapter when its last socket disconnects. [1][2]
    • Example: const io = new Server(httpServer, { cleanupEmptyChildNamespaces: true });. [1][2]
  • Track and attach logic to namespace creation with the public "new_namespace" event so you can hook lifecycle/cleanup logic centrally. Example: io.on("new_namespace", nsp => { /* attach middleware, metrics, cleanup hooks */ });. [3]
    • Use nsp.sockets (Map) to check number of connected sockets when you need to decide to clean up. nsp.sockets.size is public. [3]
  • If you must perform manual cleanup (when not using the automatic option):
    1. Disconnect remaining sockets via socket.disconnect() or io.in(nspName).disconnectSockets() (public API). [4]
    2. Remove event listeners from the namespace (nsp.removeAllListeners()) so handlers don’t keep references. (Namespace is an EventEmitter.)
    3. Close the namespace adapter if your adapter exposes a close() method (adapters added support for close() to cooperate with cleanupEmptyChildNamespaces). [5]
      Note: avoid touching internal/private server maps (e.g., io.nsps / io._nsps) — that’s not a public API and may break across versions. [6]
  • If you use a pub/sub adapter (Redis, etc.), prefer adapter options tuned for dynamic namespaces (e.g., redis adapter subscriptionMode "static") and rely on adapter close() to free resources when namespaces are removed. [5]
  • Monitor for leaks: when creating many dynamic namespaces, instrument new_namespace, adapter create/delete events and namespace.sockets size to ensure namespaces are removed as expected. Adapter emits create-room/delete-room/join-room/leave-room events you can listen to. [5]

Primary references

  • v4.6.0 changelog + cleanupEmptyChildNamespaces addition. [1]
  • Server options / cleanupEmptyChildNamespaces docs. [2]
  • Namespaces (dynamic namespaces, nsp.sockets) and io.of() docs. [3]
  • Server API (new_namespace event; disconnectSockets). [4]
  • Adapter docs (adapter events and adapter close support for cleanup). [5]
  • Community/manual method (older guidance showing disconnect + remove listeners + delete namespace — not public API; use only as last resort). [6]

If you want, I can paste minimal example code that demonstrates (a) enabling cleanupEmptyChildNamespaces and (b) a safe manual cleanup routine using only public APIs.


🌐 Web query:

Socket.IO v4 _nsps internal API alternatives remove namespace

💡 Result:

Short answer: don't use the internal io._nsps. Use the public APIs + the v4.6+ cleanup option.

Options

  • If you want to disconnect all clients and effectively remove a namespace:

    1. Disconnect sockets in that namespace: await io.of('/my-nsp').disconnectSockets(true);
    2. Enable automatic cleanup so empty dynamic child namespaces are removed: const io = new Server(httpServer, { cleanupEmptyChildNamespaces: true }); (added in v4.6.0). [1][2]
  • If you only need to stop a namespace from accepting new connections, keep a reference to the Namespace (io.of('/nsp')) and remove/replace your connection handlers; then disconnect existing sockets as above. [2]

Notes

  • io._nsps is an internal/private map (not part of the public API) and may change; avoid relying on it. If you must force-remove a namespace (last resort) you can delete from that map in older code, but it’s unsupported. See the library types showing the internal _nsps field. [3]

Sources

  • cleanupEmptyChildNamespaces & server options (v4.6.0). [1]
  • Server API (io.of, disconnectSockets, new_namespace event). [2]
  • socket.io types / internal _nsps reference. [3]

🌐 Web query:

Socket.IO v4 namespace management public API documentation

💡 Result:

Official Socket.IO v4 namespace management docs — key pages and main public API surface:

  • Namespace overview (creation, main namespace, custom namespaces, dynamic namespaces, middleware, rooms, broadcasting). See “Namespaces” in the v4 docs. [1]

  • Server API (Server / io instance inherits main-namespace methods; emits new_namespace; adapter setting). [2]

Main public API surface (server-side, v4)

  • io.of(name | regexp | fn) — create/get a namespace (string for static, RegExp or function for dynamic namespaces). Use returned Namespace for further ops. [1]
  • Namespace.use(fn) — register a middleware for that namespace (or parent namespace to apply to children). [1]
  • Namespace.on('connection', socket => { ... }) — listen for connections in that namespace. [1]
  • Namespace.emit(event, ...args) — broadcast from the namespace to its connected sockets. [1]
  • Namespace.to(room) / Namespace.in(room) — target a room within the namespace for emitting. [1]
  • io.on('new_namespace', nsp => { ... }) — server-level event fired when a dynamic namespace is created. [2]
  • io.of('/') / io.sockets — the main (default) namespace; io inherits namespace methods. [1][2]

Client-side

  • io(url?) or io('/namespace') — connect to main or a specific namespace; multiplexing behavior described. [1]

For full method/property lists and examples (complete API for Namespace, Socket, and Server), consult the v4 docs pages above. [1][2]

Sources:
[1] Socket.IO v4 — Namespaces.
[2] Socket.IO v4 — Server API.


🏁 Script executed:

find . -name "package.json" -type f -exec grep -l "socket.io" {} \;

Repository: getmaxun/maxun

Length of output: 72


🏁 Script executed:

find . -name "package.json" -type f | head -5 | xargs cat

Repository: getmaxun/maxun

Length of output: 5736


Refactor to use Socket.IO's public API instead of internal _nsps.

The code accesses (io as any)._nsps, which is Socket.IO's private internal map and not part of the public API. While the project uses Socket.IO v4.4.1 (predating the v4.6.0+ cleanupEmptyChildNamespaces option), manual cleanup can still be achieved using public APIs:

  • Disconnect sockets: io.of('/namespace').disconnectSockets(true)
  • Remove listeners: nsp.removeAllListeners()
  • Close adapter: call adapter.close() if your adapter supports it

Even with a version check or documentation, relying on _nsps remains fragile and unsupported. Consider upgrading to Socket.IO v4.6.0+ to use the built-in cleanupEmptyChildNamespaces option, or refactor this cleanup logic to use public methods.

🤖 Prompt for AI Agents
In server/src/browser-management/controller.ts around lines 155 to 164, replace
the direct access to the internal (io as any)._nsps map with Socket.IO public
APIs: obtain the namespace with io.of(`/${id}`), check if it exists and has zero
connected sockets, call nsp.disconnectSockets(true) to force-disconnect any
sockets, then call nsp.removeAllListeners() and, if nsp.adapter && typeof
nsp.adapter.close === 'function', call nsp.adapter.close() to let the adapter
clean up; finally avoid manipulating private maps—if you must remove the
namespace reference ensure you do so using supported Socket.IO APIs or upgrade
to v4.6.0+ to enable cleanupEmptyChildNamespaces instead.
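
Put together, a public-API version of that cleanup might look like this (a sketch; whether adapter.close() exists depends on the adapter in use):

import { Server } from 'socket.io';

async function cleanupNamespace(io: Server, id: string): Promise<void> {
  const nsp = io.of(`/${id}`);
  nsp.disconnectSockets(true); // force-disconnect remaining sockets (public API)
  nsp.removeAllListeners();    // drop handlers so the namespace holds no references
  const adapter = nsp.adapter as { close?: () => void | Promise<void> };
  if (typeof adapter?.close === 'function') {
    await adapter.close();     // let the adapter release resources where supported
  }
}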

Comment on lines +328 to +350
const connectWithRetry = async (maxRetries: number = 3): Promise<Socket | null> => {
let retryCount = 0;

while (retryCount < maxRetries) {
try {
const socket = await waitForConnection;
if (socket || retryCount === maxRetries - 1) {
return socket;
}
} catch (error: any) {
logger.log('warn', `Connection attempt ${retryCount + 1} failed for browser ${id}: ${error.message}`);
}

retryCount++;
if (retryCount < maxRetries) {
const delay = Math.pow(2, retryCount) * 1000;
logger.log('info', `Retrying connection for browser ${id} in ${delay}ms (attempt ${retryCount + 1}/${maxRetries})`);
await new Promise(resolve => setTimeout(resolve, delay));
}
}

return null;
};

⚠️ Potential issue | 🟠 Major

Retry logic is ineffective: same promise awaited in each iteration.

The waitForConnection promise is created once before connectWithRetry is called, but the retry loop awaits the same promise on each iteration. Once a promise is settled, awaiting it again returns the same result immediately without retrying.

The retry should create a new connection attempt each time:

-    const connectWithRetry = async (maxRetries: number = 3): Promise<Socket | null> => {
-      let retryCount = 0;
-      
-      while (retryCount < maxRetries) {
-        try {
-          const socket = await waitForConnection;
-          if (socket || retryCount === maxRetries - 1) {
-            return socket;
-          }
-        } catch (error: any) {
-          logger.log('warn', `Connection attempt ${retryCount + 1} failed for browser ${id}: ${error.message}`);
-        }
-        
-        retryCount++;
-        if (retryCount < maxRetries) {
-          const delay = Math.pow(2, retryCount) * 1000;
-          logger.log('info', `Retrying connection for browser ${id} in ${delay}ms (attempt ${retryCount + 1}/${maxRetries})`);
-          await new Promise(resolve => setTimeout(resolve, delay));
-        }
-      }
-      
-      return null;
-    };
-
-    const socket = await connectWithRetry(3);
+    // Wait for connection with single timeout - retries not effective for websocket listeners
+    const socket = await waitForConnection;

Alternatively, if true retries are needed, the waitForConnection promise factory should be created inside the loop.
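For instance, a factory-based variant would build a fresh promise per attempt (createConnectionPromise is a hypothetical factory that registers a new one-shot connection listener on each call; logger and id are assumed in scope as in the surrounding code):

// Sketch: each attempt awaits a fresh connection promise from the factory
const connectWithRetry = async (
  createConnectionPromise: () => Promise<Socket | null>,
  maxRetries = 3
): Promise<Socket | null> => {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      const socket = await createConnectionPromise();
      if (socket) return socket;
    } catch (error: any) {
      logger.log('warn', `Connection attempt ${attempt} failed for browser ${id}: ${error.message}`);
    }
    if (attempt < maxRetries) {
      // exponential backoff: 2s, 4s, ...
      await new Promise(resolve => setTimeout(resolve, Math.pow(2, attempt) * 1000));
    }
  }
  return null;
};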

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In server/src/browser-management/controller.ts around lines 328 to 350, the
retry loop currently awaits the same pre-created waitForConnection promise each
iteration so retries are ineffective; change the code to create a fresh
connection promise on every attempt (e.g. call a waitForConnectionFactory or
move creation of waitForConnection inside the loop) so each retry triggers a new
connection attempt, handle its success/error the same way, increment retryCount,
and keep the existing backoff/delay and logging behavior.

await startWorkers();

io = new Server(server);

⚠️ Potential issue | 🟠 Major

Socket.IO server is re-initialized after startup.

Line 182 creates a new Server(server) instance, overwriting the one initialized on lines 87-94. This means the carefully configured options (pingTimeout, cleanupEmptyChildNamespaces, etc.) are lost.

Either remove line 182 to keep the existing configured instance, or move the configuration to this location:

-      io = new Server(server);
+      // io is already initialized with options at module level

If re-initialization is intentional (e.g., to attach to the now-listening server), preserve the options:

-      io = new Server(server);
+      io = new Server(server, {
+        cleanupEmptyChildNamespaces: true,
+        pingTimeout: 60000,
+        pingInterval: 25000,
+        maxHttpBufferSize: 1e8,
+        transports: ['websocket', 'polling'],
+        allowEIO3: true
+      });
🤖 Prompt for AI Agents
In server/src/server.ts around line 182, a new Socket.IO Server is being created
(io = new Server(server)), which overwrites the previously configured instance
from lines 87–94 and drops options like pingTimeout and
cleanupEmptyChildNamespaces; either remove this re-initialization so the
original configured `io` is used, or if you must create it here (e.g., after
server.listen), re-create it with the same options as the original configuration
(copy pingTimeout, cleanupEmptyChildNamespaces, transports, cors, etc.) and
re-attach any namespaces/middlewares previously registered so no configuration
is lost.

Comment on lines +72 to +78
process.on('SIGTERM', async () => {
  await stopPgBossClient();
});

process.on('SIGINT', async () => {
  await stopPgBossClient();
});

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

rg -n -A5 'stopPgBossClient' --type=ts

Repository: getmaxun/maxun

Length of output: 1459


🏁 Script executed:

rg -n -B5 -A10 'process\.on.*SIG' server/src/server.ts

Repository: getmaxun/maxun

Length of output: 565


🏁 Script executed:

rg -n -A30 "process\.on\('SIGINT'" server/src/server.ts

Repository: getmaxun/maxun

Length of output: 1626


🏁 Script executed:

rg -n 'SIGTERM' server/src/server.ts

Repository: getmaxun/maxun

Length of output: 156


🏁 Script executed:

rg -n -B5 -A20 'await stopPgBossClient' server/src/server.ts

Repository: getmaxun/maxun

Length of output: 1014


Signal handlers will not reliably complete shutdown before process exit.

The process.on() handlers for SIGTERM and SIGINT do not guarantee that stopPgBossClient() completes before the process exits because process.on() doesn't await async callbacks. For SIGTERM, there is no handler in server.ts, making shutdown of PgBoss unreliable. For SIGINT, both pgboss.ts and server.ts define handlers, creating redundancy and potential race conditions.

Remove these signal handlers from pgboss.ts entirely—server.ts already implements proper graceful shutdown with explicit await of stopPgBossClient() (line 384) and should be the sole coordinator of shutdown. If SIGTERM handling is needed, add it to server.ts's shutdown sequence rather than relying on async signal callbacks.
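If SIGTERM handling is added, a minimal sketch of a single coordinator in server.ts could look like this (shutdown is a hypothetical name; the existing cleanup steps would slot in before process.exit):

// Sketch: one coordinator owns shutdown; signal handlers only trigger it
const shutdown = async (signal: string) => {
  logger.log('info', `Received ${signal}, shutting down gracefully`);
  await stopPgBossClient();            // awaited here, not in a detached callback
  server.close(() => process.exit(0));
};

process.once('SIGTERM', () => void shutdown('SIGTERM'));
process.once('SIGINT', () => void shutdown('SIGINT'));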

🤖 Prompt for AI Agents
In server/src/storage/pgboss.ts around lines 72-78, remove the
process.on('SIGTERM'...) and process.on('SIGINT'...) signal handlers entirely so
pgboss.ts no longer registers async shutdown callbacks; the application-level
server.ts should be the sole coordinator of graceful shutdown (it already awaits
stopPgBossClient()), so ensure stopPgBossClient remains exported and is invoked
from server.ts's shutdown sequence (and do not add duplicate signal handlers
elsewhere).

Comment on lines +16 to +29
export async function scheduleWorkflow(id: string, userId: string, cronExpression: string, timezone: string): Promise<void> {
  try {
    const runId = uuid();

    const queueName = `scheduled-workflow-${id}`;

    logger.log('info', `Scheduling workflow ${id} with cron expression ${cronExpression} in timezone ${timezone}`);

    await pgBossClient.createQueue(queueName);

    await pgBossClient.schedule(queueName, cronExpression,
      { id, runId, userId },
      { tz: timezone }
    );

⚠️ Potential issue | 🟠 Major

runId is generated at schedule time, not execution time.

The runId is generated once when the workflow is scheduled (line 18) and embedded in the job data. For recurring cron jobs, this means every execution will have the same runId, which could cause issues with run tracking and deduplication.

Consider generating the runId in the worker that processes the scheduled job, not in the scheduling function:

 export async function scheduleWorkflow(id: string, userId: string, cronExpression: string, timezone: string): Promise<void> {
   try {
-    const runId = uuid();
-
     const queueName = `scheduled-workflow-${id}`;

     logger.log('info', `Scheduling workflow ${id} with cron expression ${cronExpression} in timezone ${timezone}`);

     await pgBossClient.createQueue(queueName);

     await pgBossClient.schedule(queueName, cronExpression,
-      { id, runId, userId },
+      { id, userId },  // runId should be generated by the worker on each execution
       { tz: timezone }
     );
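On the worker side, the per-execution id would then be minted when the job runs. A rough sketch, assuming the singleton exposes pg-boss's work() (the exact handler shape depends on the pg-boss version in use, and processScheduledWorkflow is a hypothetical processor):

// Sketch: generate a fresh runId per cron occurrence inside the worker
await pgBossClient.work(queueName, async (job: any) => {
  const runId = uuid();                     // unique per execution, not per schedule
  const { id, userId } = job.data;
  await processScheduledWorkflow(id, runId, userId);
});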
🤖 Prompt for AI Agents
In server/src/storage/schedule.ts around lines 16–29, the code currently
generates a runId at schedule time and embeds it in the scheduled job payload
which causes every cron occurrence to share the same runId; remove the uuid()
generation here and stop including runId in the data passed to
pgBossClient.schedule, and instead generate a new runId inside the job
processing worker when the scheduled job executes (update the job payload
typing/consumers to expect runId to be created at execution time and propagate
that new runId to any run-tracking/deduplication logic).

Comment on lines +74 to +75
private pageCloseListeners: Map<Page, () => void> = new Map();


⚠️ Potential issue | 🟠 Major

pageCloseListeners map is declared but never populated.

The cleanup method iterates over pageCloseListeners (lines 974-983) to remove page close listeners, but the map is never populated. Looking at notifyOnNewTab (line 1221), page close listeners are added directly without being tracked in this map.

Either populate the map when adding listeners, or remove the unused cleanup logic:

  public notifyOnNewTab = (page: Page, pageIndex: number) => {
    if (this.socket) {
-     page.on('close', () => {
+     const closeListener = () => {
        this.socket.emit('tabHasBeenClosed', pageIndex);
-     })
+     };
+     page.on('close', closeListener);
+     this.pageCloseListeners.set(page, closeListener);
      const parsedUrl = new URL(page.url());
      const host = parsedUrl.hostname?.match(/\b(?!www\.)[a-zA-Z0-9]+/g)?.join('.');
      this.socket.emit('newTab', host ? host : 'new tab')
    }
  }

Also applies to: 974-983
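The cleanup side (lines 974-983) can then detach exactly what was registered; a sketch assuming the map is populated as above:

// Sketch: detach tracked close listeners during cleanup
for (const [page, listener] of this.pageCloseListeners) {
  if (!page.isClosed()) {
    page.off('close', listener);   // Playwright's removeListener equivalent
  }
}
this.pageCloseListeners.clear();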

🤖 Prompt for AI Agents
In server/src/workflow-management/classes/Generator.ts around lines 74-75 (and
refer to lines 974-983 and 1221), the pageCloseListeners Map is declared but
never populated while notifyOnNewTab adds close listeners directly; update
notifyOnNewTab (near line 1221) to store each listener in pageCloseListeners
keyed by the Page when you add it (so cleanup at lines 974-983 can iterate and
remove them), or if you prefer not to track them, remove the cleanup loop and
the pageCloseListeners declaration; implement whichever option you choose
consistently (either populate the map when adding listeners and remove them in
cleanup, or delete the unused map and cleanup code).

Comment on lines +177 to +192
/**
 * Removes pausing-related socket listeners to prevent memory leaks
 * Must be called before re-registering listeners or during cleanup
 * @private
 */
private removePausingListeners(): void {
  try {
    this.socket.removeAllListeners('pause');
    this.socket.removeAllListeners('resume');
    this.socket.removeAllListeners('step');
    this.socket.removeAllListeners('breakpoints');
    logger.log('debug', 'Removed pausing socket listeners');
  } catch (error: any) {
    logger.warn(`Error removing pausing listeners: ${error.message}`);
  }
}

⚠️ Potential issue | 🟡 Minor

Incorrect logger method: logger.warn() should be logger.log('warn', ...).

Line 190 uses logger.warn() but based on the import and usage pattern throughout the file, the logger uses logger.log('warn', ...) syntax.

     } catch (error: any) {
-      logger.warn(`Error removing pausing listeners: ${error.message}`);
+      logger.log('warn', `Error removing pausing listeners: ${error.message}`);
     }
🤖 Prompt for AI Agents
In server/src/workflow-management/classes/Interpreter.ts around lines 177 to
192, the catch block currently calls logger.warn(...) which is inconsistent with
the file's logger usage; change the call to logger.log('warn', `Error removing
pausing listeners: ${error?.message}`) (preserving the same message and safe
access to error.message) so it matches the established logger.log(level,
message) pattern.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
server/src/routes/record.ts (2)

26-58: Potential timer leak if promise rejects early.

The setTimeout on line 50 schedules the next check, but if the promise rejects on lines 32, 39, 47, or 52, there's a brief window where the next checkJobStatus may still execute. While the check on line 31 prevents infinite recursion, this could cause one extra unnecessary poll after rejection.

Consider storing the timer and clearing it on early exit:

 async function waitForJobCompletion(jobId: string, queueName: string, timeout = 15000): Promise<any> {
     return new Promise((resolve, reject) => {
       const startTime = Date.now();
+      let timeoutId: NodeJS.Timeout | null = null;
       
       const checkJobStatus = async () => {
         if (Date.now() - startTime > timeout) {
+          if (timeoutId) clearTimeout(timeoutId);
           return reject(new Error(`Timeout waiting for job ${jobId} to complete`));
         }
         
         try {
           const job = await pgBossClient.getJobById(queueName, jobId);
           
           if (!job) {
+            if (timeoutId) clearTimeout(timeoutId);
             return reject(new Error(`Job ${jobId} not found`));
           }
           
           if (job.state === 'completed') {
+            if (timeoutId) clearTimeout(timeoutId);
             return resolve(job.output);
           }
           
           if (job.state === 'failed') {
+            if (timeoutId) clearTimeout(timeoutId);
             return reject(new Error(`Job ${jobId} failed.`));
           }
           
-          setTimeout(checkJobStatus, 200);
+          timeoutId = setTimeout(checkJobStatus, 200);
         } catch (error) {
+          if (timeoutId) clearTimeout(timeoutId);
           reject(error);
         }
       };
       
       checkJobStatus();
     });
 }

79-89: Verify pgBossClient is initialized before use by adjusting startup timing.

The server listens for requests at line 168 in server/src/server.ts, but startPgBossClient() is called asynchronously inside the listen callback at line 178. This creates a race condition: requests arriving between server.listen() and startPgBossClient() completion will attempt to call pgBossClient.send() on an unstarted client.

While try-catch blocks in server/src/routes/record.ts (lines 78, 138, 232, 268) handle errors and fall back to initializeRemoteBrowserForRecording(), this masks the underlying initialization gap. Either move startPgBossClient() to execute before server.listen() or add a startup guard to prevent route handlers from processing requests until initialization completes.
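One way to close the gap is to finish queue initialization before the server accepts traffic; a sketch assuming startPgBossClient, startWorkers, and server are as in server.ts (PORT is illustrative):

// Sketch: queue client and workers ready before listen()
const bootstrap = async () => {
  await startPgBossClient();
  await startWorkers();
  server.listen(PORT, () => {
    logger.log('info', `Server listening on port ${PORT}`);
  });
};

bootstrap().catch((error) => {
  logger.log('error', `Startup failed: ${error.message}`);
  process.exit(1);
});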

server/src/api/record.ts (1)

591-613: API formats override is currently ignored

executeRun supports an optional requestedFormats?: string[] parameter and correctly overrides recording.recording_meta.formats when provided. However:

  • readyForRunHandler calls executeRun(id, userId) without passing requestedFormats.
  • handleRunRecording receives requestedFormats but only passes it to its own recursive retry call; it is never forwarded to readyForRunHandler/executeRun.

As a result, the formats field in the POST body to /api/robots/{id}/runs has no effect.

To honor the API contract, wire the argument through:

-async function readyForRunHandler(browserId: string, id: string, userId: string, socket: Socket){
+async function readyForRunHandler(
+  browserId: string,
+  id: string,
+  userId: string,
+  socket: Socket,
+  requestedFormats?: string[]
+) {
   try {
-        const result = await executeRun(id, userId);
+        const result = await executeRun(id, userId, requestedFormats);
@@
 export async function handleRunRecording(id: string, userId: string, requestedFormats?: string[]) {
@@
-        const readyHandler = () => readyForRunHandler(browserId, newRunId, userId, socket!);
+        const readyHandler = () => readyForRunHandler(browserId, newRunId, userId, socket!, requestedFormats);

This keeps scheduled runs unaffected while allowing API callers to override formats as documented.

Also applies to: 632-699, 1276-1279

♻️ Duplicate comments (7)
server/src/workflow-management/classes/Generator.ts (1)

74-75: pageCloseListeners map is still never populated.

The map is declared but notifyOnNewTab (lines 1219-1228) still adds page close listeners directly via page.on('close', ...) without storing them in this map. The cleanup loop at lines 974-983 will iterate over an empty map, making the cleanup ineffective.

Apply the fix from the past review to populate the map:

  public notifyOnNewTab = (page: Page, pageIndex: number) => {
    if (this.socket) {
-     page.on('close', () => {
+     const closeListener = () => {
        this.socket.emit('tabHasBeenClosed', pageIndex);
-     })
+     };
+     page.on('close', closeListener);
+     this.pageCloseListeners.set(page, closeListener);
      const parsedUrl = new URL(page.url());
      const host = parsedUrl.hostname?.match(/\b(?!www\.)[a-zA-Z0-9]+/g)?.join('.');
      this.socket.emit('newTab', host ? host : 'new tab')
    }
  }
server/src/browser-management/classes/RemoteBrowser.ts (1)

154-175: initializeMemoryManagement() is still never called.

The method sets up memory monitoring intervals but is never invoked during browser initialization or elsewhere. The interval will never run.

Call this method at the end of successful initialization:

              this.emitLoadingProgress(60, 0);

              success = true;
+             this.initializeMemoryManagement();
              logger.log('debug', `Browser initialized successfully for user ${userId}`);
server/src/workflow-management/classes/Interpreter.ts (1)

177-192: Fix ordering around updateSocket to actually detach listeners from the old socket, and align logger usage.

Two issues here:

  1. Socket listener leak due to order of operations
  • removePausingListeners() operates on this.socket.
  • In updateSocket, you first assign this.socket = socket and then call subscribeToPausing(), which in turn calls removePausingListeners().
  • Result: you remove listeners from the new socket (which has none) and leave all listeners attached to the old socket, which is likely the opposite of what you intend and can cause memory leaks / double‑handling if the old socket is still alive.

A minimal fix is to remove listeners before swapping sockets:

   public updateSocket = (socket: Socket): void => {
-    this.socket = socket;
-    this.subscribeToPausing();
+    // Detach pausing listeners from the current socket before switching
+    this.removePausingListeners();
+    this.socket = socket;
+    this.subscribeToPausing();
   };

This way:

  • First call cleans up listeners on the existing socket.
  • subscribeToPausing() then removes (no‑op) and registers fresh listeners on the new socket.
  • Subsequent socket swaps correctly clean up the previous socket’s handlers.
  2. Logger usage inconsistent with rest of file (previously flagged)

In removePausingListeners’s catch block you still use logger.warn(...), but everywhere else the file follows logger.log(level, message):

-    } catch (error: any) {
-      logger.warn(`Error removing pausing listeners: ${error.message}`);
-    }
+    } catch (error: any) {
+      logger.log('warn', `Error removing pausing listeners: ${error?.message ?? error}`);
+    }

This avoids API inconsistency and matches the established logger pattern mentioned in the earlier review.

Also applies to: 200-226, 409-410, 731-734

server/src/pgboss-worker.ts (1)

98-122: Initialize integration task retries to 0 instead of 5

Both Google Sheets and Airtable tasks are enqueued with retries: 5. The processors (processAirtableUpdates / processGoogleSheetUpdates) treat retries as an attempt counter starting from 0 and only retry while retries < MAX_RETRIES. Initializing to 5 effectively disables retries (and can make logging misleading).

Consider:

addGoogleSheetUpdateTask(runId, {
  robotId: robotMetaId,
  runId: runId,
  status: 'pending',
- retries: 5,
+ retries: 0,
});

addAirtableUpdateTask(runId, {
  robotId: robotMetaId,
  runId: runId,
  status: 'pending',
- retries: 5,
+ retries: 0,
});
server/src/workflow-management/scheduler/index.ts (1)

104-137: Integration task retries should start at 0; helper duplication mirrors other files

triggerIntegrationUpdates enqueues both Google Sheets and Airtable tasks with retries: 5, then starts the processors via withTimeout. As in the other modules, the processors interpret retries as a counter from 0, so starting at 5 effectively skips the retry window.

Suggest:

addGoogleSheetUpdateTask(runId, {
  robotId: robotMetaId,
  runId: runId,
  status: 'pending',
- retries: 5,
+ retries: 0,
});

addAirtableUpdateTask(runId, {
  robotId: robotMetaId,
  runId: runId,
  status: 'pending',
- retries: 5,
+ retries: 0,
});

Also, withTimeout here is identical to the helpers in pgboss-worker.ts and api/record.ts; a shared utility would avoid drift if you ever tweak its behavior.

server/src/browser-management/inputHandlers.ts (1)

1059-1084: Remove unused dom:input listener cleanup or add matching registration

removeInputHandlers calls socket.removeAllListeners("dom:input"), but there is no corresponding socket.on("dom:input", ...) registration in registerInputHandlers. This is harmless but misleading and may cause future drift.

Consider either:

  • Removing the socket.removeAllListeners("dom:input"); line, or
  • Adding the missing "dom:input" registration if it was intended to be supported.
server/src/api/record.ts (1)

556-586: Integration task retries should start at 0, not 5

As in the worker and scheduler modules, triggerIntegrationUpdates here enqueues tasks with retries: 5, but the Airtable/Google Sheets processors treat retries as a retry counter starting from 0. Initializing to 5 effectively prevents any retries.

Consider:

addGoogleSheetUpdateTask(runId, {
  robotId: robotMetaId,
  runId: runId,
  status: 'pending',
- retries: 5,
+ retries: 0,
});

addAirtableUpdateTask(runId, {
  robotId: robotMetaId,
  runId: runId,
  status: 'pending',
- retries: 5,
+ retries: 0,
});
🧹 Nitpick comments (8)
server/src/browser-management/classes/RemoteBrowser.ts (1)

417-437: Redundant nested try-catch blocks.

The outer catch (lines 433-436) is unreachable because the inner catch (lines 430-432) swallows all errors without rethrowing. Simplify to a single try-catch:

  private async applyEnhancedFingerprinting(context: BrowserContext): Promise<void> {
    try {
-     try {
-       const fingerprintGenerator = new FingerprintGenerator();
-       const fingerprint = fingerprintGenerator.getFingerprint();
-       const fingerprintInjector = new FingerprintInjector();
-
-       await fingerprintInjector.attachFingerprintToPlaywright(context as any, fingerprint);
-
-       logger.info("Enhanced fingerprinting applied successfully");
-     } catch (fingerprintError: any) {
-       logger.warn(`Modern fingerprint injection failed: ${fingerprintError.message}. Using existing protection.`);
-     }
-   } catch (error: any) {
-     logger.error(`Enhanced fingerprinting failed: ${error.message}`);
-     // Don't throw - fallback to basic functionality
+     const fingerprintGenerator = new FingerprintGenerator();
+     const fingerprint = fingerprintGenerator.getFingerprint();
+     const fingerprintInjector = new FingerprintInjector();
+
+     await fingerprintInjector.attachFingerprintToPlaywright(context as any, fingerprint);
+
+     logger.info("Enhanced fingerprinting applied successfully");
+   } catch (error: any) {
+     logger.warn(`Modern fingerprint injection failed: ${error.message}. Using existing protection.`);
+     // Don't throw - fallback to basic functionality
    }
  }
server/src/routes/storage.ts (1)

1085-1087: Add logging when circuit breaker prevents processing.

The early return when the circuit breaker is open happens silently. This could make it difficult to diagnose why queued runs aren't being processed during cooldown periods.

Add a log message:

     if (Date.now() < circuitBreakerOpenUntil) {
+      logger.log('debug', `Circuit breaker open, skipping queue processing until ${new Date(circuitBreakerOpenUntil).toISOString()}`);
       return;
     }
server/src/workflow-management/classes/Interpreter.ts (1)

146-152: Bounded persistence retries and timer cleanup look good; consider minor tidy‑ups.

The new persistenceRetryTimer, MAX_PERSISTENCE_RETRIES, and corresponding cleanup in clearState + flushPersistenceBuffer give you bounded, backoff‑based retries and avoid timer leaks. Behaviorally this looks solid.

Two minor cleanups you might consider:

  1. The guard in the catch block is a no‑op and can be dropped:
-      if (!this.persistenceRetryCount) {
-        this.persistenceRetryCount = 0;
-      }
  2. Optionally clear any outstanding retry timer on a successful flush, to avoid a stray timeout firing later (it will early‑return, but you can skip the extra call):
     try {
       const sequelize = require('../../storage/db').default;
       await sequelize.transaction(async (transaction: any) => {
         // ...
       });

       this.persistenceRetryCount = 0;
+      if (this.persistenceRetryTimer) {
+        clearTimeout(this.persistenceRetryTimer);
+        this.persistenceRetryTimer = null;
+      }

These are purely maintenance/clarity improvements; current logic is functionally fine.

Also applies to: 372-391, 851-867

server/src/browser-management/inputHandlers.ts (1)

871-1014: Pagination scroll test handler is correct; selector helper duplication is optional to DRY up

The scroll test logic (pre/post item counts, scroll restore, error emission) looks solid and matches the intended XPath/CSS behavior. Since evaluateSelector is already implemented in src/helpers/clientPaginationDetector.ts, consider extracting a shared helper or at least avoiding duplicating the same logic twice inside this function to keep maintenance simpler.

server/src/pgboss-worker.ts (1)

89-96: Timeout handling is good; consider centralizing helper usage

The new timeouts around interpretation and scrape conversions are a solid guardrail against hung jobs. Right now you:

  • Define withTimeout once.
  • Re‑implement Promise.race-based timeouts for SCRAPE_TIMEOUT and INTERPRETATION_TIMEOUT.

For consistency and easier tuning, consider using withTimeout for the scrape conversions as well (or extracting a shared async-utils module so this helper is reused across this file, scheduler/index.ts, and api/record.ts).
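A shared version could live in a small utility module; the path and label parameter below are illustrative:

// Hypothetical server/src/utils/async.ts
export function withTimeout<T>(promise: Promise<T>, ms: number, label = 'operation'): Promise<T> {
  let timer: NodeJS.Timeout | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  // clear the timer regardless of which promise settles first
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

Call sites would then read, e.g., await withTimeout(convertPageToMarkdown(page), SCRAPE_TIMEOUT, 'markdown conversion').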

Also applies to: 248-265, 398-412

server/src/workflow-management/scheduler/index.ts (2)

625-644: Socket-aware readyForRunHandler and cleanup are correct; resetRecordingState is effectively a no-op

Passing the socket into readyForRunHandler and always invoking cleanupSocketConnection in finally nicely guarantees socket cleanup regardless of run outcome. One small nit: resetRecordingState(browserId, id) only reassigns its parameters, so it has no observable effect and can be removed unless you later evolve it to mutate shared state.

Also applies to: 693-713


651-691: Verify BACKEND_URL fallback port consistency with API runner

Here the connection URL falls back to http://localhost:5000, while server/src/api/record.ts’s handleRunRecording falls back to http://localhost:8080. If both paths are intended to talk to the same backend instance, this mismatch could cause scheduled vs API runs to behave differently in local/dev environments.

Please double-check which port is correct and align both call sites (or configure via a single shared constant).

server/src/api/record.ts (1)

1097-1114: Verify BACKEND_URL fallback port vs scheduler

Here the socket client falls back to http://localhost:8080, while the scheduler version uses http://localhost:5000. If both should target the same backend, this discrepancy could cause API-triggered runs to connect to a different port than scheduler-triggered runs.

Recommend aligning both to a single constant (e.g., SOCKET_BACKEND_URL) so local/dev configs don’t silently diverge.
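For example (names and fallback value are illustrative):

// Hypothetical shared constant in a server-side config module
export const SOCKET_BACKEND_URL =
  process.env.BACKEND_URL || 'http://localhost:8080';

Both call sites would then import this constant instead of hardcoding their own fallback.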

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 926db0a and 4d49275.

📒 Files selected for processing (10)
  • package.json (1 hunks)
  • server/src/api/record.ts (10 hunks)
  • server/src/browser-management/classes/RemoteBrowser.ts (17 hunks)
  • server/src/browser-management/inputHandlers.ts (2 hunks)
  • server/src/pgboss-worker.ts (13 hunks)
  • server/src/routes/record.ts (6 hunks)
  • server/src/routes/storage.ts (9 hunks)
  • server/src/workflow-management/classes/Generator.ts (3 hunks)
  • server/src/workflow-management/classes/Interpreter.ts (4 hunks)
  • server/src/workflow-management/scheduler/index.ts (10 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • package.json
🧰 Additional context used
🧬 Code graph analysis (5)
server/src/routes/storage.ts (1)
server/src/storage/pgboss.ts (1)
  • pgBossClient (27-30)
server/src/routes/record.ts (1)
server/src/storage/pgboss.ts (1)
  • pgBossClient (27-30)
server/src/browser-management/inputHandlers.ts (3)
server/src/server.ts (1)
  • browserPool (99-99)
maxun-core/src/browserSide/scraper.js (2)
  • selector (28-28)
  • node (664-664)
src/helpers/clientPaginationDetector.ts (1)
  • evaluateSelector (290-318)
server/src/api/record.ts (7)
server/src/workflow-management/integrations/gsheet.ts (2)
  • addGoogleSheetUpdateTask (25-38)
  • processGoogleSheetUpdates (312-365)
server/src/workflow-management/integrations/airtable.ts (2)
  • addAirtableUpdateTask (26-39)
  • processAirtableUpdates (514-569)
server/src/markdownify/scrape.ts (2)
  • convertPageToMarkdown (26-77)
  • convertPageToHTML (85-135)
server/src/routes/webhook.ts (1)
  • sendWebhook (404-434)
maxun-core/src/interpret.ts (1)
  • run (1664-1700)
server/src/browser-management/controller.ts (1)
  • destroyRemoteBrowser (124-200)
server/src/server.ts (1)
  • io (87-94)
server/src/workflow-management/scheduler/index.ts (6)
server/src/workflow-management/integrations/gsheet.ts (2)
  • addGoogleSheetUpdateTask (25-38)
  • processGoogleSheetUpdates (312-365)
server/src/workflow-management/integrations/airtable.ts (2)
  • addAirtableUpdateTask (26-39)
  • processAirtableUpdates (514-569)
server/src/markdownify/scrape.ts (2)
  • convertPageToMarkdown (26-77)
  • convertPageToHTML (85-135)
server/src/browser-management/controller.ts (1)
  • destroyRemoteBrowser (124-200)
maxun-core/src/interpret.ts (1)
  • run (1664-1700)
server/src/server.ts (1)
  • io (87-94)
🔇 Additional comments (23)
server/src/workflow-management/classes/Generator.ts (2)

889-910: LGTM!

The removeSocketListeners method correctly removes all event listeners that are registered in initializeSocketListeners, initializeDOMListeners, and registerEventHandlers. Error handling is appropriate.


966-997: Well-structured cleanup method.

The cleanup method properly:

  • Removes socket listeners first
  • Defensively checks page.isClosed() before removing listeners
  • Resets internal state completely
  • Has granular error handling

Once the pageCloseListeners population issue is fixed (per the earlier comment), this cleanup will be fully effective.

server/src/browser-management/classes/RemoteBrowser.ts (5)

233-277: Good defensive cleanup pattern for scroll listener.

Properly removes existing dom:scroll listener before re-registering to prevent duplicate handlers. The try-catch ensures stability even if removal fails.


653-670: LGTM!

The method correctly removes all socket listeners registered in registerEditorEvents. Error handling is appropriate.


877-1009: Excellent comprehensive cleanup implementation.

The switchOff method now properly:

  • Clears all intervals and timeouts first (lines 878-908)
  • Removes socket listeners (line 910)
  • Cleans up generator (lines 926-934)
  • Uses 5-second timeout guards for async operations (CDP, page, context, browser close)
  • Nulls resources in finally blocks ensuring cleanup even on timeout
  • Has individual try-catch blocks preventing cascading failures

The cleanup order (listeners → generator → interpreter → DOM → CDP → page → context → browser) is correct.


1018-1027: LGTM!

Good conditional logic to only re-register the scroll event listener when DOM streaming is active, preventing unnecessary listener setup.


445-594: Good timeout protection for initialization.

The initialization now has:

  • Overall 120-second timeout guard (line 447, 589-593)
  • Nested 15-second timeout for context creation (lines 508-513)
  • Retry logic with cleanup between attempts (lines 571-584)

This prevents indefinite hangs during browser initialization.

server/src/routes/record.ts (2)

18-18: LGTM: Import updated to use pgBossClient singleton.

The import path change aligns with the PR's objective to consolidate queue operations through a shared pgBossClient singleton.


139-169: LGTM: Consistent queue operation pattern.

The endpoint follows the same pgBossClient pattern established in the /start endpoint with appropriate fallback handling.

server/src/routes/storage.ts (8)

665-667: LGTM: Consistent queue operations.

The queue creation and job sending pattern is consistent with the approach in record.ts.


1077-1081: Circuit breaker state is shared across all queue processing.

The circuit breaker variables are module-level, meaning the circuit breaker state affects all users and all queued runs globally. If one user's queued runs trigger database errors, all queue processing stops during the cooldown period.

This is acceptable for protecting the database from cascading failures, but be aware that a single problematic run could temporarily block queue processing for all users.

Consider whether per-user circuit breakers would be more appropriate for your use case, or if the global circuit breaker adequately serves your database protection needs.
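A per-user variant would scope the counters by user id; a minimal sketch (threshold and cooldown values are illustrative):

// Hypothetical per-user circuit breaker state
const breakers = new Map<string, { consecutiveErrors: number; openUntil: number }>();

function isBreakerOpen(userId: string): boolean {
  const b = breakers.get(userId);
  return !!b && Date.now() < b.openUntil;
}

function recordBreakerError(userId: string, threshold = 5, cooldownMs = 60_000): void {
  const b = breakers.get(userId) ?? { consecutiveErrors: 0, openUntil: 0 };
  b.consecutiveErrors++;
  if (b.consecutiveErrors >= threshold) {
    b.openUntil = Date.now() + cooldownMs;  // open only this user's breaker
    b.consecutiveErrors = 0;
  }
  breakers.set(userId, b);
}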


1149-1158: LGTM: Circuit breaker error handling is correct.

The circuit breaker properly tracks consecutive errors and opens after reaching the threshold. The cooldown mechanism will prevent excessive database load during outages.


1130-1138: LGTM: Queue operations properly error-handled.

The queue operations for execution are wrapped in the try-catch block that feeds the circuit breaker, ensuring queue failures are properly tracked.


1238-1238: LGTM: Exporting processQueuedRuns for external use.

The export allows other modules (likely server initialization or scheduled jobs) to trigger queue processing, which aligns with the PR's goals for improved queue management.


973-980: LGTM: Schedule cancellation updated.

The comment and code correctly reflect the use of cancelScheduledWorkflow, which internally uses pgBossClient as per the PR's consolidation objectives.


591-592: LGTM: Comment accurately describes queue behavior.

The updated comment correctly reflects the use of pgBossClient for queueing runs when browser limits are reached.


1054-1056: LGTM: Abort queue operations consistent.

The abort functionality correctly uses pgBossClient following the same pattern as other queue operations.

server/src/workflow-management/classes/Interpreter.ts (1)

392-407: Interpreter cleanup hook is correctly integrated.

The additional this.interpreter.cleanup() call (guarded by a type check and wrapped in the existing try/catch) cleanly extends the shutdown path without changing external behavior. This should help prevent lingering interpreter‑level resources.

server/src/browser-management/inputHandlers.ts (1)

1049-1050: New testPaginationScroll wiring looks consistent

Registering "testPaginationScroll" here and exporting registerInputHandlers/removeInputHandlers aligns with the new server-side pagination test handler; I don't see issues with the event wiring.

Also applies to: 1086-1086

server/src/pgboss-worker.ts (1)

846-847: Worker interval tracking and shutdown cleanup look correct

Tracking queue-check setInterval handles in workerIntervals and clearing them on SIGTERM/SIGINT prevents orphaned timers from keeping the process alive. The pattern is consistent for both user and abort queues; I don’t see issues here.

Also applies to: 920-928, 956-964, 1072-1077, 1082-1087

server/src/workflow-management/scheduler/index.ts (1)

272-290: Scrape/interpretation timeouts and interpreter cleanup look good

The added SCRAPE_TIMEOUT and INTERPRETATION_TIMEOUT wrappers around markdown/HTML conversion and InterpretRecording provide clear upper bounds on run duration, and the follow‑up browser.interpreter.clearState() + destroyRemoteBrowser calls in both normal and error paths should significantly reduce long‑lived interpreter/browser leaks. I don’t see logic bugs in these flows.

Also applies to: 427-438, 668-675, 677-679

server/src/api/record.ts (2)

591-613: Socket lifecycle and run-wait logic look robust

The new readyForRunHandler + cleanupSocketConnection pattern ensures:

  • Client socket listeners are removed and the socket is disconnected in all cases.
  • Server namespaces are torn down when browserId is known.

Combined with waitForRunCompletion’s 3‑hour cap and explicit aborted/failed run handling, this should greatly reduce long-lived socket and DB resource leakage for API runs.

Also applies to: 1097-1139, 1141-1161, 1163-1185


711-728: Scrape/interpretation timeouts and interpreter cleanup are well integrated

The addition of SCRAPE_TIMEOUT around markdown/HTML conversions and INTERPRETATION_TIMEOUT around InterpretRecording, plus browser.interpreter.clearState() before destroyRemoteBrowser, gives you clear safeguards against hung conversions and interpreter state leaks. The error paths also correctly update run status, fire webhooks, and emit socket events, so behavior remains observable.

Also applies to: 875-886, 890-893, 1006-1007

import { pgBoss, registerWorkerForQueue, registerAbortWorkerForQueue } from '../pgboss-worker';
import { cancelScheduledWorkflow, scheduleWorkflow } from '../storage/schedule';
import { pgBossClient } from '../storage/pgboss';
chromium.use(stealthPlugin());

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, check if the file exists and read the relevant section
if [ -f "server/src/routes/storage.ts" ]; then
  echo "=== File exists, reading lines 1-30 ==="
  head -n 30 "server/src/routes/storage.ts" | cat -n
else
  echo "File not found at server/src/routes/storage.ts"
  echo "=== Searching for storage.ts files ==="
  find . -name "storage.ts" -type f 2>/dev/null
fi

Repository: getmaxun/maxun

Length of output: 1773


🏁 Script executed:

# Search for chromium and stealthPlugin usage in the codebase
echo "=== Searching for chromium imports/usage ==="
rg "chromium" --type ts -B 2 -A 2 server/src/

echo -e "\n=== Searching for stealthPlugin ==="
rg "stealthPlugin" --type ts -B 2 -A 2 server/src/

Repository: getmaxun/maxun

Length of output: 3220


🏁 Script executed:

# Check package.json for stealth-related dependencies
echo "=== Checking package.json ==="
fd -e json "package.json" -x cat {} \; | jq '.dependencies, .devDependencies' 2>/dev/null | grep -i -E "(playwright|puppeteer|stealth)"

Repository: getmaxun/maxun

Length of output: 373


🏁 Script executed:

# Check if chromium.use pattern is used elsewhere
echo "=== Searching for .use( pattern with chromium ==="
rg "chromium\.use\(" --type ts -B 5 -A 2

echo -e "\n=== Check browserConnection.ts more thoroughly ==="
rg -n "stealthPlugin|\.use\(" server/src/browser-management/browserConnection.ts

echo -e "\n=== Search for playwright-extra usage ==="
rg "playwright-extra" --type ts -B 2 -A 2

Repository: getmaxun/maxun

Length of output: 1350


Critical: Remove misplaced line from storage.ts

Line 18 contains chromium.use(stealthPlugin()); which does not belong in this file. This code is already correctly implemented in browser/server.ts with proper imports (chromium from 'playwright-extra' and stealthPlugin from 'puppeteer-extra-plugin-stealth'). Delete this line from storage.ts entirely—it will cause a ReferenceError at runtime and serves no purpose in the router module.

🤖 Prompt for AI Agents
In server/src/routes/storage.ts around line 18, remove the stray line
`chromium.use(stealthPlugin());` entirely — it does not belong in this router
module and will cause a ReferenceError at runtime; after deletion, also scan the
file for any unused imports or references to `chromium` or `stealthPlugin` and
remove them to avoid lint/errors so the module contains only relevant router
code.

Comment on lines 1088 to 1093
const queuedRun = await Run.findOne({
where: { status: 'queued' },
-    order: [['startedAt', 'ASC']]
+    order: [['startedAt', 'ASC']],
});

consecutiveDbErrors = 0;
if (!queuedRun) return;

⚠️ Potential issue | 🟠 Major

Fix circuit breaker reset logic.

The consecutiveDbErrors counter is reset on line 1092 immediately after a successful database query, even if the query returns no queued runs. This means the circuit breaker will reset even when no actual work is processed.

The counter should only reset after successfully processing a queued run, not just after any successful query.

Apply this diff to fix the reset logic:

     const queuedRun = await Run.findOne({
       where: { status: 'queued' },
       order: [['startedAt', 'ASC']],
     });
-    consecutiveDbErrors = 0;
     if (!queuedRun) return;
     
+    // Reset error counter after successfully finding and beginning to process a run
+    consecutiveDbErrors = 0;
+    
     const userId = queuedRun.runByUserId;

Alternatively, consider resetting the counter only after the entire run processing succeeds (after line 1138), which would provide even stronger guarantees that the system is healthy before resetting the circuit breaker.

🤖 Prompt for AI Agents
In server/src/routes/storage.ts around lines 1088-1093 the code resets
consecutiveDbErrors immediately after the database findOne call, which clears
the circuit breaker even when no queued run was returned; change the logic so
that consecutiveDbErrors is NOT reset here — instead reset it only after a
queuedRun is actually processed successfully (move the reset to after the
successful run-processing block, e.g. after line 1138) or at the exact point
where the full run processing completes without error, ensuring the counter only
clears when real work finished.

@amhsirak amhsirak merged commit 3d7ae1e into develop Nov 30, 2025
1 check passed

Development

Successfully merging this pull request may close these issues.

Chore: Clean up unused/stale sockets
