Guidance on Restarting a Specific Service in Workerpool to Handle Memory Leaks in Playwright #427

Closed
wojtekKrol opened this issue Jan 24, 2024 · 3 comments

@wojtekKrol

Description

I am using the workerpool library to manage multiple services in a Node.js application, specifically for crawling tasks using Playwright. However, I've encountered an issue with Playwright related to memory leaks. This seems to be a common problem among developers using Playwright, and the suggested workaround involves restarting the Playwright process to free up memory.

Issue

In my application, each service is a separate worker within workerpool. One of these services, a crawler, is responsible for handling thousands of URLs. Due to the memory leak in Playwright, I need a way to programmatically restart this specific service (crawler) within workerpool. The service is stateless and does not process any data persistently, so it should be feasible to restart it without losing important information.

Current Implementation

Here is a simplified version of how the services are structured:

// Main file
import path from 'path';
import { fileURLToPath } from 'url';
import { pool } from 'workerpool';
import { runApiServer } from '~/api/api.js';

const __filename = fileURLToPath(import.meta.url);
const __dirname = path.dirname(__filename);

const computeWorkersCount = (
  name: AppWorker
): [min: number, max: number] => {
  return [
    Number(CONFIG[(name + '_WORKERS_MIN') as AppWorkersMin]),
    Number(CONFIG[(name + '_WORKERS_MAX') as AppWorkersMax]),
  ];
};

const main = async () => {
  // Database initializations
  const oneDB = createOneDB();
  const anotherDB = createAnotherDB();

  runApiServer({ oneDB, anotherDB });

  const services = [
    [computeWorkersCount('FIRST'), './services/one'],
    [computeWorkersCount('SECOND'), './services/second'],
    [computeWorkersCount('THIRD_LEAK_MEMORY_PROBLEM'), './services/third'],
  ];

  for (const [[min, max], servicePath] of services) {
    pool(path.join(__dirname, servicePath), {
      minWorkers: min,
      maxWorkers: max,
    })
      .exec('main', null)
      .catch(console.error);
  }
};

main();

// Example of a service worker
import { worker } from 'workerpool';

worker({
  main: () =>
    main({
      oneDB: createOneDB(),
      anotherDB: createAnotherDB(),
    }),
});

Request

I am seeking guidance or a feature within workerpool that would allow me to restart a specific service (especially the crawler service using Playwright) to handle the memory leak issue. This would involve terminating and then reinitializing the service's process. Any suggestions or solutions for this scenario would be greatly appreciated.

@josdejong
Owner

I guess you can call .terminate() on the workerpool to kill all workers, and then create a new workerpool.

@wojtekKrol
Author

@josdejong I would like the worker to run its logic (ideally repeating it N times), and after that to be terminated and re-created automatically, with the N counter reset.

@josdejong
Owner

I think what you can do is create a little wrapper function around your workerpool that:

  • creates a workerpool instance
  • keeps track of the number of executed tasks (divided by the number of workers to get your N tasks per worker)
  • once the max number of executed tasks is reached, gracefully shuts down the pool and creates a new one

There is no support for terminating a single worker, but this would terminate all of them and re-create them once in a while to solve the memory leak issue.
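A minimal sketch of such a wrapper might look like the following. It is not part of workerpool: the names RecyclingPool, PoolLike, Generation, createPool and maxTasksPerPool are hypothetical, and the only workerpool features assumed are a pool's exec() and terminate() methods. Because a graceful terminate() may still reject tasks that are waiting in the queue, the sketch stops sending work to the old pool first and only terminates it once its in-flight tasks have settled.

```typescript
// Minimal surface the wrapper relies on. workerpool's Pool offers exec() and
// terminate(); the interface itself is a hypothetical stand-in for it.
interface PoolLike {
  exec(method: string, params?: unknown[]): PromiseLike<unknown>;
  terminate(force?: boolean): PromiseLike<unknown>;
}

interface Generation {
  pool: PoolLike;
  started: number; // tasks handed to this pool so far
  inFlight: number; // tasks that have not settled yet
  retired: boolean; // true once no new tasks may be sent to this pool
}

class RecyclingPool {
  private createPool: () => PoolLike;
  private maxTasksPerPool: number;
  private current: Generation;

  constructor(createPool: () => PoolLike, maxTasksPerPool: number) {
    this.createPool = createPool;
    this.maxTasksPerPool = maxTasksPerPool;
    this.current = this.newGeneration();
  }

  private newGeneration(): Generation {
    return { pool: this.createPool(), started: 0, inFlight: 0, retired: false };
  }

  async exec(method: string, params: unknown[] = []): Promise<unknown> {
    const gen = this.current;
    gen.started += 1;
    gen.inFlight += 1;

    // Budget used up: later calls go to a fresh pool.
    if (gen.started >= this.maxTasksPerPool) {
      gen.retired = true;
      this.current = this.newGeneration();
    }

    try {
      return await gen.pool.exec(method, params);
    } finally {
      gen.inFlight -= 1;
      // The last task to settle on a retired pool shuts that pool down.
      if (gen.retired && gen.inFlight === 0) {
        await gen.pool.terminate();
      }
    }
  }

  // Terminates the pool that is currently accepting tasks.
  async close(): Promise<void> {
    this.current.retired = true;
    if (this.current.inFlight === 0) {
      await this.current.pool.terminate();
    }
  }
}
```

With workerpool, createPool could be something like () => pool(path.join(__dirname, './services/third'), { maxWorkers: max }), and maxTasksPerPool would be N multiplied by the number of workers, following the suggestion above. One consequence of this design is that the last task to finish on a retired pool also waits for that pool's shutdown before returning its result.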
