headless-cluster is a fork of the puppeteer-cluster library for managing multiple Puppeteer instances concurrently. It extends the core functionality of puppeteer-cluster with proxy support and integrates the latest features of Puppeteer.
Headless-cluster supports authenticated proxies. Pass a data object to cluster.execute that contains the proxy settings (contextOptions) and the credentials (authentication). In the task callback, read the credentials from data and call page.authenticate with the username and password before navigating. The full example is in examples/execute-proxy.js.
// Assumption: the package is installed as "headless-cluster" and exports Cluster,
// as puppeteer-cluster does; adjust the path when running from the repository.
const { Cluster } = require('headless-cluster');

(async () => {
  // Create a cluster with 2 workers
  // You can also use Cluster.CONCURRENCY_BROWSER
  const cluster = await Cluster.launch({
    concurrency: Cluster.CONCURRENCY_CONTEXT,
    maxConcurrency: 2,
  });

  // Define a task
  await cluster.task(async ({ page, data }) => {
    // Send the proxy credentials, if any, before navigating
    if (data.authentication) {
      await page.authenticate(data.authentication);
    }
    try {
      await page.goto(data.url);
    } catch (err) {
      console.log(err);
      return 'Failed to load the page';
    }
    const pageTitle = await page.evaluate(() => document.title);
    return pageTitle;
  });

  // Use try-catch block as "execute" will throw instead of using events
  try {
    // Execute the tasks one after another via execute
    const data = {
      url: 'https://www.google.com',
      contextOptions: { proxyServer: 'http://localhost:3128' },
      authentication: { username: 'foobar', password: 'Ya4zAzj8i' },
    };
    console.log(data);
    const result1 = await cluster.execute(data);
    console.log(result1);
    // This job carries no proxy settings
    const result2 = await cluster.execute({ url: 'https://www.wikipedia.org' });
    console.log(result2);
  } catch (err) {
    // Handle crawling error
    console.error(err);
  }

  // Shutdown after everything is done
  await cluster.idle();
  await cluster.close();
})();
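
Proxy providers often hand out a single URL with the credentials embedded, such as http://user:pass@host:port. The sketch below shows one way to split such a URL into the data object shape used in the example above (url, contextOptions.proxyServer, authentication). buildProxyJob is a hypothetical helper written for illustration and is not part of headless-cluster; it relies only on the standard URL class.

```javascript
// Hypothetical helper, not part of headless-cluster: turns a proxy URL with
// optional embedded credentials into the data object passed to cluster.execute.
function buildProxyJob(targetUrl, proxyUrl) {
  const proxy = new URL(proxyUrl);
  const job = {
    url: targetUrl,
    // Proxy address without credentials, e.g. "http://localhost:3128"
    contextOptions: { proxyServer: `${proxy.protocol}//${proxy.host}` },
  };
  // Only attach credentials when the proxy URL actually carries them
  if (proxy.username) {
    job.authentication = {
      // The URL class percent-encodes credentials, so decode them again
      username: decodeURIComponent(proxy.username),
      password: decodeURIComponent(proxy.password),
    };
  }
  return job;
}

const exampleJob = buildProxyJob(
  'https://www.google.com',
  'http://foobar:Ya4zAzj8i@localhost:3128'
);
console.log(exampleJob);
```

The returned object can be passed to cluster.execute as is. When the proxy URL has no credentials, the authentication key is left out, so a task callback that checks for data.authentication before calling page.authenticate skips that step.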