Repeated setContent is 50x slower without goto('about:blank') in between than with it #3665

Closed
phiresky opened this issue Dec 13, 2018 · 2 comments
Labels
bug chromium Issues with Puppeteer-Chromium

Comments

@phiresky

I'm using Puppeteer only to extract some information from HTML strings, loading each one with page.setContent(). This is extremely slow, even though I never navigate to an actual website.

Steps to reproduce

  • Puppeteer version: 1.11.0
  • Platform / OS version: Ubuntu 18.04
  • Node.js version: v11.3.0

Minimal example:

import chrome, { Page } from "puppeteer"

const content = `
<div>
    ${Array(1000)
      .fill("<p>test</p>")
      .join("\n")}
</div>`

const count = 100

async function testNoBlank(tab: Page) {
  for (let i = 0; i < count; i++) {
    await tab.setContent(content)
    const ret = await tab.$$eval("p", ps => ps.map(p => p.textContent))
    console.assert(ret.length === 1000)
  }
}
async function testBlank(tab: Page) {
  for (let i = 0; i < count; i++) {
    await tab.setContent(content)
    const ret = await tab.$$eval("p", ps => ps.map(p => p.textContent))
    console.assert(ret.length === 1000)
    await tab.goto("about:blank")
  }
}

async function both() {
  const browser = await chrome.launch({ headless: true })
  const [tab] = await browser.pages()
  console.time("blank")
  await testBlank(tab)
  console.timeEnd("blank")
  await tab.goto("about:blank")
  console.time("no blank")
  await testNoBlank(tab)
  console.timeEnd("no blank")

  await browser.close()
}

both()

Output:

blank: 3568.199ms
no blank: 191955.917ms
Done in 196.83s.

Without goto("about:blank") between setContent calls, the same loop is roughly 50x slower (about 192 s versus 3.6 s for 100 iterations).

I found this by accident. If it can't be fixed, it should at least be mentioned in the docs, since setContent seems to be the canonical way to load HTML from a string.
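Until this is fixed, the workaround from the timings above can be wrapped in a small helper. This is only a sketch: `setContentFresh` and the `PageLike` interface are hypothetical names introduced here, not part of Puppeteer. The only Puppeteer calls it relies on are `page.goto` and `page.setContent`, and it assumes the extra navigation to about:blank is acceptable for your use case.

```typescript
// Minimal structural type covering only the two Page methods used below.
// Puppeteer's Page is expected to satisfy it; the interface itself is an
// assumption made so the sketch stays self-contained.
interface PageLike {
  goto(url: string): Promise<unknown>
  setContent(html: string): Promise<unknown>
}

// Hypothetical helper: reset the tab to about:blank, then load the HTML string.
async function setContentFresh(page: PageLike, html: string): Promise<void> {
  await page.goto("about:blank")
  await page.setContent(html)
}
```

In the example above, `await setContentFresh(tab, content)` would replace `await tab.setContent(content)` inside the loop. Resetting before each call rather than after means the helper also works on a tab that already holds content from somewhere else.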

@phiresky phiresky changed the title Repeated setContent is 50x slower without goto('about:blank') in between than without Repeated setContent is 50x slower without goto('about:blank') in between than with it Dec 13, 2018
aslushnikov added a commit to aslushnikov/puppeteer that referenced this issue Dec 13, 2018
@aslushnikov aslushnikov added bug chromium Issues with Puppeteer-Chromium labels Dec 13, 2018
@aslushnikov
Contributor

Can reproduce with this:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({ headless: true })
  const [tab] = await browser.pages()

  console.time('time');
  for (let i = 0; i < 20; ++i)
    await tab.setContent('<div></div>');
  console.timeEnd('time');
  await browser.close()
})();
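The loop above reports only the total time. To see whether every call is equally slow or whether calls get slower as they repeat, the per-iteration durations could be recorded instead. A minimal sketch; `timeEach` is a hypothetical helper, not a Puppeteer API, and it uses Date.now(), so the resolution is whole milliseconds.

```typescript
// Hypothetical helper: run an async step n times and return each run's
// duration in milliseconds, in order.
async function timeEach(
  n: number,
  step: (i: number) => Promise<unknown>
): Promise<number[]> {
  const durations: number[] = []
  for (let i = 0; i < n; i++) {
    const start = Date.now()
    await step(i)
    durations.push(Date.now() - start)
  }
  return durations
}
```

With the tab from the snippet above, `console.log(await timeEach(20, () => tab.setContent('<div></div>')))` would print one duration per call.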

@devnyxie

Hey! I have exactly the same issue. I can't share the code since it isn't open source, but it's almost the same as the example above.
I pass the Page to a function, and if the page isn't navigated to 'about:blank' first, calling .setContent() loads indefinitely and hits the default 30000ms timeout.

3 participants