Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wait for lazy loaded elements when using the fullPage option #40

Merged
merged 15 commits into from Aug 10, 2020
35 changes: 35 additions & 0 deletions index.js
Expand Up @@ -128,6 +128,8 @@ const parseCookie = (url, cookie) => {
return ret;
};

const imagesHaveLoaded = () => [...document.images].map(element => element.complete);

const captureWebsite = async (input, options) => {
options = {
inputType: 'url',
Expand Down Expand Up @@ -321,6 +323,39 @@ const captureWebsite = async (input, options) => {
await options.beforeScreenshot(page, browser);
}

if (screenshotOptions.fullPage) {
// Get the height of the rendered page
const bodyHandle = await page.$('body');
const bodyBoundingHeight = await bodyHandle.boundingBox();
await bodyHandle.dispose();

// Scroll one viewport at a time, pausing to let content load
const viewportHeight = viewportOptions.height;
let viewportIncrement = 0;
while (viewportIncrement + viewportHeight < bodyBoundingHeight) {
const navigationPromise = page.waitForNavigation();
Copy link

@ghost ghost Apr 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this line wait for networkidle instead of the default loaded? I assume this promise will be awaited to let the images load, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I probably agree with you, but care to clarify?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I probably agree with you, but care to clarify?

Waiting for network idle allows lazy elements to load (elements loaded after the "load" event of the document). I suggest doing as follows:

  1. Wait "load" event of the document.
  2. Scroll the page to its maximum Y (this will load lazy elements like lazy loaded images).
  3. Wait for network idle.

Copy link

@ghost ghost Apr 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's also worth to keep in mind that some (old) websites can actually never fulfill the requirements for the network idle navigation event - for example frequent chat requests.

In the end, the propsed flow above should work just fine (most cases).

/* eslint-disable no-await-in-loop */
await page.evaluate(_viewportHeight => {
/* eslint-disable no-undef */
window.scrollBy(0, _viewportHeight);
/* eslint-enable no-undef */
}, viewportHeight);
await navigationPromise;
/* eslint-enable no-await-in-loop */
viewportIncrement += viewportHeight;
}

// Scroll back to top
await page.evaluate(_ => {
/* eslint-disable no-undef */
window.scrollTo(0, 0);
/* eslint-enable no-undef */
});

// Some extra delay to let images load
await page.waitForFunction(imagesHaveLoaded, {timeout: 60});
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the user would expect it to take up to one minute. If we keep this, it needs to be documented.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might also be added as an argument with default value set to timeout 60. Might want to comment on this or implement some other better functionality? Maybe it's also worth looking into alternate functions, such as the waitForNavigation Vasile mentioned earlier.

This

await wait(100);

is not good.

You should wait actually for images and other snippets to load, not giving them 100ms...
The waitForNavigation method waiting for a network idle should be used instead.

In case it might be useful for documentation purposes; I retrieved part of the solution from https://stackoverflow.com/questions/51651830/how-to-wait-for-all-images-to-load-from-page-evaluate-function-in-puppeteer-when

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would look into something better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using page.waitForNavigation as per pointed by #40 (review)

}

const buffer = await page.screenshot(screenshotOptions);

await page.close();
Expand Down