Should we increase default timeout? #20

Closed
rgaudin opened this issue Oct 7, 2020 · 2 comments · Fixed by #24

@rgaudin
Member

rgaudin commented Oct 7, 2020

Scraping fondamentaux, I got a lot of timeouts on pages (169!) with the default 30s timeout.

@ikreymer could you please give us more details on the implications of the timeout:

  • What does it apply to? await page.goto(url, {waitUntil, timeout}); implies it's the time for the page to load (that's the default waitUntil value). Is this the time before the document loaded event?
  • What would be the consequences of increasing it?
  • How are non-page requests affected? Like video files for example.

My initial thought would be to use a large value here to capture slow-loading pages; other pages would still load and be processed quickly.

@ikreymer
Collaborator

ikreymer commented Oct 8, 2020

  • What does it apply to? await page.goto(url, {waitUntil, timeout}); implies it's the time for the page to load (that's the default waitUntil value). Is this the time before the document loaded event?

Yes, the timeout applies to each of the 'modes' of the goto function.
It can wait either for just the DOM to load, or until there is no more network traffic.
The options are better explained here:
https://github.com/puppeteer/puppeteer/blob/v5.3.1/docs/api.md#pagegotourl-options
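
As a rough illustration of how the timeout and the waitUntil mode fit together, here is a minimal sketch. It assumes Puppeteer's documented waitUntil values (load, domcontentloaded, networkidle0, networkidle2) and its millisecond timeout where 0 disables the limit. The helper names buildGotoOptions and loadPage are hypothetical and not part of this project, and detecting a timeout through the error's name is an assumption about how Puppeteer reports it.

```javascript
// Hypothetical helpers, not part of the project: they only illustrate how a
// timeout and a waitUntil mode are passed to Puppeteer's page.goto().
const WAIT_UNTIL_MODES = ['load', 'domcontentloaded', 'networkidle0', 'networkidle2'];

// Convert a timeout in seconds into the options object page.goto() accepts.
// Only single string modes are handled here, for simplicity.
function buildGotoOptions({ waitUntil = 'load', timeoutSeconds = 30 } = {}) {
  if (!WAIT_UNTIL_MODES.includes(waitUntil)) {
    throw new Error(`Unsupported waitUntil mode: ${waitUntil}`);
  }
  if (!Number.isFinite(timeoutSeconds) || timeoutSeconds < 0) {
    throw new Error(`Invalid timeout: ${timeoutSeconds}`);
  }
  // Puppeteer expects milliseconds; a value of 0 disables the timeout.
  return { waitUntil, timeout: Math.round(timeoutSeconds * 1000) };
}

// `page` is assumed to be a Puppeteer Page object.
// Resolves to true if navigation completed, false if it timed out.
async function loadPage(page, url, options) {
  try {
    await page.goto(url, buildGotoOptions(options));
    return true;
  } catch (err) {
    // Assumption: a navigation timeout surfaces as an error named TimeoutError.
    if (err && err.name === 'TimeoutError') {
      return false;
    }
    throw err;
  }
}
```

For example, loadPage(page, url, { waitUntil: 'networkidle2', timeoutSeconds: 90 }) would wait up to 90 seconds for network activity to settle, while a page that reaches that state sooner returns as soon as it does.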

  • What would be the consequences of increasing it?

It depends. Maybe it's good to just set it to 1 min. For sites that load quickly, there should be no effect.

  • How are non-page requests affected? Like video files for example.

They would not be included in this. There will be separate logic, part of #9, to handle waiting for video playback.

Also, the timeouts could happen for any number of reasons. For example, I've seen more timeouts when too many browsers are running and there is not enough CPU.

@rgaudin
Member Author

rgaudin commented Oct 8, 2020

OK, thanks for the details. I think we should increase the default then. Maybe 90s?

I think we should also document the --scroll option's custom timeout in its help text.

You mentioned the separate behavior for video, but what about non-video resources? Images, or large non-media files like JS or WASM files? Is there a timeout at play there?

Thanks for pointing out the concurrency issue; indeed, that must be taken into consideration when writing recipes. Network is also affected, although I understand that in most cases the bottleneck would be CPU.
