Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: Page.reload() Destroys execution context on certain websites #10435

Closed
1 of 2 tasks
Vasile-Peste opened this issue Jun 22, 2023 · 11 comments · Fixed by #12119
Closed
1 of 2 tasks

[Bug]: Page.reload() Destroys execution context on certain websites #10435

Vasile-Peste opened this issue Jun 22, 2023 · 11 comments · Fixed by #12119
Assignees

Comments

@Vasile-Peste
Copy link

Vasile-Peste commented Jun 22, 2023

Bug expectation

We are trying to visit a webpage, reload it and query a selector.
On certain websites, a "Execution context was destroyed" error is thrown as soon as $ query selector is used after the page reload.

Bug behavior

  • Flaky
  • PDF

Minimal, reproducible example

import * as Puppeteer from "puppeteer";

const browser = await Puppeteer.launch({
    ignoreHTTPSErrors: true,
    headless: false,
    devtools: true,
    waitForInitialPage: false,
});

const page = await browser.newPage();

await page.goto("https://ilsole24ore.com", {
    waitUntil: "load",
});

await page.reload();

const element = await page.$(".test");

// Error: Execution context was destroyed, most likely because of a navigation.
// Happens for "https://ilsole24ore.com"
// Instead if we try a different websites, like "https://aranzulla.it" it doesn't happen

Error string

Error: Execution context was destroyed, most likely because of a navigation

Puppeteer configuration

No response

Puppeteer version

20.7.3

Node version

20.3.0

Package manager

npm

Package manager version

9.6.7

Operating system

macOS

@github-actions
Copy link

github-actions bot commented Jun 22, 2023

This issue was not reproducible. Please check that your example runs locally and the following:

  • Ensure the script does not rely on dependencies outside of puppeteer and puppeteer-core.
  • Ensure the error string is just the error message.
    • Bad:

      Error: something went wrong
        at Object.<anonymous> (/Users/username/repository/script.js:2:1)
        at Module._compile (node:internal/modules/cjs/loader:1159:14)
        at Module._extensions..js (node:internal/modules/cjs/loader:1213:10)
        at Module.load (node:internal/modules/cjs/loader:1037:32)
        at Module._load (node:internal/modules/cjs/loader:878:12)
        at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:81:12)
        at node:internal/main/run_main_module:23:47
    • Good: Error: something went wrong.

  • Ensure your configuration file (if applicable) is valid.
  • If the issue is flaky (does not reproduce all the time), make sure 'Flaky' is checked.
  • If the issue is not expected to error, make sure to write 'no error'.

Once the above checks are satisfied, please edit your issue with the changes and we will
try to reproduce the bug again.


Analyzer run

@OrKoN
Copy link
Collaborator

OrKoN commented Jun 22, 2023

minimal repro:

import * as puppeteer from "puppeteer";

const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto("https://ilsole24ore.com", {
  waitUntil: "load",
});
await page.reload();

const element = await page.$(".test");

@OrKoN
Copy link
Collaborator

OrKoN commented Jun 22, 2023

@jrandolf could you please take a look?

@jrandolf
Copy link
Contributor

jrandolf commented Jun 29, 2023

If you use domcontentloaded instead of load, the above script will work, however this is a Puppeteer bug.

The underlying problem here is that the reload and goto wait on the same events. Some of the events of your goto are leaking to your reload call because you only wait for the goto to load your page. This causes reload to think it is finished when in reality the events are from the goto. This causes a (sort of) race condition.

@OrKoN I think we can solve this by mutexing the navigation. More precisely, we can do something like this in every navigation function:

declare const mutex: Mutex;

let resolve;
const promise = new Promise(res => resolve = res);
void mutex.run(async () => {
  // Run navigation code
  // Wait for specified event to finish.
  resolve();
  // Wait for remaining events to finish.
});
return promise;

I don't think we have another option since those navigation events cannot be "marked" (hence bucketed).

@OrKoN
Copy link
Collaborator

OrKoN commented Jun 29, 2023

// Wait for remaining events to finish.

I am not sure that would work as we also have networkidle events on the list of supported event that might take a long time to occur (possibly infinitely). If we do that, we basically turn every navigation into a waitUntil: ["networkidle2"] (i.e. the next nav would treat the previous as if it was a networkidle2)

Why are we trying to do something on the destroyed context here? I think we might need to think of some other solution.

@jrandolf
Copy link
Contributor

jrandolf commented Jun 29, 2023

I am not sure that would work as we also have networkidle events on the list of supported event that might take a long time to occur (possibly infinitely)

Actually the only event that may require waiting is domcontentloaded. Once domcontentloaded happens, we can perform other navigations.

Why are we trying to do something on the destroyed context here?

Because the reload finishes prematurely (see my above comment), the page.$ executes immediately before the page has had a chance to change it's execution context.

@OrKoN
Copy link
Collaborator

OrKoN commented Jun 29, 2023

@jrandolf but load event assumes waiting for domcontentloaded as load should always happen after domcontentloaded.

@OrKoN
Copy link
Collaborator

OrKoN commented Jun 29, 2023

so

await page.goto("https://ilsole24ore.com", {
    waitUntil: "load",
});
// events received at this point should be: ['domcontentloaded', 'load'].
// reload should reset received events.
await page.reload();
// should clear the received events before this point and start waiting for a fresh pair of ['domcontentloaded', 'load']

@jrandolf
Copy link
Contributor

In CDP it shows that domcontentloaded happens after load. In particular, what happens (if you remove page.reload) is the load happens, then the query, then the domcontentloaded. From looking at MDN, these events look independent.

Independent of this though, if you wait for both load and domcontentloaded, the script fails once more which shouldn't be the case. I'll take a deeper look.

@OrKoN
Copy link
Collaborator

OrKoN commented Jun 30, 2023

The DOMContentLoaded event fires when the initial HTML document has been completely loaded and parsed, without waiting for stylesheets, images, and subframes to finish loading.
The load event is fired when the whole page has loaded, including all dependent resources such as stylesheets, scripts, iframes, and images. This is in contrast to DOMContentLoaded, which is fired as soon as the page DOM has been loaded, without waiting for resources to finish loading.

IMO the DOMContentLoaded almost always comes before the load (especially if the page has resources). It might be that Puppeteer inserts async handling when processing it? Do you have a CDP log for this goto/reload part? We can take a look offline together.

@jrandolf
Copy link
Contributor

jrandolf commented Feb 2, 2024

A functional repro:

import * as Puppeteer from "puppeteer";

const browser = await Puppeteer.launch({
    ignoreHTTPSErrors: true,
    headless: false,
    devtools: true,
});

const page = await browser.newPage();

await page.goto("https://ilsole24ore.com", {
    waitUntil: "load",
});

await page.reload();

const element = await page.evaluate(() => document);

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants