page.goto returns null periodically #2479

Closed
ntzm opened this issue Apr 30, 2018 · 28 comments

@ntzm
Contributor

ntzm commented Apr 30, 2018

Steps to reproduce

Similar/same as: #1391

Tell us about your environment:

What steps will reproduce the problem?

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  for (let i = 0; i < 15; i++) {
    const r = await page.goto('https://www.microsoft.com/en-gb/store/d/xbox-one-s-1tb-console-playerunknowns-battlegrounds-bundle/908z9jn5cnh2/gz4w?cid=msft_web_collection', { waitUntil: 'domcontentloaded' });

    // Throws a TypeError whenever goto resolves to null
    console.log(r.ok());
  }

  await browser.close();
})();

What is the expected result?

true printed 15 times

What happens instead?

Sometimes true is printed 15 times; sometimes it throws TypeError: Cannot read property 'ok' of null

It works as expected on 1.2.0, but fails on 1.3.0

@aslushnikov
Contributor

@ntzm Thanks for filing this separately, I can easily repro this.

You can work around this by using the waitUntil: "networkidle0" option while this is being fixed.
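
As a rough illustration of that workaround, the sketch below runs the repro loop with waitUntil: "networkidle0" and guards against a null result instead of crashing. The helper name checkRepeatedly is hypothetical (it is not part of Puppeteer), and page is assumed to be a Puppeteer Page created elsewhere.

```javascript
// Hypothetical helper: repeat the navigation and record the outcome of each try.
async function checkRepeatedly(page, url, times) {
  const results = [];
  for (let i = 0; i < times; i++) {
    const r = await page.goto(url, { waitUntil: 'networkidle0' });
    // goto can resolve to null, so guard before calling ok()
    results.push(r === null ? null : r.ok());
  }
  return results;
}
```

A null entry in the returned array marks a navigation for which goto gave no response, so the caller can decide how to handle it.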

@aslushnikov aslushnikov added the bug label May 1, 2018
@aslushnikov aslushnikov self-assigned this May 1, 2018
@ntzm
Contributor Author

ntzm commented May 1, 2018

Thanks!

@peterbe

peterbe commented May 29, 2018

Here's an even simpler way to reproduce it:

const puppeteer = require("puppeteer");
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // const Url = "https://www.peterbe.com/#anything";  // FAILS
  const Url = "https://www.peterbe.com#anything";  // ALSO FAILS
  // const Url = "https://www.peterbe.com";  // WORKS
  const r1 = await page.goto(Url, { waitUntil: "domcontentloaded" });
  if (r1 === null) console.log("Response is null!");
  else console.log("Response OK?", r1.ok());

  const r2 = await page.goto(Url, { waitUntil: "domcontentloaded" });
  if (r2 === null) console.log("Response is null!");
  else console.log("Response OK?", r2.ok());

  await browser.close();
})();

Output is:

Response OK? true
Response is null!

Every time.

Note that removing the anchor part of the Url variable value "solves" the problem.

Note, I don't think it's related to https://www.peterbe.com/... specifically. The key difference is whether the URL has an anchor or not.

@peterbe

peterbe commented May 29, 2018

Actually, regarding my repro above: if you comment out the first request (or the second, it doesn't matter) so it looks like this:

const puppeteer = require("puppeteer");
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // const Url = "https://www.peterbe.com/#anything";  // FAILS
  const Url = "https://www.peterbe.com#anything";  // ALSO FAILS
  // const Url = "https://www.peterbe.com";  // WORKS
  const r1 = await page.goto(Url, { waitUntil: "domcontentloaded" });
  if (r1 === null) console.log("Response is null!");
  else console.log("Response OK?", r1.ok());

  // const r2 = await page.goto(Url, { waitUntil: "domcontentloaded" });
  // if (r2 === null) console.log("Response is null!");
  // else console.log("Response OK?", r2.ok());

  await browser.close();
})();

Then it works.

So I'm pretty sure the response becomes null if the URL has been requested before and contains an anchor part.

@bluepeter

bluepeter commented Jul 26, 2018

@peterbe I am curious whether you also get this error on a URL that does not load Google Analytics, Intercom, etc. (or where you block those requests via Puppeteer)?

I am also experiencing this issue. That is, when I call setRequestInterception(true) and crawl two URLs on the same site, the second URL returns a null response. If I block or remove any iframe-creating JavaScript, everything works. See also this comment.

@ntzm
Contributor Author

ntzm commented Jul 26, 2018

@bluepeter I had a similar issue trying to use an unsupported version of Chrome

@eknkc

eknkc commented Jul 26, 2018

This happens for us too, but only when using setRequestInterception(true).

Interestingly, using a separate context (i.e. createIncognitoBrowserContext) together with setRequestInterception resolves the issue.
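
A minimal sketch of that idea, assuming a Puppeteer Browser instance created elsewhere: each navigation runs in its own incognito context, which is closed afterwards. The helper name gotoInFreshContext is hypothetical. createIncognitoBrowserContext is the method named in this thread; newer Puppeteer releases may expose it under a different name, so check the documentation for your version.

```javascript
// Hypothetical helper: navigate to a URL inside a throwaway incognito context.
async function gotoInFreshContext(browser, url, options = {}) {
  const context = await browser.createIncognitoBrowserContext();
  try {
    const page = await context.newPage();
    const response = await page.goto(url, options);
    // Copy out what is needed before the context (and its page) is closed
    return {
      status: response ? response.status() : null,
      ok: response ? response.ok() : null,
    };
  } finally {
    await context.close();
  }
}
```

Creating a context per URL costs extra time and discards cookies and cache between navigations, so this only fits crawls that do not rely on shared session state.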

@bluepeter

bluepeter commented Jul 26, 2018

Here's a bit of a workaround until this is fixed:

let fullResponse = await chromePage.goto("https://wherever.com");
if (fullResponse === null) {
  console.log("Got null, trying wait.");
  fullResponse = await chromePage.waitForResponse(() => true);
}

@ntzm
Contributor Author

ntzm commented Sep 3, 2018

@bluepeter that workaround works for me, thanks a lot

@aslushnikov
Contributor

@ntzm somehow it doesn't reproduce for me anymore on v1.8.0. Here's my script:

const puppeteer = require('puppeteer');
const url = 'https://www.microsoft.com/en-gb/store/d/xbox-one-s-1tb-console-playerunknowns-battlegrounds-bundle/908z9jn5cnh2/gz4w?cid=msft_web_collection';
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  for (var i = 0; i < 30; i++) {
    const r = await page.goto(url, { waitUntil: 'domcontentloaded' });
    console.log(!!r);
  }
  await browser.close();
})();

@peterbe I looked into your script. We're actually quite consistent with chromium here:

  • the first time you navigate to https://example.com, the network is hit and a new page is loaded
  • the next time you navigate to https://example.com#asdf, a new page is not loaded; instead, an anchor navigation happens inside the page, so Chromium doesn't hit the network.

This might be surprising, but it is consistent with what happens if a user types in the address bar. If you want to force-load a website, I'd open a new page and navigate it instead.
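
The anchor-navigation case described here can be detected before calling goto. The sketch below is one way to do that with the standard URL class; the helper name isSameDocumentUrl is hypothetical and not part of Puppeteer. It only compares URLs as strings, so it does not account for redirects or anything else the browser may do.

```javascript
// Hypothetical helper: true when targetUrl carries a fragment and otherwise
// matches currentUrl, i.e. the navigation would only be an anchor navigation.
function isSameDocumentUrl(currentUrl, targetUrl) {
  let current;
  let target;
  try {
    current = new URL(currentUrl);
    target = new URL(targetUrl);
  } catch (err) {
    return false; // one of the URLs could not be parsed
  }
  // Without a fragment on the target, the browser performs a normal load
  if (!target.href.includes('#')) return false;
  current.hash = '';
  target.hash = '';
  return current.href === target.href;
}
```

A caller could check isSameDocumentUrl(page.url(), url) first and, when it returns true, either expect null from goto or navigate a new page instead.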

@ntzm
Contributor Author

ntzm commented Sep 7, 2018

Yep, I can confirm this works on 1.8.0!

@alphonse92

alphonse92 commented Sep 10, 2018

Still isn't working for me.

Version: 1.8.0

[Three screenshots attached in the original comment]

@aslushnikov
Contributor

@alphonse92 can you share the URL that doesn't work?

@alphonse92

@aslushnikov Hi, sorry, it was my fault. The problem was the SSL certificates: Chromium was killing the request.

@gsouf

gsouf commented Oct 3, 2019

@aslushnikov still broken with 1.20.

This returns null every time:

const puppeteer = require('puppeteer');
const url = 'https://havgaarden.dk/restaurant';
(async () => {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    const response = await page.goto(url, {
        timeout: 15000,
        waitUntil: 'domcontentloaded'
    });
    if (!response) console.log('null response');
    await browser.close();
})();

Note that this website features a redirection initiated by JavaScript.

@jtara1

jtara1 commented Oct 9, 2019

Is it possible the server just ends the connection earlier than expected? Something like https://httpstatuses.com/444, but more subtle or silent?
Edit: fixed by setting these in the config of our wrapper for puppeteer:

      trustChromeNativeRequest: true,
      enableRequestInterception: false,

@gsouf

gsouf commented Oct 9, 2019

@jtara1 that sounds unlikely. For me it happens every time a page has a redirection initiated by JavaScript after the domcontentloaded event is triggered.

ghost pushed a commit to EU-EDPS/website-evidence-collector that referenced this issue Mar 4, 2020
UnhandledPromiseRejectionWarning: TypeError: Cannot read property 'request' of null
    at /opt/inspection/website-evidence-collector/website-evidence-collector.js:196:40
    at process._tickCallback (internal/process/next_tick.js:68:7)

Solution from:
puppeteer/puppeteer#2479 (comment)
@lanwin

lanwin commented May 14, 2020

It seems this problem has something to do with caching. When I create a new context with createIncognitoBrowserContext for every request, this never happens. Once I removed that extra context, the problem happened to me as well.
So my guess is that the second request is now fulfilled from the cache, so there is no response. This is also why chromePage.waitForResponse(() => true); works: it waits for the cache refresh request.

@alijaya

alijaya commented May 27, 2020

> It seems this problem has something to do with caching. When I create a new context with createIncognitoBrowserContext for every request, this never happens. Once I removed that extra context, the problem happened to me as well.
> So my guess is that the second request is now fulfilled from the cache, so there is no response. This is also why chromePage.waitForResponse(() => true); works: it waits for the cache refresh request.

wow... it's working... it's weird that we need to make new context for every request... I hope it will be fixed soon

@lanwin

lanwin commented May 28, 2020

> > It seems this problem has something to do with caching. When I create a new context with createIncognitoBrowserContext for every request, this never happens. Once I removed that extra context, the problem happened to me as well.
> > So my guess is that the second request is now fulfilled from the cache, so there is no response. This is also why chromePage.waitForResponse(() => true); works: it waits for the cache refresh request.
>
> wow... it's working... it's weird that we need to make new context for every request... I hope it will be fixed soon

You don't need the context. Just call page.setCacheEnabled(false). From my limited tests, it seems to do the same.

@alijaya

alijaya commented May 28, 2020

I see... I'll try it

@alijaya

alijaya commented May 28, 2020

ah yes... I can confirm it's working thanks! :D

@alijaya

alijaya commented May 28, 2020

wait... I'm mistaken, it's not working with page.setCacheEnabled(false)

@stevenlafl

stevenlafl commented Jun 19, 2020

It does not work with page.setCacheEnabled(false) alone, but a combination of the suggestions above:

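  // setCacheEnabled returns a promise, so await it before navigating
  await page.setCacheEnabled(false);
  // setDefaultNavigationTimeout is synchronous; 0 disables the timeout
  page.setDefaultNavigationTimeout(0);
  response = await page.goto(url, {waitUntil: "networkidle0"});

  // Fall back to the next response seen on the page when goto gives null
  if (response === null) {
    response = await page.waitForResponse(() => true);
  }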

yields the perfect results. I'm afraid to change it now, but some of that, such as networkidle0, may be unnecessary. But again, networkidle0 did not work alone, and neither did disabling the cache.

Thanks to all in this thread.

@Schaka

Schaka commented Jan 11, 2021

if (response === null) {
    response = await page.waitForResponse(() => true);
}

This will return the first response you get once interception starts. It will NOT return the response to the initial request you're actually looking for. So if you actually need that one, you're out of luck unless you use an incognito tab.
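
One possible way to narrow this down, sketched under assumptions: listen for responses while goto runs and keep only those that belong to a navigation request in the main frame, rather than accepting whatever arrives first. The helper name gotoCapturingMainResponse is hypothetical. It relies on page.on and page.off, response.request(), request.isNavigationRequest(), request.frame() and page.mainFrame(); whether all of them are available depends on the Puppeteer version in use.

```javascript
// Hypothetical helper: fall back to the last main-frame navigation response
// observed during goto when goto itself resolves to null.
async function gotoCapturingMainResponse(page, url, options = {}) {
  let captured = null;
  const onResponse = (response) => {
    const request = response.request();
    // Ignore subresources and iframe navigations
    if (request.isNavigationRequest() && request.frame() === page.mainFrame()) {
      captured = response;
    }
  };
  page.on('response', onResponse);
  try {
    const response = await page.goto(url, options);
    return response || captured;
  } finally {
    page.off('response', onResponse);
  }
}
```

This still returns null when no navigation request reaches the network at all, for example for a pure anchor navigation.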

@alfeugds

In my case I was accessing different pages of an SPA where the pages were after the hash symbol. There's a footnote in puppeteer documentation about goto that says the following:

"page.goto either throws an error or returns a main resource response. The only exceptions are navigation to about:blank or navigation to the same URL with a different hash, which would succeed and return null."

What I did as a temporary workaround was to force the page to go somewhere else (e.g. 'about:blank') and then 'goto' the page I wanted to test. Something like this:

if (response === null) {
    await page.goto('about:blank');
    response = await page.goto(urlWithHash);
}

This doesn't seem optimal since it downloads the entire page again but the crawling system I'm working on depends on the response data for every single page, so that works.

@herberthobregon

In my case the service worker is the issue: the SW serves index.html, so goto gets a null response.

The first time it works; the second time it already returns null. You have to delete the Chrome data to remove the SW.

const puppeteer = require('puppeteer');
const url = 'https://whatpwacando.today/';
(async () => {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    const response = await page.goto(url, {
        timeout: 15000,
        waitUntil: 'domcontentloaded'
    });
    if (!response) console.log('null response');
    await browser.close();
})();
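
If a service worker is the cause, one thing that may be worth trying is asking the browser to bypass service workers for the navigation through the Chrome DevTools Protocol. The sketch below is an assumption rather than a verified fix for this issue: the helper name gotoBypassingServiceWorker is hypothetical, and it uses page.target().createCDPSession() with the CDP commands Network.enable and Network.setBypassServiceWorker.

```javascript
// Hypothetical helper: navigate with service workers bypassed via CDP.
async function gotoBypassingServiceWorker(page, url, options = {}) {
  const client = await page.target().createCDPSession();
  try {
    await client.send('Network.enable');
    // Ask the browser to send requests to the network instead of the SW
    await client.send('Network.setBypassServiceWorker', { bypass: true });
    return await page.goto(url, options);
  } finally {
    // Release the CDP session once the navigation has settled
    await client.detach();
  }
}
```

Whether this yields a non-null response for a given site has not been confirmed in this thread, so treat it as an experiment alongside clearing the browser profile.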

framirezsandino pushed a commit to framirezsandino/headless-chrome-crawler that referenced this issue Nov 10, 2021
garzj added a commit to garzj/d4sd that referenced this issue Apr 22, 2022
@dgtlmoon

FWIW, if your issue is due to service workers, over at Playwright they have the ability to block service worker registration (but maybe the real issue is that page.goto still returns the wrong 'frame').

https://github.com/microsoft/playwright/pull/14321/files#diff-5e6d61431da6c4bb6fda4276ba5781ae64e98537d9f68c9c51a465e1fcb4c6c2R125
