Overcome Cloudflare's blocking: Please turn JavaScript on and reload the page. #7006

Closed
suntong opened this issue Mar 19, 2021 · 16 comments


@suntong

suntong commented Mar 19, 2021

Steps to reproduce

Tell us about your environment:

  • Puppeteer version: 8.0.0
  • Platform / OS version: Ubuntu 20.04 LTS
  • URLs (if applicable):
  • Node.js version: v10.21.0

What steps will reproduce the problem?

    const response = await page.goto(`https://finviz.com/news.ashx`, {
      waitUntil: 'networkidle0'
    });
    console.log(await page.content());

What is the expected result?

Get normal page

What happens instead?

Got,

Please turn JavaScript on and reload the page
Please enable Cookies and reload the page.
This process is automatic. Your browser will redirect to your requested content shortly.
Please allow up to 5 seconds…

            DDoS protection by <a rel="noopener noreferrer" href="https://www.cloudflare.com/5xx-error-landing/" target="_blank">Cloudflare</a>

The URL had been working fine for ages until it broke a day or two ago.

Is there any way to bypass it? Thanks.

@suntong
Author

suntong commented Mar 19, 2021

There are many similar questions on SO without an answer. However, I did find an article:

https://dev.to/rubengmurray/using-cookies-puppeteer-nodejs-to-mirror-a-chrome-profile-on-macos-1l6m

which seems quite promising...
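
Judging by its URL, the linked article is about reusing cookies from a regular Chrome profile in Puppeteer. A minimal sketch of that idea using `page.setCookie`, assuming the cookies were already exported to a JSON file. The helper names `loadCookies` and `gotoWithCookies` and any file name you pass in are hypothetical and not taken from the article:

```javascript
const fs = require('fs');

// Hypothetical helper: read cookies previously saved as a JSON array,
// e.g. with JSON.stringify(await page.cookies()) in a headful session.
function loadCookies(file) {
  const cookies = JSON.parse(fs.readFileSync(file, 'utf8'));
  if (!Array.isArray(cookies)) {
    throw new Error(`${file} does not contain a cookie array`);
  }
  return cookies;
}

// Apply the saved cookies before navigating, so the first request carries them.
async function gotoWithCookies(page, url, cookies) {
  await page.setCookie(...cookies);
  return page.goto(url, { waitUntil: 'networkidle0' });
}
```

Inside an async function this could be called as `await gotoWithCookies(page, 'https://finviz.com/news.ashx', loadCookies('cookies.json'))`. Whether saved cookies are enough to pass the check depends on the site.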

@suntong
Author

suntong commented Mar 20, 2021

It works.

@suntong suntong closed this as completed Mar 20, 2021
@suntong
Author

suntong commented Mar 20, 2021

Nope, it only works with headless: false :(

@suntong suntong reopened this Mar 20, 2021
@suntong
Author

suntong commented Mar 20, 2021

The only change is from headless: true to headless: false, and the script works magically. How strange!

It seems that Cloudflare can detect when I'm using headless: true and sends the alternative blocking HTML, with "Please turn JavaScript on and reload the page."

Triggering page.reload myself doesn't help either.

Is there anything else I can try to make headless: true work?
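
To check whether a given launch configuration receives the interstitial, the returned HTML can be tested for the phrases quoted in the report. A rough sketch, assuming that wording stays stable. `isChallengePage` and `probe` are hypothetical helper names, and `puppeteer` is passed in so the same function can be run once with `headless: true` and once with `headless: false`:

```javascript
// Phrases copied from the blocking page quoted earlier in this issue.
// Assumption: Cloudflare keeps this wording; it may change at any time.
const CHALLENGE_MARKERS = [
  'Please turn JavaScript on and reload the page',
  'Please enable Cookies and reload the page',
  'DDoS protection by',
];

function isChallengePage(html) {
  return CHALLENGE_MARKERS.some((marker) => html.includes(marker));
}

// Load `url` with the given launch options and report whether the
// challenge page came back instead of the real content.
async function probe(puppeteer, url, launchOptions) {
  const browser = await puppeteer.launch(launchOptions);
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'networkidle0' });
    return isChallengePage(await page.content());
  } finally {
    await browser.close();
  }
}
```

For example, compare `await probe(require('puppeteer'), 'https://finviz.com/news.ashx', { headless: true })` with the same call using `{ headless: false }`.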

@mrceperka

https://www.npmjs.com/package/puppeteer-extra-plugin-stealth might help... 🤞
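
A minimal sketch of how the plugin is typically wired in: `puppeteer-extra` wraps Puppeteer, and the stealth plugin is registered with `use()` before launching. This assumes `puppeteer`, `puppeteer-extra`, and `puppeteer-extra-plugin-stealth` are installed. `launchStealth` and its `deps` parameter are hypothetical, and whether this gets past a particular Cloudflare configuration is not guaranteed:

```javascript
// The modules are required lazily so the function can also be handed
// stand-ins through the hypothetical `deps` parameter.
async function launchStealth(launchOptions = {}, deps = {}) {
  const puppeteer = deps.puppeteer || require('puppeteer-extra');
  const StealthPlugin =
    deps.StealthPlugin || require('puppeteer-extra-plugin-stealth');

  // Register the plugin before launch so it is active for the new browser.
  puppeteer.use(StealthPlugin());
  return puppeteer.launch({ headless: true, ...launchOptions });
}
```

The returned browser is used like any other Puppeteer browser: `const browser = await launchStealth();`, then `browser.newPage()` and `page.goto(...)` as in the original report.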

@mathiasbynens
Member

This is not a problem with Puppeteer but rather with the web page.

@suntong
Author

suntong commented Apr 2, 2021

Thanks a lot @mrceperka for sharing that well-kept secret, really appreciate it!!! Hooray!

@mtatarau90

Hi @suntong,
Did you find any solution to bypass this "issue"? I ran into the same problem, and puppeteer-extra is not enough.

Thanks

@suntong
Author

suntong commented Aug 6, 2021

Read my last comment again, @mtatarau90.
If you don't get it, read it again, and again, until you get it -- what my last sentence and those up-votes mean.

That's all I want to say on this.

@garora1212

following

@basetta

basetta commented Sep 7, 2021

thanks @suntong and @mrceperka :D

@10111282

10111282 commented Mar 4, 2022

https://www.npmjs.com/package/puppeteer-extra-plugin-stealth might help... 🤞

Worked for me to fix the issue

@batrachophagous1

this is happening to me as well

@yurist38

Same here. Also experiencing this issue only in non-headless mode (even when using puppeteer extra + stealth plugin).

@abdellah711

Setting the user agent with page.setUserAgent fixed the issue for me.
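
A sketch of what that could look like. Headless Chrome's default user agent contains the token `HeadlessChrome`, so one option is to take the browser's own string and swap that token out before navigating. `withoutHeadlessToken` and `gotoWithUserAgent` are hypothetical helper names, and the comment above does not say which user agent string was actually used:

```javascript
// Replace the "HeadlessChrome" product token with plain "Chrome".
function withoutHeadlessToken(userAgent) {
  return userAgent.replace('HeadlessChrome', 'Chrome');
}

// Set the user agent on the page before the first navigation.
async function gotoWithUserAgent(browser, url) {
  const page = await browser.newPage();
  const userAgent = withoutHeadlessToken(await browser.userAgent());
  await page.setUserAgent(userAgent);
  await page.goto(url, { waitUntil: 'networkidle0' });
  return page;
}
```

Here `browser` is the object returned by `puppeteer.launch()`. A fixed user agent string copied from a regular Chrome installation could be passed to `page.setUserAgent` instead.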

@kishorek01

I tried this with the new version of Puppeteer. The block only happens once Puppeteer applies its controller to the browser pages. Sites work fine, even in headless mode, if you use the default blank page that Puppeteer creates without the controller. I tested this against an existing browser by connecting through its WebSocket endpoint: as soon as I get the pages from the browser, Puppeteer applies its default values and the controller to all of them. So we need to find out how Cloudflare and other bot-detection services detect the Puppeteer controller.
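
For anyone wanting to reproduce this, a rough sketch of attaching to an already running browser with `puppeteer.connect` and printing two values that detection scripts are commonly said to look at: `navigator.webdriver` and the user agent. Which signals Cloudflare actually uses is not known from this thread, so treat both as assumptions. `inspectPages` is a hypothetical helper, and the WebSocket endpoint has to come from your own browser instance:

```javascript
async function inspectPages(puppeteer, browserWSEndpoint) {
  const browser = await puppeteer.connect({ browserWSEndpoint });
  try {
    const report = [];
    for (const page of await browser.pages()) {
      // Runs inside the page: collect values visible to the site's scripts.
      const signals = await page.evaluate(() => ({
        webdriver: navigator.webdriver,
        userAgent: navigator.userAgent,
      }));
      report.push({ url: page.url(), ...signals });
    }
    return report;
  } finally {
    // Detach without closing the browser we attached to.
    await browser.disconnect();
  }
}
```

Comparing the report for a page that loads normally with one that gets the challenge may help narrow down what differs between the two.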
