Cloudflare blocking scraping #46

Open
davidelionetti opened this issue Mar 29, 2021 · 40 comments

@davidelionetti

Hello, I have just started using this library and everything seems to be set up correctly. I ran python blinkistscraper email password with my credentials, and Cloudflare unfortunately detects (I assume) automated activity and blocks me from navigating to Blinkist.com in the browser instance the script opens.

Any ideas?

@Riviss

Riviss commented Mar 29, 2021

I think this issue may be the same as the "Captcha taking longer than expected" issue. Take a look to see if your problem is the same; if so, hopefully one of the solutions posted there will work for you.

@klochden

klochden commented Apr 25, 2021

Yes, Cloudflare definitely detects unusual activity, and you land in an endless cycle of captchas. No matter what I try, it doesn't let me through.

@vongyver

I am experiencing the same thing as klochden. I have tried adjusting and even disabling uBlock, with no success. If there is any way I can assist with debugging or testing, let me know.

@rocketinventor
Contributor

@klochden @vongyver

Are you running chrome in headless mode when you face this issue?

@vongyver

I get the Chrome app popup and complete the captcha, but it keeps failing. I have even tried disabling uBlock in various ways. I can also sign in to Blinkist in regular Chrome, but the script still fails. Happy to test any specifics. Thank you!

@johndoe-dev00
Contributor

FYI: uBlock can be disabled using the --no-ublock switch.

I also got the Cloudflare captcha loop. This seems to be new.
Currently, this workaround seems to be working for me:
In scraper.py, change `from seleniumwire import webdriver` to `from selenium import webdriver`.
This fixes the Cloudflare issue, but it will not let you download the audio files, because that part requires seleniumwire;
everything else should work, though.
Let me know if this allows you to log in.
I will look into a fully functioning fix.
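Rather than editing the import back and forth, the choice could be made at runtime. The sketch below is a hypothetical helper (`webdriver_module_name` and `load_webdriver` are not part of scraper.py): it imports selenium-wire's webdriver only when audio is requested and plain selenium otherwise, so the rest of the code can keep using the `webdriver` name.

```python
import importlib
from types import ModuleType


def webdriver_module_name(need_audio: bool) -> str:
    """Pick which webdriver module to import.

    selenium-wire records network traffic (needed for the audio scrape),
    while plain selenium avoided the Cloudflare captcha loop in this thread.
    """
    return "seleniumwire.webdriver" if need_audio else "selenium.webdriver"


def load_webdriver(need_audio: bool) -> ModuleType:
    # Import lazily so selenium-wire is only required when audio is wanted.
    return importlib.import_module(webdriver_module_name(need_audio))
```

Usage, assuming the parsed --audio flag is available as `args.audio`: `webdriver = load_webdriver(need_audio=args.audio)`. With audio enabled you would presumably still hit the Cloudflare loop described above; this only avoids hand-editing the file.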

@klochden

klochden commented Apr 27, 2021 via email

@klochden

klochden commented Apr 27, 2021 via email

@klochden

Yes, it worked now and I was able to log in without an issue. Also, the SSL certificate was active, which I think is important for Cloudflare! But I only got a JSON text file, no audio. Hopefully someone can restore the main functions so that everything downloads completely.
Thanks to all!

@klochden

By the way, with the chrome addon "Audio Downloader Prime" I could manually download the audio files without an issue. Maybe there is a possibility to implement an automated solution?

@rocketinventor
Contributor

rocketinventor commented Apr 28, 2021

I've looked into this issue a little bit...

The project is using an old version of seleniumwire (2.1.2 vs the newest version: 4.2.4). This could be part of why Cloudflare is having so many issues with it. If the package is upgraded then we can fix a lot of issues and take advantage of new features.

For example:

4.1.1 (2021-02-26)
Integration with undetected-chromedriver.

Also, we might be able to remove the seleniumwire/mitmproxy requirement completely by using the Chrome DevTools Protocol directly.

I will try to look into those two things.
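A quick way to see which selenium-wire is installed, and whether it is new enough for the undetected-chromedriver integration listed in the 4.1.1 changelog entry above, is a version check like the one below. This is a minimal sketch: the naive parser only handles plain dotted versions (a suffix such as rc1 is dropped), and a real project would more likely compare versions with the packaging library.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version


def parse_version(text: str) -> tuple[int, ...]:
    """Turn "4.2.4" into (4, 2, 4), ignoring suffixes such as "rc1"."""
    parts: list[int] = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def supports_undetected_chromedriver(installed: str, minimum: str = "4.1.1") -> bool:
    # 4.1.1 is the selenium-wire release whose changelog lists
    # "Integration with undetected-chromedriver".
    return parse_version(installed) >= parse_version(minimum)


def installed_seleniumwire_version() -> str | None:
    """Return the installed selenium-wire version, or None if it is missing."""
    try:
        return version("selenium-wire")
    except PackageNotFoundError:
        return None
```

With the project's pinned 2.1.2 the check returns False; with 4.2.4 it returns True.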

@jonaschn
Contributor

Manually using the privacy-pass extension makes scraping audio, e.g., of the daily book, possible again because you get 30 passes when solving 1 captcha.

@vongyver

vongyver commented Jun 4, 2021

I am able to add privacy-pass to my regular Chrome and get the 30 passes. When I run the scraper, the extension does not appear in the dev-tools instance, and I am still asked to solve captchas in a loop. How do we add privacy-pass to the dev-tools instance? Thanks!

@jonaschn
Contributor

jonaschn commented Jun 4, 2021

I did not automate this process; I increased the time allowed for solving the captcha and then manually installed privacy-pass in the Chrome instance opened by the scraper. For now, this needs to be done every time the scraper is run.
But this can definitely be automated, similar to the uBlock extension.
Maybe @leoncvlt or someone else has time to automate this process.

@vongyver

vongyver commented Jun 5, 2021 via email

@ilearnio

ilearnio commented Jun 21, 2021

Same issue. Solving a captcha brings up another captcha and so on, so I can't get this script to work.

@hxh103

hxh103 commented Jul 13, 2021

@vongyver you can change scraper.py (currently line 180): WebDriverWait(driver, 60) --> WebDriverWait(driver, 360), which raises the timeout from 60 seconds to 360 seconds.

But I tried the privacy-pass method from @jonaschn and it does not work for me.

Switching to selenium works, minus not being able to download the audio, which is a huge bummer. Hopefully this gets fixed soon.
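To avoid re-editing scraper.py, the timeout could be exposed as an option. The sketch below adds a hypothetical --login-timeout flag (it does not exist in blinkistscraper); only the new option is shown, not the existing email/password arguments.

```python
import argparse


def positive_int(text: str) -> int:
    """argparse type that rejects zero and negative timeouts."""
    value = int(text)
    if value <= 0:
        raise argparse.ArgumentTypeError("timeout must be a positive number of seconds")
    return value


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="blinkistscraper")
    # Hypothetical option: how long WebDriverWait waits for the captcha/login
    # step. The default of 60 mirrors the currently hard-coded value.
    parser.add_argument(
        "--login-timeout",
        type=positive_int,
        default=60,
        metavar="SECONDS",
        help="seconds to wait for the captcha/login step (default: 60)",
    )
    return parser
```

The parsed value would then replace the hard-coded number: `WebDriverWait(driver, args.login_timeout)`.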

Thanks for the feedback. I poked around, however I have no idea how to add privacy-pass in the chrome instance or increase the time. I am not really a developer, more a hack. I know my limits. All good. I hope that leoncvlt is able to fix it soon.

@vongyver

vongyver commented Jul 13, 2021 via email

@hxh103

hxh103 commented Jul 13, 2021

It was more to give you enough time to manually add privacy-pass extension to that instance of chrome before the timeout that was first suggested. Anyways, that did not work for me and I assume it would not for you either.

No luck on that change. I didn't think it was the 60 sec limit, I am resolving two sets of images within 20 seconds. I get the classic bicycle or boat and after completing it, the session flips to the blinkist login screen and then back to the "not a robot" checkbox and image sets again, repeatedly. It does not look like it's trying to enter the passed credentials. Just to clarify, I have confirmed that I do have the latest version, having cloned fresh a couple of times. Just a note, I had a password with an "&" in it and had trouble passing that to the blinkistscraper, so I changed the password. I had no luck with adding privacy-pass either. Thanks for the recommendation. Happy to test what's offered.

@hxh103

hxh103 commented Jul 13, 2021

So I found a solution that worked for me. It requires a bit of manual work, but it downloads audio now. At least for me, the problem seems to be the user-agent and the version of selenium-wire identified by @rocketinventor.

So this worked for me:

  1. Made a new environment in Anaconda (just to be sure there are not other incompatibility issues)
  2. Install all the required packages manually using pip (chromedriver-autoinstaller colorama EbookLib requests selenium selenium-wire). Don't use the requirement.txt, as it will pin the versions for you. I tried only changing the user-agent, but it didn't work for me without updating the packages in a fresh environment; I didn't look into it much further. I also tried keeping the user-agent unchanged and only using the new package versions, and that did not work for me either.
  3. Clone the repo.
  4. Change line 180 in scraper.py to allow time to install the extension manually: WebDriverWait(driver, 60) --> WebDriverWait(driver, 360)
  5. Run the scraper as you normally would from the command line.
  6. Install a user-agent switcher: https://chrome.google.com/webstore/detail/user-agent-switcher-for-c/djflhoibgkdhkhhcedjiklpkjnoahfmg/related
  7. Click the user-agent extension and change your user agent to something else (like Safari).
  8. Refresh the Blinkist page; it shouldn't send you to Cloudflare anymore, at least until Cloudflare changes something lol

So if it's only changing user-agent, this should easily be implemented in the chrome options. Or can save annoyance of manual extension installation by adding this extension like how it is implemented with u-block.

The scraping script works with the updated packages so far, but I haven't done any extensive testing. I didn't need the privacy-pass extension, but if the above doesn't work for you, try installing it manually to check.

@kotobuki09

kotobuki09 commented Jul 13, 2021

Your method is working perfectly for me as well! Thank you for keeping this working.
For some of the audio, I got this error, but it seems like the majority is working fine.
ERROR Request timed out or other unexpected error: HTTP Error 401: Unauthorized ERROR Error processing audio url, aborting audio scrape...

@hxh103

hxh103 commented Jul 19, 2021

Due to this error happening to me all the time #58, I got annoyed with having to reinstall user-agent every morning. I implemented 2 options to change user-agent. Unfortunately, both require manual clicking, but less work than the above solution.

  1. change user-agent at start: Add the following line in scraper.py (I added in line 88).
chrome_options.add_argument("user-agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.75.14 (KHTML, like Gecko) Version/7.0.3 Safari/7046A194A")

This will change it to a Safari user-agent. If this user-agent gets flagged by Cloudflare, then just change it to another user-agent. I think anything other than a chrome user-agent should work. This option always required me to do the captcha at least once so a little bit annoying. I tried option 2 below to see if I could get around solving captcha.

  2. load extension at start: download the user-agent extension as a .crx file (google it if you don't know how) and place it in the bin folder (like uBlock); for me, it is at bin\useragent\User-Agent-1.1.0.crx. It can be anywhere, as long as the path in the code below points to it. Then add the line below to scraper.py (I added it after line 88, since I kept the first option in).
chrome_options.add_extension(os.path.join(os.getcwd(), "bin", "useragent", "User-Agent-1.1.0.crx"))

I did not have to solve the captcha with this route, but I did have to click on the extension to change the user-agent and then reload the page. I don't know how to set user-agent from this extension automatically, but maybe this would save from clicking. If someone knows how to do this or has a better solution that doesn't require any manual clicking or captcha, that would be awesome.
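Both options above can be made less brittle. The sketch below uses hypothetical helper names, and BLINKIST_UA is an invented environment variable: it builds the user-agent=... argument so a flagged user agent can be swapped without editing code, and it resolves the extension path relative to a given directory instead of os.getcwd(), failing early if the .crx file is missing.

```python
import os
from pathlib import Path
from typing import Mapping

# The Safari user agent quoted in option 1 above.
SAFARI_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.75.14 "
    "(KHTML, like Gecko) Version/7.0.3 Safari/7046A194A"
)


def user_agent_argument(env: Mapping[str, str] = os.environ) -> str:
    """Build the "user-agent=..." Chrome argument.

    BLINKIST_UA is a hypothetical environment variable that overrides the
    default, so a flagged user agent can be replaced without editing code.
    """
    ua = env.get("BLINKIST_UA", SAFARI_UA).strip()
    if not ua:
        raise ValueError("user agent must not be empty")
    return f"user-agent={ua}"


def extension_path(base_dir: Path, *parts: str) -> Path:
    """Resolve an extension file relative to base_dir, not the current directory.

    Raises FileNotFoundError up front instead of letting Chrome fail later.
    """
    path = Path(base_dir, *parts)
    if not path.is_file():
        raise FileNotFoundError(f"extension not found: {path}")
    return path
```

Inside the driver setup it might be used as `chrome_options.add_argument(user_agent_argument())` and `chrome_options.add_extension(str(extension_path(repo_root, "bin", "useragent", "User-Agent-1.1.0.crx")))`, where `repo_root` is assumed to be `Path(__file__).resolve().parent.parent` because scraper.py lives in blinkistscraper/. This still leaves the manual click in the extension described above.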

@vongyver

vongyver commented Jul 19, 2021 via email

@kotobuki09

Working like a charm in my case, thanks hxh103. Sometimes I still get network blocks or errors while backing up all the files, but that's already good enough!

@kotobuki09

hxh103, thanks for the recommendations, glad to see it's working for you. Not working for me, tried both and switching agents about 6 times with reloads. I'm still getting the hCaptcha cycle. I expect my issue may be a little different. I am not sure what Cloudflare is using for browser fingerprinting, but I may be blocking that too. FYI

Have you tried downloading from a different network? Since you've already set up a similar environment to mine and hxh103's, the only remaining difference is probably your network firewall and so on.

@vongyver

vongyver commented Jul 20, 2021 via email

@mandliya

None of the above options worked for me! I keep getting thrown to the captcha page and didn't manage to log in even once. (I tried the extension as well as changing the user agent at load.)
I will keep an eye on this thread in case someone runs into a similar issue and is able to solve it.

Thank you for the amazing tool.

@mandliya

If anyone is still stuck with this, use undetected-chromedriver. Replace your driver with it, fix a few errors caused by unsupported options, and voila, it works! 😊

@vongyver

vongyver commented Jul 23, 2021 via email

@mandliya

Yes, imported in scraper.py to replace the existing selenium chrome driver. undetected-chromedriver has examples in their repo.

@leoncvlt
Owner

leoncvlt commented Sep 7, 2021

Can anyone confirm that undetected-chromedriver does indeed fix the issue? If so, it might be time for a PR 😄

@fugohan

fugohan commented Nov 28, 2021

I have tried to use the undetected-chromedriver but I can't fix this error message. Can somebody help me?

python3 blinkistscraper ********@***m ******** --language de --audio --concat-audio --keep-noncat
[14:22:24] INFO Starting scrape run...
[14:22:25] INFO Initialising chromedriver at /home/user/.local/lib/python3.8/site-packages/chromedriver_autoinstaller/97/chromedriver...
[14:22:26] ERROR Message: invalid argument: cannot parse capability: goog:chromeOptions
from invalid argument: unrecognized chrome option: excludeSwitches
  (Driver info: chromedriver=97.0.4692.20 (6559bb085abcaedffe35d268b3546c43f755151c-refs/branch-heads/4692@{#186}),platform=Linux 5.11.0-40-generic x86_64)
Traceback (most recent call last):
  File "/home/user/Downloads/blinkist-scraper/blinkistscraper/__main__.py", line 412, in <module>
    main()
  File "/home/user/Downloads/blinkist-scraper/blinkistscraper/__main__.py", line 319, in main
    driver = scraper.initialize_driver(
  File "/home/user/Downloads/blinkist-scraper/blinkistscraper/scraper.py", line 102, in initialize_driver
    driver = uc.Chrome(version_main=97,
  File "/home/user/.local/lib/python3.8/site-packages/undetected_chromedriver/v2.py", line 302, in __init__
    super(Chrome, self).__init__(
  File "/home/user/.local/lib/python3.8/site-packages/selenium/webdriver/chrome/webdriver.py", line 70, in __init__
    super(WebDriver, self).__init__(DesiredCapabilities.CHROME['browserName'], "goog",
  File "/home/user/.local/lib/python3.8/site-packages/selenium/webdriver/chromium/webdriver.py", line 93, in __init__
    RemoteWebDriver.__init__(
  File "/home/user/.local/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 268, in __init__
    self.start_session(capabilities, browser_profile)
  File "/home/user/.local/lib/python3.8/site-packages/undetected_chromedriver/v2.py", line 582, in start_session
    super(Chrome, self).start_session(capabilities, browser_profile)
  File "/home/user/.local/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 359, in start_session
    response = self.execute(Command.NEW_SESSION, parameters)
  File "/home/user/.local/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 424, in execute
    self.error_handler.check_response(response)
  File "/home/user/.local/lib/python3.8/site-packages/selenium/webdriver/remote/errorhandler.py", line 247, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.InvalidArgumentException: Message: invalid argument: cannot parse capability: goog:chromeOptions
from invalid argument: unrecognized chrome option: excludeSwitches
  (Driver info: chromedriver=97.0.4692.20 (6559bb085abcaedffe35d268b3546c43f755151c-refs/branch-heads/4692@{#186}),platform=Linux 5.11.0-40-generic x86_64)

[14:22:26] CRITICAL Uncaught Exception. Exiting...
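The error says ChromeDriver rejected the excludeSwitches experimental option when launched through undetected-chromedriver, which matches the later report that the "undetected" tweaks in scraper.py had to be commented out. Below is a minimal sketch of dropping that option before building the uc options, assuming the scraper collects its experimental options in a dict first (the helper and the dict are hypothetical; add other keys to the set if ChromeDriver rejects them too).

```python
from typing import Any, Dict

# ChromeDriver reported "unrecognized chrome option: excludeSwitches" when this
# key reached it through undetected-chromedriver. Add further keys here if
# ChromeDriver rejects them as well.
UC_REJECTED_OPTIONS = frozenset({"excludeSwitches"})


def options_for_uc(experimental: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the experimental options without the rejected keys.

    The input dict is left untouched, so the same settings can still be used
    with the regular selenium driver.
    """
    return {
        key: value
        for key, value in experimental.items()
        if key not in UC_REJECTED_OPTIONS
    }
```

It would be applied as `for key, value in options_for_uc(experimental).items(): options.add_experimental_option(key, value)`, where `options` is the `uc.ChromeOptions()` instance.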

@bl4ckOut

Yes I can confirm it. Just like @mandliya mentioned, the undetected-chromedriver fixes the infinite captcha-loop from cloudflare.

@fugohan

fugohan commented Jan 5, 2022

@fugohan to solve it, see my comment below. Also, FYI: your email and password are exposed.

I will check out your fix now, and thank you for mentioning the email ^^ Can you also edit it out?

@fugohan

fugohan commented Jan 5, 2022

It did fix the issue for me, but I had to comment out the undetected code inside scraper.py (lines 69 to 88) for it to work.

I got it to work, but I can't scrape any audio; I get this error message:

[21:27:44] ERROR 'Chrome' object has no attribute 'wait_for_request'
Traceback (most recent call last):
  File "blinkistscraper/__main__.py", line 412, in <module>
    main()
  File "blinkistscraper/__main__.py", line 368, in main
    dump_exists = scrape_book(
  File "blinkistscraper/__main__.py", line 257, in scrape_book
    audio_files = scraper.scrape_book_audio(
  File "blinkistscraper/scraper.py", line 526, in scrape_book_audio
    captured_request = driver.wait_for_request("audio", timeout=30)
AttributeError: 'Chrome' object has no attribute 'wait_for_request'
[21:27:44] CRITICAL Uncaught Exception. Exiting...
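wait_for_request (and driver.requests, behind the "AttributeError: requests" reported further down) are selenium-wire features, so a plain selenium or undetected-chromedriver driver does not have them. One possible route, in the spirit of rocketinventor's Chrome DevTools Protocol suggestion, is Chrome's performance log: enable it with `options.set_capability("goog:loggingPrefs", {"performance": "ALL"})` and read `driver.get_log("performance")` after the audio starts playing. The sketch below only parses those log entries and is untested against Blinkist; the "audio" substring mirrors the `wait_for_request("audio", ...)` call in scraper.py, and it recovers URLs only, not any request headers the download might need.

```python
import json
from typing import Any, Iterable, List, Mapping


def find_audio_urls(entries: Iterable[Mapping[str, Any]], needle: str = "audio") -> List[str]:
    """Extract response URLs containing `needle` from Chrome performance-log entries.

    Each entry is assumed to look like what driver.get_log("performance")
    returns: {"level": ..., "message": "<JSON string>", "timestamp": ...},
    where the JSON wraps a DevTools event under the "message" key.
    """
    urls: List[str] = []
    for entry in entries:
        try:
            event = json.loads(entry["message"])["message"]
        except (KeyError, TypeError, ValueError):
            continue  # not a DevTools event wrapper; skip it
        if not isinstance(event, dict) or event.get("method") != "Network.responseReceived":
            continue
        params = event.get("params") or {}
        response = params.get("response") or {}
        url = response.get("url", "")
        # Keep first-seen order and drop duplicates.
        if needle in url and url not in urls:
            urls.append(url)
    return urls
```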

@fugohan

fugohan commented Jan 5, 2022

@orenaksakal are you able to download audio with this fix?

@fugohan

fugohan commented Jan 6, 2022

yes, I'm able to download epub, html and audio (concatenate works too) what I do at that point is reverting back to selenium web-driver and using chrome plugin mentioned above to switch user agent

You mean after you saved the cookies, am I right?
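Saving the cookies from the undetected-chromedriver session and loading them into a second driver could look like the sketch below (the helper names are hypothetical). It round-trips the list returned by `driver.get_cookies()` through JSON, keeps only the keys Selenium's `add_cookie` accepts, and replays them; the second driver must already be on a blinkist.com page before cookies can be added. Cloudflare's cf_clearance cookie may be tied to the browser's user agent and IP, so restoring it into a driver with a different user agent may still land you back in the captcha loop.

```python
import json
from typing import Any, Callable, Dict, List

# Keys accepted by Selenium's WebDriver.add_cookie(); anything else returned
# by get_cookies() is dropped before replaying.
COOKIE_KEYS = ("name", "value", "path", "domain", "secure", "httpOnly", "expiry", "sameSite")

Cookie = Dict[str, Any]


def cookies_to_json(cookies: List[Cookie]) -> str:
    """Serialise the list returned by driver.get_cookies()."""
    return json.dumps(cookies, indent=2)


def cookies_from_json(text: str) -> List[Cookie]:
    """Parse saved cookies, keeping only the keys add_cookie() understands."""
    return [{key: c[key] for key in COOKIE_KEYS if key in c} for c in json.loads(text)]


def restore_cookies(add_cookie: Callable[[Cookie], None], cookies: List[Cookie]) -> int:
    """Replay cookies through add_cookie (e.g. driver.add_cookie); return the count."""
    for cookie in cookies:
        add_cookie(cookie)
    return len(cookies)
```

For example, save with `Path("cookies.json").write_text(cookies_to_json(driver.get_cookies()))` after logging in, then in the new session call `driver.get("https://www.blinkist.com")`, `restore_cookies(driver.add_cookie, cookies_from_json(Path("cookies.json").read_text()))`, and `driver.refresh()`.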

@forhobbie

Hi @orenaksakal,
I have already scraped all the PDFs, but I am having the same issue when downloading audio. Apparently, undetected-chromedriver cannot scrape audio, since that is done via a seleniumwire function.

[17:10:37] INFO Getting all books for category Entrepreneurship...
[17:10:44] INFO Found 216 books
[17:10:44] INFO Scraping book at https://www.blinkist.com/en/books/15-secrets-successful-people-know-about-time-management-en-kevin-kruse
C:\Users\luisa\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\selenium\webdriver\remote\webelement.py:446: UserWarning: find_element_by_* commands are deprecated. Please use find_element() instead
  warnings.warn("find_element_by_* commands are deprecated. Please use find_element() instead")
[17:10:48] ERROR requests
Traceback (most recent call last):
  File "C:\Users\luisa\.a python\blinkist\.a test\blinkistscraper\__main__.py", line 412, in <module>
    main()
  File "C:\Users\luisa\.a python\blinkist\.a test\blinkistscraper\__main__.py", line 368, in main
    dump_exists = scrape_book(
  File "C:\Users\luisa\.a python\blinkist\.a test\blinkistscraper\__main__.py", line 257, in scrape_book
    audio_files = scraper.scrape_book_audio(
  File "C:\Users\luisa\.a python\blinkist\.a test\blinkistscraper\scraper.py", line 513, in scrape_book_audio
    del driver.requests
AttributeError: requests
[17:10:48] CRITICAL Uncaught Exception. Exiting...

How do you revert back to selenium? Could you please explain in a little more detail what you do after login or what you need to change in the scraper.py to make it revert automatically after successful login?

After passing the captcha with undetected-chromedriver, I try to run the program again with the default driver, but the new window goes straight back to the captcha loop. I also tried modifying scraper.py to substitute uc for the selenium webdriver, but nothing.
I am not very experienced in Python and I've run out of ideas :(
I'd appreciate it if you could share your solution!
Thank you for your help, guys!

@raspgalax

I also have the issue where it just says
"www.blinkist.com needs to review the security of your connection before proceeding."
and I'm blocked by Cloudflare. None of the solutions above work :(

@fanishjain

Hello, I'm stuck on the captcha page. I was wondering whether you got it solved; if so, maybe I can use your fix.

Labels
None yet
Projects
None yet
Development

No branches or pull requests