Issue with captcha/bot detection #296

Closed
4symmetry19 opened this issue Jan 21, 2023 · 7 comments


4symmetry19 commented Jan 21, 2023

Hi,

first of all, thanks for this great project!
I'm running this on a Mac in the local shell using Python 3.11.
I configured everything for IS24, incl. 2captcha and the Telegram bot.

When I run flathunter.py, though, I get output the first time; when it tries again after 10 minutes, it is apparently detected as a bot.
Note: I turned off "headless" because that wasn't working at all; with it off, I at least get the first batch of results.

This is the output I get on the second run (verbose mode):
[2023/01/21 13:22:59|abstract_crawler.py |INFO ]: Timeout waiting for iframe element - no captcha verification necessary?
[2023/01/21 13:22:59|crawl_immobilienscout.py|WARNING ]: Unable to find IS24 variable in window
[2023/01/21 13:22:59|crawl_immobilienscout.py|ERROR ]: IS24 bot detection has identified our script as a bot - we've been blocked
[2023/01/21 13:22:59|crawl_immobilienscout.py|DEBUG ]: Got search URL https://www.immobilienscout24.de/Suche/de/[confidential but seems normal]
[2023/01/21 13:23:09|abstract_crawler.py |INFO ]: Timeout waiting for iframe element - no captcha verification necessary?
[2023/01/21 13:23:09|crawl_immobilienscout.py|WARNING ]: Unable to find IS24 variable in window
[2023/01/21 13:23:09|crawl_immobilienscout.py|ERROR ]: IS24 bot detection has identified our script as a bot - we've been blocked
[2023/01/21 13:23:09|crawl_immobilienscout.py|DEBUG ]: Got search URL https://www.immobilienscout24.de/Suche/radius/wohnung-mieten?[confidential but seems normal]
[2023/01/21 13:23:20|abstract_crawler.py |INFO ]: Timeout waiting for iframe element - no captcha verification necessary?
[2023/01/21 13:23:20|crawl_immobilienscout.py|WARNING ]: Unable to find IS24 variable in window
[2023/01/21 13:23:20|crawl_immobilienscout.py|ERROR ]: IS24 bot detection has identified our script as a bot - we've been blocked
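
For anyone triaging a similar paste, a log like the one above can be split on its `[timestamp|module|LEVEL]:` prefix to see quickly whether the "bot detection" error is present. This is only a minimal sketch that assumes the prefix format shown in this report; `parse_log` and `was_blocked` are hypothetical helper names and are not part of flathunter.

```python
import re
from collections import Counter

# Matches a prefix such as "[2023/01/21 13:22:59|abstract_crawler.py |INFO ]: "
# (format assumed from the log pasted in this issue).
ENTRY = re.compile(
    r"\[(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\|"
    r"(?P<module>[^|]+)\|(?P<level>[A-Z]+)\s*\]: "
)


def parse_log(text):
    """Split raw log text into (timestamp, module, level, message) tuples."""
    matches = list(ENTRY.finditer(text))
    entries = []
    for i, m in enumerate(matches):
        # A message runs until the next prefix, or to the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        message = text[m.end():end].strip()
        entries.append((m["ts"], m["module"].strip(), m["level"], message))
    return entries


def was_blocked(entries):
    """True if any ERROR entry mentions bot detection."""
    return any(
        level == "ERROR" and "bot detection" in message
        for _, _, level, message in entries
    )


# Three entries copied from the report, run together as in the original paste.
sample = (
    "[2023/01/21 13:22:59|abstract_crawler.py |INFO ]: Timeout waiting for "
    "iframe element - no captcha verification necessary? "
    "[2023/01/21 13:22:59|crawl_immobilienscout.py|WARNING ]: Unable to find "
    "IS24 variable in window "
    "[2023/01/21 13:22:59|crawl_immobilienscout.py|ERROR ]: IS24 bot detection "
    "has identified our script as a bot - we've been blocked"
)

entries = parse_log(sample)
levels = Counter(level for _, _, level, _ in entries)
blocked = was_blocked(entries)
```

Because the split keys on the timestamp pattern, the bracketed placeholder inside the search URLs is not mistaken for a new entry.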

Another thing that stands out to me is that, according to 2captcha.com, I've only used 1 captcha so far. For a very long time, the use count was even at 0 despite me getting that first batch of results. The API key is correct, though.

Any help would be appreciated!

Cheers,
asymmetry


codders commented Jan 21, 2023

Hey there,

You'll probably need to provide a few more arguments to the Chrome driver. From the looks of your output, you might be hitting the bot detection. Try:

captcha:
  2captcha:
    api_key: 0...00
  driver_arguments:
    - "--no-sandbox"
    - "--headless"
    - "--disable-gpu"
    - "--remote-debugging-port=9222"
    - "--disable-dev-shm-usage"
    - "--window-size=1024,768"
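
Chrome command-line switches are conventionally written with a leading `--`, so a quick sanity check on a `driver_arguments` list can catch an entry that would otherwise not be read as a switch. This is a minimal sketch that assumes the list has already been loaded from the config file; `normalize_driver_arguments` is a hypothetical helper and not part of flathunter.

```python
def normalize_driver_arguments(arguments):
    """Return the arguments with a leading "--" added where it is missing."""
    normalized = []
    for arg in arguments:
        arg = arg.strip()
        if not arg:
            continue  # skip empty entries
        if not arg.startswith("-"):
            arg = "--" + arg
        normalized.append(arg)
    return normalized


# Hypothetical input in which one entry lacks its prefix.
args = normalize_driver_arguments(
    ["--no-sandbox", "--disable-gpu", "window-size=1024,768"]
)
```

Each resulting string could then be handed to Selenium's `ChromeOptions.add_argument`; how flathunter itself passes these values to the driver is not shown in this thread.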


4ndrew commented Jan 23, 2023

Got the same issue with IS24. I tried adding the driver arguments, but with no luck.
I use flathunter with Docker...

UPD: installed on Mac instead of Linux. With headless, the result is the same. Without headless, it works.


codders commented Jan 24, 2023

I just deployed it to a PC in the cloud without docker, and it all works (with --headless), so I think it's maybe something about IP ranges or some other property that is triggering the bot detection.


codders commented Feb 21, 2023

We've upgraded the undetected-chrome support in the latest software. Are you still seeing this issue with the newest version?


infctr commented Feb 21, 2023

I've updated to the latest build, and crawling IS24 still doesn't work for me on a Google Cloud deployment.

4symmetry19 (Author) commented

> We've upgraded the undetected-chrome support in the latest software. Are you still seeing this issue with the newest version?

It started working soon after I posted, so I guess you fixed it! Thanks so much :)


codders commented Mar 9, 2023

Great to hear - thanks for the report!
