Scraping is blocked #195

Closed
eliavm opened this issue Nov 7, 2018 · 3 comments

Comments


eliavm commented Nov 7, 2018

When I try to scrape a page, I get an empty page containing a single div: <div id="distilIdentificationBlock"> </div>

I found something on Stack Overflow that might be relevant: https://stackoverflow.com/questions/45060011/crawling-web-using-selenium-chrome-driver-but-still-blocked

Any idea how to bypass this?

Using Scrapy 1.5.1 + scrapy-splash 0.7.2
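
The blocked response described above can at least be detected, so a spider can log or retry it instead of silently parsing an empty page. The following is a minimal sketch using only the standard library; the helper name `is_distil_block_page` is hypothetical, and the only detail taken from this report is the `distilIdentificationBlock` element id.

```python
from html.parser import HTMLParser

# Element id reported in this issue for the blocked (empty) page.
DISTIL_BLOCK_ID = "distilIdentificationBlock"


class _IdCollector(HTMLParser):
    """Collects every `id` attribute value seen in the document."""

    def __init__(self):
        super().__init__()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value:
                self.ids.add(value)


def is_distil_block_page(html: str) -> bool:
    """Return True if the HTML contains the Distil identification div.

    Hypothetical helper: it only recognises the block page, it does
    not bypass the block.
    """
    parser = _IdCollector()
    parser.feed(html)
    parser.close()
    return DISTIL_BLOCK_ID in parser.ids
```

In a Scrapy callback this could be called as `is_distil_block_page(response.text)` before any item extraction.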


al-serebrov commented Nov 9, 2018

Hi @eliavm!
Unfortunately, the website you are hitting uses Distil anti-bot measures, and there is no "silver bullet" I can offer.
There are some solutions and thoughts available on this topic, e.g. here and here, but generally you need to use proxies and make your spider present a realistic browser fingerprint (that is, pretend to be an actual human being). All of this is fairly complicated and not really related to Splash itself, since you can get the same responses with other headless browsers (e.g. Selenium).
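
As a rough illustration of the "use proxies" part of the advice above, the sketch below rotates through a list of proxies and attaches browser-like headers to each request. It is only a starting point under stated assumptions: the proxy URLs and the User-Agent string are placeholders, `build_request_kwargs` is a hypothetical helper, and nothing here is claimed to be sufficient against Distil. The one Scrapy-specific detail relied on is that the built-in HttpProxyMiddleware reads the proxy URL from `meta["proxy"]`.

```python
from itertools import cycle

# Placeholder values (assumptions): substitute real proxies and a
# User-Agent string that matches the browser being imitated.
PROXIES = [
    "http://proxy1.example.com:8000",
    "http://proxy2.example.com:8000",
]
USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36"
)

_proxy_pool = cycle(PROXIES)


def build_request_kwargs(url: str) -> dict:
    """Build keyword arguments for a request, using the next proxy.

    Hypothetical helper: the returned dict has the shape accepted by
    scrapy.Request(**kwargs); the proxy is chosen round-robin.
    """
    return {
        "url": url,
        "headers": {
            "User-Agent": USER_AGENT,
            "Accept-Language": "en-US,en;q=0.9",
        },
        "meta": {"proxy": next(_proxy_pool)},
    }
```

With plain Scrapy the result could be used as `scrapy.Request(callback=self.parse, **build_request_kwargs(url))`. When rendering through scrapy-splash, the page is fetched by Splash rather than by Scrapy, so the proxy would have to be configured on the Splash side instead.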


JavierRuano commented Nov 11, 2018 via email

Gallaecio (Contributor) commented

@eliavm Could you close this issue, as it is not really about scrapy-splash?

eliavm closed this as completed May 12, 2019