Scraping is blocked #195
Comments
Hi @eliavm!
I have "read" about this here.
Another resource:
https://www.blackhatworld.com/seo/python-scraping-distil-protected-sites.988967/
I don't know whether this one is useful, sorry; I don't speak Chinese:
https://www.jianshu.com/p/be856bc15afb
On Fri, Nov 9, 2018 at 17:49, Alexander Serebrov (<notifications@github.com>) wrote:
Hi @eliavm <https://github.com/eliavm>!
Unfortunately, the website you are hitting uses Distil anti-bot measures,
and there is no "silver bullet" I can offer.
Some solutions and thoughts on this topic are available, e.g. here
<https://www.reddit.com/r/webdev/comments/5q1ypx/what_is_your_approach_on_scraping_distil_networks/>
and here
<https://www.blackhatworld.com/seo/python-scraping-distil-protected-sites.988967/>,
but generally you need to use proxies and build some browser
fingerprint handling into your spider (pretend that you are an actual human
being). All of these things are pretty complicated and not really related
to Splash itself, as you can get the same responses using other
headless browsers (e.g. Selenium).
@eliavm Could you close this issue, as it is not really about scrapy-splash?
When trying to scrape a page, I get an empty page containing a single div:
<div id="distilIdentificationBlock"> </div>
I found something on Stack Overflow that might be relevant: https://stackoverflow.com/questions/45060011/crawling-web-using-selenium-chrome-driver-but-still-blocked
Any idea how to bypass this?
Using Scrapy 1.5.1 + scrapy-splash 0.7.2
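For anyone hitting the same thing, a minimal sketch of how a spider could at least detect this block page rather than silently parsing an empty result. It only checks for the `distilIdentificationBlock` div quoted above; the names `DISTIL_MARKER` and `looks_blocked` and the sample HTML strings are made up for illustration, and this does not bypass anything.

```python
import re

# Marker taken from the blocked response quoted in this issue.
# The pattern tolerates either quote style and optional spaces around "=".
DISTIL_MARKER = re.compile(
    r'id\s*=\s*["\']distilIdentificationBlock["\']',
    re.IGNORECASE,
)


def looks_blocked(html: str) -> bool:
    """Return True if the HTML body contains the Distil identification div."""
    return bool(DISTIL_MARKER.search(html))


# Hypothetical sample bodies, for illustration only.
blocked_body = '<html><body><div id="distilIdentificationBlock"> </div></body></html>'
normal_body = '<html><body><h1>Products</h1></body></html>'

print(looks_blocked(blocked_body))
print(looks_blocked(normal_body))
```

In a Scrapy callback this could be called as `looks_blocked(response.text)` to log the block or skip parsing, assuming the response is a text response.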