Problem with TOR + Splash #268
Comments
How did you install Splash? There are a few caveats if you use Docker for this, e.g. 'localhost' inside a Docker container is not the same as your host's localhost. This is incorrect:
Correct usage would be
but it shouldn't matter, because if you use a "default" proxy profile there is no need to pass the "proxy=default" argument.
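A minimal sketch of passing a proxy profile name as a Splash argument rather than appending it to the target URL. It assumes the scrapyjs `meta['splash']` convention used in the original report, where `args` are forwarded to the Splash HTTP API; `splash_meta` is a hypothetical helper for illustration, not part of Scrapy, scrapyjs, or Splash.

```python
def splash_meta(wait=0.5, proxy_profile=None):
    """Build the `meta` dict for a Scrapy request rendered through Splash."""
    args = {'wait': wait}
    if proxy_profile is not None:
        # Sent to Splash as a rendering argument; the page URL is left untouched.
        args['proxy'] = proxy_profile
    return {'splash': {'endpoint': 'render.html', 'args': args}}


# Inside a spider callback (requires Scrapy with scrapyjs enabled):
#     yield scrapy.Request(url, self.parse_wine,
#                          meta=splash_meta(proxy_profile='default'))
```

With a profile named "default", calling `splash_meta()` without a profile should be enough, since that profile is applied when no `proxy` argument is given.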
I just tried using one of the SOCKS5 proxies from this list; it worked both with proxy profiles and By the way, SOCKS5 proxying is a new feature in the upcoming Splash release; are you using the latest Splash master? In Splash 1.6 it won't work.
Thanks for the answer!
No, it is more complicated. It will be something like http://10.0.2.2. There is a discussion at #234. I'm releasing Splash 1.7 now; SOCKS5 support will be there.
Thanks! Still no luck with 10.0.2.2:9050. I guess the most time-saving option for me would be installing Splash without Docker. There is a section in the docs about installing on 12.04; I guess 14.04 should work the same.
Okay, so installing all dependencies and running Splash without Docker solved the issue.
@andverb cool! I've also seen these messages in the logs, but never found any problem they could cause.
I'm closing this issue because it seems Tor works fine.
Hello! Regarding the default.ini file in /etc/splash/proxy-profiles/: the solution is to run Docker with the argument --net="host". The container then shares the host's network, so localhost inside the container refers to the host.
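Since the error reported in this thread is "Connection refused", one way to narrow it down is to test whether the proxy address is reachable from the place where Splash actually runs. The sketch below uses only the Python standard library; `can_connect` is a hypothetical helper. It is only meaningful when executed in the same network environment as Splash (for example, inside the container).

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, `can_connect("localhost", 9050)` and `can_connect("10.0.2.2", 9050)` check the two addresses discussed above against the port from the proxy profile. A `False` result means the SOCKS port cannot be reached from that environment; a `True` result only shows that something accepts TCP connections there, not that it is a working Tor proxy.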
Hello! I recently found out about Splash and Scrapyjs and started using them in my scrapers. In my current project I have run into a problem that I just can't solve.
So I kindly ask you for help. I usually use Tor for my Scrapy crawlers when I have problems with IP blocking. In my current project, I need to scrape a website that both shows no content without JavaScript and blocks my IP after a very moderate number of requests. So I decided to combine Tor and Splash for this.
My default.ini file in /etc/splash/proxy-profiles/:
[proxy]
host=localhost
port=9050
type=SOCKS5
I run Splash like this:
sudo docker run -p 8050:8050 -v /etc/splash/proxy-profiles/:/etc/splash/proxy-profiles/ scrapinghub/splash
I added the proxy to the request URL as written in the docs:
yield scrapy.Request(url+"proxy=default", self.parse_wine, meta={
'splash':{'endpoint': 'render.html','args': {'wait': 0.5}}})
but I get errors in Splash:
2015-08-06 15:05:53.714796 [-] "172.17.42.1" - - [06/Aug/2015:15:05:52 +0000] "POST /render.html HTTP/1.1" 502 21 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/534.55.3 (KHTML, like Gecko) Version/5.1.3 Safari/534.53.10"
2015-08-06 15:05:57.150016 [render] [37545816] loadFinished: RenderErrorInfo(type='Network', code=1, text=u'Connection refused', url=u"http://www.vinopedia.com/wine/The+Winner's+Tank+Shiraz+2005proxy=default")
My OS is Elementary OS 0.3, which is basically Ubuntu 14.04.
I would be very grateful if someone could point me in the right direction.