bindaddress of meta never work right. #3565
Comments
Same issue as #1967. |
I was able to reproduce the issue, but only from the Scrapy shell, not within a spider, and only when the requests are directed to hosts outside of the local network.
Within the local network, I started a simple Flask app on one server (192.168.1.156) that returns the IP address of the host that makes a request:
from flask import Flask, request

app = Flask(__name__)

@app.route('/')
def hello():
    return request.remote_addr

if __name__ == '__main__':
    app.run(debug=True, host='0.0.0.0')
then, from a different machine with two interfaces (192.168.1.4 and 192.168.1.6) in the same private network:
Now, consider the following spider:
from scrapy import Spider, Request

class AddressSpider(Spider):
    name = 'address'

    def start_requests(self):
        yield Request('http://192.168.1.156:5000', dont_filter=True, meta={'bindaddress': ('192.168.1.4', 0)})
        yield Request('http://192.168.1.156:5000', dont_filter=True, meta={'bindaddress': ('192.168.1.6', 0)})

    def parse(self, response):
        self.logger.info('%s - %s', response.meta['bindaddress'], response.text)
and the resulting logs:
Requesting external servers
The same spider, now requesting public servers, running from a host with two interfaces that are connected to the internet through different public IPs:
from scrapy import Spider, Request

class AddressSpider(Spider):
    name = 'address'

    def start_requests(self):
        yield Request('http://ip.jsontest.com/', dont_filter=True, meta={'bindaddress': ('192.168.43.152', 0)})
        yield Request('http://ip.jsontest.com/', dont_filter=True, meta={'bindaddress': ('192.168.1.4', 0)})

    def parse(self, response):
        self.logger.info('%s - %s', response.meta['bindaddress'], response.text)
produces:
However, there are issues when fetching the same requests from the shell:
Conclusion
I suspect some connection is reused in the context of the Scrapy shell, which makes all requests go through the same interface. |
I tried both in the shell and in a spider, and the results are the same, so I'm heading toward using a local proxy server. My environment: the machine running these is a Windows machine with an Ethernet interface that has two IPs (192.168.0.2 and 192.168.0.3).
The target server says the two requests come from the same source IP (the 192.168.0.2 one).
|
Could you provide the complete spider code with logs, along with the |
Hey, I think I found the reason. Scrapy uses a networking library named Twisted, which keeps a connection pool stored as key-value pairs whose key depends only on the target server. So if you request the same target website twice within a short time, it may reuse the earlier connection. That would explain the result. But I can't reproduce the issue because of my network :( |
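The pooling behaviour described in this comment can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyPool`, `local_addresses`, and the `include_bind_address` flag are hypothetical names made up for illustration, and none of this is Twisted's or Scrapy's actual code. It only shows why a pool keyed on the target alone hands back a connection that was bound to the first local address.

```python
from collections import defaultdict


class ToyPool:
    """Toy stand-in for a keep-alive connection pool (NOT Twisted's code)."""

    def __init__(self, include_bind_address=False):
        # Hypothetical switch: should the local address be part of the key?
        self.include_bind_address = include_bind_address
        self._idle = defaultdict(list)
        self.opened = 0

    def _key(self, host, port, bind_address):
        if self.include_bind_address:
            return (host, port, bind_address)
        return (host, port)  # key depends only on the target server

    def get(self, host, port, bind_address):
        idle = self._idle[self._key(host, port, bind_address)]
        if idle:
            return idle.pop()  # reuse, whatever local address it was bound to
        self.opened += 1
        return {"target": (host, port), "local": bind_address}

    def release(self, conn, bind_address):
        host, port = conn["target"]
        self._idle[self._key(host, port, bind_address)].append(conn)


def local_addresses(pool):
    """Make two requests to one host, each asking for a different local IP."""
    seen = []
    for bind in ("192.168.1.4", "192.168.1.6"):
        conn = pool.get("ip.jsontest.com", 80, bind)
        seen.append(conn["local"])
        pool.release(conn, bind)
    return seen


print(local_addresses(ToyPool()))                           # same local IP twice
print(local_addresses(ToyPool(include_bind_address=True)))  # one IP per request
```

In the toy model, adding the bind address to the key is enough to stop the reuse; whether the same change is appropriate in the real libraries is a separate question.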
Hey, I just found something new while running it for detailed logs.
This doesn't work, and outputs:
but this does work:
output:
the only difference is in but who don't override |
In my opinion, the delay may just give the earlier connection time to break. So can you add a breakpoint in your files? It's in site-packages: twisted.web.client.HTTPConnectionPool.getConnection. You can add it at the first line of the function, and then debug it to see whether it makes two new connections or not. |
how to add a breakpoint in an imported library? |
Sorry for replying so late; it was midnight in my country and I had to sleep. About debugging: are you using an IDE for Python? I'm using PyCharm, and the library can be found in the project sidebar. If you are not using it, you can google it or search your IDE's website. The Twisted networking library is a pure Python project rather than a C extension, so debugging the library is possible. |
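For anyone not using an IDE, one alternative to a breakpoint is to wrap the method and log every call. The sketch below is a generic helper, not part of Scrapy or Twisted: `trace_method` and `FakePool` are hypothetical names, and `FakePool` is only a stand-in so the example runs without Twisted installed.

```python
import functools


def trace_method(cls, name, log=print):
    """Replace cls.<name> with a wrapper that logs each call, then delegates."""
    original = getattr(cls, name)

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        log("%s.%s args=%r kwargs=%r" % (cls.__name__, name, args, kwargs))
        return original(self, *args, **kwargs)

    setattr(cls, name, wrapper)
    return original  # returned so the patch can be undone with setattr


# Stand-in class (hypothetical) so the sketch runs on its own.
class FakePool:
    def getConnection(self, key, endpoint):
        return "connection for %r" % (key,)


calls = []
trace_method(FakePool, "getConnection", log=calls.append)
result = FakePool().getConnection(("http", "ip.jsontest.com", 80), endpoint=None)
print(calls)
```

With Twisted installed, the same helper could presumably be pointed at `twisted.web.client.HTTPConnectionPool` and `"getConnection"` before the crawl starts, assuming the installed version exposes the method under that name. Python's built-in `breakpoint()` placed inside the wrapper would drop into `pdb` instead of just logging.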
@NewUserHa, as @alwaysused mentioned above, your example with empty To fix this problem you can patch the connection key composed in
But it accesses a private member and violates encapsulation. If this functionality is really needed by the community, it is worth thinking it through more carefully and maybe updating |
I have multiple local IPs, each with a different outgoing port to the internet.
simply:
returned the same result.
PS: I already verified that the two bind addresses work using curl.
Does anyone know how to fix this and is willing to fix it?
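A bind address can also be checked from Python itself, outside Scrapy, using only the standard library. The sketch below is an assumption-laden demo rather than anything from Scrapy: `EchoClientAddress` and `seen_address` are hypothetical names, and it runs against a throwaway loopback server so it works on any machine. On a multi-homed host, the target and `bind_ip` would be replaced with a real server and one of the local IPs.

```python
import http.client
import http.server
import threading


class EchoClientAddress(http.server.BaseHTTPRequestHandler):
    """Reply with the IP address the request came from."""

    def do_GET(self):
        body = self.client_address[0].encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def seen_address(host, port, bind_ip):
    """GET / while bound to bind_ip; return the address the server reports."""
    conn = http.client.HTTPConnection(
        host, port, timeout=5, source_address=(bind_ip, 0)
    )
    try:
        conn.request("GET", "/")
        return conn.getresponse().read().decode()
    finally:
        conn.close()


# Throwaway local server on a free port, used only for the demo.
server = http.server.HTTPServer(("127.0.0.1", 0), EchoClientAddress)
threading.Thread(target=server.serve_forever, daemon=True).start()
result = seen_address("127.0.0.1", server.server_port, "127.0.0.1")
server.shutdown()
server.server_close()
print(result)
```

Each call opens a fresh connection, so no pooling is involved; this isolates whether the operating system honours the bind address at all.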