New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added possibility to add a custom http request user agent header #115
Conversation
I've cherry-picked 8b38ebb for the moment, thanks. |
@@ -80,3 +80,10 @@ def decode_page(page): | |||
else: # requests? | |||
encoding = page.headers.get("charset") or "utf-8" | |||
return str(page.content, encoding=encoding) | |||
|
|||
|
|||
def get_version(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
get_version()
should be replaced with use of a constant, to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean really an arbitrary constant like "1.0" or the version number of geopy? In my opinion for traceability and debugging it makes sense that it is the actual geopy package version? But in this case a constant would have to be held in sync manually...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just that it can be "geopy/%s" % geopy.version.GEOPY_VERSION
or the like, determined at interpreter start. No need for a function call.
@sebastianneubauer it certainly makes sense, and thanks for working on it. Some comments. One significant thing is that every geocoder would need to accept and pass down a |
As it it implemented now, if a geocoder has not implemented the acceptance and passing of a custom http header, the default value "geopy/VERSION" which is still much better than the urllib default value. As said, at the moment I implemented it only for the Nominatim geocoder, as it was requested by them. Of course it would make sense also for all or many other geocoders to implement this feature, but I fear, I certainly would need help for this...Until then, I believe it improves also with this current implementation and maybe one could write an issue to implement this in other geocoders later on? |
Ok, I'll add it to other geocoders, then. |
Ok, I think I managed to incorporate the suggestions, the default value of the user agent is now a constant and the circular import is removed... |
Now added customizable user agents as optional parameter to all geocoders! |
@sebastianneubauer ok thanks, I'll merge later. |
@@ -156,8 +156,7 @@ def geocode( | |||
params = {'q': self.format_string % query} | |||
|
|||
params.update({ | |||
# `viewbox` apparently replaces `view_box` | |||
'viewbox': self.view_box, | |||
'view_box': self.view_box, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bad merge, I'll fix.
@sebastianneubauer sorry, this is stalled because it breaks Yahoo Placefinder, which uses requests. Would you be willing to fix that? |
Oh, ok, but why are all tests passing? Would be much easier for me to have a breaking test for this bug....I will have a look.... |
@ijl : sorry, could you elaborate a bit further, what does "break" mean in detail. How can I reproduce this? Would it be an option to remove the user agent option from yahoo placefinder? |
Just to note that the Nominatim usage policy requires a valid HTTP Referer or User-Agent identifying the application and stock User-Agents as set by http libraries are explicitly not okay. |
@ijl : what is the status here? this library violates the Nominatim usage policy without this PR! So for us it is really urgent to have this included! |
@ijl : Now I see, apparently I removed accidentally some code in a bad merge, I'll fix that... |
14a343c
to
71e4819
Compare
ok, so I updated it and fixed the bad merges...still any open issues? |
@sebastianneubauer, Placefinder needed a fix in Base, but I added that. I'll release in 1.11.0 shortly. Thanks. |
@sebastianneubauer on PyPI. |
wohooo, thanks!! |
My application that uses geopy was blocked by Nominatim as geopy uses the standard user agent header for requests provided by urllib (Python-urllib/2.7). Nominatim requires to add an identifying user agent header per application in order to be able to track and block requests if needed.
I now added in this PR a default header in the base geocoder class:
User-Agent: geopy/#version
where #version is the current geopy version (e.g "geopy/1.9.1")
This is now added to every request and every geocoder inheriting from the 'Geocoder' base class.
This should be much better, as it is much easier for services to be able to track which request come from where. Now every service is able to distinguish geopy requests from non geopy requests.
In addition I added the possibility to pass in a custom user agent string to the Nominatim geocoder class:
This should help to prevent quite some blockings`and bans from Nominatim, as now they are able to distinguish bad requests from good request based on the user agent headers and only block those, which have e.g a standard user agent like "geopy/1.9.1" or even "Python-urllib/2.7" and spare those requests identifying with a specific user agent like "GoodApp/0.1" as long as all "GoodApp/0.1" requests respect the Nominatim terms of usage.
I also added tests for it and thereby fixed the tests for ignfrance which were not able to pass tests without api key credentials even they should be skipped in that case.