Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added possibility to add a custom http request user agent header #115

Closed
wants to merge 23 commits into from

Conversation

sebastianneubauer
Copy link
Contributor

My application that uses geopy was blocked by Nominatim as geopy uses the standard user agent header for requests provided by urllib (Python-urllib/2.7). Nominatim requires to add an identifying user agent header per application in order to be able to track and block requests if needed.

I now added in this PR a default header in the base geocoder class:
User-Agent: geopy/#version
where #version is the current geopy version (e.g "geopy/1.9.1")

This is now added to every request and every geocoder inheriting from the 'Geocoder' base class.
This should be much better, as it is much easier for services to be able to track which request come from where. Now every service is able to distinguish geopy requests from non geopy requests.

In addition I added the possibility to pass in a custom user agent string to the Nominatim geocoder class:

        geocoder = Nominatim(
            user_agent='my_user_agent/1.0'
        )
        self.assertEqual(geocoder.headers['User-Agent'], 'my_user_agent/1.0')

This should help to prevent quite some blockings`and bans from Nominatim, as now they are able to distinguish bad requests from good request based on the user agent headers and only block those, which have e.g a standard user agent like "geopy/1.9.1" or even "Python-urllib/2.7" and spare those requests identifying with a specific user agent like "GoodApp/0.1" as long as all "GoodApp/0.1" requests respect the Nominatim terms of usage.

I also added tests for it and thereby fixed the tests for ignfrance which were not able to pass tests without api key credentials even they should be skipped in that case.

@ijl
Copy link
Member

ijl commented Mar 29, 2015

I've cherry-picked 8b38ebb for the moment, thanks.

@@ -80,3 +80,10 @@ def decode_page(page):
else: # requests?
encoding = page.headers.get("charset") or "utf-8"
return str(page.content, encoding=encoding)


def get_version():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_version() should be replaced with use of a constant, to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean really an arbitrary constant like "1.0" or the version number of geopy? In my opinion for traceability and debugging it makes sense that it is the actual geopy package version? But in this case a constant would have to be held in sync manually...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just that it can be "geopy/%s" % geopy.version.GEOPY_VERSION or the like, determined at interpreter start. No need for a function call.

@ijl
Copy link
Member

ijl commented Mar 29, 2015

@sebastianneubauer it certainly makes sense, and thanks for working on it. Some comments. One significant thing is that every geocoder would need to accept and pass down a user_agent parameter.

@sebastianneubauer
Copy link
Contributor Author

As it it implemented now, if a geocoder has not implemented the acceptance and passing of a custom http header, the default value "geopy/VERSION" which is still much better than the urllib default value. As said, at the moment I implemented it only for the Nominatim geocoder, as it was requested by them. Of course it would make sense also for all or many other geocoders to implement this feature, but I fear, I certainly would need help for this...Until then, I believe it improves also with this current implementation and maybe one could write an issue to implement this in other geocoders later on?

@ijl
Copy link
Member

ijl commented Mar 29, 2015

Ok, I'll add it to other geocoders, then.

@sebastianneubauer
Copy link
Contributor Author

Ok, I think I managed to incorporate the suggestions, the default value of the user agent is now a constant and the circular import is removed...

@sebastianneubauer
Copy link
Contributor Author

Now added customizable user agents as optional parameter to all geocoders!
From my side this is finished for now...

@ijl
Copy link
Member

ijl commented Apr 10, 2015

@sebastianneubauer ok thanks, I'll merge later.

@@ -156,8 +156,7 @@ def geocode(
params = {'q': self.format_string % query}

params.update({
# `viewbox` apparently replaces `view_box`
'viewbox': self.view_box,
'view_box': self.view_box,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bad merge, I'll fix.

@ijl
Copy link
Member

ijl commented Jun 9, 2015

@sebastianneubauer sorry, this is stalled because it breaks Yahoo Placefinder, which uses requests. Would you be willing to fix that?

@sebastianneubauer
Copy link
Contributor Author

Oh, ok, but why are all tests passing? Would be much easier for me to have a breaking test for this bug....I will have a look....

@sebastianneubauer
Copy link
Contributor Author

@ijl : sorry, could you elaborate a bit further, what does "break" mean in detail. How can I reproduce this? Would it be an option to remove the user agent option from yahoo placefinder?

@pnorman
Copy link

pnorman commented Jul 9, 2015

My application that uses geopy was blocked by Nominatim as geopy uses the standard user agent header for requests provided by urllib (Python-urllib/2.7). Nominatim requires to add an identifying user agent header per application in order to be able to track and block requests if needed.

Just to note that the Nominatim usage policy requires a valid HTTP Referer or User-Agent identifying the application and stock User-Agents as set by http libraries are explicitly not okay.

@sebastianneubauer
Copy link
Contributor Author

@ijl : what is the status here? this library violates the Nominatim usage policy without this PR! So for us it is really urgent to have this included!

@sebastianneubauer
Copy link
Contributor Author

@ijl : Now I see, apparently I removed accidentally some code in a bad merge, I'll fix that...

@sebastianneubauer
Copy link
Contributor Author

ok, so I updated it and fixed the bad merges...still any open issues?

@ijl
Copy link
Member

ijl commented Aug 31, 2015

@sebastianneubauer, Placefinder needed a fix in Base, but I added that. I'll release in 1.11.0 shortly. Thanks.

@ijl ijl closed this Aug 31, 2015
@ijl
Copy link
Member

ijl commented Aug 31, 2015

@sebastianneubauer on PyPI.

@sebastianneubauer
Copy link
Contributor Author

wohooo, thanks!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants