Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Baidu: signature authentication support #298

Closed
wants to merge 15 commits into from
Closed

Conversation

jinzhc
Copy link

@jinzhc jinzhc commented May 16, 2018

changes for baidu geocoder & pytest
For signature authentication, see
http://lbsyun.baidu.com/index.php?title=lbscloud/api/appendix

@jinzhc
Copy link
Author

jinzhc commented May 16, 2018

I have done the pytest with my api_keys with/without security key

Copy link
Member

@KostyaEsmukov KostyaEsmukov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I've left a couple of notes. Overall it looks good!

One thing I'm wondering about is the status code changes: was the source code incorrect before? I mean, it used to be strings, but now they're integers. Was it a bug, or a change in the API?

@@ -78,6 +84,8 @@ def __init__(
)
self.api_key = api_key
self.api = '%s://api.map.baidu.com/geocoder/v2/' % self.scheme
self.api_path = self.api.split('baidu.com')[1]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe that would be better?

self.api_path = '/geocoder/v2/'
self.api = '%s://api.map.baidu.com%s' % (self.scheme, self.api_path)

raise GeocoderQueryError(
'IP/SN/SCODE/REFERER Illegal:'
)
elif status == '2xx':
elif status in range(200, 300):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Python2 range is a list, which seems inefficient.
What about this?

elif 200 <= status <= 300:

@@ -27,6 +27,7 @@ class BaiduTestCase(GeocoderTestBase):
def setUpClass(cls):
cls.geocoder = Baidu(
api_key=env['BAIDU_KEY'],
security_key=env['BAIDU_SEC_KEY'],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to create a separate test class instead, which does use the SK, so we could test both setups. I think that having two simple calls (1 geocode and 1 reverse) for this new class would be enough to be sure that SK works.

Also, if I understood correctly, that should be a different BAIDU_KEY (for which the SK auth is enabled), shouldn't it? If so, it should have a different ENV name.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you're right. Are BAIDU_KEY_REQUIRES_SK and BAIDU_SEC_KEY ok?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, that's fine.

"""

# Since quoted_params is escaped by urlencode, don't apply quote twice
raw = self.api_path + '?' + quoted_params + self.security_key
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the docs at http://lbsyun.baidu.com/index.php?title=lbscloud/api/appendix , the quote(params, safe="/:=&?#+!$,;'@()*[]") code is used for getting a raw string, while there we have urlencode(params). Wouldn't that cause different hashes for peculiar requests? I think the relevant test cases should be added. For example, a one with a comma: locator.geocode("beijing, china").

Copy link
Author

@jinzhc jinzhc May 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The quoted_params in self._url_with_signature was given by urlencode. urlencode has already called quote_plus on parameters internally. Except parameters, quote never escape the characters in site/path/key parts, because they are not special characters.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the list of safe characters match the one used in urlencode internally? For example, I see that comma is in that list, however urlencode uses percent encoding on commas, IIRC.

A test wouldn't harm in either way.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this patch code doesn't match the given sample code in docs. But it works.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It sure works, but for some cases it might not work. Could you please add the test with a comma, and fix the code, if that fails?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better to stick to docs. I'm going to change urlencode(params) to urlencode(params, safe="!*'();:@&=+$,/?#[]", quote_via=quote). Will you accept this change?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add the test first :)

But yeah, that would be better than crafting the query string manually. Even better if all of this is not required and a simple urlencode works.

@KostyaEsmukov KostyaEsmukov added this to the 1.15 milestone May 17, 2018
@jinzhc
Copy link
Author

jinzhc commented May 21, 2018

Comments for status in http response page.

See 状态码定义 (status definition in English) section (1st section) in http://lbsyun.baidu.com/index.php?title=lbscloud/api/appendix. status has more values than original implementation. The failed signature will have 211 status value, whose inferred type is int. This is why I change status checking from str to int version.

@KostyaEsmukov
Copy link
Member

I'm not sure if I understood your reply regarding status correctly. My question was specifically about the types, not about checking for more specific error codes. If status in the API response is an int, then it would be never equal to a string (In Python 1 == '1' yields False). And if it's a string in the response, then, again, '1' == 1 would still yield False.

I've just checked, API returns ints. It means that the current implementation (I mean, without this patch) doesn't seem to respect the status at all. It's not covered by tests too. That's pretty sad :(

@jinzhc
Copy link
Author

jinzhc commented May 23, 2018

May the new Invalid AK test case help. It's hard to cover all failure status tests due to the poor Baidu docs.

@jinzhc
Copy link
Author

jinzhc commented May 23, 2018

For unittest.assertRaisesRegex's compatibility, willpytest.assertRaises be a better option?

else:
assert_raise_msg = self.assertRaisesRegexp

with assert_raise_msg(GeocoderAuthenticationFailure, 'Invalid AK'):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another way to do the same without tinkering around compatibility:

    with self.assertRaises(GeocoderAuthenticationFailure) as cm:
        ...
    self.assertEqual(str(cm.exception), 'Invalid AK')

To me it looks better than assertRaisesRegex. What do you think?

Also I'm wondering: does the self.reverse_run command ever run? I feel that self.geocode_run causes an exception being thrown, and the self.reverse_run call is never tested. To fix that, each call should be wrapped in their own assertRaises/assertRaisesRegex context.

Copy link
Author

@jinzhc jinzhc May 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, you have a better solution.
self.reverse_run cannot be reached, and I will remove this line afterwards.

"""

# Since quoted_params is escaped by urlencode, don't apply quote twice
raw = self.api_path + '?' + quoted_params + self.security_key
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It sure works, but for some cases it might not work. Could you please add the test with a comma, and fix the code, if that fails?

Copy link
Member

@KostyaEsmukov KostyaEsmukov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment this looks good to me, except the urlencode vs quote_plus problem, discussed in the thread #298 (comment)

A test with a comma in the query is absolutely required, because this is an edge case, dividing the two possible implementations. So in order to avoid any bugs for now and in the future, such test must be added.

@jinzhc
Copy link
Author

jinzhc commented May 25, 2018

Do you think the comma test is still a must?

@KostyaEsmukov
Copy link
Member

Sorry for the delay.

Yes, a comma test is definitely a must. Do you need any help or a guidance with it? Please don't hesitate to ask. Basically all what's needed here is a simple forward geocoding request, which contains a comma in its query. If signature is wrong, an exception would be thrown and the test would fail, proving that there's a bug in the signature implementation.

The cf9a30e...1009fe2 change is somewhat disturbing to me. I would better to not construct the query string manually by escaping individual parts and joining them with & and =, but make urlencode to do this.

It's important to distinguish between input for the signature algorithm and resulting URL. The former requires the aforementioned quote with a list of safe characters, while the latter should use urlencode, which is safer. Unless the web server uses urlencoded data (before decoding) for verifying the signature on the server side instead of decoded and encoded with the quote... In this case a simple urlencode, just like it was before this change, should work correctly (and this is why I wanted to have a test before an implementation).

@jinzhc
Copy link
Author

jinzhc commented Jun 8, 2018

The cf9a30e...1009fe2 change is somewhat disturbing to me. I would better to not construct the query string manually by escaping individual parts and joining them with & and =, but make urlencode to do this.

I append 1009fe2 because cf9a30e failed in py2.7 & py 3.4. No safe parameter in py2.7; New quote_via parameter since 3.5.

@KostyaEsmukov
Copy link
Member

Would you mind if I finished this by myself? :)

@jinzhc
Copy link
Author

jinzhc commented Jun 10, 2018

Never mind. It's my pleasure.

@jinzhc
Copy link
Author

jinzhc commented Jun 11, 2018

@KostyaEsmukov Is new test appropriate?

@KostyaEsmukov
Copy link
Member

I'm wondering if at one point Baidu changes their search algorithm such as the !*'();:@&=+$,/?[] % part of the query would make nothing returned.

Actually the reason why I wanted to take this over is that there clearly is a confusion between composing a raw body for the signature and urlencoding the params for url. As I said in my previous comment #298 (comment) , urlencode should be used for the latter, instead of manual string construction. But for the raw body, apparently, the string must be constructed manually (unless, as said before, their backend uses urlencoded data for signature verification, where this safe list of characters is actually useless).

It's just easier for me to finish this by myself than trying to explain this in text. Sorry for that. You've done a great job, a little is left. Is that okay if I finish this by myself? 😉Of course you'd still be credited as a contributor and would get a green box. 😀

@jinzhc
Copy link
Author

jinzhc commented Jun 19, 2018

It's ok you take it over and go for final shot. :-)

@KostyaEsmukov KostyaEsmukov mentioned this pull request Jun 26, 2018
@KostyaEsmukov
Copy link
Member

Opened #306 which supersedes this PR.

So it looks like the safe list of characters is unnecessary at all! The SN is valid as long as the urlencoded string matches the one in the raw payload for the hash function.

Thanks again! I will merge #306 soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants