New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
urllib incorrectly quotes username and password in https basic auth #57851
Comments
Reproduction: >>> import urllib
>>> urllib.urlopen("https://example.com/")
Enter username for Test Site at example.com: user
Enter password for user in Test Site at example.com: top secret
Enter username for Test Site at example.com:
# If the correct password contains spaces, nothing will be accepted. The problem is that the password in basic auth is URI quoted and then base64 encoded. The password should not be quoted. RFC 2617: I traced the problem with Pydev to urllib retry_https_basic_auth where I can see that After that, the path is like this: user_passwd is not unquoted, host is. I found closely related bpo-2244 - but did not confirm where this bug has been introduced. I added some people from 2244 to this issue. I hope that is ok. I think a test should be added that covers usernames and passwords with spaces to avoid further regressions. The reproduction code given works with Python 2.4.3 urllib. This probably also affects python3, did not try. |
Joonas, this issue seems easy to solve. Do you want to try to post a patch?. Extra credits for patching testsuite too :). If you work in 2.7, I promise to up-port the patch to 3.x. |
|
Note Lib/test/test_urllib.py:261 ; there is a fake HTTP wrapper in the test. So the request is not really sent. I modified FakeHTTPConnection to store the sent HTTP request. I also copied the test you mentioned from python3 to 2.7. The second test I add in the patch fails. The test should pass with python2.5 from OS X (did not run the test but checked headers against netcat). Please take a look at the tests I added. I'm not sure if geturl() should return the quoted version or not. But certainly the quoted version must not be used in the base64. If you think geturl() should return the quoted version, I'm fine with that - in principle characters like \n in the password could be bad in an URL unless quoted. Maybe the tests could ALSO be added to some other places, but I think this full path makes sense to check like this. |
Updated patch for 2.7 hg tip attached. Please review, test and if ok, port to 3.x. I guess the URL needs to be quoted so commented out the assertion for the URL being equal. I added unquote in the base64 encoding of the password, which makes the test pass. Seems to work for me and no urllib tests were broken. Did not run others. http://test.webdav.org/ has some basic auth test accounts configured if you want to try it out. You can use wireshark to grab the base64 from the unencrypted http. Fancy opener works for me with this now, too. |
Some review comments. Instead of doing the inline unquote like this -
+ auth = base64.b64encode(unquote(user_passwd)).strip() It is better to do the explicitly above the b64 encoding step.
Also, you have done this only for https_open, the same would need be replicated for http_open and also for proxy_passwd. Also on tests, Modifying sendall with def sendall(self, str):
- pass
+ FakeHTTPConnection.request += str seems a bit odd to me, you are using a class level object and adding a str. I think, there should be better way to do. (I shall provide an example). Also str term can replaced, even if it was coming from old code. |
Ok. So it needs to be done on the line after import base64.
So four cases where this may need to be fixed and my test only covers one of them:
Copypasted code :( Can the https and the proxy auth be tested with the same fake http connection, when the request is stored?
Agreed, not clean and needs to be fixed. Works here, but could cause the test code to become unreadable later on. Wrote it at 5 am, not getting sleep. Please provide a cleaner alternative. :) One question: how come the fake http is given HTTP headers in some tests and payload only in others? Is it emulating the TCP stream or the payload stream handle? It can't do both, can it? Are the headers in some tests actually doing anything? I would not mind if someone else finished the patch :) |
There's no need to port your patch over python3k, since urllib behaves differently with http passwords - as you can see in the doc http://docs.python.org/dev/py3k/library/urllib.request.html#examples I would be glad to finish your password on 2.7 as soon as possible, joneskoo. Thanks. |
Patch attached. Note that now unquote is called with host using map(), and b64 encoded strings are no more hardcoded. Please tell me if those changes are acceptable - anyway they don't break any other unit tests. |
Michele, in your patch: + authorization = ("Authorization: Basic %s\r\n" % This is wrong. See the original report by me and RFC 2617. The username and password MUST NOT be url encoded before base64. That is the original problem. The point is that this test should fail with urllib before the change and not fail with the fix. Secondly, I think unquote will fail when given a None. For me, some other unit tests caught this when I had the unquote where the splituser is called. I didn't run your code but are other urllib tests ok for you? I like your change of having the base64 explicitly there and not as a magic string is a good idea. Senthil, could you provide the better alternative for the class field abuse, please? |
Whoops, probably I tested using $ python instead of $ ./python.exe - Well, concerning the class field abuse, HTTPConnection._buffer attribute might help? The point is that I can't find an easy way to get the HTTPConnection instance from an urlopen(). |
New changeset 01ef13e9e225 by Senthil Kumaran in branch '2.7':
|
Joonas and Michele - The fix along with the tests is in for 2.7 line. W.r.t to tests having it in the class level buf seems to be only easy way, for other it seemed to be that FakeSocket and FakeConnection stuff need some major change (and resulted in breaking of some tests). I thought it is better to push it in the current form as we use the buf for temporary storage and verification. |
Senthil, I ported the tests to 3.2. The quoting problem seems to be the same in 3.2 and the new test fails. I don't know how the password managers handle the usernames and passwords in python3 urllib so I did not look at that. Could you please also port the fix to relevant python3 branches? I think you can do it much quicker since you are already familiar with both code bases and the fix. |
Also adding a patch that may be enough to fix the problem in python3.2. Review needed, did not test more than passing the previously failed unit test. |
New changeset 80e3b8de4edd by Senthil Kumaran in branch '3.2': New changeset 4b4029fc8cf2 by Senthil Kumaran in branch 'default': |
Here we go! I thought the problem did not exist in py3k, but good that the tests caught them and we have a fix now. Thanks for the complete patch, Joonas. I hope it was easy to port the patch to 3k. The encoding part may perhaps be the only thing to careful with and use of the new string format feature was good one. Thanks! |
Updating the issue with version 3.2 tag since it was fixed there as well. Still fixed, of course. You are correct that the encodings can be tricky. Luckily I only added coding to tests. But you're right, I would consider very carefully before using similar code outside tests. I just realized what's the impact of this change on python3.2 really. Since urllib.request.urlopen (for some reason) does not allow username and password on the URI, it is not possible to hit this with that, I think. But FancyURLopener does allow using user-pass in url, so this bug was reachable. I just verified that and the fix :) Fix does the trick for FancyURLopener when the username and password are passed in the URL. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: