Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

urllib incorrectly quotes username and password in https basic auth #57851

Closed
joneskoo mannequin opened this issue Dec 20, 2011 · 18 comments
Closed

urllib incorrectly quotes username and password in https basic auth #57851

joneskoo mannequin opened this issue Dec 20, 2011 · 18 comments
Assignees
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@joneskoo
Copy link
Mannequin

joneskoo mannequin commented Dec 20, 2011

BPO 13642
Nosy @jcea, @orsenthil, @carljm, @ezio-melotti, @mmaker
Files
  • tests-and-fakehttp-request-storing.diff: FakeHTTPConnection stores request, added tests
  • tests-and-fakehttp-request-storing-2.diff
  • issue13642.patch
  • issue13642_with_map.patch
  • urllib-userpass-w-spaces.patch: test_userpass_inurl_w_spaces for python3.2
  • urllib-userpass-w-spaces-fix.patch: same as previous but with fix
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/orsenthil'
    closed_at = <Date 2012-01-14.11:15:55.280>
    created_at = <Date 2011-12-20.15:02:45.137>
    labels = ['type-bug', 'library']
    title = 'urllib incorrectly quotes username and password in https basic auth'
    updated_at = <Date 2012-01-14.12:10:41.222>
    user = 'https://bugs.python.org/joneskoo'

    bugs.python.org fields:

    activity = <Date 2012-01-14.12:10:41.222>
    actor = 'joneskoo'
    assignee = 'orsenthil'
    closed = True
    closed_date = <Date 2012-01-14.11:15:55.280>
    closer = 'orsenthil'
    components = ['Library (Lib)']
    creation = <Date 2011-12-20.15:02:45.137>
    creator = 'joneskoo'
    dependencies = []
    files = ['24147', '24148', '24191', '24192', '24234', '24235']
    hgrepos = []
    issue_num = 13642
    keywords = ['patch']
    message_count = 18.0
    messages = ['149915', '150245', '150378', '150710', '150714', '150720', '150734', '150913', '150939', '151005', '151010', '151018', '151019', '151239', '151240', '151245', '151246', '151248']
    nosy_count = 7.0
    nosy_names = ['jcea', 'orsenthil', 'carljm', 'ezio.melotti', 'maker', 'python-dev', 'joneskoo']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue13642'
    versions = ['Python 2.7', 'Python 3.2']

    @joneskoo
    Copy link
    Mannequin Author

    joneskoo mannequin commented Dec 20, 2011

    Reproduction:

    >>> import urllib
    >>> urllib.urlopen("https://example.com/")
    Enter username for Test Site at example.com: user
    Enter password for user in Test Site at example.com: top secret
    Enter username for Test Site at example.com:
    # If the correct password contains spaces, nothing will be accepted.

    The problem is that the password in basic auth is URI quoted and then base64 encoded. The password should not be quoted.

    RFC 2617:
    userid = *<TEXT excluding ":">
    password = *TEXT
    base64-user-pass = <base64 [4] encoding of user-pass,
    except not limited to 76 char/line>

    I traced the problem with Pydev to urllib retry_https_basic_auth where I can see that
    user = "user"
    passwd = "my secret password"

    After that, the path is like this:
    self.retry_https_basic_auth:
    self.open(fullurl="https://user:my%20%secret%20password@example.com/")
    self.open_https(url="://user:my%20%secret%20password@example.com/")
    => in open_https:
    host, selector = splithost(url)
    user_passwd, host = splituser(host)
    host = unquote(host)

    user_passwd is not unquoted, host is.

    I found closely related bpo-2244 - but did not confirm where this bug has been introduced. I added some people from 2244 to this issue. I hope that is ok.

    I think a test should be added that covers usernames and passwords with spaces to avoid further regressions. The reproduction code given works with Python 2.4.3 urllib. This probably also affects python3, did not try.

    @joneskoo joneskoo mannequin added stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Dec 20, 2011
    @jcea
    Copy link
    Member

    jcea commented Dec 25, 2011

    Joonas, this issue seems easy to solve. Do you want to try to post a patch?. Extra credits for patching testsuite too :).

    If you work in 2.7, I promise to up-port the patch to 3.x.

    @mmaker
    Copy link
    Mannequin

    mmaker mannequin commented Dec 30, 2011

    Joonas, this issue seems easy to solve. Do you want to try to post a
    patch?. Extra credits for patching testsuite too :).
    As far as I see, it would be sufficient to add unquote(passed) to _open_generic_http. Regarding unittests instead, there is already a method called test_userpass_inurl which could be extended with some tests on a password containing spaces ( Lib/test/test_urllib.py:263). But what I haven't yet understood is: does it really exists a user:pass in python.org?

    @joneskoo
    Copy link
    Mannequin Author

    joneskoo mannequin commented Jan 6, 2012

    Regarding unittests instead, there is already a method called
    test_userpass_inurl which could be extended with some tests on a
    password containing spaces ( Lib/test/test_urllib.py:263). But what
    I haven't yet understood is: does it really exists a user:pass in
    python.org?

    Note Lib/test/test_urllib.py:261 ; there is a fake HTTP wrapper in the test. So the request is not really sent.

    I modified FakeHTTPConnection to store the sent HTTP request. I also copied the test you mentioned from python3 to 2.7. The second test I add in the patch fails. The test should pass with python2.5 from OS X (did not run the test but checked headers against netcat).

    Please take a look at the tests I added. I'm not sure if geturl() should return the quoted version or not. But certainly the quoted version must not be used in the base64. If you think geturl() should return the quoted version, I'm fine with that - in principle characters like \n in the password could be bad in an URL unless quoted.

    Maybe the tests could ALSO be added to some other places, but I think this full path makes sense to check like this.

    @joneskoo
    Copy link
    Mannequin Author

    joneskoo mannequin commented Jan 6, 2012

    Updated patch for 2.7 hg tip attached. Please review, test and if ok, port to 3.x.

    I guess the URL needs to be quoted so commented out the assertion for the URL being equal. I added unquote in the base64 encoding of the password, which makes the test pass. Seems to work for me and no urllib tests were broken. Did not run others.

    http://test.webdav.org/ has some basic auth test accounts configured if you want to try it out. You can use wireshark to grab the base64 from the unencrypted http. Fancy opener works for me with this now, too.

    @orsenthil
    Copy link
    Member

    Some review comments. Instead of doing the inline unquote like this -

    •        auth = base64.b64encode(user_passwd).strip()
      

    + auth = base64.b64encode(unquote(user_passwd)).strip()

    It is better to do the explicitly above the b64 encoding step.
    Just as host has been unquoted.

                    user_passwd, host = splituser(host)
                    host = unquote(host)
    

    Also, you have done this only for https_open, the same would need be replicated for http_open and also for proxy_passwd. Also on tests, Modifying sendall with

                 def sendall(self, str):
    -                pass
    +                FakeHTTPConnection.request += str

    seems a bit odd to me, you are using a class level object and adding a str. I think, there should be better way to do. (I shall provide an example). Also str term can replaced, even if it was coming from old code.

    @orsenthil orsenthil self-assigned this Jan 6, 2012
    @joneskoo
    Copy link
    Mannequin Author

    joneskoo mannequin commented Jan 6, 2012

    It is better to do the explicitly above the b64 encoding step.
    Just as host has been unquoted.

                    user_passwd, host = splituser(host)
                    host = unquote(host)
    

    Ok. So it needs to be done on the line after import base64.

    Also, you have done this only for https_open, the same would need be
    replicated for http_open and also for proxy_passwd.

    So four cases where this may need to be fixed and my test only covers one of them:

    • http without proxy
    • http with proxy
    • https without proxy
    • https with proxy

    Copypasted code :(

    Can the https and the proxy auth be tested with the same fake http connection, when the request is stored?

             def sendall(self, str):
    
    •            pass
      
    •            FakeHTTPConnection.request += str
      

    seems a bit odd to me

    Agreed, not clean and needs to be fixed. Works here, but could cause the test code to become unreadable later on. Wrote it at 5 am, not getting sleep. Please provide a cleaner alternative. :)

    One question: how come the fake http is given HTTP headers in some tests and payload only in others? Is it emulating the TCP stream or the payload stream handle? It can't do both, can it? Are the headers in some tests actually doing anything?

    I would not mind if someone else finished the patch :)

    @mmaker
    Copy link
    Mannequin

    mmaker mannequin commented Jan 8, 2012

    There's no need to port your patch over python3k, since urllib behaves differently with http passwords - as you can see in the doc http://docs.python.org/dev/py3k/library/urllib.request.html#examples

    I would be glad to finish your password on 2.7 as soon as possible, joneskoo. Thanks.

    @mmaker
    Copy link
    Mannequin

    mmaker mannequin commented Jan 9, 2012

    Patch attached. Note that now unquote is called with host using map(), and b64 encoded strings are no more hardcoded. Please tell me if those changes are acceptable - anyway they don't break any other unit tests.

    @joneskoo
    Copy link
    Mannequin Author

    joneskoo mannequin commented Jan 10, 2012

    Michele, in your patch:

    + authorization = ("Authorization: Basic %s\r\n" %
    + b64encode('a%20b:c%20d'))

    This is wrong. See the original report by me and RFC 2617. The username and password MUST NOT be url encoded before base64. That is the original problem. The point is that this test should fail with urllib before the change and not fail with the fix.

    Secondly, I think unquote will fail when given a None. For me, some other unit tests caught this when I had the unquote where the splituser is called. I didn't run your code but are other urllib tests ok for you?

    I like your change of having the base64 explicitly there and not as a magic string is a good idea.

    Senthil, could you provide the better alternative for the class field abuse, please?

    @mmaker
    Copy link
    Mannequin

    mmaker mannequin commented Jan 10, 2012

    Whoops, probably I tested using $ python instead of $ ./python.exe -
    Attaching two patches, one keeps using map(), but definitely changes unquote() behavior; the other simply asserts user_passwd exists before using unquote().

    Well, concerning the class field abuse, HTTPConnection._buffer attribute might help? The point is that I can't find an easy way to get the HTTPConnection instance from an urlopen().

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Jan 10, 2012

    New changeset 01ef13e9e225 by Senthil Kumaran in branch '2.7':

    @orsenthil
    Copy link
    Member

    Joonas and Michele - The fix along with the tests is in for 2.7 line. W.r.t to tests having it in the class level buf seems to be only easy way, for other it seemed to be that FakeSocket and FakeConnection stuff need some major change (and resulted in breaking of some tests). I thought it is better to push it in the current form as we use the buf for temporary storage and verification.
    I think, those tests can be brought into 3.x line as well.

    @joneskoo
    Copy link
    Mannequin Author

    joneskoo mannequin commented Jan 14, 2012

    Senthil, I ported the tests to 3.2. The quoting problem seems to be the same in 3.2 and the new test fails. I don't know how the password managers handle the usernames and passwords in python3 urllib so I did not look at that.

    Could you please also port the fix to relevant python3 branches? I think you can do it much quicker since you are already familiar with both code bases and the fix.

    @joneskoo
    Copy link
    Mannequin Author

    joneskoo mannequin commented Jan 14, 2012

    Also adding a patch that may be enough to fix the problem in python3.2. Review needed, did not test more than passing the previously failed unit test.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Jan 14, 2012

    New changeset 80e3b8de4edd by Senthil Kumaran in branch '3.2':
    Fix Issue bpo-13642: Unquote before b64encoding user:password during Basic Authentication.
    http://hg.python.org/cpython/rev/80e3b8de4edd

    New changeset 4b4029fc8cf2 by Senthil Kumaran in branch 'default':
    merge from 3.2 - Fix Issue bpo-13642: Unquote before b64encoding user:password during Basic Authentication.
    http://hg.python.org/cpython/rev/4b4029fc8cf2

    @orsenthil
    Copy link
    Member

    Here we go! I thought the problem did not exist in py3k, but good that the tests caught them and we have a fix now.

    Thanks for the complete patch, Joonas. I hope it was easy to port the patch to 3k. The encoding part may perhaps be the only thing to careful with and use of the new string format feature was good one. Thanks!

    @joneskoo
    Copy link
    Mannequin Author

    joneskoo mannequin commented Jan 14, 2012

    Updating the issue with version 3.2 tag since it was fixed there as well. Still fixed, of course.

    You are correct that the encodings can be tricky. Luckily I only added coding to tests. But you're right, I would consider very carefully before using similar code outside tests.

    I just realized what's the impact of this change on python3.2 really. Since urllib.request.urlopen (for some reason) does not allow username and password on the URI, it is not possible to hit this with that, I think. But FancyURLopener does allow using user-pass in url, so this bug was reachable. I just verified that and the fix :)

    Fix does the trick for FancyURLopener when the username and password are passed in the URL.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants