Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTTPDigestAuth fails on non-latin credentials #6102

Open
ondratu opened this issue Apr 4, 2022 · 3 comments
Open

HTTPDigestAuth fails on non-latin credentials #6102

ondratu opened this issue Apr 4, 2022 · 3 comments
Labels

Comments

@ondratu
Copy link

ondratu commented Apr 4, 2022

There was issue reported, which is closed with bad results.

if isinstance(username, str):
username = username.encode('latin1')
if isinstance(password, str):
password = password.encode('latin1')

Don't pass unicode strings in the arguments, but use UTF8 bytes instead.

self.session.get(main_url, auth=requests.auth.HTTPDigestAuth("Сергей_Ласточкин".encode('UTF-8'), '1234'))

Originally posted by @D-stefaang in #5089 (comment)

But this is wrong! When i try to set user 'Ondřej' with this advice, requests send bad string:

HTTPDigestAuth('Ondřej'.encode('utf-8'), 'heslíčko')

creates header starts with wrong username!

Digest username="b'Ond\xc5\x99ej'"
@nateprewitt
Copy link
Member

Hi @ondratu,

Could you please clarify what you believe is wrong in this case? ř is the byte-sequence \xc5\x99 in UTF-8, so we'd expect the bytes object to be Ond\xc5\x99ej. We can quickly verify this by checking:

'Ondřej'.encode('utf-8') == b'Ond\xc5\x99ej'
>>> True

It's not clear what other value you'd be expecting.

@nateprewitt
Copy link
Member

Hmm, on closer inspection this does appear to be a bug. We're using the bytes username as an argument to format our string for the header. This causes the full literal "b'Ond\xc5\x99ej'" to be used which I agree doesn't look correct. This header should be encoded as bytes during creation but we currently defer that to be urllib3's problem.

When not encoding the auth/password, we get this:

Traceback (most recent call last):
  File "/Users/nateprewitt/Work/OpenSource/requests/test.py", line 3, in <module>
    r = requests.get('https://httpbin.org/digest-auth/auth/Ondřej/heslíčko', auth=h)
  File "/Users/nateprewitt/Work/OpenSource/requests/requests/api.py", line 73, in get
    return request("get", url, params=params, **kwargs)
  [...]
  File "/Users/nateprewitt/.pyenv/versions/3.10.3/lib/python3.10/http/client.py", line 1323, in _send_request
    self.putheader(hdr, value)
  File "/Users/nateprewitt/.pyenv/versions/3.10.3/lib/python3.10/site-packages/urllib3/connection.py", line 224, in putheader
    _HTTPConnection.putheader(self, header, *values)
  File "/Users/nateprewitt/.pyenv/versions/3.10.3/lib/python3.10/http/client.py", line 1255, in putheader
    values[i] = one_value.encode('latin-1')
UnicodeEncodeError: 'latin-1' codec can't encode character '\u0159' in position 20: ordinal not in range(256)

Fixing this is unfortunately somewhat complicated for a couple reasons:

1.) Users expect the output of HTTPDigestAuth to be a str in Python 3, changing that is likely breaking.
2.) If we were to encode the string, the libraries current convention would be latin-1 not utf-8. That wouldn't solve this issue.

I don't believe we can ever format this correctly in Python 3 with the current behavior though. I'll need to look more tomorrow, but we may consider a behavior change if self.username/self.password is passed in as bytes.

kalingth added a commit to kalingth/requests that referenced this issue Apr 23, 2023
Resolution for the issue psf#6102

Now, If a username and password are passed already encoded, they will not be affected by being expected as a string. In this case, the function will encode the string-formatted attributes into bytes using the latin-1 charset by default due to convention.
kalingth added a commit to kalingth/requests that referenced this issue Apr 23, 2023
Resolution for the issue psf#6102

Now, If a username and password are passed already encoded, they will not be affected by being expected as a string. In this case, the function will encode the string-formatted attributes into bytes using the latin-1 charset by default due to convention.
@mubaldino
Copy link

Hi all,
we encountered this week as we had a password policy change, too. Unicode is now optional requirement for passwords. But some may use it (I did...) and it broke twine, pip, and other things that would use requests.auth module for authentication.

I saw the exception above, and knew immediately a quick test to see if charset encoding would fix it:

    # requests/auth.py,   _basic_auth_str()
....
    if isinstance(password, str):
        # password = password.encode("latin1")
        password = password.encode("utf-8")            # Hey! IT works,... now just make it an optional kwarg

I realize exposing encoding in the function signatures will have a ripple effect, but I don't think the latin1 assumption is all that great either.

Requested change: add encoding kwarg to _basic_auth_str() and its consumers

def _basic_auth_str(username, passwd, encoding="latin1"):
   ...
   password = password.encode(encoding)
   ..

One line fix and I'm good to go.... hopefully you can include in an upcoming release. Password policy changes are now more frequently including Unicode.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants