HTTPDigestAuth fails on non-latin credentials #6102

ondratu · 2022-04-04T06:32:18Z

There was issue reported, which is closed with bad results.

Lines 59 to 63 in 4f6c018

    
           if isinstance(username, str): 
        
               username = username.encode('latin1') 
        
           if isinstance(password, str): 
        
               password = password.encode('latin1')

Don't pass unicode strings in the arguments, but use UTF8 bytes instead.

self.session.get(main_url, auth=requests.auth.HTTPDigestAuth("Сергей_Ласточкин".encode('UTF-8'), '1234'))

Originally posted by @D-stefaang in #5089 (comment)

But this is wrong! When i try to set user 'Ondřej' with this advice, requests send bad string:

HTTPDigestAuth('Ondřej'.encode('utf-8'), 'heslíčko')

creates header starts with wrong username!

Digest username="b'Ond\xc5\x99ej'"

The text was updated successfully, but these errors were encountered:

nateprewitt · 2022-04-04T07:01:50Z

Hi @ondratu,

Could you please clarify what you believe is wrong in this case? ř is the byte-sequence \xc5\x99 in UTF-8, so we'd expect the bytes object to be Ond\xc5\x99ej. We can quickly verify this by checking:

'Ondřej'.encode('utf-8') == b'Ond\xc5\x99ej'
>>> True

It's not clear what other value you'd be expecting.

nateprewitt · 2022-04-04T07:43:08Z

Hmm, on closer inspection this does appear to be a bug. We're using the bytes username as an argument to format our string for the header. This causes the full literal "b'Ond\xc5\x99ej'" to be used which I agree doesn't look correct. This header should be encoded as bytes during creation but we currently defer that to be urllib3's problem.

When not encoding the auth/password, we get this:

Traceback (most recent call last):
  File "/Users/nateprewitt/Work/OpenSource/requests/test.py", line 3, in <module>
    r = requests.get('https://httpbin.org/digest-auth/auth/Ondřej/heslíčko', auth=h)
  File "/Users/nateprewitt/Work/OpenSource/requests/requests/api.py", line 73, in get
    return request("get", url, params=params, **kwargs)
  [...]
  File "/Users/nateprewitt/.pyenv/versions/3.10.3/lib/python3.10/http/client.py", line 1323, in _send_request
    self.putheader(hdr, value)
  File "/Users/nateprewitt/.pyenv/versions/3.10.3/lib/python3.10/site-packages/urllib3/connection.py", line 224, in putheader
    _HTTPConnection.putheader(self, header, *values)
  File "/Users/nateprewitt/.pyenv/versions/3.10.3/lib/python3.10/http/client.py", line 1255, in putheader
    values[i] = one_value.encode('latin-1')
UnicodeEncodeError: 'latin-1' codec can't encode character '\u0159' in position 20: ordinal not in range(256)

Fixing this is unfortunately somewhat complicated for a couple reasons:

1.) Users expect the output of HTTPDigestAuth to be a str in Python 3, changing that is likely breaking.
2.) If we were to encode the string, the libraries current convention would be latin-1 not utf-8. That wouldn't solve this issue.

I don't believe we can ever format this correctly in Python 3 with the current behavior though. I'll need to look more tomorrow, but we may consider a behavior change if self.username/self.password is passed in as bytes.

Resolution for the issue psf#6102 Now, If a username and password are passed already encoded, they will not be affected by being expected as a string. In this case, the function will encode the string-formatted attributes into bytes using the latin-1 charset by default due to convention.

mubaldino · 2024-04-17T16:41:28Z

Hi all,
we encountered this week as we had a password policy change, too. Unicode is now optional requirement for passwords. But some may use it (I did...) and it broke twine, pip, and other things that would use requests.auth module for authentication.

I saw the exception above, and knew immediately a quick test to see if charset encoding would fix it:

    # requests/auth.py,   _basic_auth_str()
....
    if isinstance(password, str):
        # password = password.encode("latin1")
        password = password.encode("utf-8")            # Hey! IT works,... now just make it an optional kwarg

I realize exposing encoding in the function signatures will have a ripple effect, but I don't think the latin1 assumption is all that great either.

Requested change: add encoding kwarg to _basic_auth_str() and its consumers

def _basic_auth_str(username, passwd, encoding="latin1"):
   ...
   password = password.encode(encoding)
   ..

One line fix and I'm good to go.... hopefully you can include in an upcoming release. Password policy changes are now more frequently including Unicode.

nateprewitt added the Needs More Information label Apr 4, 2022

nateprewitt added Bug and removed Needs More Information labels Apr 4, 2022

kalingth mentioned this issue Apr 22, 2023

chore(charset issue): Resolution for issue 6102 #6422

Closed

kalingth mentioned this issue Apr 23, 2023

chore(charset issue): Resolution for issue 6102 #6426

Closed

kalingth mentioned this issue Jul 26, 2024

chore(charset issue): Resolution for issue 6102 #6772

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

HTTPDigestAuth fails on non-latin credentials #6102

HTTPDigestAuth fails on non-latin credentials #6102

ondratu commented Apr 4, 2022 •

edited

Loading

nateprewitt commented Apr 4, 2022

nateprewitt commented Apr 4, 2022

mubaldino commented Apr 17, 2024

HTTPDigestAuth fails on non-latin credentials #6102

HTTPDigestAuth fails on non-latin credentials #6102

Comments

ondratu commented Apr 4, 2022 • edited Loading

nateprewitt commented Apr 4, 2022

nateprewitt commented Apr 4, 2022

mubaldino commented Apr 17, 2024

ondratu commented Apr 4, 2022 •

edited

Loading