UnicodeEncodeError when auth parameters are outside of latin-1 encoding #1926

Closed
oinopion opened this Issue Feb 19, 2014 · 4 comments

@oinopion

To reproduce:

```
>>> import requests
>>> requests.get('http://example.com', auth=(u'żółty', u'jaźń'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/paczkowski/.virtualenvs/tmp-8a0b7916bbc9fce4/local/lib/python2.7/site-packages/requests/api.py", line 55, in get
    return request('get', url, **kwargs)
  File "/home/paczkowski/.virtualenvs/tmp-8a0b7916bbc9fce4/local/lib/python2.7/site-packages/requests/api.py", line 44, in request
    return session.request(method=method, url=url, **kwargs)
  File "/home/paczkowski/.virtualenvs/tmp-8a0b7916bbc9fce4/local/lib/python2.7/site-packages/requests/sessions.py", line 349, in request
    prep = self.prepare_request(req)
  File "/home/paczkowski/.virtualenvs/tmp-8a0b7916bbc9fce4/local/lib/python2.7/site-packages/requests/sessions.py", line 287, in prepare_request
    hooks=merge_hooks(request.hooks, self.hooks),
  File "/home/paczkowski/.virtualenvs/tmp-8a0b7916bbc9fce4/local/lib/python2.7/site-packages/requests/models.py", line 291, in prepare
    self.prepare_auth(auth, url)
  File "/home/paczkowski/.virtualenvs/tmp-8a0b7916bbc9fce4/local/lib/python2.7/site-packages/requests/models.py", line 470, in prepare_auth
    r = auth(self)
  File "/home/paczkowski/.virtualenvs/tmp-8a0b7916bbc9fce4/local/lib/python2.7/site-packages/requests/auth.py", line 48, in __call__
    r.headers['Authorization'] = _basic_auth_str(self.username, self.password)
  File "/home/paczkowski/.virtualenvs/tmp-8a0b7916bbc9fce4/local/lib/python2.7/site-packages/requests/auth.py", line 31, in _basic_auth_str
    return 'Basic ' + b64encode(('%s:%s' % (username, password)).encode('latin1')).strip().decode('latin1')
UnicodeEncodeError: 'latin-1' codec can't encode character u'\u017c' in position 0: ordinal not in range(256)
```

I am not sure what the correct behaviour is here, but raising UnicodeEncodeError is probably not the best option.

@sigmavirus24
Collaborator

I don't have the time to find the spec, but I think headers are supposed to be encoded as latin-1 strings (and that's how Basic and Digest authentication are specified for the server). If Python cannot coerce your Unicode credentials to latin-1, we should raise an exception. UnicodeEncodeError is a good one in my opinion; I don't like the idea of adding yet another exception.

I agree that this doesn't give the user a great deal of information, but at the same time, this is a very accurate message.

@Lukasa, should we be checking credentials ahead of time? The problem is that we also glean authentication credentials from the Session, so there's no way to check them except during preparation of the request. If we inherit from UnicodeEncodeError, we could raise a new exception with the original message attached. Something like AuthenticationEncodeError might be more informative to the user. Or perhaps a more generic HeaderEncodeError?
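A minimal sketch of the subclassing idea, assuming a hypothetical `AuthenticationEncodeError` (not an actual Requests exception) that inherits from `UnicodeEncodeError`, the error actually raised in the traceback. Existing `except UnicodeEncodeError` handlers would keep working, while the message says which part of the request failed. The hypothetical `check_latin1_credentials` helper would have to run at preparation time, after Session and per-request credentials are merged.

```py
class AuthenticationEncodeError(UnicodeEncodeError):
    """Hypothetical error: Basic auth credentials are not latin-1 encodable."""


def check_latin1_credentials(username, password):
    """Return 'username:password' as latin-1 bytes, or raise AuthenticationEncodeError."""
    userpass = '%s:%s' % (username, password)
    try:
        return userpass.encode('latin1')
    except UnicodeEncodeError as exc:
        # UnicodeEncodeError takes (encoding, object, start, end, reason):
        # keep the original details and prepend a more descriptive reason.
        raise AuthenticationEncodeError(
            exc.encoding,
            exc.object,
            exc.start,
            exc.end,
            'Basic auth credentials must be latin-1 encodable: ' + exc.reason,
        ) from exc
```

Because the new class is a `UnicodeEncodeError`, this stays backwards compatible; the trade-off is one more public exception name, which is the concern raised above.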

@oinopion

Both Apache and nginx allow UTF-8 in Basic auth. cURL supports it, too.

@Lukasa
Collaborator

Ugh, this is why I hate HTTP. RFC 2616 has the following things to say on this topic:

Firstly, the definitions of headers:

```
message-header = field-name ":" [ field-value ]
field-name     = token
field-value    = *( field-content | LWS )
field-content  = <the OCTETs making up the field-value
                 and consisting of either *TEXT or combinations
                 of token, separators, and quoted-string>
```

Non-ASCII UTF-8 bytes necessarily fall outside the range of tokens and separators, so we need to consider the TEXT BNF rule. Once again, RFC 2616 to the rescue:

The TEXT rule is only used for descriptive field contents and values that are not intended to be interpreted by the message parser. Words of *TEXT MAY contain characters from character sets other than ISO-8859-1 [22] only when encoded according to the rules of RFC 2047 [14].

```
TEXT           = <any OCTET except CTLs,
                 but including LWS>
```
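The TEXT rule above can be read as a byte-level check. The sketch below is simplified and hypothetical (`is_text_octets` is not from any library): it treats CTLs as octets 0 to 31 plus DEL (127), per RFC 2616 section 2.2, allows HT because it may appear inside LWS, and ignores CRLF line folding entirely.

```py
# RFC 2616 section 2.2: CTL = octets 0-31 and DEL (127).
CTLS = set(range(32)) | {127}
# HT (9) is a CTL but may appear as part of LWS; SP (32) is not a CTL at all.
HT = 9


def is_text_octets(value: bytes) -> bool:
    """Simplified *TEXT check: no CTL octets except HT (CRLF folding not handled)."""
    return all(b not in CTLS or b == HT for b in value)


# Every byte of a multi-byte UTF-8 sequence is >= 0x80, so none is a CTL.
print(is_text_octets('żółty:jaźń'.encode('utf-8')))
print(is_text_octets(b'bad\x00value'))
```

Passing this check only shows that raw UTF-8 is syntactically acceptable as opaque octets; it says nothing about how the recipient will interpret those octets.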

Note that ISO-8859-1 is another name for latin-1.
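In Python the two names resolve to the same codec. latin-1 maps code points U+0000 to U+00FF one-to-one onto single bytes, which is why 'ż' (U+017C) triggers the "ordinal not in range(256)" error from the traceback. The `latin1_encodable` helper below is a hypothetical illustration, not part of Requests.

```py
import codecs


def latin1_encodable(text: str) -> bool:
    """Return True if every character of text fits in latin-1 (U+0000..U+00FF)."""
    try:
        text.encode('latin-1')
    except UnicodeEncodeError:
        return False
    return True


# Both lookups return the same canonical codec name.
print(codecs.lookup('latin-1').name, codecs.lookup('ISO-8859-1').name)

# 'ó' (U+00F3) is within range; 'ż' (U+017C) is not.
print(latin1_encodable('ó'), latin1_encodable('żółty'))
```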

At this stage things get ambiguous. Strictly speaking, UTF-8 can be represented in the TEXT field as a string of opaque octets (as none of them will be mistaken for ASCII control characters). However, plain UTF-8 does not meet the RFC 2047 encoding requirements.
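For comparison, this is roughly what "encoded according to the rules of RFC 2047" looks like. The sketch uses Python's standard `email.header` module, which was built for mail headers; whether any given HTTP server would decode such an encoded-word inside Basic auth credentials is an open assumption, not something the RFC text above guarantees. `rfc2047_encode` is a hypothetical helper name.

```py
from email.header import Header, decode_header


def rfc2047_encode(text: str) -> str:
    """Encode text as an RFC 2047 encoded-word using the UTF-8 charset."""
    # For UTF-8, Header picks base64 ("b") or quoted-printable ("q"),
    # whichever yields the shorter result.
    return Header(text, 'utf-8').encode()


encoded = rfc2047_encode('żółty')
print(encoded)  # ASCII only, of the form =?utf-8?<b or q>?...?=

# Round trip: decode_header returns a list of (bytes, charset) pairs.
(raw, charset), = decode_header(encoded)
print(raw.decode(charset))
```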

Requests is between a rock and a hard place. RFC 2616 makes clear that any compliant implementation will be able to handle ISO-8859-1, and makes no guarantees about supplying non-RFC 2047-encoded UTF-8 header values. In such a world, Requests is always going to choose the most-likely-to-succeed case, fitting in with our goal of satisfying the 90% use-case. In your situation @oinopion, I think the best thing to do is to build the Basic Auth header yourself: it's not very hard. =) You can therefore take control of the encoding yourself and choose the approach you know your target server supports.
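A minimal sketch of building the header yourself, assuming Python 3 and a server that expects UTF-8-encoded credentials (as the Apache, nginx, and cURL comment above suggests). `basic_auth_header` is a hypothetical helper, not part of Requests; the `encoding` parameter is there so you can match whatever your server actually uses.

```py
from base64 import b64encode


def basic_auth_header(username: str, password: str, encoding: str = 'utf-8') -> str:
    """Build a Basic Authorization header value with a chosen credential encoding.

    The base64 output is plain ASCII, so the header value itself is always
    latin-1 safe regardless of how the credentials were encoded.
    """
    userpass = ('%s:%s' % (username, password)).encode(encoding)
    return 'Basic ' + b64encode(userpass).decode('ascii')


print(basic_auth_header('Aladdin', 'open sesame'))
print(basic_auth_header('żółty', 'jaźń'))
```

The value can then be passed explicitly, e.g. `requests.get('http://example.com', headers={'Authorization': basic_auth_header(u'żółty', u'jaźń')})`. Note that an `auth=` argument, session-level auth, or credentials picked up from a `.netrc` file may replace a manually set `Authorization` header when the request is prepared.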

@Lukasa closed this Mar 23, 2014
@mar10 added a commit to mar10/requests that referenced this issue Jul 10, 2015
Allow non-latin1 credentials
Referring to #1926
Maybe I am wrong, but my understanding was that header fields must be latin-1 encoded.
This is always true for Basic Authentication headers, since base64-encoded strings consist of plain ASCII.
I would think however that `<username>:<password>` may contain special characters, as long as client and server assume the same encoding (for example utf8).

This code currently encodes the credentials:
```py
from base64 import b64encode

from requests.utils import to_native_string


def _basic_auth_str(username, password):
    """Returns a Basic Auth string."""

    authstr = 'Basic ' + to_native_string(
        b64encode(('%s:%s' % (username, password)).encode('latin1')).strip()
    )

    return authstr
```

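but could be changed to encode `<username>:<password>` as UTF-8 before base64-encoding it; `b64encode` needs bytes, and its output is ASCII, so the header value stays latin-1 safe:
```py
        ...
        b64encode(('%s:%s' % (username, password)).encode('utf-8')).strip()
```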
725c1ad