Support unicode characters in authentication header #212

Closed
nikos opened this issue Apr 22, 2014 · 11 comments
Labels
bug Something isn't working

Comments

@nikos

nikos commented Apr 22, 2014

If a unicode character is part of the authentication credentials (here the German umlaut ö, whose UTF-8 encoding starts with byte 0xc3), an error is thrown:

    $ http -a test:654ö21 example.com

    http: error: UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 20: ordinal not in range(128)

I am using Python 2.7.3 on a plain Ubuntu 12.04.4 LTS system.

@jkbrzt jkbrzt added the bug label Apr 22, 2014
@sigmavirus24

A similar issue was raised against requests two months ago (https://github.com/kennethreitz/requests/issues/1926). The important part of that thread is @Lukasa's comment. The gist is that RFC 2616 only allows header characters in the Latin-1 encoding. If you want to pass unicode characters as part of a header value, there are two options:

  1. You yourself turn the unicode into a string of octets
  2. httpie is modified to do 1. for you.

In short, this is not actually a bug in the implementation as we are being 100% compliant with the RFC.
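Option 1 can be sketched as follows. This is a minimal Python 3 illustration, not code from requests or httpie: the helper name `basic_auth_header` and the UTF-8 default are assumptions made for the example, and a server is free to expect a different encoding.

```python
import base64


def basic_auth_header(username: str, password: str, encoding: str = "utf-8") -> str:
    """Build a Basic Authorization header value from text credentials.

    Hypothetical helper: the caller picks the text encoding explicitly,
    because the HTTP specification does not define one for credentials.
    """
    # Turn the text into octets first, then base64-encode those octets.
    credentials = f"{username}:{password}".encode(encoding)
    token = base64.b64encode(credentials).decode("ascii")
    return f"Basic {token}"


# The resulting value is plain ASCII and can be passed as an ordinary header.
headers = {"Authorization": basic_auth_header("test", "654ö21")}
print(headers)
```

Because the header value is already ASCII by the time it reaches the HTTP library, the library never has to guess how to encode the password.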

@sigmavirus24

Actually, there's a third option that httpie can consider: the requests-toolbelt is considering adding functionality to handle this for users of requests.

If anyone's interested in contributing to this effort, please continue the discussion over there.

@jkbrzt
Member

jkbrzt commented Apr 24, 2014

Hm, and what about simply using UTF8? That seems to work for Opera.

http://stackoverflow.com/questions/702629/utf-8-characters-mangled-in-http-basic-auth-username

// Btw, thank you @sigmavirus24 for so often providing useful upstream context for HTTPie issues. It's very helpful 👍

@Lukasa

Lukasa commented Apr 24, 2014

Opera does it, but no-one else does. From the same question:

  • IE uses the default codepage.
  • Mozilla uses only the lower byte of character codepoints, which has the effect of encoding to ISO-8859-1 and mangling the non-8859-1 characters irretrievably... except when doing XMLHttpRequests, in which case it uses UTF-8
  • Safari and Chrome encode to ISO-8859-1, and fail to send the authorization header at all when a non-8859-1 character is used.
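The three strategies in the list above can be approximated in Python 3 as shown here. The function names are hypothetical and the code only models the byte-level effect described in the list; it is not taken from any browser.

```python
def encode_latin1_strict(text: str) -> bytes:
    # Safari/Chrome-like: encode to ISO-8859-1 and fail on anything outside it.
    return text.encode("iso-8859-1")


def encode_low_byte(text: str) -> bytes:
    # Mozilla-like: keep only the low byte of each code point, which
    # silently mangles characters above U+00FF.
    return bytes(ord(ch) & 0xFF for ch in text)


def encode_utf8(text: str) -> bytes:
    # Opera-like: encode to UTF-8.
    return text.encode("utf-8")


# "ö" (U+00F6) is inside ISO-8859-1, "€" (U+20AC) is not.
print(encode_latin1_strict("ö"))  # b'\xf6'
print(encode_low_byte("ö"))       # b'\xf6'
print(encode_utf8("ö"))           # b'\xc3\xb6'
print(encode_low_byte("€"))       # b'\xac', the original character is lost
```

For characters inside ISO-8859-1 the first two strategies agree; they only diverge, and UTF-8 always differs, once a non-ASCII character is involved.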

The real fix here was pointed out in IRC: this draft RFC coming out of the HTTPbis working group. When this draft becomes a standard, I'll happily implement support for it in requests.

@jkbrzt
Member

jkbrzt commented Apr 24, 2014

@Lukasa I see. It looks like the best solution (for HTTPie anyway) would be to fail with an informative message in case of non-ascii characters in basic auth credentials.
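A check of that kind could look roughly like this in Python 3. It is only a sketch of the idea: the function name and the message wording are assumptions, not HTTPie code.

```python
def check_ascii_credentials(username: str, password: str) -> None:
    """Raise ValueError with a readable message for non-ASCII credentials."""
    for label, value in (("username", username), ("password", password)):
        try:
            value.encode("ascii")
        except UnicodeEncodeError as exc:
            # exc.start is the index of the first offending character.
            bad_char = value[exc.start]
            raise ValueError(
                f"basic auth {label} contains non-ASCII character {bad_char!r}; "
                "HTTP does not define an encoding for it"
            ) from None


check_ascii_credentials("test", "654321")  # passes silently
```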

@nikos is there another HTTP client (CLI, web browser) which allows you to log in with these credentials?

@sigmavirus24

@jkbr I agree.

There is another user-agent that allows you to use UTF-8 (as can be discovered in the requests issue I linked): cURL. The problem as I see it is that if you just read the introduction to the draft RFC that @Lukasa linked, this is not really universally supported behaviour.

cURL does the following:

    $ curl -u'foobar:abcö2' https://httpbin.org/get
    {
      "url": "http://httpbin.org/get",
      "headers": {
        "User-Agent": "curl/7.30.0",
        "Accept": "*/*",
        "Authorization": "Basic Zm9vYmFyOmFiY8O2Mg==",
        "Connection": "close",
        "X-Request-Id": "48556e34-492b-4d58-b164-37cc8f9eb6e7",
        "Host": "httpbin.org"
      },
      "origin": "173.229.2.112",
      "args": {}
    }

If you decode the Basic authorization parameter with Python's base64 library, you get: foobar:abc\xc3\xb62. If you use the *nix base64 command-line utility, you get the original string back.
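The decoding step can be reproduced in Python 3 like this (a small sketch; only the token is taken from the cURL output above). The two results are the same data: Python shows the raw bytes, while a UTF-8 terminal renders them as text.

```python
import base64

token = "Zm9vYmFyOmFiY8O2Mg=="

# b64decode returns raw bytes; the two bytes c3 b6 are the UTF-8 form of "ö".
raw = base64.b64decode(token)
print(raw)                  # b'foobar:abc\xc3\xb62'

# Interpreting those bytes as UTF-8 gives back what was typed on the command line.
print(raw.decode("utf-8"))  # foobar:abcö2
```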

@jkbrzt
Member

jkbrzt commented Apr 24, 2014

@sigmavirus24 It looks like using UTF-8 & printing a warning message is the most pragmatic way to go. HTTPie users are likely to have previously used cURL.

@sigmavirus24

@jkbr I'm afraid that likely will not work:

    >>> auth
    ('foobar', 'abc\xc3\xb62')
    >>> ('%s:%s' % auth).encode('latin1')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 10: ordinal not in range(128)

That's roughly what requests does when you pass it the auth tuple. If you want to support this, you may have to construct the header yourself:

    >>> auth
    ('foobar', 'abc\xc3\xb62')
    >>> base64.b64encode('%s:%s' % auth)
    'Zm9vYmFyOmFiY8O2Mg=='

Given the vagueness of the specification around the basic authentication header, I wonder if the username/password actually have to be latin-1 encoded before they are base64 encoded. I'll have to research this. We may be able to relax this constraint in requests if so.
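The traceback in this comment names the 'ascii' codec even though latin1 was requested. In Python 2, calling `.encode()` on a byte string first decodes it implicitly as ASCII, and that implicit step is what fails. The Python 3 sketch below makes the step explicit; it illustrates the mechanism and is not what requests executes.

```python
# UTF-8 bytes, as a Python 2 `str` holding the formatted credentials would contain.
raw = b"foobar:abc\xc3\xb62"

# Python 2 did this decode implicitly before encoding to latin1.
try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    position = exc.start  # index of the first undecodable byte
    reason = str(exc)
    print(reason)

# Decoding with the encoding the bytes are actually in makes the
# latin1 step succeed, because "ö" exists in latin1.
as_latin1 = raw.decode("utf-8").encode("latin-1")
print(as_latin1)  # b'foobar:abc\xf62'
```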

@Lukasa

Lukasa commented Apr 24, 2014

@sigmavirus24 We've already covered this in this discussion repeatedly: the specification provides no guidance as to text encoding because it was written by Americans at a time when text encoding was not a concern. The only thing that's safe is latin1, because that's the only text encoding ever mentioned with respect to headers in HTTP.

There is no "have to" here. Requests could absolutely decide to use UTF-8 if we wanted to, but I guarantee we'd break someone's running code where their webserver assumes that it'll be getting ISO 8859-1 but now starts getting multibyte sequences from UTF-8.
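The breakage described here can be illustrated with a short Python 3 sketch. The "server" is just a decode call standing in for any backend that assumes ISO 8859-1; no real server code is implied.

```python
password = "abcö2"

# Client switches to UTF-8: "ö" becomes the two bytes c3 b6.
sent = password.encode("utf-8")

# Server still assumes ISO 8859-1 and decodes each byte as one character.
seen_by_server = sent.decode("iso-8859-1")

print(seen_by_server)              # abcÃ¶2
print(seen_by_server == password)  # False, so the login is rejected
```

The decode does not raise an error, so the mismatch shows up only as a failed authentication.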

Requests has made a choice and I'm pretty happy with it at the moment. Users such as httpie should absolutely feel free to override that choice so long as they're equally aware that they could break currently running code. =)

@sigmavirus24

@jkbr looks like you have your solution above then ;)

jkbrzt added a commit that referenced this issue Apr 26, 2014
* Immediately convert all args from `bytes` to `str`.
* Added `Environment.stdin_encoding` and `Environment.stdout_encoding`
* Allow unicode characters in HTTP headers and basic auth credentials
  by encoding them using UTF8 instead of latin1 (#212).
@jkbrzt
Member

jkbrzt commented Apr 26, 2014

It turns out ö is actually part of latin1 and this particular error was a bug in HTTPie. It has been fixed and in addition to that, headers are now UTF8-encoded.
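A quick Python 3 check of that observation follows. The byte 0xc3 from the original error message matches the first byte of the UTF-8 form of ö, which would be consistent with the terminal passing UTF-8 bytes to HTTPie; that reading is an assumption, since the thread does not state the terminal encoding.

```python
ch = "ö"

code_point = ord(ch)           # 0xF6, within the latin1 range 0x00-0xFF
latin1 = ch.encode("latin-1")  # one byte
utf8 = ch.encode("utf-8")      # two bytes, starting with 0xc3

print(hex(code_point))  # 0xf6
print(latin1)           # b'\xf6'
print(utf8)             # b'\xc3\xb6'
```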
