Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Describe OAuth's use of application/x-www-form-urlencoded encoding #128

Closed
panva opened this issue Jul 25, 2022 · 22 comments
Closed

Describe OAuth's use of application/x-www-form-urlencoded encoding #128

panva opened this issue Jul 25, 2022 · 22 comments
Labels
help wanted Extra attention is needed ietf-117 interim Items to discuss in the next WG interim meeting
Milestone

Comments

@panva
Copy link
Member

panva commented Jul 25, 2022

499a8f8 removed Appendix B but unlinked references to it are still within the document next to almost every x-www-form-urlencoded mention.


Throughout the years this Appendix was THE resource used over and over to explain why and how the OAuth use of the Basic authorization scheme encodes the username and password tokens.

If this Appendix is removed I would propose to add examples of client secret basic with username and password tokens where the client_id and client_secret encoding changes the octets that go into the basic authorization scheme base64 encoding. This is an often overlooked implementation detail that both client and server implementers get wrong and end up inoperable, further driving users to use client secret post which this document marks as NOT RECOMMENDED.

@aaronpk
Copy link
Member

aaronpk commented Nov 8, 2022

Does OAuth do something different than what's describe in the HTTP Basic Authentication Scheme spec?

https://datatracker.ietf.org/doc/html/rfc7617#section-2

@panva
Copy link
Member Author

panva commented Nov 8, 2022

Yes.

The client identifier is encoded using the
"application/x-www-form-urlencoded" encoding algorithm per
Appendix B, and the encoded value is used as the username; the client
password is encoded using the same algorithm and used as the
password.

The client_id and client_secret are encoded before being passed to HTTP Basic as the username and password tokens.

@aaronpk
Copy link
Member

aaronpk commented Nov 8, 2022

I'm going to need someone who knows more about string encodings than me to write this section. I think it would be useful to have a new appendix that describes this encoding, but we can update it to avoid mentioning the out of date info in the 6749 appendix B.

@aaronpk
Copy link
Member

aaronpk commented Nov 8, 2022

In the mean time, there is a placeholder for this section and the internal references to it have been fixed

https://drafts.oauth.net/oauth-v2-1/draft-ietf-oauth-v2-1.html#appendix-B

@aaronpk aaronpk added this to the version -08 milestone Nov 8, 2022
@aaronpk aaronpk added the help wanted Extra attention is needed label Nov 8, 2022
@adeinega
Copy link
Contributor

adeinega commented Nov 9, 2022

String encoding does what it does, it escapes certain characters in the input string. Below are some JS examples with the intent to show the differences

> console.log("Authorization: Basic " + btoa("client:secret"))
< Authorization: Basic Y2xpZW50OnNlY3JldA==
>
> console.log("Authorization: Basic " + btoa(encodeURIComponent("client") + ":" + encodeURIComponent("secret")))
< Authorization: Basic Y2xpZW50OnNlY3JldA==

in both cases, we get the same results

now, let's use "secret=" instead of "secret"

> console.log("Authorization: Basic " + btoa("client:secret="))
< Authorization: Basic Y2xpZW50OnNlY3JldD0=
>
> console.log("Authorization: Basic " + btoa(encodeURIComponent("client") + ":" + encodeURIComponent("secret=")))
< Authorization: Basic Y2xpZW50OnNlY3JldCUzRA==

we get different results.

The OAuth specification requires us to use "application/x-www-form-urlencoded" encoding algorithm, and it's very often a source for interoperability issues as @panva mentions.

@panva
Copy link
Member Author

panva commented Nov 9, 2022

The correct js code that matches original appendix b behaviour to encode e.g. the client_id before passing it as username to basic auth is:

function formUrlEncode(token) {
  return encodeURIComponent(token).replace(/(?:[-_.!~*'()]|%20)/g, (substring) => {
    switch (substring) {
      case '-':
      case '_':
      case '.':
      case '!':
      case '~':
      case '*':
      case "'":
      case '(':
      case ')':
        return `%${substring.charCodeAt(0).toString(16).toUpperCase()}`
      case '%20':
        return '+'
      default:
        throw new Error()
    }
  })
}

@aaronpk aaronpk changed the title Original Appendix B was removed but non-link references remain Describe OAuth's use of application/x-www-form-urlencoded encoding Jul 21, 2023
@aaronpk aaronpk modified the milestones: version -09, version -10 Aug 20, 2023
@shartte
Copy link

shartte commented Oct 2, 2023

Does OAuth do something different than what's describe in the HTTP Basic Authentication Scheme spec?

https://datatracker.ietf.org/doc/html/rfc7617#section-2

I believe the important bit is that application/x-www-form-urlencoded specifies UTF-8 encoding, while basic authentication doesn't really say which encoding is to be used (causing interop issues when client secrets contain any non-ASCII characters).
by saying "use application/x-www-form-urlencoded" to encode the client-id and client-secret before using them as username/password, it is ensured that the basic auth username and password are ASCII only.

@panva
Copy link
Member Author

panva commented Jan 6, 2024

This continues to be a problematic point in the implementations. Both client and server implementations sometimes just rely on Basic Authentication Scheme plugins completely, or partially disregarding the requirement to encode the username/password tokens using application/x-www-form-urlencoded. This makes interop for client implementations particularly difficult.

I've encountered clients that

  • don't do any encoding
  • encode every non-alphanumeric character except - _ . ! ~ * ' ( ) and incorrectly uses %20 instead of + for the U+0020 (SPACE) character
  • encode every non-alphanumeric character except - _ . ! ~ * ' ( )
  • do everything correctly

I've encountered servers that

  • do not use any non-alphanumeric characters in their client_id and client_secret (deliberately limiting the syntax of these) as to avoid this mess altogether
  • use non-alphanumeric characters and gracefully handle the client mistakes mentioned above with the exception of no encoding used
  • use non-alphanumeric characters but fail to authenticate clients that encode properly as per the definition in OAuth

The biggest offender here is Google who uses - and . in their client_id values but fails to authenticate clients that properly encode those characters. Because of this, it's been the case for years that the most interoperable client implementation was to encode every non-alphanumeric character except - _ . ! ~ * ' ( ), which doesn't follow the definition.

cc @jogu This issue is even present in the Google Federated Identity listing on https://openid.net/certification where the test suite back then did not properly encode. I hope that if the suite ran against google's implementation today the certification would fail.

What a mess.

  • I think we should be giving more examples in the document. The current example of client_secret_basic avoids non-alphanumeric characters.
  • I think we should discuss restricting the client_id and client_secret syntax to alphanumerics only to avoid this mess completely.

I'd like this issue to be discussed in future OAuth WG meetings.

@jogu
Copy link

jogu commented Jan 7, 2024

I think we should discuss restricting the client_id and client_secret syntax to alphanumerics only to avoid this mess completely.

I'd be very much in favour of something along these lines. The experience of the OIDF certification team very much matches Filip's experience - any time values end up with characters in them that require encoding the number of interoperability problems sky rockets, and often in ways that are complex and time consuming to resolve (as these kind of encodings are often happening automatically in libraries and not always in obvious ways). It really doesn't seem like a good use of any developer's time to try and make these things work correctly when there's really no great advantage to using the characters that do require encoding.

(This is perhaps also a more general problem, e.g. similar interoperability issues arise if a client actually tries to use a state value that uses the current full set of allowed characters in state as per https://www.rfc-editor.org/rfc/rfc6749#appendix-A.5 )

@aaronpk aaronpk modified the milestones: version -10, version -11 Jan 31, 2024
@selfissued
Copy link

OpenID Federation Client IDs are HTTPS URLs. Some OpenID4VC Client IDs are as well. We need to keep the full character set.

@aaronpk
Copy link
Member

aaronpk commented Mar 22, 2024

So interestingly, it isn't possible to use an HTTPS URL as a userid in HTTP Basic Auth, since HTTP Basic Auth defines userid as *<TEXT excluding ":">: https://datatracker.ietf.org/doc/html/rfc2617#section-2

Updated reference: https://datatracker.ietf.org/doc/html/rfc7617#section-2

@panva
Copy link
Member Author

panva commented Mar 22, 2024

I'm very glad this was discussed at IETF 119. Unfortunately it doesn't seem like any changes are likely to be acceptable.

This leaves client and server implementers in a tough spot where neither one can be compliant and/or strict about encoding/decoding the username/password tokens when interoperability is a goal because as seen - this current definition is already not followed.

What guidance should we give to implementers, primarily clients, to be interoperable then? Don't use client_secret_basic, use client_secret_post instead? That's explicitly not meant to be the default and is defined as NOT RECOMMENDED.

Including the client credentials in the request content using the two parameters is NOT RECOMMENDED and SHOULD be limited to clients unable to directly utilize the HTTP Basic authentication scheme (or other password-based HTTP authentication schemes).

@aaronpk
Copy link
Member

aaronpk commented Mar 22, 2024

I suppose the other way to go is to open it up more rather than restrict it, for example saying that only the : character in a client ID or secret is encoded, but all other characters are left alone. That would seem to be allowed according to RFC7617.

While this wouldn't automatically make everything interoperable, it would at least give people a faster path to interoperability for new implementations. Existing implementations that do some mix of pre-encoding the client ID/secret could be described as essentially defining their own encoding of the client ID/secret that the client developer would have to know by reading the docs, just like they already have to read the docs to know what scopes to use.

@panva
Copy link
Member Author

panva commented Mar 22, 2024

@aaronpk unfortunately this would also not be compatible, in this case with existing implementations, which would continue to decode the unencoded resulting in malformed data and failed client authentications.

@aaronpk
Copy link
Member

aaronpk commented Mar 22, 2024

Right, but as was mentioned in the meeting, changing the spec also won't necessarily make existing implementations interoperable either. So my suggestion is more about making the spec reflect the reality of the variety of implementations instead.

@shartte
Copy link

shartte commented Mar 22, 2024

I suppose the other way to go is to open it up more rather than restrict it, for example saying that only the : character in a client ID or secret is encoded, but all other characters are left alone. That would seem to be allowed according to RFC7617.

While this wouldn't automatically make everything interoperable, it would at least give people a faster path to interoperability for new implementations. Existing implementations that do some mix of pre-encoding the client ID/secret could be described as essentially defining their own encoding of the client ID/secret that the client developer would have to know by reading the docs, just like they already have to read the docs to know what scopes to use.

This does not help IT departments that require "special characters" in generated passwords (please don't ask), which are non-ASCII (i.e. §). If clients encode the client secret as per the spec, the character set is well defined as UTF-8. If they don't encode it, it isn't.

I'd personally say the reasoning for the encoding is clear. Making it known to implementers that they have to take care here is the key problem. :-(

@panva
Copy link
Member Author

panva commented Mar 22, 2024

This does not help IT departments that require "special characters" in generated passwords (please don't ask), which are non-ASCII (i.e. §). If clients encode the client secret as per the spec, the character set is well defined as UTF-8. If they don't encode it, it isn't.

Non-ASCII characters are outside of the ABNF definition for client_id and client_secret in the first place.

I'd personally say the reasoning for the encoding is clear. Making it known to implementers that they have to take care here is the key problem. :-(

I agree with this.


Putting my server developer hat on. I can deal with plenty of non-conformities coming from the clients on the server side.

Putting my client developer hat on. I cannot deal with the different server implementations when they're not flexible. I would like to default to client_secret_post instead of client_secret_basic because of this issue but the fact that it's, a) a MAY support and b) as per spec, NOT RECOMMENDED, stops me from doing so. If that wasn't the case I'd be happy with the current character set and encoding rules, I could implement them properly in the client software and know that by default POST params are used that work with likely every server implementation.

@jogu
Copy link

jogu commented Mar 22, 2024

Sadly I couldn't join the meeting last night because of timezones, but I watched the recording at least.

A couple of observations:

The Openbanking use cases that were mentioned that use urls as client ids don't use client secrets. I believe most federation use cases don't allow client secrets either. But I do agree with the conclusion that there's (unfortunately) not much we can/should do to restrict the client_id character set in general.

I believe the problem only applies to client_secret_basic. I'm not sure how many of the problem authorization servers also support client_secret_post, but it seems google at least does.

I agree with Aaron's suggestion of making the reality clearer, and Filip's suggestion of adding examples that cover this case.

I think we could also encourage people to use client_secret_post instead of basic. (Although OAuth2.1 already recommends using asymmetric client authentication so already recommends against all forms of client secrets...)

@jogu
Copy link

jogu commented Mar 22, 2024

@panva (I didn't see your comment before posting mine)

I would like to default to client_secret_post instead of client_secret_basic because of this issue but the fact that it's, as per spec, NOT RECOMMENDED, stops me from doing so.

I don't think I was aware of that - where abouts is that recommendation?

@panva
Copy link
Member Author

panva commented Mar 22, 2024

I don't think I was aware of that - where abouts is that recommendation?

@jogu

OAuth 2.0 section 2.3.1
OAuth 2.1 Draft 10 section 2.4.1

@jogu
Copy link

jogu commented Mar 22, 2024

Thanks Filip. That's a pain. I guess my suggestion is useless then.

@aaronpk aaronpk added the interim Items to discuss in the next WG interim meeting label May 11, 2024
@aaronpk
Copy link
Member

aaronpk commented May 14, 2024

As discussed in the May 14 interim:

Resolved to:

  • Remove the "NOT RECOMMENDED" paragraph (but leave sentence "The parameters can only be transmitted in the request content and MUST NOT be included in the request URI.")
  • Change AS requirements to MUST support post body, MAY support HTTP Basic
  • Add a note about historic interop problems with HTTP Basic

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed ietf-117 interim Items to discuss in the next WG interim meeting
Projects
None yet
Development

No branches or pull requests

6 participants