Identifier range #702

Closed
3 of 4 tasks
mnot opened this issue Oct 8, 2018 · 15 comments

@mnot
Member

mnot commented Oct 8, 2018

Looking through existing headers, it seems like it would be good to consider expanding the range of characters allowable in identifiers.

  • . -- gives us hostnames like in Origin and Host
  • : -- gives us host:port structures like in Alt-Used and Access-Control-Allow-Origin, as well as IPv6 addresses
  • % -- gives us percent-encoded data like in ALPN
  • uppercase alpha -- gives us methods, like in Access-Control-Request-Method, Access-Control-Allow-Methods

The main constraint here is making it possible to reliably recognise identifiers. Currently they're denoted by a starting lcalpha. It may be possible to expand that to include uppercase, while still precluding other characters from the starting position, without affecting too many use cases.
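
To illustrate the recognition constraint, here is a minimal Python sketch of dispatching on the first character only. The function name `guess_type` is hypothetical, and the string and number markers are assumptions made for illustration; only the "identifiers start with lcalpha" rule comes from the comment above.

```python
import string

LCALPHA = set(string.ascii_lowercase)
DIGITS = set(string.digits)


def guess_type(item: str) -> str:
    """Guess an item's type from its first character alone."""
    if not item:
        return "empty"
    first = item[0]
    if first in LCALPHA:
        # Rule from the thread: an identifier is denoted by a starting lcalpha.
        return "identifier"
    if first == '"':
        # Assumption for illustration: strings are double-quoted.
        return "string"
    if first in DIGITS or first == "-":
        # Assumption for illustration: numbers start with a digit or a minus sign.
        return "number"
    return "unknown"


print(guess_type("example.com:443"))
print(guess_type("GET"))
```

Because only the first character is inspected, allowing `.`, `:` and `%` later in an identifier would not change how `example.com:443` is recognised, whereas `GET` is not recognised as an identifier unless uppercase joins the set of starting characters.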

@mnot
Member Author

mnot commented Oct 8, 2018

@annevk any other characters you can think of?

I'm least excited about uppercase alpha here, as it would preclude things like #685.

(to be clear: this is about what existing syntaxes we can accommodate in SH without modification; it's not critical we get them all, just want to get as many as possible).

@annevk

annevk commented Oct 8, 2018

Not immediately. I'm not sure if Origin or Host are as tightly restricted as we'd like. As registrars are not in control of subdomains, there's a fair amount of weirdness that can be used there and is somewhat supported by browsers (including over HTTPS due to wildcards), but I haven't investigated in detail.

@mnot
Member Author

mnot commented Oct 8, 2018

Sure, but the point here is to make sure that identifier can accommodate them, not exactly reflect their constraints. Unless we want to define a hostname datatype that exactly reflects the constraints of that structure.

@annevk

annevk commented Oct 9, 2018

We might be talking past each other. The problem I have is that I'm not sure what the actual scope is. E.g., perhaps + or ! are allowed at times, even though they really shouldn't be. (Similar to how _ is grandfathered in for certificates of non-wildcard subdomains despite not being allowed by RFCs.)

@mnot
Member Author

mnot commented Oct 10, 2018

Ah. Would be good to get some tests in, then.

Worst case, if characters are precluded and then it turns out they're needed they can be encoded, but that seems suboptimal.

@mnot
Member Author

mnot commented Nov 12, 2018

Discussed in Bangkok. @martinthomson suggested that we split identifiers that can be values out, to keep the "normal" identifiers simple.

mnot added a commit that referenced this issue Nov 12, 2018
@mnot
Member Author

mnot commented Nov 12, 2018

OK, they're split out into key and identifier now, with key only to be used internally in SH.

I'm going to add . : and % to identifier, since they seem to meet a number of use cases (see here for details).

My first inclination is to keep the first character restricted to lcalpha, but I could probably be talked out of it -- we have a number of non-alphanumeric characters left for delimiting new types down the road, and I don't think we'll be adding tons more over time (he says...).

Thoughts?

@mnot
Member Author

mnot commented Nov 12, 2018

(i.e., talked into allowing uppercase alpha and maybe numerics; not all of the special chars).

mnot added a commit that referenced this issue Nov 12, 2018
@mnot
Member Author

mnot commented Nov 12, 2018

Also, it would be good if folks had a look at the restrictions on key to make sure they're happy. Since it's in a predictable location, we can do things like remove constraints on the first character if we want to.

mnot added a commit that referenced this issue Nov 12, 2018
@annevk

annevk commented Nov 12, 2018

Can we later change identifier if we identify more code points needed for hosts? (I filed an issue on testing that, as I wasn't easily able to do it myself and it's not really a priority for me relative to other things.)

@mnot
Member Author

mnot commented Nov 12, 2018

@annevk nope; once it ships we can't really change it, because some implementations will fail on the characters you add.

However, if you find that a header can't use identifier, you can always use string. If you're backporting to an existing header, you can define a second header that takes the string form and negotiate for it; this is the plan for existing HTTP headers that can't fit into SH.

For example, if Foo currently looks like:

Foo: 1this-is-not-an-identifier!

You can define:

SH-Foo: "1this-is-not-an-identifier!"

... and then use an HTTP/2 SETTING to negotiate (hop-by-hop) for automagically doing the right thing WRT Foo and SH-Foo. I'm expecting to do this in a separate spec soon-ish.
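
As a rough sketch of the string fallback described above, the following Python wraps a legacy value in the string form and emits it under the SH- prefixed name. The helper names `to_sh_string` and `sh_field` are hypothetical, and the backslash escaping and printable-ASCII check are assumptions for illustration, not something stated in this thread. The SETTINGS negotiation is not modelled.

```python
def to_sh_string(value: str) -> str:
    """Wrap a legacy header value in a double-quoted string form."""
    # Assumption: only printable ASCII (0x20-0x7E) may appear in a string.
    if any(not (0x20 <= ord(ch) <= 0x7E) for ch in value):
        raise ValueError("value contains characters outside printable ASCII")
    # Assumption: backslash and double quote are escaped with a backslash.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def sh_field(name: str, value: str) -> str:
    """Build the companion SH- header line carrying the string form."""
    return f"SH-{name}: {to_sh_string(value)}"


print(sh_field("Foo", "1this-is-not-an-identifier!"))
```

With the value from the example above, this prints the same `SH-Foo` line shown in the comment.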

mnot mentioned this issue Nov 13, 2018
@annevk

annevk commented Nov 13, 2018

Hmm okay, maybe it's not such a big problem anyway. I've started properly defining the existing headers with parsers and I haven't really run into any major issues thus far other than testing being time consuming.

@mnot
Member Author

mnot commented Nov 13, 2018

Thanks @annevk. If you find a solution to that, please tell us.

@mnot
Member Author

mnot commented Nov 23, 2018

I think the only remaining questions here are:

  1. Whether to allow uppercase in identifier as the first character
  2. Whether to allow uppercase in identifier after the first character

My inclination is yes on both at this point; but I'd be willing to be talked out of 1 especially.

(yes, it consumes a lot of characters for identifying new types, but we still have a fair number of special characters left, and I don't think we'll be adding that many - famous last words ;)

Thoughts?
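
To make the two questions concrete, here is a small Python sketch that builds a candidate identifier pattern from two switches. The function name `identifier_pattern` is hypothetical. The `.`, `:` and `%` characters come from this thread, while digits, `_` and `-` in the continuation set are assumptions for illustration.

```python
import re
import string


def identifier_pattern(upper_first: bool, upper_rest: bool) -> re.Pattern:
    """Compile an identifier pattern for one combination of the two options."""
    first = string.ascii_lowercase
    if upper_first:
        first += string.ascii_uppercase
    # Assumption: digits, "_" and "-" are allowed after the first character.
    rest = string.ascii_lowercase + string.digits + "_-.:%"
    if upper_rest:
        rest += string.ascii_uppercase
    return re.compile(f"[{re.escape(first)}][{re.escape(rest)}]*")


# A method name such as GET needs both options enabled.
for upper_first in (False, True):
    for upper_rest in (False, True):
        matched = identifier_pattern(upper_first, upper_rest).fullmatch("GET")
        print(upper_first, upper_rest, matched is not None)
```

Under these assumptions, a value like `example.COM:443` only needs the second option, while method names such as `GET` need both.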

@annevk

annevk commented Nov 23, 2018

I think that's reasonable.

mnot added a commit that referenced this issue Nov 27, 2018
mnot added a commit that referenced this issue Nov 27, 2018
mnot closed this as completed Nov 27, 2018