Identifier range #702

mnot · 2018-10-08T07:28:23Z

Looking through existing headers, it seems like it would be good to consider expanding the range of characters allowable in identifiers.

. -- gives us hostnames like in Origin and Host
: -- gives us host:port structures like in Alt-Used and Access-Control-Allow-Origin, as well as IPv6 addresses
% -- gives us percent-encoded data like in ALPN
uppercase alpha -- gives us methods, like in Access-Control-Request-Method, Access-Control-Allow-Methods

The main constraint here is making it possible to reliably recognise identifiers. Currently they're denoted by a starting lcalpha. It may be possible to expand that to include uppercase but continue to preclude other characters from the starting character without affecting too many use cases.
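To illustrate why the starting character matters, here is a minimal sketch of a parser that picks the item type from the first character alone. The function name `classify_by_first_char`, the `allow_upper` flag, and the rules for strings (leading `"`) and numbers (leading digit or `-`) are assumptions made for illustration; they are not quoted from the draft.

```python
import string

LCALPHA = set(string.ascii_lowercase)
UCALPHA = set(string.ascii_uppercase)
DIGITS = set(string.digits)


def classify_by_first_char(value: str, allow_upper: bool = False) -> str:
    """Guess an item's type by looking only at its first character.

    Hypothetical dispatch rules (assumptions, not the draft's grammar):
    a double quote starts a string, a digit or "-" starts a number,
    and a letter starts an identifier.
    """
    if not value:
        return "empty"
    first = value[0]
    if first == '"':
        return "string"
    if first in DIGITS or first == "-":
        return "number"
    if first in LCALPHA or (allow_upper and first in UCALPHA):
        return "identifier"
    return "unrecognised"


print(classify_by_first_char("example.com:443"))        # identifier
print(classify_by_first_char("GET"))                    # unrecognised
print(classify_by_first_char("GET", allow_upper=True))  # identifier
```

Under these assumed rules, widening the first character to uppercase letters does not collide with the string or number branches, whereas a value that starts with a digit or with `%` would never reach the identifier branch.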


mnot · 2018-10-08T07:55:48Z

@annevk any other characters you can think of?

I'm least excited about uppercase alpha here, as it would preclude things like #685.

(to be clear: this is about what existing syntaxes we can accommodate in SH without modification; it's not critical we get them all, just want to get as many as possible).

annevk · 2018-10-08T09:24:49Z

Not immediately. I'm not sure if Origin or Host are as tightly restricted as we'd like. As registrars are not in control over subdomains there's a fair amount of weirdness that can be used there and is somewhat supported by browsers (including over HTTPS due to wildcards), but I haven't investigated in detail.

mnot · 2018-10-08T22:20:13Z

Sure, but the point here is to make sure that identifier can accommodate them, not exactly reflect their constraints. Unless we want to define a hostname datatype that exactly reflects the constraints of that structure.

annevk · 2018-10-09T08:54:41Z

We might be talking past each other. The problem I have is that I'm not sure what the actual scope is. E.g., perhaps + or ! are allowed at times, even though they really shouldn't be. (Similar to how _ is grandfathered in for certificates of non-wildcard subdomains despite not being allowed by RFCs.)

mnot · 2018-10-10T23:50:55Z

Ah. Would be good to get some tests in, then.

Worst case, if characters are precluded and then it turns out they're needed they can be encoded, but that seems suboptimal.

mnot · 2018-11-12T05:30:21Z

Discussed in Bangkok. @martinthomson suggested that we split identifiers that can be values out, to keep the "normal" identifiers simple.

mnot · 2018-11-12T05:52:40Z

OK, they're split out into key and identifier now, with key only to be used internally in SH.

I'm going to add . : and % to identifier, since they seem to meet a number of use cases (see here for details).

My first inclination is to keep the first character restricted to lcalpha, but I could probably be talked out of it -- we have a number of non-alphanumeric characters left for delimiting new types down the road, and I don't think we'll be adding tons more over time (he says...).

Thoughts?
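The proposal above (first character limited to lcalpha, with `.`, `:` and `%` added to the rest) can be sketched as a regular expression. The base set of trailing characters used here (lowercase letters, digits, `_`, `-`, `*`, `/`) is an assumption for illustration and is not stated in this thread; `is_identifier` is a hypothetical helper name.

```python
import re

# Assumed base set of trailing characters (not quoted from this thread).
BASE_TAIL = r"a-z0-9_\-*/"
# The three characters proposed for addition in the comment above.
ADDED_TAIL = r".:%"

# First character stays restricted to a lowercase letter.
IDENTIFIER_RE = re.compile(rf"[a-z][{BASE_TAIL}{ADDED_TAIL}]*")


def is_identifier(value: str) -> bool:
    """Return True if the whole value matches the sketched grammar."""
    return IDENTIFIER_RE.fullmatch(value) is not None


print(is_identifier("example.com"))       # True
print(is_identifier("example.com:8443"))  # True
print(is_identifier("h2%2c"))             # True
print(is_identifier("GET"))               # False
```

With this variant, hostnames and host:port pairs fit, while method names such as `GET` still do not, which is the uppercase question left open here.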

mnot · 2018-11-12T05:54:16Z

(i.e., talked into allowing uppercase alpha and maybe numerics; not all of the special chars).

mnot · 2018-11-12T05:56:03Z

Also, it would be good if folks had a look at the restrictions on key to make sure they're happy. Since it's in a predictable location, we can do things like remove constraints on the first character if we want to.

annevk · 2018-11-12T08:43:30Z

Can we later change identifier if we identify more code points needed for hosts? (I filed an issue on testing that, as I wasn't easily able to do it myself and it's not really a priority for me relative to other things.)

mnot · 2018-11-12T22:40:18Z

@annevk nope; once it ships we can't really change it, because some implementations will fail on the characters you add.

However, if you find that a header can't use identifier, you can always use string. If you're backporting to an existing header, you can define a second header that takes the string form and negotiate for it; this is the plan for existing HTTP headers that can't fit into SH.

For example, if Foo currently looks like:

Foo: 1this-is-not-an-identifier!

You can define:

SH-Foo: "1this-is-not-an-identifier!"

... and then use an HTTP/2 SETTING to negotiate (hop-by-hop) for automagically doing the right thing WRT Foo and SH-Foo. I'm expecting to do this in a separate spec soon-ish.
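A rough sketch of the Foo / SH-Foo mapping described above, from the sender's side. The negotiation itself is not specified in this thread, so it is reduced to a hypothetical boolean `peer_supports_sh`; the helper names and the escaping rule (backslash before `"` and `\`) are likewise assumptions for illustration.

```python
from typing import Tuple


def encode_sh_string(value: str) -> str:
    """Wrap a legacy value in double quotes as a string item.

    Assumes backslash escaping for '"' and '\\' and handles only
    printable ASCII; both are assumptions of this sketch.
    """
    if any(ord(ch) < 0x20 or ord(ch) > 0x7E for ch in value):
        raise ValueError("only printable ASCII is handled in this sketch")
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def outgoing_field(name: str, legacy_value: str,
                   peer_supports_sh: bool) -> Tuple[str, str]:
    """Choose which field to send on this hop.

    peer_supports_sh stands in for the result of the hop-by-hop
    negotiation; how that result is obtained is out of scope here.
    """
    if peer_supports_sh:
        return f"SH-{name}", encode_sh_string(legacy_value)
    return name, legacy_value


print(outgoing_field("Foo", "1this-is-not-an-identifier!", True))
print(outgoing_field("Foo", "1this-is-not-an-identifier!", False))
```

The first call yields the `SH-Foo` name with the quoted value, and the second leaves the original `Foo` field untouched.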

annevk · 2018-11-13T08:30:12Z

Hmm okay, maybe it's not such a big problem anyway. I've started properly defining the existing headers with parsers and I haven't really run into any major issues thus far other than testing being time consuming.

mnot · 2018-11-13T23:09:14Z

Thanks @annevk. If you find a solution to that, please tell us.

mnot · 2018-11-23T04:50:12Z

I think the only remaining questions here are:

0. Whether to allow uppercase in identifier as the first character
1. Whether to allow uppercase in identifier after the first character

My inclination is yes on both at this point; but I'd be willing to be talked out of 0 especially.

(yes, it consumes a lot of characters for identifying new types, but we still have a fair number of special characters left, and I don't think we'll be adding that many - famous last words ;)

Thoughts?
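The two open questions can be compared side by side with a small pattern builder. This is a sketch only: `identifier_pattern` and its flags are hypothetical, and the lowercase trailing set is an assumed stand-in for the draft's actual grammar.

```python
import re

# Assumed trailing characters, including the added ".", ":" and "%".
TAIL = r"a-z0-9_\-*/.:%"


def identifier_pattern(upper_first: bool, upper_rest: bool) -> re.Pattern:
    """Build a pattern for one combination of the two uppercase choices."""
    first = "a-zA-Z" if upper_first else "a-z"
    rest = ("A-Z" + TAIL) if upper_rest else TAIL
    return re.compile(f"[{first}][{rest}]*")


for upper_first in (False, True):
    for upper_rest in (False, True):
        pattern = identifier_pattern(upper_first, upper_rest)
        matches = [v for v in ("GET", "Get", "h2%3D", "example.com")
                   if pattern.fullmatch(v)]
        print(upper_first, upper_rest, matches)
```

Only the combination that allows uppercase in both positions accepts an all-caps method name such as `GET`; allowing it only after the first character is enough for uppercase hex digits in percent-encoded data such as `h2%3D`.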

annevk · 2018-11-23T08:42:24Z

I think that's reasonable.

mnot added the header-structure label Oct 8, 2018

annevk mentioned this issue Oct 11, 2018

Arbitrary subdomains web-platform-tests/wpt#13465

Open

mnot added a commit that referenced this issue Nov 12, 2018

Split out key from identifier

7171555

For #702

mnot added a commit that referenced this issue Nov 12, 2018

Add . : % to identifier

176801c

For #702

mnot added a commit that referenced this issue Nov 12, 2018

Change notes for #702

9a73185

mnot mentioned this issue Nov 13, 2018

List of Lists? #721

Closed

mnot added a commit that referenced this issue Nov 27, 2018

allow uppercase in Identifier

9134a4d

For #702.

mnot added a commit that referenced this issue Nov 27, 2018

Missed one lcalpha

2ace9ba

for #702

mnot closed this as completed Nov 27, 2018
