New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ban %00 from Serialized DisplayString #2615
Comments
IIUC, Display Strings aren't meant for sending byte arrays, therefore I do not think we need to define a special way for conveying NUL characters. Based on that, IMO the question boils down to if we should ban encoders from sending NUL. I do not have a strong opinion on if that should be forbidden, considering the fact that if the receiving application is suspectible to NUL characters decoders have to be careful in the handling of NUL characters regardless of the encoding requirement. Note that decoders are already expected to treat NUL characters specially if necessary, see Security Considerations. Maybe we can just add a reference from the section describing the decoding logic to the Security Considerations, and call it a day. |
One possible good point in favor of banning it is that we know that many configs will just act based on regex pattern search and might find some searched string after a %00 and consider they found what they were looking for, while the implementation will stop before when seeing the zero. I think we should simply say that "0x00" must never be serialized, hence "%00" must never appear in a DisplayString. This implicitly allows end users to write pretty simple protection rules (even regex-based) consisting in blocking the 3-char sequence "%00" in such fields. Encouraging the blocking on the receiver side can be efficient enough to discourage implementations from even trying to purposely send it. |
There is a cost to banning it, which is that the Unicode string to DisplayString encoding function is no longer complete. While C strings can't represent embedded NULs, all other languages' string types can. Those languages would then have a harder time writing a DisplayString API, because now encoding can fail. So we're trading off a nuisance for C vs. a nuisance for everyone else. It is unclear to me what the second option buys us. The second option just means there are two ways to spell U+0000. A C decoder that uses a naive percent-decoded in-memory representation would still need to account for This, however, suggests a fourth option: keep the abstract data format and wire representation as-is. (All Unicode strings are allowed and |
I'm naively suspecting this could be used to attack some elements: fill such a field with as many %00 as you can and it gets doubled in its final representation (e.g. when passed to a next hop) :-/ |
By "this", do you mean the If passing to a next hop is still over HTTP, the representation you send will continue to be If you're worried about a large in-memory representation, I will note that it costs the attacker three bytes ('%' '0' '0') of bandwidth to get you to represent something in two bytes ('\xc0' '\x80'). The attacker would get a better ratio be sending you a bunch of 'a's. But ultimately all this is just a question of what in-memory representation your program uses for U+0000. It just needs to unambiguously represents the possible state space. Indeed I might suggest, rather than doing these |
I see no reason to single out U+0000 from other control characters. Either allow U+0000, or disallow all control characters. The overlong encoding hack will bring more issues than it will solve. |
The only reason to single out U+0000 is that it is the string termination character in C and POSIX environments, and that is still a major cause of CVEs. I would not object to banning all the control characters, but that takes us into UniCode land which is full of dragons...
That is a third way to get the indeterminacy of the "java-hack", but it does have the slight benefit that the implementor of the sf-bis parser may know if that over-long UTF8 sequence will work or not. However, if it does not, then nothing is gained with respect to predictability. |
I'd prefer to stick with a caution here, rather than a prohibition. That means that endpoints (yes, whatever that means) can make decisions about what they tolerate, but we aren't constraining use. That is, the field can carry anything, but a resource might decide not to accept some values. |
How about simply adding: "UniCode category 'Cc' (Control characters) are not permitted, unless the field definition unwisely explicitly permits them." |
That's a field-specific parsing rule, which doesn't seem great from an API perspective. |
Not if it goes in the "Defining New Structured Fields" section ? I think it is perfectly fair game to insist that fields which accept Control Characters are forced to specify that, and ban them by default. My hope is that adding such text will mean that no field will ever be defined to allow them. |
Right, but the software handling that ban isn't a SF processor, it's the field-specific code. I think the most we could do would be something like how we handle field-specific constraint failures in 'defining new...', eg:
But that creates a situation whereby the default is that there's nothing in the field's spec about control characters, the SF generic implementation passes them through, and the field-specific code is expected to know that it should reject them. Not great. So a better approach might be:
|
This is where we realize we have waded into the "no way to be a little bit pregnant with UniCode" sump, isn't it ? I still think that %00 is a valid security concern in C/POSIX environments, but I really do not want to open the UniCode can of worms. Maybe we should just drop this ticket and add a stern security warning that there is no such thing as "safe unicode strings" ? |
I'd be good with a general unicode 'there be dragons' security considerations warning. |
I propose we ban
%00
from serializedDisplayString
, in order to avoid a large class of security issues (and to simplify implementations) in C/POSIX environments where0x00
is the string terminator.I can see three alternatives here.
Ban
%00
outrightIn the serialization of
Display String
we add:In the parsing of `Display String we add:
If either end uses a strict UTF-8 encoder or decoder, which do not allow "overlong byte sequences" (see below), this prevents
U+0000 NULL
from being transported with DisplayString.Specify loose UTF-8 encoding & decoding
Java pioneered a handling of
U+0000 NULL
by specifying that it must be UTF-8 serialized as0xc0 + 0x80
- a so-called "over-long UTF-8 byte sequence".We have presently not specified if the UTF-8 encoding of DisplayString is "strict" where such over-long byte sequences are illegal or "non-strict" where either all or a few over-long byte sequences are accepted.
Over-long byte sequences in general are frowned upon, because they can be used to "obfuscate" UTF-8 strings, for instance by encoding a period as 0xc0 0xae to try to escape directory traversal checks.
We could change the spec to say that the UTF-8 encoding/decoding must be loose (enough) to handle
U+0000 NULL
the Java-way, but I failed to identify a suitable normative reference which didnt bring in a lot of other UniCode/UTF-8 baggage we may not want.Optimistically serialize
0x00
as%c0%80
We can optimistically serialize any
0x00
bytes we encounter in the encoded UTF-8 byte_array using the "java-trick" and leave it to the UTF-8 decoder to either reject or accept as it sees fit.In the serialization of
DisplayString
we add:In the parsing of
DisplayString
we add:Discussion
I'm a big fan of clear text and simple solutions, and I cannot imagine why anybody would ever try to send
U+0000 NULL
through aDisplayString
for non-nefarious purposes, so I am 100% on board with simply banning0x00
bytes in the encoded UTF-8 byte-array.But it is a (tiny) loss of generality, and if we want to avoid that, we either make life difficult for implementers in C/POSIX based environments by rejecting this ticket, or we throw our lot with Java's handling of
U+0000 NULL
to a greater or lesser degree.Optimistically serializing
0x00
, as specified in the third alternative is a way to do that without tying ourselves to Java's mast, but it introduces some wiggle-room which, depending on ones point of view, is either desirable (allowing the receiver to rejectU+0000 NULL
by using a strict UTF-8 decoder) or undesirable (making it anyone's guess ifU+0000 NULL
can be transported in a DisplayString or not.)But Java's handling is a bit of a hack, and allowing (some) over-long UTF-8 byte sequences introduces a class of security issues similar to the one this ticket tries to prevent, so all in all, I think we should just ban
%00
in serializedDisplayString
with the first alternative above.The text was updated successfully, but these errors were encountered: