(feat) Support semicolon as a param separator #71

leonardb · 2018-09-20T16:01:17Z

Per W3C recommendation ';' should be supported as a param separator.
See: https://www.w3.org/TR/1999/REC-html401-19991224/appendix/notes.html#h-B.2.2

This helps solve an external issue where URLs with entities are being munged
For example:
"?param1=value1&note=value" gets converted to "?param1=value1¬e=value"
in browser display

Per W3C recommendation ';' should be supported as a param separator. See: https://www.w3.org/TR/1999/REC-html401-19991224/appendix/notes.html#h-B.2.2 This helps solve an external issue where URLs with entities are being munged

essen · 2018-09-20T16:14:17Z

This is a very old document. That text is no longer present in the most recent spec: https://url.spec.whatwg.org/#concept-urlencoded I would recommend you fix the code where you generated this and encode the special characters properly. Erlydtl will do that automatically for example. Note that not encoding them can lead to XSS or worse.

I doubt there's too much utility to having this upstream.

leonardb · 2018-09-20T16:29:28Z

It really is more about being able to support params with ';' separators.

In many cases we're dealing with URLs where params are added by users we cannot control.

When an entity fragment is included in the URL they are munged by the browser, leading to broken URLs. It's not really something the users are capable of understanding or dealing with themselves.

This is not really about wanting to include entities, it's about the browsers assuming that something like '&note=xyzis¬e=xyz, or &copybad=xyzis©bad=xyz`

If we support ';' sep in cowboy we can then issue links as "?param1=value1;note=value" which does not suffer the same potential consequences

As far as I can tell from reading the RFC3986 and other references, ';' is a reserved character available for use as a separator.

essen · 2018-09-20T17:06:41Z

I'm not sure I understand. To give a common example:

User A writes post, that includes a link
Server stores the post
User B goes to the page containing the post
Server services the post, encoding it (if HTML that means characters like & are converted to & including in the URLs)
User B clicks on the link, browser decodes it and sends the user to the correct location

There isn't a lot of cases where this wouldn't work. Maybe if your input was HTML, but then what you really need is an HTML sanitizer library to sanitize the input.

I just don't see the use case, you'll need to be more specific. And assuming there is one, the current implementation should probably have a parse_qs/2 instead of accepting both by default and in the same query strings.

leonardb · 2018-09-20T17:31:17Z

Use case:

User A adds a link to the system for UserB to use

http://example.com/?a=1&copycount=25

(the parameters are not fixed and could be anything)

User B gets the link in their browser as text:
"http://example.com/?a=1&copycount=25" is now displayed as "http://example.com/?a=1©count=25"

User copy/pastes that link into their code

That link is then clicked by one of their visitors and the visitor hits our server

Since we've lost the copycount parameter during the copy/paste the link is seen as invalid

It seems this is true for a large number HTML entities which can be interpreted through &entity

In short, it's really a browser issue causing loss of fidelity with text thanks to it's automatic parsing (even when what you'd think would be a required terminating ';' for the entity is missing)

By supporting the ';' separator we can alleviate the issue (somewhat) as the rendered link would no longer be seen to contain html entities

I agree with the parse_qs/2 as it'd remove magic. I'm assuming that if this moves forward we'd probably want some type of config flag for the separator, defaulting to << $& >> which can be pushed down to cowlib from cowboy

essen · 2018-09-20T17:53:33Z

Use case:

User A adds a link to the system for UserB to use

http://example.com/?a=1&copycount=25

(the parameters are not fixed and could be anything)

User B gets the link in their browser as text:
"http://example.com/?a=1&copycount=25" is now displayed as "http://example.com/?a=1©count=25"

That's what I'm saying. There's no reason for the browser to interpret it as an entity if the server is sending the data properly.

If you mean text as in a text file, the media type needs to be text/plain.

If you mean text as in a part of an HTML file, then you need to encode the special characters. An example encode function could be:

html_encode(Text) -> 
    <<case C of 
        $& -> <<"&amp;">>; 
        $< -> <<"&lt;">>; 
        $> -> <<"&gt;">>; 
        $" -> <<"&quot;">>; 
        $' -> <<"&apos;">>; 
        _ -> <<C/utf8>> 
    end || <<C/utf8>> <= Text>>.

Not encoding data coming from external sources properly creates security flaws like XSS and others.

leonardb · 2018-09-20T18:13:29Z

I'm going to agree with you here. Think I was barking up the wrong tree. We'll should just have to add formatters for different outputs to solve the issue.
Thanks for putting me straight. Closing PR

(feat) Support semicolon as a param separator

f1524d2

Per W3C recommendation ';' should be supported as a param separator. See: https://www.w3.org/TR/1999/REC-html401-19991224/appendix/notes.html#h-B.2.2 This helps solve an external issue where URLs with entities are being munged

leonardb closed this Sep 20, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

(feat) Support semicolon as a param separator #71

(feat) Support semicolon as a param separator #71

leonardb commented Sep 20, 2018

essen commented Sep 20, 2018

leonardb commented Sep 20, 2018

essen commented Sep 20, 2018

leonardb commented Sep 20, 2018

essen commented Sep 20, 2018 •

edited

Loading

leonardb commented Sep 20, 2018

(feat) Support semicolon as a param separator #71

(feat) Support semicolon as a param separator #71

Conversation

leonardb commented Sep 20, 2018

essen commented Sep 20, 2018

leonardb commented Sep 20, 2018

essen commented Sep 20, 2018

leonardb commented Sep 20, 2018

essen commented Sep 20, 2018 • edited Loading

leonardb commented Sep 20, 2018

essen commented Sep 20, 2018 •

edited

Loading