Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(feat) Support semicolon as a param separator #71

Closed

Conversation

leonardb
Copy link

Per W3C recommendation ';' should be supported as a param separator.
See: https://www.w3.org/TR/1999/REC-html401-19991224/appendix/notes.html#h-B.2.2

This helps solve an external issue where URLs with entities are being munged
For example:
"?param1=value1&note=value" gets converted to "?param1=value1¬e=value"
in browser display

Per W3C recommendation ';' should be supported as a param separator.
See: https://www.w3.org/TR/1999/REC-html401-19991224/appendix/notes.html#h-B.2.2
This helps solve an external issue where URLs with entities are being munged
@essen
Copy link
Member

essen commented Sep 20, 2018

This is a very old document. That text is no longer present in the most recent spec: https://url.spec.whatwg.org/#concept-urlencoded I would recommend you fix the code where you generated this and encode the special characters properly. Erlydtl will do that automatically for example. Note that not encoding them can lead to XSS or worse.

I doubt there's too much utility to having this upstream.

@leonardb
Copy link
Author

It really is more about being able to support params with ';' separators.

In many cases we're dealing with URLs where params are added by users we cannot control.

When an entity fragment is included in the URL they are munged by the browser, leading to broken URLs. It's not really something the users are capable of understanding or dealing with themselves.

This is not really about wanting to include entities, it's about the browsers assuming that something like '&note=xyzis¬e=xyz, or &copybad=xyzis©bad=xyz`

If we support ';' sep in cowboy we can then issue links as "?param1=value1;note=value" which does not suffer the same potential consequences

As far as I can tell from reading the RFC3986 and other references, ';' is a reserved character available for use as a separator.

@essen
Copy link
Member

essen commented Sep 20, 2018

I'm not sure I understand. To give a common example:

  • User A writes post, that includes a link
  • Server stores the post
  • User B goes to the page containing the post
  • Server services the post, encoding it (if HTML that means characters like & are converted to & including in the URLs)
  • User B clicks on the link, browser decodes it and sends the user to the correct location

There isn't a lot of cases where this wouldn't work. Maybe if your input was HTML, but then what you really need is an HTML sanitizer library to sanitize the input.

I just don't see the use case, you'll need to be more specific. And assuming there is one, the current implementation should probably have a parse_qs/2 instead of accepting both by default and in the same query strings.

@leonardb
Copy link
Author

Use case:

User A adds a link to the system for UserB to use

http://example.com/?a=1&copycount=25

(the parameters are not fixed and could be anything)

User B gets the link in their browser as text:
"http://example.com/?a=1&copycount=25" is now displayed as "http://example.com/?a=1©count=25"

User copy/pastes that link into their code

That link is then clicked by one of their visitors and the visitor hits our server

Since we've lost the copycount parameter during the copy/paste the link is seen as invalid

It seems this is true for a large number HTML entities which can be interpreted through &entity

In short, it's really a browser issue causing loss of fidelity with text thanks to it's automatic parsing (even when what you'd think would be a required terminating ';' for the entity is missing)

By supporting the ';' separator we can alleviate the issue (somewhat) as the rendered link would no longer be seen to contain html entities

I agree with the parse_qs/2 as it'd remove magic. I'm assuming that if this moves forward we'd probably want some type of config flag for the separator, defaulting to << $& >> which can be pushed down to cowlib from cowboy

@essen
Copy link
Member

essen commented Sep 20, 2018

Use case:

User A adds a link to the system for UserB to use

http://example.com/?a=1&copycount=25

(the parameters are not fixed and could be anything)

User B gets the link in their browser as text:
"http://example.com/?a=1&copycount=25" is now displayed as "http://example.com/?a=1©count=25"

That's what I'm saying. There's no reason for the browser to interpret it as an entity if the server is sending the data properly.

If you mean text as in a text file, the media type needs to be text/plain.

If you mean text as in a part of an HTML file, then you need to encode the special characters. An example encode function could be:

html_encode(Text) -> 
    <<case C of 
        $& -> <<"&amp;">>; 
        $< -> <<"&lt;">>; 
        $> -> <<"&gt;">>; 
        $" -> <<"&quot;">>; 
        $' -> <<"&apos;">>; 
        _ -> <<C/utf8>> 
    end || <<C/utf8>> <= Text>>.

Not encoding data coming from external sources properly creates security flaws like XSS and others.

@leonardb
Copy link
Author

I'm going to agree with you here. Think I was barking up the wrong tree. We'll should just have to add formatters for different outputs to solve the issue.
Thanks for putting me straight. Closing PR

@leonardb leonardb closed this Sep 20, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants