html: UnescapeString unescapes HTML character references without a final semicolon in an attribute #40320

Open
elan-sg opened this issue Jul 20, 2020 · 3 comments

@elan-sg commented Jul 20, 2020

What version of Go are you using (go version)?

$ go version
go version go1.12.5 linux/amd64

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

$ go env
GOHOSTARCH="amd64"
GOHOSTOS="linux"

What did you do?

This is related to #21563.
https://play.golang.com/p/Fh08ftsK9YQ

Pass the string "<a href=example.com?param=value&timestamp=123>link" to html.UnescapeString.
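For readers who cannot open the playground link, here is a minimal sketch of the reproduction. It is not a copy of the linked program; the helper name reproduce is made up for illustration, and the only library call is html.UnescapeString from the standard library.

```go
package main

import (
	"fmt"
	"html"
)

// input is the string from this report. Note that "&timestamp"
// begins with "&times", a legacy entity name that is valid without ';'.
const input = "<a href=example.com?param=value&timestamp=123>link"

// reproduce is a hypothetical helper that wraps the single call under discussion.
func reproduce() string {
	return html.UnescapeString(input)
}

func main() {
	got := reproduce()
	fmt.Println(got)
	// Reports whether the string came back untouched.
	fmt.Println("unchanged:", got == input)
}
```

On the Go releases this report covers, the first line printed is <a href=example.com?param=value×tamp=123>link and the second is unchanged: false.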

What did you expect to see?

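According to https://html.spec.whatwg.org/multipage/parsing.html#character-reference-state,
inside an attribute value a named character reference that has no final semicolon and is followed by "=" or an ASCII alphanumeric is not decoded, so the string should remain intact.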

It seems an attempt was made to handle this, but attribute is a constant (false), so the attribute branch can never be taken?
https://golang.org/src/html/escape.go?s=1296:1319#L57
https://golang.org/src/html/escape.go?s=3112:3194#L142

I would expect the same string to come back
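The attribute rule from the spec section linked above can be sketched as a small predicate. This is only an illustration of the spec wording, not code from the html package; the names keepLiteralInAttribute and isASCIIAlphanumeric are hypothetical.

```go
package main

import "fmt"

// isASCIIAlphanumeric reports whether c is in [0-9A-Za-z].
func isASCIIAlphanumeric(c byte) bool {
	return '0' <= c && c <= '9' || 'a' <= c && c <= 'z' || 'A' <= c && c <= 'Z'
}

// keepLiteralInAttribute (hypothetical name) reports whether a named
// reference that was matched inside an attribute value should be left as
// literal text: the match has no trailing ';' and the next input byte is
// '=' or an ASCII alphanumeric.
func keepLiteralInAttribute(endsWithSemicolon bool, next byte) bool {
	return !endsWithSemicolon && (next == '=' || isASCIIAlphanumeric(next))
}

func main() {
	// "&times" matched inside "&timestamp=123": no semicolon, next byte is 't'.
	fmt.Println(keepLiteralInAttribute(false, 't'))
	// "&times;" followed by 't': the semicolon means it is decoded.
	fmt.Println(keepLiteralInAttribute(true, 't'))
	// "&times" followed by a space: still decoded, even in an attribute.
	fmt.Println(keepLiteralInAttribute(false, ' '))
}
```

Under this predicate, the "&times" in "&timestamp=123" would stay literal when the input is treated as attribute content, which matches the expectation stated in this report.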

What did you see instead?

&times (the start of &timestamp) is changed to ×, so the result contains ×tamp=123.

@toothrot added this to the Backlog milestone Jul 21, 2020

@toothrot (Contributor) commented Jul 21, 2020

/cc @mikesamuel @empijei

@empijei (Contributor) commented Aug 4, 2020

This is interesting and it indeed looks like a bug, thanks for reporting.

My only concern with fixing it is that browsers tend to adjust invalid encodings/HTML to try to display something, even if it doesn't match the markup they received. As a consequence, if we leave incomplete encodings in the decoded output, we risk introducing mutation-based XSS.

I need to investigate this a bit more to see whether there are security risks in addressing it.

@elan-sg (Author) commented Aug 4, 2020

thanks for looking!
