Fuzzy detection of entities should be disabled in href attribute #1069

Closed
powerman opened this issue Mar 9, 2017 · 8 comments

powerman commented Mar 9, 2017

  • Mojolicious version: 7.26
  • Perl version: v5.24.1
  • Operating system: Linux

Steps to reproduce the behavior

say Mojo::DOM->new('<a href="/?a=42&sector=1">Link</a>');

Expected behavior

<a href="/?a=42&sector=1">Link</a>

Actual behavior

<a href="/?a=42§or=1">Link</a>

kraih commented Mar 9, 2017

Please reference the sections of the specs that apply here.

powerman commented Mar 9, 2017

https://www.w3.org/TR/html5/syntax.html#consume-a-character-reference

If the character reference is being consumed as part of an attribute, and the last character matched is not a ";" (U+003B) character, and the next character is either a "=" (U+003D) character or an alphanumeric ASCII character, then, for historical reasons, all the characters that were matched after the U+0026 AMPERSAND character (&) must be unconsumed, and nothing is returned.

kraih commented Mar 9, 2017

Does the real HTML5 spec from WHATWG agree with that?

powerman commented Mar 9, 2017

No idea. But the current behaviour clearly breaks very common href attribute values, and it conflicts with at least one spec. Should I dig through the WHATWG spec too for some reason?

kraih commented Mar 9, 2017

Yes. Like browsers, we do not follow the W3C spec.

kraih commented Mar 9, 2017

powerman commented Mar 9, 2017

https://html.spec.whatwg.org/#character-reference-state

If the character reference was consumed as part of an attribute (return state is either attribute value (double-quoted) state, attribute value (single-quoted) state or attribute value (unquoted) state), and the last character matched is not a U+003B SEMICOLON character (;), and the next input character is either a U+003D EQUALS SIGN character (=) or an ASCII alphanumeric, then, for historical reasons, switch to the character reference end state.
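The rule quoted above can be sketched in a few lines. This is only an illustration of the spec text, written in Python for brevity; it is not how Mojo::DOM implements it. The function name `decode_named_refs` and the `in_attribute` flag are made up for this example, and only named references are handled (no numeric ones). The entity table is Python's standard `html.entities.html5`, which lists legacy names such as `sect` both with and without the trailing semicolon.

```python
from html.entities import html5  # keys appear with and without a trailing ";"

# Upper bound on the length of a reference name, used to limit the scan.
_MAX_NAME = max(len(name) for name in html5)


def decode_named_refs(text, in_attribute):
    """Decode named character references (hypothetical helper).

    When in_attribute is true, a reference without a trailing ";" that is
    followed by "=" or an ASCII alphanumeric is left as literal text.
    """
    out = []
    i = 0
    while i < len(text):
        if text[i] != "&":
            out.append(text[i])
            i += 1
            continue

        # Find the longest known name starting right after the ampersand.
        match = None
        for end in range(min(len(text), i + 1 + _MAX_NAME), i + 1, -1):
            name = text[i + 1:end]
            if name in html5:
                match = name
                break

        if match is None:
            out.append("&")
            i += 1
            continue

        # Character following the matched name ("" at end of input).
        nxt = text[i + 1 + len(match):i + 2 + len(match)]
        keep_literal = (
            in_attribute
            and not match.endswith(";")
            and (nxt == "=" or (nxt.isascii() and nxt.isalnum()))
        )
        out.append("&" + match if keep_literal else html5[match])
        i += 1 + len(match)
    return "".join(out)


print(decode_named_refs("/?a=42&sector=1", in_attribute=True))
print(decode_named_refs("/?a=42&sector=1", in_attribute=False))
```

With `in_attribute=True` the value from the report comes back unchanged, because `&sect` is followed by the letter `o`. With `in_attribute=False` the same input decodes to `/?a=42§or=1`, which is the output shown under "Actual behavior".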

kraih added the bug label Mar 9, 2017

kraih commented Mar 9, 2017

Ok, that's a bug then.

kraih closed this as completed in 852a71a Mar 9, 2017