Mojo::DOM misparses <script> elements #2014

Closed
mauke opened this issue Dec 8, 2022 · 5 comments

mauke commented Dec 8, 2022

  • Mojolicious version: 9.30
  • Perl version: v5.36.0
  • Operating system: Ubuntu 22.04.1 LTS

Steps to reproduce the behavior

#!/usr/bin/env perl
use v5.12.0;
use warnings;
use Mojo::DOM;

my $dom = Mojo::DOM->new(do { local $/; scalar readline DATA });

say for $dom->find('div')->each;

__DATA__
<!DOCTYPE html>
<h1>Welcome to HTML</h1>
<script>
    console.log('< /script> is safe');
    /* <div>XXX this is not a div element</div> */
</script>

Expected behavior

No output as the document contains no div elements. (document.querySelectorAll('div') in a browser agrees.)

Actual behavior

Output:

<div>XXX this is not a div element</div>

kraih commented Dec 8, 2022

I've not looked at the spec yet, but this would probably be the section to check for the correct behavior.

mauke commented Dec 8, 2022

This one looks relevant: https://html.spec.whatwg.org/multipage/parsing.html#script-data-less-than-sign-state

After seeing a < in a <script> element, the parser looks at the next character. Only ! and / are special. For any other character (including space), the < is parsed literally and scanning continues.
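
To illustrate that rule, here is a minimal sketch in Python. It is neither Mojo::DOM's code nor a full implementation of the spec's state machine. It assumes we are already just past the opening `<script>` tag, it ignores the `<!--` escape states entirely, and the names `END_TAG` and `split_script` are made up for this example. The only thing that ends script data here is `</script` followed by whitespace, `/` or `>`; every other `<` is kept as literal text.

```python
import re

# Simplified end-tag rule: "</script" (case-insensitive) followed by
# whitespace, "/" or ">". Assumption: the "<!--" escape states are ignored.
END_TAG = re.compile(r'</script(?=[\t\n\f\r />])', re.IGNORECASE)


def split_script(html, start):
    """Return (raw_text, end_index) for script data beginning at `start`.

    `start` must point just past the opening <script> tag. `end_index` is
    where the closing tag begins, or len(html) if there is none.
    """
    match = END_TAG.search(html, start)
    if match is None:
        # No closing tag: the rest of the input is script text.
        return html[start:], len(html)
    return html[start:match.start()], match.start()


sample = (
    "<script>\n"
    "    console.log('< /script> is safe');\n"
    "    /* <div>XXX this is not a div element</div> */\n"
    "</script>\n"
)
body, end = split_script(sample, len("<script>"))
```

With the test document from this issue, `body` contains both the `< /script>` string and the commented-out `<div>`, and `sample[end:]` starts at the real `</script>`.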

kraih commented Dec 8, 2022

This line probably needs some fixing.

kraih added the bug label Dec 8, 2022

gordon-fish commented

xmllint appears to recognize the <script> block all the way to the final closing </script> (though it seems to have issues with comments):

$ xmllint --html --debug mojo-issue-2014.html 
mojo-issue-2014.html:5: HTML parser error : Unexpected end tag : div
/* <div>XXX this is not a div element</div> */
                                           ^
HTML DOCUMENT
URL=mojo-issue-2014.html
standalone=true
  DTD(html)
  ELEMENT html
    ELEMENT body
      ELEMENT h1
        TEXT
          content=Welcome to HTML
      TEXT
        content= 
      ELEMENT script
        CDATA_SECTION
          content=     console.log('< /script> is safe'); ...

$ xmllint --html --xpath //div mojo-issue-2014.html
mojo-issue-2014.html:5: HTML parser error : Unexpected end tag : div
    /* <div>XXX this is not a div element</div> */
                                               ^
XPath set is empty

$ xmllint --html --xpath //script mojo-issue-2014.html
mojo-issue-2014.html:5: HTML parser error : Unexpected end tag : div
    /* <div>XXX this is not a div element</div> */
                                               ^
<script><![CDATA[
    console.log('< /script> is safe');
    /* <div>XXX this is not a div element */
]]></script>

kraih closed this as completed in 6f195d8 Dec 10, 2022

kraih commented Dec 10, 2022

Also fixed in @mojojs/dom (mojolicious/dom.js@90ad748).
