Mojo::DOM misparses <script> elements (another way) #2015

Closed
mauke opened this issue Dec 8, 2022 · 2 comments

@mauke
Contributor

mauke commented Dec 8, 2022

  • Mojolicious version: 9.30
  • Perl version: v5.36.0
  • Operating system: Ubuntu 22.04.1 LTS

Steps to reproduce the behavior

#!/usr/bin/env perl
use v5.12.0;
use warnings;
use Mojo::DOM;

my $dom = Mojo::DOM->new(do { local $/; scalar readline DATA });

say for $dom->find('p')->each;

__DATA__
<!DOCTYPE html>
<h1>Welcome to HTML</h1>
<script>
    console.log('this is a script element and should be executed');
// </script asdf> <p>
    console.log('this is not a script');
    // <span data-wtf="</script>">:-)</span>

Expected behavior

Output similar to:

<p>
    console.log(&#39;this is not a script&#39;);
    // <span data-wtf="&lt;/script&gt;">:-)</span>
</p>

An (implicitly closed) p element exists, so it should be found.

Actual behavior

No output.

@kraih
Member

kraih commented Dec 8, 2022

I've not looked at the spec yet, but this would probably be the section to check for the correct behavior.

@mauke
Contributor Author

mauke commented Dec 8, 2022

The relevant section is this one: https://html.spec.whatwg.org/multipage/parsing.html#script-data-end-tag-name-state

After seeing </ (followed by a letter) in a <script> element, we end up in the "script data end tag name" state. Here we accumulate letters into the name of a temporary tag. On seeing whitespace (space, tab, line feed, form feed), we check that the temporary tag name matches "script"; if so, we stop script parsing (treating the characters found as a script end tag) and continue parsing for attributes.

Now, end tags with attributes are technically an error: https://html.spec.whatwg.org/multipage/parsing.html#parse-error-end-tag-with-attributes
But a forgiving parser will simply ignore them.
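
To make the rule above concrete, here is a rough sketch in plain Perl of how the end of the script data could be located. It is not Mojo::DOM's code and not a full implementation of the spec's tokenizer: the names `split_script_data`, `$ATTRS` and `$END` are made up for illustration, and the attribute handling is a simplification (quoted values are skipped so that a `>` inside quotes does not end the tag).

```perl
#!/usr/bin/env perl
use v5.12.0;
use warnings;

# Hypothetical helper, not part of Mojo::DOM.
# Junk after the tag name: anything up to the closing ">",
# skipping over quoted attribute values (simplified).
my $ATTRS = qr{(?:[^>"']|"[^"]*"|'[^']*')*};

# "</script" counts as an end tag only when the name is followed by
# whitespace (tab, LF, FF, space), "/" or ">"; matched case-insensitively.
my $END = qr{</script(?=[\t\n\f />])$ATTRS>}i;

# Takes the text following a <script> start tag and returns
# (script body, remaining markup).
sub split_script_data {
    my ($text) = @_;
    return ($1, $2) if $text =~ m{\A(.*?)$END(.*)\z}s;
    return ($text, '');    # no end tag: everything is script data
}

my ($script, $rest) = split_script_data(<<'EOT');
    console.log('this is a script element and should be executed');
// </script asdf> <p>
    console.log('this is not a script');
    // <span data-wtf="</script>">:-)</span>
EOT

say "SCRIPT:$script";
say "REST:$rest";
```

With the test case from this issue, the script body ends just before `</script asdf>`, and the remaining markup starts with ` <p>`, which is why a p element should exist in the resulting tree.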
