Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

innerHTML on a script tag escapes tag characters #14975

Closed
jdm opened this issue Jan 12, 2017 · 49 comments
Closed

innerHTML on a script tag escapes tag characters #14975

jdm opened this issue Jan 12, 2017 · 49 comments

Comments

@jdm
Copy link
Member

@jdm jdm commented Jan 12, 2017

<script type="text/template"><!-- hi --></script><script>console.log(document.querySelector('script').innerHTML)</script>

In Firefox, this prints <!-- hi -->. In Servo, it prints &lt;-- hi --&gt;.

@jdm
Copy link
Member Author

@jdm jdm commented Jan 12, 2017

The cause is this match which says that the contents of script tags should be escaped.

@jdm
Copy link
Member Author

@jdm jdm commented Jan 12, 2017

That code matches what is written at https://html.spec.whatwg.org/multipage/syntax.html#serialising-html-fragments for "If current node is a Text node". Interestingly, outerHTML works fine, and innerHTML does not.

@jdm
Copy link
Member Author

@jdm jdm commented Jan 12, 2017

The parent() operation in the html5ever serializer gets the last element in the element stack. In the outerHTML case this is the element that is pushed in start_elem. In the innerHTML case this is the dummy initial value added in the serializer's constructor, so it's not considered to have a script element parent.

@jdm jdm added the E-easy label Jan 12, 2017
@highfive
Copy link

@highfive highfive commented Jan 12, 2017

Please make a comment here if you intend to work on this issue. Thank you!

@jdm
Copy link
Member Author

@jdm jdm commented Jan 12, 2017

To fix this, in the case where the serializer is created with ChildrenOnly we should make the initial element contain the data for the actual element parent. We should add a unit test in html5ever for this case, as well as create a new WPT test for Servo.

@jdm jdm changed the title innerHTML on a non-JS script tag escapes tag characters innerHTML on a script tag escapes tag characters Jan 12, 2017
@jdm
Copy link
Member Author

@jdm jdm commented Jan 12, 2017

Actually, there's a useful tests/wpt/web-platform-tests/html/syntax/serializing-html-fragments.html test that already exists, and contains a relevant case: <span><script type="test"><&></script></span>. However, it only checks innerHTML and outerHTML of the span, rather than the script directly. Let's add another case to this that gets rid of the outer span.

@tormeh
Copy link

@tormeh tormeh commented Jan 12, 2017

Can I claim this? I have no experience with Servo (except testing) and little with Rust, so if it needs to be done quickly then I'm probably not the correct person...

@jdm
Copy link
Member Author

@jdm jdm commented Jan 12, 2017

Please do! There's no rush, and the actual code changes will happen in the html5ever repository.

@jdm jdm added the C-assigned label Jan 12, 2017
@jdm
Copy link
Member Author

@jdm jdm commented Jan 12, 2017

To see how Servo uses the html5ever serializer, look at Element::GetInnerHTML and related code in the HTML parser serialize implementation.

@tormeh
Copy link

@tormeh tormeh commented Jan 15, 2017

Hmm... This bug is tagged with "C - has test", but when I run "cargo test" all the tests pass. Is the test wrong? And the code says that the contents of a script tag should not be escaped (i.e. let script = false;), while this thread and the title says they are. What am I missing?

@jdm
Copy link
Member Author

@jdm jdm commented Jan 15, 2017

The has test label means that the issue contains a minimal code sample that reproduces the problem. It still needs to be turned into a testcase that will run as part of ./mach test-wpt (as suggested in this comment).

@jdm
Copy link
Member Author

@jdm jdm commented Jan 15, 2017

This comment explains the case in which the code does not operate correctly and chooses to escape the element contents when it should not.

@jdm
Copy link
Member Author

@jdm jdm commented Feb 2, 2017

Have you encountered any problems with fixing the issue, @tormeh?

@tormeh
Copy link

@tormeh tormeh commented Feb 3, 2017

No, well, I'm new to Servo and Rust and inexperienced in HTML, git and Github. So even though the bug itself is easy, everything around it really slows me down. Which is frustrating, which slows me down further, since the world has plenty of non-frustrating things to do. But I also learn quite a bit.

@jdm
Copy link
Member Author

@jdm jdm commented Feb 18, 2017

#15148 ran into some problems due to the master of html5ever being different from the version that Servo was using. This is now available for anybody to work on again.

@jdm jdm removed the C-assigned label Feb 18, 2017
@seanlee31
Copy link

@seanlee31 seanlee31 commented Feb 18, 2017

Hi there! I would like to work on this bug. I'm new to bugzilla/servo and may need some additional help.

I've looked into the code and wants to confirm if the main problem was caused by the return value for innerHTML from parent.

Thanks!

@seanlee31
Copy link

@seanlee31 seanlee31 commented Feb 18, 2017

Also,

is the value of the node equal to <script type="text/template"><!-- hi --></script><script>console.log(document.querySelector('script').innerHTML)</script> when traversal_scope = IncludNode, and equals to <!-- hi --> when traversal_scope = ChildrenOnly?

@jdm
Copy link
Member Author

@jdm jdm commented Feb 18, 2017

That is correct.

@seanlee31
Copy link

@seanlee31 seanlee31 commented Feb 19, 2017

I've read thru past comments from the previous developer but not sure where to begin with. Also, the problem was fixed and started to occur again due to the inconsistency in html5ever that were being used?

Thank you for your help!

@jdm
Copy link
Member Author

@jdm jdm commented Feb 19, 2017

No, the problem mentioned was in trying to use a local clone of html5ever in Servo, which is required in order to modify it and see the changes in Servo.

@jdm
Copy link
Member Author

@jdm jdm commented Feb 19, 2017

First: add the test that demonstrates the problem. Second: get a local clone of html5ever working. Third: fix the problem in html5ever and make the test pass.

@seanlee31
Copy link

@seanlee31 seanlee31 commented Feb 19, 2017

Okay I'll add some testcases to my local servo clone just like @tormeh did. But I'm not sure how to interact the testcases in local servo clone with the local clone of html5ever you mentioned since they're two different repos?
Also, can you please assign me to this bug?
Thanks!

@jdm
Copy link
Member Author

@jdm jdm commented Mar 29, 2017

Sounds good!

@jdm
Copy link
Member Author

@jdm jdm commented Apr 12, 2017

@moonlightdrive Did you make any progress here?

@o0Ignition0o
Copy link
Contributor

@o0Ignition0o o0Ignition0o commented Apr 13, 2017

Heyh @jdm, sorry for the late response (I've somehow muted the repo even though i didn't want to) Please consider me if the issue gets unassigned again !

@moonlightdrive
Copy link
Contributor

@moonlightdrive moonlightdrive commented Apr 14, 2017

I added the tests and have servo building with a local clone of html5ever.

The problem I am facing now is finding where in html5ever I need to make the change. I thought the serialize function in html5ever/src/rcdom.rs seemed reasonable, but changes here have had no effect.

Additionally I am not clear on the difference between html5ever and xml5ever (or if that's even relevant to this issue)

@Ygg01
Copy link

@Ygg01 Ygg01 commented Apr 14, 2017

Hi @moonlightdrive I wrote xml5ever, I'll try to help as much as possible!

Basically html5ever and xml5ever are really, really similarly looking. Think of it as twin library. I tried to make sure that things are generally located in the same space (assuming you start from project root).

For example html5ever/src/rcdom.rs, in xml5ever is located in xml5ever/src/rcdom.rs.

@jdm
Copy link
Member Author

@jdm jdm commented Apr 14, 2017

I linked to the appropriate serializer implementation in one of the earlier comments: https://github.com/servo/html5ever/blob/master/src/serialize/mod.rs

@jasonwilliams
Copy link
Contributor

@jasonwilliams jasonwilliams commented Apr 21, 2017

Looks like this is also a cause of #16532

@KiChjang
Copy link
Member

@KiChjang KiChjang commented Apr 21, 2017

@moonlightdrive Are you still working on this?

@moonlightdrive
Copy link
Contributor

@moonlightdrive moonlightdrive commented Apr 21, 2017

Hi, I am still working on it. Sorry I've been so slow. If this issue needs to be fixed soon I am happy to give it up and maybe take a look at another one.

@jdm
Copy link
Member Author

@jdm jdm commented Apr 29, 2017

Still making progress @moonlightdrive? Do you have any questions that we can answer?

@moonlightdrive
Copy link
Contributor

@moonlightdrive moonlightdrive commented May 2, 2017

I definitely have a question, or at least some confusion, but think I need to poke around a bit more to know exactly what it is.

As far as I understand, the difference between the treatment of innerHTML and outerHTML is that the former is serialized with ChildrenOnly and the latter with IncludeNode. And this bug arises due to the fact that in the ChildrenOnly case, we didn't push that script node.. so when we check the parent to see if we need to escape, we've essentially discarded that information. As has already been said.

But I don't see, in start_elem (where there are 2 instances of nodes being pushed to the stack) how the IncludeNode and ChildrenOnly cases are being treated differently.

Another question: what is happening here? Is this where script is pushed in the outerHTML case?

Not sure if I've been clear with my questions. Please let me know and I'll try to clarify.

@jdm
Copy link
Member Author

@jdm jdm commented May 2, 2017

@moonlightdrive I think you're looking too deeply here. start_elem is responsible for serializing a single element, and leaving a record in the stack member of the element that was just serialized. It doesn't know (or care) about whether it's invoked as part of ChildrenOnly or IncludeNode.

As to your second question, yes, through invoking start_elem there in the IncludeNode case (as used by outerHTML), the stack ends up with a meaningful entry for the outer <script> tag.

@jdm
Copy link
Member Author

@jdm jdm commented May 2, 2017

Looking more carefully, I think I see why you're asking those questions. I think the right thing to do might be to modify the html5ever serialize method to push a meaningful stack entry if the traversal scope is ChildrenOnly, or possibly even modify Serializer::new method to do that.

@jdm
Copy link
Member Author

@jdm jdm commented May 23, 2017

@moonlightdrive Are you still working on this?

@moonlightdrive
Copy link
Contributor

@moonlightdrive moonlightdrive commented May 25, 2017

I am actually a little unclear on where (which repository) to make this change. I have tried the following:

I tried making the fix by changing the Serializer::new method in html5ever. I am not sure how to get the tagname for the html_name field, without importing types from servo. It seems like html5ever is a more general library that should not depend on servo, correct?

Also took a look at pushing to the stack in the serialize method when the traversal scope is ChildrenOnly (implemented in servo repo). Thought I need parent method (to check if the None value should replaced by the correct thing). parent is private. I tried to make it public (would this be undesirable?) but ran into an issue of "can't leak private types" (have done some googling but am still new to rust and don't quite understand why this occurs).

Not sure if I've misunderstood how to implement this.

Thank you

@jdm
Copy link
Member Author

@jdm jdm commented May 29, 2017

Sorry for the delay in responding. Good investigation! You're right that html5ever cannot depend on Servo (especially since Servo depends on html5ever). The error about private types occurs because a public function would now have a private type in its public interface; the private type in this case is ElemInfo which is defined in the same file.

As for the optimal solution, I think I have a useful idea for you. Let's make the ChildrenOnly variant of TraversalScope have a Option<LocalName> field. Serializer::new can look at the traversal_scope member of the options that receives and extract the name if it's present, and use that for the initial ElemInfo value. Am I making sense?

@moonlightdrive
Copy link
Contributor

@moonlightdrive moonlightdrive commented May 31, 2017

Yeah, that makes sense! I'll take a look at implementing that solution. Thanks :)

@moonlightdrive
Copy link
Contributor

@moonlightdrive moonlightdrive commented Jun 9, 2017

I think what I actually need is a function that's part of the Serializer trait, which will push/pop elements to the stack, without doing any of the writes that start_elem or end_elem do.

I believe (and don't have a great grasp of Rust yet) that this function should have a signature like

// i don't know what to name this fn yet
fn start_elem_without_write<T>(&mut self, elem: T);

and that the implementation of this function for HtmlSerializer should be something like

fn start_elem_without_write<ElemInfo>(&mut self, elem: ElemInfo) {
  self.stack.push(elem);
}

but this is evidently not right. The error I am getting is: expected struct 'serialize::ElemInfo', found type parameter. I guess this type parameter is ElemInfo? (Based on a compiler note: expected type serialize::ElemInfo found type ElemInfo) I don't understand why this is a type parameter rather than a thing that is as struct.

moonlightdrive added a commit to moonlightdrive/html5ever that referenced this issue Jun 14, 2017
Add functions for pushing/popping nodes without performing writes.
This is necessary so that these nodes exist on the stack when we
serialize the children nodes and may need the parent for context.
Addresses servo/servo#14975
moonlightdrive added a commit to moonlightdrive/html5ever that referenced this issue Jun 14, 2017
Add functions for pushing/popping nodes without performing writes.
This is necessary so that these nodes exist on the stack when we
serialize the children nodes and may need the parent for context.
Addresses servo/servo#14975
@jdm jdm added the C-has open PR label Jun 20, 2017
bors-servo added a commit to servo/html5ever that referenced this issue Aug 23, 2017
change type ChildrenOnly to ChildrenOnly(Option<QualName>)

This is necessary so that these nodes exist on the stack when we
serialize the children nodes and may need the parent for context.
Addresses servo/servo#14975

---
Opening this PR for feedback/review

~~This does work and solves the above bug. I'm not sure if this is the best implementation, as there's a bit of duplicated code. I thought about adding a boolean flag to `start_elem` that would determine whether the writes should be performed. But then I thought maybe when not writing, it's best not to call a function returning `io::Result<()>`?~~

Also not 100% sure if the xml implementation is the right thing to do

Anyway, please let me know of any improvements/changes to make. Thanks :)

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/html5ever/277)
<!-- Reviewable:end -->
@jdm jdm mentioned this issue Oct 4, 2017
4 of 4 tasks complete
bors-servo added a commit that referenced this issue Oct 6, 2017
Serializing childrenonly

Rebased from #17896.

---
- [x] `./mach build -d` does not report any errors
- [x] `./mach test-tidy` does not report any errors
- [x] These changes fix #14975 (github issue number if applicable).
- [x] There are tests for these changes

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/18747)
<!-- Reviewable:end -->
bors-servo added a commit that referenced this issue Oct 7, 2017
Serializing childrenonly

Rebased from #17896.

---
- [x] `./mach build -d` does not report any errors
- [x] `./mach test-tidy` does not report any errors
- [x] These changes fix #14975 (github issue number if applicable).
- [x] There are tests for these changes

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/18747)
<!-- Reviewable:end -->
bors-servo added a commit that referenced this issue Oct 9, 2017
Serializing childrenonly

Rebased from #17896.

---
- [x] `./mach build -d` does not report any errors
- [x] `./mach test-tidy` does not report any errors
- [x] These changes fix #14975 (github issue number if applicable).
- [x] There are tests for these changes

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/18747)
<!-- Reviewable:end -->
bors-servo added a commit that referenced this issue Oct 9, 2017
Serializing childrenonly

Rebased from #17896.

---
- [x] `./mach build -d` does not report any errors
- [x] `./mach test-tidy` does not report any errors
- [x] These changes fix #14975 (github issue number if applicable).
- [x] There are tests for these changes

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/18747)
<!-- Reviewable:end -->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

9 participants
You can’t perform that action at this time.