Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use new parse5 streaming interface #1316

Open
wants to merge 11 commits into
base: main
Choose a base branch
from
Open

Use new parse5 streaming interface #1316

wants to merge 11 commits into from

Conversation

Sebmaster
Copy link
Member

Fixes #1315.

ToDo:

  • Add stylesheet removal test
  • Add test for link styleSheets insertion timing (before / after load)
  • Test that stylesheets are already added/removed (in styleSheets) when dom mutation events are emitted
  • Fix script insertion
  • Add test for incomplete tag document.writes

@Sebmaster
Copy link
Member Author

This doesn't use the new JS capabilities parse5 got, but gives us the new location info that was added.

@@ -237,7 +267,26 @@ function setChild(core, parent, node) {
return null;
}

newNode[locationInfo] = node.__location;
if (node.__location) {
newNode[locationInfo] = {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not too sure about this. This is mainly needed for backwards compat because start/end got renamed to (start|end)Offset. We could just alias them, or major ourselves...

@domenic
Copy link
Member

domenic commented Dec 9, 2015

I'd like to be a bit more ambitious than this. We should do a major version bump and:

  • Stop using parse5's htmlparser2 adapter, thus solving problems mentioned in Things that are awkward in jsdom right now in 1.5 inikulin/parse5#78. This should remove a lot of the code where we construct the tree manually; if we just provide a nice parse adapter to parse5, it should be able to construct it for us.
  • Remove support for all parsers except sax and parse5. Figure out the best way to change the parser option so that if statements are minimal. If people want to use other parsers they must ape the parse5 or sax interface. E.g. maybe we require HTML parsers to have parse, parseFragment, serialize, serializeFragment, and ParserStream?? We can remove the parser option until we switch parse5's new interface, if that makes it easier.
  • Clean up dead code where we are patching up old parsers, e.g. parseDoctype.

@Sebmaster
Copy link
Member Author

Sounds good to me, will do that.

@Sebmaster
Copy link
Member Author

Finished up our document adapter, however there're a few issues there, one of them being that we don't handle scripts correctly now, not sure if there are others as well. I think parse5 creates a script, appends it to the dom and inserts text after, but we don't notice the appended text. The HTML spec describes how to handle parser inserted scripts correctly - I hope that's do-able with the jsdom-parse5 interface(s).

@Sebmaster Sebmaster force-pushed the feature/new-parse5 branch 7 times, most recently from 046375f to 5803c7e Compare December 13, 2015 20:24
@sykaeh
Copy link

sykaeh commented Jan 14, 2016

Any updates on this?

@Sebmaster Sebmaster force-pushed the feature/new-parse5 branch 2 times, most recently from 4654671 to b7b7e74 Compare January 15, 2016 01:55
@Sebmaster
Copy link
Member Author

@sykaeh It's the next thing I'll tackle, the node rewrite got in the way, but I'll have to re-implement scripts for this to work and that (or a different part afterwards) might reveal some more issues I haven't thought of yet.

setChild(this.core, element, parsed[i]);
const instance = new this.parser.ParserStream({
treeAdapter: adapter(element),
locationInfo: true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would recommend to not enable locationInfo by default. It has significant impact on performance and since it's not commonly used, I suggest to make it optional and disabled by default in jsdom.

@domenic
Copy link
Member

domenic commented Jan 15, 2016

Wishlist: allow "invoking the fragment parsing algorithm with text as markup, and context as the context element" from other parts of jsdom. See #1355. Maybe requires parse5 support, haven't checked.

@Sebmaster
Copy link
Member Author

Our resource loader is currently killing my progress - it's way too tightly integrated with our delay-load-event logic, but is also the only place which contains our logic on how to correctly resolve urls to either files or urls and how to handle a custom fetch callback :(

Will need to work around it or clean it up...

During script execution, the document wasn't actually loaded yet, but our old parser implementation loaded the whole document before executing scripts.
@MeirionHughes
Copy link

Any progress?

@Zirro Zirro mentioned this pull request May 13, 2017
domenic pushed a commit that referenced this pull request May 14, 2017
Fixes #1778.

This changes the public API in that the nodeLocation() functions now return differently-structured objects. parse5 has also removed support for serialization of obsolete doctypes.

#1316 is a more ambitious attempt to use parse5's new streaming interface to get correct script execution semantics, but is still not ready; this is just a minimal upgrade to the latest parse5 version.
@TimothyGu
Copy link
Member

An update: parts of the cleanup is now being developed in #2098. Resource loader is still there however, for now.

domenic added a commit that referenced this pull request Jan 22, 2018
This changes the HTML parsing infrastructure, in particular how we interface with parse5. The original motivation for this change was increased performance; the new benchmark, which appends 65535 <tr>s, gets a speedup of ~3x from these changes.

Other user-visible changes to script evaluation might be present. In particular, the tests revealed that one document.write-related test that used to somehow pass now fails, while another one succeeds (at least on Node 8). In most cases, however, behavior should be the same.

While here, we also changed style sheets to reevaluate their rules and update styleEl.sheet or linkEl.sheet appropriately when their child text content changed.

Behind the scenes details:

- We now use a parse5 tree adapter directly for parsing, instead of using the htmlparser2 adapter layer (which had an O(n) insertion cost for new siblings)
- SAX XML parsing code has been simplified by no longer being shared with parse5/htmlparser2 parsing code.
- Nodes no longer have a reference to the "core" god-object. This was only used in a couple places, and was error prone because this reference was not available in cases such as document nodes created via the Document constructor. This removes a lot of code that threaded the object throughout everything.
- We continued to use hacky workarounds for script evaluation, during parsing and elsewhere. Perhaps one day, inspired by #1316 and #1920, we can fix these.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants