New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use new parse5 streaming interface #1316
base: main
Are you sure you want to change the base?
Conversation
This doesn't use the new JS capabilities parse5 got, but gives us the new location info that was added. |
lib/jsdom/browser/htmltodom.js
Outdated
@@ -237,7 +267,26 @@ function setChild(core, parent, node) { | |||
return null; | |||
} | |||
|
|||
newNode[locationInfo] = node.__location; | |||
if (node.__location) { | |||
newNode[locationInfo] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not too sure about this. This is mainly needed for backwards compat because start/end got renamed to (start|end)Offset. We could just alias them, or major ourselves...
I'd like to be a bit more ambitious than this. We should do a major version bump and:
|
Sounds good to me, will do that. |
74e4e49
to
e2c9f17
Compare
Finished up our document adapter, however there're a few issues there, one of them being that we don't handle scripts correctly now, not sure if there are others as well. I think parse5 creates a script, appends it to the dom and inserts text after, but we don't notice the appended text. The HTML spec describes how to handle parser inserted scripts correctly - I hope that's do-able with the jsdom-parse5 interface(s). |
046375f
to
5803c7e
Compare
Any updates on this? |
4654671
to
b7b7e74
Compare
@sykaeh It's the next thing I'll tackle, the node rewrite got in the way, but I'll have to re-implement scripts for this to work and that (or a different part afterwards) might reveal some more issues I haven't thought of yet. |
setChild(this.core, element, parsed[i]); | ||
const instance = new this.parser.ParserStream({ | ||
treeAdapter: adapter(element), | ||
locationInfo: true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would recommend to not enable locationInfo
by default. It has significant impact on performance and since it's not commonly used, I suggest to make it optional and disabled by default in jsdom.
Wishlist: allow "invoking the fragment parsing algorithm with text as markup, and context as the context element" from other parts of jsdom. See #1355. Maybe requires parse5 support, haven't checked. |
00c0f4b
to
d1aa072
Compare
d1aa072
to
7d4c68a
Compare
Our resource loader is currently killing my progress - it's way too tightly integrated with our delay-load-event logic, but is also the only place which contains our logic on how to correctly resolve urls to either files or urls and how to handle a custom fetch callback :( Will need to work around it or clean it up... |
2cbcd85
to
07a18f7
Compare
During script execution, the document wasn't actually loaded yet, but our old parser implementation loaded the whole document before executing scripts.
07a18f7
to
333ca27
Compare
e8c5051
to
2d71252
Compare
Any progress? |
Fixes #1778. This changes the public API in that the nodeLocation() functions now return differently-structured objects. parse5 has also removed support for serialization of obsolete doctypes. #1316 is a more ambitious attempt to use parse5's new streaming interface to get correct script execution semantics, but is still not ready; this is just a minimal upgrade to the latest parse5 version.
An update: parts of the cleanup is now being developed in #2098. Resource loader is still there however, for now. |
This changes the HTML parsing infrastructure, in particular how we interface with parse5. The original motivation for this change was increased performance; the new benchmark, which appends 65535 <tr>s, gets a speedup of ~3x from these changes. Other user-visible changes to script evaluation might be present. In particular, the tests revealed that one document.write-related test that used to somehow pass now fails, while another one succeeds (at least on Node 8). In most cases, however, behavior should be the same. While here, we also changed style sheets to reevaluate their rules and update styleEl.sheet or linkEl.sheet appropriately when their child text content changed. Behind the scenes details: - We now use a parse5 tree adapter directly for parsing, instead of using the htmlparser2 adapter layer (which had an O(n) insertion cost for new siblings) - SAX XML parsing code has been simplified by no longer being shared with parse5/htmlparser2 parsing code. - Nodes no longer have a reference to the "core" god-object. This was only used in a couple places, and was error prone because this reference was not available in cases such as document nodes created via the Document constructor. This removes a lot of code that threaded the object throughout everything. - We continued to use hacky workarounds for script evaluation, during parsing and elsewhere. Perhaps one day, inspired by #1316 and #1920, we can fix these.
Fixes #1315.
ToDo:
styleSheets
insertion timing (before / after load)styleSheets
) when dom mutation events are emitteddocument.write
s