DOMParser within ServiceWorkers #846

markuskobler · 2016-03-15T19:57:30Z

I'm trying to figure out the best way to parse HTML in response to a fetch request. In my case, I want to work out which dependencies the HTML has so I can cache those assets as well.

Is the only way to do this currently a horrendously complicated regex, or am I missing something obvious?

Or, to put it another way, what would be the downside of exposing DOMParser to the ServiceWorker scope?

annevk · 2016-03-16T07:43:56Z

DOM implementations in browsers are not thread-safe. And don't use a regexp to parse HTML. You need to write an HTML parser if you want to do that.

markuskobler · 2016-03-16T08:48:59Z

Ok, that's useful to know. Is that the technical reason why document types are supported on XHR and not on fetch Response?

domenic · 2016-03-16T12:49:26Z

They aren't supported in web workers.

parse5 works in web workers, as do, in fact, large parts of the jsdom project (although that is a much larger dependency).

annevk · 2016-03-16T13:17:01Z

@markuskobler the reason fetch() doesn't support nodes is mostly because we didn't think a potential fetch module requiring all of DOM is a reasonable proposition. It's not really related to the DOM not being available in workers.

markuskobler · 2016-03-18T11:28:22Z

So in my case, I want the service worker to have a better understanding of the HTML it's caching so it can make a call on which of its dependencies should also be cached. I would have assumed this to be a common use case?

I only mentioned the DOMParser, not because I'm attached to that API, but because this feels like a core responsibility of the browser.

wanderview · 2016-03-18T16:54:20Z

If you want this done on the client I think you should just use read-through-caching. You can then just use <link rel="prefetch"> for resources you want to aggressively load/cache.

Alternatively, you can have your server pre-compute the list of resources and cache them in your service worker install event.

I think trying to parse HTML on the fly would be the least efficient way to go here.

Would either of those solutions work for you?
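
For illustration, the read-through caching idea mentioned above could be sketched roughly as follows. This is a minimal sketch and not code from the thread: `CacheLike`, `Fetcher` and `readThrough` are hypothetical names, and the cache and network are passed in as parameters so the logic can run outside a service worker.

```typescript
// Minimal structural types (assumptions for this sketch). In a real service
// worker these roles would be played by a Cache object and fetch().
interface CacheLike<Req, Res> {
  match(request: Req): Promise<Res | undefined>;
  put(request: Req, response: Res): Promise<void>;
}

type Fetcher<Req, Res> = (request: Req) => Promise<Res>;

// Read-through caching: answer from the cache when possible, otherwise go to
// the network and store a copy for next time.
async function readThrough<Req, Res>(
  cache: CacheLike<Req, Res>,
  fetcher: Fetcher<Req, Res>,
  request: Req,
  // A Response body can only be read once, so callers would pass
  // (r) => r.clone() here; the default is fine for plain values.
  clone: (response: Res) => Res = (r) => r,
): Promise<Res> {
  const cached = await cache.match(request);
  if (cached !== undefined) return cached;

  const response = await fetcher(request);
  await cache.put(request, clone(response));
  return response;
}

// Possible wiring inside a service worker (cache name "v1" is made up):
//
// self.addEventListener("fetch", (event) => {
//   event.respondWith(
//     caches.open("v1").then((cache) =>
//       readThrough(cache, fetch, event.request, (r) => r.clone())),
//   );
// });
```

A real handler would probably also skip non-GET requests and failed responses before storing anything; that is left out here to keep the sketch short.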

inian · 2016-07-08T22:50:59Z

I am looking to parse HTML received by the SW to replace links to individual JS files with a single link that combines all of them, for performance reasons, and then send it to the browser for parsing. I am thinking the performance gain from fewer RTTs might offset the extra overhead of running an HTML parser within it. Having a DOMParser in this context would be useful.

Is this a reasonable thing to do within a SW?

jakearchibald · 2016-07-09T06:19:38Z

A better solution here would be to use HTTP/2, where the overhead of multiple requests is low.

inian · 2016-07-09T14:28:12Z

Hey Jake,
I understand that, but I was thinking of other optimisations too, say injecting preload tags for fonts and other optimisations done purely on the client-side HTML.
Plus, this would be a solution that we can use right now, before CDNs start adopting HTTP/2 more widely.

annevk · 2016-07-09T15:36:10Z

You cannot use it right now if we need to make DOM thread-safe first…

jakearchibald · 2016-07-09T15:42:23Z

If you're wanting to trigger preload earlier, I think the options discussed in #920 are better.

If you have to download the whole doc to inject a preload, I think you'll lose performance rather than win.

delapuente · 2016-07-15T16:20:12Z

@annevk I think @inian does not want to access the DOM, he is simply suggesting to use some kind of parser to get the body of the response as structured data instead of dealing with strings.

RReverser · 2016-07-15T18:00:53Z

@delapuente The thing is that DOMParser creates real DOM nodes (hence the name), so it's not that easy to split functionality of one from the other. Much easier would be to use some external HTML5 parser (like parse5 as per @domenic's suggestion).

Moreover, if you want to use an HTML parser for the content you're getting from the server, you will likely want a streaming parser (which DOMParser is not) that would play well with the Streams API where and when it's available, as otherwise the Service Worker will become a bottleneck and not a source of optimizations.

inian · 2016-07-15T18:18:07Z

Ah yes, it looks like using an (external) streaming HTML parser along with the Streams API would be better for things like injecting the preload tag and so on. That way, I don't need to wait for the entire document to download.
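
To illustrate why streaming helps here, the sketch below forwards text chunk by chunk and injects a snippet as soon as a marker has been seen, without waiting for the whole document. It is only a rough illustration under stated assumptions: `StreamInjector` is a hypothetical helper, and it uses plain case-sensitive string matching rather than HTML parsing, so it would be fooled by the marker appearing inside a comment or script. A real streaming HTML parser is the robust route.

```typescript
// Injects `snippet` right after the first occurrence of `marker`, even when
// the marker is split across two chunks. Assumes `marker` is non-empty.
class StreamInjector {
  private pending = ""; // tail held back in case it starts a split marker
  private done = false; // true once the snippet has been injected
  private readonly marker: string;
  private readonly snippet: string;

  constructor(marker: string, snippet: string) {
    this.marker = marker;
    this.snippet = snippet;
  }

  // Feed one decoded text chunk; returns the text that is safe to forward.
  push(chunk: string): string {
    if (this.done) return chunk;

    const text = this.pending + chunk;
    const at = text.indexOf(this.marker);
    if (at !== -1) {
      this.done = true;
      this.pending = "";
      const end = at + this.marker.length;
      return text.slice(0, end) + this.snippet + text.slice(end);
    }

    // No match yet: keep back just enough characters to catch a marker
    // that straddles this chunk and the next one.
    const keep = Math.min(text.length, this.marker.length - 1);
    this.pending = text.slice(text.length - keep);
    return text.slice(0, text.length - keep);
  }

  // Call once at end of stream to release anything still held back.
  flush(): string {
    const rest = this.pending;
    this.pending = "";
    return rest;
  }
}

// Possible wiring with the Streams API (the preload URL is made up):
//
// const injector = new StreamInjector("<head>", '<link rel="preload" href="/font.woff2" as="font" crossorigin>');
// const body = response.body
//   .pipeThrough(new TextDecoderStream())
//   .pipeThrough(new TransformStream({
//     transform(chunk, controller) { controller.enqueue(injector.push(chunk)); },
//     flush(controller) { controller.enqueue(injector.flush()); },
//   }))
//   .pipeThrough(new TextEncoderStream());
```

Holding back at most `marker.length - 1` characters keeps the added latency per chunk tiny while still catching a marker that is cut in half by a chunk boundary.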

RReverser · 2016-07-15T18:35:15Z

@inian Yup (and parse5 has streaming mode).

As for preload specifically, you don't even need an HTML parser: instead of tags you can use the Link header to indicate the same intent, and a header is easy to add even without retrieving the response body.

inian · 2016-07-15T18:38:45Z

Thanks for clarifying that, @RReverser. Some of the people we are working with find it easier to add a script tag to their page than to mess with their servers, so if we could do all these optimisations using just a SW, it would be cool. We are working from that angle now.

Thinking about it, we could just add the Link header via the SW too.
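
As a rough sketch of that idea, the helpers below build `Link: <url>; rel=preload; as=...` values and append them to a headers object. The names `PreloadHint`, `HeadersLike`, `preloadLinkValue` and `addPreloadHints` are hypothetical, and the example URLs are made up. Whether a browser acts on a Link header set by a service worker is an assumption that would need checking in the target browsers.

```typescript
// One resource the page is expected to need soon.
interface PreloadHint {
  href: string;
  as: string; // e.g. "font", "script", "style"
  crossorigin?: boolean; // fonts are typically fetched in CORS mode
}

// The one Headers method this sketch relies on, so it also runs
// outside a browser.
interface HeadersLike {
  append(name: string, value: string): void;
}

// Formats a single hint as a Link header value.
function preloadLinkValue(hint: PreloadHint): string {
  const parts = [`<${hint.href}>`, "rel=preload", `as=${hint.as}`];
  if (hint.crossorigin) parts.push("crossorigin");
  return parts.join("; ");
}

// Appends one Link header per hint; the response body is never touched.
function addPreloadHints(headers: HeadersLike, hints: PreloadHint[]): void {
  for (const hint of hints) {
    headers.append("Link", preloadLinkValue(hint));
  }
}

// Possible use inside a fetch handler:
//
// const headers = new Headers(response.headers);
// addPreloadHints(headers, [{ href: "/fonts/a.woff2", as: "font", crossorigin: true }]);
// return new Response(response.body, {
//   status: response.status,
//   statusText: response.statusText,
//   headers,
// });
```

Because only headers change, the body can be passed through as a stream, which avoids the "download the whole doc first" cost raised earlier in the thread.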

mflux · 2017-01-19T00:16:11Z

I just bumped into this issue where, upon using THREE.js ColladaLoader2 (which uses DOMParser) to break open Collada files (3d models), I tried speeding up the load process by putting it through a web worker. To my surprise, this didn't work simply because DOMParser is not available from within web workers.

It's possible to hack ColladaLoader2 to use a DOMLoader alternative but that's just crazy. It's entirely reasonable for a web worker to be parsing things like DOM.

joeyparrish · 2017-05-30T18:43:02Z

I think for most purposes, any XML parser would suffice. It would not have to be DOMParser specifically, but depending on a pure JS XML parser seems like too much trouble.

RReverser · 2017-05-30T18:56:19Z

@joeyparrish No, no, please never use an XML parser to parse HTML. Despite the visual similarity, they have very different semantics (unless you specifically target XHTML and not HTML5).

joeyparrish · 2017-05-31T15:46:40Z

True, for HTML. I was thinking of a use case of my own where we use DOMParser to parse XML and would like to be able to do so from a service worker. Since the OP wants to parse HTML, please disregard my comment.

v1nce · 2021-07-22T22:40:22Z

IMO there are A LOT of possible uses for DOMParser in service workers.
My SW does (or tries to do) a lot of things in the fetch section ((un)zipping, OTF conversion of files not supported in the browser, text manipulation in HTML or XML), and this would be so much easier with DOMParser (and canvas).
So I don't see the point of arguing against them and telling people they should go for this or that workaround.
The only valid point is that the current DOM parser is not thread-safe.

jakearchibald · 2021-07-30T08:27:55Z

Right now the DOM is closely coupled with rendering, with things such as offsetWidth, CSS styles, getBoundingClientRect etc etc. Maybe in some future we could have a different representation of DOM that's lighter and not coupled to rendering which could be used in a service worker.

However, we shouldn't do this just for service workers; it would need to be a feature across all (or at least many) worker types.

Further discussion of this proposal should happen in https://github.com/whatwg/dom.

Stvad · 2022-04-29T03:10:43Z

Another use case: Chrome bans background pages in Manifest V3 extensions, making some set of use cases impossible (specifically here, I've been using DOMParser in a background page).

tuhuynh27 · 2022-08-31T05:37:55Z

> Another use-case: Chrome bans background pages in manifest v3 extensions, making some set of use-cases not possible (specifically here, I've been using DOMParser in background page)

Same here. How can I use DOMParser in a Service Worker for the Manifest V3 migration now? 😢

Stvad · 2022-08-31T06:31:33Z

The hack I ended up doing is having an iframe that I use as a fake background page. This does not work for all use cases, but it worked for me to run the DOM parsing on a separate thread from the rendering. See an example here: https://github.com/transclude-me/extension/tree/main/source/content/background-simulation

rektide · 2023-07-29T20:08:30Z

I agree this isn't a ServiceWorker specific need, but it seems like the conversation around this died with this ticket being closed.

Offscreen Documents do a lot more than just DOM (they are a full page), but I think we partly ended up with the special-purpose, one-off Offscreen Documents capability (w3c/webextensions#170), built specifically for Web Extensions, because the good and necessary ideas here failed to get addressed. Did anyone ever actually take any steps to get the DOM spec engaged on this topic after Jake closed this issue?

I'd also point to great articles like https://paul.kinlan.me/we-need-dom-apis-in-workers/ which again validate the general ask here. People really want a programmatic DOM. We don't really care about the rendering for a lot of our work. Shipping the JSDOM library again and again is a tell that we need help here.

jakearchibald · 2023-07-30T09:35:20Z

#846 (comment) is still relevant. This is the wrong venue for the discussion. Service workers haven't chosen to block DOM APIs, it's that DOM APIs haven't been spec'd to work in workers. If you want that, the discussion needs to happen in the repos relating to the DOM APIs, as it's the maintainers of those features that would need to make the change.

jakearchibald · 2023-07-30T11:55:44Z

If folks still aren't convinced, it's like observing that helicopters don't work underwater, and complaining to the sea.

v1nce · 2023-07-31T21:32:12Z

It's more like:
Users: Can we have the keys of the boat?
W3C: Sure.
U: How is the weather at sea today?
W: I don't check. I think there's some wind.
U: Is it safe to use the boat?
W: At sea? No, it is not.
U: Why?
W: Because of the fan!
U: The fan?
W: Yeah, it's a fan boat.
U: Why a fan?
W: To go in the bayou.
U: Why would anyone want to go in the bayou?
W: I do.
U: OK, can't we just remove the fan?
W: No, we can't.
U: Why?
W: Because it's more a fan than a boat.
U: Why do you call it a boat, then?
W: Because that's what we call a board with a fan on it that can go in the bayou.

jakearchibald · 2023-08-01T05:11:46Z

No, it's not like that: whatwg/dom#1217 (comment)

jakearchibald added this to the Future ideas milestone Jul 25, 2016

jakearchibald closed this as completed Jul 30, 2021

Stvad mentioned this issue Apr 29, 2022

Manifest v3 Migration transclude-me/extension#6

Closed

rektide mentioned this issue Jul 29, 2023

Proposal: Offscreen Documents for Manifest V3 w3c/webextensions#170

Open

rektide mentioned this issue Jul 29, 2023

[dom-parts] Tree structure of parts in the DOM WICG/webcomponents#992

Open

jakearchibald mentioned this issue Aug 1, 2023

Proposal: DOM APIs in web workers? whatwg/dom#1217

Open

bahrus mentioned this issue Aug 31, 2023

Support for HTML/XML stream parsing/rewriting. whatwg/dom#1222

Open
