Creating a consistent DOM snapshot (including iframes)? #3658

Closed
tolmasky opened this issue Dec 12, 2018 · 4 comments

Comments

@tolmasky

I would like to take a "snapshot" of the DOM (minus scripts) that includes iframe contents, ideally all captured at the same time. By that I mean I would prefer not to climb the iframe tree through promises in the top-level script (mainFrame().childFrames()...), which could result in different parts of the snapshot being taken at different times. Is there a way with the current API to run one long blocking operation that does a "deep" outerHTML of sorts on a page (replacing each iframe with a div equivalent containing its associated HTML) and removes all scripts of any sort? If not, is this something that could be considered for Chromium to enable through the protocol? Essentially, I want something similar to Safari's web archive feature, but generating one big string.

@aslushnikov
Contributor

Hey @tolmasky,

You can capture an MHTML snapshot of the page using the Page.captureSnapshot DevTools protocol method. The MHTML gives you the whole page, iframes included, in a single file. The capture may still be somewhat asynchronous, given Chrome's out-of-process iframe architecture, but that's the closest we have in the protocol.
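For reference, here is a minimal sketch of calling this method from Puppeteer through a raw CDP session. It assumes `page` is a Puppeteer `Page` that has already finished loading; `captureMhtml` is a hypothetical helper name, not part of Puppeteer.

```javascript
// Hypothetical helper: returns the MHTML archive of `page` as a single string.
// Assumes `page` is a Puppeteer Page that has already navigated somewhere.
async function captureMhtml(page) {
  // Open a raw DevTools protocol session attached to the page's target.
  const client = await page.target().createCDPSession();
  try {
    // Page.captureSnapshot resolves with { data }, the serialized archive.
    const { data } = await client.send('Page.captureSnapshot', { format: 'mhtml' });
    return data;
  } finally {
    // Always release the session, even if the capture fails.
    await client.detach();
  }
}
```

Something like `const mhtml = await captureMhtml(page);` after `page.goto(...)` should give back the archive as one string, which can then be written to a `.mhtml` file.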

Will this work for you?

cc +@psybuzz

@tolmasky
Author

Hi @aslushnikov,

Thanks for the quick reply. Page.captureSnapshot certainly sounds like what I want, at least conceptually, although some questions remain about the details. As for asynchronicity, I think that should be sufficient, at least for my current use (my main concern was having to traverse these resources myself at the application level, which seems like it would be a lot slower and leave more opportunities for weird out-of-sync issues).

I tried looking online a bit for a good description of the internals of MHTML, but haven't found anything super definitive yet (would certainly appreciate a link if it exists!). With regard to iframes and shadow DOM, how are they "translated" into the single file format? Is the iframe turned into a div (with appropriate overflows, etc. to simulate a replaced element)? Similarly, is the shadow DOM merely inlined along with the rest of the DOM? And additionally, are scripts simply discarded? (This would be ideal in my case, since the way I plan on showing the snapshots later, I would want them to be "dead".)

@aslushnikov
Contributor

> I tried looking online a bit for a good description of the internals of MHTML, but haven't found anything super definitive yet (would certainly appreciate a link if it exists!).

Check this out: https://goo.gl/GYT7Br

> With regard to iframes and shadow DOM, how are they "translated" into the single file format?

Iframes are serialized as iframes. Shadow DOM gets a special "shadowmode" attribute, which AFAIK is Chrome-specific, so saving a page with shadow DOM as MHTML and opening it in Firefox will not work.

> And additionally, are scripts simply discarded

Yes, this seems to be the case.

Beware though: MHTML is experimental. I'd try playing around with it to see how it works.
I'll close this for now since we got you something to research.
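
One way to play around with a captured archive is to list its MIME parts, since MHTML is a multipart MIME document. The sketch below is a rough inspection aid, not a full MIME parser: `listMhtmlParts` is a hypothetical name, part bodies are not decoded, and it assumes part headers are not folded across lines. If each frame's document shows up as its own `text/html` part (an assumption worth checking against real output), the list makes that easy to see.

```javascript
// Hypothetical helper: lists the MIME parts of an MHTML string.
// Only headers are read; bodies (often quoted-printable) are left untouched.
function listMhtmlParts(mhtml) {
  // The top-level Content-Type header declares the multipart boundary.
  const match = /boundary="?([^";\r\n]+)"?/i.exec(mhtml);
  if (!match) throw new Error('No multipart boundary found');
  const delimiter = '--' + match[1];

  return mhtml
    .split(delimiter)
    .slice(1) // drop the top-level headers before the first delimiter
    .filter((chunk) => !chunk.startsWith('--')) // drop the closing marker
    .map((chunk) => {
      // A part's headers end at the first blank line.
      const [rawHeaders] = chunk.replace(/^\r?\n/, '').split(/\r?\n\r?\n/);
      const headers = {};
      for (const line of rawHeaders.split(/\r?\n/)) {
        const idx = line.indexOf(':');
        if (idx > 0) {
          headers[line.slice(0, idx).trim().toLowerCase()] = line.slice(idx + 1).trim();
        }
      }
      return {
        type: headers['content-type'] || null,
        location: headers['content-location'] || null,
      };
    });
}
```

Running this on a capture and searching the part bodies for `<script` is a quick way to check the iframe and script behavior described above on your own pages.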

@gildas-lormeau

gildas-lormeau commented Apr 14, 2019

@tolmasky It may be too late, but it looks like SingleFile would fulfill your needs. More info here: https://github.com/gildas-lormeau/SingleFile/tree/master/cli. For your information, SingleFile serializes shadow DOM elements into iframes. It's far from perfect, but it works well for embedded tweets today.

edit: it now serializes them into templates (instead of iframes) and adds a small script to attach them to the shadow root.
