Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement document.write #3704

Closed
kmcallister opened this issue Oct 16, 2014 · 18 comments
Closed

Implement document.write #3704

kmcallister opened this issue Oct 16, 2014 · 18 comments
Labels
A-content/bindings The DOM bindings B-interesting-project Represents work that is expected to be interesting in some fashion

Comments

@kmcallister
Copy link
Contributor

Part of #1879. See servo/html5ever#6.

https://html.spec.whatwg.org/multipage/#document.write()

@jdm
Copy link
Member

jdm commented Mar 26, 2015

Blocks #5398.

@jdm
Copy link
Member

jdm commented Apr 27, 2015

Blocks #5864.

@kmcallister kmcallister removed their assignment Jun 11, 2015
@SimonSapin
Copy link
Member

Aside from preserving unpaired surrogates (see #6564), I believe that html5ever at this point supports everything needed for document.write. The rest (as far as I understand ) should happen in Servo’s script crate.

@jdm jdm added E-more-complex Variable effort required; may require a mentor. Recommended solution is clearly described in the iss and removed E-more-complex Variable effort required; may require a mentor. Recommended solution is clearly described in the iss labels Jul 16, 2015
@jdm
Copy link
Member

jdm commented Jul 17, 2015

I'm not certain that's true - for example, I don't see any evidence of an insertion point in html5ever, so there's no way to ensure that we're feeding tokens into the tokenizer immediately rather than appending them to an existing queue. In particular, under "An end tag whose tag name is script", it says "Let the old insertion point have the same value as the current insertion point. Let the insertion point be just before the next input character." We'll need to add this functionality (querying and setting the insertion point) to html5ever, along with the ability to make the parser consume a single character and return control to the caller, then add the following to the HTML parsing code in libscript:

Finally we can implement a subset of the steps for document.write leaving stubs for the bits related to document unloading and when there's no active parser (ie. when we should call document.open).

@jdm jdm added the B-interesting-project Represents work that is expected to be interesting in some fashion label Jul 17, 2015
jdm added a commit to jdm/servo that referenced this issue Aug 8, 2015
@jdm
Copy link
Member

jdm commented Aug 8, 2015

I played with this for a few hours and sketched out what I think this should look like in https://github.com/jdm/servo/tree/documentwrite and https://github.com/jdm/html5ever/tree/write . The actual implementation of insertion points in html5ever is completely missing, but I believe the DOM bits are relatively complete.

@jdm
Copy link
Member

jdm commented Aug 8, 2015

I now see why @jgraham kept mentioning document.write and script scheduling in the same breath. This stuff is hella complex to parse out of the spec.

@frewsxcv
Copy link
Contributor

frewsxcv commented Dec 8, 2015

Is there any way the implementation could be split up into more reasonable chunks instead of an all-at-once ordeal? I understand there are downsides to this strategy, but would be nice to get the ball rolling on this.

@jdm
Copy link
Member

jdm commented Dec 8, 2015

I don't see how. It's fundamentally about interacting with the parser, and the parser APIs aren't present yet.

@eefriedman
Copy link
Contributor

It might be easier to implement the special case of document.write() in the case where there isn't any buffered input (writing into a document created with document.open()), compared to the general case; that way, you don't have to worry about the awkward interaction of inserting HTML into an existing input stream. Not especially interesting in terms of web compatibility, but it wouldn't require any special parser support.

@jdm
Copy link
Member

jdm commented Dec 8, 2015

Yeah, but what do we do for the non-easy case if we do that?

@eefriedman
Copy link
Contributor

Umm, throw out the "easy" implementation and put in a correct one? I see what you mean. :( But at least we'd have an implementation of document.open() and document.close().

@jgraham
Copy link
Contributor

jgraham commented Dec 8, 2015

I think document.open is low-value compared to document.write so I don't think there's any reason to make a throwaway implementation just to get that working without tackling the harder problem.

I agree that document.write is going to be tricky to get right and doesn't trivially break down into a small number of well-defined steps. At some point someone will just have to sit down and dedicate a block of time to it.

@nox
Copy link
Contributor

nox commented Oct 20, 2016

14:43 <nox> To implement document.write in Gecko, is the parser called in a reentrant way?
08:56 <hsivonen> nox: whether the parser is called re-entrantly depends on what you consider to be the "parser".
08:57 <hsivonen> nox: The tokenizer & tree builder are not re-entrant
08:59 <hsivonen> nox: for document.write, there are 3 tokenizer+treebuilder pairs:
08:59 <hsivonen> nox: 1) the pair that deals with data from the network (parser thread)
08:59 <hsivonen> nox: 2) the pair that deals with document.written data (main thread)
09:00 <hsivonen> nox: 3) the pair that parses the tail of the document.written data for script, CSS and image preloads when the previous one is blocked
09:01 <hsivonen> nox: any given tokenizer is non-re-entrant
09:02 <hsivonen> nox: oh, and pair 3 is on the main thread, too
09:03 <hsivonen> nox: tokenizer 1 is mTokenizer in nsHtml5StreamParser
09:03 <hsivonen> nox: tokenizer 2 is mTokenizer in nsHtml5Parser
09:04 <hsivonen> nox: tokenizer 3 is mDocWriteSpeculativeTokenizer in nsHtml5Parser
10:56 <nox> hsivonen: When parsing from the network resumes, how is the state of tok/tb pair (1) from the final state of the pair (2)?
10:58 <hsivonen> nox: pair 1 has speculated onwards. the final state of 2 is compared with a snapshot of the state of 1 at the start of speculation.
10:58 <hsivonen> nox: if there's a match, the speculation is valid and the speculative output that 1 has already generated gets shipped to the main thread
10:58 <nox> hsivonen: Ok, and if the speculation is wrong it's just reset to the state of 2 I guess?
10:59 <hsivonen> nox: if there's a mismatch, speculative output is discarded, input stream rewound and the state of 2 copied into 1
10:59 <nox> hsivonen: Basically, I'm looking at implementing Doc.write in html5ever for Servo, and I guess pretty much everything needs to be rewritten.
10:59 <hsivonen> nox: I recommend copying Gecko's architecture
10:59 <nox> hsivonen: Of course you do. ;)
11:00 <hsivonen> nox: but I wouldn't worry about surrogate pairs getting split across doc.write and network
11:00 <nox> https://developer.mozilla.org/en-US/docs/Mozilla/Gecko/HTML_parser_threading Is that up-to-date and correct?
11:00 <hsivonen> nox: yes
11:00 <nox> hsivonen: Oh yes we are not going to worry about that for now.
11:02 <nox> hsivonen: So the tokenizers never own the input stream, right?
11:02 <nox> Which sounds like the first thing to change in our parser...
11:03 <hsivonen> nox: nsHtml5StreamParser and nsHtml5Parser are IO drivers that own the linked list of buffers (network for the former, doc.write for the latter) that represent the input stream
11:03 <hsivonen> they push buffers to the tokenizers
11:04 <hsivonen> nox: the import thing is that when the > in </script> has been processed by the tokenizer, there's no stream lookahead or stuff like that baked into the tokenizer state
11:05 <nox> hsivonen: Yeah, so basically we are in a difficult place. When the tokenizer processes </script> in our stuff, it calls a trait method that the user of the parser implements to prepare a script,
11:05 <nox> so the caller of the script prepare is the tokenizer itself and then everything goes to hell.
11:06 <hsivonen> nox: how do you represent the input stream currently?
11:06 <nox> hsivonen: List of buffers, but it's owned by the Tokenizer, sort of.
11:07 <nox> If the parser is suspended, some script stuff also owns the buffers from the network,
11:07 <nox> because you can't feed the tokenizer without it processing as much as it can.
11:08 <hsivonen> nox: the tokenizer needs to be able to suspend itself upon processing the > in </script>
11:09 <hsivonen> nox: as in, it helps if the tokenizer gets off the call stack at that point and some thingy that's still on the stack owns the rest of the input stream and takes care of executing the script
11:09 <nox> Yeah it does that, but if you feed it again some buffer, it will resume on its own, that's why the owner of the tokenizer in script sometimes holds onto the data received from the network.

@nox
Copy link
Contributor

nox commented Oct 20, 2016

AFAIK, the first needed step is to make h5e not own the input buffer at all. So my first intuition was to feed chunks one by one, but that doesn't fix the issue that h5e needs to give back the input it was fed in the case it encounters a blocking script.

@nox
Copy link
Contributor

nox commented Oct 20, 2016

AFAICT, your branch would never work, and would panic there because the parser is already borrowed mutably, right?

@jdm
Copy link
Member

jdm commented Oct 20, 2016

Maybe! You've clearly put thought into it more recently than I have.

@nox
Copy link
Contributor

nox commented Oct 20, 2016

@jdm Is there a high-level description of our parsing entry point and how it interacts with the data coming from the network? I understand how h5e works when fed data, but I don't have yet a clear picture of how it is started, how it is owned, etc. Would be great if you or someone else could detail that, I think that would help me for this.

@jdm
Copy link
Member

jdm commented Oct 20, 2016

When the HTTP headers are received, ParserContext::headers_available optionally feeds the parser a synthetic response body and disregards future response content. Each time a piece of data is received from the network in ParserContext::data_available we call ServoParser::parse_chunk, which appends the new data and only continues parsing if the parser is not currently suspended. ServoParser holds on to a queue of buffers that have not yet been fed to the parser; each time parsing is supposed to continue (in ServoParser::do_sync) we try to feed the parser with the first buffer in the queue (which causes parsing to continue), or instruct it to run without any new data. When h5e returns, it has either been suspended, or it has run out of input. ParserContext::response_complete ensures that we do not expect any further content, and run the parser again if it's not suspended (this may be unnecessary, now that I think about it).

nox added a commit to nox/servo that referenced this issue Nov 25, 2016
This is a bit crude because of some missing utility methods on BufferQueue.
bors-servo pushed a commit that referenced this issue Nov 25, 2016
Implement document.write (fixes #3704)

<!-- Reviewable:start -->
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/14361)
<!-- Reviewable:end -->
nox added a commit to nox/servo that referenced this issue Nov 26, 2016
This is a bit crude because of some missing utility methods on BufferQueue.
nox added a commit to nox/servo that referenced this issue Nov 26, 2016
This is a bit crude because of some missing utility methods on BufferQueue.
bors-servo pushed a commit that referenced this issue Nov 26, 2016
Implement document.write (fixes #3704)

<!-- Reviewable:start -->
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/14361)
<!-- Reviewable:end -->
nox added a commit to nox/servo that referenced this issue Nov 27, 2016
This is a bit crude because of some missing utility methods on BufferQueue.
bors-servo pushed a commit that referenced this issue Nov 27, 2016
Implement document.write (fixes #3704)

<!-- Reviewable:start -->
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/14361)
<!-- Reviewable:end -->
nox added a commit to nox/servo that referenced this issue Nov 27, 2016
This is a bit crude because of some missing utility methods on BufferQueue.
bors-servo pushed a commit that referenced this issue Nov 27, 2016
Implement document.write (fixes #3704)

<!-- Reviewable:start -->
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/14361)
<!-- Reviewable:end -->
nox added a commit to nox/servo that referenced this issue Nov 27, 2016
This is a bit crude because of some missing utility methods on BufferQueue.
bors-servo pushed a commit that referenced this issue Nov 27, 2016
Implement document.write (fixes #3704)

<!-- Reviewable:start -->
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/14361)
<!-- Reviewable:end -->
nox added a commit to nox/servo that referenced this issue Nov 27, 2016
This is a bit crude because of some missing utility methods on BufferQueue.
bors-servo pushed a commit that referenced this issue Nov 27, 2016
Implement document.write (fixes #3704)

<!-- Reviewable:start -->
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/14361)
<!-- Reviewable:end -->
bors-servo pushed a commit that referenced this issue Nov 27, 2016
Implement document.write (fixes #3704)

<!-- Reviewable:start -->
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/14361)
<!-- Reviewable:end -->
bors-servo pushed a commit that referenced this issue Nov 27, 2016
Implement document.write (fixes #3704)

<!-- Reviewable:start -->
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/14361)
<!-- Reviewable:end -->
bors-servo pushed a commit that referenced this issue Nov 27, 2016
Implement document.write (fixes #3704)

<!-- Reviewable:start -->
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/14361)
<!-- Reviewable:end -->
bors-servo pushed a commit that referenced this issue Nov 27, 2016
Implement document.write (fixes #3704)

<!-- Reviewable:start -->
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/14361)
<!-- Reviewable:end -->
bors-servo pushed a commit that referenced this issue Nov 27, 2016
Implement document.write (fixes #3704)

<!-- Reviewable:start -->
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/14361)
<!-- Reviewable:end -->
bors-servo pushed a commit that referenced this issue Nov 27, 2016
Implement document.write (fixes #3704)

<!-- Reviewable:start -->
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/14361)
<!-- Reviewable:end -->
nox added a commit to nox/servo that referenced this issue Nov 28, 2016
This is a bit crude because of some missing utility methods on BufferQueue.
bors-servo pushed a commit that referenced this issue Nov 28, 2016
Implement document.write (fixes #3704)

<!-- Reviewable:start -->
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/14361)
<!-- Reviewable:end -->
bors-servo pushed a commit that referenced this issue Nov 29, 2016
Implement document.write (fixes #3704)

<!-- Reviewable:start -->
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/14361)
<!-- Reviewable:end -->
bors-servo pushed a commit that referenced this issue Nov 29, 2016
Implement document.write (fixes #3704)

<!-- Reviewable:start -->
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/14361)
<!-- Reviewable:end -->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-content/bindings The DOM bindings B-interesting-project Represents work that is expected to be interesting in some fashion
Projects
None yet
Development

No branches or pull requests

7 participants