Join GitHub today
GitHub is home to over 50 million developers working together to host and review code, manage projects, and build software together.
Sign upPrefetch img, scripts and stylesheets during parsing #24148
Conversation
highfive
commented
Sep 5, 2019
|
Heads up! This PR modifies the following files:
|
highfive
commented
Sep 5, 2019
|
r? @jdm |
|
One thing I'm not sure about is that we're issuing a GET when prefetching, even for content which isn't cacheable. We should probably first issue a HEAD to see if the content is cacheable. Perhaps this should be a follow-up issue? |
|
I like the simplicity of this model. We just need to make sure that we don't unintentionally bypass security features of the platform by prefetching more liberally than the actual request the page intends to make. |
| inner: HtmlTokenizer<PrefetchSink>, | ||
| } | ||
|
|
||
| unsafe_no_jsmanaged_fields!(Tokenizer); |
This comment has been minimized.
This comment has been minimized.
jdm
Sep 6, 2019
Member
I would prefer to derive JSTraceable on Tokenizer instead to avoid accidents in the future if we update that struct.
This comment has been minimized.
This comment has been minimized.
asajeffrey
Sep 6, 2019
Author
Member
Oh this is weird, I replied to this comment but github appears to have lost the reply...
Anyway, the problem is that HtmlTokenizer doesn't implement JSTraceable, and it's difficult to do so as its generic, and the tokenizer API provides no way to access the sink. If it did, the implementation would be pretty easy:
impl<Sink: JSTraceable> for HtmlTokenizer<Sink> {
fn trace(&self) { self.sink().trace() }
} but there's no sink() method, so we can't really do that.
This comment has been minimized.
This comment has been minimized.
jdm
Sep 6, 2019
Member
We have a JSTraceable implementation for the HTML tokenizer in script/dom/servoparser/html.rs. I recommend following its example.
This comment has been minimized.
This comment has been minimized.
asajeffrey
Sep 6, 2019
Author
Member
Er, but stripping out the unnecessary code leaves just:
#[allow(unsafe_code)]
unsafe impl JSTraceable for HtmlTokenizer<PrefetchSink>> {
unsafe fn trace(&self, trc: *mut JSTracer) {
}
}which doesn't have obvious advantages over unsafe_no_jsmanaged_fields!.
This comment has been minimized.
This comment has been minimized.
jdm
Sep 6, 2019
Member
Can't we still derive JStraceable on Sink and call self.sink.trace() at least?
This comment has been minimized.
This comment has been minimized.
asajeffrey
Sep 6, 2019
Author
Member
The problem is we don't have access to the sink, since the tokenizer doesn't expose an API to access it. We couuuld use an Rc, but that seems a bit round-the-houses.
This comment has been minimized.
This comment has been minimized.
jdm
Sep 6, 2019
•
Member
Isn't the tokenizer this type? It has a public sink field. What am I missing?
This comment has been minimized.
This comment has been minimized.
asajeffrey
Sep 6, 2019
Author
Member
Argh, I was looking for an accessor method, I didn't spot that the field was public. D'oh!
This comment has been minimized.
This comment has been minimized.
|
I don't think there's security issues here, since the data requested by the prefetch is only used to populate the cache, and isn't exposed to script. If we do set the headers according to htmlscriptelement and htmlimageelement, then we do need to make sure we're not sending cookies we shouldn't. |
|
One thing I realized doing this is that fetch uses the origin, which is mutable, but for speculative prefetching we don't get much choice but to use the original origin. |
|
@bors-servo try=wpt |
Prefetch img, scripts and stylesheets during parsing <!-- Please describe your changes on the following line: --> Eagerly tokenize html input, and scan for `script` and `img` tags which can be prefetched into the cache. This allows resources to be loaded concurrently, which would otherwise be loaded sequentially due to being blocked by script execution. On the cloth webgl demo, this takes the time-to-paint down from 732ms to 560ms. --- <!-- Thank you for contributing to Servo! Please replace each `[ ]` by `[X]` when the step is complete, and replace `___` with appropriate data: --> - [x] `./mach build -d` does not report any errors - [x] `./mach test-tidy` does not report any errors - [x] These changes fix #3847 - [x] These changes do not require tests because it's a perf improvement <!-- Also, please make sure that "Allow edits from maintainers" checkbox is checked, so that we can help you if you get stuck somewhere along the way.--> <!-- Pull requests that do not address these steps are welcome, but they will require additional verification as part of the review process. --> <!-- Reviewable:start --> --- This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/24148) <!-- Reviewable:end -->
Prefetch img, scripts and stylesheets during parsing <!-- Please describe your changes on the following line: --> Eagerly tokenize html input, and scan for `script` and `img` tags which can be prefetched into the cache. This allows resources to be loaded concurrently, which would otherwise be loaded sequentially due to being blocked by script execution. On the cloth webgl demo, this takes the time-to-paint down from 732ms to 560ms. --- <!-- Thank you for contributing to Servo! Please replace each `[ ]` by `[X]` when the step is complete, and replace `___` with appropriate data: --> - [x] `./mach build -d` does not report any errors - [x] `./mach test-tidy` does not report any errors - [x] These changes fix #3847 - [x] These changes do not require tests because it's a perf improvement <!-- Also, please make sure that "Allow edits from maintainers" checkbox is checked, so that we can help you if you get stuck somewhere along the way.--> <!-- Pull requests that do not address these steps are welcome, but they will require additional verification as part of the review process. --> <!-- Reviewable:start --> --- This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/24148) <!-- Reviewable:end -->
|
|
|
OK, that was a CI failure, the WPT test suite passed! |
|
Now waiting for a review from @nox. |
|
Squashed. |
|
I'm a bit anxious about doing that tbh, but it looks ok apart from some nits. |
| @@ -237,6 +237,18 @@ impl FetchTaskTarget for IpcSender<FetchResponseMsg> { | |||
| } | |||
| } | |||
|
|
|||
| impl FetchTaskTarget for () { | |||
This comment has been minimized.
This comment has been minimized.
nox
Sep 11, 2019
Member
Make a new struct type, don't piggy-back the unit type. It's cheap to make a new one anyway.
This comment has been minimized.
This comment has been minimized.
| @@ -101,6 +102,10 @@ pub struct ServoParser { | |||
| aborted: Cell<bool>, | |||
| /// <https://html.spec.whatwg.org/multipage/#script-created-parser> | |||
| script_created_parser: bool, | |||
| /// We do a quick-and-dirty parse of the input looking for resources to prefetch | |||
This comment has been minimized.
This comment has been minimized.
nox
Sep 11, 2019
Member
Let's write a big TODO that we should be carrying the speculative parsing PR to completion and remove that.
This comment has been minimized.
This comment has been minimized.
| @@ -425,20 +432,50 @@ impl ServoParser { | |||
| ) | |||
| } | |||
|
|
|||
| fn push_tendril_input_chunk(&self, chunk: StrTendril) { | |||
| if !chunk.is_empty() { | |||
This comment has been minimized.
This comment has been minimized.
| fn push_bytes_input_chunk(&self, chunk: Vec<u8>) { | ||
| // For byte input, we convert it to text using the network decoder. |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
asajeffrey
Sep 11, 2019
Author
Member
Yes, but the matching comment on push_string_input_chunk wasn't (at least not to me!).
| prefetch_input.push_back(chunk.clone()); | ||
| let _ = self.prefetch_tokenizer | ||
| .borrow_mut() | ||
| .feed(&mut *prefetch_input); |
This comment has been minimized.
This comment has been minimized.
nox
Sep 11, 2019
Member
The tokenizer here will be very wrong whenever someone uses document.write, I don't like that but I don't have any suggestion apart from "let's not land this PR and let's land speculative parsing instead".
This comment has been minimized.
This comment has been minimized.
asajeffrey
Sep 11, 2019
Author
Member
Indeed. Speculative parsing will also be wrong in that case, not much you can do apart from add this to the list of reasons not to use document.write().
| } | ||
|
|
||
| impl TokenSink for PrefetchSink { | ||
| type Handle = (); |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
| }, | ||
| (TagKind::EndTag, local_name!("script")) => { | ||
| // After the first script tag, the main parser is blocked, so it's worth prefetching. | ||
| self.prefetching = true; |
This comment has been minimized.
This comment has been minimized.
nox
Sep 11, 2019
Member
I feel like that's the place where we should not prefetch ever, because the script blocking the parser may very well set window.domain or do stuff that makes this end tag's script not load or whatever.
This comment has been minimized.
This comment has been minimized.
asajeffrey
Sep 11, 2019
Author
Member
Yeah, but if we waited to find out if script did any of that, we wouldn't be able to prefetch at all.
| (TagKind::StartTag, local_name!("base")) => { | ||
| if let Some(url) = self.get_url(tag, local_name!("href")) { | ||
| debug!("Setting base {}", url); | ||
| self.base = url; |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
asajeffrey
Sep 11, 2019
Author
Member
According to https://developer.mozilla.org/en-US/docs/Web/HTML/Element/base "There can be only one element in a document." But yes, I can imagine this causing resources to be prefetched erroneously.
This comment has been minimized.
This comment has been minimized.
asajeffrey
Sep 11, 2019
Author
Member
OK, the spec has a note about this (https://html.spec.whatwg.org/multipage/semantics.html#the-base-element). Fixed.
| impl TokenSink for PrefetchSink { | ||
| type Handle = (); | ||
| fn process_token(&mut self, token: Token, _line_number: u64) -> TokenSinkResult<()> { | ||
| if let Token::TagToken(ref tag) = token { |
This comment has been minimized.
This comment has been minimized.
nox
Sep 11, 2019
Member
Please early return out of the function if token isn't a TagToken, that will help with rightwards drift.
This comment has been minimized.
This comment has been minimized.
| .send(CoreResourceMsg::Fetch(request, FetchChannels::Prefetch)); | ||
| } | ||
| // Don't prefetch inside script | ||
| self.prefetching = false; |
This comment has been minimized.
This comment has been minimized.
nox
Sep 11, 2019
Member
Nothing but script data will be ever return from parsing a script, please use the proper TokenSinkResult return value to tell the tokenizer to switch modes.
This comment has been minimized.
This comment has been minimized.
|
OK, I think I addressed everything. |
| @@ -237,6 +237,18 @@ impl FetchTaskTarget for IpcSender<FetchResponseMsg> { | |||
| } | |||
| } | |||
|
|
|||
| impl FetchTaskTarget for () { | |||
This comment has been minimized.
This comment has been minimized.
| @@ -101,6 +102,10 @@ pub struct ServoParser { | |||
| aborted: Cell<bool>, | |||
| /// <https://html.spec.whatwg.org/multipage/#script-created-parser> | |||
| script_created_parser: bool, | |||
| /// We do a quick-and-dirty parse of the input looking for resources to prefetch | |||
This comment has been minimized.
This comment has been minimized.
| prefetch_input.push_back(chunk.clone()); | ||
| let _ = self.prefetch_tokenizer | ||
| .borrow_mut() | ||
| .feed(&mut *prefetch_input); |
This comment has been minimized.
This comment has been minimized.
asajeffrey
Sep 11, 2019
Author
Member
Indeed. Speculative parsing will also be wrong in that case, not much you can do apart from add this to the list of reasons not to use document.write().
| fn push_bytes_input_chunk(&self, chunk: Vec<u8>) { | ||
| // For byte input, we convert it to text using the network decoder. |
This comment has been minimized.
This comment has been minimized.
asajeffrey
Sep 11, 2019
Author
Member
Yes, but the matching comment on push_string_input_chunk wasn't (at least not to me!).
| self.network_input.borrow_mut().push_back(chunk.into()); | ||
| // Convert the chunk to a tendril so cloning it isn't expensive. | ||
| // The input has already been decoded as a string, so doesn't need | ||
| // to be decoded by the network decoder again. |
This comment has been minimized.
This comment has been minimized.
asajeffrey
Sep 11, 2019
Author
Member
Not mentioning it was confusing to me though, I spent a while debugging before I realized I was re-encoding UTF-8 string data using the network decoder. I'll remove the comment about tendrils being cheap though, that's more me talking to myself than anything.
| } | ||
|
|
||
| impl TokenSink for PrefetchSink { | ||
| type Handle = (); |
This comment has been minimized.
This comment has been minimized.
| impl TokenSink for PrefetchSink { | ||
| type Handle = (); | ||
| fn process_token(&mut self, token: Token, _line_number: u64) -> TokenSinkResult<()> { | ||
| if let Token::TagToken(ref tag) = token { |
This comment has been minimized.
This comment has been minimized.
| .send(CoreResourceMsg::Fetch(request, FetchChannels::Prefetch)); | ||
| } | ||
| // Don't prefetch inside script | ||
| self.prefetching = false; |
This comment has been minimized.
This comment has been minimized.
| }, | ||
| (TagKind::EndTag, local_name!("script")) => { | ||
| // After the first script tag, the main parser is blocked, so it's worth prefetching. | ||
| self.prefetching = true; |
This comment has been minimized.
This comment has been minimized.
asajeffrey
Sep 11, 2019
Author
Member
Yeah, but if we waited to find out if script did any of that, we wouldn't be able to prefetch at all.
| (TagKind::StartTag, local_name!("base")) => { | ||
| if let Some(url) = self.get_url(tag, local_name!("href")) { | ||
| debug!("Setting base {}", url); | ||
| self.base = url; |
This comment has been minimized.
This comment has been minimized.
asajeffrey
Sep 11, 2019
Author
Member
According to https://developer.mozilla.org/en-US/docs/Web/HTML/Element/base "There can be only one element in a document." But yes, I can imagine this causing resources to be prefetched erroneously.
|
@nox, you okay with the changes? |
|
@bors-servo r=jdm,nox |
|
|
Prefetch img, scripts and stylesheets during parsing <!-- Please describe your changes on the following line: --> Eagerly tokenize html input, and scan for `script` and `img` tags which can be prefetched into the cache. This allows resources to be loaded concurrently, which would otherwise be loaded sequentially due to being blocked by script execution. On the cloth webgl demo, this takes the time-to-paint down from 732ms to 560ms. --- <!-- Thank you for contributing to Servo! Please replace each `[ ]` by `[X]` when the step is complete, and replace `___` with appropriate data: --> - [x] `./mach build -d` does not report any errors - [x] `./mach test-tidy` does not report any errors - [x] These changes fix #3847 - [x] These changes do not require tests because it's a perf improvement <!-- Also, please make sure that "Allow edits from maintainers" checkbox is checked, so that we can help you if you get stuck somewhere along the way.--> <!-- Pull requests that do not address these steps are welcome, but they will require additional verification as part of the review process. --> <!-- Reviewable:start --> --- This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/24148) <!-- Reviewable:end -->
|
|
asajeffrey commentedSep 5, 2019
•
edited by SimonSapin
Eagerly tokenize html input, and scan for
scriptandimgtags which can be prefetched into the cache. This allows resources to be loaded concurrently, which would otherwise be loaded sequentially due to being blocked by script execution. On the cloth webgl demo, this takes the time-to-paint down from 732ms to 560ms../mach build -ddoes not report any errors./mach test-tidydoes not report any errorsThis change is