Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prefetch img, scripts and stylesheets during parsing #24148

Merged
merged 4 commits into from Sep 11, 2019

Conversation

@asajeffrey
Copy link
Member

asajeffrey commented Sep 5, 2019

Eagerly tokenize html input, and scan for script and img tags which can be prefetched into the cache. This allows resources to be loaded concurrently, which would otherwise be loaded sequentially due to being blocked by script execution. On the cloth webgl demo, this takes the time-to-paint down from 732ms to 560ms.


  • ./mach build -d does not report any errors
  • ./mach test-tidy does not report any errors
  • These changes fix #3847
  • These changes do not require tests because it's a perf improvement

This change is Reviewable

@highfive
Copy link

highfive commented Sep 5, 2019

Heads up! This PR modifies the following files:

  • @KiChjang: components/script/dom/servoparser/mod.rs, components/script/dom/servoparser/prefetch.rs, components/net/resource_thread.rs, components/net_traits/lib.rs
@highfive
Copy link

highfive commented Sep 5, 2019

warning Warning warning

  • These commits modify net and script code, but no tests are modified. Please consider adding a test!
@asajeffrey
Copy link
Member Author

asajeffrey commented Sep 6, 2019

r? @jdm

@highfive highfive assigned jdm and unassigned nox Sep 6, 2019
@asajeffrey
Copy link
Member Author

asajeffrey commented Sep 6, 2019

One thing I'm not sure about is that we're issuing a GET when prefetching, even for content which isn't cacheable. We should probably first issue a HEAD to see if the content is cacheable. Perhaps this should be a follow-up issue?

Copy link
Member

jdm left a comment

I like the simplicity of this model. We just need to make sure that we don't unintentionally bypass security features of the platform by prefetching more liberally than the actual request the page intends to make.

components/script/dom/servoparser/prefetch.rs Outdated Show resolved Hide resolved
inner: HtmlTokenizer<PrefetchSink>,
}

unsafe_no_jsmanaged_fields!(Tokenizer);

This comment has been minimized.

Copy link
@jdm

jdm Sep 6, 2019

Member

I would prefer to derive JSTraceable on Tokenizer instead to avoid accidents in the future if we update that struct.

This comment has been minimized.

Copy link
@asajeffrey

asajeffrey Sep 6, 2019

Author Member

Oh this is weird, I replied to this comment but github appears to have lost the reply...

Anyway, the problem is that HtmlTokenizer doesn't implement JSTraceable, and it's difficult to do so as its generic, and the tokenizer API provides no way to access the sink. If it did, the implementation would be pretty easy:

    impl<Sink: JSTraceable> for HtmlTokenizer<Sink> {
        fn trace(&self) { self.sink().trace() } 
   } 

but there's no sink() method, so we can't really do that.

This comment has been minimized.

Copy link
@jdm

jdm Sep 6, 2019

Member

We have a JSTraceable implementation for the HTML tokenizer in script/dom/servoparser/html.rs. I recommend following its example.

This comment has been minimized.

Copy link
@asajeffrey

asajeffrey Sep 6, 2019

Author Member

Er, but stripping out the unnecessary code leaves just:

#[allow(unsafe_code)]
unsafe impl JSTraceable for HtmlTokenizer<PrefetchSink>> {
    unsafe fn trace(&self, trc: *mut JSTracer) {
    }
}

which doesn't have obvious advantages over unsafe_no_jsmanaged_fields!.

This comment has been minimized.

Copy link
@jdm

jdm Sep 6, 2019

Member

Can't we still derive JStraceable on Sink and call self.sink.trace() at least?

This comment has been minimized.

Copy link
@asajeffrey

asajeffrey Sep 6, 2019

Author Member

The problem is we don't have access to the sink, since the tokenizer doesn't expose an API to access it. We couuuld use an Rc, but that seems a bit round-the-houses.

This comment has been minimized.

Copy link
@jdm

jdm Sep 6, 2019

Member

Isn't the tokenizer this type? It has a public sink field. What am I missing?

This comment has been minimized.

Copy link
@asajeffrey

asajeffrey Sep 6, 2019

Author Member

Argh, I was looking for an accessor method, I didn't spot that the field was public. D'oh!

This comment has been minimized.

Copy link
@asajeffrey

asajeffrey Sep 6, 2019

Author Member

Okay, fixed.

components/script/dom/servoparser/prefetch.rs Outdated Show resolved Hide resolved
components/script/dom/servoparser/prefetch.rs Outdated Show resolved Hide resolved
components/script/dom/servoparser/prefetch.rs Outdated Show resolved Hide resolved
@asajeffrey
Copy link
Member Author

asajeffrey commented Sep 6, 2019

I don't think there's security issues here, since the data requested by the prefetch is only used to populate the cache, and isn't exposed to script. If we do set the headers according to htmlscriptelement and htmlimageelement, then we do need to make sure we're not sending cookies we shouldn't.

@asajeffrey
Copy link
Member Author

asajeffrey commented Sep 6, 2019

@asajeffrey asajeffrey changed the title Prefetch img and scripts during parsing Prefetch img, scripts and stylesheets during parsing Sep 6, 2019
@asajeffrey
Copy link
Member Author

asajeffrey commented Sep 6, 2019

One thing I realized doing this is that fetch uses the origin, which is mutable, but for speculative prefetching we don't get much choice but to use the original origin.

@jdm
jdm approved these changes Sep 6, 2019
@jdm
Copy link
Member

jdm commented Sep 6, 2019

@bors-servo try=wpt

@bors-servo
Copy link
Contributor

bors-servo commented Sep 6, 2019

Trying commit 330b940 with merge 419002d...

bors-servo added a commit that referenced this pull request Sep 6, 2019
Prefetch img, scripts and stylesheets during parsing

<!-- Please describe your changes on the following line: -->

Eagerly tokenize html input, and scan for `script` and `img` tags which can be prefetched into the cache. This allows resources to be loaded concurrently, which would otherwise be loaded sequentially due to being blocked by script execution. On the cloth webgl demo, this takes the time-to-paint down from 732ms to 560ms.

---
<!-- Thank you for contributing to Servo! Please replace each `[ ]` by `[X]` when the step is complete, and replace `___` with appropriate data: -->
- [x] `./mach build -d` does not report any errors
- [x] `./mach test-tidy` does not report any errors
- [x] These changes fix #3847
- [x] These changes do not require tests because it's a perf improvement

<!-- Also, please make sure that "Allow edits from maintainers" checkbox is checked, so that we can help you if you get stuck somewhere along the way.-->

<!-- Pull requests that do not address these steps are welcome, but they will require additional verification as part of the review process. -->

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/24148)
<!-- Reviewable:end -->
@bors-servo
Copy link
Contributor

bors-servo commented Sep 10, 2019

Trying commit c6e1b8a with merge 8a20090...

bors-servo added a commit that referenced this pull request Sep 10, 2019
Prefetch img, scripts and stylesheets during parsing

<!-- Please describe your changes on the following line: -->

Eagerly tokenize html input, and scan for `script` and `img` tags which can be prefetched into the cache. This allows resources to be loaded concurrently, which would otherwise be loaded sequentially due to being blocked by script execution. On the cloth webgl demo, this takes the time-to-paint down from 732ms to 560ms.

---
<!-- Thank you for contributing to Servo! Please replace each `[ ]` by `[X]` when the step is complete, and replace `___` with appropriate data: -->
- [x] `./mach build -d` does not report any errors
- [x] `./mach test-tidy` does not report any errors
- [x] These changes fix #3847
- [x] These changes do not require tests because it's a perf improvement

<!-- Also, please make sure that "Allow edits from maintainers" checkbox is checked, so that we can help you if you get stuck somewhere along the way.-->

<!-- Pull requests that do not address these steps are welcome, but they will require additional verification as part of the review process. -->

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/24148)
<!-- Reviewable:end -->
@bors-servo
Copy link
Contributor

bors-servo commented Sep 10, 2019

💔 Test failed - status-taskcluster

@asajeffrey
Copy link
Member Author

asajeffrey commented Sep 10, 2019

OK, that was a CI failure, the WPT test suite passed!

@asajeffrey
Copy link
Member Author

asajeffrey commented Sep 10, 2019

Now waiting for a review from @nox.

@asajeffrey
Copy link
Member Author

asajeffrey commented Sep 10, 2019

@asajeffrey asajeffrey force-pushed the asajeffrey:script-prefetch branch from c6e1b8a to aeac382 Sep 10, 2019
@asajeffrey
Copy link
Member Author

asajeffrey commented Sep 10, 2019

Squashed.

Copy link
Member

nox left a comment

I'm a bit anxious about doing that tbh, but it looks ok apart from some nits.

@@ -237,6 +237,18 @@ impl FetchTaskTarget for IpcSender<FetchResponseMsg> {
}
}

impl FetchTaskTarget for () {

This comment has been minimized.

Copy link
@nox

nox Sep 11, 2019

Member

Make a new struct type, don't piggy-back the unit type. It's cheap to make a new one anyway.

This comment has been minimized.

Copy link
@asajeffrey

asajeffrey Sep 11, 2019

Author Member

Done.

@@ -101,6 +102,10 @@ pub struct ServoParser {
aborted: Cell<bool>,
/// <https://html.spec.whatwg.org/multipage/#script-created-parser>
script_created_parser: bool,
/// We do a quick-and-dirty parse of the input looking for resources to prefetch

This comment has been minimized.

Copy link
@nox

nox Sep 11, 2019

Member

Let's write a big TODO that we should be carrying the speculative parsing PR to completion and remove that.

This comment has been minimized.

Copy link
@asajeffrey

asajeffrey Sep 11, 2019

Author Member

Done. The PR you mean is #19203?

@@ -425,20 +432,50 @@ impl ServoParser {
)
}

fn push_tendril_input_chunk(&self, chunk: StrTendril) {
if !chunk.is_empty() {

This comment has been minimized.

Copy link
@nox

nox Sep 11, 2019

Member

Early return if the chunk is empty instead of drifting towards the right.

fn push_bytes_input_chunk(&self, chunk: Vec<u8>) {
// For byte input, we convert it to text using the network decoder.

This comment has been minimized.

Copy link
@nox

nox Sep 11, 2019

Member

Obvious comment is obvious. :)

This comment has been minimized.

Copy link
@asajeffrey

asajeffrey Sep 11, 2019

Author Member

Yes, but the matching comment on push_string_input_chunk wasn't (at least not to me!).

prefetch_input.push_back(chunk.clone());
let _ = self.prefetch_tokenizer
.borrow_mut()
.feed(&mut *prefetch_input);

This comment has been minimized.

Copy link
@nox

nox Sep 11, 2019

Member

The tokenizer here will be very wrong whenever someone uses document.write, I don't like that but I don't have any suggestion apart from "let's not land this PR and let's land speculative parsing instead".

This comment has been minimized.

Copy link
@asajeffrey

asajeffrey Sep 11, 2019

Author Member

Indeed. Speculative parsing will also be wrong in that case, not much you can do apart from add this to the list of reasons not to use document.write().

}

impl TokenSink for PrefetchSink {
type Handle = ();

This comment has been minimized.

Copy link
@nox

nox Sep 11, 2019

Member

Use a new unit struct type.

This comment has been minimized.

Copy link
@asajeffrey

asajeffrey Sep 11, 2019

Author Member

You are very keen on unit structs. Done.

},
(TagKind::EndTag, local_name!("script")) => {
// After the first script tag, the main parser is blocked, so it's worth prefetching.
self.prefetching = true;

This comment has been minimized.

Copy link
@nox

nox Sep 11, 2019

Member

I feel like that's the place where we should not prefetch ever, because the script blocking the parser may very well set window.domain or do stuff that makes this end tag's script not load or whatever.

This comment has been minimized.

Copy link
@asajeffrey

asajeffrey Sep 11, 2019

Author Member

Yeah, but if we waited to find out if script did any of that, we wouldn't be able to prefetch at all.

(TagKind::StartTag, local_name!("base")) => {
if let Some(url) = self.get_url(tag, local_name!("href")) {
debug!("Setting base {}", url);
self.base = url;

This comment has been minimized.

Copy link
@nox

nox Sep 11, 2019

Member

This is extremely wrong in presence of multiple base tags in an document.

This comment has been minimized.

Copy link
@asajeffrey

asajeffrey Sep 11, 2019

Author Member

According to https://developer.mozilla.org/en-US/docs/Web/HTML/Element/base "There can be only one element in a document." But yes, I can imagine this causing resources to be prefetched erroneously.

This comment has been minimized.

Copy link
@asajeffrey
impl TokenSink for PrefetchSink {
type Handle = ();
fn process_token(&mut self, token: Token, _line_number: u64) -> TokenSinkResult<()> {
if let Token::TagToken(ref tag) = token {

This comment has been minimized.

Copy link
@nox

nox Sep 11, 2019

Member

Please early return out of the function if token isn't a TagToken, that will help with rightwards drift.

This comment has been minimized.

Copy link
@asajeffrey

asajeffrey Sep 11, 2019

Author Member

Done.

.send(CoreResourceMsg::Fetch(request, FetchChannels::Prefetch));
}
// Don't prefetch inside script
self.prefetching = false;

This comment has been minimized.

Copy link
@nox

nox Sep 11, 2019

Member

Nothing but script data will be ever return from parsing a script, please use the proper TokenSinkResult return value to tell the tokenizer to switch modes.

This comment has been minimized.

Copy link
@asajeffrey

asajeffrey Sep 11, 2019

Author Member

Done.

Copy link
Member Author

asajeffrey left a comment

OK, I think I addressed everything.

@@ -237,6 +237,18 @@ impl FetchTaskTarget for IpcSender<FetchResponseMsg> {
}
}

impl FetchTaskTarget for () {

This comment has been minimized.

Copy link
@asajeffrey

asajeffrey Sep 11, 2019

Author Member

Done.

@@ -101,6 +102,10 @@ pub struct ServoParser {
aborted: Cell<bool>,
/// <https://html.spec.whatwg.org/multipage/#script-created-parser>
script_created_parser: bool,
/// We do a quick-and-dirty parse of the input looking for resources to prefetch

This comment has been minimized.

Copy link
@asajeffrey

asajeffrey Sep 11, 2019

Author Member

Done. The PR you mean is #19203?

prefetch_input.push_back(chunk.clone());
let _ = self.prefetch_tokenizer
.borrow_mut()
.feed(&mut *prefetch_input);

This comment has been minimized.

Copy link
@asajeffrey

asajeffrey Sep 11, 2019

Author Member

Indeed. Speculative parsing will also be wrong in that case, not much you can do apart from add this to the list of reasons not to use document.write().

fn push_bytes_input_chunk(&self, chunk: Vec<u8>) {
// For byte input, we convert it to text using the network decoder.

This comment has been minimized.

Copy link
@asajeffrey

asajeffrey Sep 11, 2019

Author Member

Yes, but the matching comment on push_string_input_chunk wasn't (at least not to me!).

self.network_input.borrow_mut().push_back(chunk.into());
// Convert the chunk to a tendril so cloning it isn't expensive.
// The input has already been decoded as a string, so doesn't need
// to be decoded by the network decoder again.

This comment has been minimized.

Copy link
@asajeffrey

asajeffrey Sep 11, 2019

Author Member

Not mentioning it was confusing to me though, I spent a while debugging before I realized I was re-encoding UTF-8 string data using the network decoder. I'll remove the comment about tendrils being cheap though, that's more me talking to myself than anything.

}

impl TokenSink for PrefetchSink {
type Handle = ();

This comment has been minimized.

Copy link
@asajeffrey

asajeffrey Sep 11, 2019

Author Member

You are very keen on unit structs. Done.

impl TokenSink for PrefetchSink {
type Handle = ();
fn process_token(&mut self, token: Token, _line_number: u64) -> TokenSinkResult<()> {
if let Token::TagToken(ref tag) = token {

This comment has been minimized.

Copy link
@asajeffrey

asajeffrey Sep 11, 2019

Author Member

Done.

.send(CoreResourceMsg::Fetch(request, FetchChannels::Prefetch));
}
// Don't prefetch inside script
self.prefetching = false;

This comment has been minimized.

Copy link
@asajeffrey

asajeffrey Sep 11, 2019

Author Member

Done.

},
(TagKind::EndTag, local_name!("script")) => {
// After the first script tag, the main parser is blocked, so it's worth prefetching.
self.prefetching = true;

This comment has been minimized.

Copy link
@asajeffrey

asajeffrey Sep 11, 2019

Author Member

Yeah, but if we waited to find out if script did any of that, we wouldn't be able to prefetch at all.

(TagKind::StartTag, local_name!("base")) => {
if let Some(url) = self.get_url(tag, local_name!("href")) {
debug!("Setting base {}", url);
self.base = url;

This comment has been minimized.

Copy link
@asajeffrey

asajeffrey Sep 11, 2019

Author Member

According to https://developer.mozilla.org/en-US/docs/Web/HTML/Element/base "There can be only one element in a document." But yes, I can imagine this causing resources to be prefetched erroneously.

@asajeffrey
Copy link
Member Author

asajeffrey commented Sep 11, 2019

@nox, you okay with the changes?

@nox
Copy link
Member

nox commented Sep 11, 2019

@bors-servo r=jdm,nox

@bors-servo
Copy link
Contributor

bors-servo commented Sep 11, 2019

📌 Commit 2e6f14f has been approved by jdm,nox

bors-servo added a commit that referenced this pull request Sep 11, 2019
Prefetch img, scripts and stylesheets during parsing

<!-- Please describe your changes on the following line: -->

Eagerly tokenize html input, and scan for `script` and `img` tags which can be prefetched into the cache. This allows resources to be loaded concurrently, which would otherwise be loaded sequentially due to being blocked by script execution. On the cloth webgl demo, this takes the time-to-paint down from 732ms to 560ms.

---
<!-- Thank you for contributing to Servo! Please replace each `[ ]` by `[X]` when the step is complete, and replace `___` with appropriate data: -->
- [x] `./mach build -d` does not report any errors
- [x] `./mach test-tidy` does not report any errors
- [x] These changes fix #3847
- [x] These changes do not require tests because it's a perf improvement

<!-- Also, please make sure that "Allow edits from maintainers" checkbox is checked, so that we can help you if you get stuck somewhere along the way.-->

<!-- Pull requests that do not address these steps are welcome, but they will require additional verification as part of the review process. -->

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/24148)
<!-- Reviewable:end -->
@bors-servo
Copy link
Contributor

bors-servo commented Sep 11, 2019

Testing commit 2e6f14f with merge 4e9967b...

@bors-servo
Copy link
Contributor

bors-servo commented Sep 11, 2019

☀️ Test successful - linux-rel-css, linux-rel-wpt, status-taskcluster
Approved by: jdm,nox
Pushing 4e9967b to master...

@bors-servo bors-servo merged commit 2e6f14f into servo:master Sep 11, 2019
2 of 3 checks passed
2 of 3 checks passed
continuous-integration/appveyor/pr AppVeyor build failed
Details
Taskcluster (pull_request) TaskGroup: success
Details
homu Test successful
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

5 participants
You can’t perform that action at this time.