Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

css background image parsing takes a very long time/hangs #13778

Closed
jrmuizel opened this issue Oct 15, 2016 · 40 comments
Closed

css background image parsing takes a very long time/hangs #13778

jrmuizel opened this issue Oct 15, 2016 · 40 comments

Comments

@jrmuizel
Copy link

@jrmuizel jrmuizel commented Oct 15, 2016

http://people.mozilla.org/~jmuizelaar/webrender/sites/instagram.html causes servo to spend a very long time in style::properties::longhands::background_image::parse_declared

@bholley

@bholley
Copy link
Contributor

@bholley bholley commented Oct 15, 2016

Thanks for the report!

@Manishearth Do you think this is good piranha food?

@Manishearth
Copy link
Member

@Manishearth Manishearth commented Oct 15, 2016

I don't think so ... this is pretty curious and I don't see any immediately obvious reason behind it, so it might not be best for new contributors.

@wafflespeanut or @canaltinova might be interested in poking at this (otherwise I'll have a look)

@Manishearth
Copy link
Member

@Manishearth Manishearth commented Oct 15, 2016

Looks like it uses data URIs, and perhaps we're very slow at parsing those? @SimonSapin

However, those are directly on <img src>, not via CSS.

@Manishearth
Copy link
Member

@Manishearth Manishearth commented Oct 15, 2016

Oh, there are a bunch of CSS background-images too

@emilio
Copy link
Member

@emilio emilio commented Oct 15, 2016

I doubt we're parsing image src attributes with the CSS parser, seems like those would use the url parser directly?

@Manishearth
Copy link
Member

@Manishearth Manishearth commented Oct 15, 2016

Yes, like I said. But there are other background-images in the inline CSS.

@emilio
Copy link
Member

@emilio emilio commented Oct 15, 2016

And also the font-face rules.

@emilio
Copy link
Member

@emilio emilio commented Oct 15, 2016

Actually I bet most of the time there is spent parsing the font-face rules. I might take a deeper look today if I find the time.

@canova
Copy link
Member

@canova canova commented Oct 15, 2016

This webpage is also hanging on my Firefox 51 and Chrome 53(MacOS 10.12). Are we sure it is servo related?

@Manishearth
Copy link
Member

@Manishearth Manishearth commented Oct 15, 2016

Yeah, but those aren't in the path mentioned in the issue above. Still building, but will run with instruments to see if url parsing is the real culprit.

Looks like rust-url treats data URIs as any other kind of URI? I wonder if we can optimize this case since there's no transformation (punycode or otherwise) to be done here and we just need to verify that it belongs to a set of characters. There should be some bitmask algorithms that are able to operate on chunks here.

@emilio
Copy link
Member

@emilio emilio commented Oct 15, 2016

@Manishearth: Are you sure parse_declared uses rust-url? I don't think that's true.

@Manishearth
Copy link
Member

@Manishearth Manishearth commented Oct 15, 2016

@canaltinova could be. Works fine for me on Firefox Dev. But I have a rather fast laptop so it might still be a hang-y page

@Manishearth
Copy link
Member

@Manishearth Manishearth commented Oct 15, 2016

@emilio The exact path might not be correct due to inlining. It also depends on whether or not @jrmuizel drilled down further in the perf tool he used. parse_declared calls parse which tries to parse it as a url() function, and if so feeds it to rust-url.

@emilio
Copy link
Member

@emilio emilio commented Oct 15, 2016

@Manishearth: it goes to components::style::values::specified::Image::parse, that in turn goes to cssparser::ParserContext::parse_url which does not use rust-url, because you can specify all sorts of declared urls that don't need to be valid.

@Manishearth
Copy link
Member

@Manishearth Manishearth commented Oct 15, 2016

My profile says otherwise 😄

screen shot 2016-10-16 at 12 52 27 am

parse_url calls base_url.join(input), which parses the URL.

Of course it has to use rust-url at some stage -- it's stored as a url::Url, it must have gotten there somehow 😄

@emilio
Copy link
Member

@emilio emilio commented Oct 15, 2016

Hm... You're right, I'm sleepy apparently :-)

@emilio
Copy link
Member

@emilio emilio commented Oct 15, 2016

Heh, I was looking at how did we deal with invalid URIs (because it's a valid declared value to do something like url("foo invalid url"), and apparently we just use about:invalid. We probably need to preserve the original URL, I'll open another bug about it.

@emilio
Copy link
Member

@emilio emilio commented Oct 15, 2016

FWIW, my profile looks way different than yours, taking ~30% of the time in cssparser::tokenizer::consume_ident_like. We have consume_unquoted_url inside that always traverses and makes a copy of the string.

@Manishearth
Copy link
Member

@Manishearth Manishearth commented Oct 15, 2016

Yes, mine also has that fucntion in it

@Manishearth
Copy link
Member

@Manishearth Manishearth commented Oct 15, 2016

Fully expanded at heavy bits:

screen shot 2016-10-16 at 12 55 41 am

It seems to be taking lots of time with tokenizing on the style side. As @emilio said consume_unquoted_url does a clone, which is probably getting to us.

On the URL side, it seems to be very worried about check_url_code_point. I suspect the is_url_code_point function (which is getting inlined) is taking a beating here. It uses a Rust match with ranges, and I have no idea how efficient that will be.

BurntSushi mentioned that he'd gotten some major wins from using utf8-ranges which gives you the tools necessary for matching sets of unicode ranges against a string with fast byte comparisons.

... Really, with a lazy_static precompiled regex, we might be able to steal the regex crate's speed here. Though handling percent-encoding could get tricky.

(Of course, we should investigate further before jumping to a solution)

@Manishearth
Copy link
Member

@Manishearth Manishearth commented Oct 15, 2016

Hm, the clone only happens when there's a backslash char though. These are base64 URIs, they probably won't have those.

Still, that code could do with a set_capacity'd string for large URIs.

(I suspect there are a lot of other small improvements to be made here; parsing URIs isn't something you'd overoptimize, until you realize that folks stuff entire images into URIs 😜 )

@emilio
Copy link
Member

@emilio emilio commented Oct 15, 2016

emilio> Manishearth: does rustc preserve slicing checks in release?
22:06 → bjz joined (bjz@moz-dc8.as2.222.104.IP)
22:06 <emilio> Manishearth: I'm looking at the disassembly of consume_unquoted_url, and I'm looking at a lot of branches like callq  core::str::slice_error_fail::hf30ed83389856bb3. Most of them can be avoided since they come from tokenizer::next_char, and slicing there is useless (the slice always has length 1).
22:07 <•Manishearth> emilio: it should optimize them out
22:07 <•Manishearth> ... why do we use a slice then
22:07 <•Manishearth> but yes, that could be it
22:07 <•Manishearth> emilio: I am annoyed that the data uri spec does not mandate that base64 uris actually be valid base 64
@emilio
Copy link
Member

@emilio emilio commented Oct 15, 2016

The patch over servo/rust-cssparser#110 should make a decent difference on this test case.

@emilio
Copy link
Member

@emilio emilio commented Oct 15, 2016

That patch applied locally moves cssparser::tokenizer::consume_ident_like to the second place in the profile, and now url::parser::Parser::parse_cannot_be_a_base_path is the biggest offender.

@Manishearth
Copy link
Member

@Manishearth Manishearth commented Oct 16, 2016

Oh, because we're just copying non-ascii chars anyway, and we already know that the string is valid utf8. Good catch.

@SimonSapin
Copy link
Member

@SimonSapin SimonSapin commented Oct 16, 2016

We might want to bypass rust-url entirely for data: URLs. Maybe have a type like enum ServoUrl { Data { content_type: …, body: Vec<u8> }, Other(Url) }?

@Manishearth
Copy link
Member

@Manishearth Manishearth commented Oct 16, 2016

Still need to check if its valid. Though the tokenizer does that already for unquoted uris

@emilio
Copy link
Member

@emilio emilio commented Oct 16, 2016

I have local patches to fix #13779 and share the result of url resolution. That makes the tokenizer the higher in the profile again, and now servo crashes due to an assertion on IPC-channel due to a too large font loaded in memory and passed via IPC I think.

I'll try to clean them up a bit and submit them today.

emilio added a commit to emilio/servo that referenced this issue Nov 16, 2016
…-url.

Still needs a bunch of code in net to be converted in order to get more
advantage of this for images and stuff, but meanwhile this should help quite a
bit with servo#13778
@emilio emilio mentioned this issue Nov 16, 2016
emilio added a commit to emilio/servo that referenced this issue Nov 16, 2016
…-url.

Still needs a bunch of code in net to be converted in order to get more
advantage of this for images and stuff, but meanwhile this should help quite a
bit with servo#13778
emilio added a commit to emilio/servo that referenced this issue Nov 16, 2016
…-url.

Still needs a bunch of code in net to be converted in order to get more
advantage of this for images and stuff, but meanwhile this should help quite a
bit with servo#13778
emilio added a commit to emilio/servo that referenced this issue Nov 17, 2016
…-url.

Still needs a bunch of code in net to be converted in order to get more
advantage of this for images and stuff, but meanwhile this should help quite a
bit with servo#13778
bors-servo added a commit that referenced this issue Nov 17, 2016
WIP: Urlmageddon

<!-- Please describe your changes on the following line: -->

Still needs a bunch of code in net to be converted in order to get more
advantage of this for images and stuff, but meanwhile this should help quite a
bit with #13778.

Still wanted to get this in.

r? @SimonSapin

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/14246)
<!-- Reviewable:end -->
bors-servo added a commit that referenced this issue Nov 17, 2016
WIP: Urlmageddon

<!-- Please describe your changes on the following line: -->

Still needs a bunch of code in net to be converted in order to get more
advantage of this for images and stuff, but meanwhile this should help quite a
bit with #13778.

Still wanted to get this in.

r? @SimonSapin

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/14246)
<!-- Reviewable:end -->
bors-servo added a commit that referenced this issue Nov 17, 2016
WIP: Urlmageddon

<!-- Please describe your changes on the following line: -->

Still needs a bunch of code in net to be converted in order to get more
advantage of this for images and stuff, but meanwhile this should help quite a
bit with #13778.

Still wanted to get this in.

r? @SimonSapin

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/14246)
<!-- Reviewable:end -->
bors-servo added a commit that referenced this issue Nov 17, 2016
Update rust-cssparser for speed improvements when tokenizing strings and data uris

<!-- Please describe your changes on the following line: -->

See #13778

r? @SimonSapin (or anyone else really).

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/14245)
<!-- Reviewable:end -->
bors-servo added a commit that referenced this issue Nov 17, 2016
WIP: Urlmageddon

<!-- Please describe your changes on the following line: -->

Still needs a bunch of code in net to be converted in order to get more
advantage of this for images and stuff, but meanwhile this should help quite a
bit with #13778.

Still wanted to get this in.

r? @SimonSapin

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/14246)
<!-- Reviewable:end -->
bors-servo added a commit that referenced this issue Nov 17, 2016
WIP: Urlmageddon

<!-- Please describe your changes on the following line: -->

Still needs a bunch of code in net to be converted in order to get more
advantage of this for images and stuff, but meanwhile this should help quite a
bit with #13778.

Still wanted to get this in.

r? @SimonSapin

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/14246)
<!-- Reviewable:end -->
bors-servo added a commit that referenced this issue Nov 17, 2016
WIP: Urlmageddon

<!-- Please describe your changes on the following line: -->

Still needs a bunch of code in net to be converted in order to get more
advantage of this for images and stuff, but meanwhile this should help quite a
bit with #13778.

Still wanted to get this in.

r? @SimonSapin

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/14246)
<!-- Reviewable:end -->
bors-servo added a commit that referenced this issue Nov 17, 2016
Update rust-cssparser for speed improvements when tokenizing strings and data uris

<!-- Please describe your changes on the following line: -->

See #13778

r? @SimonSapin (or anyone else really).

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/14245)
<!-- Reviewable:end -->
bors-servo added a commit that referenced this issue Nov 17, 2016
WIP: Urlmageddon

<!-- Please describe your changes on the following line: -->

Still needs a bunch of code in net to be converted in order to get more
advantage of this for images and stuff, but meanwhile this should help quite a
bit with #13778.

Still wanted to get this in.

r? @SimonSapin

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/14246)
<!-- Reviewable:end -->
bors-servo added a commit that referenced this issue Nov 17, 2016
WIP: Urlmageddon

<!-- Please describe your changes on the following line: -->

Still needs a bunch of code in net to be converted in order to get more
advantage of this for images and stuff, but meanwhile this should help quite a
bit with #13778.

Still wanted to get this in.

r? @SimonSapin

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/14246)
<!-- Reviewable:end -->
bors-servo added a commit that referenced this issue Nov 17, 2016
Urlmageddon

<!-- Please describe your changes on the following line: -->

Still needs a bunch of code in net to be converted in order to get more
advantage of this for images and stuff, but meanwhile this should help quite a
bit with #13778.

Still wanted to get this in.

r? @SimonSapin

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/14246)
<!-- Reviewable:end -->
bors-servo added a commit that referenced this issue Nov 17, 2016
Urlmageddon

<!-- Please describe your changes on the following line: -->

Still needs a bunch of code in net to be converted in order to get more
advantage of this for images and stuff, but meanwhile this should help quite a
bit with #13778.

Still wanted to get this in.

r? @SimonSapin

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/14246)
<!-- Reviewable:end -->
bors-servo added a commit that referenced this issue Nov 17, 2016
Urlmageddon

<!-- Please describe your changes on the following line: -->

Still needs a bunch of code in net to be converted in order to get more
advantage of this for images and stuff, but meanwhile this should help quite a
bit with #13778.

Still wanted to get this in.

r? @SimonSapin

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/14246)
<!-- Reviewable:end -->
@jrmuizel
Copy link
Author

@jrmuizel jrmuizel commented Nov 21, 2016

Even after these changes we're still spending a lot of time in url::parser::Parser::parse_cannot_be_a_base_path.

We're also spending a bigger chunk of time in cssparser::tokenizer::consume_ident_like now.

@Manishearth
Copy link
Member

@Manishearth Manishearth commented Nov 21, 2016

For unquoted URIs I guess we should try and skip to the next closing brace as fast as possible? What does Firefox do?

@emilio
Copy link
Member

@emilio emilio commented Nov 21, 2016

As far as I know Gecko also goes through the whole URI, but using a table to indicate which kind of byte is it instead of the whole branching we go through. See gLexTable: http://searchfox.org/mozilla-central/source/layout/style/nsCSSScanner.cpp#43

@emilio
Copy link
Member

@emilio emilio commented Nov 21, 2016

@jrmuizel are you sure we are expending more time in consume_ident_like? Or just more percentage of time than before?

The first seems really unlikely to me except if you're using debug assertions (in which case it'll verify the final token is valid utf8 in https://github.com/servo/rust-cssparser/pull/110/files#diff-0bd70f3585cde6d2d2aff02435db9070R844)

@jrmuizel
Copy link
Author

@jrmuizel jrmuizel commented Nov 21, 2016

@emilio It could just be a higher percentage of the time. i.e. we used to be spending a much larger amount of time in parse_cannot_be_a_base_path before and so I didn't notice consume_ident_like earlier.

moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this issue Feb 4, 2017
…en tokenizing strings and data uris (from emilio:cssparser); r=nox

<!-- Please describe your changes on the following line: -->

See servo/servo#13778

r? @SimonSapin (or anyone else really).

Source-Repo: https://github.com/servo/servo
Source-Revision: 843a25f9f4adf354d5291cb43ccf0f6e7c866b82
@nox
Copy link
Member

@nox nox commented Oct 5, 2017

The sample is gone and many parsing improvements were done as part of Stylo, closing this.

@nox nox closed this Oct 5, 2017
gecko-dev-updater pushed a commit to marco-c/gecko-dev-comments-removed that referenced this issue Oct 1, 2019
…en tokenizing strings and data uris (from emilio:cssparser); r=nox

<!-- Please describe your changes on the following line: -->

See servo/servo#13778

r? SimonSapin (or anyone else really).

Source-Repo: https://github.com/servo/servo
Source-Revision: 843a25f9f4adf354d5291cb43ccf0f6e7c866b82

UltraBlame original commit: a170342a47705943efa35ebbfa4ff8f11fbe72fb
gecko-dev-updater pushed a commit to marco-c/gecko-dev-wordified-and-comments-removed that referenced this issue Oct 1, 2019
…en tokenizing strings and data uris (from emilio:cssparser); r=nox

<!-- Please describe your changes on the following line: -->

See servo/servo#13778

r? SimonSapin (or anyone else really).

Source-Repo: https://github.com/servo/servo
Source-Revision: 843a25f9f4adf354d5291cb43ccf0f6e7c866b82

UltraBlame original commit: a170342a47705943efa35ebbfa4ff8f11fbe72fb
gecko-dev-updater pushed a commit to marco-c/gecko-dev-wordified that referenced this issue Oct 1, 2019
…en tokenizing strings and data uris (from emilio:cssparser); r=nox

<!-- Please describe your changes on the following line: -->

See servo/servo#13778

r? SimonSapin (or anyone else really).

Source-Repo: https://github.com/servo/servo
Source-Revision: 843a25f9f4adf354d5291cb43ccf0f6e7c866b82

UltraBlame original commit: a170342a47705943efa35ebbfa4ff8f11fbe72fb
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
7 participants
You can’t perform that action at this time.