Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

script: Stop reallocating so much when converting DOM strings to JS values. #7765

Merged

Conversation

@pcwalton
Copy link
Contributor

pcwalton commented Sep 27, 2015

This is split out from #6900.

size_hint() in utf16_units() seems busted, so we do it ourselves.

r? @jdm

Review on Reviewable

values.

`size_hint()` in `utf16_units()` seems busted, so we do it ourselves.
@jdm
Copy link
Member

jdm commented Sep 27, 2015

@bors-servo
Copy link
Contributor

bors-servo commented Sep 27, 2015

📌 Commit f2f9188 has been approved by jdm

@bors-servo
Copy link
Contributor

bors-servo commented Sep 27, 2015

Testing commit f2f9188 with merge 7a3d2c8...

bors-servo pushed a commit that referenced this pull request Sep 27, 2015
…it, r=jdm

script: Stop reallocating so much when converting DOM strings to JS values.

This is split out from #6900.

`size_hint()` in `utf16_units()` seems busted, so we do it ourselves.

r? @jdm

<!-- Reviewable:start -->
[<img src="https://reviewable.io/review_button.png" height=40 alt="Review on Reviewable"/>](https://reviewable.io/reviews/servo/servo/7765)
<!-- Reviewable:end -->
@bors-servo
Copy link
Contributor

bors-servo commented Sep 28, 2015

@bors-servo bors-servo merged commit f2f9188 into servo:master Sep 28, 2015
2 checks passed
2 checks passed
continuous-integration/travis-ci/pr The Travis CI build passed
Details
homu Test successful
Details
@Ms2ger
Copy link
Contributor

Ms2ger commented Sep 28, 2015

Can we get a bug filed on the size hint? On first sight, it looks correctly implemented.

@SimonSapin
Copy link
Member

SimonSapin commented Sep 28, 2015

The size_hint of Utf16Units is based on that of the (generic) underlying char iterator, keeping the lower bound unchanged (for all-BMP code points) and doubling the upper bound (for all-non-BMP code points). Meanwhile, str::Chars::size_hint uses the UTF-8 length for its upper bound (all ASCII) and divides it by four for its lower bound (all four bytes UTF-8 sequences).

Since Vec::extends pre-allocates for the lower bound and ignores the upper bound, we get one u16 for every four UTF-8 bytes, which is four times too little if the input is all ASCII.

Since the range of code points that take 4 UTF-8 bytes happens to be the same as those taking two UTF-16 units (non-BMP), with specialization, we could tighten size_hint a bit so that the lower bound is one UTF-16 unit for every three bytes UTF-8, but that’s only marginally better.

The real problem here is that Vec::extend uses the lower bound from size_hint. Would it be better to use the upper bound when available? (Trading off more memory use for less re-allocations.) Or should size_hint also provide an "expected" size for the average case, that doesn’t have to be as conservative as the lower bound for the best/worst case?

CC @alexcrichton @aturon

@SimonSapin
Copy link
Member

SimonSapin commented Sep 28, 2015

Also CC @gankro

@Gankra
Copy link
Contributor

Gankra commented Sep 28, 2015

Using the upper bound would likely cause many programs to waste extreme amounts of memory, to the point of even OOMing on preposterous bounds. This just an instance of "inherent complexity". We can't magically know what you want the hint for, nor what kind of reallocation strategy you're interested in. If you know better, ya gotta reserve and extend.

Using lower_bound unconditionally seems like the most well-behaved option. The standard library will always be suboptimal since we're generic, and therefore cannot leverage specific data properties (other than reasoning about "common" applications).

@Gankra
Copy link
Contributor

Gankra commented Sep 28, 2015

(IMHO, upper bound is basically useless in a generic context)

@SimonSapin
Copy link
Member

SimonSapin commented Sep 28, 2015

Alright, sounds reasonable.

Based on this comment and IRC discussion it sounds like there is no std change to make here and using with_capacity is the right thing to do when we can make a better guess at the expected size.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

7 participants
You can’t perform that action at this time.