New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use escape_default() for strings in LitKind::token(). #50391

Merged
merged 2 commits into from May 3, 2018

Conversation

Projects
None yet
6 participants
@nnethercote
Contributor

nnethercote commented May 2, 2018

This avoids converting every char to \u{...} form, which bloats the
resulting strings unnecessarily. It also provides consistency with the
existing escape_default() calls in LitKind::token() used for raw
string literals, char literals, and raw byte char literals.

There are two benefits from this change.

  • Compilation is faster. Most of the rustc-perf benchmarks see a
    non-trivial speedup, particularly for incremental rebuilds, with the
    best speedup over 13%, and multiple others over 10%.

  • Generated rlibs are smaller. An extreme example is libfutures.rlib,
    which shrinks from 2073306 bytes to 1765927 bytes, a 15% reduction.

r? @jseyfried

Here are full numbers for all the rustc-perf runs where the improvement was > 1%.
regex-check
	avg: -11.1%	min: -13.4%	max: -5.5%
futures-check
	avg: -7.6%	min: -11.4%	max: -3.5%
futures-opt
	avg: -6.3%	min: -10.3%	max: -2.3%
futures
	avg: -6.6%	min: -10.3%	max: -2.8%
regex-opt
	avg: -4.7%	min: -10.2%	max: -0.4%
regex
	avg: -5.3%	min: -10.2%	max: -1.2%
hyper-check
	avg: -4.8%	min: -6.6%	max: -2.7%
encoding-check
	avg: -4.1%	min: -5.5%	max: -2.5%
issue-46449-check
	avg: -4.7%	min: -5.2%	max: -4.1%
clap-rs-check
	avg: -2.9%	min: -5.2%	max: -1.1%
hyper
	avg: -3.0%	min: -5.1%	max: -0.8%
parser-check
	avg: -4.2%	min: -4.9%	max: -3.2%
hyper-opt
	avg: -2.6%	min: -4.9%	max: -0.3%
encoding-opt
	avg: -2.3%	min: -4.6%	max: -0.5%
encoding
	avg: -2.5%	min: -4.4%	max: -0.6%
issue-46449
	avg: -2.3%	min: -4.4%	max: -1.8%
issue-46449-opt
	avg: -1.7%	min: -4.3%	max: -0.9%
clap-rs-opt
	avg: -1.6%	min: -4.2%	max: -0.2%
serde-check
	avg: -1.4%	min: -4.1%	max: -0.2%
clap-rs
	avg: -1.6%	min: -3.9%	max: -0.7%
unify-linearly-check
	avg: -3.2%	min: -3.7%	max: -2.7%
serde
	avg: -1.1%	min: -3.5%	max: -0.1%
regression-31157-check
	avg: -2.6%	min: -3.4%	max: -1.6%
helloworld-check
	avg: -2.5%	min: -3.4%	max: -0.6%
serde-opt
	avg: -1.3%	min: -3.3%	max: -0.5%
tokio-webpush-simple-check
	avg: -2.4%	min: -3.2%	max: -1.8%
piston-image-check
	avg: -1.7%	min: -3.2%	max: -0.9%
deeply-nested-opt
	avg: -1.5%	min: -3.0%	max: -0.6%
deeply-nested-check
	avg: -1.9%	min: -2.9%	max: -0.4%
deeply-nested
	avg: -1.9%	min: -2.9%	max: -1.2%
syn-check
	avg: -1.8%	min: -2.8%	max: -0.6%
coercions
	avg: -0.5%	min: -2.8%	max: 0.4%
syn-opt
	avg: -0.9%	min: -2.4%	max: -0.1%
syn
	avg: -1.1%	min: -2.2%	max: -0.3%
parser-opt
	avg: -1.9%	min: -2.1%	max: -1.6%
parser
	avg: -1.9%	min: -2.1%	max: -1.6%
style-servo-check
	avg: -1.3%	min: -2.0%	max: -0.8%
regression-31157-opt
	avg: -0.8%	min: -2.0%	max: 0.0%
piston-image
	avg: -0.7%	min: -1.8%	max: -0.2%
piston-image-opt
	avg: -0.6%	min: -1.8%	max: -0.0%
regression-31157
	avg: -1.0%	min: -1.7%	max: -0.3%
html5ever-opt
	avg: -0.6%	min: -1.5%	max: -0.1%
unify-linearly-opt
	avg: -1.3%	min: -1.5%	max: -1.1%
unify-linearly
	avg: -1.3%	min: -1.4%	max: -1.2%
tokio-webpush-simple-opt
	avg: -0.4%	min: -1.2%	max: -0.0%
helloworld-opt
	avg: -1.0%	min: -1.1%	max: -0.6%
helloworld
	avg: -1.0%	min: -1.1%	max: -0.7%
inflate-opt
	avg: -0.3%	min: -1.1%	max: 0.1%
html5ever-check
	avg: -0.6%	min: -1.0%	max: -0.3%
inflate-check
	avg: -0.3%	min: -1.0%	max: -0.1%
@@ -1230,7 +1230,7 @@ impl LitKind {
LitKind::Str(string, ast::StrStyle::Cooked) => {
let mut escaped = String::new();
for ch in string.as_str().chars() {
escaped.extend(ch.escape_unicode());
escaped.extend(ch.escape_default());
}

This comment has been minimized.

@kennytm

kennytm May 2, 2018

Member

We could use str::escape_default nowadays.

let escaped = string.escape_default();

(There is also a fn escape_default(s: &str) -> String in src/libsyntax/parse.rs but that should probably be removed in favor of the std one.)

However, I think only \r, \, and " really needs to be escaped. All other characters can legally appear inside a string. This may provide even more speed up since we don't need the UTF-8 encode/decode step.

This comment has been minimized.

@nnethercote

nnethercote May 2, 2018

Contributor

Note that this function is not the hot one. str_lit(), which parses the string produced here, is the hot function. So str::escape_default() might be more concise, but it's unlikely to make things faster.

Also, in practice these strings are all rustdoc comments, and escaped chars are rare. So it also doesn't matter much exactly which characters we escape.

@kennytm kennytm added the T-compiler label May 2, 2018

@eddyb

This comment has been minimized.

Member

eddyb commented May 2, 2018

r=me with @kennytm's suggestion

@nnethercote

This comment has been minimized.

Contributor

nnethercote commented May 2, 2018

@eddyb: I switched to str::escape_default. I also added a second commit that removes parse::escape_default.

for ch in string.as_str().chars() {
escaped.extend(ch.escape_unicode());
}
let mut escaped = string.as_str().escape_default();

This comment has been minimized.

@kennytm

kennytm May 2, 2018

Member

mut is no longer needed.

nnethercote added some commits May 2, 2018

Use escape_default() for strings in LitKind::token().
This avoids converting every char to \u{...} form, which bloats the
resulting strings unnecessarily. It also provides consistency with the
existing escape_default() calls in LitKind::token() used for raw
string literals, char literals, and raw byte char literals.

There are two benefits from this change.

- Compilation is faster. Most of the rustc-perf benchmarks see a
  non-trivial speedup, particularly for incremental rebuilds, with the
  best speedup over 13%, and multiple others over 10%.

- Generated rlibs are smaller. An extreme example is libfutures.rlib,
  which shrinks from 2073306 bytes to 1765927 bytes, a 15% reduction.
Remove parse::escape_default().
str::escape_default() can be used instead.
@nnethercote

This comment has been minimized.

Contributor

nnethercote commented May 3, 2018

I removed the unnecessary mut. (I'm surprised rustc didn't complain about it...)

@kennytm

This comment has been minimized.

Member

kennytm commented May 3, 2018

@bors r=eddyb

@bors

This comment has been minimized.

Contributor

bors commented May 3, 2018

📌 Commit 7a56360 has been approved by eddyb

@bors

This comment has been minimized.

Contributor

bors commented May 3, 2018

⌛️ Testing commit 7a56360 with merge 698b956...

bors added a commit that referenced this pull request May 3, 2018

Auto merge of #50391 - nnethercote:escape_unicode, r=eddyb
Use escape_default() for strings in LitKind::token().

This avoids converting every char to \u{...} form, which bloats the
resulting strings unnecessarily. It also provides consistency with the
existing escape_default() calls in LitKind::token() used for raw
string literals, char literals, and raw byte char literals.

There are two benefits from this change.

- Compilation is faster. Most of the rustc-perf benchmarks see a
  non-trivial speedup, particularly for incremental rebuilds, with the
  best speedup over 13%, and multiple others over 10%.

- Generated rlibs are smaller. An extreme example is libfutures.rlib,
  which shrinks from 2073306 bytes to 1765927 bytes, a 15% reduction.

r? @jseyfried

<details><summary>Here are full numbers for all the rustc-perf runs where the improvement was > 1%.</summary>

```
regex-check
	avg: -11.1%	min: -13.4%	max: -5.5%
futures-check
	avg: -7.6%	min: -11.4%	max: -3.5%
futures-opt
	avg: -6.3%	min: -10.3%	max: -2.3%
futures
	avg: -6.6%	min: -10.3%	max: -2.8%
regex-opt
	avg: -4.7%	min: -10.2%	max: -0.4%
regex
	avg: -5.3%	min: -10.2%	max: -1.2%
hyper-check
	avg: -4.8%	min: -6.6%	max: -2.7%
encoding-check
	avg: -4.1%	min: -5.5%	max: -2.5%
issue-46449-check
	avg: -4.7%	min: -5.2%	max: -4.1%
clap-rs-check
	avg: -2.9%	min: -5.2%	max: -1.1%
hyper
	avg: -3.0%	min: -5.1%	max: -0.8%
parser-check
	avg: -4.2%	min: -4.9%	max: -3.2%
hyper-opt
	avg: -2.6%	min: -4.9%	max: -0.3%
encoding-opt
	avg: -2.3%	min: -4.6%	max: -0.5%
encoding
	avg: -2.5%	min: -4.4%	max: -0.6%
issue-46449
	avg: -2.3%	min: -4.4%	max: -1.8%
issue-46449-opt
	avg: -1.7%	min: -4.3%	max: -0.9%
clap-rs-opt
	avg: -1.6%	min: -4.2%	max: -0.2%
serde-check
	avg: -1.4%	min: -4.1%	max: -0.2%
clap-rs
	avg: -1.6%	min: -3.9%	max: -0.7%
unify-linearly-check
	avg: -3.2%	min: -3.7%	max: -2.7%
serde
	avg: -1.1%	min: -3.5%	max: -0.1%
regression-31157-check
	avg: -2.6%	min: -3.4%	max: -1.6%
helloworld-check
	avg: -2.5%	min: -3.4%	max: -0.6%
serde-opt
	avg: -1.3%	min: -3.3%	max: -0.5%
tokio-webpush-simple-check
	avg: -2.4%	min: -3.2%	max: -1.8%
piston-image-check
	avg: -1.7%	min: -3.2%	max: -0.9%
deeply-nested-opt
	avg: -1.5%	min: -3.0%	max: -0.6%
deeply-nested-check
	avg: -1.9%	min: -2.9%	max: -0.4%
deeply-nested
	avg: -1.9%	min: -2.9%	max: -1.2%
syn-check
	avg: -1.8%	min: -2.8%	max: -0.6%
coercions
	avg: -0.5%	min: -2.8%	max: 0.4%
syn-opt
	avg: -0.9%	min: -2.4%	max: -0.1%
syn
	avg: -1.1%	min: -2.2%	max: -0.3%
parser-opt
	avg: -1.9%	min: -2.1%	max: -1.6%
parser
	avg: -1.9%	min: -2.1%	max: -1.6%
style-servo-check
	avg: -1.3%	min: -2.0%	max: -0.8%
regression-31157-opt
	avg: -0.8%	min: -2.0%	max: 0.0%
piston-image
	avg: -0.7%	min: -1.8%	max: -0.2%
piston-image-opt
	avg: -0.6%	min: -1.8%	max: -0.0%
regression-31157
	avg: -1.0%	min: -1.7%	max: -0.3%
html5ever-opt
	avg: -0.6%	min: -1.5%	max: -0.1%
unify-linearly-opt
	avg: -1.3%	min: -1.5%	max: -1.1%
unify-linearly
	avg: -1.3%	min: -1.4%	max: -1.2%
tokio-webpush-simple-opt
	avg: -0.4%	min: -1.2%	max: -0.0%
helloworld-opt
	avg: -1.0%	min: -1.1%	max: -0.6%
helloworld
	avg: -1.0%	min: -1.1%	max: -0.7%
inflate-opt
	avg: -0.3%	min: -1.1%	max: 0.1%
html5ever-check
	avg: -0.6%	min: -1.0%	max: -0.3%
inflate-check
	avg: -0.3%	min: -1.0%	max: -0.1%
```

</details>
@bors

This comment has been minimized.

Contributor

bors commented May 3, 2018

☀️ Test successful - status-appveyor, status-travis
Approved by: eddyb
Pushing 698b956 to master...

@bors bors merged commit 7a56360 into rust-lang:master May 3, 2018

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
homu Test successful
Details

@nnethercote nnethercote deleted the nnethercote:escape_unicode branch May 4, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment