Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use escape_default() for strings in LitKind::token(). #50391

Merged
merged 2 commits into from
May 3, 2018

Conversation

nnethercote
Copy link
Contributor

@nnethercote nnethercote commented May 2, 2018

This avoids converting every char to \u{...} form, which bloats the
resulting strings unnecessarily. It also provides consistency with the
existing escape_default() calls in LitKind::token() used for raw
string literals, char literals, and raw byte char literals.

There are two benefits from this change.

  • Compilation is faster. Most of the rustc-perf benchmarks see a
    non-trivial speedup, particularly for incremental rebuilds, with the
    best speedup over 13%, and multiple others over 10%.

  • Generated rlibs are smaller. An extreme example is libfutures.rlib,
    which shrinks from 2073306 bytes to 1765927 bytes, a 15% reduction.

r? @jseyfried

Here are full numbers for all the rustc-perf runs where the improvement was > 1%.
regex-check
	avg: -11.1%	min: -13.4%	max: -5.5%
futures-check
	avg: -7.6%	min: -11.4%	max: -3.5%
futures-opt
	avg: -6.3%	min: -10.3%	max: -2.3%
futures
	avg: -6.6%	min: -10.3%	max: -2.8%
regex-opt
	avg: -4.7%	min: -10.2%	max: -0.4%
regex
	avg: -5.3%	min: -10.2%	max: -1.2%
hyper-check
	avg: -4.8%	min: -6.6%	max: -2.7%
encoding-check
	avg: -4.1%	min: -5.5%	max: -2.5%
issue-46449-check
	avg: -4.7%	min: -5.2%	max: -4.1%
clap-rs-check
	avg: -2.9%	min: -5.2%	max: -1.1%
hyper
	avg: -3.0%	min: -5.1%	max: -0.8%
parser-check
	avg: -4.2%	min: -4.9%	max: -3.2%
hyper-opt
	avg: -2.6%	min: -4.9%	max: -0.3%
encoding-opt
	avg: -2.3%	min: -4.6%	max: -0.5%
encoding
	avg: -2.5%	min: -4.4%	max: -0.6%
issue-46449
	avg: -2.3%	min: -4.4%	max: -1.8%
issue-46449-opt
	avg: -1.7%	min: -4.3%	max: -0.9%
clap-rs-opt
	avg: -1.6%	min: -4.2%	max: -0.2%
serde-check
	avg: -1.4%	min: -4.1%	max: -0.2%
clap-rs
	avg: -1.6%	min: -3.9%	max: -0.7%
unify-linearly-check
	avg: -3.2%	min: -3.7%	max: -2.7%
serde
	avg: -1.1%	min: -3.5%	max: -0.1%
regression-31157-check
	avg: -2.6%	min: -3.4%	max: -1.6%
helloworld-check
	avg: -2.5%	min: -3.4%	max: -0.6%
serde-opt
	avg: -1.3%	min: -3.3%	max: -0.5%
tokio-webpush-simple-check
	avg: -2.4%	min: -3.2%	max: -1.8%
piston-image-check
	avg: -1.7%	min: -3.2%	max: -0.9%
deeply-nested-opt
	avg: -1.5%	min: -3.0%	max: -0.6%
deeply-nested-check
	avg: -1.9%	min: -2.9%	max: -0.4%
deeply-nested
	avg: -1.9%	min: -2.9%	max: -1.2%
syn-check
	avg: -1.8%	min: -2.8%	max: -0.6%
coercions
	avg: -0.5%	min: -2.8%	max: 0.4%
syn-opt
	avg: -0.9%	min: -2.4%	max: -0.1%
syn
	avg: -1.1%	min: -2.2%	max: -0.3%
parser-opt
	avg: -1.9%	min: -2.1%	max: -1.6%
parser
	avg: -1.9%	min: -2.1%	max: -1.6%
style-servo-check
	avg: -1.3%	min: -2.0%	max: -0.8%
regression-31157-opt
	avg: -0.8%	min: -2.0%	max: 0.0%
piston-image
	avg: -0.7%	min: -1.8%	max: -0.2%
piston-image-opt
	avg: -0.6%	min: -1.8%	max: -0.0%
regression-31157
	avg: -1.0%	min: -1.7%	max: -0.3%
html5ever-opt
	avg: -0.6%	min: -1.5%	max: -0.1%
unify-linearly-opt
	avg: -1.3%	min: -1.5%	max: -1.1%
unify-linearly
	avg: -1.3%	min: -1.4%	max: -1.2%
tokio-webpush-simple-opt
	avg: -0.4%	min: -1.2%	max: -0.0%
helloworld-opt
	avg: -1.0%	min: -1.1%	max: -0.6%
helloworld
	avg: -1.0%	min: -1.1%	max: -0.7%
inflate-opt
	avg: -0.3%	min: -1.1%	max: 0.1%
html5ever-check
	avg: -0.6%	min: -1.0%	max: -0.3%
inflate-check
	avg: -0.3%	min: -1.0%	max: -0.1%

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label May 2, 2018
@@ -1230,7 +1230,7 @@ impl LitKind {
LitKind::Str(string, ast::StrStyle::Cooked) => {
let mut escaped = String::new();
for ch in string.as_str().chars() {
escaped.extend(ch.escape_unicode());
escaped.extend(ch.escape_default());
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could use str::escape_default nowadays.

let escaped = string.escape_default();

(There is also a fn escape_default(s: &str) -> String in src/libsyntax/parse.rs but that should probably be removed in favor of the std one.)

However, I think only \r, \, and " really needs to be escaped. All other characters can legally appear inside a string. This may provide even more speed up since we don't need the UTF-8 encode/decode step.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this function is not the hot one. str_lit(), which parses the string produced here, is the hot function. So str::escape_default() might be more concise, but it's unlikely to make things faster.

Also, in practice these strings are all rustdoc comments, and escaped chars are rare. So it also doesn't matter much exactly which characters we escape.

@kennytm kennytm added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label May 2, 2018
@eddyb
Copy link
Member

eddyb commented May 2, 2018

r=me with @kennytm's suggestion

@kennytm kennytm added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 2, 2018
@nnethercote
Copy link
Contributor Author

@eddyb: I switched to str::escape_default. I also added a second commit that removes parse::escape_default.

@kennytm kennytm added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels May 2, 2018
for ch in string.as_str().chars() {
escaped.extend(ch.escape_unicode());
}
let mut escaped = string.as_str().escape_default();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mut is no longer needed.

This avoids converting every char to \u{...} form, which bloats the
resulting strings unnecessarily. It also provides consistency with the
existing escape_default() calls in LitKind::token() used for raw
string literals, char literals, and raw byte char literals.

There are two benefits from this change.

- Compilation is faster. Most of the rustc-perf benchmarks see a
  non-trivial speedup, particularly for incremental rebuilds, with the
  best speedup over 13%, and multiple others over 10%.

- Generated rlibs are smaller. An extreme example is libfutures.rlib,
  which shrinks from 2073306 bytes to 1765927 bytes, a 15% reduction.
str::escape_default() can be used instead.
@nnethercote
Copy link
Contributor Author

I removed the unnecessary mut. (I'm surprised rustc didn't complain about it...)

@kennytm
Copy link
Member

kennytm commented May 3, 2018

@bors r=eddyb

@bors
Copy link
Contributor

bors commented May 3, 2018

📌 Commit 7a56360 has been approved by eddyb

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 3, 2018
@bors
Copy link
Contributor

bors commented May 3, 2018

⌛ Testing commit 7a56360 with merge 698b956...

bors added a commit that referenced this pull request May 3, 2018
Use escape_default() for strings in LitKind::token().

This avoids converting every char to \u{...} form, which bloats the
resulting strings unnecessarily. It also provides consistency with the
existing escape_default() calls in LitKind::token() used for raw
string literals, char literals, and raw byte char literals.

There are two benefits from this change.

- Compilation is faster. Most of the rustc-perf benchmarks see a
  non-trivial speedup, particularly for incremental rebuilds, with the
  best speedup over 13%, and multiple others over 10%.

- Generated rlibs are smaller. An extreme example is libfutures.rlib,
  which shrinks from 2073306 bytes to 1765927 bytes, a 15% reduction.

r? @jseyfried

<details><summary>Here are full numbers for all the rustc-perf runs where the improvement was > 1%.</summary>

```
regex-check
	avg: -11.1%	min: -13.4%	max: -5.5%
futures-check
	avg: -7.6%	min: -11.4%	max: -3.5%
futures-opt
	avg: -6.3%	min: -10.3%	max: -2.3%
futures
	avg: -6.6%	min: -10.3%	max: -2.8%
regex-opt
	avg: -4.7%	min: -10.2%	max: -0.4%
regex
	avg: -5.3%	min: -10.2%	max: -1.2%
hyper-check
	avg: -4.8%	min: -6.6%	max: -2.7%
encoding-check
	avg: -4.1%	min: -5.5%	max: -2.5%
issue-46449-check
	avg: -4.7%	min: -5.2%	max: -4.1%
clap-rs-check
	avg: -2.9%	min: -5.2%	max: -1.1%
hyper
	avg: -3.0%	min: -5.1%	max: -0.8%
parser-check
	avg: -4.2%	min: -4.9%	max: -3.2%
hyper-opt
	avg: -2.6%	min: -4.9%	max: -0.3%
encoding-opt
	avg: -2.3%	min: -4.6%	max: -0.5%
encoding
	avg: -2.5%	min: -4.4%	max: -0.6%
issue-46449
	avg: -2.3%	min: -4.4%	max: -1.8%
issue-46449-opt
	avg: -1.7%	min: -4.3%	max: -0.9%
clap-rs-opt
	avg: -1.6%	min: -4.2%	max: -0.2%
serde-check
	avg: -1.4%	min: -4.1%	max: -0.2%
clap-rs
	avg: -1.6%	min: -3.9%	max: -0.7%
unify-linearly-check
	avg: -3.2%	min: -3.7%	max: -2.7%
serde
	avg: -1.1%	min: -3.5%	max: -0.1%
regression-31157-check
	avg: -2.6%	min: -3.4%	max: -1.6%
helloworld-check
	avg: -2.5%	min: -3.4%	max: -0.6%
serde-opt
	avg: -1.3%	min: -3.3%	max: -0.5%
tokio-webpush-simple-check
	avg: -2.4%	min: -3.2%	max: -1.8%
piston-image-check
	avg: -1.7%	min: -3.2%	max: -0.9%
deeply-nested-opt
	avg: -1.5%	min: -3.0%	max: -0.6%
deeply-nested-check
	avg: -1.9%	min: -2.9%	max: -0.4%
deeply-nested
	avg: -1.9%	min: -2.9%	max: -1.2%
syn-check
	avg: -1.8%	min: -2.8%	max: -0.6%
coercions
	avg: -0.5%	min: -2.8%	max: 0.4%
syn-opt
	avg: -0.9%	min: -2.4%	max: -0.1%
syn
	avg: -1.1%	min: -2.2%	max: -0.3%
parser-opt
	avg: -1.9%	min: -2.1%	max: -1.6%
parser
	avg: -1.9%	min: -2.1%	max: -1.6%
style-servo-check
	avg: -1.3%	min: -2.0%	max: -0.8%
regression-31157-opt
	avg: -0.8%	min: -2.0%	max: 0.0%
piston-image
	avg: -0.7%	min: -1.8%	max: -0.2%
piston-image-opt
	avg: -0.6%	min: -1.8%	max: -0.0%
regression-31157
	avg: -1.0%	min: -1.7%	max: -0.3%
html5ever-opt
	avg: -0.6%	min: -1.5%	max: -0.1%
unify-linearly-opt
	avg: -1.3%	min: -1.5%	max: -1.1%
unify-linearly
	avg: -1.3%	min: -1.4%	max: -1.2%
tokio-webpush-simple-opt
	avg: -0.4%	min: -1.2%	max: -0.0%
helloworld-opt
	avg: -1.0%	min: -1.1%	max: -0.6%
helloworld
	avg: -1.0%	min: -1.1%	max: -0.7%
inflate-opt
	avg: -0.3%	min: -1.1%	max: 0.1%
html5ever-check
	avg: -0.6%	min: -1.0%	max: -0.3%
inflate-check
	avg: -0.3%	min: -1.0%	max: -0.1%
```

</details>
@bors
Copy link
Contributor

bors commented May 3, 2018

☀️ Test successful - status-appveyor, status-travis
Approved by: eddyb
Pushing 698b956 to master...

@bors bors merged commit 7a56360 into rust-lang:master May 3, 2018
@nnethercote nnethercote deleted the escape_unicode branch May 4, 2018 04:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants