Join GitHub today
GitHub is home to over 50 million developers working together to host and review code, manage projects, and build software together.
Sign upGitHub is where the world builds software
Millions of developers and companies build, ship, and maintain their software on GitHub — the largest and most advanced development platform in the world.
Use escape_default() for strings in LitKind::token(). #50391
Conversation
| @@ -1230,7 +1230,7 @@ impl LitKind { | |||
| LitKind::Str(string, ast::StrStyle::Cooked) => { | |||
| let mut escaped = String::new(); | |||
| for ch in string.as_str().chars() { | |||
| escaped.extend(ch.escape_unicode()); | |||
| escaped.extend(ch.escape_default()); | |||
| } | |||
kennytm
May 2, 2018
Member
We could use str::escape_default nowadays.
let escaped = string.escape_default();
(There is also a fn escape_default(s: &str) -> String in src/libsyntax/parse.rs but that should probably be removed in favor of the std one.)
However, I think only \r, \, and " really needs to be escaped. All other characters can legally appear inside a string. This may provide even more speed up since we don't need the UTF-8 encode/decode step.
We could use str::escape_default nowadays.
let escaped = string.escape_default();(There is also a fn escape_default(s: &str) -> String in src/libsyntax/parse.rs but that should probably be removed in favor of the std one.)
However, I think only \r, \, and " really needs to be escaped. All other characters can legally appear inside a string. This may provide even more speed up since we don't need the UTF-8 encode/decode step.
nnethercote
May 2, 2018
Author
Contributor
Note that this function is not the hot one. str_lit(), which parses the string produced here, is the hot function. So str::escape_default() might be more concise, but it's unlikely to make things faster.
Also, in practice these strings are all rustdoc comments, and escaped chars are rare. So it also doesn't matter much exactly which characters we escape.
Note that this function is not the hot one. str_lit(), which parses the string produced here, is the hot function. So str::escape_default() might be more concise, but it's unlikely to make things faster.
Also, in practice these strings are all rustdoc comments, and escaped chars are rare. So it also doesn't matter much exactly which characters we escape.
|
r=me with @kennytm's suggestion |
|
@eddyb: I switched to |
| for ch in string.as_str().chars() { | ||
| escaped.extend(ch.escape_unicode()); | ||
| } | ||
| let mut escaped = string.as_str().escape_default(); |
kennytm
May 2, 2018
Member
mut is no longer needed.
mut is no longer needed.
This avoids converting every char to \u{...} form, which bloats the
resulting strings unnecessarily. It also provides consistency with the
existing escape_default() calls in LitKind::token() used for raw
string literals, char literals, and raw byte char literals.
There are two benefits from this change.
- Compilation is faster. Most of the rustc-perf benchmarks see a
non-trivial speedup, particularly for incremental rebuilds, with the
best speedup over 13%, and multiple others over 10%.
- Generated rlibs are smaller. An extreme example is libfutures.rlib,
which shrinks from 2073306 bytes to 1765927 bytes, a 15% reduction.
str::escape_default() can be used instead.
|
I removed the unnecessary |
|
@bors r=eddyb |
|
|
Use escape_default() for strings in LitKind::token().
This avoids converting every char to \u{...} form, which bloats the
resulting strings unnecessarily. It also provides consistency with the
existing escape_default() calls in LitKind::token() used for raw
string literals, char literals, and raw byte char literals.
There are two benefits from this change.
- Compilation is faster. Most of the rustc-perf benchmarks see a
non-trivial speedup, particularly for incremental rebuilds, with the
best speedup over 13%, and multiple others over 10%.
- Generated rlibs are smaller. An extreme example is libfutures.rlib,
which shrinks from 2073306 bytes to 1765927 bytes, a 15% reduction.
r? @jseyfried
<details><summary>Here are full numbers for all the rustc-perf runs where the improvement was > 1%.</summary>
```
regex-check
avg: -11.1% min: -13.4% max: -5.5%
futures-check
avg: -7.6% min: -11.4% max: -3.5%
futures-opt
avg: -6.3% min: -10.3% max: -2.3%
futures
avg: -6.6% min: -10.3% max: -2.8%
regex-opt
avg: -4.7% min: -10.2% max: -0.4%
regex
avg: -5.3% min: -10.2% max: -1.2%
hyper-check
avg: -4.8% min: -6.6% max: -2.7%
encoding-check
avg: -4.1% min: -5.5% max: -2.5%
issue-46449-check
avg: -4.7% min: -5.2% max: -4.1%
clap-rs-check
avg: -2.9% min: -5.2% max: -1.1%
hyper
avg: -3.0% min: -5.1% max: -0.8%
parser-check
avg: -4.2% min: -4.9% max: -3.2%
hyper-opt
avg: -2.6% min: -4.9% max: -0.3%
encoding-opt
avg: -2.3% min: -4.6% max: -0.5%
encoding
avg: -2.5% min: -4.4% max: -0.6%
issue-46449
avg: -2.3% min: -4.4% max: -1.8%
issue-46449-opt
avg: -1.7% min: -4.3% max: -0.9%
clap-rs-opt
avg: -1.6% min: -4.2% max: -0.2%
serde-check
avg: -1.4% min: -4.1% max: -0.2%
clap-rs
avg: -1.6% min: -3.9% max: -0.7%
unify-linearly-check
avg: -3.2% min: -3.7% max: -2.7%
serde
avg: -1.1% min: -3.5% max: -0.1%
regression-31157-check
avg: -2.6% min: -3.4% max: -1.6%
helloworld-check
avg: -2.5% min: -3.4% max: -0.6%
serde-opt
avg: -1.3% min: -3.3% max: -0.5%
tokio-webpush-simple-check
avg: -2.4% min: -3.2% max: -1.8%
piston-image-check
avg: -1.7% min: -3.2% max: -0.9%
deeply-nested-opt
avg: -1.5% min: -3.0% max: -0.6%
deeply-nested-check
avg: -1.9% min: -2.9% max: -0.4%
deeply-nested
avg: -1.9% min: -2.9% max: -1.2%
syn-check
avg: -1.8% min: -2.8% max: -0.6%
coercions
avg: -0.5% min: -2.8% max: 0.4%
syn-opt
avg: -0.9% min: -2.4% max: -0.1%
syn
avg: -1.1% min: -2.2% max: -0.3%
parser-opt
avg: -1.9% min: -2.1% max: -1.6%
parser
avg: -1.9% min: -2.1% max: -1.6%
style-servo-check
avg: -1.3% min: -2.0% max: -0.8%
regression-31157-opt
avg: -0.8% min: -2.0% max: 0.0%
piston-image
avg: -0.7% min: -1.8% max: -0.2%
piston-image-opt
avg: -0.6% min: -1.8% max: -0.0%
regression-31157
avg: -1.0% min: -1.7% max: -0.3%
html5ever-opt
avg: -0.6% min: -1.5% max: -0.1%
unify-linearly-opt
avg: -1.3% min: -1.5% max: -1.1%
unify-linearly
avg: -1.3% min: -1.4% max: -1.2%
tokio-webpush-simple-opt
avg: -0.4% min: -1.2% max: -0.0%
helloworld-opt
avg: -1.0% min: -1.1% max: -0.6%
helloworld
avg: -1.0% min: -1.1% max: -0.7%
inflate-opt
avg: -0.3% min: -1.1% max: 0.1%
html5ever-check
avg: -0.6% min: -1.0% max: -0.3%
inflate-check
avg: -0.3% min: -1.0% max: -0.1%
```
</details>
|
|
This avoids converting every char to \u{...} form, which bloats the
resulting strings unnecessarily. It also provides consistency with the
existing escape_default() calls in LitKind::token() used for raw
string literals, char literals, and raw byte char literals.
There are two benefits from this change.
Compilation is faster. Most of the rustc-perf benchmarks see a
non-trivial speedup, particularly for incremental rebuilds, with the
best speedup over 13%, and multiple others over 10%.
Generated rlibs are smaller. An extreme example is libfutures.rlib,
which shrinks from 2073306 bytes to 1765927 bytes, a 15% reduction.
r? @jseyfried
Here are full numbers for all the rustc-perf runs where the improvement was > 1%.