Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a fast-path to Debug ASCII &str #121150

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Swatinem
Copy link
Contributor

Instead of going through the EscapeDebug machinery, we can just skip over ASCII chars that don’t need any escaping.


This is an alternative / a companion to #121138.

The other PR is adding the fast path deep within EscapeDebug, whereas this skips as early as possible.

Instead of going through the `EscapeDebug` machinery, we can just skip over ASCII chars that don’t need any escaping.
@rustbot
Copy link
Collaborator

rustbot commented Feb 15, 2024

r? @cuviper

rustbot has assigned @cuviper.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Feb 15, 2024
@@ -2340,6 +2340,11 @@ impl Debug for str {
f.write_char('"')?;
let mut from = 0;
for (i, c) in self.char_indices() {
// a fast path for ASCII chars that do not need escapes:
if matches!(c, ' '..='~') && !matches!(c, '\\' | '\"') {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to use https://doc.rust-lang.org/std/primitive.char.html#method.is_ascii_graphic or some combination of other methods rather than hard-coding this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is_ascii_graphic does not include space, and I don’t think \ or " are part of any of the other methods?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's escape_ascii but it escapes more than those two.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The loop doesn't look like it would vectorize or even unroll well. For a string of chars that don't need escaping it might be worth processing the utf8 bytes directly like EscapeAscii

@Swatinem
Copy link
Contributor Author

While we are still bikeshedding the implementation details, I would appreciate a benchmark run. Not sure if the compiler performance testsuite will show any change, in a micro-benchmark, I was able to get a 10x improvement for a pure-ASCII string, and even a tiny improvement in the "pure"-Unicode case (because even a "pure" unicode sentence still contains ASCII spaces)

@the8472
Copy link
Member

the8472 commented Feb 18, 2024

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Feb 18, 2024
@bors
Copy link
Contributor

bors commented Feb 18, 2024

⌛ Trying commit 04602a2 with merge d3c44b1...

bors added a commit to rust-lang-ci/rust that referenced this pull request Feb 18, 2024
Add a fast-path to `Debug` ASCII `&str`

Instead of going through the `EscapeDebug` machinery, we can just skip over ASCII chars that don’t need any escaping.

---

This is an alternative / a companion to rust-lang#121138.

The other PR is adding the fast path deep within `EscapeDebug`, whereas this skips as early as possible.
@bors
Copy link
Contributor

bors commented Feb 18, 2024

☀️ Try build successful - checks-actions
Build commit: d3c44b1 (d3c44b181fb6776a28574e754c3db10da6172bd9)

@rust-timer

This comment has been minimized.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (d3c44b1): comparison URL.

Overall result: no relevant changes - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
1.4% [1.4%, 1.4%] 1
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) - - 0

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
1.8% [1.8%, 1.8%] 1
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) - - 0

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 640.535s -> 641.862s (0.21%)
Artifact size: 308.83 MiB -> 308.80 MiB (-0.01%)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Feb 18, 2024
@Swatinem
Copy link
Contributor Author

Not too surprisingly, the compile time benchmarks did not budge, or look bogus.

Though if you select "runtime", there is a very significant improvement to fmt-debug-derive, which sounds like exactly the benchmark touching this code.

https://perf.rust-lang.org/compare.html?start=6f726205a1b7992537ddec96c83f2b054b03e04f&end=d3c44b181fb6776a28574e754c3db10da6172bd9&stat=instructions%3Au&tab=runtime&nonRelevant=true&showRawData=true


So the question remains, what can I do to improve the confidence in this change? I have looked through other similar existing methods, but none of which do exactly what I expect.

Another option would be to also perf-test #121138, as in my local benchmarking I found that the slowness came from binary searching the Grapheme_Extend table. Maybe the compiler is smart enough to generate equally fast code with an arguably simpler change?

@cuviper
Copy link
Member

cuviper commented Mar 6, 2024

Let's see what this now does on top of #121138...

@bors try @rust-timer queue

And for good measure, let's mention that #122013 is also making changes around this.

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Mar 6, 2024
bors added a commit to rust-lang-ci/rust that referenced this pull request Mar 6, 2024
Add a fast-path to `Debug` ASCII `&str`

Instead of going through the `EscapeDebug` machinery, we can just skip over ASCII chars that don’t need any escaping.

---

This is an alternative / a companion to rust-lang#121138.

The other PR is adding the fast path deep within `EscapeDebug`, whereas this skips as early as possible.
@bors
Copy link
Contributor

bors commented Mar 6, 2024

⌛ Trying commit 04602a2 with merge d253c69...

@bors
Copy link
Contributor

bors commented Mar 6, 2024

☀️ Try build successful - checks-actions
Build commit: d253c69 (d253c6933da1078e2c622d865089469a722c376b)

@rust-timer

This comment has been minimized.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (d253c69): comparison URL.

Overall result: no relevant changes - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
4.5% [1.4%, 7.7%] 2
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-2.6% [-2.6%, -2.6%] 1
All ❌✅ (primary) 4.5% [1.4%, 7.7%] 2

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 644.65s -> 646.087s (0.22%)
Artifact size: 175.06 MiB -> 175.06 MiB (0.00%)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Mar 6, 2024
@Swatinem
Copy link
Contributor Author

Swatinem commented Mar 8, 2024

Another -26.93% on the fmt-debug-derive runtime benchmark.
I can take another look if I can maybe improve vectorization of the loop as suggested by @the8472. I will reach out on zulip if I have questions :-)

@cuviper
Copy link
Member

cuviper commented Mar 19, 2024

AIUI you're still investigating...
@rustbot author

@rustbot rustbot removed the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Mar 19, 2024
@rustbot rustbot added the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Mar 19, 2024
@Swatinem
Copy link
Contributor Author

2 weeks ago

Yes, I would still love to get to this, just got side tracked a bit with other stuff in the meantime 😅

@cuviper
Copy link
Member

cuviper commented Mar 20, 2024

No problem, just managing my review queue. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants