Refactor core::char::EscapeDefault and co. structures #105076

Merged on May 2, 2023 (3 commits)

mina86
Contributor

@mina86 mina86 commented Nov 29, 2022

Change the core::char::{EscapeUnicode, EscapeDefault, EscapeDebug}
structures from using a state machine to computing the escaped sequence
upfront, so that iteration just walks through the characters.

This is arguably simpler since it’s easier to think about having
a buffer and start..end range to iterate over rather than thinking
about a state machine.

This also harmonises the implementation of the aforementioned iterators
with the core::ascii::EscapeDefault struct. This is done by introducing a
new helper struct, EscapeIterInner, which holds the buffer and offers
simple methods for iterating over a range.
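
To make the buffer-plus-range idea concrete, here is a minimal sketch. It is not the PR's actual `EscapeIterInner`: the type name `UnicodeEscape`, its field names, and the buffer size are assumptions chosen for illustration. The whole `\u{...}` escape is written into a fixed buffer at construction time, and iteration only advances a range over that buffer.

```rust
use std::ops::Range;

const HEX_DIGITS: [u8; 16] = *b"0123456789abcdef";

/// Hypothetical stand-in for the helper described above (not the real
/// `EscapeIterInner`): the escape is written into `data` up front and
/// `alive` is the range of bytes that have not been yielded yet.
#[derive(Clone, Debug)]
pub struct UnicodeEscape {
    data: [u8; 10], // "\u{10ffff}" is the longest possible escape: 10 bytes
    alive: Range<u8>,
}

impl UnicodeEscape {
    pub fn new(c: char) -> Self {
        let code = c as u32;
        // Number of hex digits needed; `| 1` guarantees at least one digit,
        // so that '\0' is rendered as "\u{0}".
        let digits = ((32 - (code | 1).leading_zeros() + 3) / 4) as usize;

        let mut data = [0u8; 10];
        data[0] = b'\\';
        data[1] = b'u';
        data[2] = b'{';
        for i in 0..digits {
            let shift = 4 * (digits - 1 - i);
            data[3 + i] = HEX_DIGITS[((code >> shift) & 0xf) as usize];
        }
        data[3 + digits] = b'}';

        UnicodeEscape { data, alive: 0..(digits as u8 + 4) }
    }
}

impl Iterator for UnicodeEscape {
    type Item = char;

    // No state machine: just step through the precomputed bytes.
    fn next(&mut self) -> Option<char> {
        self.alive.next().map(|i| char::from(self.data[usize::from(i)]))
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
        let n = usize::from(self.alive.end - self.alive.start);
        (n, Some(n))
    }
}
```

Collecting `UnicodeEscape::new(c)` into a `String` should give the same text as `c.escape_unicode()` from the standard library, which makes the sketch easy to check against the real iterator.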

As a side effect, this probably optimises the Display implementation for
those types, since write_str is invoked once rather than write_char being
called repeatedly. On 64-bit platforms, it also reduces the size of some
of the structs:

| Struct                    | Before (bytes) | After (bytes) |
|---------------------------|---------------:|--------------:|
| core::char::EscapeUnicode |             16 |            12 |
| core::char::EscapeDefault |             16 |            12 |
| core::char::EscapeDebug   |             16 |            16 |
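
The sizes in the table can be re-measured with `std::mem::size_of`. The sketch below is only a way to reproduce the measurement; the helper names `escape_sizes` and `display_matches_iteration` are made up for this example, and the numbers depend on the toolchain version and target, so they may not match the table. The second helper checks that formatting through Display yields the same text as iterating.

```rust
use std::char::{EscapeDebug, EscapeDefault, EscapeUnicode};
use std::mem::size_of;

/// Sizes of the three iterator types, in bytes, for the toolchain and
/// target this is compiled with (hypothetical helper for illustration).
pub fn escape_sizes() -> [(&'static str, usize); 3] {
    [
        ("core::char::EscapeUnicode", size_of::<EscapeUnicode>()),
        ("core::char::EscapeDefault", size_of::<EscapeDefault>()),
        ("core::char::EscapeDebug", size_of::<EscapeDebug>()),
    ]
}

/// Formatting through `Display` and collecting the iterator should
/// produce the same text, whichever way `Display` is implemented.
pub fn display_matches_iteration(c: char) -> bool {
    c.escape_default().to_string() == c.escape_default().collect::<String>()
}
```

Printing each pair returned by `escape_sizes()` from a `main` function gives one row per struct for whatever compiler is in use.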

My ulterior motive, and the reason I started looking into this, is the
addition of an as_str method to the iterators. With this change that
will become trivial. It will also be trivial to implement
DoubleEndedIterator if that’s ever desired.
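
A rough illustration of why both become easy once the escape lives in a buffer: `as_str` is a slice of the not-yet-consumed range, and `next_back` shrinks the same range from the other end. The `HexEscape` type below is hypothetical and handles only `\xNN` byte escapes; it is a sketch of the idea, not the API that was later added.

```rust
use std::ops::Range;

/// Hypothetical buffer-backed escape for a single byte, always `\xNN`.
pub struct HexEscape {
    data: [u8; 4],
    alive: Range<u8>,
}

impl HexEscape {
    pub fn new(byte: u8) -> Self {
        const HEX: &[u8; 16] = b"0123456789abcdef";
        let data = [
            b'\\',
            b'x',
            HEX[usize::from(byte >> 4)],
            HEX[usize::from(byte & 0xf)],
        ];
        HexEscape { data, alive: 0..4 }
    }

    /// The part of the escape that has not been consumed yet.
    pub fn as_str(&self) -> &str {
        let rest = &self.data[usize::from(self.alive.start)..usize::from(self.alive.end)];
        std::str::from_utf8(rest).expect("buffer only ever holds ASCII")
    }
}

impl Iterator for HexEscape {
    type Item = char;

    fn next(&mut self) -> Option<char> {
        self.alive.next().map(|i| char::from(self.data[usize::from(i)]))
    }
}

impl DoubleEndedIterator for HexEscape {
    // Iterating from the back only needs to shrink the range's end.
    fn next_back(&mut self) -> Option<char> {
        self.alive.next_back().map(|i| char::from(self.data[usize::from(i)]))
    }
}
```

Because the buffer only ever contains ASCII, slicing it at any byte offset still yields valid UTF-8, which is what keeps `as_str` this simple.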

@rustbot
Collaborator

rustbot commented Nov 29, 2022

r? @scottmcm

(rustbot has picked a reviewer for you, use r? to override)

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Nov 29, 2022
@rustbot
Collaborator

rustbot commented Nov 29, 2022

Hey! It looks like you've submitted a new PR for the library teams!

If this PR contains changes to any rust-lang/rust public library APIs then please comment with @rustbot label +T-libs-api -T-libs to tag it appropriately. If this PR contains changes to any unstable APIs please edit the PR description to add a link to the relevant API Change Proposal or create one if you haven't already. If you're unsure where your change falls no worries, just leave it as is and the reviewer will take a look and make a decision to forward on if necessary.

Examples of T-libs-api changes:

  • Stabilizing library features
  • Introducing insta-stable changes such as new implementations of existing stable traits on existing stable types
  • Introducing new or changing existing unstable library APIs (excluding permanently unstable features / features without a tracking issue)
  • Changing public documentation in ways that create new stability guarantees
  • Changing observable runtime behavior of library APIs

@bors
Contributor

bors commented Feb 12, 2023

☔ The latest upstream changes (presumably #105671) made this pull request unmergeable. Please resolve the merge conflicts.

@anden3 anden3 added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 5, 2023
@anden3
Contributor

anden3 commented Apr 5, 2023

Hello @mina86! I just want to ping you as part of the triage procedure as this PR has merge conflicts :)

@mina86
Contributor Author

mina86 commented Apr 5, 2023

> Hello @mina86! I just want to ping you as part of the triage procedure as this PR has merge conflicts :)

Done.

@anden3 anden3 added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Apr 5, 2023
Member

@scottmcm scottmcm left a comment

Sorry for taking a bazillion years to review this. I like the approach of not bothering with an inline state machine for these -- it seems unlikely that people ever want just the first couple bytes without the rest, and just computing them straight-line upfront ought to be net much cheaper than when it's spread over multiple steps.

I've left a bunch of thoughts as I went through, but nothing drastic. Please go through and address them -- either with code changes or by replying to them with why you think the existing code is better -- then we can get it landed!

@rustbot author

use crate::num::NonZeroUsize;
use crate::ops::Range;

const HEX_DIGITS: [u8; 16] = *b"0123456789abcdef";
Member

curiosity: I see that the old escape_ascii had hex_digits: &[u8; 16]. Any idea if using it by value here (instead of the reference) makes any difference?

Contributor Author

It most likely doesn’t matter here. I use an array out of habit, but I’m not even sure it’s relevant for Rust, where everything is compiled statically and LTO is common.
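
For readers unfamiliar with the two forms being compared, the sketch below shows the table held by value and by reference. The name `HEX_DIGITS_REF` and both helper functions are made up for this example. It only demonstrates that the two forms are used identically at the call site; it makes no claim about the generated code.

```rust
// Hex-digit table held by value (a `[u8; 16]`), as in the diff above.
const HEX_DIGITS: [u8; 16] = *b"0123456789abcdef";
// The same table held by reference (a `&[u8; 16]`); hypothetical name.
const HEX_DIGITS_REF: &[u8; 16] = b"0123456789abcdef";

/// Two lowercase hex digits for `byte`, looked up in the by-value table.
pub fn hex_by_value(byte: u8) -> [u8; 2] {
    [
        HEX_DIGITS[usize::from(byte >> 4)],
        HEX_DIGITS[usize::from(byte & 0xf)],
    ]
}

/// The same lookup through the by-reference table; indexing auto-derefs,
/// so the call site looks identical.
pub fn hex_by_ref(byte: u8) -> [u8; 2] {
    [
        HEX_DIGITS_REF[usize::from(byte >> 4)],
        HEX_DIGITS_REF[usize::from(byte & 0xf)],
    ]
}
```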

@@ -0,0 +1,97 @@
//! Helper code for character escaping.
Member

pondering: it's not obvious to me that a new top-level module is the right place for this.

Maybe have it be ascii/escape.rs instead, since it's always escaping things to ascii?

Contributor Author

The reason I didn’t want to put it inside ascii is because of slight differences in escaping. '\0'.escape_default() produces \u{0} while std::ascii::escape_default(0u8) produces \x00 (also '\0'.escape_debug() produces \0). The way I was thinking about it is that std::ascii::escape_default and char::escape_default both use EscapeIterInner, so they sit at the same level of the ‘hierarchy’, meaning that EscapeIterInner shouldn’t be inside std::ascii.

But yes, I understand your concerns and I’m not completely convinced either. Maybe I’m just overthinking this?
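
The three escapings of NUL mentioned in this reply can be checked with standard library calls alone. The wrapper function `escaped_nul` is a name invented for this example.

```rust
/// The three renderings of NUL, in the order: `char::escape_default`,
/// `std::ascii::escape_default`, `char::escape_debug`.
pub fn escaped_nul() -> [String; 3] {
    [
        // char-level default escape: Unicode form
        '\0'.escape_default().collect::<String>(),
        // byte-level escape: hex form (the iterator yields `u8` items)
        std::ascii::escape_default(0u8).map(char::from).collect::<String>(),
        // char-level debug escape: short form
        '\0'.escape_debug().collect::<String>(),
    ]
}
```

Calling `escaped_nul()` returns the strings `\u{0}`, `\x00` and `\0`, in that order.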

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 29, 2023
@mina86
Contributor Author

mina86 commented Apr 30, 2023

Strangely if I try to update the branch I’m getting:

$ ./x.py test library/core
...
error: two packages named `la-arena` in this workspace:
- /srv/mpn/rust/src/tools/rust-analyzer/lib/arena/Cargo.toml
- /srv/mpn/rust/src/tools/rust-analyzer/lib/la-arena/Cargo.toml

@rustbot ready

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Apr 30, 2023
@scottmcm
Member

Thanks!

@bors r+

@bors
Contributor

bors commented Apr 30, 2023

📌 Commit 76c9947 has been approved by scottmcm

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 30, 2023
bors added a commit to rust-lang-ci/rust that referenced this pull request May 2, 2023
Rollup of 7 pull requests

Successful merges:

 - rust-lang#105076 (Refactor core::char::EscapeDefault and co. structures)
 - rust-lang#108161 (Add `ConstParamTy` trait)
 - rust-lang#108668 (Stabilize debugger_visualizer)
 - rust-lang#110512 (Fix elaboration with associated type bounds)
 - rust-lang#110895 (Remove `all` in target_thread_local cfg)
 - rust-lang#110955 (uplift `clippy::clone_double_ref` as `suspicious_double_ref_op`)
 - rust-lang#111048 (Mark `feature(return_position_impl_trait_in_trait)` and `feature(async_fn_in_trait)` as not incomplete)

Failed merges:

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit f916c44 into rust-lang:master May 2, 2023
11 checks passed
@rustbot rustbot added this to the 1.71.0 milestone May 2, 2023
@mina86 mina86 deleted the a branch January 27, 2024 06:45