Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracking issue for str escaping #27791

Closed
alexcrichton opened this Issue Aug 13, 2015 · 28 comments

Comments

Projects
None yet
@alexcrichton
Copy link
Member

alexcrichton commented Aug 13, 2015

This is a tracking issue for the unstable str_escape feature in the standard library. The open questions I know of here are:

  • Do we want to support these relatively arbitrary ways to escape strings? Note that these same methods for escaping characters are already stable.
  • Are the precise forms of escaping stable? (probably because they delegate to char implementations)

I suspect that because the implementations on char are stable that we'll also stabilize these.

Signatures:

String::escape_debug(&self) -> String;
String::escape_default(&self) -> String;
String::escape_unicode(&self) -> String;
@AnthonyBroadCrawford

This comment has been minimized.

Copy link

AnthonyBroadCrawford commented Sep 21, 2015

@alexcrichton Mind if I take this issue?

@alexcrichton

This comment has been minimized.

Copy link
Member Author

alexcrichton commented Sep 22, 2015

@AnthonyBroadCrawford certainly! The workflow for this would probably look like:

  • Come up with a story for these methods, e.g. should they be stabilized as-is or tweaked?
  • Gain feedback about that state of affairs (e.g. IRC, reddit, forums, etc).
  • If large enough, write an RFC
  • Write a PR
  • Stabilize in the next cycle.
@Diggsey

This comment has been minimized.

Copy link
Contributor

Diggsey commented Sep 22, 2015

The reason given for why they're currently unstable says that they might be rewritten to return a char iterator instead?

@bluss

This comment has been minimized.

Copy link
Contributor

bluss commented Sep 23, 2015

Maybe we can consider the idea of them returning iterators, but those iterators implementing Display. That makes it easy to use them composably (without having to allocate a string) while also having the string available (the .to_string() method).

@Diggsey

This comment has been minimized.

Copy link
Contributor

Diggsey commented Sep 23, 2015

@bluss You can already do .collect() on a char iterator to turn it into a string.

@bluss

This comment has been minimized.

Copy link
Contributor

bluss commented Sep 23, 2015

True. With Display you can {}-format the escaped string without intermediate String or write-each-char loop.

@alexcrichton

This comment has been minimized.

Copy link
Member Author

alexcrichton commented Dec 17, 2015

🔔 This issue is now entering its final comment period for stabilization 🔔

For now this FCP will be for returning a String as returning an Iterator which implements Display isn't found much elsewhere in the standard library, but FCP can help duke this out!

@BurntSushi

This comment has been minimized.

Copy link
Member

BurntSushi commented Jan 13, 2016

Given that we have char::escape_{default,unicode} already stabilized, these seem like natural things to stabilize as well. I also agree with returning String, because that's probably what you'll use most of the time. If the extra allocation really matters, then expecting the programmer to build the iterator themselves with the underlying methods on char seems pretty reasonable IMO.

I do admit that I find it strange that these methods are in std, but I have found them occasionally very useful.

@bluss

This comment has been minimized.

Copy link
Contributor

bluss commented Jan 13, 2016

As previous, I favour using Display-based interfaces for composability.

For example write!(w, "{}:{}", s.escape_default(), t.escape_default()) does then not need to allocate intermediate strings, and it can be part of a much larger function, where the output and allocation is ultimately determined by the caller.

@aturon

This comment has been minimized.

Copy link
Member

aturon commented Jan 13, 2016

@BurntSushi The problem with going through char iterators is extra utf-8 en/decoding.

@BurntSushi

This comment has been minimized.

Copy link
Member

BurntSushi commented Jan 14, 2016

@aturon Good point. If others feel good about returning an iterator, then I'm happy with that!

@alexcrichton

This comment has been minimized.

Copy link
Member Author

alexcrichton commented Jan 15, 2016

The libs team discussed this in triage recently and the decision was to punt on this for now.

It was brought up that with the exact same name as char::escape_default it is reasonable to expect that the two methods work the same way, so returning a String may be surprising vs returning an iterator with char::escape_default. It was not clear that we had consensus for returning a structure that implements Display + Iterator, but the options we were considering were:

// signature
fn escape_default(&self) -> EscapeDefault;
  1. EscapeDefault just implements Display
  2. EscapeDefault just implements Iterator
  3. EscapeDefault implements both Display + Iterator

As pointed out, just implementing Iterator may not be the most performant because you have to decode and then re-encode, whereas a direct Display implementation would lend itself to perhaps less work.

It's also a possibility that we could return one trait implementation and then add the other in a backwards-compatible fashion in the future.

While we decide this, however, this is being removed from FCP.

@clarfon

This comment has been minimized.

Copy link
Contributor

clarfon commented Jan 10, 2017

Just wanted to note that if it's decided that these methods are best implemented as Iterators or Displayable types, that to_lowercase and to_uppercase should be planned to eventually get the same treatment. Right now, they're implemented as Strings, which isn't the best IMO especially when in many cases it'd be more efficient to just use a Display implementation.

(obviously that'd be a breaking change, but it's something definitely worth considering)

@SimonSapin

This comment has been minimized.

Copy link
Contributor

SimonSapin commented Jan 11, 2017

@clarcharr to_lowercase and to_uppercase exist both as methods of &str where they’re convenience methods that return String, and of char where they return iterators that doesn’t allocate. The theory is that the former are easy to build on top of the latter (e.g. if you need something other than String) but there is one exception: str::to_lowercase has additional code to handle the case of Σ in "end of word" positions (which maps to ς instead of σ). So… maybe there should still be an iterator that does this.

(obviously that'd be a breaking change, but it's something definitely worth considering)

Probably we’d deprecate the old methods but not remove them.

@SimonSapin

This comment has been minimized.

Copy link
Contributor

SimonSapin commented Mar 28, 2018

This feature is about three inherent methods of the str type: escape_default, escape_unicode, escape_debug. Three corresponding inherent methods of char are already stable.

The libs team discussed this and the consensus was to stabilize, after changing the return type (from String) to dedicated types that implement both Iterator<Item=char> and Display, similar to char methods. If Display is used after Iterator::next has been called once or more, only the "remaining" chars are displayed. (Again like with char methods.)

@rfcbot fcp merge

@rfcbot

This comment has been minimized.

Copy link

rfcbot commented Mar 28, 2018

Team member @SimonSapin has proposed to merge this. The next step is review by the rest of the tagged teams:

No concerns currently listed.

Once a majority of reviewers approve (and none object), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot

This comment has been minimized.

Copy link

rfcbot commented Mar 28, 2018

🔔 This is now entering its final comment period, as per the review above. 🔔

@mominul

This comment has been minimized.

Copy link

mominul commented May 31, 2018

As the FCP period is completed, now the rest is stabilization, isn't it? Should I follow these steps to tackle this?

Sorry if I am getting things wrong!
Thanks!

@SimonSapin

This comment has been minimized.

Copy link
Contributor

SimonSapin commented May 31, 2018

Correct, the next step is a stabilization PR. But those steps are for language features implemented in the compiler and mostly not relevant to library features. In the standard library some #[unstable] attributes need to be replaced by #[stable] (look for existing examples for what parameters to include).

Note however that in this case we agreed to stabilized after making some changes to the API: #27791 (comment). (This can be two commits in the same PR.)

If you want to submit a PR feel free to ping me on it for review.

@est31

This comment has been minimized.

Copy link
Contributor

est31 commented May 31, 2018

This is an interesting case here, as it is possibly the first time since the 1.26.0 release where we are stabilizing a function that returns an iterator. So we have the chance to return impl Trait here, but note that like @BurntSushi notes, impl Trait right now has disadvantages.

@SimonSapin

This comment has been minimized.

Copy link
Contributor

SimonSapin commented May 31, 2018

The agreed API is not just an iterator, but also something that implements Display. Is -> impl Iterator<Item=char> + Display supported?

Given the noted disadvantages I’m inclined to go with "traditional" dedicated types here. In particular, not being able to name the return type seems significant.

@est31

This comment has been minimized.

Copy link
Contributor

est31 commented May 31, 2018

Is -> impl Iterator<Item=char> + Display supported?

Yes.

Note that @mominul had been reaching out for help on #rust-internals, this is what spawned my comment above. You may want to read the scrollback for context. Someone suggested use of impl Trait, then @rkruppe pointed out that this wasn't what the libs team agreed on.

@clarfon

This comment has been minimized.

Copy link
Contributor

clarfon commented May 31, 2018

Personally, I think that it would be best to make concrete types for now, like the rest of the standard library. At some point, though, it would be nice to be able to soft-deprecate all of the concrete iterator types and replace them with impl Iterator in the docs.

@BurntSushi

This comment has been minimized.

Copy link
Member

BurntSushi commented May 31, 2018

and replace them with impl Iterator in the docs.

For many of the adapters, that will actually need to be quite longer. For iter::Map for example, I think it would need to be, impl Iterator<Item=B> + DoubleEndedIterator + Debug + ExactSizeIterator + FusedIterator + TrustedLen + Clone + Send + Sync.

@clarfon

This comment has been minimized.

Copy link
Contributor

clarfon commented May 31, 2018

Err, you're right. It seems that a few trait aliases might come in handy for that.

Not sure about Send and Sync though.

@rkruppe

This comment has been minimized.

Copy link
Member

rkruppe commented May 31, 2018

For most iterator adapters (including Map) that doesn't even work because many impls are conditional on what traits the wrapped iterator implements -- DoubleEndedIterator, ExactSizeIterator, Clone, etc.

I also suspect there's problems with variance but that's probably less of a problem in practice.

SimonSapin added a commit to SimonSapin/rust that referenced this issue Feb 1, 2019

SimonSapin added a commit to SimonSapin/rust that referenced this issue Feb 1, 2019

SimonSapin added a commit to SimonSapin/rust that referenced this issue Feb 1, 2019

SimonSapin added a commit to SimonSapin/rust that referenced this issue Feb 1, 2019

SimonSapin added a commit to SimonSapin/rust that referenced this issue Feb 1, 2019

SimonSapin added a commit to SimonSapin/rust that referenced this issue Feb 1, 2019

SimonSapin added a commit to SimonSapin/rust that referenced this issue Feb 1, 2019

SimonSapin added a commit to SimonSapin/rust that referenced this issue Feb 1, 2019

SimonSapin added a commit to SimonSapin/rust that referenced this issue Feb 3, 2019

SimonSapin added a commit to SimonSapin/rust that referenced this issue Feb 3, 2019

SimonSapin added a commit to SimonSapin/rust that referenced this issue Feb 6, 2019

SimonSapin added a commit to SimonSapin/rust that referenced this issue Feb 6, 2019

SimonSapin added a commit to SimonSapin/rust that referenced this issue Feb 7, 2019

SimonSapin added a commit to SimonSapin/rust that referenced this issue Feb 7, 2019

SimonSapin added a commit to SimonSapin/rust that referenced this issue Feb 12, 2019

SimonSapin added a commit to SimonSapin/rust that referenced this issue Feb 12, 2019

bors added a commit that referenced this issue Feb 12, 2019

Auto merge of #58051 - SimonSapin:str_escape, r=alexcrichton
Stabilize str::escape_* methods with new return types…

… that implement `Display` and `Iterator<Item=char>`, as proposed in FCP: #27791 (comment)
@SimonSapin

This comment has been minimized.

Copy link
Contributor

SimonSapin commented Feb 13, 2019

Stabilized in #58051.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.