Tracking issue for str escaping #27791

Open
alexcrichton opened this Issue Aug 13, 2015 · 27 comments

Comments

Projects
None yet
@alexcrichton
Member

alexcrichton commented Aug 13, 2015

This is a tracking issue for the unstable str_escape feature in the standard library. The open questions I know of here are:

  • Do we want to support these relatively arbitrary ways to escape strings? Note that these same methods for escaping characters are already stable.
  • Are the precise forms of escaping stable? (probably because they delegate to char implementations)

I suspect that because the implementations on char are stable that we'll also stabilize these.

Signatures:

String::escape_debug(&self) -> String;
String::escape_default(&self) -> String;
String::escape_unicode(&self) -> String;
@AnthonyBroadCrawford

This comment has been minimized.

Show comment
Hide comment
@AnthonyBroadCrawford

AnthonyBroadCrawford Sep 21, 2015

@alexcrichton Mind if I take this issue?

@alexcrichton Mind if I take this issue?

@alexcrichton

This comment has been minimized.

Show comment
Hide comment
@alexcrichton

alexcrichton Sep 22, 2015

Member

@AnthonyBroadCrawford certainly! The workflow for this would probably look like:

  • Come up with a story for these methods, e.g. should they be stabilized as-is or tweaked?
  • Gain feedback about that state of affairs (e.g. IRC, reddit, forums, etc).
  • If large enough, write an RFC
  • Write a PR
  • Stabilize in the next cycle.
Member

alexcrichton commented Sep 22, 2015

@AnthonyBroadCrawford certainly! The workflow for this would probably look like:

  • Come up with a story for these methods, e.g. should they be stabilized as-is or tweaked?
  • Gain feedback about that state of affairs (e.g. IRC, reddit, forums, etc).
  • If large enough, write an RFC
  • Write a PR
  • Stabilize in the next cycle.
@Diggsey

This comment has been minimized.

Show comment
Hide comment
@Diggsey

Diggsey Sep 22, 2015

Contributor

The reason given for why they're currently unstable says that they might be rewritten to return a char iterator instead?

Contributor

Diggsey commented Sep 22, 2015

The reason given for why they're currently unstable says that they might be rewritten to return a char iterator instead?

@bluss

This comment has been minimized.

Show comment
Hide comment
@bluss

bluss Sep 23, 2015

Contributor

Maybe we can consider the idea of them returning iterators, but those iterators implementing Display. That makes it easy to use them composably (without having to allocate a string) while also having the string available (the .to_string() method).

Contributor

bluss commented Sep 23, 2015

Maybe we can consider the idea of them returning iterators, but those iterators implementing Display. That makes it easy to use them composably (without having to allocate a string) while also having the string available (the .to_string() method).

@Diggsey

This comment has been minimized.

Show comment
Hide comment
@Diggsey

Diggsey Sep 23, 2015

Contributor

@bluss You can already do .collect() on a char iterator to turn it into a string.

Contributor

Diggsey commented Sep 23, 2015

@bluss You can already do .collect() on a char iterator to turn it into a string.

@bluss

This comment has been minimized.

Show comment
Hide comment
@bluss

bluss Sep 23, 2015

Contributor

True. With Display you can {}-format the escaped string without intermediate String or write-each-char loop.

Contributor

bluss commented Sep 23, 2015

True. With Display you can {}-format the escaped string without intermediate String or write-each-char loop.

@alexcrichton

This comment has been minimized.

Show comment
Hide comment
@alexcrichton

alexcrichton Dec 17, 2015

Member

🔔 This issue is now entering its final comment period for stabilization 🔔

For now this FCP will be for returning a String as returning an Iterator which implements Display isn't found much elsewhere in the standard library, but FCP can help duke this out!

Member

alexcrichton commented Dec 17, 2015

🔔 This issue is now entering its final comment period for stabilization 🔔

For now this FCP will be for returning a String as returning an Iterator which implements Display isn't found much elsewhere in the standard library, but FCP can help duke this out!

@BurntSushi

This comment has been minimized.

Show comment
Hide comment
@BurntSushi

BurntSushi Jan 13, 2016

Member

Given that we have char::escape_{default,unicode} already stabilized, these seem like natural things to stabilize as well. I also agree with returning String, because that's probably what you'll use most of the time. If the extra allocation really matters, then expecting the programmer to build the iterator themselves with the underlying methods on char seems pretty reasonable IMO.

I do admit that I find it strange that these methods are in std, but I have found them occasionally very useful.

Member

BurntSushi commented Jan 13, 2016

Given that we have char::escape_{default,unicode} already stabilized, these seem like natural things to stabilize as well. I also agree with returning String, because that's probably what you'll use most of the time. If the extra allocation really matters, then expecting the programmer to build the iterator themselves with the underlying methods on char seems pretty reasonable IMO.

I do admit that I find it strange that these methods are in std, but I have found them occasionally very useful.

@bluss

This comment has been minimized.

Show comment
Hide comment
@bluss

bluss Jan 13, 2016

Contributor

As previous, I favour using Display-based interfaces for composability.

For example write!(w, "{}:{}", s.escape_default(), t.escape_default()) does then not need to allocate intermediate strings, and it can be part of a much larger function, where the output and allocation is ultimately determined by the caller.

Contributor

bluss commented Jan 13, 2016

As previous, I favour using Display-based interfaces for composability.

For example write!(w, "{}:{}", s.escape_default(), t.escape_default()) does then not need to allocate intermediate strings, and it can be part of a much larger function, where the output and allocation is ultimately determined by the caller.

@aturon

This comment has been minimized.

Show comment
Hide comment
@aturon

aturon Jan 13, 2016

Member

@BurntSushi The problem with going through char iterators is extra utf-8 en/decoding.

Member

aturon commented Jan 13, 2016

@BurntSushi The problem with going through char iterators is extra utf-8 en/decoding.

@BurntSushi

This comment has been minimized.

Show comment
Hide comment
@BurntSushi

BurntSushi Jan 14, 2016

Member

@aturon Good point. If others feel good about returning an iterator, then I'm happy with that!

Member

BurntSushi commented Jan 14, 2016

@aturon Good point. If others feel good about returning an iterator, then I'm happy with that!

@alexcrichton

This comment has been minimized.

Show comment
Hide comment
@alexcrichton

alexcrichton Jan 15, 2016

Member

The libs team discussed this in triage recently and the decision was to punt on this for now.

It was brought up that with the exact same name as char::escape_default it is reasonable to expect that the two methods work the same way, so returning a String may be surprising vs returning an iterator with char::escape_default. It was not clear that we had consensus for returning a structure that implements Display + Iterator, but the options we were considering were:

// signature
fn escape_default(&self) -> EscapeDefault;
  1. EscapeDefault just implements Display
  2. EscapeDefault just implements Iterator
  3. EscapeDefault implements both Display + Iterator

As pointed out, just implementing Iterator may not be the most performant because you have to decode and then re-encode, whereas a direct Display implementation would lend itself to perhaps less work.

It's also a possibility that we could return one trait implementation and then add the other in a backwards-compatible fashion in the future.

While we decide this, however, this is being removed from FCP.

Member

alexcrichton commented Jan 15, 2016

The libs team discussed this in triage recently and the decision was to punt on this for now.

It was brought up that with the exact same name as char::escape_default it is reasonable to expect that the two methods work the same way, so returning a String may be surprising vs returning an iterator with char::escape_default. It was not clear that we had consensus for returning a structure that implements Display + Iterator, but the options we were considering were:

// signature
fn escape_default(&self) -> EscapeDefault;
  1. EscapeDefault just implements Display
  2. EscapeDefault just implements Iterator
  3. EscapeDefault implements both Display + Iterator

As pointed out, just implementing Iterator may not be the most performant because you have to decode and then re-encode, whereas a direct Display implementation would lend itself to perhaps less work.

It's also a possibility that we could return one trait implementation and then add the other in a backwards-compatible fashion in the future.

While we decide this, however, this is being removed from FCP.

@clarcharr

This comment has been minimized.

Show comment
Hide comment
@clarcharr

clarcharr Jan 10, 2017

Contributor

Just wanted to note that if it's decided that these methods are best implemented as Iterators or Displayable types, that to_lowercase and to_uppercase should be planned to eventually get the same treatment. Right now, they're implemented as Strings, which isn't the best IMO especially when in many cases it'd be more efficient to just use a Display implementation.

(obviously that'd be a breaking change, but it's something definitely worth considering)

Contributor

clarcharr commented Jan 10, 2017

Just wanted to note that if it's decided that these methods are best implemented as Iterators or Displayable types, that to_lowercase and to_uppercase should be planned to eventually get the same treatment. Right now, they're implemented as Strings, which isn't the best IMO especially when in many cases it'd be more efficient to just use a Display implementation.

(obviously that'd be a breaking change, but it's something definitely worth considering)

@SimonSapin

This comment has been minimized.

Show comment
Hide comment
@SimonSapin

SimonSapin Jan 11, 2017

Contributor

@clarcharr to_lowercase and to_uppercase exist both as methods of &str where they’re convenience methods that return String, and of char where they return iterators that doesn’t allocate. The theory is that the former are easy to build on top of the latter (e.g. if you need something other than String) but there is one exception: str::to_lowercase has additional code to handle the case of Σ in "end of word" positions (which maps to ς instead of σ). So… maybe there should still be an iterator that does this.

(obviously that'd be a breaking change, but it's something definitely worth considering)

Probably we’d deprecate the old methods but not remove them.

Contributor

SimonSapin commented Jan 11, 2017

@clarcharr to_lowercase and to_uppercase exist both as methods of &str where they’re convenience methods that return String, and of char where they return iterators that doesn’t allocate. The theory is that the former are easy to build on top of the latter (e.g. if you need something other than String) but there is one exception: str::to_lowercase has additional code to handle the case of Σ in "end of word" positions (which maps to ς instead of σ). So… maybe there should still be an iterator that does this.

(obviously that'd be a breaking change, but it's something definitely worth considering)

Probably we’d deprecate the old methods but not remove them.

@SimonSapin

This comment has been minimized.

Show comment
Hide comment
@SimonSapin

SimonSapin Mar 28, 2018

Contributor

This feature is about three inherent methods of the str type: escape_default, escape_unicode, escape_debug. Three corresponding inherent methods of char are already stable.

The libs team discussed this and the consensus was to stabilize, after changing the return type (from String) to dedicated types that implement both Iterator<Item=char> and Display, similar to char methods. If Display is used after Iterator::next has been called once or more, only the "remaining" chars are displayed. (Again like with char methods.)

@rfcbot fcp merge

Contributor

SimonSapin commented Mar 28, 2018

This feature is about three inherent methods of the str type: escape_default, escape_unicode, escape_debug. Three corresponding inherent methods of char are already stable.

The libs team discussed this and the consensus was to stabilize, after changing the return type (from String) to dedicated types that implement both Iterator<Item=char> and Display, similar to char methods. If Display is used after Iterator::next has been called once or more, only the "remaining" chars are displayed. (Again like with char methods.)

@rfcbot fcp merge

@rfcbot

This comment has been minimized.

Show comment
Hide comment
@rfcbot

rfcbot Mar 28, 2018

Team member @SimonSapin has proposed to merge this. The next step is review by the rest of the tagged teams:

No concerns currently listed.

Once a majority of reviewers approve (and none object), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

rfcbot commented Mar 28, 2018

Team member @SimonSapin has proposed to merge this. The next step is review by the rest of the tagged teams:

No concerns currently listed.

Once a majority of reviewers approve (and none object), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot

This comment has been minimized.

Show comment
Hide comment
@rfcbot

rfcbot Mar 28, 2018

🔔 This is now entering its final comment period, as per the review above. 🔔

rfcbot commented Mar 28, 2018

🔔 This is now entering its final comment period, as per the review above. 🔔

@rfcbot

This comment has been minimized.

Show comment
Hide comment
@rfcbot

rfcbot Apr 7, 2018

The final comment period is now complete.

rfcbot commented Apr 7, 2018

The final comment period is now complete.

@mominul

This comment has been minimized.

Show comment
Hide comment
@mominul

mominul May 31, 2018

As the FCP period is completed, now the rest is stabilization, isn't it? Should I follow these steps to tackle this?

Sorry if I am getting things wrong!
Thanks!

mominul commented May 31, 2018

As the FCP period is completed, now the rest is stabilization, isn't it? Should I follow these steps to tackle this?

Sorry if I am getting things wrong!
Thanks!

@SimonSapin

This comment has been minimized.

Show comment
Hide comment
@SimonSapin

SimonSapin May 31, 2018

Contributor

Correct, the next step is a stabilization PR. But those steps are for language features implemented in the compiler and mostly not relevant to library features. In the standard library some #[unstable] attributes need to be replaced by #[stable] (look for existing examples for what parameters to include).

Note however that in this case we agreed to stabilized after making some changes to the API: #27791 (comment). (This can be two commits in the same PR.)

If you want to submit a PR feel free to ping me on it for review.

Contributor

SimonSapin commented May 31, 2018

Correct, the next step is a stabilization PR. But those steps are for language features implemented in the compiler and mostly not relevant to library features. In the standard library some #[unstable] attributes need to be replaced by #[stable] (look for existing examples for what parameters to include).

Note however that in this case we agreed to stabilized after making some changes to the API: #27791 (comment). (This can be two commits in the same PR.)

If you want to submit a PR feel free to ping me on it for review.

@est31

This comment has been minimized.

Show comment
Hide comment
@est31

est31 May 31, 2018

Contributor

This is an interesting case here, as it is possibly the first time since the 1.26.0 release where we are stabilizing a function that returns an iterator. So we have the chance to return impl Trait here, but note that like @BurntSushi notes, impl Trait right now has disadvantages.

Contributor

est31 commented May 31, 2018

This is an interesting case here, as it is possibly the first time since the 1.26.0 release where we are stabilizing a function that returns an iterator. So we have the chance to return impl Trait here, but note that like @BurntSushi notes, impl Trait right now has disadvantages.

@SimonSapin

This comment has been minimized.

Show comment
Hide comment
@SimonSapin

SimonSapin May 31, 2018

Contributor

The agreed API is not just an iterator, but also something that implements Display. Is -> impl Iterator<Item=char> + Display supported?

Given the noted disadvantages I’m inclined to go with "traditional" dedicated types here. In particular, not being able to name the return type seems significant.

Contributor

SimonSapin commented May 31, 2018

The agreed API is not just an iterator, but also something that implements Display. Is -> impl Iterator<Item=char> + Display supported?

Given the noted disadvantages I’m inclined to go with "traditional" dedicated types here. In particular, not being able to name the return type seems significant.

@est31

This comment has been minimized.

Show comment
Hide comment
@est31

est31 May 31, 2018

Contributor

Is -> impl Iterator<Item=char> + Display supported?

Yes.

Note that @mominul had been reaching out for help on #rust-internals, this is what spawned my comment above. You may want to read the scrollback for context. Someone suggested use of impl Trait, then @rkruppe pointed out that this wasn't what the libs team agreed on.

Contributor

est31 commented May 31, 2018

Is -> impl Iterator<Item=char> + Display supported?

Yes.

Note that @mominul had been reaching out for help on #rust-internals, this is what spawned my comment above. You may want to read the scrollback for context. Someone suggested use of impl Trait, then @rkruppe pointed out that this wasn't what the libs team agreed on.

@clarcharr

This comment has been minimized.

Show comment
Hide comment
@clarcharr

clarcharr May 31, 2018

Contributor

Personally, I think that it would be best to make concrete types for now, like the rest of the standard library. At some point, though, it would be nice to be able to soft-deprecate all of the concrete iterator types and replace them with impl Iterator in the docs.

Contributor

clarcharr commented May 31, 2018

Personally, I think that it would be best to make concrete types for now, like the rest of the standard library. At some point, though, it would be nice to be able to soft-deprecate all of the concrete iterator types and replace them with impl Iterator in the docs.

@BurntSushi

This comment has been minimized.

Show comment
Hide comment
@BurntSushi

BurntSushi May 31, 2018

Member

and replace them with impl Iterator in the docs.

For many of the adapters, that will actually need to be quite longer. For iter::Map for example, I think it would need to be, impl Iterator<Item=B> + DoubleEndedIterator + Debug + ExactSizeIterator + FusedIterator + TrustedLen + Clone + Send + Sync.

Member

BurntSushi commented May 31, 2018

and replace them with impl Iterator in the docs.

For many of the adapters, that will actually need to be quite longer. For iter::Map for example, I think it would need to be, impl Iterator<Item=B> + DoubleEndedIterator + Debug + ExactSizeIterator + FusedIterator + TrustedLen + Clone + Send + Sync.

@clarcharr

This comment has been minimized.

Show comment
Hide comment
@clarcharr

clarcharr May 31, 2018

Contributor

Err, you're right. It seems that a few trait aliases might come in handy for that.

Not sure about Send and Sync though.

Contributor

clarcharr commented May 31, 2018

Err, you're right. It seems that a few trait aliases might come in handy for that.

Not sure about Send and Sync though.

@rkruppe

This comment has been minimized.

Show comment
Hide comment
@rkruppe

rkruppe May 31, 2018

Contributor

For most iterator adapters (including Map) that doesn't even work because many impls are conditional on what traits the wrapped iterator implements -- DoubleEndedIterator, ExactSizeIterator, Clone, etc.

I also suspect there's problems with variance but that's probably less of a problem in practice.

Contributor

rkruppe commented May 31, 2018

For most iterator adapters (including Map) that doesn't even work because many impls are conditional on what traits the wrapped iterator implements -- DoubleEndedIterator, ExactSizeIterator, Clone, etc.

I also suspect there's problems with variance but that's probably less of a problem in practice.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment