Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add some of `[T]`’s methods to strings and vice versa #1152

Merged
merged 2 commits into from Jul 1, 2015

Conversation

Projects
None yet
@P1start
Copy link
Contributor

P1start commented Jun 6, 2015

Add some methods that already exist on slices to strings and vice versa. Specifically, the following methods should be added:

  • str::chunks
  • str::windows
  • str::into_string
  • String::into_boxed_slice
  • <[T]>::subslice_offset

Rendered

@alexcrichton alexcrichton added the T-libs label Jun 6, 2015

@Kimundi

This comment has been minimized.

Copy link
Member

Kimundi commented Jun 6, 2015

Is the reason for adding <[T]>::subslice_offset just symmetry? I'm asking because I'm not sure if that function is actually used by anyone.

@P1start

This comment has been minimized.

Copy link
Contributor Author

P1start commented Jun 6, 2015

@Kimundi I’ve wanted to use it in the past, but yeah, it’s mainly just to match the str API. I wouldn’t really care if subslice_offset got removed from str (it’s not very useful), but I’d certainly prefer to either have it for both types or neither, rather than just one.

@Gankro

This comment has been minimized.

Copy link
Contributor

Gankro commented Jun 6, 2015

Is there really a usecase for str::chunks or windows?

@gkoz

This comment has been minimized.

Copy link

gkoz commented Jun 6, 2015

Out of curiosity, what is Box<str> for? I haven't encountered any mentions of this type before.

@Gankro

This comment has been minimized.

Copy link
Contributor

Gankro commented Jun 6, 2015

@gkoz In principle it's just a runtime-constructed &'static str (string litteral). one usize thinner than a String, but effectively immutable as a result.

@P1start

This comment has been minimized.

Copy link
Contributor Author

P1start commented Jun 6, 2015

@Gankro str::windows could be used to calculate the Sørensen–Dice coefficient of two strings, and str::chunks has been asked for on IRC a few times.

@huonw

This comment has been minimized.

Copy link
Member

huonw commented Jun 6, 2015

Hm, I suspect windows and chunks will encourage bad Unicode practice without a huge amount of benefit other than symmetry. Specifically they will split between combining characters.

Add the following methods to `str`, presumably as inherent methods:

- `chunks(&self, n: usize) -> Chunks`: Returns an iterator that yields the
*characters* (not bytes) of the string in groups of `n` at a time. Iterator

This comment has been minimized.

@SimonSapin

SimonSapin Jun 9, 2015

Contributor

“Character” is kinda ambiguous. (Unicode has four different definitions.) Maybe use chars instead?

@krdln

This comment has been minimized.

Copy link
Contributor

krdln commented Jun 9, 2015

+1 to functions involving Box<str>. I was surprised they weren't implemented. And it's good to lose those 4/8 bytes sometimes (especially when within an enum).

-1 to windows and chunks. I second @huonw that they encourage bad practices. I think they belong in unicode crate with codepoint_ and grapheme_ variants.

@krdln

This comment has been minimized.

Copy link
Contributor

krdln commented Jun 10, 2015

I just stumbled upon the lack of str::shift_char() counterpart for a slice. I guess it would also be worth adding.

@P1start

This comment has been minimized.

Copy link
Contributor Author

P1start commented Jun 10, 2015

I just stumbled upon the lack of str::shift_char() counterpart for a slice. I guess it would also be worth adding.

@krdln #1058 already adds that, I believe.

@comex

This comment has been minimized.

Copy link

comex commented Jun 11, 2015

May be out of scope, but I'd like to see some form of find, rfind, replace, and other pattern-based methods added to [T] or Vec<T>. This would be useful, for example, when dealing with a &[u8] which represents a "string" that may not be valid UTF-8 (such as the contents of a Unix OsStr) - or just random binary data, for that matter.

@Gankro Gankro self-assigned this Jun 11, 2015

@Gankro

This comment has been minimized.

Copy link
Contributor

Gankro commented Jun 11, 2015

I concur with @huonw on the dubiousness of windows and chunks in the face of combining characters. It's not clear to me how the Sørensen–Dice coefficient is even supposed to interact with utf8 in that regard.

👍 on the Box conversions. Makes sense.

I'm completely indifferent on subslice_offset

@alexcrichton

This comment has been minimized.

Copy link
Member

alexcrichton commented Jun 16, 2015

This RFC is now entering its final comment period.


- `subslice_offset(&self, inner: &[T]) -> usize`: Returns the offset (in
elements) of an inner slice relative to an outer slice. Panics of `inner` is
not contained within `self`.

This comment has been minimized.

@alexcrichton

alexcrichton Jun 16, 2015

Member

I would personally prefer to not add this method to slices just yet, I suspect it won't survive stabilization of strings.


Add the following method to `String` as an inherent method:

- `into_boxed_slice(self) -> Box<str>`: Returns `self` as a `Box<str>`,

This comment has been minimized.

@alexcrichton

alexcrichton Jun 16, 2015

Member

Perhaps this could be into_boxed_str? I think we tend to refer to string slices as str instead of slice.

This comment has been minimized.

@nagisa

nagisa Jun 17, 2015

Contributor

Perhaps this could be an implementation of Into trait? I don’t see this being used enough to warrant a new method. I’d actually argue for deprecation of Vec::into_boxed_slice and implementation of Into trait too.

This comment has been minimized.

@alexcrichton

alexcrichton Jun 17, 2015

Member

That's somewhat related to #1153, but relying solely on Into unfortunately isn't without downsides. For example if I have a String and I know that I want to go to a Box<str> using the .into() method would require a type annotation, where a method like this would not. Overall we've been thinking so far that specific conversions (e.g. like this) will have named methods so they can be called ergonomically, but there will also likely be an Into implementation to satisfy generic bounds as well. Now that being said, we haven't yet posted the RFC to clarify these conversion conventions.

@alexcrichton

This comment has been minimized.

Copy link
Member

alexcrichton commented Jun 16, 2015

I also agree with @huonw about being hesitant to add windows/chunks to strings due to unicode fun stuff.

- `windows(&self, n: usize) -> Windows`: Returns an iterator over all contiguous
windows of character length `n`. Iterator element type: `&str`.

- `into_string(self: Box<str>) -> String`: Returns `self` as a `String`. This is

This comment has been minimized.

@sfackler

This comment has been minimized.

Copy link
Member

sfackler commented Jun 21, 2015

The Box<str> methods seem good to me but I share the unicode weirdness concerns above with respect to chunks and windows and am not sold on the utility of subslice_offset.

@Gankro

This comment has been minimized.

Copy link
Contributor

Gankro commented Jun 22, 2015

@P1start the libs team seems pretty unanimous about Box<str> is great, everything else is not. Is this an acceptable result of this RFC, for you?

@P1start

This comment has been minimized.

Copy link
Contributor Author

P1start commented Jun 22, 2015

@Gankro Yep, that’s fine by me. I’ll update the RFC soon to include just those methods (and rename into_boxed_slice to into_boxed_str unless anyone has any objections).

@P1start

This comment has been minimized.

Copy link
Contributor Author

P1start commented Jun 27, 2015

RFC updated, removing str::windows, str::chunks, and <[T]>::subslice_offset.

@alexcrichton

This comment has been minimized.

Copy link
Member

alexcrichton commented Jul 1, 2015

The consensus of the libs team is to merge this RFC, so I will do so. Thanks again @P1start!

@alexcrichton alexcrichton merged commit 566bb70 into rust-lang:master Jul 1, 2015

@aturon

This comment has been minimized.

Copy link
Member

aturon commented Jul 1, 2015

bluss added a commit to bluss/rust that referenced this pull request Aug 12, 2015

Rename String::into_boxed_slice -> into_boxed_str
This is the name that was decided in rust-lang/rfcs#1152, and it's
better if we say “boxed str” for `Box<str>`.

bluss added a commit to bluss/rust that referenced this pull request Aug 13, 2015

Rename String::into_boxed_slice -> into_boxed_str
This is the name that was decided in rust-lang/rfcs#1152, and it's
better if we say “boxed str” for `Box<str>`.

bluss added a commit to bluss/rust that referenced this pull request Aug 13, 2015

Rename String::into_boxed_slice -> into_boxed_str
This is the name that was decided in rust-lang/rfcs#1152, and it's
better if we say “boxed str” for `Box<str>`.

The old name `String::into_boxed_slice` is deprecated.

bors added a commit to rust-lang/rust that referenced this pull request Aug 13, 2015

Auto merge of #27696 - bluss:into-boxed-str, r=alexcrichton
Rename String::into_boxed_slice -> into_boxed_str

This is the name that was decided in rust-lang/rfcs#1152, and it's
better if we say “boxed str” for `Box<str>`.

The old name `String::into_boxed_slice` is deprecated.

bors added a commit to rust-lang/rust that referenced this pull request Aug 14, 2015

Auto merge of #27696 - bluss:into-boxed-str, r=alexcrichton
Rename String::into_boxed_slice -> into_boxed_str

This is the name that was decided in rust-lang/rfcs#1152, and it's
better if we say “boxed str” for `Box<str>`.

The old name `String::into_boxed_slice` is deprecated.

@Centril Centril added the A-slice label Nov 23, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.