RFC: conventions for placement of unsafe APIs #240

Merged
merged 4 commits into from Oct 7, 2014

10 participants
@aturon
Member

aturon commented Sep 15, 2014

This is a conventions RFC for settling the location of unsafe APIs relative
to the types they work with, as well as the use of raw submodules.

The brief summary is:

  • Unsafe APIs should be made into methods or static functions in the same cases
    that safe APIs would be.
  • raw submodules should be used only to provide APIs directly on low-level
    representations.
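
As a rough illustration of the first bullet, the sketch below calls an unsafe constructor as a static function on the type. It uses `String::from_raw_parts` as it exists in present-day Rust's standard library, which postdates this thread; the helper name `rebuild` is hypothetical and exists only for the example.

```rust
use std::mem::ManuallyDrop;

/// Takes a `String` apart and rebuilds it from its raw parts.
/// `rebuild` is a hypothetical helper, used only for illustration.
fn rebuild(s: String) -> String {
    // Keep the original from freeing the buffer we are about to reuse.
    let mut s = ManuallyDrop::new(s);
    let (ptr, len, cap) = (s.as_mut_ptr(), s.len(), s.capacity());
    // SAFETY: the parts come from a live `String` that is never dropped,
    // so pointer, length and capacity agree and the bytes are valid UTF-8.
    unsafe { String::from_raw_parts(ptr, len, cap) }
}
```

Importing `String` is all a caller needs; there is no separate `raw` module to bring into scope, and the `unsafe` block still marks the call site.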

Rendered

+ string. This method makes it easy to work with the byte-based representation
+ of the string, but thereby also allows violation of the utf8 guarantee.
+
+* A `raw` submodule with a number of free functions, like `from_parts`, that

@thestinger

thestinger Sep 15, 2014

There are problems with using a raw module that are not addressed. Modules such as btree, treemap and trie expose more than one type, and a raw module would fail to distinguish between these. A raw module also interacts very poorly with re-exports and adds verbosity to usage of the API.

The motivation for making unsafe code uglier in these cases is not laid out. The point of unsafe is to draw a boundary between safe and unsafe code without the need for conventions to distinguish between these. The rule about construction from raw pointer representation is very obscure, and doesn't change the fact that it makes the API less consistent and more painful to use without gaining anything in return.

@aturon

aturon Sep 15, 2014

Member

@thestinger

There are problems with using a raw module that are not addressed. Modules such as btree, treemap and trie expose more than one type, and a raw module would fail to distinguish between these. A raw module also interacts very poorly with re-exports and adds verbosity to usage of the API. The motivation for making unsafe code uglier in these cases is not laid out.

I'm a little confused by this comment. The RFC recommends making unsafe functions into methods or static functions in the same cases you'd do so for safe APIs. It cites as an explicit benefit that importing/using these APIs becomes easier by doing so:

The benefit to moving unsafe APIs into methods (resp. static functions) is the usual one: you can gain easy access to these APIs merely by having a value of the type (resp. importing the type).

@kballard

kballard Sep 15, 2014

Contributor

You keep asserting that this makes the API harder to use, but that's extremely subjective. I've always found the raw modules in str/slice/string/vec make the API much easier to use, because any time I need to construct a value from raw pointers, I know precisely where to look.

@thestinger

thestinger Sep 15, 2014

@aturon: Sorry, I misinterpreted the proposed guidelines. The reason I didn't find the details about how to handle those edge cases and a detailed rationale is because it's not what you're proposing.

@huonw

huonw Sep 15, 2014

Member

I think much of the "know where to look" problem would be fixed by segmenting the things in the source, and also allowing rustdoc to divide functions/methods into user-defined subsections. This would mean it is clear both in the .rs file and in the docs.

@kballard

kballard Sep 15, 2014

Contributor

This description of raw doesn't really match what we're doing today. The 4 modules std::slice::raw, std::str::raw, std::string::raw, and std::vec::raw all operate on raw pointers. More specifically, they construct the given type from raw pointers. The only exceptions are two functions in std::str::raw that derive a &str from another &str without doing utf-8 checks.

Arguably, "raw pointers" could be considered "low-level representation", but I'm not comfortable making that assumption, because only some of the functions actually treat the raw pointer as the low-level representation. Other functions, such as std::string::raw::from_buf_len(), copy data out of the raw pointer instead of using it directly.

There's also a couple of functions that don't even use raw pointers or low-level representations. I'm thinking here of std::str::raw::slice_bytes() and std::str::raw::slice_unchecked(). They both construct a &str based on another &str without checking if the resulting slice maintains the utf-8 invariant (the _unchecked variant also skips the bounds test).

Given these last two functions, I think the more general description of raw is functions that construct a value of the type without checking for one or more invariants. This covers those functions, it covers all the functions that construct values by copying data from raw pointers, and it covers constructing a value out of its private low-level representation. So I think we should document the rule based on invariants, and then explain that this generally includes constructing values from raw pointers and low-level representations.
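
To make the invariant-based reading concrete, here is a minimal sketch of a checked versus an unchecked constructor for the utf-8 invariant. It uses `std::str::from_utf8` and `std::str::from_utf8_unchecked` from present-day Rust rather than the 2014 `std::str::raw` functions named above; the wrapper names `checked` and `unchecked` are hypothetical.

```rust
use std::str;

/// Checked construction: validates the utf-8 invariant, so it can be safe.
fn checked(bytes: &[u8]) -> Option<&str> {
    str::from_utf8(bytes).ok()
}

/// Unchecked construction: skips validation entirely.
///
/// # Safety
/// `bytes` must be valid UTF-8.
unsafe fn unchecked(bytes: &[u8]) -> &str {
    // SAFETY: validity is guaranteed by the caller.
    unsafe { str::from_utf8_unchecked(bytes) }
}
```

Neither function touches a raw pointer or a private representation, yet the second one can still produce a `&str` that violates the type's guarantee, which is the case the invariant-based rule is meant to cover.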

+* Use `raw` submodules to group together *all* manipulation of low-level
+ representations. No module in `std` currently does this; existing modules
+ provide some free functions in `raw`, and some unsafe methods, without a clear
+ driving principle. The ergonomics of moving *everything* into free functions

@thestinger

thestinger Sep 15, 2014

The ergonomics of moving anything into free functions is poor. Downgrading more methods is worse, but doing it to a few is still more verbose and harder to navigate - especially due to re-exports. It also means introducing type prefixes in modules exposing more than one type.

@aturon

aturon Sep 15, 2014

Member

@kballard Thanks for the clarifications. I will try to incorporate them into the RFC to more accurately convey the current state of affairs.

That said, I'm curious what you think about the proposed conventions?

One thing I'm not yet happy with is the recommendation for when raw submodules remain appropriate. There are certain cases, like sync::raw, where these submodules actually define some "lower-level" types, and the intent of the current RFC text is to capture such cases. But I don't think the guideline is terribly clear yet. If you have any suggestions there, I'd love to hear them.

active/0000-unsafe-api-location.md
+ underlying representation of a data structure (which is otherwise private),
+ the API should use `raw` in its name. Specifically, `from_raw_parts` is the
+ typical name used for constructing a value from e.g. a pointer-based
+ representation.

@kballard

kballard Sep 15, 2014

Contributor

I'm a bit confused here. Vec::from_raw_parts seems to be covered under the convention for raw submodules, as it is constructing a Vec from the low-level representation (pointer, length, capacity). If it's put in the raw submodule then it shouldn't have raw in the name, because that's unnecessarily verbose (raw::from_raw_parts?),

@aturon

aturon Sep 15, 2014

Member

@kballard It seems that the RFC text is unclear. I meant for from_raw_parts to be a static function (according to the first bullet). The second bullet was just a clarification about the name.

I'm going to revise the RFC text and push; will ping when done.

active/0000-unsafe-api-location.md
+ them `unsafe`), and given that rustdoc could easily provide API grouping, it's
+ unclear exactly what the benefit is.
+
+* Use `raw` submodules to group together *all* manipulation of low-level

@thestinger

thestinger Sep 15, 2014

The raw modules that exist in the standard library were mostly recent additions without consensus based on the same push as this RFC. The previous consensus was to turn everything into methods, and only a few old raw:: functions remained because no one had taken the time to port away from them yet.

@kballard

kballard Sep 15, 2014

Contributor

The previous consensus was to turn everything into methods

You keep stating that, but where are you getting that from? As far as I'm aware there was no consensus at all, because there had been no real discussion about this.

@thestinger

thestinger Sep 15, 2014

There was plenty of real discussion about methods. You would have to rewrite history to ignore the work that @cmr, @huonw, @Kimundi and I put into implementing the consensus from the mailing list. There were also many issues about this on the bug tracker, like rust-lang/rust#6045 and rust-lang/rust#2868, but most of the discussion occurred on rust-dev.

@kballard

kballard Sep 15, 2014

Contributor

@thestinger Why are you being so antagonistic? I'm not trying to "rewrite history". I've asked repeatedly for you to provide some source for your claim that there has been general consensus, and this is the first time you've actually provided any source whatsoever, and it doesn't even include the alleged rust-dev conversation.

rust-lang/rust#6045 is a very general "don't use functions where methods are appropriate" issue. It's completely non-controversial and doesn't say anything in particular about the raw case. rust-lang/rust#2868 is even less relevant, that's just "don't have functions that duplicate methods".

As for the rust-dev conversation, you still haven't actually linked it. If you want to use a rust-dev conversation to support your claim, you need to actually link it so we can read it.

Also, FWIW, 4 people does not a "general consensus" make.

@thestinger

thestinger Sep 15, 2014

You only have to look in the git history to see that we migrated away from raw modules as part of replacing free functions with methods. It wasn't treated as a special case because no one thought unsafe was inadequate. I was only listing some of the people involved in implementing the consensus, not everyone involved in the discussion.

If you don't believe what I'm saying, then I have no interest in continuing to talk. I'm not going to waste time digging up all of the mailing list threads about functions vs. methods and UFCS to satisfy you so you can move on to nitpicking or misrepresenting something else I said. You can confirm it yourself with a search engine.

active/0000-unsafe-api-location.md
+usual one: you can gain easy access to these APIs merely by having a value of
+the type (resp. importing the type).
+
+The perspective here is that marking APIs `unsafe` is enough to deter their use

@aturon

aturon Sep 15, 2014

Member

@thestinger Perhaps the RFC text is unclear, but this is not what I'm proposing: this RFC proposes that from_raw_parts should be a static function on e.g. String.

@kballard

kballard Sep 15, 2014

Contributor

@aturon

One thing I'm not yet happy with is the recommendation for when raw submodules remain appropriate.

Based on the raw modules I cited before, I'm currently leaning towards saying raw modules are for functions that construct a value in a manner that may break the invariants normally maintained by the type (as opposed to breaking invariants on parameters to the function, which is covered by the _unchecked suffix). The easiest way to do this is of course to construct a value directly out of its low-level representation. But another way to do this is e.g. to copy data out of a raw pointer without checking the format of the data, which is what the std::string::raw functions generally do.

With this rule, the usage becomes pretty simple. Do you want to construct a value and bypass invariant checks? Look in the raw module. Otherwise, if you're maintaining invariants, it's a static function on the type, and if you're manipulating an existing value, it's a method on the type.
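
The three-way split described in this comment could be laid out roughly as follows. This is a sketch in present-day Rust; the `ascii` module, the `Ascii` type and all of its functions are hypothetical, chosen only because "every byte is ASCII" is an easy invariant to state.

```rust
mod ascii {
    /// Hypothetical string type whose invariant is "every byte is ASCII".
    pub struct Ascii(Vec<u8>);

    impl Ascii {
        /// Static function: constructs a value and checks the invariant.
        pub fn from_bytes(bytes: Vec<u8>) -> Option<Ascii> {
            if bytes.is_ascii() { Some(Ascii(bytes)) } else { None }
        }

        /// Method: manipulates an existing value and keeps the invariant.
        /// Returns `false` (and pushes nothing) for a non-ASCII byte.
        pub fn push(&mut self, byte: u8) -> bool {
            if byte.is_ascii() {
                self.0.push(byte);
                true
            } else {
                false
            }
        }

        /// Relies on the invariant: ASCII bytes are always valid UTF-8.
        pub fn as_str(&self) -> &str {
            // SAFETY: every byte is ASCII by the type's invariant.
            unsafe { std::str::from_utf8_unchecked(&self.0) }
        }
    }

    pub mod raw {
        use super::Ascii;

        /// Constructor that bypasses the invariant check.
        ///
        /// # Safety
        /// Every byte in `bytes` must be ASCII.
        pub unsafe fn from_bytes(bytes: Vec<u8>) -> Ascii {
            Ascii(bytes)
        }
    }
}
```

A caller would write `ascii::Ascii::from_bytes(v)` for the checked path and `unsafe { ascii::raw::from_bytes(v) }` for the unchecked one. Under the RFC's own proposal the second function would instead be an unsafe static function on `Ascii`.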

+
+# Unresolved questions
+
+The `core::raw` module provides structs with public representations equivalent

@thestinger

thestinger Sep 15, 2014

Box, Closure and Procedure will end up being removed since they're becoming obsolete. That will only leave behind Slice (could just be in the slice module) and TraitObject so it probably does need to be replaced.

@kballard

kballard Sep 15, 2014

Contributor

@aturon I'd also like to see this RFC recommend that unsafe methods should all be placed into the same impl block (or blocks, if some methods require different bounds), and that these impl blocks should be placed at the end of the file. This will organize them better in the rustdoc output without requiring any rustdoc changes at all, as rustdoc maintains the impl grouping and order. We can still extend rustdoc later to have a visibility toggle for unsafe methods, but I think it still makes sense to put the unsafe methods at the end of the documentation.
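
A minimal sketch of the suggested layout, written in present-day Rust. The `Stack` type and its methods are hypothetical; the only point is that the unsafe methods live in their own `impl` block, placed after the safe one.

```rust
/// Hypothetical container used only to show the impl-block layout.
pub struct Stack {
    items: Vec<u32>,
}

// Safe API first.
impl Stack {
    pub fn new() -> Stack {
        Stack { items: Vec::new() }
    }

    pub fn push(&mut self, x: u32) {
        self.items.push(x);
    }

    pub fn len(&self) -> usize {
        self.items.len()
    }

    pub fn get(&self, i: usize) -> Option<u32> {
        self.items.get(i).copied()
    }
}

// Unsafe API grouped in a separate impl block at the end of the file.
impl Stack {
    /// Returns the element at `i` without a bounds check.
    ///
    /// # Safety
    /// `i` must be less than `self.len()`.
    pub unsafe fn get_unchecked(&self, i: usize) -> u32 {
        // SAFETY: the caller guarantees `i` is in bounds.
        unsafe { *self.items.get_unchecked(i) }
    }
}
```

Both blocks are ordinary inherent impls, so callers see one flat set of methods; the grouping only affects how the source and the generated documentation are organized.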

@aturon

aturon Sep 15, 2014

Member

@kballard

I'd also like to see this RFC recommend that unsafe methods should all be placed into the same impl block (or blocks, if some methods require different bounds), and that these impl blocks should be placed at the end of the file. This will organize them better in the rustdoc output without requiring any rustdoc changes at all, as rustdoc maintains the impl grouping and order. We can still extend rustdoc later to have a visibility toggle for unsafe methods, but I think it still makes sense to put the unsafe methods at the end of the documentation.

That's a great idea!

@Gankro

Gankro Sep 15, 2014

Contributor

@kballard I like the suggestion of making unsafe stuff internally separate and at the end, as this matches the current convention of separating private/public methods into separate impls as well. I'd rather just see an additional subheading or maybe a red coloured "unsafe operations" box than any kind of hide/show unsafe functionality, though.

@thestinger

thestinger Sep 15, 2014

@kballard: Your definition of raw modules means that functions like set_len and as_ptr would need to become methods. There are other ways to break the invariants of the type with unsafe code anyway. The methods are there for convenience, and making them more verbose / inconvenient to use just means more manual usage of functions like transmute_copy.

@kballard

kballard Sep 15, 2014

Contributor

@thestinger I assume you mean methods would need to become functions. And no, they wouldn't. Neither of those methods is a value constructor. I even explicitly stated that methods that modify existing values (such as set_len) would stay as methods.

@thestinger

thestinger Sep 15, 2014

@kballard: There's no functional difference between the from_raw_parts convenience method and assigning the pointer / length / capacity to a vector constructed with new. Both are valid ways of constructing a vector from raw parts, and I don't see a distinction between them that merits making one way more verbose and inconsistent with the rest of the API. That's why what you described also applies to methods like set_len - those methods expose a superset of the functionality, with the same internal details leaked and the same memory safety issues.
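
The equivalence claimed here can be sketched in present-day Rust. The standard library's public API does not allow assigning a vector's pointer and capacity directly, so the first function below uses `Vec::with_capacity`, raw writes and `set_len` as the nearest stand-in for that approach; the second uses `Vec::from_raw_parts`. Both function names are hypothetical.

```rust
use std::mem::ManuallyDrop;

/// Builds `[1, 2, 3]` by writing through the raw pointer and then
/// fixing up the length with the unsafe `set_len` method.
fn via_set_len() -> Vec<u8> {
    let mut v: Vec<u8> = Vec::with_capacity(3);
    unsafe {
        let p = v.as_mut_ptr();
        // SAFETY: capacity is at least 3, so these writes are in bounds.
        p.write(1);
        p.add(1).write(2);
        p.add(2).write(3);
        // SAFETY: the first 3 elements are now initialized.
        v.set_len(3);
    }
    v
}

/// Builds `[1, 2, 3]` by reassembling a vector from pointer, length
/// and capacity with the unsafe static function `from_raw_parts`.
fn via_from_raw_parts() -> Vec<u8> {
    // Keep the source from freeing the buffer we hand over.
    let mut src = ManuallyDrop::new(vec![1u8, 2, 3]);
    let (ptr, len, cap) = (src.as_mut_ptr(), src.len(), src.capacity());
    // SAFETY: the parts come from a live `Vec<u8>` that is never dropped.
    unsafe { Vec::from_raw_parts(ptr, len, cap) }
}
```

Either route can produce a vector whose length or contents the type never validated, which is the sense in which the comment treats `set_len` and `from_raw_parts` as exposing the same internal details.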

@aturon

aturon Sep 15, 2014

Member

@kballard @thestinger I've pushed a significant revision, which I hope clarifies what's being proposed here.

@kballard Specifically regarding naming conventions, I don't think that requiring a raw prefix or suffix everywhere we currently use a raw submodule makes sense; I'd prefer to reserve "raw" for a "lower-level" representation. I tried to give some general naming guidelines nevertheless, but I think we'll also just have to use our best case-by-case judgment during stabilization to finalize these names.

@jfager

jfager Sep 15, 2014

Just spitballing: if the convention is going to be that unsafe methods should be in their own impl block, would there be value in creating a new unsafe impl that only allows unsafe methods and forbidding unsafe methods in a regular impl? That is, have the language actually enforce the convention?

@kballard

kballard Sep 15, 2014

Contributor

@jfager What benefit would that provide? Language rules != conventions.

+
+# Alternatives
+
+There are a few alternatives:

@kballard

kballard Sep 15, 2014

Contributor

One unlisted alternative is the design I gave in another issue (I forget where) that uses associated items (RFC PR #195) to provide e.g. String::raw::from_parts(). This solves the ergonomics of needing to import something to get access to the values. This would look like

impl String {
    /// Raw operations
    static raw: StringRaw = StringRaw;
}

struct StringRaw;

impl StringRaw {
    unsafe fn from_parts(buf: *mut u8, length: uint, capacity: uint) -> String;
    // ...
}

Personally, I think the ergonomics of this are nice. It's basically providing a tiny module structure for the static methods of a type, similar to how we use a module structure in general for types/functions. The biggest downside from a usability perspective is that it's not immediately obvious in rustdoc how this is supposed to be used.
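
The snippet above depends on associated statics from a then-unmerged RFC and is not valid in present-day Rust. A rough approximation, assuming an associated `const` is an acceptable substitute, is sketched below. The names `Buf`, `BufRaw` and `roundtrip` are hypothetical, and the call syntax becomes `Buf::RAW.from_parts(..)` rather than the `String::raw::from_parts()` form in the comment.

```rust
use std::mem::ManuallyDrop;

/// Hypothetical owned byte buffer standing in for `String`.
pub struct Buf(Vec<u8>);

/// Zero-sized namespace type that holds the raw constructors.
#[derive(Clone, Copy)]
pub struct BufRaw;

impl Buf {
    /// Raw operations, reached as `Buf::RAW`.
    pub const RAW: BufRaw = BufRaw;

    pub fn as_bytes(&self) -> &[u8] {
        &self.0
    }
}

impl BufRaw {
    /// # Safety
    /// Same contract as `Vec::from_raw_parts`.
    pub unsafe fn from_parts(self, ptr: *mut u8, len: usize, cap: usize) -> Buf {
        // SAFETY: forwarded to the caller.
        Buf(unsafe { Vec::from_raw_parts(ptr, len, cap) })
    }
}

/// Takes a vector apart and rebuilds it through the namespace value.
fn roundtrip(bytes: Vec<u8>) -> Buf {
    let mut v = ManuallyDrop::new(bytes);
    let (ptr, len, cap) = (v.as_mut_ptr(), v.len(), v.capacity());
    // SAFETY: the parts come from a live `Vec<u8>` that is never dropped.
    unsafe { Buf::RAW.from_parts(ptr, len, cap) }
}
```

As in the original design, importing `Buf` is enough to reach the raw constructors, and they stay visually separated from the type's ordinary static functions.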

@aturon

aturon Sep 15, 2014

Member

@kballard I'm sorry I left this out -- I'll add a reference to it from the RFC text.

jfager Sep 16, 2014

@kballard Of course convention != language rule; you weren't able to infer that the question was whether the convention should be promoted to a rule? If not a full-blown rule, then perhaps a lint detecting safe and unsafe methods in the same impl block.

kballard Sep 16, 2014

Contributor

@jfager Of course I was, you don't need to be condescending. I'm telling you that there's no need to put hard rules into the language in order to enforce a convention. I don't believe this even qualifies for a stylistic lint. This is a recommendation that improves documentation, not a universal style. More importantly, while our standard libraries are undergoing heavy API review, most third-party code won't have the same level of API review and won't benefit from having compiler-enforced rules around the structure of their impl blocks.

jfager Sep 16, 2014

@kballard Hey, you're the one who felt the need to inform me that a language rule and a convention aren't the same thing. Anyways, that's a much better answer to the original question, thank you. It wasn't clear that this was intended primarily for the standard lib and not as a declaration of Idiomatic Rust Code that people should feel bad for ignoring in their own code.

huonw Sep 16, 2014

Member

Can we stop focusing on documentation? We can equally improve how rustdoc lays out the documentation of methods to get those benefits without raw modules. :)

brson Sep 16, 2014

Contributor

@aturon I agree with these conventions in principle, though I am very curious about how this specifically affects the various raw modules currently in existence, e.g. I would expect this to make them less prevalent.

aturon Sep 16, 2014

Member

@brson

> I agree with these conventions in principle, though I am very curious about how this specifically affects the various raw modules currently in existence, e.g. I would expect this to make them less prevalent.

That's right. Concretely, I think we'd be left with only sync::raw and core::raw. The key justification for those modules is that they define raw representation types.

Ericson2314 Sep 17, 2014

Contributor

I was thinking that an unsafe impl could just be shorthand for making everything within unsafe, and a normal impl would work the same as today. Not sure if that adds enough convenience to be worth it, but just thought I'd throw it out there.

Kimundi Sep 20, 2014

Member

+1 For this proposal, unsafe methods/functions can already be grouped together by being unsafe and naming conventions, no point in making them more annoying to use with free functions in a raw module.

alexcrichton added a commit to alexcrichton/rust that referenced this pull request Sep 22, 2014

collections: Stabilize String
# Rationale

When dealing with strings, many functions deal with either a `char` (unicode
codepoint) or a byte (utf-8 encoding related). There is often an inconsistent
way in which methods are referred to as to whether they contain "byte", "char",
or nothing in their name.  There are also issues open to rename *all* methods to
reflect that they operate on utf8 encodings or bytes (e.g. utf8_len() or
byte_len()).

The current state of String seems to largely be what is desired, so this PR
proposes the following rationale for methods dealing with bytes or characters:

> When constructing a string, the input encoding *must* be mentioned (e.g.
> from_utf8). This makes it clear what exactly the input type is expected to be
> in terms of encoding.
>
> When a method operates on anything related to an *index* within the string
> such as length, capacity, position, etc, the method *implicitly* operates on
> bytes. It is an understood fact that String is a utf-8 encoded string, and
> burdening all methods with "bytes" would be redundant.
>
> When a method operates on the *contents* of a string, such as push() or pop(),
> then "char" is the default type. A String can loosely be thought of as being a
> collection of unicode codepoints, but not all collection-related operations
> make sense because some can be woefully inefficient.
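
The three rules quoted above can be seen in present-day `String`, which still follows them. The helper function names below are hypothetical and exist only to make the behaviour checkable; the `String` and `str` methods they call are the standard ones.

```rust
/// Index-like quantities such as `len` count bytes, not characters.
fn byte_len_and_char_count(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

/// Content operations such as `push` and `pop` work on whole `char`s.
fn push_then_pop(mut s: String, c: char) -> (String, Option<char>) {
    s.push(c);
    let popped = s.pop();
    (s, popped)
}

/// Constructors name their input encoding, e.g. `from_utf8`.
fn decode_utf8(bytes: Vec<u8>) -> Option<String> {
    String::from_utf8(bytes).ok()
}
```

For a string containing a two-byte character such as `"h\u{e9}llo"`, the byte length and the character count differ, which is why the rationale spells out that index-related methods mean bytes.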

# Method stabilization

The following methods have been marked #[stable]

* The String type itself
* String::new
* String::with_capacity
* String::from_utf16_lossy
* String::into_bytes
* String::as_bytes
* String::len
* String::clear
* String::as_slice

The following methods have been marked #[unstable]

* String::from_utf8 - The error type in the returned `Result` may change to
                      provide a nicer message when it's `unwrap()`'d
* String::from_utf8_lossy - The returned `MaybeOwned` type still needs
                            stabilization
* String::from_utf16 - The return type may change to become a `Result` which
                       includes more contextual information like where the error
                       occurred.
* String::from_chars - This is equivalent to iter().collect(), but currently not
                       as ergonomic.
* String::from_char - This method is the equivalent of Vec::from_elem, and has
                      been marked #[unstable] because it can be seen as a
                      duplicate of iterator-based functionality as well as
                      possibly being renamed.
* String::push_str - This *can* be emulated with .extend(foo.chars()), but is
                     less efficient because of decoding/encoding. Due to the
                     desire to minimize API surface this may be able to be
                     removed in the future for something possibly generic with
                     no loss in performance.
* String::grow - This is a duplicate of iterator-based functionality, which may
                 become more ergonomic in the future.
* String::capacity - This function was just added.
* String::push - This function was just added.
* String::pop - This function was just added.
* String::truncate - The failure conventions around String methods and byte
                     indices aren't totally clear at this time, so the failure
                     semantics and return value of this method are subject to
                     change.
* String::as_mut_vec - the naming of this method may change.
* string::raw::* - these functions are all waiting on [an RFC][2]
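
The iterator-based equivalents mentioned in this list can be sketched in present-day Rust as follows. The function names are hypothetical; `push_str`, `extend`, and `collect` are the standard methods. This is an illustration of the equivalence, not a benchmark of the efficiency claim.

```rust
/// Appends by copying the UTF-8 bytes of `extra` directly.
fn append_with_push_str(mut base: String, extra: &str) -> String {
    base.push_str(extra);
    base
}

/// Appends by decoding `extra` into chars and re-encoding each one.
fn append_with_extend(mut base: String, extra: &str) -> String {
    base.extend(extra.chars());
    base
}

/// The `from_chars` use case expressed through `collect`.
fn collect_chars(chars: &[char]) -> String {
    chars.iter().copied().collect()
}
```

Both append functions produce the same string; the difference the commit message points at is only in how much work is done per character.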

[2]: rust-lang/rfcs#240

The following method has been marked #[experimental]

* String::from_str - This function only exists as it's more efficient than
                     to_string(), but having a less ergonomic function for
                     performance reasons isn't the greatest reason to keep it
                     around. Like Vec::push_all, this has been marked
                     experimental for now.

The following methods have been #[deprecated]

* String::append - This method has been deprecated to remain consistent with the
                   deprecation of Vec::append. While convenient, it is one of
                   the only functional-style apis on String, and requires more
                   thought as to whether it belongs as a first-class method or
                   not (and how it relates to other collections).
* String::from_byte - This is fairly rare functionality and can be emulated with
                      str::from_utf8 plus an assert plus a call to to_string().
                      Additionally, String::from_char could possibly be used.
* String::byte_capacity - Renamed to String::capacity due to the rationale
                          above.
* String::push_char - Renamed to String::push due to the rationale above.
* String::pop_char - Renamed to String::pop due to the rationale above.
* String::push_bytes - There are a number of `unsafe` functions on the `String`
                       type which allow bypassing utf-8 checks. These have all
                       been deprecated in favor of calling `.as_mut_vec()` and
                       then operating directly on the vector returned. These
                       methods were deprecated because naming them with relation
                       to other methods was difficult to rationalize and it's
                       arguably more composable to call .as_mut_vec().
* String::as_mut_bytes - See push_bytes
* String::push_byte - See push_bytes
* String::pop_byte - See push_bytes
* String::shift_byte - See push_bytes
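
The replacement pattern for the deprecated byte-level methods, going through `.as_mut_vec()`, might look like this in present-day Rust, where `String::as_mut_vec` is still an `unsafe` method. The helper `push_ascii_bytes` is a hypothetical name, and restricting the input to ASCII is an assumption made here so that the UTF-8 invariant is easy to justify.

```rust
/// Appends raw bytes to a `String` through its underlying vector.
fn push_ascii_bytes(s: &mut String, bytes: &[u8]) {
    // Assumption of this sketch: reject anything that is not ASCII.
    assert!(bytes.is_ascii(), "input must be ASCII");
    // SAFETY: appending ASCII bytes to valid UTF-8 yields valid UTF-8.
    unsafe {
        s.as_mut_vec().extend_from_slice(bytes);
    }
}
```

Because the vector is exposed directly, any `Vec<u8>` operation is available without `String` needing a byte-flavoured twin of each method, which is the composability argument made in the `push_bytes` entry.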

# Reservation methods

This commit does not yet touch the methods for reserving bytes. The methods on
Vec have also not yet been modified. These methods are discussed in the upcoming
[Collections reform RFC][1]

[1]: https://github.com/aturon/rfcs/blob/collections-conventions/active/0000-collections-conventions.md#implicit-growth

bors added a commit to rust-lang/rust that referenced this pull request Sep 23, 2014

auto merge of #17438 : alexcrichton/rust/string-stable, r=aturon

bors added a commit to rust-lang/rust that referenced this pull request Sep 24, 2014

auto merge of #17438 : alexcrichton/rust/string-stable, r=aturon
# Rationale

When dealing with strings, many functions deal with either a `char` (unicode
codepoint) or a byte (utf-8 encoding related). There is often an inconsistent
way in which methods are referred to as to whether they contain "byte", "char",
or nothing in their name.  There are also issues open to rename *all* methods to
reflect that they operate on utf8 encodings or bytes (e.g. utf8_len() or
byte_len()).

The current state of String seems to largely be what is desired, so this PR
proposes the following rationale for methods dealing with bytes or characters:

> When constructing a string, the input encoding *must* be mentioned (e.g.
> from_utf8). This makes it clear what exactly the input type is expected to be
> in terms of encoding.
>
> When a method operates on anything related to an *index* within the string
> such as length, capacity, position, etc, the method *implicitly* operates on
> bytes. It is an understood fact that String is a utf-8 encoded string, and
> burdening all methods with "bytes" would be redundant.
>
> When a method operates on the *contents* of a string, such as push() or pop(),
> then "char" is the default type. A String can loosely be thought of as being a
> collection of unicode codepoints, but not all collection-related operations
> make sense because some can be woefully inefficient.
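The three rules quoted above can be illustrated with a minimal sketch. It assumes today's stable Rust standard library rather than the 2014 API under discussion, and the helper name `byte_and_char_views` is hypothetical.

```rust
/// Builds a small string and reports (byte length, char count, popped char).
/// Hypothetical helper; uses the current stable `String` API.
fn byte_and_char_views() -> (usize, usize, Option<char>) {
    // Construction names the input encoding explicitly (`from_utf8`).
    let mut s = String::from_utf8(vec![0x68, 0x69]).expect("valid UTF-8"); // "hi"

    // Content operations default to `char`: `push` takes a char.
    s.push('\u{e9}'); // U+00E9 occupies two bytes in UTF-8

    // Index-related queries are implicitly in bytes, with no "byte" in the name.
    let byte_len = s.len();
    // Counting code points requires walking the contents.
    let char_count = s.chars().count();

    // `pop` removes a whole char, not a single byte.
    let popped = s.pop();
    (byte_len, char_count, popped)
}
```

Here `len()` reports 4 while the string holds 3 code points, which is the distinction the convention relies on.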

# Method stabilization

The following methods have been marked #[stable]

* The String type itself
* String::new
* String::with_capacity
* String::from_utf16_lossy
* String::into_bytes
* String::as_bytes
* String::len
* String::clear
* String::as_slice

The following methods have been marked #[unstable]

* String::from_utf8 - The error type in the returned `Result` may change to
                      provide a nicer message when it's `unwrap()`'d
* String::from_utf8_lossy - The returned `MaybeOwned` type still needs
                            stabilization
* String::from_utf16 - The return type may change to become a `Result` which
                       includes more contextual information like where the error
                       occurred.
* String::from_chars - This is equivalent to iter().collect(), but currently not
                       as ergonomic.
* String::from_char - This method is the equivalent of Vec::from_elem, and has
                      been marked #[unstable] because it can be seen as a
                      duplicate of iterator-based functionality and may also
                      be renamed.
* String::push_str - This *can* be emulated with .extend(foo.chars()), but is
                     less efficient because of decoding/encoding. Due to the
                     desire to minimize API surface this may be able to be
                     removed in the future for something possibly generic with
                     no loss in performance.
* String::grow - This is a duplicate of iterator-based functionality, which may
                 become more ergonomic in the future.
* String::capacity - This function was just added.
* String::push - This function was just added.
* String::pop - This function was just added.
* String::truncate - The failure conventions around String methods and byte
                     indices aren't totally clear at this time, so the failure
                     semantics and return value of this method are subject to
                     change.
* String::as_mut_vec - the naming of this method may change.
* string::raw::* - these functions are all waiting on [an RFC][2]

[2]: rust-lang/rfcs#240
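Two of the equivalences mentioned in the list above can be sketched as follows. This assumes the current stable standard library (where `String::from_chars` no longer exists); the function names `collect_chars` and `append_both_ways` are hypothetical.

```rust
/// The iterator-based route that `String::from_chars` is described as
/// duplicating: collect an iterator of chars into a `String`.
fn collect_chars(chars: &[char]) -> String {
    chars.iter().collect()
}

/// Appends `extra` to `base` twice over: once with `push_str`, which copies
/// the bytes directly, and once with `extend(extra.chars())`, which decodes
/// each char and re-encodes it. Both produce the same contents.
fn append_both_ways(base: &str, extra: &str) -> (String, String) {
    let mut direct = String::from(base);
    direct.push_str(extra);

    let mut via_chars = String::from(base);
    via_chars.extend(extra.chars());

    (direct, via_chars)
}
```

The results are identical; the difference the commit message points at is only the decode/encode work done by the second form.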

The following method has been marked #[experimental]

* String::from_str - This function only exists as it's more efficient than
                     to_string(), but having a less ergonomic function for
                     performance reasons isn't the greatest reason to keep it
                     around. Like Vec::push_all, this has been marked
                     experimental for now.

The following methods have been #[deprecated]

* String::append - This method has been deprecated to remain consistent with the
                   deprecation of Vec::append. While convenient, it is one of
                   the only functional-style apis on String, and requires more
                   thought as to whether it belongs as a first-class method or
                   not (and how it relates to other collections).
* String::from_byte - This is fairly rare functionality and can be emulated with
                      str::from_utf8 plus an assert plus a call to to_string().
                      Additionally, String::from_char could possibly be used.
* String::byte_capacity - Renamed to String::capacity due to the rationale
                          above.
* String::push_char - Renamed to String::push due to the rationale above.
* String::pop_char - Renamed to String::pop due to the rationale above.
* String::push_bytes - There are a number of `unsafe` functions on the `String`
                       type which allow bypassing utf-8 checks. These have all
                       been deprecated in favor of calling `.as_mut_vec()` and
                       then operating directly on the vector returned. These
                       methods were deprecated because naming them with relation
                       to other methods was difficult to rationalize and it's
                       arguably more composable to call .as_mut_vec().
* String::as_mut_bytes - See push_bytes
* String::push_byte - See push_bytes
* String::pop_byte - See push_bytes
* String::shift_byte - See push_bytes
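As a rough sketch of what replacing the deprecated byte methods with `.as_mut_vec()` looks like, the helpers below use the current stable signature (`unsafe fn as_mut_vec(&mut self) -> &mut Vec<u8>`). The helper names are hypothetical, and the ASCII checks are one conservative way to uphold the UTF-8 invariant, not something the commit prescribes.

```rust
/// Appends one byte through the raw vector, accepting only ASCII.
fn push_ascii_byte(s: &mut String, byte: u8) {
    assert!(byte.is_ascii(), "only ASCII bytes are accepted here");
    // SAFETY: an ASCII byte is a complete one-byte UTF-8 sequence, so
    // appending it keeps the contents valid UTF-8.
    unsafe { s.as_mut_vec().push(byte) }
}

/// Removes the last byte only when it is ASCII; otherwise leaves `s` alone.
fn pop_ascii_byte(s: &mut String) -> Option<u8> {
    let last_is_ascii = s.as_bytes().last().map_or(false, |b| b.is_ascii());
    if last_is_ascii {
        // SAFETY: removing a trailing one-byte sequence cannot split a
        // multi-byte char, so the remainder is still valid UTF-8.
        unsafe { s.as_mut_vec().pop() }
    } else {
        None
    }
}

/// The emulation described for `String::from_byte`: an assert, then
/// `str::from_utf8`, then `to_string()`.
fn string_from_byte(byte: u8) -> String {
    assert!(byte.is_ascii(), "a lone non-ASCII byte is not valid UTF-8");
    let bytes = [byte];
    let owned = std::str::from_utf8(&bytes)
        .expect("ASCII is valid UTF-8")
        .to_string();
    owned
}
```

All byte-level mutation goes through one `unsafe` entry point, and the caller decides which `Vec<u8>` operation to compose with it.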

# Reservation methods

This commit does not yet touch the methods for reserving bytes. The methods on
Vec have also not yet been modified. These methods are discussed in the upcoming
[Collections reform RFC][1].

[1]: https://github.com/aturon/rfcs/blob/collections-conventions/active/0000-collections-conventions.md#implicit-growth

@alexcrichton alexcrichton referenced this pull request in rust-lang/rust Oct 7, 2014

Closed

Tighten up conventions with unsafe apis #17863

@alexcrichton alexcrichton merged commit 2e00888 into rust-lang:master Oct 7, 2014

@aturon aturon referenced this pull request Oct 9, 2014

Closed

RFC: Raw Reform #365

@chriskrycho chriskrycho referenced this pull request in rust-lang-nursery/reference Mar 29, 2017

Closed

Document all features #9

18 of 48 tasks complete