Improved docs for CStr, CString, OsStr, OsString #44855

Merged
bors merged 15 commits into rust-lang:master from federicomenaquintero:master on Oct 13, 2017

10 participants
@federicomenaquintero
Contributor

federicomenaquintero commented Sep 26, 2017

This expands the documentation for those structs and their corresponding traits, per #29354

federicomenaquintero added some commits Sep 22, 2017

Expand the introduction to the ffi module.
We describe the representation of C strings, and the purpose of
OsString/OsStr.

Part of #29354
Overhaul the ffi::CString docs
Explain the struct's reason for being, and its most common usage
patterns.  Add a bunch of links.

Clarify the method docs a bit.

Part of #29354
@rust-highfive

Collaborator

rust-highfive commented Sep 26, 2017

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @aturon (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@cuviper

Member

cuviper commented Sep 26, 2017

CStr and CString are not necessarily UTF-8 at all! If they were, then CStr::to_str() and CString::into_string() would be infallible conversions, not needing a Result.
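To illustrate the point, here is a minimal sketch (the helper `cstr_is_utf8` is hypothetical, not part of `std`): a byte sequence can be a perfectly valid C string and still fail UTF-8 validation, which is why both conversions return a `Result`.

```rust
use std::ffi::{CStr, CString};

/// Hypothetical helper: checks whether a nul-terminated byte slice is
/// valid UTF-8 (the terminator itself is not part of the check).
fn cstr_is_utf8(bytes: &[u8]) -> bool {
    // from_bytes_with_nul requires exactly one nul, as the last byte.
    let c = CStr::from_bytes_with_nul(bytes).expect("not a well-formed C string");
    // to_str() returns Result<&str, Utf8Error>.
    c.to_str().is_ok()
}

fn main() {
    // Latin-1 "café": a valid C string, but 0xE9 alone is not valid UTF-8.
    assert!(!cstr_is_utf8(b"caf\xe9\0"));
    assert!(cstr_is_utf8(b"hello\0"));

    // CString::into_string is fallible for the same reason.
    let owned = CString::new(vec![0xff_u8, b'A']).expect("no interior nul");
    assert!(owned.into_string().is_err());
}
```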

@federicomenaquintero

Contributor Author

federicomenaquintero commented Sep 26, 2017

Oops, long lines... will fix.

I'll also clarify that CStr/CString are bags of zero-terminated bytes, and UTF-8 only happens when making a string out of them.

/// This type serves the primary purpose of being able to safely generate a
/// C-compatible string from a Rust byte slice or vector. An instance of this
/// This type serves the purpose of being able to safely generate a
/// C-compatible UTF-8 string from a Rust byte slice or vector. An instance of this

@clarfon

clarfon Sep 27, 2017

Contributor

This isn't guaranteed by CStr.

@@ -8,7 +8,145 @@
// option. This file may not be copied, modified, or distributed
// except according to those terms.

//! Utilities related to FFI bindings.
//! This module provides utilities to handle C-like strings. It is

@clarfon

clarfon Sep 27, 2017

Contributor

I think that this is a bit misleading because OsString isn't a C string on Windows.

@clarfon

clarfon Sep 27, 2017

Contributor

A better way to describe it might be "to handle data across non-Rust interfaces, like other programming languages and the underlying operating system"

//! borrowed slices of strings with the [`str`] primitive. Both are
//! always in UTF-8 encoding, and may contain nul bytes in the middle,
//! i.e. if you look at the bytes that make up the string, there may
//! be a `0` among them. Both `String` and `str` know their length;

@clarfon

clarfon Sep 27, 2017

Contributor

nit: the '0' here makes it look like you're referring to a zero digit, not a literal zero. Perhaps use '\0' instead?
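A small sketch of the behaviour the doc text describes (the helper `count_nuls` is hypothetical): a Rust `&str` can hold `'\0'` in the middle, and its length is stored rather than found by scanning for a terminator.

```rust
/// Hypothetical helper: counts the NUL characters inside a string slice.
fn count_nuls(s: &str) -> usize {
    s.bytes().filter(|&b| b == 0).count()
}

fn main() {
    // '\0' is just another character in a Rust string.
    let s = "ab\0cd";
    assert_eq!(s.len(), 5); // the interior NUL counts toward the length
    assert_eq!(count_nuls(s), 1);
    // Nothing after the NUL is lost.
    assert!(s.ends_with("cd"));
}
```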

@clarfon

clarfon Sep 27, 2017

Contributor

another nit: I'd word "know their length" as "store their length explicitly" because technically we "know" the length of a C-string, but it's not computed in O(1) time.

//!
//! C strings are different from Rust strings:
//!
//! * **Encodings** - C strings may have different encodings. If

@clarfon

clarfon Sep 27, 2017

Contributor

I think that "encoding" here is a bit inaccessible to people who are unfamiliar with how string encoding works. I'd say introduce it with "Rust strings are UTF-8, but C strings may use other encodings. If you're using a string from C, you may have to check its encoding explicitly, rather than just assuming that it's UTF-8 like you can in Rust."

//! you are bringing in strings from C APIs, you should check what
//! encoding you are getting. Rust strings are always UTF-8.
//!
//! * **Character width** - C strings may use "normal" or "wide"

@clarfon

clarfon Sep 27, 2017

Contributor

"Width" here may be what C uses, but it's again misleading because Unicode has its own specific definition of width. I'd say "size" instead. Instead of using "normal" and "wide," I'd just say directly that C uses two types, char (clarifying that this is different from Rust's type) and wchar_t, which are different sizes.

@clarfon

clarfon Sep 27, 2017

Contributor

You can also clarify that wchar_t is referred to by "wide character" but that this doesn't actually reflect the Unicode width, but the size of the character in bytes.
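To make the size distinction concrete: Rust exposes C's `char` as `std::os::raw::c_char` (one byte), while Rust's own `char` is a 4-byte Unicode scalar value. `wchar_t` is not in `std`, so this sketch assumes a 16-bit `wchar_t` as on Windows and builds a nul-terminated UTF-16 buffer; the helper `to_wide_nul` is hypothetical.

```rust
use std::mem::size_of;
use std::os::raw::c_char;

/// Hypothetical helper: encodes a Rust string as nul-terminated UTF-16,
/// assuming a 16-bit wchar_t (true on Windows, not on most Unix systems).
fn to_wide_nul(s: &str) -> Vec<u16> {
    let mut wide: Vec<u16> = s.encode_utf16().collect();
    wide.push(0); // the wide-string terminator
    wide
}

fn main() {
    // C's char vs Rust's char: the names match, the sizes do not.
    assert_eq!(size_of::<c_char>(), 1);
    assert_eq!(size_of::<char>(), 4);

    // "hé" is 3 bytes of UTF-8 but 2 UTF-16 units, plus the terminator.
    assert_eq!("hé".len(), 3);
    assert_eq!(to_wide_nul("hé"), vec![0x68u16, 0xE9, 0]);
}
```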

//! '[Unicode code point]'.
//!
//! * **Nul terminators and implicit string lengths** - Often, C
//! strings are nul-terminated, i.e. they have a `0` character at the

@clarfon

clarfon Sep 27, 2017

Contributor

Again, I'd use '\0' here instead of '0'.

//!
//! * **Nul terminators and implicit string lengths** - Often, C
//! strings are nul-terminated, i.e. they have a `0` character at the
//! end. The length of a string buffer is not known *a priori*;

@clarfon

clarfon Sep 27, 2017

Contributor

No need to use Latin; just say that it isn't stored, but has to be calculated. IMHO we should keep language simple if possible to be more accessible to non-native speakers.

//! `wcslen()` for `wchar_t`-based ones. Those functions return the
//! number of characters in the string excluding the nul terminator,
//! so the buffer length is really `len+1` characters. Rust strings
//! don't have a nul terminator, and they always know their length.

@clarfon

clarfon Sep 27, 2017

Contributor

I'd also note in here somewhere that Rust's way of doing it means that you can easily access a string's length, whereas there's an implicit cost to it in C. This also may carry over to CStr if its implementation changes.
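A sketch of that cost difference, using a hypothetical `c_strlen` written in safe Rust over a byte buffer (not the real libc function): the C-style length must be found by scanning for the first nul, which takes time proportional to the string, whereas `str::len` reads a stored value.

```rust
/// Hypothetical safe analogue of C's strlen(): scans for the first nul
/// byte and returns the number of bytes before it, or None if the
/// buffer contains no terminator at all.
fn c_strlen(buf: &[u8]) -> Option<usize> {
    buf.iter().position(|&b| b == 0)
}

fn main() {
    // A C-style buffer: the string ends at the first nul, whatever follows.
    let buf = b"hello\0garbage";
    let len = c_strlen(buf).unwrap();
    assert_eq!(len, 5);
    // The buffer actually needed is len + 1 bytes, for the terminator.
    assert_eq!(&buf[..len + 1], b"hello\0");

    // A Rust &str stores its length: no scan, and no terminator.
    assert_eq!("hello".len(), 5);
}
```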

//! so the buffer length is really `len+1` characters. Rust strings
//! don't have a nul terminator, and they always know their length.
//!
//! * **No nul characters in the middle of the string** - When C

@clarfon

clarfon Sep 27, 2017

Contributor

I'd word this as "Internal NULs", which is more succinct.

//! strings have a nul terminator character, this usually means that
//! they cannot have nul characters in the middle — a nul character
//! would essentially truncate the string. Rust strings *can* have
//! nul characters in the middle, since they don't use nul

@clarfon

clarfon Sep 27, 2017

Contributor

Rather than "don't use nul terminators," it's clearer to say "because NUL doesn't have to mark the end of the string in Rust"
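The same point seen from the Rust side, as a minimal sketch: a `String` holding an interior NUL is fine, but `CString::new` rejects it with a `NulError`, whose `nul_position` reports where the offending byte is.

```rust
use std::ffi::CString;

fn main() {
    let with_nul = String::from("ab\0cd");
    // Valid as a Rust String...
    assert_eq!(with_nul.len(), 5);

    // ...but not as a C string: the NUL would truncate it to "ab".
    match CString::new(with_nul) {
        Ok(_) => unreachable!("interior NUL must be rejected"),
        Err(e) => {
            assert_eq!(e.nul_position(), 2);
            // The original bytes can be recovered from the error.
            assert_eq!(e.into_vec(), b"ab\0cd".to_vec());
        }
    }
}
```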

//! # Representations of non-Rust strings
//!
//! [`CString`] and [`CStr`] are useful when you need to transfer
//! UTF-8 strings to and from C, respectively:

@clarfon

clarfon Sep 27, 2017

Contributor

I'd expand this to languages with a C ABI like Python, etc. People should know that a CStr might be necessary when interacting with other languages too.

//! UTF-8 strings to and from C, respectively:
//!
//! * **From Rust to C:** [`CString`] represents an owned, C-friendly
//! UTF-8 string: it is valid UTF-8, it is nul-terminated, and has no

@clarfon

clarfon Sep 27, 2017

Contributor

Not always valid UTF-8.
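For the Rust-to-C direction (including other languages with a C ABI), one common pattern is to hand out a pointer from `CString::into_raw` and take it back with `CString::from_raw` so that Rust frees it. This is a sketch with hypothetical function names; a real exported symbol would also need a `no_mangle` attribute.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Hypothetical exported function: returns an owned, nul-terminated
/// string. The caller must give it back to `greeting_free`.
pub extern "C" fn greeting_new() -> *mut c_char {
    CString::new("hello from Rust")
        .expect("literal has no interior nul")
        .into_raw()
}

/// Hypothetical companion that frees a string from `greeting_new`.
///
/// # Safety
/// `p` must be null or a pointer returned by `greeting_new`, freed once.
pub unsafe extern "C" fn greeting_free(p: *mut c_char) {
    if !p.is_null() {
        // Reclaim ownership so Rust's allocator frees the buffer.
        drop(unsafe { CString::from_raw(p) });
    }
}

fn main() {
    let p = greeting_new();
    // Simulate the C side reading the string.
    let text = unsafe { CStr::from_ptr(p) }.to_str().unwrap().to_owned();
    assert_eq!(text, "hello from Rust");
    unsafe { greeting_free(p) };
}
```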

//!
//! * **From C to Rust:** [`CStr`] represents a borrowed C string; it
//! is what you would use to wrap a raw `*const u8` that you got from
//! a C function. A `CStr` is just guaranteed to be a nul-terminated

@clarfon

clarfon Sep 27, 2017

Contributor

"just" seems out of place here; I'd remove it.

//! * **From C to Rust:** [`CStr`] represents a borrowed C string; it
//! is what you would use to wrap a raw `*const u8` that you got from
//! a C function. A `CStr` is just guaranteed to be a nul-terminated
//! array of bytes; the UTF-8 validation step only happens when you

@clarfon

clarfon Sep 27, 2017

Contributor

"the UTF-8 validation step" is only just mentioned here so I'd just make a separate sentence describing how that works instead, along the lines of "once you have a CStr, you can convert it to a Rust str if it's valid UTF-8, or lossily convert it by adding replacement characters"
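A sketch of the C-to-Rust direction: wrap the pointer in a `&CStr`, then either validate it with `to_str` or convert it lossily with `to_string_lossy`, which substitutes U+FFFD for invalid sequences. The "C function" here is simulated with a static byte string; `fake_c_name` is hypothetical.

```rust
use std::borrow::Cow;
use std::ffi::CStr;
use std::os::raw::c_char;

// Stand-in for a string owned by C code: Latin-1 "café", nul-terminated.
static NAME: &[u8] = b"caf\xe9\0";

/// Hypothetical stand-in for a C function returning `const char *`.
fn fake_c_name() -> *const c_char {
    NAME.as_ptr() as *const c_char
}

fn main() {
    // Safety: the pointer is non-null, nul-terminated, and 'static.
    let c: &CStr = unsafe { CStr::from_ptr(fake_c_name()) };

    // Strict conversion fails: 0xE9 is not valid UTF-8 here.
    assert!(c.to_str().is_err());

    // Lossy conversion replaces the bad byte with U+FFFD.
    let lossy: Cow<str> = c.to_string_lossy();
    assert_eq!(lossy, "caf\u{FFFD}");
}
```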

//! request to convert it to a `&str`.
//!
//! [`OsString`] and [`OsStr`] are useful when you need to transfer
//! strings to and from operating system calls. If you need Rust

@clarfon

clarfon Sep 27, 2017

Contributor

A lot of programmers may not know what system calls are; I'd probably word this as "the operating system itself."

It may also make sense to include examples where this happens, like in opening files and running external commands.

@clarfon

clarfon Sep 27, 2017

Contributor

I feel like the "If you need Rust strings out of them [...]" section is kind of redundant and wordy. I'd probably just say that conversions between OsStr and str work very similarly to CStr and leave it at that.
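As a sketch of "conversions work very similarly to CStr": `OsStr::to_str` returns an `Option` (`None` if the data is not valid Unicode), and `to_string_lossy` is the lossy fallback. `Path::file_name` is one place where `std` hands out an `&OsStr`; the helper `describe` and the file names are made up for illustration.

```rust
use std::ffi::{OsStr, OsString};
use std::path::Path;

/// Hypothetical helper mirroring the CStr conversions for OsStr.
fn describe(name: &OsStr) -> String {
    match name.to_str() {
        // Valid Unicode: borrow it as &str, no copy needed.
        Some(s) => format!("unicode: {}", s),
        // Otherwise fall back to a lossy copy with U+FFFD replacements.
        None => format!("lossy: {}", name.to_string_lossy()),
    }
}

fn main() {
    // File paths (and command arguments) are OS strings under the hood.
    let path = Path::new("docs/notes.txt");
    let file: &OsStr = path.file_name().expect("path has a file name");
    assert_eq!(describe(file), "unicode: notes.txt");

    // Converting an OsString back into a String is fallible too.
    let owned = OsString::from("report.pdf");
    assert_eq!(owned.into_string(), Ok(String::from("report.pdf")));
}
```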

@clarfon

Contributor

clarfon commented Sep 27, 2017

Great work! I've interacted a lot with CStr and OsStr so I added some comments on ways that I think the docs could be made clearer. Hopefully it's more helpful than overwhelming ><

@federicomenaquintero

Contributor Author

federicomenaquintero commented Oct 2, 2017

I've integrated the changes per your comments. How's it look now? :)

@clarfon

Contributor

clarfon commented Oct 2, 2017

Looks good to me! Again, great work! :)

@federicomenaquintero

Contributor Author

federicomenaquintero commented Oct 2, 2017

Thank you!

@shepmaster

Member

shepmaster commented Oct 6, 2017

Poke @aturon — this is now ready for your masterful reviewing skills!

@carols10cents

Member

carols10cents commented Oct 9, 2017

Actually, @aturon wasn't available last week and is on PTO this week, so let's try....

r? @steveklabnik

rust-highfive assigned steveklabnik and unassigned aturon Oct 9, 2017

@steveklabnik
Member

steveklabnik left a comment

This is fantastic, thank you so much!

I have a few little formatting nits, but after that, let's get this merged!

@@ -149,8 +209,13 @@ pub struct CStr {
}

/// An error returned from [`CString::new`] to indicate that a nul byte was found
/// in the vector provided.
/// in the vector provided. While Rust strings may contain nul bytes in the middle,
/// C strings can't, as that byte would effectively truncate the string.

@steveklabnik

steveklabnik Oct 11, 2017

Member

Could we change this up a bit? We try to have a summary sentence first, then the rest of it. This one has a long summary, and repeats itself since you added the information below. How about:

/// An error indicating that an interior nul byte was found.
///
/// While Rust strings may contain nul bytes in the middle, C strings can't, as that byte would effectively
/// truncate the string.
///
/// This `struct`....

with the correct wrapping, I just guessed here. What do you think?

/// that a nul byte was found too early in the slice provided, or one
/// wasn't found at all for the nul terminator. The slice used to
/// create a `CStr` must have one and only one nul byte at the end of
/// the slice.

@steveklabnik

steveklabnik Oct 11, 2017

Member

Same thing here; don't repeat where it came from, make sure to have a short summary, some space, and then a longer description.
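A sketch of the cases that doc comment describes, using `CStr::from_bytes_with_nul`: it succeeds only when the slice has exactly one nul byte and that byte is last. The helper `is_valid_c_bytes` is hypothetical.

```rust
use std::ffi::CStr;

/// Hypothetical helper: reports whether a slice is acceptable to
/// `CStr::from_bytes_with_nul`.
fn is_valid_c_bytes(bytes: &[u8]) -> bool {
    CStr::from_bytes_with_nul(bytes).is_ok()
}

fn main() {
    // Exactly one nul, at the end: accepted.
    assert!(is_valid_c_bytes(b"hi\0"));
    // A nul too early in the slice: rejected.
    assert!(!is_valid_c_bytes(b"h\0i\0"));
    // No nul terminator at all: rejected.
    assert!(!is_valid_c_bytes(b"hi"));
}
```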

/// UTF-8 error was encountered during the conversion. `CString` is
/// just a wrapper over a buffer of bytes with a nul terminator;
/// [`into_string`][`CString::into_string`] performs UTF-8 validation
/// and may return this error.

@steveklabnik
/// underlying bytes to construct a new string, ensuring that
/// there is a trailing 0 byte. This trailing 0 byte will be
/// appended by this method; the provided data should *not*
/// contain any 0 bytes in it.

@steveklabnik

steveklabnik Oct 11, 2017

Member

this isn't a method; could you say "function" instead?
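For the `CString::new` doc under discussion, a quick check of what "this trailing 0 byte will be appended" means: `as_bytes` excludes the terminator while `as_bytes_with_nul` includes it, so the underlying buffer is one byte longer than the string.

```rust
use std::ffi::CString;

fn main() {
    // CString::new accepts anything convertible into Vec<u8> with no nul in it.
    let c = CString::new("hello").expect("no interior nul");

    // The terminator is added for you...
    assert_eq!(c.as_bytes(), b"hello");
    assert_eq!(c.as_bytes_with_nul(), b"hello\0");
    // ...so the underlying buffer is len + 1 bytes.
    assert_eq!(c.as_bytes_with_nul().len(), c.as_bytes().len() + 1);
}
```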

@@ -8,7 +8,156 @@
// option. This file may not be copied, modified, or distributed
// except according to those terms.

//! Utilities related to FFI bindings.
//! This module provides utilities to handle data across non-Rust

@steveklabnik

steveklabnik Oct 11, 2017

Member

I'd keep this short summary, but with a newline between it, so you get the summary. That is:

//! Utilities related to FFI bindings.
//!
//! This module provides utilities....
//! C strings are different from Rust strings:
//!
//! * **Encodings** - Rust strings are UTF-8, but C strings may use
//! other encodings. If you are using a string from C, you should

@steveklabnik

steveklabnik Oct 11, 2017

Member

one space after a period, not two please!

//! characters; please **note** that C's `char` is different from Rust's.
//! The C standard leaves the actual sizes of those types open to
//! interpretation, but defines different APIs for strings made up of
//! each character type. Rust strings are always UTF-8, so different

@steveklabnik

steveklabnik Oct 11, 2017

Member

and here, and everywhere 😄

@steveklabnik

Member

steveklabnik commented Oct 12, 2017

Thanks! @bors: r+ rollup

@bors

Contributor

bors commented Oct 12, 2017

📌 Commit 5fb8e3d has been approved by steveklabnik

kennytm added a commit to kennytm/rust that referenced this pull request Oct 13, 2017

Rollup merge of rust-lang#44855 - federicomenaquintero:master, r=steveklabnik

Improved docs for CStr, CString, OsStr, OsString

This expands the documentation for those structs and their corresponding traits, per rust-lang#29354

bors added a commit that referenced this pull request Oct 13, 2017

Auto merge of #45261 - kennytm:rollup, r=kennytm
Rollup of 14 pull requests

- Successful merges: #44855, #45110, #45122, #45133, #45173, #45178, #45189, #45203, #45209, #45221, #45236, #45240, #45245, #45253
- Failed merges:

@bors bors merged commit 5fb8e3d into rust-lang:master Oct 13, 2017

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed

GuillaumeGomez added a commit to GuillaumeGomez/this-week-in-rust-docs that referenced this pull request Oct 15, 2017
