Added From<Vec<NonZeroU8>> for CString #64069


@danielhenrymantilla
Contributor

danielhenrymantilla commented Sep 1, 2019

Added a From<Vec<NonZeroU8>> impl for CString

Rationale

  • CString::from_vec_unchecked is a subtle function that makes unsafe code harder to audit when the construction of the Vec passed to it is non-trivial. This impl makes such code safer to write, thanks to the very explicit semantics of the Vec<NonZeroU8> type.

  • One such situation is when trying to .read() a CString, see issue #59229.

    • this led to PR #59314, which was closed for being too specific (it only targeted .read()-ing a CString, whereas the pattern could have been generalized).

    • the issue suggested another route, based on From<Vec<NonZeroU8>>, which is indeed a less general and more concise code pattern.

  • quoting @Shnatsel:

    • For me the main thing about making this safe is simplifying auditing - people have spent like an hour looking at just this one unsafe block in libflate because it's not clear what exactly is unchecked, so you have to look it up when auditing anyway. This has distracted us from much more serious memory safety issues the library had.
      Having this trivial impl in stdlib would turn this into safe code with compiler more or less guaranteeing that it's fine, and save anyone auditing the code a whole lot of time.
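A minimal usage sketch of the conversion this PR proposes, assuming a toolchain where impl From<Vec<NonZeroU8>> for CString is available (it is in later stable Rust). The helper name to_cstring_dropping_nuls is hypothetical: it builds the Vec<NonZeroU8> with NonZeroU8::new, so interior zero bytes are excluded by construction and the caller needs no unsafe at all.

```rust
use std::ffi::CString;
use std::num::NonZeroU8;

/// Hypothetical helper: keeps every non-zero byte and drops zero bytes.
/// `NonZeroU8::new` returns `None` for 0, so the resulting `Vec<NonZeroU8>`
/// cannot contain an interior nul.
fn to_cstring_dropping_nuls(bytes: &[u8]) -> CString {
    let nonzero: Vec<NonZeroU8> = bytes.iter().copied().filter_map(NonZeroU8::new).collect();
    // Safe conversion: no re-scan for nul bytes; the trailing nul is appended.
    CString::from(nonzero)
}

fn main() {
    let c = to_cstring_dropping_nuls(b"he\0llo");
    assert_eq!(c.as_bytes(), b"hello");
    assert_eq!(c.as_bytes_with_nul(), b"hello\0");
}
```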

@rust-highfive

Collaborator

rust-highfive commented Sep 1, 2019

r? @withoutboats

(rust_highfive has picked a reviewer for you, use r? to override)

src/libstd/ffi/c_str.rs (outdated, resolved review thread)
@rust-highfive

Collaborator

rust-highfive commented Sep 1, 2019

The job mingw-check of your PR failed (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

2019-09-01T17:30:24.9862540Z ##[command]git remote add origin https://github.com/rust-lang/rust
2019-09-01T17:30:25.6788628Z ##[command]git config gc.auto 0
2019-09-01T17:30:25.6791906Z ##[command]git config --get-all http.https://github.com/rust-lang/rust.extraheader
2019-09-01T17:30:25.6795190Z ##[command]git config --get-all http.proxy
2019-09-01T17:30:25.6799980Z ##[command]git -c http.extraheader="AUTHORIZATION: basic ***" fetch --force --tags --prune --progress --no-recurse-submodules --depth=2 origin +refs/heads/*:refs/remotes/origin/* +refs/pull/64069/merge:refs/remotes/pull/64069/merge
---
2019-09-01T17:36:38.0473105Z     Checking backtrace v0.3.35
2019-09-01T17:36:43.2265818Z error: impl has missing stability attribute
2019-09-01T17:36:43.2267576Z    --> src/libstd/ffi/c_str.rs:722:1
2019-09-01T17:36:43.2268155Z     |
2019-09-01T17:36:43.2268657Z 722 | / impl From<Vec<NonZeroU8>> for CString {
2019-09-01T17:36:43.2269186Z 723 | |     /// Converts a [`Vec`]`<`[`NonZeroU8`]`>` into a [`CString`] without
2019-09-01T17:36:43.2269717Z 724 | |     /// copying nor checking for inner null bytes.
2019-09-01T17:36:43.2270910Z ...   |
2019-09-01T17:36:43.2271338Z 793 | |     }
2019-09-01T17:36:43.2271800Z 794 | | }
2019-09-01T17:36:43.2272242Z     | |_^
---
2019-09-01T17:36:43.4492110Z == clock drift check ==
2019-09-01T17:36:43.4512422Z   local time: Sun Sep  1 17:36:43 UTC 2019
2019-09-01T17:36:43.6012744Z   network time: Sun, 01 Sep 2019 17:36:43 GMT
2019-09-01T17:36:43.6017002Z == end clock drift check ==
2019-09-01T17:36:50.5865241Z ##[error]Bash exited with code '1'.
2019-09-01T17:36:50.5899521Z ##[section]Starting: Checkout
2019-09-01T17:36:50.5902055Z ==============================================================================
2019-09-01T17:36:50.5902124Z Task         : Get sources
2019-09-01T17:36:50.5902167Z Description  : Get sources from a repository. Supports Git, TfsVC, and SVN repositories.

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

@scottmcm scottmcm added the needs-fcp label Sep 2, 2019
@Centril Centril added this to the 1.39 milestone Sep 3, 2019
@clarfon

Contributor

clarfon commented Sep 3, 2019

Perhaps there could be a method to return the nonzero bytes as a slice as well?

@Shnatsel

Shnatsel commented Sep 3, 2019

Perhaps there could be a method to return the nonzero bytes as a slice as well?

Sounds reasonable to me. That should be its own PR though, as that can be feature-gated for some time while trait implementations are instantly stable.
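The slice accessor suggested here could look roughly like the sketch below. nonzero_bytes is a hypothetical free function, not a std API; it relies on NonZeroU8 having the same layout as u8 and on a CStr having no interior nul, so every byte of to_bytes() is non-zero.

```rust
use std::ffi::{CStr, CString};
use std::num::NonZeroU8;

/// Hypothetical accessor (not a std API): views the bytes of a `CStr`,
/// excluding the trailing nul, as `&[NonZeroU8]`.
fn nonzero_bytes(s: &CStr) -> &[NonZeroU8] {
    let bytes: &[u8] = s.to_bytes(); // excludes the nul terminator
    // SAFETY:
    // - `NonZeroU8` has the same size and alignment as `u8`;
    // - a `CStr` has no interior nul, so every byte in `bytes` is non-zero;
    // - the returned slice borrows from `s`, so it cannot outlive the data.
    unsafe { std::slice::from_raw_parts(bytes.as_ptr().cast::<NonZeroU8>(), bytes.len()) }
}

fn main() {
    let owned = CString::new("abc").unwrap();
    let nz = nonzero_bytes(&owned);
    assert_eq!(nz.len(), 3);
    assert_eq!(nz[0].get(), b'a');
    // Round trip through the `From<Vec<NonZeroU8>>` impl discussed in this PR.
    assert_eq!(CString::from(nz.to_vec()), owned);
}
```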

@Shnatsel Shnatsel mentioned this pull request Sep 3, 2019
@JohnCSimon

Member

JohnCSimon commented Sep 14, 2019

Ping from triage
@Centril @withoutboats This PR is still waiting on review.
CC @danielhenrymantilla

Thanks.

Member

Centril left a comment

Mostly just some nits...

/// - for any `cap: usize`, `Layout<[T; cap]>` needs to be equal to
/// `Layout<[U; cap]>` (for the allocator)
#[inline]
unsafe

@Centril

Centril Sep 14, 2019

Member

what's with the line break here?

@danielhenrymantilla

danielhenrymantilla Sep 14, 2019

Author Contributor

Whoops, my own Rust style leaked in here (off topic: I see pub and unsafe, among others, as markers, so I put them on their own line, in the same vein as #[meta] attributes).

#[inline]
unsafe
fn transmute_vec<T, U>(v: Vec<T>) -> Vec<U> {
// necessary conditions for `Layout<[T; N]> == Layout<[U; N]>`

@Centril

Centril Sep 14, 2019

Member

Why not check Layout::array::<T / U>().unwrap() against each other?
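A sketch of the suggested check, assuming a std where Layout::array is stable (it is in later releases). Layout::array::<T>(n) takes the element count, so this comparison needs the runtime capacity. The helper array_layouts_match is hypothetical, and it returns false on overflow instead of calling .unwrap().

```rust
use std::alloc::Layout;
use std::num::NonZeroU8;

/// Hypothetical helper: compares the layouts of `[T; cap]` and `[U; cap]`
/// for a capacity known only at runtime. `Layout::array` returns `Err` only
/// when the total size would overflow.
fn array_layouts_match<T, U>(cap: usize) -> bool {
    match (Layout::array::<T>(cap), Layout::array::<U>(cap)) {
        (Ok(a), Ok(b)) => a == b,
        _ => false,
    }
}

fn main() {
    assert!(array_layouts_match::<NonZeroU8, u8>(16));
    // Different element size (and alignment), so the layouts differ.
    assert!(!array_layouts_match::<u16, u8>(16));
}
```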

/// `[T; length]` and `[U; length]`.
///
/// - for any `cap: usize`, `Layout<[T; cap]>` needs to be equal to
/// `Layout<[U; cap]>` (for the allocator)

@Centril

Centril Sep 14, 2019

Member
Suggested change
/// `Layout<[U; cap]>` (for the allocator)
/// `Layout<[U; cap]>` (for the allocator).
src/libstd/ffi/c_str.rs (six more outdated, resolved review threads)
@Centril

Member

Centril commented Sep 14, 2019

r? @RalfJung for checking the safety argument more in-depth. Also cc @rkruppe.

(Do ping T-libs once that review is done...)

Member

rkruppe left a comment

Soundness argument seems fine to me, just some nits.

//
// - `v` cannot contain null bytes, given the type-level
// invariant of `NonZeroU8` (this would still apply even if
// this invariant was a safety invariant and not a validity

@rkruppe

rkruppe Sep 14, 2019

Member

Nit: strike the hypothetical. I am reasonably sure you do need the safety invariant here. Most operations on vectors don't assert validity of any elements except perhaps the ones they move into or out of the vec. Of course, a Vec<T> is only safe if elements 0..len are safe at T, so the soundness argument can rest on that plus the safety/validity (same thing) of the NonZeroU8.

(Insert standard disclaimer about all of these details being not yet ratified here.)

@danielhenrymantilla

danielhenrymantilla Sep 14, 2019

Author Contributor

You're right, I will remove the parenthetical then (I'm still glad I wrote it; it is good to think these things through clearly before merging).

Vec::<U>::from_raw_parts(ptr as *mut U, length, capacity)
}

let v = unsafe {

@rkruppe

rkruppe Sep 14, 2019

Member

A type annotation won't hurt here IMO.

@danielhenrymantilla

danielhenrymantilla Sep 14, 2019

Author Contributor

Agreed

crate::alloc::Layout::new::<U>(),
);
// The previous assert should imply the following one:
debug_assert_eq!(

@danielhenrymantilla

danielhenrymantilla Sep 14, 2019

Author Contributor

The first check is a plain assert! because it should be easy to optimize out (it only involves const, compile-time information).
On the other hand, the check involving Layout::array cannot be const, since it depends on the runtime value of the vector's capacity.
That being said, I expect Layout<T> == Layout<U> to imply that for all N: usize, Layout<[T; N]> == Layout<[U; N]>.
If that were not guaranteed by the language, this debug assertion could be upgraded to a normal assert!; here, however, the function is only monomorphized with T = NonZeroU8 and U = u8, which does not require such a check at runtime.
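A minimal sketch of a transmute_vec helper in the spirit of the one discussed in this thread (not the PR's exact code): the only check is the compile-time-known Layout::new comparison, and the Vec is wrapped in ManuallyDrop before its raw parts are read, so its buffer is neither freed twice nor aliased by a later move.

```rust
use std::alloc::Layout;
use std::mem::ManuallyDrop;
use std::num::NonZeroU8;

/// Sketch only, not the PR's code.
///
/// # Safety
/// Every value of type `T` must also be a valid value of type `U`.
unsafe fn transmute_vec<T, U>(v: Vec<T>) -> Vec<U> {
    // Only compile-time information: the compiler can usually fold this away.
    assert_eq!(Layout::new::<T>(), Layout::new::<U>());
    // Disarm the destructor first, then read the raw parts.
    let mut v = ManuallyDrop::new(v);
    let (ptr, len, cap) = (v.as_mut_ptr(), v.len(), v.capacity());
    // SAFETY: same element layout (asserted above), and per the caller's
    // contract the first `len` elements are valid `U`s.
    unsafe { Vec::from_raw_parts(ptr.cast::<U>(), len, cap) }
}

fn main() {
    let nz: Vec<NonZeroU8> = b"hi".iter().copied().filter_map(NonZeroU8::new).collect();
    // SAFETY: every `NonZeroU8` is a valid `u8`.
    let bytes: Vec<u8> = unsafe { transmute_vec(nz) };
    assert_eq!(bytes, b"hi");
}
```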

@RalfJung

RalfJung Sep 16, 2019

Member

I think it would be a serious bug in Layout::array if this is not given... its comparison test would fail to take something into account, or so.

Not sure if this check is worth the clutter it introduces.

@danielhenrymantilla

danielhenrymantilla Sep 16, 2019

Author Contributor

Great! I will remove the debug_assert! involving the Layout::array check

@danielhenrymantilla danielhenrymantilla force-pushed the danielhenrymantilla:feature/cstring_from_vec_of_nonzerou8 branch from af2b287 to 755fe39 Sep 14, 2019
danielhenrymantilla and others added 2 commits Sep 14, 2019
Thanks to @Centril's code review

Co-Authored-By: Mazdak Farrokhzad <twingoow@gmail.com>
@danielhenrymantilla danielhenrymantilla force-pushed the danielhenrymantilla:feature/cstring_from_vec_of_nonzerou8 branch from 755fe39 to b32aec7 Sep 14, 2019
@SimonSapin

Contributor

SimonSapin commented Sep 17, 2019

I’m not disputing that the conversion is correct, but that it is useful. The code in #59229 that would use this conversion doesn’t look better than the original to me.

(To answer my own question: the way to construct Vec<NonZeroU8> in #59229 is one byte at a time with NonZeroU8::new.)

@rkruppe

Member

rkruppe commented Sep 17, 2019

The advantage of using this API over the code in #59229 is that the unsafe is isolated in libstd. So it protects against mistakes such as adding the 0 terminator to the Vec<u8> before calling CString::from_vec_unchecked (happened in #59229 (comment)) and simplifies auditing for memory safety. The latter point was already made in the PR description with this quote from @Shnatsel:

For me the main thing about making this safe is simplifying auditing - people have spent like an hour looking at just this one unsafe block in libflate because it's not clear what exactly is unchecked, so you have to look it up when auditing anyway. This has distracted us from much more serious memory safety issues the library had.
Having this trivial impl in stdlib would turn this into safe code with compiler more or less guaranteeing that it's fine, and save anyone auditing the code a whole lot of time.

I'm in no position to argue whether it is useful for other crates, but the libflate snippet in question does benefit from this API addition IMO.

@Shnatsel

Shnatsel commented Sep 17, 2019

The primary use case motivating this is constructing a CString from a reader. There is currently no safe way to do that without scanning the CString for null bytes twice.

A bespoke unsafe implementation works, but without an std abstraction people need to reinvent it every time, and get it right. Even if done right, presence of a bespoke unsafe block increases the cost of auditing safety of the crate.

This API addition isolates the unsafety inside std, so it can be implemented correctly once and for all.
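A sketch of the reader use case with the new impl, assuming the string ends at the first nul byte or at EOF. read_cstring is a hypothetical helper, not the code from #59229: each byte is checked exactly once, by NonZeroU8::new, and the final conversion needs neither unsafe nor a second scan (which CString::new would perform).

```rust
use std::ffi::CString;
use std::io::{self, Read};
use std::num::NonZeroU8;

/// Hypothetical helper: reads bytes up to (and consuming) the first nul
/// byte, or up to EOF, and returns them as a `CString`.
fn read_cstring<R: Read>(reader: R) -> io::Result<CString> {
    let mut buf: Vec<NonZeroU8> = Vec::new();
    for byte in reader.bytes() {
        match NonZeroU8::new(byte?) {
            Some(b) => buf.push(b),
            None => break, // nul terminator reached
        }
    }
    Ok(CString::from(buf))
}

fn main() -> io::Result<()> {
    let mut input: &[u8] = b"name\0rest";
    let s = read_cstring(&mut input)?;
    assert_eq!(s.as_bytes(), b"name");
    // The terminator was consumed; the remaining bytes were not.
    assert_eq!(input, b"rest");
    Ok(())
}
```

Since bytes() issues one read call per byte, an unbuffered reader (a File, a socket) would normally be wrapped in std::io::BufReader first.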

@Centril Centril modified the milestones: 1.39, 1.40 Sep 26, 2019
@JohnTitor

Member

JohnTitor commented Oct 6, 2019

Marked as waiting-on-team because this is waiting on T-libs approval.

@Centril Centril modified the milestones: 1.40, 1.41 Nov 7, 2019
@Centril

Member

Centril commented Nov 7, 2019

@rust-highfive rust-highfive assigned sfackler and unassigned rkruppe Nov 7, 2019
src/libstd/ffi/c_str.rs (outdated, resolved review thread)
@JohnTitor

Member

JohnTitor commented Nov 17, 2019

Ping from triage: @sfackler and @rust-lang/libs could you review this PR?

@sfackler

Member

sfackler commented Nov 24, 2019

@rfcbot fcp merge

@rfcbot


rfcbot commented Nov 24, 2019

Team member @sfackler has proposed to merge this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@dtolnay

Member

dtolnay commented Nov 24, 2019

I am on board with adding the impl, but the implementation here looks nuts to me. I am curious whether this is the way we expect unsafe code in the standard library to be written? Is this the way we recommend unsafe code outside the standard library to be written?

I would much rather see something like:

fn from(mut vec_nonzero: Vec<NonZeroU8>) -> CString {
    // Safety: ...

    let ptr: *mut NonZeroU8 = vec_nonzero.as_mut_ptr();
    let len = vec_nonzero.len();
    let cap = vec_nonzero.capacity();
    mem::forget(vec_nonzero);

    // or, now that we have #65816, replace the four lines above with:
    // let (ptr, len, cap) = vec_nonzero.into_raw_parts();

    unsafe {
        let vec_u8 = Vec::from_raw_parts(ptr as *mut u8, len, cap);
        CString::from_vec_unchecked(vec_u8)
    }
}

where the code fits easily on one screen and the safety reasoning is explained all at once.

@sfackler

Member

sfackler commented Nov 24, 2019

Yeah, this should get cleaned up.

@SimonSapin

Contributor

SimonSapin commented Nov 24, 2019

Although it landed after this PR was opened, the code can now use let (ptr, len, cap) = v.into_raw_parts(); (#65816).

@danielhenrymantilla

Contributor Author

danielhenrymantilla commented Nov 24, 2019

This PR and its very verbose code are partly what triggered the discussions about transmuting Vecs and the addition of Vec::into_raw_parts (and, hopefully, Vec::transmute or a more general variant).

The main thing bloating the code is thus the "soundly transmute a Vec" thing, which was factored into its own function as an example.

Now, with Vec::into_raw_parts, the implementation can be simplified, which I will happily do.

I disagree, however, with merging both unsafe blocks into one: the more fine-grained the unsafe blocks are, the more resilient the code will be to future changes.

In this case, for instance, there are two unsafe things going on:

  • transmuting Vec<NonZeroU8> into a Vec<u8> (valid thanks to transmute<NonZeroU8, u8> being always sound AND thanks to Layout<NonZeroU8> == Layout<u8>),

  • and then using the unchecked constructor of CString.


As an example of how delicate this can be, @dtolnay's example without into_raw_parts could be considered to violate Rust's aliasing rules (mem::forget(vec_nonzero) moves ownership of the Vec into a function, which asserts that the pointee is unaliased at that point and thus invalidates ptr).

@rust-highfive

Collaborator

rust-highfive commented Nov 24, 2019

The job x86_64-gnu-llvm-6.0 of your PR failed (pretty log, raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

2019-11-24T10:50:46.0052387Z ##[command]git remote add origin https://github.com/rust-lang/rust
2019-11-24T10:50:46.0174410Z ##[command]git config gc.auto 0
2019-11-24T10:50:46.0176899Z ##[command]git config --get-all http.https://github.com/rust-lang/rust.extraheader
2019-11-24T10:50:46.0178931Z ##[command]git config --get-all http.proxy
2019-11-24T10:50:47.0018209Z ##[command]git -c http.extraheader="AUTHORIZATION: basic ***" fetch --force --tags --prune --progress --no-recurse-submodules --depth=2 origin +refs/heads/*:refs/remotes/origin/* +refs/pull/64069/merge:refs/remotes/pull/64069/merge
---
2019-11-24T10:57:09.9272732Z    Compiling panic_abort v0.0.0 (/checkout/src/libpanic_abort)
2019-11-24T10:57:10.1196925Z    Compiling backtrace v0.3.40
2019-11-24T10:57:11.0329609Z    Compiling rustc-std-workspace-alloc v1.99.0 (/checkout/src/tools/rustc-std-workspace-alloc)
2019-11-24T10:57:11.1416434Z    Compiling panic_unwind v0.0.0 (/checkout/src/libpanic_unwind)
2019-11-24T10:57:13.7666936Z error[E0658]: use of unstable library feature 'vec_into_raw_parts': new API
2019-11-24T10:57:13.7667635Z     |
2019-11-24T10:57:13.7667635Z     |
2019-11-24T10:57:13.7668012Z 764 |             let (ptr, len, cap): (*mut NonZeroU8, _, _) = Vec::into_raw_parts(v);
2019-11-24T10:57:13.7668689Z     |
2019-11-24T10:57:13.7668689Z     |
2019-11-24T10:57:13.7669149Z     = note: for more information, see ***/issues/65816
2019-11-24T10:57:13.7669541Z     = help: add `#![feature(vec_into_raw_parts)]` to the crate attributes to enable
2019-11-24T10:57:14.8472199Z error: aborting due to previous error
2019-11-24T10:57:14.8472317Z 
2019-11-24T10:57:14.8472712Z For more information about this error, try `rustc --explain E0658`.
2019-11-24T10:57:14.8852457Z error: could not compile `std`.
---
2019-11-24T10:57:14.8943158Z   local time: Sun Nov 24 10:57:14 UTC 2019
2019-11-24T10:57:15.0395919Z   network time: Sun, 24 Nov 2019 10:57:15 GMT
2019-11-24T10:57:15.0397650Z == end clock drift check ==
2019-11-24T10:57:17.9402685Z 
2019-11-24T10:57:17.9505391Z ##[error]Bash exited with code '1'.
2019-11-24T10:57:17.9534314Z ##[section]Starting: Checkout
2019-11-24T10:57:17.9536204Z ==============================================================================
2019-11-24T10:57:17.9536270Z Task         : Get sources
2019-11-24T10:57:17.9536324Z Description  : Get sources from a repository. Supports Git, TfsVC, and SVN repositories.


@SimonSapin

Contributor

SimonSapin commented Nov 24, 2019

there are two unsafe things going on

That’s a good reason to write the explanatory comment in two parts. I feel it’s not useful to write two consecutive unsafe {…} blocks.
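A minimal sketch of that shape: one unsafe block whose SAFETY comment is split into the two arguments discussed above. It uses ManuallyDrop instead of Vec::into_raw_parts, which is feature-gated (vec_into_raw_parts); the function name cstring_from_nonzero is hypothetical and this is not the code that was merged.

```rust
use std::ffi::CString;
use std::mem::ManuallyDrop;
use std::num::NonZeroU8;

/// Hypothetical stand-in for the `From` impl body.
fn cstring_from_nonzero(v: Vec<NonZeroU8>) -> CString {
    // Wrap first, so `v` is never moved or dropped after `ptr` is obtained.
    let mut v = ManuallyDrop::new(v);
    let (ptr, len, cap) = (v.as_mut_ptr(), v.len(), v.capacity());
    unsafe {
        // SAFETY (1): `NonZeroU8` and `u8` have the same layout and every
        // `NonZeroU8` is a valid `u8`, so the buffer can become a `Vec<u8>`.
        let bytes: Vec<u8> = Vec::from_raw_parts(ptr.cast::<u8>(), len, cap);
        // SAFETY (2): no element of a `Vec<NonZeroU8>` is zero, so `bytes`
        // has no interior nul; `from_vec_unchecked` appends the terminator.
        CString::from_vec_unchecked(bytes)
    }
}

fn main() {
    let v: Vec<NonZeroU8> = b"ok".iter().copied().filter_map(NonZeroU8::new).collect();
    let c = cstring_from_nonzero(v);
    assert_eq!(c.as_bytes_with_nul(), b"ok\0");
}
```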

danielhenrymantilla and others added 2 commits Nov 7, 2019
Co-Authored-By: Mazdak Farrokhzad <twingoow@gmail.com>
Use `Vec::into_raw_parts` instead of a manual implementation of
`Vec::transmute`.

If `Vec::into_raw_parts` uses `NonNull` instead, then the code here
will need to be adjusted to take it into account (issue #65816)
@danielhenrymantilla danielhenrymantilla force-pushed the danielhenrymantilla:feature/cstring_from_vec_of_nonzerou8 branch from 7a373cf to c7ac2e7 Nov 24, 2019