
Added From<Vec<NonZeroU8>> for CString #64069


@danielhenrymantilla
Contributor

commented Sep 1, 2019

Added a From<Vec<NonZeroU8>> impl for CString

Rationale

  • CString::from_vec_unchecked is a subtle function that makes unsafe code harder to audit when the construction of the Vec passed to it is non-trivial. This impl makes it possible to write such code more safely, thanks to the very explicit semantics of the Vec<NonZeroU8> type.

  • One such situation is when trying to .read() a CString, see issue #59229.

    • this led to a PR, #59314, which was closed for being too specific / narrow (it only targeted being able to .read() a CString, when this pattern could have been generalized).

    • the issue suggested another route, based on From<Vec<NonZeroU8>>, which is indeed a more general and more concise code pattern.

  • quoting @Shnatsel:

    • For me the main thing about making this safe is simplifying auditing - people have spent like an hour looking at just this one unsafe block in libflate because it's not clear what exactly is unchecked, so you have to look it up when auditing anyway. This has distracted us from much more serious memory safety issues the library had.
      Having this trivial impl in stdlib would turn this into safe code with compiler more or less guaranteeing that it's fine, and save anyone auditing the code a whole lot of time.
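A minimal sketch of the pattern this rationale argues for, assuming a toolchain where the From<Vec<NonZeroU8>> for CString impl is available (it was eventually added to std). The helper name cstring_until_nul is hypothetical: it keeps the bytes up to the first NUL and converts them without any unsafe block.

```rust
use std::ffi::CString;
use std::num::NonZeroU8;

/// Hypothetical helper: builds a `CString` from the bytes before the first
/// NUL. Every element collected is a `NonZeroU8`, so the type system, not
/// an `unsafe` block, rules out interior NUL bytes.
fn cstring_until_nul(bytes: &[u8]) -> CString {
    let nonzero: Vec<NonZeroU8> = bytes
        .iter()
        .map_while(|&b| NonZeroU8::new(b)) // stops at the first 0 byte
        .collect();
    // The `From<Vec<NonZeroU8>>` impl appends the trailing NUL itself.
    CString::from(nonzero)
}
```

The auditing argument is visible here: the only thing a reviewer has to check is the element type of the Vec.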

@rust-highfive

Collaborator

commented Sep 1, 2019

r? @withoutboats

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive

Collaborator

commented Sep 1, 2019

The job mingw-check of your PR failed (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

2019-09-01T17:30:24.9862540Z ##[command]git remote add origin https://github.com/rust-lang/rust
2019-09-01T17:30:25.6788628Z ##[command]git config gc.auto 0
2019-09-01T17:30:25.6791906Z ##[command]git config --get-all http.https://github.com/rust-lang/rust.extraheader
2019-09-01T17:30:25.6795190Z ##[command]git config --get-all http.proxy
2019-09-01T17:30:25.6799980Z ##[command]git -c http.extraheader="AUTHORIZATION: basic ***" fetch --force --tags --prune --progress --no-recurse-submodules --depth=2 origin +refs/heads/*:refs/remotes/origin/* +refs/pull/64069/merge:refs/remotes/pull/64069/merge
---
2019-09-01T17:36:38.0473105Z     Checking backtrace v0.3.35
2019-09-01T17:36:43.2265818Z error: impl has missing stability attribute
2019-09-01T17:36:43.2267576Z    --> src/libstd/ffi/c_str.rs:722:1
2019-09-01T17:36:43.2268155Z     |
2019-09-01T17:36:43.2268657Z 722 | / impl From<Vec<NonZeroU8>> for CString {
2019-09-01T17:36:43.2269186Z 723 | |     /// Converts a [`Vec`]`<`[`NonZeroU8`]`>` into a [`CString`] without
2019-09-01T17:36:43.2269717Z 724 | |     /// copying nor checking for inner null bytes.
2019-09-01T17:36:43.2270910Z ...   |
2019-09-01T17:36:43.2271338Z 793 | |     }
2019-09-01T17:36:43.2271800Z 794 | | }
2019-09-01T17:36:43.2272242Z     | |_^
---
2019-09-01T17:36:43.4492110Z == clock drift check ==
2019-09-01T17:36:43.4512422Z   local time: Sun Sep  1 17:36:43 UTC 2019
2019-09-01T17:36:43.6012744Z   network time: Sun, 01 Sep 2019 17:36:43 GMT
2019-09-01T17:36:43.6017002Z == end clock drift check ==
2019-09-01T17:36:50.5865241Z ##[error]Bash exited with code '1'.
2019-09-01T17:36:50.5899521Z ##[section]Starting: Checkout
2019-09-01T17:36:50.5902055Z ==============================================================================
2019-09-01T17:36:50.5902124Z Task         : Get sources
2019-09-01T17:36:50.5902167Z Description  : Get sources from a repository. Supports Git, TfsVC, and SVN repositories.

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

@scottmcm scottmcm added the needs-fcp label Sep 2, 2019

@Centril Centril added this to the 1.39 milestone Sep 3, 2019

@clarfon

Contributor

commented Sep 3, 2019

Perhaps there could be a method to return the nonzero bytes as a slice as well?

@Shnatsel


commented Sep 3, 2019

Perhaps there could be a method to return the nonzero bytes as a slice as well?

Sounds reasonable to me. That should be its own PR though, as that can be feature-gated for some time while trait implementations are instantly stable.
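As a rough idea of what such a method could do, here is a hypothetical free function (not an existing std API) that views the bytes of a CStr, without the terminator, as &[NonZeroU8]. It relies on NonZeroU8 having the same layout as u8 and on a CStr never containing interior NUL bytes:

```rust
use std::ffi::CStr;
use std::num::NonZeroU8;

/// Hypothetical helper (not part of std): the bytes of `s`, excluding the
/// trailing NUL, viewed as `NonZeroU8`s.
fn as_nonzero_bytes(s: &CStr) -> &[NonZeroU8] {
    let bytes: &[u8] = s.to_bytes(); // excludes the terminator
    // SAFETY: `NonZeroU8` has the same layout as `u8`, and a `CStr` has no
    // interior NUL, so every byte in `bytes` is a valid `NonZeroU8`. The
    // returned slice borrows from `s`, so it cannot outlive the data.
    unsafe { std::slice::from_raw_parts(bytes.as_ptr().cast::<NonZeroU8>(), bytes.len()) }
}
```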

@Shnatsel Shnatsel referenced this pull request Sep 3, 2019
@JohnCSimon


commented Sep 14, 2019

Ping from triage
@Centril @withoutboats This PR is still waiting on review.
CC @danielhenrymantilla

Thanks.

@Centril
Member

left a comment

Mostly just some nits...

/// - for any `cap: usize`, `Layout<[T; cap]>` needs to be equal to
/// `Layout<[U; cap]>` (for the allocator)
#[inline]
unsafe

@Centril

Centril Sep 14, 2019

Member

what's with the line break here?

@danielhenrymantilla

danielhenrymantilla Sep 14, 2019

Author Contributor

Whoops, my own Rust style leaked in here. (Off topic: I see pub and unsafe, among others, as markers, hence putting them on their own line rather than on the same line as the fn, in the same vein as other #[meta] attributes.)

#[inline]
unsafe
fn transmute_vec<T, U>(v: Vec<T>) -> Vec<U> {
// necessary conditions for `Layout<[T; N]> == Layout<[U; N]>`

@Centril

Centril Sep 14, 2019

Member

Why not check Layout::array::<T / U>().unwrap() against each other?
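Layout::array takes the element count as an argument, so the comparison suggested here has to be made for a given capacity. A minimal sketch, with same_array_layout as a hypothetical helper name:

```rust
use std::alloc::Layout;

/// Hypothetical helper: true if `[T; cap]` and `[U; cap]` have the same
/// size and alignment, which is what the allocator relies on when a buffer
/// allocated for one element type is later freed as the other.
fn same_array_layout<T, U>(cap: usize) -> bool {
    match (Layout::array::<T>(cap), Layout::array::<U>(cap)) {
        (Ok(a), Ok(b)) => a == b,
        _ => false, // size overflow: no valid layout for this capacity
    }
}
```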

/// `[T; length]` and `[U; length]`.
///
/// - for any `cap: usize`, `Layout<[T; cap]>` needs to be equal to
/// `Layout<[U; cap]>` (for the allocator)

@Centril

Centril Sep 14, 2019

Member
Suggested change
/// `Layout<[U; cap]>` (for the allocator)
/// `Layout<[U; cap]>` (for the allocator).
@Centril

Member

commented Sep 14, 2019

r? @RalfJung for checking the safety argument more in-depth. Also cc @rkruppe.

(Do ping T-libs once that review is done...)

@rkruppe
Member

left a comment

Soundness argument seems fine to me, just some nits.

//
// - `v` cannot contain null bytes, given the type-level
// invariant of `NonZeroU8` (this would still apply even if
// this invariant was a safety invariant and not a validity

@rkruppe

rkruppe Sep 14, 2019

Member

Nit: strike the hypothetical. I am reasonably sure you do need the safety invariant here. Most operations on vectors don't assert validity of any elements except perhaps the ones they move into or out of the vec. Of course, a Vec<T> is only safe if elements 0..len are safe at T, so the soundness argument can rest on that plus the safety/validity (same thing) of the NonZeroU8.

(Insert standard disclaimer about all of these details being not yet ratified here.)

@danielhenrymantilla

danielhenrymantilla Sep 14, 2019

Author Contributor

You're right, I will remove the part in parentheses then (I'm still glad I wrote it; it is good to think these things through clearly before merging).

Vec::<U>::from_raw_parts(ptr as *mut U, length, capacity)
}

let v = unsafe {

@rkruppe

rkruppe Sep 14, 2019

Member

A type annotation won't hurt here IMO.

@danielhenrymantilla

danielhenrymantilla Sep 14, 2019

Author Contributor

Agreed

crate::alloc::Layout::new::<U>(),
);
// The previous assert should imply the following one:
debug_assert_eq!(

@danielhenrymantilla

danielhenrymantilla Sep 14, 2019

Author Contributor

The first check is an assert since it is expected to be easily optimized out (only const / compile-time information here).
On the other hand, the check involving Layout::array cannot be const, since it involves the runtime value of the vector capacity.
That being said, I expect Layout<T> == Layout<U> to imply that for all N: usize, Layout<[T; N]> == Layout<[U; N]>.
If this were not correct / guaranteed by the language, this debug assertion could be upgraded to a normal assert!; in this case, however, the function is only used with the monomorphisation T = NonZeroU8 and U = u8, which does not require such a check at runtime.
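For illustration, here is a minimal sketch of what a transmute_vec along these lines can look like on current Rust, keeping only the Layout::new check discussed above. It is not the exact code from this PR; nonzero_to_bytes is a hypothetical wrapper for the NonZeroU8 to u8 case.

```rust
use std::alloc::Layout;
use std::mem::ManuallyDrop;
use std::num::NonZeroU8;

/// Reinterprets a `Vec<T>` as a `Vec<U>` without copying.
///
/// # Safety
/// Every element of `v` must be a valid `U`.
unsafe fn transmute_vec<T, U>(v: Vec<T>) -> Vec<U> {
    // Only type-level information: cheap, and easily optimized out.
    // Equal element layouts imply equal `[T; cap]` / `[U; cap]` layouts,
    // which is what the allocator needs when the new Vec frees the buffer.
    assert_eq!(Layout::new::<T>(), Layout::new::<U>());
    // Keep `v` from freeing its buffer; ownership moves to the new Vec.
    let mut v = ManuallyDrop::new(v);
    let (ptr, len, cap) = (v.as_mut_ptr(), v.len(), v.capacity());
    // SAFETY: same allocation and layout; the caller guarantees that the
    // `len` initialized elements are valid `U`s.
    unsafe { Vec::from_raw_parts(ptr.cast::<U>(), len, cap) }
}

/// Hypothetical wrapper for the only instantiation discussed here.
fn nonzero_to_bytes(v: Vec<NonZeroU8>) -> Vec<u8> {
    // SAFETY: every `NonZeroU8` is a valid `u8`.
    unsafe { transmute_vec(v) }
}
```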

@RalfJung

RalfJung Sep 16, 2019

Member

I think it would be a serious bug in Layout::array if this is not given... its comparison test would fail to take something into account, or so.

Not sure if this check is worth the clutter it introduces.

@danielhenrymantilla

danielhenrymantilla Sep 16, 2019

Author Contributor

Great! I will remove the debug_assert! involving the Layout::array check

@danielhenrymantilla danielhenrymantilla force-pushed the danielhenrymantilla:feature/cstring_from_vec_of_nonzerou8 branch from af2b287 to 755fe39 Sep 14, 2019

danielhenrymantilla and others added 2 commits Sep 14, 2019
Minor documentation improvements
Thanks to @Centril's code review

Co-Authored-By: Mazdak Farrokhzad <twingoow@gmail.com>

@danielhenrymantilla danielhenrymantilla force-pushed the danielhenrymantilla:feature/cstring_from_vec_of_nonzerou8 branch from 755fe39 to b32aec7 Sep 14, 2019

@RalfJung

Member

commented Sep 16, 2019

I'm afraid my Rust inbox is swamped, I won't be able to review this -- I can take a glance, but don't have time to do a full review.

Any takers?

(Also is there some way to tell bors "sorry, not me", if I don't know whom else to assign?)

// End-of-scope `mem::forget` but without aliasing problems.
let mut v = mem::ManuallyDrop::<Vec<T>>::new(v);
let v: &mut Vec<T> = &mut *v;
v.as_mut_ptr() // Cannot be aliased.

@RalfJung

RalfJung Sep 16, 2019

Member

ManuallyDrop implements Deref; why is the intermediate let v: &mut Vec<T> = ... needed?

@danielhenrymantilla

danielhenrymantilla Sep 16, 2019

Author Contributor

It's just me being (overly?) cautious with type inference / method resolution within unsafe code, so I wanted to make that Deref very explicit.
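To illustrate the point about Deref, a small sketch (the helper name is hypothetical) showing that the explicit reborrow and plain method-call auto-deref reach the same Vec::as_mut_ptr through ManuallyDrop's DerefMut impl:

```rust
use std::mem::ManuallyDrop;

/// Hypothetical helper: compares the pointer obtained through an explicit
/// `&mut Vec<u8>` reborrow with the one obtained by auto-deref.
fn same_ptr_both_ways(v: Vec<u8>) -> bool {
    let mut v = ManuallyDrop::new(v);
    let explicit: *mut u8 = {
        let r: &mut Vec<u8> = &mut *v; // deref target spelled out
        r.as_mut_ptr()
    };
    // Method resolution finds `Vec::as_mut_ptr` via `DerefMut` on its own.
    let implicit: *mut u8 = v.as_mut_ptr();
    let same = explicit == implicit;
    // This sketch does not hand the buffer off, so take ownership back and
    // drop the Vec normally instead of leaking it.
    drop(ManuallyDrop::into_inner(v));
    same
}
```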

@Mark-Simulacrum Mark-Simulacrum assigned rkruppe and unassigned RalfJung Sep 16, 2019

@Mark-Simulacrum

Member

commented Sep 16, 2019

Going to re-assign to @rkruppe as they appear to have done some review here but feel free to reassign to someone else if you don't feel confident reviewing this.

@rkruppe

Member

commented Sep 16, 2019

I'm fine with reviewing this. I will take another close look later for due diligence, but from what I recall this should be in good shape.

I think some @rust-lang/libs sign-off is also required since this is a new insta-stable API addition.

@Mark-Simulacrum

Member

commented Sep 16, 2019

I've nominated to get someone from T-libs to kick off an FCP merge here

@SimonSapin

Contributor

commented Sep 16, 2019

Is there any precedent for working with Vec<NonZeroU8>? How would one go about constructing a value of that type?

@rkruppe

Member

commented Sep 17, 2019

r=me pending T-libs approval

@SimonSapin I'm not aware of any existing APIs for that type specifically, so no precedent, but the motivating example from libflate in #59229 can be modified to build up a Vec<NonZeroU8> and convert it to a CString once a 0 byte is read.

@SimonSapin

Contributor

commented Sep 17, 2019

I’m not disputing that the conversion is correct, but that it is useful. The code in #59229 that would use this conversion doesn’t look better than the original to me.

(To answer my own question: the way to construct Vec<NonZeroU8> in #59229 is one byte at a time with NonZeroU8::new.)
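A minimal sketch of that byte-at-a-time construction, modeled loosely on the reader use case from #59229 rather than copied from libflate; read_cstring is a hypothetical name, and the conversion assumes the From<Vec<NonZeroU8>> impl proposed here:

```rust
use std::ffi::CString;
use std::io::{self, Read};
use std::num::NonZeroU8;

/// Hypothetical helper: reads until a NUL byte (consumed, not stored) or
/// EOF, building the `CString` without any `unsafe`.
fn read_cstring<R: Read>(reader: R) -> io::Result<CString> {
    let mut buf: Vec<NonZeroU8> = Vec::new();
    for byte in reader.bytes() {
        match NonZeroU8::new(byte?) {
            Some(nz) => buf.push(nz),
            None => break, // found the terminator
        }
    }
    Ok(CString::from(buf))
}
```

Read::bytes issues one read call per byte, so in practice the reader would normally be wrapped in a std::io::BufReader.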

@rkruppe

Member

commented Sep 17, 2019

The advantage of using this API over the code in #59229 is that the unsafe is isolated in libstd. So it protects against mistakes such as adding the 0 terminator to the Vec<u8> before calling CString::from_vec_unchecked (happened in #59229 (comment)) and simplifies auditing for memory safety. The latter point was already made in the PR description with this quote from @Shnatsel:

For me the main thing about making this safe is simplifying auditing - people have spent like an hour looking at just this one unsafe block in libflate because it's not clear what exactly is unchecked, so you have to look it up when auditing anyway. This has distracted us from much more serious memory safety issues the library had.
Having this trivial impl in stdlib would turn this into safe code with compiler more or less guaranteeing that it's fine, and save anyone auditing the code a whole lot of time.

I'm in no position to argue whether it is useful for other crates, but the libflate snippet in question does benefit from this API addition IMO.
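The mistake mentioned above (appending the 0 terminator to the Vec<u8> before calling CString::from_vec_unchecked) can be contrasted with the two safe routes in a short sketch; the helper names are hypothetical. CString::new rejects the stray terminator at runtime, while with Vec<NonZeroU8> a 0 byte simply cannot be stored:

```rust
use std::ffi::CString;
use std::num::NonZeroU8;

/// Hypothetical helper: `CString::new` scans for NUL bytes and rejects any,
/// including a terminator appended by mistake.
fn checked(bytes: Vec<u8>) -> Option<CString> {
    CString::new(bytes).ok()
}

/// Hypothetical helper: with `Vec<NonZeroU8>` the terminator cannot be
/// stored at all, since `NonZeroU8::new(0)` is `None`.
fn from_nonzero(bytes: &[u8]) -> Option<CString> {
    let v: Option<Vec<NonZeroU8>> = bytes.iter().map(|&b| NonZeroU8::new(b)).collect();
    v.map(CString::from)
}
```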

@Shnatsel


commented Sep 17, 2019

The primary use case motivating this is constructing a CString from a reader. There is currently no safe way to do that without scanning the CString for null bytes twice.

A bespoke unsafe implementation works, but without a std abstraction people have to reinvent it every time, and get it right every time. Even if done right, the presence of a bespoke unsafe block increases the cost of auditing the safety of the crate.

This API addition isolates the unsafety inside std, so it can be implemented correctly once and for all.
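To make the "scanning twice" remark concrete, here is a minimal safe baseline, assuming the reader implements BufRead (the helper name is hypothetical). read_until already locates the NUL, yet CString::new checks every byte again; that second scan can never fail here, which is the redundant work the NonZeroU8 route avoids:

```rust
use std::ffi::CString;
use std::io::{self, BufRead};

/// Hypothetical helper: the safe two-scan approach without `NonZeroU8`.
fn read_cstring_two_scans<R: BufRead>(mut reader: R) -> io::Result<CString> {
    let mut buf = Vec::new();
    // First scan: stops after the first 0 byte, which it includes in `buf`.
    reader.read_until(0, &mut buf)?;
    if buf.last() == Some(&0) {
        buf.pop();
    }
    // Second scan: `CString::new` checks every byte for a NUL again.
    CString::new(buf).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}
```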
