Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add {String,Vec}::into_raw_parts #65705

Merged
merged 3 commits into from Oct 25, 2019

Conversation

@shepmaster
Copy link
Member

shepmaster commented Oct 22, 2019

Aspects to address:

  • Create a tracking issue
@rust-highfive

This comment has been minimized.

Copy link
Collaborator

rust-highfive commented Oct 22, 2019

r? @dtolnay

(rust_highfive has picked a reviewer for you, use r? to override)

@shepmaster

This comment has been minimized.

Copy link
Member Author

shepmaster commented Oct 22, 2019

src/liballoc/string.rs Outdated Show resolved Hide resolved
src/liballoc/string.rs Outdated Show resolved Hide resolved
src/liballoc/vec.rs Outdated Show resolved Hide resolved
@shepmaster shepmaster force-pushed the shepmaster:vec-into-raw branch from d9c18de to 16ff552 Oct 22, 2019
@rust-highfive

This comment has been minimized.

Copy link
Collaborator

rust-highfive commented Oct 22, 2019

Your PR failed (pretty log, raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.
2019-10-22T18:43:01.5589883Z ##[command]git remote add origin https://github.com/rust-lang/rust
2019-10-22T18:43:01.5780780Z ##[command]git config gc.auto 0
2019-10-22T18:43:01.5848013Z ##[command]git config --get-all http.https://github.com/rust-lang/rust.extraheader
2019-10-22T18:43:01.5906218Z ##[command]git config --get-all http.proxy
2019-10-22T18:43:01.6047752Z ##[command]git -c http.extraheader="AUTHORIZATION: basic ***" fetch --force --tags --prune --progress --no-recurse-submodules --depth=2 origin +refs/heads/*:refs/remotes/origin/* +refs/pull/65705/merge:refs/remotes/pull/65705/merge

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

@shepmaster shepmaster force-pushed the shepmaster:vec-into-raw branch from 16ff552 to f6d0936 Oct 22, 2019
@shepmaster

This comment has been minimized.

Copy link
Member Author

shepmaster commented Oct 22, 2019

I believe I've addressed the first round of review comments, thanks!

@RalfJung

This comment has been minimized.

Copy link
Member

RalfJung commented Oct 22, 2019

LGTM! But let's wait for someone from @rust-lang/libs to also take a look.

@SimonSapin

This comment has been minimized.

Copy link
Contributor

SimonSapin commented Oct 22, 2019

What motivates this? Why is it needed or useful when as_mut_ptr + len + capacity + mem::forget (or maybe ManuallyDrop) already exist? What’s an example of non-trivial code that could use this? How does it affect rust-lang/rfcs#2756, if at all?

@rust-highfive

This comment has been minimized.

Copy link
Collaborator

rust-highfive commented Oct 22, 2019

Your PR failed (pretty log, raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.
2019-10-22T20:23:01.2949724Z ##[command]git remote add origin https://github.com/rust-lang/rust
2019-10-22T20:23:01.3174449Z ##[command]git config gc.auto 0
2019-10-22T20:23:01.3241444Z ##[command]git config --get-all http.https://github.com/rust-lang/rust.extraheader
2019-10-22T20:23:01.3298086Z ##[command]git config --get-all http.proxy
2019-10-22T20:23:01.3481292Z ##[command]git -c http.extraheader="AUTHORIZATION: basic ***" fetch --force --tags --prune --progress --no-recurse-submodules --depth=2 origin +refs/heads/*:refs/remotes/origin/* +refs/pull/65705/merge:refs/remotes/pull/65705/merge

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

@Lokathor

This comment has been minimized.

Copy link
Contributor

Lokathor commented Oct 22, 2019

It's strikingly easy to get vec transmutation wrong. Just using the methods you listed in the wrong order is UB, for example.

So this lets you correctly dismantle a Vec into the parts that you need to use with from_raw_parts.

@scottmcm

This comment has been minimized.

Copy link
Member

scottmcm commented Oct 23, 2019

Personally I like this -- I always have the same sort of wonders as the things that Ralf was commenting about when I do this, and getting things back in the same order as from_raw_parts is a nice push in the right direction to not accidentally put them back weirdly.

@SimonSapin

This comment has been minimized.

Copy link
Contributor

SimonSapin commented Oct 23, 2019

Just using the methods you listed in the wrong order is UB, for example.

Could you provide code examples that are UB with an explanation why?

This is the same question as in rust-lang/rfcs#2756 (comment) about Vec::transmute, which as far as I know hasn’t been addressed:

Though the code is not too large, there are a good number of things to get wrong – this solution was iteratively created by soundness-knowledgeable Rustaceans, with multiple wrong attempts.

Could you include text in the RFC that documents this progression? In particular, why is the naive approach unsound, and why is the final result correct?


I’m not entirely opposed to having this method. But it’s not obvious what problem(s) it solves, so I think it’s important to have those problems documented in details (ideally in the method’s doc-comment), so that that knowledge is not just in the collective mind of people who happen to have participated in a previous discussion.

@Lokathor

This comment has been minimized.

Copy link
Contributor

Lokathor commented Oct 23, 2019

I... defer to @RalfJung to answer that one >_>

I do not remember the details well enough to be confident in saying at all.

@RalfJung

This comment has been minimized.

Copy link
Member

RalfJung commented Oct 23, 2019

@SimonSapin See rust-lang/rust-clippy#4484 and #64073 for some context. The only blessed way to "transmute" a Vec is to go via its raw parts. With the current API that is clumsy and, under stricter aliasing rules, easy to get wrong (see #62610 for an incorrect use of mem::forget).

The argument "why is this useful when you can already do this right now by calling 5 other functions in a row" would kill the vast majority of recent libstd additions...

Vec::into_raw_parts is the natural counterpart to Box::into_raw. It closes an expressiveness gap in Vec's API when compared with Box. After all, we also don't ask users to do a raw-ptr-cast followed by forgetting the Box.

@SimonSapin

This comment has been minimized.

Copy link
Contributor

SimonSapin commented Oct 23, 2019

and, under stricter aliasing rules, easy to get wrong

It’s still not clear to me how exactly things can go wrong or what part would be UB.

Again I’m not arguing against adding this method, only for proper documentation (not just in GitHub threads) of all those issues.

@RalfJung

This comment has been minimized.

Copy link
Member

RalfJung commented Oct 23, 2019

It’s still not clear to me how exactly things can go wrong or what part would be UB.

See #62553 (comment) and #65705 (comment).

@shepmaster

This comment has been minimized.

Copy link
Member Author

shepmaster commented Oct 23, 2019

What’s an example of non-trivial code that could use this

The original case started in Deserialize a Vec<Foobar> as Vec directly when Foobar has exactly one field. There, I effectively did:

struct FoobarContainer<T> {
    values: Vec<T>,
}

#[repr(transparent)]
struct Foobar<T> {
    value: T,
}

let input: Vec<Foobar<T>> = unimplemented!();
let output: Vec<T> = mem::transmute(input);

Only by a whim of chance did I think to ask if that was safe and learned it was not (as Vec is repr(Rust)). The safe way of performing such a transmute is by decomposing the parts, casting the pointer, and recomposing them.

That currently looks like:

let mut v: Vec<Foobar<T>> = unimplemented!();

let values = unsafe {
    let data = v.as_mut_ptr() as *mut T;
    let len = v.len();
    let cap = v.capacity();

    std::mem::forget(v);

    Vec::from_raw_parts(data, len, cap)
};

With this new feature, it can look like

let v: Vec<Foobar<T>> = unimplemented!();

let values = unsafe {
    let (data, len, cap) = v.into_raw_parts();
    Vec::from_raw_parts(data as *mut T, len, cap)
};

That specific example is intended to be expressed in the example for Vec::into_raw_parts, using a simpler transmute.

when as_mut_ptr + len + capacity + mem::forget (or maybe ManuallyDrop) already exist

I agree with @RalfJung's points above

  • calling the right methods in the correct order
  • making sure we correctly use mutability and ManuallyDrop (see the two commits besides the one that adds into_raw_parts — the official docs were showing incorrect usage)

How does it affect rust-lang/rfcs#2756, if at all?

They are certainly motivated by similar concerns (I think the RFC may have even been opened due to the Clippy issue). I believe that there's some amount of benefit to both. Vec::transmute would address my original case here, but into_raw_parts would be nice to use for FFI concerns.

If I understand it correctly, Vec::transmute could be implemented in terms of into_raw_parts:

unsafe fn transmute<U>(self) -> Vec<U> {
    let (ptr, len, cap) = self.into_raw_parts();
    let ptr = mem::transmute::<*mut T, *mut U>(ptr);
    Vec::from_raw_parts(ptr, len, cap)
}

Then the original example would be

let mut v: Vec<Foobar<T>> = unimplemented!();
let values: Vec<T> = unsafe { v.transmute() };

Thanks for trying to ensure high-quality for standard library functions!

@SimonSapin

This comment has been minimized.

Copy link
Contributor

SimonSapin commented Oct 25, 2019

Alright. I’d like to accept this PR, with a tracking issue and some docs improvements:

  • Describe in prose (not just through examples) the components of the returned tuple. Two of them are usize but they are not interchangeable. This is a case where I wish the language had structural records so we could name those components, but oh well.
  • Explain that ownership of the allocation is transferred to the caller. Maybe some phrasing can be taken from docs for Box::into_raw.
  • Specify that (until #65801 is done), from_raw_parts is the only valid way to deallocate or reallocate.

under stricter aliasing rules, easy to get wrong (see #62610 for an incorrect use of mem::forget).

I feel that something about this should maybe be part of the docs, but I’m not sure how to phrase it or if it really belongs there. @RalfJung what do you think?

@Centril

This comment has been minimized.

Copy link
Member

Centril commented Oct 25, 2019

  • This is a case where I wish the language had structural records so we could name those components, but oh well.

cc rust-lang/rfcs#2584.

@shepmaster

This comment has been minimized.

Copy link
Member Author

shepmaster commented Oct 25, 2019

Is it worth adding an example of using the raw parts of the String? My first thought was a case-conversion thingie:

    let ascii_case_difference = b'a' - b'A';
    for idx in 0..len {
        unsafe {
            *ptr.add(idx) -= ascii_case_difference;
        }
    }

But there's make_ascii_uppercase, so that's a bit silly of an example. Maybe just something that bumps letters by 1?

@shepmaster shepmaster force-pushed the shepmaster:vec-into-raw branch from f6d0936 to 6600cf6 Oct 25, 2019
@shepmaster

This comment has been minimized.

Copy link
Member Author

shepmaster commented Oct 25, 2019

Should these functions be associated functions like Box::into_raw?

@Lokathor

This comment has been minimized.

Copy link
Contributor

Lokathor commented Oct 25, 2019

Box does that to avoid deref conflicts, I don't think it's necessary here.

@SimonSapin

This comment has been minimized.

Copy link
Contributor

SimonSapin commented Oct 25, 2019

Right, if Foo happens to have a into_raw method then it can also be called on Box<Foo> through auto-deref and we don’t want to shadow it.

Vec<Foo> however derefs to [Foo] which only has a limited set of methods (unless a trait is in scope that adds them, but this is enough of a stretch that we’re less worried about that), non of which is called into_raw_parts.

@shepmaster shepmaster mentioned this pull request Oct 25, 2019
0 of 2 tasks complete
@SimonSapin

This comment has been minimized.

Copy link
Contributor

SimonSapin commented Oct 25, 2019

Looks good, thanks!

@bors r+

@bors

This comment has been minimized.

Copy link
Contributor

bors commented Oct 25, 2019

📌 Commit 6600cf6 has been approved by SimonSapin

Centril added a commit to Centril/rust that referenced this pull request Oct 25, 2019
Add {String,Vec}::into_raw_parts

Aspects to address:

- [x] Create a tracking issue
  - rust-lang#65816
bors added a commit that referenced this pull request Oct 25, 2019
Rollup of 7 pull requests

Successful merges:

 - #63810 (Make <*const/mut T>::offset_from `const fn`)
 - #65705 (Add {String,Vec}::into_raw_parts)
 - #65749 (Insurance policy in case `iter.size_hint()` lies.)
 - #65799 (Fill tracking issue number for `array_value_iter`)
 - #65800 (self-profiling: Update measureme to 0.4.0 and remove non-RAII methods from profiler.)
 - #65806 (Add [T]::as_ptr_range() and [T]::as_mut_ptr_range().)
 - #65810 (SGX: Clear additional flag on enclave entry)

Failed merges:

r? @ghost
@shepmaster

This comment has been minimized.

Copy link
Member Author

shepmaster commented Oct 25, 2019

Should these functions be associated functions like Box::into_raw?

I was mostly asking for consistency between these conceptually similar ideas, not because of an intrinsic need.

JohnTitor added a commit to JohnTitor/rust that referenced this pull request Oct 25, 2019
Add {String,Vec}::into_raw_parts

Aspects to address:

- [x] Create a tracking issue
  - rust-lang#65816
@RalfJung

This comment has been minimized.

Copy link
Member

RalfJung commented Oct 25, 2019

I feel that something about this should maybe be part of the docs, but I’m not sure how to phrase it or if it really belongs there. @RalfJung what do you think?

(That was re: avoiding mem::forget.) The usual problem is where to put "negative docs" like this. I guess we could write something in the mem::forget docs?
But also, this is all speculative -- beause we don't know what the aliasing model will be, it's better to be conservative.

bors added a commit that referenced this pull request Oct 25, 2019
Rollup of 6 pull requests

Successful merges:

 - #65705 (Add {String,Vec}::into_raw_parts)
 - #65749 (Insurance policy in case `iter.size_hint()` lies.)
 - #65799 (Fill tracking issue number for `array_value_iter`)
 - #65800 (self-profiling: Update measureme to 0.4.0 and remove non-RAII methods from profiler.)
 - #65806 (Add [T]::as_ptr_range() and [T]::as_mut_ptr_range().)
 - #65810 (SGX: Clear additional flag on enclave entry)

Failed merges:

r? @ghost
@bors bors merged commit 6600cf6 into rust-lang:master Oct 25, 2019
4 checks passed
4 checks passed
pr Build #20191025.76 succeeded
Details
pr (Linux mingw-check) Linux mingw-check succeeded
Details
pr (Linux x86_64-gnu-llvm-6.0) Linux x86_64-gnu-llvm-6.0 succeeded
Details
pr (LinuxTools) LinuxTools succeeded
Details
@shepmaster shepmaster deleted the shepmaster:vec-into-raw branch Oct 26, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
10 participants
You can’t perform that action at this time.