Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: [T]::rejoin #2806

Closed
wants to merge 1 commit into from
Closed

Conversation

golddranks
Copy link

@golddranks golddranks commented Nov 4, 2019

Add method fn rejoin(&self, other: &[T]) -> &[T] for slice, with other related methods (*_mut, try_*, and rejoin, try_rejoin for &str). This API allows joining two slices that are adjacent in the memory, into one.

Rendered

@golddranks golddranks changed the title RFC: slice::rejoin RFC: [T]::rejoin Nov 4, 2019
@hanna-kruppe
Copy link

This API has been implemented in several third party crates and discovered to be unsound repeatedly. See rust-lang/rust#62765 for reasons why and links to several past occurrences.

@HeroicKatora proposed an alternative API (and implemented it in an unreleased branch of str-concat) which they believe addresses the problem, though:

  • I, for one, haven't looked into it enough to come to any conclusion whether they're sound.
  • That API is significantly more cumbersome to use, essentially requiring you to carry around the original slice to be able to rejoin parts of it (unless you manually ensure that the slices are part of the same allocation and use unsafe)

@golddranks
Copy link
Author

golddranks commented Nov 5, 2019

Thanks for pointing those out. I mentioned about the soundness concerns in drawbacks, and so it's as I feared, unsound. The question I was wondering about while writing that, however is whether LLVM supports any way of fixing the situation, such as creating new pointers that would conceptually consider the joined object as one object. It seems however, that LLVM is very tightly geared for C-like memory models so they won't support anything like that?

Is the main thing for forbidding all the out-of-bounds accesses the aliasing analysis? It seems unfortunate that even if we have lifetimes and sound references in Rust, we can't still have a function that takes in two references, returns one and say: "this reference may alias BOTH of the input references". It sounds like it could be possible in theory, although not maybe feasible with what LLVM supports.

I found this link from the linked thread to be an informative read so I'm linking it here:
https://bugs.llvm.org/show_bug.cgi?id=34548

@Centril Centril added T-libs-api Relevant to the library API team, which will review and decide on the RFC. A-slice Slice related proposals & ideas T-lang Relevant to the language team, which will review and decide on the RFC. labels Nov 5, 2019
@Centril
Copy link
Contributor

Centril commented Nov 5, 2019

Adding the lang team as well for now due to the soundness concern.
cc @RalfJung

Is the soundness issue here one of just interactions with LLVM or something deeper? If the latter is the case perhaps some formal proofs would be in order.

@RalfJung
Copy link
Member

RalfJung commented Nov 5, 2019

This API is unimplementable in C, LLVM and Rust. In all of these languages, array/slice accesses have "inbounds" semantics, meaning the computation of the array element address must stay within the bounds of the same allocation. It is impossible to check at run-time if two neighboring pointers are in the same allocation or not. This is a property the informal specs of LLVM and C as well as every single formal model of LLVM and C that I am aware of have in common -- it's about as much consensus as we will get for these languages.

The only way I know to implement this is to change array/slice indexing. In this formal model of LLVM, one can do cross-allocation arithmetic by casting the ptr to an int, doing the arithmetic there, and casting back. Unfortunately the C and LLVM specs are too vague around integer-pointer casts to tell if that is indeed intended to work, and LLVM is known to have bugs in this area that could break this in practice.

Also see bluss/odds#25.

Is the soundness issue here one of just interactions with LLVM or something deeper? If the latter is the case perhaps some formal proofs would be in order.

I would say it is deeper. The proposed API is, as far as I can tell, fundamentally incompatible with our requirement that arrays/slices be contained within a single allocation.

let self_len = self.len();
let self_end = self[self_len..].as_ptr();
if core::ptr::eq(self_end, other.as_ptr()) {
Some(unsafe { core::slice::from_raw_parts(self.as_ptr(), self.len() + other.len()) })
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the unsoundness: from_raw_parts requires the entire range to be within a single allocation, which your check does not ensure (the two allocations could be situated right next to each other).

We should probably clarify the docs.

@RalfJung
Copy link
Member

RalfJung commented Nov 5, 2019

From the implementing PR rust-lang/rust#66088:

This cannot be merged as-is as it was deemed to be unsafe. However, it might be useful to have an unsafe version in standard library. It would be good from the perspective of communicating what is UB and what is not, and why, too.

I agree we need better docs. As a first step, we should improve the from_raw_parts docs to clarify the single-allocation rule. I'll propose a PR for that soon.

Beyond that, I do not think we have precedent for adding an unsafe function to libstd just for the purpose of documenting that this cannot be done safely. What are you proposing the name, API and docs of that function to be?

@golddranks
Copy link
Author

golddranks commented Nov 5, 2019

Basically, the same as the the rejoin proposed here (names are bikesheddable, but to me "re" implies that it's only intended for joining subslices that were split from the same slice.), but with unsafe. (The try_* versions should be dropped as they name suggest false sense of security that the "rejoinability" would be detectable at runtime.) The basic motivation explained in the RFC still stands. Unless I'm misunderstanding the source of the unsoundness (i.e. it's UB to join two slices that originate from different allocations), the operation itself is a thing that can be done, provided that the user can be sure that the slices are subslices of the same original slice. The fact that it has been implemented multiple times outside of the standard library tells that there is demand, and as the history has shown, people often miss corner cases. Therefore, it would be better to have it done in standard library, and clearly tell why it's unsafe, when it triggers UB and when it doesn't. This at the same times caters to the demand, and achieves communicating about the soundness hazard.

@RalfJung
Copy link
Member

RalfJung commented Nov 5, 2019

Submitted docs PR as: rust-lang/rust#66111

the operation itself is a thing that can be done, provided that the user can be sure that the slices are subslices of the same original slice.

That is correct.

But then I wonder, what is the use-case for this? The function you proposed, and that can be found on crates.io, is useful because it is (well, claims to be) safe and does all the run-time checks. The function you are proposjng does some run-time checks (slice1.end == slice2.start) but not enough. Under which circumstances would I know that two subslices come from the same parent slice, but not know if they are adjacent? Basically, I am doubting that this is any more useful than bare from_raw_parts.

Its main use comes from the fact that people might search for it in the libstd docs, and then we can say in clear words that it is impossible to implement a safe version of this and that they should forward this claim to any crates.io library that says otherwise. But it seems odd to add a function to libstd whose main purpose is not to be called...

@golddranks
Copy link
Author

I'm mainly thinking thinking that catching some of the possible errors is better than none. Besides adjacency, which has false positives but no false negatives, there are lifetimes and slice length, all of which you can get wrong with from_raw_parts.

@golddranks
Copy link
Author

golddranks commented Nov 5, 2019

To be sure, I also think that it makes sense to have the function in standard library to prevent people from reimplementing it sloppily. With the above argument I was just trying to say that not being called is not it's only purpose.

@Centril
Copy link
Contributor

Centril commented Nov 5, 2019

Its main use comes from the fact that people might search for it in the libstd docs, and then we can say in clear words that it is impossible to implement a safe version of this and that they should forward this claim to any crates.io library that says otherwise. But it seems odd to add a function to libstd whose main purpose is not to be called...

Agreed. I think it's also plausible that some people have a bad day, don't read the docs very clearly, and then go on to use the method wrongly.

pietroalbini added a commit to pietroalbini/rust that referenced this pull request Nov 6, 2019
improve from_raw_parts docs

Triggered by rust-lang/rfcs#2806. Hopefully this helps clarify that joining slices across allocations is not possible in Rust currently.

r? @Centril
JohnTitor added a commit to JohnTitor/rust that referenced this pull request Nov 6, 2019
improve from_raw_parts docs

Triggered by rust-lang/rfcs#2806. Hopefully this helps clarify that joining slices across allocations is not possible in Rust currently.

r? @Centril
JohnTitor added a commit to JohnTitor/rust that referenced this pull request Nov 7, 2019
improve from_raw_parts docs

Triggered by rust-lang/rfcs#2806. Hopefully this helps clarify that joining slices across allocations is not possible in Rust currently.

r? @Centril
@golddranks
Copy link
Author

Thanks for the insightful comments. It's clear that this RFC is unacceptable in this form. While I think that the discussion about whether to provide an unsafe API is valuable to have, let's do that once there is a proposal. I'm closing this PR at this time.

@golddranks golddranks closed this Nov 7, 2019
@glaebhoerl
Copy link
Contributor

It'd presumably be less ergonomic and expressive, but in principle could invariant lifetimes be used to prove that different references/slices came from the same allocation?

@golddranks
Copy link
Author

golddranks commented Nov 7, 2019

@glaebhoerl I haven't looked into it but I believe that's what @HeroicKatora's API that @rkruppe linked, does.

@glaebhoerl
Copy link
Contributor

It doesn't, but thanks for the tip. (Maybe it should; I can't tell what the situation is with the dynamic checks it also adds.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-slice Slice related proposals & ideas T-lang Relevant to the language team, which will review and decide on the RFC. T-libs-api Relevant to the library API team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants