RFC: [T]::rejoin #2806

golddranks · 2019-11-04T15:16:53Z

Add method fn rejoin(&self, other: &[T]) -> &[T] for slice, with other related methods (*_mut, try_*, and rejoin, try_rejoin for &str). This API allows joining two slices that are adjacent in the memory, into one.

Rendered

hanna-kruppe · 2019-11-04T21:01:53Z

This API has been implemented in several third party crates and discovered to be unsound repeatedly. See rust-lang/rust#62765 for reasons why and links to several past occurrences.

@HeroicKatora proposed an alternative API (and implemented it in an unreleased branch of str-concat) which they believe addresses the problem, though:

I, for one, haven't looked into it enough to come to any conclusion whether they're sound.
That API is significantly more cumbersome to use, essentially requiring you to carry around the original slice to be able to rejoin parts of it (unless you manually ensure that the slices are part of the same allocation and use unsafe)

golddranks · 2019-11-05T01:34:40Z

Thanks for pointing those out. I mentioned about the soundness concerns in drawbacks, and so it's as I feared, unsound. The question I was wondering about while writing that, however is whether LLVM supports any way of fixing the situation, such as creating new pointers that would conceptually consider the joined object as one object. It seems however, that LLVM is very tightly geared for C-like memory models so they won't support anything like that?

Is the main thing for forbidding all the out-of-bounds accesses the aliasing analysis? It seems unfortunate that even if we have lifetimes and sound references in Rust, we can't still have a function that takes in two references, returns one and say: "this reference may alias BOTH of the input references". It sounds like it could be possible in theory, although not maybe feasible with what LLVM supports.

I found this link from the linked thread to be an informative read so I'm linking it here:
https://bugs.llvm.org/show_bug.cgi?id=34548

Centril · 2019-11-05T01:41:05Z

Adding the lang team as well for now due to the soundness concern.
cc @RalfJung

Is the soundness issue here one of just interactions with LLVM or something deeper? If the latter is the case perhaps some formal proofs would be in order.

RalfJung · 2019-11-05T08:20:40Z

This API is unimplementable in C, LLVM and Rust. In all of these languages, array/slice accesses have "inbounds" semantics, meaning the computation of the array element address must stay within the bounds of the same allocation. It is impossible to check at run-time if two neighboring pointers are in the same allocation or not. This is a property the informal specs of LLVM and C as well as every single formal model of LLVM and C that I am aware of have in common -- it's about as much consensus as we will get for these languages.

The only way I know to implement this is to change array/slice indexing. In this formal model of LLVM, one can do cross-allocation arithmetic by casting the ptr to an int, doing the arithmetic there, and casting back. Unfortunately the C and LLVM specs are too vague around integer-pointer casts to tell if that is indeed intended to work, and LLVM is known to have bugs in this area that could break this in practice.

Also see bluss/odds#25.

Is the soundness issue here one of just interactions with LLVM or something deeper? If the latter is the case perhaps some formal proofs would be in order.

I would say it is deeper. The proposed API is, as far as I can tell, fundamentally incompatible with our requirement that arrays/slices be contained within a single allocation.

RalfJung · 2019-11-05T08:22:24Z

text/0000-rejoin-slice.md

+        let self_len = self.len();
+        let self_end = self[self_len..].as_ptr();
+        if core::ptr::eq(self_end, other.as_ptr()) {
+            Some(unsafe { core::slice::from_raw_parts(self.as_ptr(), self.len() + other.len()) })


This is the unsoundness: from_raw_parts requires the entire range to be within a single allocation, which your check does not ensure (the two allocations could be situated right next to each other).

We should probably clarify the docs.

RalfJung · 2019-11-05T08:26:53Z

From the implementing PR rust-lang/rust#66088:

This cannot be merged as-is as it was deemed to be unsafe. However, it might be useful to have an unsafe version in standard library. It would be good from the perspective of communicating what is UB and what is not, and why, too.

I agree we need better docs. As a first step, we should improve the from_raw_parts docs to clarify the single-allocation rule. I'll propose a PR for that soon.

Beyond that, I do not think we have precedent for adding an unsafe function to libstd just for the purpose of documenting that this cannot be done safely. What are you proposing the name, API and docs of that function to be?

golddranks · 2019-11-05T08:52:45Z

Basically, the same as the the rejoin proposed here (names are bikesheddable, but to me "re" implies that it's only intended for joining subslices that were split from the same slice.), but with unsafe. (The try_* versions should be dropped as they name suggest false sense of security that the "rejoinability" would be detectable at runtime.) The basic motivation explained in the RFC still stands. Unless I'm misunderstanding the source of the unsoundness (i.e. it's UB to join two slices that originate from different allocations), the operation itself is a thing that can be done, provided that the user can be sure that the slices are subslices of the same original slice. The fact that it has been implemented multiple times outside of the standard library tells that there is demand, and as the history has shown, people often miss corner cases. Therefore, it would be better to have it done in standard library, and clearly tell why it's unsafe, when it triggers UB and when it doesn't. This at the same times caters to the demand, and achieves communicating about the soundness hazard.

RalfJung · 2019-11-05T09:04:09Z

Submitted docs PR as: rust-lang/rust#66111

the operation itself is a thing that can be done, provided that the user can be sure that the slices are subslices of the same original slice.

That is correct.

But then I wonder, what is the use-case for this? The function you proposed, and that can be found on crates.io, is useful because it is (well, claims to be) safe and does all the run-time checks. The function you are proposjng does some run-time checks (slice1.end == slice2.start) but not enough. Under which circumstances would I know that two subslices come from the same parent slice, but not know if they are adjacent? Basically, I am doubting that this is any more useful than bare from_raw_parts.

Its main use comes from the fact that people might search for it in the libstd docs, and then we can say in clear words that it is impossible to implement a safe version of this and that they should forward this claim to any crates.io library that says otherwise. But it seems odd to add a function to libstd whose main purpose is not to be called...

golddranks · 2019-11-05T09:19:15Z

I'm mainly thinking thinking that catching some of the possible errors is better than none. Besides adjacency, which has false positives but no false negatives, there are lifetimes and slice length, all of which you can get wrong with from_raw_parts.

golddranks · 2019-11-05T10:51:41Z

To be sure, I also think that it makes sense to have the function in standard library to prevent people from reimplementing it sloppily. With the above argument I was just trying to say that not being called is not it's only purpose.

Centril · 2019-11-05T20:55:06Z

Its main use comes from the fact that people might search for it in the libstd docs, and then we can say in clear words that it is impossible to implement a safe version of this and that they should forward this claim to any crates.io library that says otherwise. But it seems odd to add a function to libstd whose main purpose is not to be called...

Agreed. I think it's also plausible that some people have a bad day, don't read the docs very clearly, and then go on to use the method wrongly.

@Centril

improve from_raw_parts docs Triggered by rust-lang/rfcs#2806. Hopefully this helps clarify that joining slices across allocations is not possible in Rust currently. r? @Centril

@Centril

improve from_raw_parts docs Triggered by rust-lang/rfcs#2806. Hopefully this helps clarify that joining slices across allocations is not possible in Rust currently. r? @Centril

@Centril

improve from_raw_parts docs Triggered by rust-lang/rfcs#2806. Hopefully this helps clarify that joining slices across allocations is not possible in Rust currently. r? @Centril

golddranks · 2019-11-07T03:54:15Z

Thanks for the insightful comments. It's clear that this RFC is unacceptable in this form. While I think that the discussion about whether to provide an unsafe API is valuable to have, let's do that once there is a proposal. I'm closing this PR at this time.

glaebhoerl · 2019-11-07T07:19:16Z

It'd presumably be less ergonomic and expressive, but in principle could invariant lifetimes be used to prove that different references/slices came from the same allocation?

golddranks · 2019-11-07T07:24:12Z

@glaebhoerl I haven't looked into it but I believe that's what @HeroicKatora's API that @rkruppe linked, does.

glaebhoerl · 2019-11-07T08:01:15Z

It doesn't, but thanks for the tip. (Maybe it should; I can't tell what the situation is with the dynamic checks it also adds.)

RFC: rejoin_slice

4fa927b

golddranks mentioned this pull request Nov 4, 2019

[T]::rejoin (experimental API implementation) rust-lang/rust#66088

Closed

golddranks changed the title ~~RFC: slice::rejoin~~ RFC: [T]::rejoin Nov 4, 2019

Centril added T-libs-api Relevant to the library API team, which will review and decide on the RFC. A-slice Slice related proposals & ideas T-lang Relevant to the language team, which will review and decide on the RFC. labels Nov 5, 2019

RalfJung reviewed Nov 5, 2019

View reviewed changes

RalfJung mentioned this pull request Nov 5, 2019

improve from_raw_parts docs rust-lang/rust#66111

Merged

golddranks closed this Nov 7, 2019

jplatte mentioned this pull request Dec 18, 2019

BytesMut::unsplit is unsound tokio-rs/bytes#347

Closed

golddranks mentioned this pull request Jan 18, 2021

This crate is unsoud golddranks/rejoin_slice#1

Closed

the8472 mentioned this pull request Feb 27, 2021

Add Iterator::map_windows rust-lang/rust#82413

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RFC: [T]::rejoin #2806

RFC: [T]::rejoin #2806

golddranks commented Nov 4, 2019 •

edited

Loading

hanna-kruppe commented Nov 4, 2019

golddranks commented Nov 5, 2019 •

edited

Loading

Centril commented Nov 5, 2019

RalfJung commented Nov 5, 2019

RalfJung Nov 5, 2019

RalfJung commented Nov 5, 2019

golddranks commented Nov 5, 2019 •

edited

Loading

RalfJung commented Nov 5, 2019 •

edited

Loading

golddranks commented Nov 5, 2019

golddranks commented Nov 5, 2019 •

edited

Loading

Centril commented Nov 5, 2019

golddranks commented Nov 7, 2019

glaebhoerl commented Nov 7, 2019

golddranks commented Nov 7, 2019 •

edited

Loading

glaebhoerl commented Nov 7, 2019

RFC: [T]::rejoin #2806

RFC: [T]::rejoin #2806

Conversation

golddranks commented Nov 4, 2019 • edited Loading

hanna-kruppe commented Nov 4, 2019

golddranks commented Nov 5, 2019 • edited Loading

Centril commented Nov 5, 2019

RalfJung commented Nov 5, 2019

RalfJung Nov 5, 2019

Choose a reason for hiding this comment

RalfJung commented Nov 5, 2019

golddranks commented Nov 5, 2019 • edited Loading

RalfJung commented Nov 5, 2019 • edited Loading

golddranks commented Nov 5, 2019

golddranks commented Nov 5, 2019 • edited Loading

Centril commented Nov 5, 2019

golddranks commented Nov 7, 2019

glaebhoerl commented Nov 7, 2019

golddranks commented Nov 7, 2019 • edited Loading

glaebhoerl commented Nov 7, 2019

golddranks commented Nov 4, 2019 •

edited

Loading

golddranks commented Nov 5, 2019 •

edited

Loading

golddranks commented Nov 5, 2019 •

edited

Loading

RalfJung commented Nov 5, 2019 •

edited

Loading

golddranks commented Nov 5, 2019 •

edited

Loading

golddranks commented Nov 7, 2019 •

edited

Loading