Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document that str, slices, (more?) can't safely straddle allocation boundaries #62765

Open
hanna-kruppe opened this issue Jul 17, 2019 · 8 comments
Open

Comments

@hanna-kruppe
Copy link
Member

@hanna-kruppe hanna-kruppe commented Jul 17, 2019

Slicing, indexing, and other safe operations on slices and strings pervasively use <*T>::offset and APIs built on top of it. These have the requirement that

Both the starting and resulting pointer must be either in bounds or one byte past the end of the same allocated object.

So if one allocates two pieces of memory and after proper checking miraculously finds they are directly adjacent, one can't safely construct a slice/str/etc. that spans both of these allocations. At least, one can't do very many things with the result it without causing UB from crossing the boundary between the allocations.

I couldn't find anything documenting this. It should be noted on the unsafe constructors (from_raw_parts etc.) at minimum. These already link to offset's documentation but only refer to its "no larger than isize::MAX" requirement, with no mention that the other requirements are also relevant.

cc oberien/str-concat#8
cc @rust-lang/wg-unsafe-code-guidelines

(Similar issues apply to references-to-arrays and field accesses in aggregates, but this is due to the compiler's codegen for language primitives rather than due to standard library code, so it should go into the UCG and I believe we're more or less covering that already.)

@RalfJung

This comment has been minimized.

Copy link
Member

@RalfJung RalfJung commented Jul 17, 2019

Agreed. I am pretty sure this already came up last year or before with another "slice concat" library that was found to be unsound, but I cannot find any trace of that right now.

@HeroicKatora

This comment has been minimized.

Copy link
Contributor

@HeroicKatora HeroicKatora commented Jul 19, 2019

Pretty sure you mean this one 😄

@RalfJung

This comment has been minimized.

Copy link
Member

@RalfJung RalfJung commented Jul 20, 2019

@HeroicKatora I do, nice find!

@cramertj

This comment has been minimized.

Copy link
Member

@cramertj cramertj commented Jul 22, 2019

I don't suppose there's any way to mark that offsetof from one of the addresses could also be used to refer to a separately allocated object, e.g. ptr::join(ptr_into_allocation_a, ptr_into_allocation_b)? I don't know of any LLVM-level operation to mirror this, but it seems pretty clearly useful.

@RalfJung

This comment has been minimized.

Copy link
Member

@RalfJung RalfJung commented Jul 22, 2019

@cramertj In principle, you can implement a cross_object_offset as follows:

fn cross_object_offset<T>(ptr: *const T, n: usize) -> *const T {
  ((ptr as usize) + n*mem::size_of::<T>()) as *const T
}

However, there are some long-standing LLVM bugs in this area and with the LLVM model being as awfully imprecise as it is for UB, we can't really tell where the bug is.

The good news though is that it seems like C++ is moving towards "pointer provenance does not propagate via integers", which would basically force LLVM to implement the semantics we want for the above operation.

@hanna-kruppe

This comment has been minimized.

Copy link
Member Author

@hanna-kruppe hanna-kruppe commented Jul 24, 2019

@RalfJung showed how to perform pointer arithmetic that crosses allocation boundaries. It's certainly possible to do this manually when handling raw pointers, but it's something that has to be done on every operation, you can't simply claim to LLVM that two separate allocations are actually the same. Therefore, fixing the safety issue that this issue is about would require us to implement all operations on slices/strings/etc. with cross_object_offset. This will most likely cause serious performance regressions, both due to the added flexibility making desirable optimizations illegal, and because analyses and optimizations are often tailored to getelementpointers even if they may after careful consideration turn out to be applicable to cross_object_offset-using code too.

@hanna-kruppe

This comment has been minimized.

Copy link
Member Author

@hanna-kruppe hanna-kruppe commented Nov 11, 2019

#66111 did this for slices

@RalfJung

This comment has been minimized.

Copy link
Member

@RalfJung RalfJung commented Nov 11, 2019

Well it did it in from_raw_parts. I wonder if the top-level module and/or primitive type documentation should also have a section on this? It doesn't talk about data layout at all so far.

I didn't do anything for str as there is no from_raw_parts for them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
5 participants
You can’t perform that action at this time.