Documentation of safety invariants #16

Closed
oilaba opened this issue Mar 26, 2021 · 11 comments

@oilaba

oilaba commented Mar 26, 2021

MmapMut::map_mut is an unsafe function, but there is no documentation explaining why it is unsafe or which invariants the user has to uphold.

@RazrFalcon
Owner

#13 ?

@RazrFalcon
Owner

@ckaran

ckaran commented Oct 22, 2021

I appreciate the blurb, but is there any way to give us slightly stronger guarantees? E.g., if we memory-map disjoint regions of the same file, are we guaranteed that they will never directly interfere with one another? (I'm not worried about out of process behavior, just in process, with two completely different async tasks writing to disjoint regions of the same file.)

@RazrFalcon
Owner

I'm not worried about out of process behavior

That's the problem. We cannot guarantee that.

@ckaran

ckaran commented Oct 22, 2021

Sure, I understand that. I have unrelated mechanisms that (should) protect against that kind of problem. However, I don't know if my using memmap on the same file handle within the same process, but on different threads and to disjoint regions of the file is by itself unsafe. Do you know if that action itself will cause UB?

@RazrFalcon
Owner

Sadly, no idea. I'm using memmap in a very trivial way and I'm not aware of possible edge cases.

@ckaran

ckaran commented Oct 22, 2021

Got it. OK, I'll have to think up some other way to handle my use case then. Thank you for your quick replies!

@adamreichold

However, I don't know if my using memmap on the same file handle within the same process, but on different threads and to disjoint regions of the file is by itself unsafe. Do you know if that action itself will cause UB?

A memory map provides you with an instance of &mut [u8]. If you are able to ensure that only one place in your code (and no other process) creates memory maps of the file in question, you should be able to perform the synchronisation based on that slice. (In this scenario, nothing anywhere else actually knows that the slice is backed by a memory map of a file. It is just virtual memory like any other memory.)

That said, mmap itself is thread-safe (MT-Safe in POSIX parlance), which is one of the reasons creating a mapping is relatively slow even though using it is fast. If you are using read-only maps, it should not be an issue to create them concurrently. Read-write maps however seem like they could be problematic if they are not aligned to page size, i.e. if one underlying page is partially part of two maps.
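To make the page-sharing condition concrete, here is a minimal sketch that checks whether two byte ranges of a file would touch a common page. The helper names (`pages_touched`, `share_a_page`) and the 4096-byte page size are assumptions for illustration only; the real page size has to be queried from the platform at run time, and nothing here is part of the memmap API.

```rust
use std::ops::Range;

/// Returns the range of page indices touched by a non-empty byte range.
fn pages_touched(bytes: &Range<u64>, page_size: u64) -> Range<u64> {
    assert!(page_size.is_power_of_two(), "page size must be a power of two");
    assert!(bytes.start < bytes.end, "byte range must be non-empty");
    let first = bytes.start / page_size;
    let last = (bytes.end - 1) / page_size;
    first..last + 1
}

/// True if the two byte ranges have at least one underlying page in common.
fn share_a_page(a: &Range<u64>, b: &Range<u64>, page_size: u64) -> bool {
    let (pa, pb) = (pages_touched(a, page_size), pages_touched(b, page_size));
    pa.start < pb.end && pb.start < pa.end
}

fn main() {
    // Assumed page size, for illustration only.
    let page = 4096;
    // Disjoint byte ranges that nevertheless both live in page 0.
    println!("{}", share_a_page(&(0..100), &(100..200), page)); // true
    // Ranges split exactly on a page boundary have no page in common.
    println!("{}", share_a_page(&(0..4096), &(4096..8192), page)); // false
}
```

Two regions can be disjoint at the byte level and still fail this check, which is the case flagged as potentially problematic for read-write maps.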

@ckaran

ckaran commented Oct 22, 2021

A memory map provides you with an instance of &mut [u8]. If you are able to ensure that only one place in your code (and no other process) creates memory maps of the file in question, you should be able to perform the synchronisation based on that slice. (In this scenario, nothing anywhere else actually knows that the slice is backed by a memory map of a file. It is just virtual memory like any other memory.)

Thank you @adamreichold, that's exactly what I needed to know! Within my own process space, I can guarantee that instances of &mut [u8] never overlap any other slice anywhere, so as long as mmap is thread-safe, then that should be taken care of.

Read-write maps however seem like they could be problematic if they are not aligned to page size, i.e. if one underlying page is partially part of two maps.

That is a harder nut for me to crack. My goal is to write an extremely fast serializer/deserializer similar in spirit to abomonation, but which is somewhat safer to use, correctly deals with cycles in object graphs, and whose serialized contents can be passed around between different platforms which may have different page sizes. That is a long-winded way of saying that I can't easily align to any given page size, as what works on one platform may not work on another. Do you know if being misaligned would cause any correctness issues? I can figure out a way around the speed problem, as long as the result is always correct.

@adamreichold

Within my own process space, I can guarantee that instances of &mut [u8] never overlap any other slice anywhere, so as long as mmap is thread-safe, then that should be taken care of.

I am not sure we are talking about the same thing. I was suggesting that you restructure things so that there is only one instance of the Mmap type for each file, and all code using this memory map really only uses &mut [u8] and does not know that this slice was derived from a memory map at all. Hence whether Mmap is Sync does not even enter the picture, as only byte slices would be passed between tasks/threads. (The idea is to avoid any question about the guarantees that mmap provides by not relying on them at all. Whether your &mut [u8] was created by fs::read or Mmap should not matter in this approach.)
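A minimal sketch of that structure, using only the standard library: one owner holds the whole byte slice and hands disjoint `&mut [u8]` chunks to scoped threads. The `Vec<u8>` is a stand-in for the single memory map (an assumption made so the example runs without the crate), and `fill_in_parallel` is a hypothetical helper name.

```rust
use std::thread;

/// Fills each disjoint chunk of `buf` on its own thread.
/// Workers only ever see `&mut [u8]`; they do not know where the bytes came from.
/// `chunk_len` must be non-zero, otherwise `chunks_mut` panics.
fn fill_in_parallel(buf: &mut [u8], chunk_len: usize) {
    thread::scope(|s| {
        for (i, chunk) in buf.chunks_mut(chunk_len).enumerate() {
            // Each thread receives exclusive access to exactly one chunk.
            s.spawn(move || chunk.fill(i as u8));
        }
        // All spawned threads are joined when the scope ends.
    });
}

fn main() {
    // Stand-in for the single mapping of the file (assumption: in real code
    // this slice would be borrowed from the one memory map instead).
    let mut backing = vec![0u8; 8];
    fill_in_parallel(&mut backing, 4);
    println!("{:?}", backing); // [0, 0, 0, 0, 1, 1, 1, 1]
}
```

Because the borrow checker already guarantees that the chunks do not overlap, no extra synchronisation is needed inside the process. It says nothing about other processes mapping the same file.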

Do you know if being misaligned would cause any correctness issues? I can figure out a way around the speed problem, as long as the result is always correct.

Sorry, but I really do not know and hence would work on the assumption that it does.

@ckaran

ckaran commented Oct 25, 2021

You're right that we weren't talking about the same thing originally, but I may be able to restructure what I want to do so that it fits in with what you're suggesting. My original plan was that I could have an arbitrarily large archive file, and multiple disjoint views on the file that are mmaped as disjoint &mut [u8] slices. The idea was that different writers would update different portions of the file at the same time. But as you say, this could be very, very messy.
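For reference, the "multiple disjoint views" idea can be approximated on top of a single slice, in line with the suggestion above. The sketch below carves non-overlapping `(offset, len)` regions out of one buffer and refuses overlapping or out-of-range input. `carve` is a hypothetical helper, and the `Vec<u8>` again stands in for one mapping of the archive file.

```rust
/// Splits `buf` into mutable views for the given `(offset, len)` regions.
/// Regions must be sorted by offset and must not overlap; otherwise `None`.
fn carve<'a>(mut buf: &'a mut [u8], regions: &[(usize, usize)]) -> Option<Vec<&'a mut [u8]>> {
    let mut views = Vec::with_capacity(regions.len());
    let mut consumed = 0; // bytes of the original buffer already split off
    for &(offset, len) in regions {
        let skip = offset.checked_sub(consumed)?; // None: overlap or unsorted input
        let end = skip.checked_add(len)?;
        if end > buf.len() {
            return None; // region runs past the end of the buffer
        }
        // Take the remaining slice by value so both halves keep lifetime 'a.
        let (head, tail) = std::mem::take(&mut buf).split_at_mut(end);
        views.push(&mut head[skip..]);
        buf = tail;
        consumed = offset + len;
    }
    Some(views)
}

fn main() {
    let mut archive = vec![0u8; 10]; // stand-in for the mapped archive
    let mut views = carve(&mut archive, &[(0, 2), (5, 3)]).expect("regions are disjoint");
    views[0].fill(1);
    views[1].fill(2);
    println!("{:?}", archive); // [1, 1, 0, 0, 0, 2, 2, 2, 0, 0]
}
```

This only addresses aliasing within one process and one mapping; it does not answer the page-alignment question for separate read-write maps.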

I'll work on restructuring my design, probably with lots of much smaller files. That will ensure that the file handles refer to entirely separate objects, with at most one mmaped region from each file at a time. At least accessing different files is thread-safe!
