Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MAP_FIXED #162

Closed
kevinlawler opened this issue Oct 3, 2012 · 28 comments
Closed

MAP_FIXED #162

kevinlawler opened this issue Oct 3, 2012 · 28 comments
Labels

Comments

@kevinlawler
Copy link
Owner

As written, a bug?

https://github.com/kevinlawler/kona/blob/master/km.c


mmap(a+e,f,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON|MAP_FIXED,-1,0) //Add pages to end

The problem is that mapping MAP_FIXED over an existing address space won't fail, it will just destroy the other mapping if it was close enough. The bug is if another mapping is close enough behind the expanding mapping.

(I am not sure how I missed this the first time around: maybe there is some other consideration that prevents this bug. It seems like the kind of thing I would've been wary of. Maybe I thought MAP_FIXED would fail on such allocations. Does it really always succeed? It would be nice to verify via test case on OS X, Linux.) ((Another explanation is that StackOverflow didn't exist when I did this, or didn't have much on the subject, and so my understanding was based on a flawed interpretation of the POSIX documentation.))

One potential fix:

http://stackoverflow.com/questions/6446101/how-do-i-choose-a-fixed-address-for-mmap#comment7573599_6446215

"An aside: MAP_FIXED can be used safely if you first make a single mapping without MAP_FIXED, then use MAP_FIXED to create various mappings at fixed addresses relative to one another over top of the original throw-away mapping that was at a kernel-assigned location. With a sufficiently large mapping size, this might solve OP's problem."

If this lets us map "anonymous memory" without ever actual allocating real RAM, then we could always reserve something very large for any mapped structure.

@kevinlawler kevinlawler mentioned this issue Oct 4, 2012
@silentbicycle
Copy link
Collaborator

I will have to double-check Stevens & etc. on this, but I'm pretty sure this can lead to portability problems. It may be free of edge cases on common platforms; whether this is a problem in practice and not just in theory is your call.

@silentbicycle
Copy link
Collaborator

We could use mremap when available (Linux only, AFAIK), but the most portable solution is to just copy to a larger chunk of mmap'd pooled memory. We want the same semantics as realloc, but for mmap'd memory.

On OSX (Mountain Lion), overlapping MAP_FIXED mmaps will clobber each other with zeroed out memory pages, as expected: git://gist.github.com/3889656.git

@kevinlawler
Copy link
Owner Author

I'm pretty sure this can lead to portability problems

Do you mean the solution suggested on StackOverflow or something else?

@silentbicycle
Copy link
Collaborator

I think silently clobbering existing mappings, the expected behavior, is bad for this use case. If some platform(s) give an error instead, that's helpful, but can't be depended upon.

@tavmem
Copy link
Collaborator

tavmem commented Mar 2, 2013

Checked both 32-bit Gentoo and OSX.
After removing "MAP_FIXED" parameter, I get correct results even when adding many pages to the end.
As mentioned by @silentbicycle above, this may be a problem just in theory, but not in practice.

@kevinlawler
Copy link
Owner Author

Removing MAP_FIXED introduces a new problem which is that the OS may add pages elsewhere besides the end of the object, which is what is expected. That would be bad since objects are assumed to be contiguous in memory.

(The test case to show how this fails is to map two objects contiguously in memory and then attempt to expand the first. Expected: With MAP_FIXED it clobbers the the second object. Expected: Without MAP_FIXED the tail of the first object is added to a random location in memory, potentially even before the object.)

Advised: restore MAP_FIXED, use a portable option from below to check beforehand for already-mapped areas. The result of finding an already-mapped area should be the same as if MAP_FAILED was the function return in the existing code.

Options:

  1. mincore see: http://stackoverflow.com/a/14956743/365478 or http://stackoverflow.com/a/8363219/365478
  2. msync see: http://stackoverflow.com/a/8368346/365478
  3. posix_mem_offset see above (currently not portable?)

What are the timings for each? The best choice is probably the fastest. StackOverflow commenters suggest mincore is best since it is lowest-level.

The previous idea to reserve large chunks of memory beforehand seems not as good.

@kevinlawler
Copy link
Owner Author

I think the above is the correct solution.

(Speculation: After implementing the above, to be perfectly correct we may need to use MAP_LOCKED or mlock on the first page of any file mmapped from disk. This is based on four untested assumptions: 1. anonymous memory never causes mincore/msync to announce that it isn't mapped. 2. mincore/msync, thinking the address is not in use, fails to return an error when looking at an address containing a page, originally mapped to a file on disk, which has not been used in a while after heavy mmap usage elsewhere. 3. operating systems do not wire in the first page of files by default. 4. locked memory does indeed play nicely with mincore/msync. The case, if it exists, would be that we map file X on disk, do a lot of work, potentially at the end of file X, do some more work appending to a separate object, and then that object extends over the beginning of X's mapping because mincore/msync do not alert us, the first page of X having been paged out. It is an odd corner case that we may never see, and no worse than the possible scenarios in the previous solutions.)

All of this is to get around not having a portable mremap.

@tavmem
Copy link
Collaborator

tavmem commented Mar 28, 2013

Is there any benefit in using mremap for Linux (and maybe for Cygwin), and an alternate solution for OSX?

@silentbicycle
Copy link
Collaborator

Potentially, but it needs to be balanced against added portability issues.

Scott

On Thu, Mar 28, 2013 at 9:30 AM, Tom Szczesny notifications@github.comwrote:

Is there any benefit in using mremap for Linux (and maybe for Cygwin), and
an alternate solution for OSX?


Reply to this email directly or view it on GitHubhttps://github.com//issues/162#issuecomment-15588256
.

@kevinlawler
Copy link
Owner Author

It's a neutral change.

NB. The full list of systems we claim to support is: OS X/Linux/BSD/Cygwin.

@tavmem
Copy link
Collaborator

tavmem commented Apr 2, 2013

FYI: Looks like Cygwin does not have mremap either.

@tavmem
Copy link
Collaborator

tavmem commented Apr 2, 2013

Suppose we use mincore(). Suppose the original k-structure uses m bytes, and we want to increase the size by an additional n bytes. We want to check that the additional n bytes do not contain an already-mapped area. (In other words, the additional n bytes needs to be completely unmapped.)
However, the documenation for mincore seems to indicate that the error ENOMEM will occur if the checked area (the additional n bytes) contains some unmapped memory.
Seems to me that we need a flag that indicates either 1) all of the n bytes are unmapped (OK), or 2) some of the n bytes are already mapped (problem).

@kevinlawler
Copy link
Owner Author

However, the documenation for mincore seems to indicate that the error ENOMEM will occur if the checked area (the additional n bytes) contains some unmapped memory.

Do you mean "contains some mapped memory"? If so, I think that gives us flag 2, and we are OK. If not, then you'll have to explain more since I don't understand.

@tavmem
Copy link
Collaborator

tavmem commented Apr 2, 2013

I did mean "contains some unmapped memory".
The wording in http://man7.org/linux/man-pages/man2/mincore.2.html is somewhat ambiguous: "ENOMEM addr to addr + length contained unmapped memory."
Does that mean "contained some unmapped" or "contained all unmapped"?
The wording in http://www.gsp.com/cgi-bin/man.cgi?section=2&topic=mincore
is much more specific:
"The virtual address range specified by the addr and len arguments is not fully mapped."
The problem is when the address range is "not fully mapped" does that mean that it is "fully unmapped", which is what we need to know.

@kevinlawler
Copy link
Owner Author

OK, I see. You're saying this could give flag 3: some of the n bytes are mincore-unmapped. This is sufficient if we add a single page at a time. For multiple pages the flag is not sufficient to know whether or not we can proceed, unless we check each page.

@kevinlawler
Copy link
Owner Author

If flag 3 is indeed the case, we can call mincore for each page we intend to add, and verify that they all throw errors because they are unmapped. Almost always when we call kexpander we are adding a page at a time using kap. The only way to add more than one page is to use kapn, which was included for completeness's sake. The only method using kapn is a printing method.

What is the timing like for calls to mincore? Is it fast enough to not matter or is it slow?

@tavmem
Copy link
Collaborator

tavmem commented Apr 3, 2013

Z K kapn_(K _a,V *v,I n) {
if(!a||!n)R 0;
K k=_a;
I t=k->t,m=k->n,p=m+n; ...

This comment simply documents my understanding as I go through the code:
Both kap: extern K kap(K*a,V v){R kapn_(a,&v,1);}
and kapn: extern K kapn(K *a,V v,I n){R kapn_(a,&v,n);}
call kapn_ which expands the k-structure, but not by pages.
kap adds 1 element to the k-structure (which may require no additional pages, or one additional page).
kapn adds n elements to the k-structure (which may require no, one, or multiple additional pages).
(It may be possible that kap requires multiple additional pages if the added single element is large.)

@kevinlawler
Copy link
Owner Author

That looks right to me. We can further assume that kap, if sufficient space does not already exist, requires at most one new page since the single element will always be small. The single element is bounded by the size of the atom types and the size of the object reference. These are all much smaller than the page size. (Mixed lists and strings store pointers.)

@kevinlawler
Copy link
Owner Author

(For specifics: I think the amount we can add is in [1, 8] bytes inclusive & tight, and that page sizes are essentially guaranteed to be much larger than that on all systems we currently support or intend to support in the future. Off the top of my head on Linux and OSX the page size is 4096. However, I think the discussion is moot since supporting kapn correctly means supporting arbitrary-count page additions.)

@tavmem
Copy link
Collaborator

tavmem commented Apr 3, 2013

The page size on Cygwin is 65536 = 2^16 (which is probably the page size on Windows).
Most of the discussion is moot. The key point (which I forgot): "Mixed lists and strings store pointers".

@tavmem
Copy link
Collaborator

tavmem commented Apr 9, 2013

Just to document the latest wrinkle on this issue: In preparation for using mincore to pre-verify that each page added is not already mapped, I checked to see whether the following lines (placed in kexpander in km.c) print "map_OK" in 32-bit-Linux, OSX and Cygwin:

if(MAP_FAILED!=mmap(a+e,f,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON|MAP_FIXED,-1,0)) {O("map-OK\n");R 1;}//Add pages to end
else O("map-failed\n");

I get "map-OK" in OSX and Cygwin. I get "map-failed" in 32-bit-Linux.
Removing the "MAP_FIXED" flag results in "map-OK" on 32-bit-Linux.

Since I have not updated my version of 32-bit-Gentoo for over 6 months, I will attempt that and see if the problem disappears.

@kevinlawler
Copy link
Owner Author

That's strange. I'm not sure why the next block would already be mapped. Is it bad luck? Or does Linux automatically map the block after for some reason? Maybe a bad interaction with MAP_ANON?

Perhaps good justification on Linux to use mremap. Does mremap work correctly with MAP_ANON?

@tavmem
Copy link
Collaborator

tavmem commented Apr 19, 2013

On 32-bit-Linux:

  1. When mmap fails with MAP_FIXED, errno is set to ENOMEM.
  2. mremap does work correctly when the original mmap was set with MAP_ANON.
  3. I agree that we should use mremap for Linux.

tavmem added a commit that referenced this issue Apr 19, 2013
tavmem added a commit that referenced this issue Apr 19, 2013
@tavmem
Copy link
Collaborator

tavmem commented Apr 19, 2013

Latest wrinkle: mincore is not available in Cygwin.

@kevinlawler
Copy link
Owner Author

Hrm, I suppose since Cygwin is based on top of the memory manager for Windows. The Cygwin team seems to have successfully ported mmap to use MapViewOfFile. I can think of a variety of reasons why mincore might not be present. For one, as a system call, it might not make sense on Windows. Is msync present and will msync give us the same result? Alternatively, is there a Windows function that does the same thing, and can we call it from Cygwin C? For instance, this function http://msdn.microsoft.com/en-us/library/windows/desktop/aa366902(v=vs.85).aspx looks promising. Other related functions are listed on the same page. The MapViewOfFile function was a close mmap clone, so we might get lucky.

@tavmem
Copy link
Collaborator

tavmem commented Apr 19, 2013

msync is present in Cygwin.
Looks like we may implement 3 solutions (mincore for OSX, msync for Cygwin, mremap for Linux).

tavmem added a commit that referenced this issue Apr 24, 2013
@tavmem
Copy link
Collaborator

tavmem commented May 1, 2013

The above commit 3137f17 demonstrates that for OSX using mincore with mmap and MAP_FIXED is not a successful replacement for mremap.
You can check this yourself by compiling this commit and running /.k_test on OSX.
Although the test result is correct, it is achieved by copying the data.

If the extra space is unmapped, we are counting on mincore to fail with a return code of -1, and an errno set to ENOMEM. In fact, mincore does not fail, it has a return code of 0, and therefore errno is not set to ENOMEM.

tavmem added a commit that referenced this issue May 1, 2013
@tavmem
Copy link
Collaborator

tavmem commented May 1, 2013

posix_mem_offset does not appear to be implemented in Linux, Cygwin or OSX.
It is implemented in the QNX operating system.

@tavmem tavmem closed this as completed May 3, 2013
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants