Decommit unused virtual memory unless overcommit is enabled #255

Closed
glandium opened this Issue Aug 12, 2015 · 19 comments

5 participants
@glandium
Contributor

glandium commented Aug 12, 2015

This is something that I either failed to report or can't find anymore, but I stumbled upon this while going through the jemalloc3-related issues on bugzilla.mozilla.org.

The symptoms are that memory usage is not shrinking after tabs are closed in Firefox. Here is what I wrote back then, after investigating the issue on Windows:

This turns out to be due to the fact that VirtualAlloc's MEM_RESETted pages don't go away from the resident set. In mozjemalloc, we have decommit to take care of that. Replacing MEM_RESET with a sequence of MEM_DECOMMIT/MEM_COMMIT does make the resident count go close to what it is with mozjemalloc, but the private number is still high. My guess is that this is because that memory is committed, which, come to think of it, is not really ideal.

All in all, this just means we do need decommit for windows in the end. And the other RSS regression in bug 1138999, for mac, is likely something similar. We have double purge for that in mozjemalloc.

It's always been a pita that we had two different systems to overcome essentially the same core problem in mozjemalloc, and it's an occasion to make things right in jemalloc. I'll talk with jasone so that we can come up with something that works for both.

Coincidentally, all the current work around generalized commit/decommit hooks can probably help us work around the issue.

Following is the crude patch I wrote back then:

diff --git a/configure.ac b/configure.ac
index 4ac7ac8..b22e8fd 100644
--- a/configure.ac
+++ b/configure.ac
@@ -262,7 +262,7 @@ case "${host}" in
   *-*-darwin* | *-*-ios*)
    CFLAGS="$CFLAGS"
    abi="macho"
-   AC_DEFINE([JEMALLOC_PURGE_MADVISE_FREE], [ ])
+   AC_DEFINE([JEMALLOC_PURGE_USE_MMAP], [ ])
    RPATH=""
    LD_PRELOAD_VAR="DYLD_INSERT_LIBRARIES"
    so="dylib"
diff --git a/include/jemalloc/internal/jemalloc_internal_defs.h.in b/include/jemalloc/internal/jemalloc_internal_defs.h.in
index 191abc5..5682096 100644
--- a/include/jemalloc/internal/jemalloc_internal_defs.h.in
+++ b/include/jemalloc/internal/jemalloc_internal_defs.h.in
@@ -198,12 +198,14 @@
  *   madvise(..., MADV_DONTNEED) : On Linux, this immediately discards pages,
  *                                 such that new pages will be demand-zeroed if
  *                                 the address region is later touched.
- *   madvise(..., MADV_FREE) : On FreeBSD and Darwin, this marks pages as being
- *                             unused, such that they will be discarded rather
- *                             than swapped out.
+ *   madvise(..., MADV_FREE) : On FreeBSD, this marks pages as being unused,
+ *                             such that they will be discarded rather than
+ *                             swapped out.
+ *   mmap(..., MAP_FIXED | ...) : On Darwin, this immediately discards pages.
  */
 #undef JEMALLOC_PURGE_MADVISE_DONTNEED
 #undef JEMALLOC_PURGE_MADVISE_FREE
+#undef JEMALLOC_PURGE_USE_MMAP

 /* Define if operating system has alloca.h header. */
 #undef JEMALLOC_HAS_ALLOCA_H
diff --git a/src/chunk_mmap.c b/src/chunk_mmap.c
index 7e02c10..712d1bb 100644
--- a/src/chunk_mmap.c
+++ b/src/chunk_mmap.c
@@ -119,8 +119,14 @@ pages_purge(void *addr, size_t length)
    bool unzeroed;

 #ifdef _WIN32
-   VirtualAlloc(addr, length, MEM_RESET, PAGE_READWRITE);
-   unzeroed = true;
+   VirtualFree(addr, length, MEM_DECOMMIT);
+   VirtualAlloc(addr, length, MEM_COMMIT, PAGE_READWRITE);
+   unzeroed = false;
+#elif defined(JEMALLOC_PURGE_USE_MMAP)
+   void *new_addr = mmap(addr, length, PROT_READ | PROT_WRITE,
+                         MAP_FIXED | MAP_PRIVATE | MAP_ANON, -1, 0);
+   assert(new_addr == addr);
+   unzeroed = false;
 #elif defined(JEMALLOC_HAVE_MADVISE)
 #  ifdef JEMALLOC_PURGE_MADVISE_DONTNEED
 #    define JEMALLOC_MADV_PURGE MADV_DONTNEED
@thestinger

Contributor

thestinger commented Aug 12, 2015

MEM_RESET / MADV_FREE are designed to perform lazy free, which is significantly faster because the pages are only dropped if there's memory pressure. It means not paying the high price of page faults unless the system actually runs out of memory. Operating systems tend to provide poor measurement tools, so it's difficult to gauge memory usage given memory shared between processes and lazily freed pages. Android actually provides a proper PSS measurement, and I expect that they'll incorporate a sane way of measuring MADV_FREE memory once it lands (it's in linux-next atm).

@thestinger

Contributor

thestinger commented Aug 12, 2015

It definitely shouldn't stop doing lazy free by default, as performance matters more than perception of memory usage (there's no change to actual memory usage).

@thestinger

Contributor

thestinger commented Aug 12, 2015

Using mmap or MEM_COMMIT / MEM_DECOMMIT is also serialized vs. the parallelism of the lazy free mechanisms. On Linux, MADV_DONTNEED is similarly parallel but non-lazy, although there will be MADV_FREE (hopefully) soon too.
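
The non-lazy behaviour of MADV_DONTNEED can be observed directly. The following is a minimal sketch, assuming Linux and a private anonymous mapping; `map_anon`, `purge_eager` and `is_zeroed` are hypothetical helper names, not jemalloc functions. After the call, touching the range again yields demand-zeroed pages.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS and MADV_* on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: map `len` bytes of private anonymous memory.
 * Returns NULL on failure. */
void *map_anon(size_t len) {
    assert(len > 0);
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical helper: eagerly purge a page-aligned range.
 * Assumes Linux semantics, where the pages are discarded immediately.
 * Returns 0 on success, -1 on failure (errno set by madvise). */
int purge_eager(void *addr, size_t len) {
    return madvise(addr, len, MADV_DONTNEED);
}

/* Returns 1 if every byte in the range reads back as zero, else 0. */
int is_zeroed(const void *addr, size_t len) {
    const unsigned char *p = addr;
    for (size_t i = 0; i < len; i++) {
        if (p[i] != 0)
            return 0;
    }
    return 1;
}
```

With MADV_FREE the same check is not meaningful: the old contents may still be readable until the kernel actually reclaims the pages.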

@jasone

Member

jasone commented Aug 12, 2015

This was #206. I don't have a good way to test a Windows-specific decommit patch, but I think the decommit code works correctly as of 1f27abc, so it would probably be simple to develop a patch. I just committed 03bf5b6 so that new chunks can be decommitted (still testing more thoroughly). As far as I know, the only necessary additional change is to implement Windows-specific decommit/commit code in pages_commit_impl().

We're in agreement that nothing needs to change on OS X, right?

@thestinger

Contributor

thestinger commented Aug 12, 2015

@jasone: You could MAP_FIXED with PROT_NONE pages on Linux to test something that's nearly equivalent. I think it'd be useful as an opt-in feature for systems without overcommit and little / no swap as toggling off write access removes commit charge (like MEM_DECOMMIT), although those are very rare (but it'd make jemalloc more suited to some embedded environments). It would catch (nearly) all crashes that'd occur on Windows due to missed re-commit, etc.

@jasone

Member

jasone commented Aug 12, 2015

@thestinger, you mean something like the code I just disabled?

@thestinger

Contributor

thestinger commented Aug 12, 2015

I mean having all three of these functions across platforms:

  • memory_purge
    • Windows: MEM_RESET
    • *nix: MADV_FREE (MADV_DONTNEED if unavailable)
  • memory_decommit
    • Windows: MEM_DECOMMIT
    • *nix: mmap with MAP_FIXED / PROT_NONE
  • memory_commit
    • Windows: MEM_COMMIT
    • *nix: mprotect it back to PROT_READ|PROT_WRITE

It would use memory_purge by default on systems with overcommit and the memory_decommit / memory_commit pair on systems without it. It would be nice to still have the option of lazy free on Windows for systems with lots of swap (to allow lots of commit charge, even though it won't be used).

Toggling on PROT_NONE makes measurements significantly easier + means good support for systems where overcommit is disabled (it's not a given on Linux).
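
The *nix side of the three functions listed above could look roughly like this. It is a sketch only, assuming Linux and a page-aligned range inside an existing private anonymous mapping; the function names come from the list, while `range_is_page_aligned` and the return-code convention are illustrative assumptions rather than jemalloc code.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS and MADV_* on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical precondition check: all three calls work on whole pages. */
int range_is_page_aligned(const void *addr, size_t len) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    return ((uintptr_t)addr % page) == 0 && (len % page) == 0;
}

/* Lazy purge: the range stays usable; the kernel may drop the contents.
 * Returns 0 on success, -1 on failure. */
int memory_purge(void *addr, size_t len) {
    assert(range_is_page_aligned(addr, len));
#ifdef MADV_FREE
    if (madvise(addr, len, MADV_FREE) == 0)
        return 0;
    /* Kernels without MADV_FREE reject it; fall back below. */
#endif
    return madvise(addr, len, MADV_DONTNEED);
}

/* Decommit: replace the range with fresh, inaccessible pages.
 * Any access before memory_commit() faults. */
int memory_decommit(void *addr, size_t len) {
    assert(range_is_page_aligned(addr, len));
    void *p = mmap(addr, len, PROT_NONE,
                   MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == addr ? 0 : -1;
}

/* Commit: make a decommitted range readable and writable again.
 * The pages come back zero-filled. */
int memory_commit(void *addr, size_t len) {
    assert(range_is_page_aligned(addr, len));
    return mprotect(addr, len, PROT_READ | PROT_WRITE);
}
```

A missed `memory_commit` shows up as an immediate fault on the PROT_NONE range, which is what makes this pairing useful for catching re-commit bugs on a non-Windows machine.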

@thestinger

Contributor

thestinger commented Aug 12, 2015

@jasone: Yeah, that looks like it.

@jasone

Member

jasone commented Aug 12, 2015

Cool. In the future I think we should detect whether overcommit is active and do the right thing if not, but right now I'm just trying to get 4.0.0 released.
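
On Linux, one way to detect the overcommit policy is to read `/proc/sys/vm/overcommit_memory`, which proc(5) documents as 0 (heuristic), 1 (always overcommit) or 2 (never overcommit). The sketch below assumes that interface; the enum and function names are hypothetical, and treating only mode 2 as "decommit needed" is an assumption about what "the right thing" would be.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical enum mirroring the values documented in proc(5). */
enum overcommit_mode {
    OVERCOMMIT_UNKNOWN = -1,
    OVERCOMMIT_HEURISTIC = 0,
    OVERCOMMIT_ALWAYS = 1,
    OVERCOMMIT_NEVER = 2
};

/* Parse the text content of overcommit_memory (e.g. "2\n"). */
enum overcommit_mode parse_overcommit_mode(const char *text) {
    int value;
    if (text == NULL || sscanf(text, "%d", &value) != 1)
        return OVERCOMMIT_UNKNOWN;
    if (value < 0 || value > 2)
        return OVERCOMMIT_UNKNOWN;
    return (enum overcommit_mode)value;
}

/* Read the mode from `path`, normally "/proc/sys/vm/overcommit_memory".
 * Returns OVERCOMMIT_UNKNOWN if the file cannot be read or parsed. */
enum overcommit_mode read_overcommit_mode(const char *path) {
    assert(path != NULL);
    char buf[16];
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return OVERCOMMIT_UNKNOWN;
    char *line = fgets(buf, sizeof buf, f);
    fclose(f);
    return parse_overcommit_mode(line);
}

/* Assumed policy: decommit only when the kernel does strict accounting;
 * otherwise keep the cheaper lazy purge. */
int should_decommit(enum overcommit_mode mode) {
    return mode == OVERCOMMIT_NEVER;
}
```

An unknown mode falls back to lazy purging here, which matches the existing default behaviour.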

@thestinger

Contributor

thestinger commented Aug 12, 2015

@jasone: The one other change that'd be useful is passing MAP_NORESERVE to mmap on Linux like glibc does for arenas (#193) (note that the mmap(2) documentation is wrong, proc(5) gets it right). It's a no-op when overcommit is disabled (i.e. proper memory accounting) but works around the dumb heuristics in the default mode. On a system with 4GiB of memory, a process with 2.5GiB of writeable mappings via jemalloc will fail to fork with the default "heuristic" overcommit mode even if it's nearly all unused. It's just supposed to catch obvious mistakes but jemalloc intentionally holds onto lots of VM.
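
A minimal sketch of the MAP_NORESERVE suggestion, assuming Linux; `reserve_noreserve` is a hypothetical helper name and not part of jemalloc. The flag is guarded with `#ifdef` because not every platform defines it.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS and MAP_NORESERVE on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: map `len` bytes of readable and writable anonymous
 * memory, asking the kernel not to reserve backing store up front.
 * Returns NULL on failure. */
void *reserve_noreserve(size_t len) {
    assert(len > 0);
    int flags = MAP_PRIVATE | MAP_ANONYMOUS;
#ifdef MAP_NORESERVE
    flags |= MAP_NORESERVE;
#endif
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, flags, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

The mapping is used exactly like any other anonymous mapping; only the kernel's accounting of it is intended to differ.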

There's no rush to get this worked out though since it has always been an issue.

@glandium

Contributor

glandium commented Aug 25, 2015

FWIW, enabling the pages_commit/decommit code has some serious performance impact on the various Firefox benchmarks.

@rustyx

Contributor

rustyx commented Jan 26, 2016

Would it be possible to add a "high watermark" parameter in order to limit the amount of unused pages? Or create some sort of "garbage collection" function?

The amount of mapped pages just keeps on growing.

Peak usage:
Allocated: 14797180656, active: 14933364736, mapped: 15506341888

In the beginning the mapped memory does go down:
Allocated: 8635352528, active: 8841318400, mapped: 9743368192

But after a while:
Allocated: 8508467016, active: 9471238144, mapped: 16479420416

@thestinger

Contributor

thestinger commented Jan 26, 2016

It can only unmap chunks (4M naturally aligned regions), so they need to be entirely empty for that to happen. It's not really relevant on 64-bit. Virtual memory is a per-process resource and there's plenty of it to spare. Purging happens in multiples of the page size so it's much finer grained. This issue was about supporting Windows-style decommit where the memory is immediately dropped and marked as unusable rather than the significantly more efficient lazy purging (MADV_FREE / MEM_RESET). The only reasons to fully decommit are to make memory usage easier to measure (it doesn't actually drop the memory usage) and to reduce the accountable memory for systems without overcommit (if it's usable and writeable, it counts towards the memory limit, but simply being mapped does not).
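
The difference in granularity can be illustrated with a little address arithmetic. This sketch assumes the 4 MiB chunk size mentioned above as a fixed constant; `CHUNK_SIZE` and the helper names are illustrative, not jemalloc identifiers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed chunk size: 4 MiB, naturally aligned. */
#define CHUNK_SIZE ((uintptr_t)4 << 20)

/* Base address of the chunk containing `addr`. */
uintptr_t chunk_base(uintptr_t addr) {
    return addr & ~(CHUNK_SIZE - 1);
}

/* Byte offset of `addr` within its chunk. */
size_t chunk_offset(uintptr_t addr) {
    return (size_t)(addr & (CHUNK_SIZE - 1));
}

/* Number of independently purgeable pages in one chunk. */
size_t pages_in_chunk(size_t page_size) {
    assert(page_size > 0 && CHUNK_SIZE % page_size == 0);
    return (size_t)(CHUNK_SIZE / page_size);
}
```

With 4 KiB pages, a single chunk holds 1024 pages, so one live allocation is enough to keep the whole chunk mapped while the other pages in it can still be purged.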

@thestinger

Contributor

thestinger commented Jan 26, 2016

The mapped memory will definitely tend to increase over time due to fragmentation. It's only a measurement of virtual memory though. It's not a measurement of memory usage. Even if you're worried about virtual memory exhaustion on 32-bit, the amount of mapped memory isn't all that useful since the location of the mappings matters even more due to fragmentation of the address space. The fragmentation is lowest if it never unmaps but then it never drops from the peak usage level (not an issue in normal programs).

@rustyx

Contributor

rustyx commented Jan 26, 2016

OK, clear. BTW the new feature does seem to work: the "Working set" now reflects the "active" memory much better (before, it was the same as "Commit size"). But this version (MEM_DECOMMIT) seems on average about 8% slower than before (with MEM_RESET).
I wonder if it's possible to not decommit right away, but to batch the calls (especially given that MEM_DECOMMIT can decommit multiple pages at once).

@thestinger

Contributor

thestinger commented Jan 26, 2016

It does perform calls on ranges as large as possible. Using MEM_RESET will be inherently faster because it marks the memory to be freed only if there's memory pressure. If there isn't memory pressure, the OS doesn't need to drop it. It also still counts towards the overall memory limit whether or not it's actually used since Windows doesn't do overcommit. That's essentially an arbitrary limit though since it can be raised incredibly high as long as there's enough storage for backing the memory reservations.

@rustyx

Contributor

rustyx commented Jan 26, 2016

Disagree with the last statement. The limit can be (and is for us) quite well-defined, as our policy enforces a fixed page file of 60% of the amount of RAM. That results in an OOM when I just keep on doing MEM_RESET because the overall commit is quickly exceeded. Not to mention that Windows uses uncommitted memory for disk caching, meaning that MEM_RESET (in comparison with decommit) is essentially robbing Windows of disk cache, because it never swaps reset pages out to disk so long as there is enough RAM for all apps.

@jasone jasone changed the title from "Memory is not decommitted on Windows and maybe Mac" to "Decommit unused virtual memory unless overcommit is enabled" Mar 15, 2016

@jasone jasone added this to the 4.2.0 milestone Mar 15, 2016

@cpeterso

Contributor

cpeterso commented Mar 29, 2016

@jasone

Member

jasone commented May 6, 2016

This issue is a subset of #193; closing.
