zfs seems to use more memory than it should #5035
@haasn
A small update: after setting the ARC size limit to 8 GiB max, I've done a reboot. I'm now running a clean system with virtually no memory consumption from programs (about 700 MiB from the browser I'm typing this in and basically nothing else), and I've read a bunch of data from disk. After doing this, the ARC reports its current usage as 6 GiB (75% of the 8 GiB max), yet free -m considers my total used memory to be 10 GiB (32% used). Again I have about a 4 GiB deficit between what zfs claims to consume and what it actually consumes.

Note: as an experiment, I tried removing my L2ARC devices, because I read that zfs needs to store tables of some sort to support them. However, this did not affect memory usage at all. (Unless I need to reboot for the change to take effect?)

You can find my current arc_summary.py output here: https://0x0.st/MFJ.txt

This is my current /proc/buddyinfo:

[…]
Also worth noting is that I have two pools imported, although the second pool has […]. After running […]:

(To work around this temporarily, and since the number seems to be fairly constant, I'm going to subtract 4 GiB from my normal zfs_arc_max.)

P.S. I forgot to mention, I am on kernel 4.7.2 and spl/zfs git master.
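For reference, an ARC cap like the 8 GiB one described above is normally applied through the zfs_arc_max module parameter; a minimal sketch, assuming a stock ZFS-on-Linux install that exposes the parameter under /sys/module/zfs/parameters:

```sh
# Apply an 8 GiB ARC cap at runtime (the ARC shrinks back lazily, so it may
# take memory pressure or a cache drop before free(1) reflects the change)
echo $((8 * 1024 * 1024 * 1024)) > /sys/module/zfs/parameters/zfs_arc_max

# Make the cap persistent across reboots via the module options
cat > /etc/modprobe.d/zfs.conf <<'EOF'
options zfs zfs_arc_max=8589934592
EOF
```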
I decided to re-investigate this after fixing #5036 to eliminate that as the cause. Additionally, I am now testing on a stock kernel (not hardened) to eliminate more potential issues.

Long story short: the problem persists, and the difference between the actual and observed RAM is again almost exactly 4 GiB. (1.7 GiB is the total sum of all resident+shared memory currently in use, 6.91 GiB is what ARC reports → totals to 8.61 GiB, but […])
It seems like this memory usage is slowly growing over time, while the node 0 memory fragmentation also grew (according to /proc/buddyinfo). I'm slightly suspecting that there may be some sort of fragmentation-inducing memory leak in some SPL/ZFS component on my machine, since I haven't had these problems while running btrfs on the same hardware, same kernel version, and doing the same things.
Further update: I had a look through […]. With this extra data accounted for, I'm only “missing” about 2G currently, which probably has some other similar explanation.

It seems I was under the misguided assumption that ZFS would only use about as much RAM as I had configured for the ARC. Is it normal for ZFS to have several extra GB of slabs allocated for other purposes?

Perhaps worth noting is that I am using SLAB instead of SLUB as my slab allocator, because I have previously observed it performing better for me under certain workloads, but it might be worth re-evaluating that assumption for ZFS.

Edit: Unfortunately, it seems like the […]
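A rough way to do this kind of slab accounting is to sum the ZFS-related caches in /proc/slabinfo and then look at the SPL-managed slabs separately; a sketch, assuming the usual 0.6.x/0.7.x cache names (the list below is illustrative, not exhaustive) and root access to /proc/slabinfo:

```sh
# Approximate memory held by ZFS-related kernel slab caches
# (/proc/slabinfo columns: name, active_objs, num_objs, objsize, ...)
awk '/zio_buf|zio_data_buf|dnode_t|dmu_buf_impl_t|arc_buf_hdr_t|zfs_znode_cache|sa_cache/ { bytes += $3 * $4 }
     END { printf "ZFS-related kernel slabs: %.2f GiB\n", bytes / 2^30 }' /proc/slabinfo

# SPL keeps its own accounting for the caches it manages itself
head -n 25 /proc/spl/kmem/slab
```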
Referencing:

https://utcc.utoronto.ca/~cks/space/blog/linux/ZFSonLinuxMemoryWhere (“Where ZFS on Linux's memory usage goes and gets reported (I think)”)

https://groups.google.com/a/zfsonlinux.org/forum/#!topic/zfs-discuss/tXHQPBE6uHg
Interesting, that first article in particular pretty much completely answers all of the doubts and questions I had, and also helps me understand why ZFS is causing me so many out-of-memory style conditions when I had other programs running at the same time (i.e. I can't entirely dedicate my RAM to ARC and SPL slab objects like the defaults seem to be tuned for). I'm considering this issue resolved unless I run into more troubles. Thanks for the pointers.
For reference, PR #5009, which is actively being worked on, is a big step towards addressing these issues.
@haasn take a look at http://list.zfsonlinux.org/pipermail/zfs-discuss/2013-November/012336.html. Disabling transparent hugepages might also lessen memory fragmentation and consumption.
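For anyone who wants to try that suggestion, transparent hugepages can be toggled at runtime through sysfs; a sketch, assuming the distribution hasn't locked the setting at boot (in which case transparent_hugepage=never on the kernel command line is the alternative):

```sh
# Show the current settings (the value in brackets is the active one)
cat /sys/kernel/mm/transparent_hugepage/enabled /sys/kernel/mm/transparent_hugepage/defrag

# Disable THP and its background defragmentation for the running system
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
```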
@kernelOfTruth I gave it a try. (I also got around to setting up monitoring/graphs so I could observe this over time.) I tried out your suggestion by using […].

Just now (at around 4:00 local time) I saw this graph and decided to re-enable […].

As you can see, memory fragmentation immediately went down rather dramatically, at least for smaller fragments, despite practically no change in the amount of consumed memory (nor the distribution of memory). The number of free pages for large chunks still seems to be rather low compared to a fresh boot, but it's still higher than it was before […]
Update: this seems to have been a fluke caused by switching the setting more than anything. Not an hour after enabling hugepages again, memory fragmentation has gotten even worse than before (free page count dropped to basically nothing). If I had to guess, I think what I'm seeing would be explained by changes to this variable only taking effect for new allocations, rather than existing ones, and enabling defragmentation caused a spike in the available free pages due to defragmenting all of the existing ones.

Edit: That being said, I disabled it again and not much later my free page count has skyrocketed again, so now I'm not really sure. These values are probably pretty unreliable at the moment either way since I'm rsyncing some data off old disks. I'll comment again when I can provide more concrete data.

Edit 2: Confirmed that it was the rsync causing heavy memory pressure which increased my fragmentation; I cross-confirmed this by looking at the overall memory usage and noticing that nearly all available memory was being used for internal caches. It seems that “memory fragmentation” graph really only considers free memory, rather than available memory. (Which is somewhat odd IMO, but oh well.) Everything's fine again now.
It seems this problem won't leave me alone. On my machine right now, 75% of my RAM is being used (it was at 90% before I reduced my ARC size). The current ARC size is 8 GiB (25%), about 3 GiB is applications, and another 3 GiB is in tmpfs. This memory (which I can directly account for) adds up to 14 GiB (43%), leaving behind 10 GiB of memory in use by zfs's various slabs.

Any tips on how I can track down why exactly they are being used, and ideally, limit them? I wonder what would have happened if I had less available RAM to begin with. Would zfs have exploded, or would it have self-limited? If the latter, can I do this manually?
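One way to answer the “what are the slabs doing” question is to look at the biggest kernel slab caches directly, alongside the SPL view shown earlier; a sketch, assuming slabtop (from procps) is installed:

```sh
# Kernel slab caches, one-shot, sorted by total cache size
slabtop -o -s c | head -n 25
```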
@haasn is it possible for you to test out current master? I've pre-set 10 GB for ARC, but the following is currently used while transferring 2.5 TB of data from one ZFS pool to another (via rsync):

[…]
Meaning, it hovered between 2.5 and 3.7 GB. When adding the other memory consumption it is probably still less than 10 GB, so it means either the compressed ARC makes usage really efficient and/or it is now also able to stick way more exactly to the preset limits.

Since you mentioned rsync: linking https://github.com/Feh/nocache and http://insights.oetiker.ch/linux/fadvise/ here.
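As a usage note on the nocache suggestion: it is an LD_PRELOAD wrapper, so it just prefixes the command whose page-cache footprint you want to limit; the paths below are placeholders:

```sh
# rsync a large tree without letting it evict everything else from the page cache
nocache rsync -a /tank/source/ /backup/target/
```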
@kernelOfTruth I can upgrade. Right now I'm on commit 9907cc1 (and openzfs/spl@aeb9baa); is there a commit in particular that you think will help?

One thing I noticed while inspecting […]

Edit: I just realized, spl_kmem_cache_slab_limit is a lower limit, not an upper limit.
@haasn the changes since September 7th, specifically spl […], basically tag 0.7.0-rc1 (September 7th). Make sure to have recent backups of your data (just in case, which is actually always a good idea and practice).
As an example, current arc stats (after several hundred GB of data transferred):

[…]

plus

[…]

which adds around 2-3 GB to the 3 GB already accounted for in arcstats. That comes to close to 6 GB, still significantly lower than 10 GB (the set limit).
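For anyone wanting to reproduce this accounting, the numbers come from the ARC kstats; a small sketch that prints the headline counters in GiB (compressed_size/uncompressed_size only exist on builds with compressed ARC, i.e. recent master/0.7.0-rc):

```sh
# Third column of each kstat row is the value in bytes
awk '$1 ~ /^(size|c_max|arc_meta_used|compressed_size|uncompressed_size)$/ { printf "%-18s %7.2f GiB\n", $1, $3 / 2^30 }' /proc/spl/kstat/zfs/arcstats
```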
@haasn it seems that the conservative memory usage after the compressed ARC patches was a regression rather than a fundamental change in behavior: #5128 (Poor cache performance).

So the only solution right now is to e.g. set the ARC size to approx. 40% when you want it to occupy, for example, 50% of your RAM.
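A sketch of that rule of thumb, computing 40% of physical RAM and applying it as the cap (the 40% figure is just the number from the comment above, not a recommendation from the ZFS docs):

```sh
# Cap the ARC at ~40% of RAM so that actual usage ends up nearer 50%
total_bytes=$(awk '/^MemTotal:/ { printf "%.0f", $2 * 1024 }' /proc/meminfo)
echo $(( total_bytes * 40 / 100 )) > /sys/module/zfs/parameters/zfs_arc_max
```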
According to this graph, which does not seem to be displaying my ARC size (16 GB at the time of writing), I have 9-10 GB of memory spent on unreclaimable slab objects. What are these 10 GB currently doing? Is there any way I can introspect this figure further?
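The unreclaimable-slab figure itself comes straight from /proc/meminfo, which is a quick first check before digging into the per-cache views in /proc/slabinfo or /proc/spl/kmem/slab:

```sh
# Total slab memory and its reclaimable/unreclaimable split
grep -E '^(Slab|SReclaimable|SUnreclaim):' /proc/meminfo
```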
Hi @haasn, we encountered the same problem with several servers. Did you manage to find a solution?
@meteozond here is my solution to the problem: http://brockmann-consult.de/peter2/bc-zfsonlinux-memorymanagement2.tgz

It will loop forever and keep tuning and dropping caches if used RAM gets too high. Both of my large 36-disk ZoL machines hang if I don't run this. I wrote this in bash years ago and recently redid it in python3 to fix the float handling and exceptions.
@meteozond @haasn any update on this problem? I use […]

When I run:

[…]

@petermaloney is your script doing anything fancier than that?

PS: I'm on Ubuntu 16.04 + zfsutils-linux 0.6.5.6-0ubuntu17
@lechup The source is there for you to read what it does (not sure if you need to know python to understand it). The script is fancier than that, yes. What it does is constantly manage the […].

I found that setting those module parameters one time doesn't work well... a low value might still end up using all your RAM, or a not-so-high value might not use enough RAM for best performance. Or a value that works well sometimes might not work well other times. So this script keeps your free RAM around 10%. The machine I originally wrote it for was very slow if I didn't use enough RAM, so it was very important to use lots when available, so this strategy was very effective.

The script will also use […]
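For readers who just want the shape of that strategy, here is a minimal bash sketch, not petermaloney's script (that is at the tarball link above): it watches MemAvailable, shrinks the ARC cap when free memory falls under roughly 10%, and drops reclaimable caches as a fallback. The interval, step size and floor are arbitrary placeholders.

```sh
#!/bin/bash
# Minimal sketch of a "keep ~10% RAM free" ARC tuner; NOT the linked script.
arc_max_file=/sys/module/zfs/parameters/zfs_arc_max
step=$(( 1024 * 1024 * 1024 ))        # shrink the cap 1 GiB at a time (placeholder)
floor=$(( 2 * 1024 * 1024 * 1024 ))   # never shrink below 2 GiB (placeholder)

while sleep 30; do
    total=$(awk '/^MemTotal:/ { printf "%.0f", $2 * 1024 }' /proc/meminfo)
    avail=$(awk '/^MemAvailable:/ { printf "%.0f", $2 * 1024 }' /proc/meminfo)
    if (( avail * 100 / total < 10 )); then
        cur=$(cat "$arc_max_file")
        (( cur == 0 )) && cur=$(( total / 2 ))  # 0 means "module default", assume roughly half of RAM
        new=$(( cur - step ))
        (( new < floor )) && new=$floor
        echo "$new" > "$arc_max_file"
        # Last resort: ask the kernel to reclaim dentries/inodes and other reclaimable slabs
        echo 2 > /proc/sys/vm/drop_caches
    fi
done
```

The actual script also manages the metadata limit and other parameters (see the tarball above); this shows only the core loop.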
@petermaloney thanks for sharing your code and explaining how it works; I'll give it a shot!
@petermaloney your link is dead. Do you have it on GitHub somewhere?
@gdevenyi the path changed slightly: https://www.brockmann-consult.de/peter2/zfs/bc-zfsonlinux-memorymanagement2.tgz

BTW, there's a hang bug in the zfs version used on Ubuntu 16.04, which I think might not get triggered if you lower the meta limit (my script set it very generously)... (see 25458cb). So to maybe prevent that (still testing...), patch the script like:

-meta_limit = int((limit_gb-2)*1024*1024*1024)
Thanks @petermaloney. I'm still on 0.6.5.11. I just started having this exploding memory usage and OOMs on a fileserver that had been running fine for years. Right now I'm experiencing huge zfs slabs for unknown reasons. The only thing that's saved me is greatly relaxing arc_min so that it can shrink. Interestingly, your script does absolutely nothing for me. I guess I'll wait another month to see if the 0.7.x series is finally stable, since new features are only being added to 0.8.x now.
I'm running into many memory-related issues since switching to ZFS, including instances where the OOM killer triggered despite plenty of free memory being available, and instances where programs fail due to out-of-memory conditions.

For an example of a failure, when trying to recreate my initramfs:

[…]
I've experienced this failure multiple times already. In every case, reducing the ARC size (i.e. temporarily reducing zfs_arc_max) solves it, as does e.g. echo 2 > /proc/sys/vm/drop_caches.

At the time of this failure, my system reported about 80% memory being in use, and after the echo 2 command described above it went down to about 60%.

The weird thing is that I can't explain this high memory usage. This is my current output of arc_summary.py: https://0x0.st/MFr.txt

As you can see, ARC claims to be using about 11 GiB. Tallying together the top memory-hungry processes in top gives me about 3 GiB at most. There are no significant amounts of data in tmpfs either.

Together, this means that my system should be consuming 11+3 = 14 GiB of memory, meaning my usage should be 14/32 ≈ 43%, rather than 60%. Why does free -m report almost 19 GiB used? Where are the missing 5 GiB being accounted for? I've never had this weird issue before ZFS, nor have I ever “run out” of memory before ZFS.

I'm considering drastically reducing the ARC size as a temporary measure, at least until this issue can be tracked down and fixed, wherever it comes from.
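For the kind of tallying done above, a quick sketch of how process, tmpfs and overall usage can be compared (RSS double-counts shared pages, so the first number is an upper bound):

```sh
# Total resident memory of all processes, in GiB
ps -eo rss= | awk '{ sum += $1 } END { printf "process RSS total: %.2f GiB\n", sum / 1024 / 1024 }'

# tmpfs consumption and the overall picture free(1) reports
df -h -t tmpfs
free -m
```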