SPL: MMT it's been an hour since the last reap #12

Closed
ylluminate opened this issue Jul 27, 2016 · 26 comments

@ylluminate commented Jul 27, 2016

I have been noticing that my ZFS volume has become somewhat unresponsive, and upon rebooting I received the following messages:

SPL: MMT it's been an hour since the last reap, vm_page_free_count == 12716231
SPL: reap thread, segkmem_total_mem_allocated delta 846462976 since 3596 seconds ago

I'm a little concerned because, even after rebooting, I am still running into hangs in the middle of copying some files. I'm using the most recent source from master at present.

@ylluminate (Author) commented Jul 27, 2016

I stepped back to the 1.5.2 release and am still seeing the same kind of hangs, although it has not been long enough to tell whether the initial message above is reported on reboot.

@brendonhumphrey (Contributor) commented Jul 28, 2016

Those are harmless messages.

@brendonhumphrey (Contributor) commented Jul 28, 2016

Also, did you pick up the SPL change that was made ~24 hours ago?

@ylluminate (Author) commented Jul 28, 2016

No, I've not grabbed those. Let me take a look. I went back to the older driver in hopes things would improve, but to no avail. What concerned me was that it wasn't releasing memory even after long periods of relative inactivity. I should probably make some kind of shell script that forces it to feel the pressure and release...
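
A rough sketch of what such a pressure-forcing script might look like, assuming the stock macOS memory_pressure(1) tool supports simulated pressure via -S -l and that the SPL reaps in response; the flags and the overall approach are assumptions, not something verified in this thread:

#!/bin/bash
# Hypothetical sketch, untested: nudge the kernel into signalling memory pressure
# and see whether the SPL allocation shrinks. Interrupt memory_pressure with Ctrl-C.
export PATH=/usr/bin:/bin:/sbin:/usr/sbin

echo "SPL allocation before:"
sysctl kstat.spl.misc.spl_misc.os_mem_alloc   # total memory allocated by the SPL

memory_pressure -S -l warn                    # assumed: simulate "warn"-level pressure

echo "SPL allocation after:"
sysctl kstat.spl.misc.spl_misc.os_mem_alloc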

@brendonhumphrey (Contributor) commented Jul 29, 2016

I see. “An hour since last reap” is behaving normally. That is not changing for now. It's just idle chitchat in the logs.

Brendon

@brendonhumphrey (Contributor) commented Jul 30, 2016

If you're keen, try the code in the SPL branch 'vmem_experiments'; we would like some feedback on how it works for you.

@ylluminate (Author) commented Jul 30, 2016

Absolutely. Can you tell me what to expect and what the goal is? From the look of the commits it seems fairly stable, so it's not likely to produce panics, is it?

@ilovezfs (Contributor) commented Jul 30, 2016

You should expect 🌈 💐 ☀️

@ylluminate (Author) commented Jul 30, 2016

/me switches to :godmode:

HERE WE GO!

@brendonhumphrey (Contributor) commented Jul 30, 2016

I've been running it for a month without a reboot and it seems OK. It just alters some allocator details and maybe works better; I don't want to cloud your judgement by making claims. Do whatever you did to create your performance stalls, or whatever they were.

@brendonhumphrey (Contributor) commented Jul 31, 2016

Did you try it?

@ylluminate (Author) commented Jul 31, 2016

I've got a heavy copy process going on it right now that should have triggered the issue I was having earlier. I've cranked it up to 6 rsyncs going full bore, and while that has maxed out my memory usage, my system is still semi-usable. I have two different volumes copying via these multiple threads onto the raid-z. So far so good... I wish I could cut down the memory consumption further, though. Like I said, it's consuming everything and using swap at the moment on a 64 GB system (htop shows kernel_task at 103 GB virt).

@brendonhumphrey (Contributor) commented Jul 31, 2016

OK, sounds promising. If you want to limit memory use, why don't you set the arc max?

@ylluminate (Author) commented Jul 31, 2016

Well, that's what's confusing, @brendonhumphrey: I do have kstat.zfs.darwin.tunable.zfs_arc_max=8589934592 set in /etc/zfs/zsysctl.conf, yet that still happened.

@brendonhumphrey (Contributor) commented Aug 1, 2016

That is a similar setting to what I have, and the machine is obediently keeping memory use in check. You haven't got any other settings in zsysctl.conf, have you? I accidentally inherited something from ilovezfs' example script that blocked zfs_arc_max from working. I'm not near the machine, so I can't quote the exact setting.

You could also do sysctl -a | grep spl. One of the sysctls (there are only a few, so it should be obvious which one) tells how much memory we have allocated in total. You can use that to check that it is o3x that is occupying the memory.

@ylluminate (Author) commented Aug 1, 2016

Hmm, that's the only line in zsysctl.conf, and yes, I do use @ilovezfs's script, so I must have also inherited that setting somewhere... Whatever is doing that is not readily popping out at me from the settings, so I'll hang tight until you can pull it up...

@brendonhumphrey (Contributor) commented Aug 1, 2016

OK, you really don't want this setting (assuming you pulled it in from the .example file):

10 Mar 2015; ilovezfs
Raise zfs_arc_meta_limit to 3/4 (instead of 1/4) of zfs_arc_max.
3/4 * (11 * 2^30) = 8,858,370,048
But let's use hexadecimal this time.
8,858,370,048 = 0x210000000
kstat.zfs.darwin.tunable.zfs_arc_meta_limit=0x210000000

Also, kstat.zfs.misc.arcstats.c_max is the actual arc max, i.e. check it after setting the tunable.
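
For example, a sketch using only values already quoted in this thread (run as root), to re-apply the arc max by hand and confirm it took effect:

sudo sysctl -w kstat.zfs.darwin.tunable.zfs_arc_max=8589934592   # 8 GiB, the value from zsysctl.conf above
sysctl kstat.zfs.misc.arcstats.c_max                             # expect 8589934592 if the tunable took effect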

@ylluminate (Author) commented Aug 1, 2016

Interesting, so essentially I should use:
kstat.zfs.darwin.tunable.zfs_arc_meta_limit=0x210000000
instead of:
kstat.zfs.darwin.tunable.zfs_arc_max=8589934592

Or should BOTH be in there? What's the purpose of this new setting in comparison with the old wrt this "meta" vs actual limit?

@brendonhumphrey (Contributor) commented Aug 1, 2016

Ah no, sorry if I wasn't clear. If you have
kstat.zfs.darwin.tunable.zfs_arc_meta_limit=0x210000000
then
kstat.zfs.darwin.tunable.zfs_arc_max=8589934592 won't work.

You can verify what arc max is after setting the tunable by checking kstat.zfs.misc.arcstats.c_max.

Also, the total memory in use is kstat.spl.misc.spl_misc.os_mem_alloc.
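
As a concrete check, using the two sysctls named above:

sysctl kstat.zfs.misc.arcstats.c_max              # the effective ARC maximum
sysctl kstat.spl.misc.spl_misc.os_mem_alloc       # total memory currently allocated by the SPL

Comparing os_mem_alloc against kernel_task's footprint shows whether it really is o3x occupying the memory.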

@ylluminate (Author) commented Aug 1, 2016

Alright, so at present I only have kstat.zfs.darwin.tunable.zfs_arc_max=8589934592, so I am removing that and inserting this kstat.zfs.darwin.tunable.zfs_arc_meta_limit=0x210000000 and will monitor the levels. This will be the only line in zsysctl.conf. Will keep you posted.

@ylluminate (Author) commented Aug 1, 2016

I guess that last setting must have been the wrong move since the kernel_task memory levels are already going well past the threshold very rapidly. Already up to ~20GB after just running rsync for a few minutes.

@rottegift (Contributor) commented Aug 1, 2016

Try this script.

DON'T adjust to taste, don't mess around. Tell us the results of running EXACTLY this as superuser as soon as you can after boot time, assuming your recent problematic workload eventually causes problems again.

#!/bin/bash

export PATH=/usr/bin:/bin:/sbin:/usr/sbin

sysctl -w kstat.zfs.darwin.tunable.zfs_arc_min=$((512*1024**2+1*1024**3))
sysctl -w kstat.zfs.darwin.tunable.zfs_arc_meta_min=$((256*1024**2+1*1024**3))
sysctl -w kstat.zfs.darwin.tunable.zfs_arc_meta_limit=$((5*1000**3))
sysctl -w kstat.zfs.darwin.tunable.zfs_arc_max=$((11*1024**3))
sysctl -w kstat.zfs.darwin.tunable.zfs_dirty_data_max=$((512*1024**2))

IF you see the same problem after running this as superuser near boot-time, then paste in the exact output of

sysctl kstat.spl kstat.vmem.vmem.kmem_va.mem_inuse kstat.vmem.vmem.kmem_default.mem_inuse kstat.vmem.vmem.kmem_oversize.mem_inuse kstat.vmem.vmem.heap.mem_inuse; sysctl vm | egrep '^vm.page_(free_(wanted|count)|speculative|purgable_count)'

(and surround the result in a pair of lines containing three backticks for the sake of readability).

@ylluminate (Author) commented Aug 1, 2016

So, FYI, I've done the following:

Created /etc/zfs/zsysctl.sh (made executable with sudo chmod +x) with these contents:

#!/bin/bash

export PATH=/usr/bin:/bin:/sbin:/usr/sbin

sysctl -w kstat.zfs.darwin.tunable.zfs_arc_min=$((512*1024**2+1*1024**3))
sysctl -w kstat.zfs.darwin.tunable.zfs_arc_meta_min=$((256*1024**2+1*1024**3))
sysctl -w kstat.zfs.darwin.tunable.zfs_arc_meta_limit=$((5*1000**3))
sysctl -w kstat.zfs.darwin.tunable.zfs_arc_max=$((11*1024**3))
sysctl -w kstat.zfs.darwin.tunable.zfs_dirty_data_max=$((512*1024**2))

And created /Library/LaunchDaemons/com.openzfsonosx.zsysctlupdate.plist with contents:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
    <dict>
        <key>Label</key>
        <string>com.openzfsonosx.zsysctlupdate</string>
        <key>ProgramArguments</key>
        <array>
            <string>/etc/zfs/zsysctl.sh</string>
            <string>/etc/zfs</string>
        </array>
        <key>RunAtLoad</key>
        <true/>
        <key>KeepAlive</key>
        <false/>
    </dict>
</plist>
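
A hedged sketch of how one might activate that daemon without rebooting; the launchctl invocation and the root:wheel/644 requirement for LaunchDaemons are general launchd assumptions, not something stated in this thread:

# Hypothetical activation steps for the plist above (sketch only).
sudo chown root:wheel /Library/LaunchDaemons/com.openzfsonosx.zsysctlupdate.plist
sudo chmod 644 /Library/LaunchDaemons/com.openzfsonosx.zsysctlupdate.plist
sudo launchctl load -w /Library/LaunchDaemons/com.openzfsonosx.zsysctlupdate.plist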

@rottegift (Contributor) commented Aug 1, 2016

Well, you could just have run it by hand (sudo scriptname) when you get a login prompt, but it should (should!) work when executed early by launchd.

Make sure it does run via launchd; you can use the StandardErrorPath and StandardOutPath keys in your plist, and also grep SPL /var/log/system.log and check the output of "sysctl kstat.zfs.darwin.tunable" to make sure the values reflect the script.

Unfortunately if you're using launchd you may have a race with org.openzfsonosx.zconfigd.

(It's perfectly fine to run it by hand before there are more than 2GB in your ARC, and certainly before you start running your problem-reproducing workload. Running it by hand gives you confidence that the tunable values have been updated by the script.)
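
For instance, that hand-run-and-verify sequence, using only the commands mentioned above:

sudo /etc/zfs/zsysctl.sh                 # run the tunables script by hand
sysctl kstat.zfs.darwin.tunable          # confirm the values reflect the script
grep SPL /var/log/system.log             # check the SPL log lines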

@ylluminate (Author) commented Aug 1, 2016

Yes, understood. So far it's staying a lot more sane...

@rottegift (Contributor) commented Aug 1, 2016

If it's making a difference this soon with a workload like "6 rsyncs going full bore" -- and on the assumption that each rsync is writing to a different dataset or zvol -- then the issue is probably that you're generating an enormous amount of dirty data (limited by the smallest of 80% of system memory; or ~10% maximum per active target dataset; or 4 GiB per target dataset or zvol) which takes time to drain, during which your reads into ARC will stall, and during which the ARC cannot shrink back based on userland pressure.

The "smallarc" script I gave you leads to a maximum of 512 MiB of dirty data in the ARC per target dataset or zvol summing to a maximum of about 10GiB total. You can even consider dialling down kstat.zfs.darwin.tunable.zfs_dirty_data_max even further (e.g. to 128 MiB) if your write loads will always be throughput-sensitive rather than latency-sensitive.

(However, if you have only one target zvol (or dataset) for all six rsyncs, it could be that something else is going on in addition to huge amounts of dirty data.)
