
SPL: MMT it's been an hour since the last reap #12

Closed
ylluminate opened this issue Jul 27, 2016 · 26 comments

Comments

@ylluminate

I have been noticing that my ZFS volume has become somewhat unresponsive, and upon rebooting I received the following messages:

SPL: MMT it's been an hour since the last reap, vm_page_free_count == 12716231
SPL: reap thread, segkmem_total_mem_allocated delta 846462976 since 3596 seconds ago

I'm a little concerned because even after rebooting I'm still running into some hangs in the middle of copying files. I'm using the most recent source from master at present.

@ylluminate
Author

I stepped back to the 1.5.2 release and am still seeing the same kind of hangs, although it has not been long enough to see whether the initial message above is reported on reboot.

@brendonhumphrey
Contributor

Those are harmless messages.

@brendonhumphrey
Contributor

Also, did you pick up the SPL change that was made ~24 hrs ago?

@ylluminate
Author

No, I've not grabbed those. Let me take a look. I went back to the older driver in hopes that things would improve, but to no avail. What concerned me was the fact that it wasn't releasing memory after long periods of time and some degree of inactivity. I probably should make some kind of a shell script that forces it to feel the pressure and release...
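
A minimal sketch of such a pressure-forcing script, assuming the stock memory_pressure(1) utility that ships with OS X 10.9 and later; the level flag and whether it needs root should be verified against the local man page before relying on it:

#!/bin/bash
# Sketch only: apply artificial memory pressure so the SPL/ARC has a reason to release memory.
# memory_pressure(1) is a stock OS X tool; check `man memory_pressure` for the levels your release supports.
memory_pressure -l warn
# Then check how much memory O3X is still holding in total:
sysctl kstat.spl.misc.spl_misc.os_mem_alloc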

@brendonhumphrey
Contributor

brendonhumphrey commented Jul 29, 2016

I see. “An hour since last reap” is normal behaviour; that is not changing for now. It's just idle chitchat in the logs.

Brendon

@brendonhumphrey
Contributor

If you're keen, try the code in the SPL branch 'vmem_experiments'; we would like some feedback on how that works for you.

@ylluminate
Author

Absolutely. Can you tell me what to expect and what the goal is? From the look of the commits it seems fairly stable, so I don't think it's likely to produce panics, is it?

@ilovezfs
Contributor

You should expect 🌈 💐 ☀️

@ylluminate
Author

/me switches to :godmode:

HERE WE GO!

@brendonhumphrey
Contributor

brendonhumphrey commented Jul 30, 2016

I've been running it for a month without a reboot, and it seems OK. It just alters some allocator details and maybe works better. I don't want to cloud your judgement by making claims. Do whatever you did to create your performance stalls, or whatever they were.

@brendonhumphrey
Contributor

Did you try it?

@ylluminate
Author

ylluminate commented Jul 31, 2016

I've actually got a heavy copy process going on right now that should have triggered the issue I was having earlier. I've cranked it up and have 6 rsyncs going full bore, and while it has maxed out my memory usage, my system is still semi-usable. I have two different volumes copying via these multiple threads onto the RAID-Z. So yeah, so far so good... I wish I could cut down the memory consumption further, though. Like I said, it's consuming everything and using swap at the moment on a 64 GB system (htop is showing kernel_task at 103 GB virt).

@brendonhumphrey
Contributor

OK, sounds promising. If you want to limit memory use, why don't you set the ARC max?

@ylluminate
Author

Well, that's what is confusing, @brendonhumphrey: I do have kstat.zfs.darwin.tunable.zfs_arc_max=8589934592 set in /etc/zfs/zsysctl.conf, yet that still happened.

@brendonhumphrey
Contributor

brendonhumphrey commented Aug 1, 2016

That is a similar setting to what I have, and the machine is obediently keeping memory use in check. You haven't got any other settings in zsysctl.conf, have you? I accidentally inherited something from ilovezfs' example script that blocked zfs_arc_max from working. I'm not near the machine, so I can't quote the exact setting.

You could also do sysctl -a | grep spl. One of the sysctls (there are only a few, so it should be obvious which one) tells you how much memory we have allocated in total. You can use that to check that it is O3X that is occupying the memory.
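
For instance, a quick way to do that check (the exact set of sysctls will vary with the SPL build):

# List the SPL sysctls and look for the total-allocation counter.
sysctl -a | grep spl
# To watch it while the rsyncs run (interval is purely illustrative):
while sleep 10; do sysctl -a | grep spl; done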

@ylluminate
Author

Hmm, that's the only line in zsysctl.conf, and yes, I do use @ilovezfs's script, so I must have also inherited that setting somewhere... Whatever may be doing that is not readily popping out at me from the settings, so I'll hang tight until you can pull it up...

@brendonhumphrey
Contributor

brendonhumphrey commented Aug 1, 2016

OK, you really don't want this setting (assuming you pulled it in from the .example file):

10 Mar 2015; ilovezfs
Raise zfs_arc_meta_limit to 3/4 (instead of 1/4) of zfs_arc_max.
3/4 * (11 * 2^30) = 8,858,370,048
But let's use hexadecimal this time.
8,858,370,048 = 0x210000000
kstat.zfs.darwin.tunable.zfs_arc_meta_limit=0x210000000

Also, kstat.zfs.misc.arcstats.c_max is the actual ARC max, i.e. check it after setting the tunable.
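
For example, to compare the tunable with the effective value (both are in bytes):

sysctl kstat.zfs.darwin.tunable.zfs_arc_max kstat.zfs.misc.arcstats.c_max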

@ylluminate
Author

Interesting, so essentially I should use:
kstat.zfs.darwin.tunable.zfs_arc_meta_limit=0x210000000
instead of:
kstat.zfs.darwin.tunable.zfs_arc_max=8589934592

Or should BOTH be in there? What's the purpose of this new setting in comparison with the old wrt this "meta" vs actual limit?

@brendonhumphrey
Contributor

Ah no. Sorry if I wasn't clear. If you have
kstat.zfs.darwin.tunable.zfs_arc_meta_limit=0x210000000
then
kstat.zfs.darwin.tunable.zfs_arc_max=8589934592 won't work.

You can verify what arc max is after setting the tunable by checking kstat.zfs.misc.arcstats.c_max.

Also, the total memory in use is kstat.spl.misc.spl_misc.os_mem_alloc.
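
So a minimal check after editing zsysctl.conf and re-applying the settings would be:

# Effective ARC cap, in bytes:
sysctl kstat.zfs.misc.arcstats.c_max
# Total memory currently held by O3X, in bytes:
sysctl kstat.spl.misc.spl_misc.os_mem_alloc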

@ylluminate
Author

Alright, so at present I only have kstat.zfs.darwin.tunable.zfs_arc_max=8589934592, so I am removing that and inserting this kstat.zfs.darwin.tunable.zfs_arc_meta_limit=0x210000000 and will monitor the levels. This will be the only line in zsysctl.conf. Will keep you posted.

@ylluminate
Author

I guess that last setting must have been the wrong move, since the kernel_task memory levels are already going well past the threshold very rapidly. It's already up to ~20 GB after just running rsync for a few minutes.

@rottegift
Contributor

Try this script.

DON'T adjust to taste, don't mess around. Tell us the results of running EXACTLY this as superuser as soon as you can after boot time, assuming your recent problematic workload eventually causes problems again.

#!/bin/bash

export PATH=/usr/bin:/bin:/sbin:/usr/sbin

sysctl -w kstat.zfs.darwin.tunable.zfs_arc_min=$((512*1024**2+1*1024**3))
sysctl -w kstat.zfs.darwin.tunable.zfs_arc_meta_min=$((256*1024**2+1*1024**3))
sysctl -w kstat.zfs.darwin.tunable.zfs_arc_meta_limit=$((5*1000**3))
sysctl -w kstat.zfs.darwin.tunable.zfs_arc_max=$((11*1024**3))
sysctl -w kstat.zfs.darwin.tunable.zfs_dirty_data_max=$((512*1024**2))

IF you see the same problem after running this as superuser near boot-time, then paste in the exact output of

sysctl kstat.spl kstat.vmem.vmem.kmem_va.mem_inuse kstat.vmem.vmem.kmem_default.mem_inuse kstat.vmem.vmem.kmem_oversize.mem_inuse kstat.vmem.vmem.heap.mem_inuse; sysctl vm | egrep '^vm.page_(free_(wanted|count)|speculative|purgable_count)'

(and surround the result in a pair of lines containing three backticks for the sake of readability).
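
For reference, the shell arithmetic in that script expands to the following byte values (just an expansion of the expressions above, not different settings to apply):

zfs_arc_min        = 512*1024**2 + 1*1024**3 = 1,610,612,736   (1.5 GiB)
zfs_arc_meta_min   = 256*1024**2 + 1*1024**3 = 1,342,177,280   (1.25 GiB)
zfs_arc_meta_limit = 5*1000**3               = 5,000,000,000   (~4.66 GiB)
zfs_arc_max        = 11*1024**3              = 11,811,160,064  (11 GiB)
zfs_dirty_data_max = 512*1024**2             = 536,870,912     (512 MiB)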

@ylluminate
Author

So, FYI, I've done the following:

Created /etc/zfs/zsysctl.sh (made executable with sudo chmod +x) with these contents:

#!/bin/bash

export PATH=/usr/bin:/bin:/sbin:/usr/sbin

sysctl -w kstat.zfs.darwin.tunable.zfs_arc_min=$((512*1024**2+1*1024**3))
sysctl -w kstat.zfs.darwin.tunable.zfs_arc_meta_min=$((256*1024**2+1*1024**3))
sysctl -w kstat.zfs.darwin.tunable.zfs_arc_meta_limit=$((5*1000**3))
sysctl -w kstat.zfs.darwin.tunable.zfs_arc_max=$((11*1024**3))
sysctl -w kstat.zfs.darwin.tunable.zfs_dirty_data_max=$((512*1024**2))

And created /Library/LaunchDaemons/com.openzfsonosx.zsysctlupdate.plist with contents:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
    <dict>
        <key>Label</key>
        <string>com.openzfsonosx.zsysctlupdate</string>
        <key>ProgramArguments</key>
        <array>
            <string>/etc/zfs/zsysctl.sh</string>
            <string>/etc/zfs</string>
        </array>
        <key>RunAtLoad</key>
        <true/>
        <key>KeepAlive</key>
        <false/>
    </dict>
</plist>
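
If you then want launchd to pick the job up without a reboot, one way is the following sketch (the ownership and permission requirements are the standard ones for /Library/LaunchDaemons):

# launchd refuses LaunchDaemons that are not owned by root or that are group/world-writable.
sudo chown root:wheel /Library/LaunchDaemons/com.openzfsonosx.zsysctlupdate.plist
sudo chmod 644 /Library/LaunchDaemons/com.openzfsonosx.zsysctlupdate.plist
sudo launchctl load /Library/LaunchDaemons/com.openzfsonosx.zsysctlupdate.plist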

@rottegift
Contributor

Well, you could just have run it by hand (sudo scriptname) when you get a login prompt, but it should (should!) work when executed early by launchd.

Make sure it does run via launchd; you can use the StandardErrorPath and StandardOutPath keys in your plist, and also grep SPL /var/log/system.log and check the output of "sysctl kstat.zfs.darwin.tunable" to make sure the values reflect the script.

Unfortunately if you're using launchd you may have a race with org.openzfsonosx.zconfigd.

(It's perfectly fine to run it by hand before there are more than 2GB in your ARC, and certainly before you start running your problem-reproducing workload. Running it by hand gives you confidence that the tunable values have been updated by the script.)
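
For example, the by-hand route plus the suggested checks would look like this (a sketch, assuming the script path used above):

# Run the tunables script by hand shortly after login, before the ARC has grown much.
sudo /etc/zfs/zsysctl.sh
# Confirm the values reflect the script, and check the SPL messages in the system log.
sysctl kstat.zfs.darwin.tunable
grep SPL /var/log/system.log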

@ylluminate
Author

Yes, understood. So far it's staying a lot more sane...

@rottegift
Contributor

rottegift commented Aug 1, 2016

If it's making a difference this soon with a workload like "6 rsyncs going full bore" -- and on the assumption that each rsync is writing to a different dataset or zvol -- then the issue is probably that you're generating an enormous amount of dirty data (limited by the smallest of 80% of system memory, ~10% maximum per active target dataset, or 4 GiB per target dataset or zvol), which takes time to drain, during which your reads into ARC will stall and the ARC cannot shrink back based on userland pressure.

The "smallarc" script I gave you leads to a maximum of 512 MiB of dirty data in the ARC per target dataset or zvol summing to a maximum of about 10GiB total. You can even consider dialling down kstat.zfs.darwin.tunable.zfs_dirty_data_max even further (e.g. to 128 MiB) if your write loads will always be throughput-sensitive rather than latency-sensitive.

(However, if you have only one target zvol (or dataset) for all six rsyncs, it could be something else is going on in addition to huge amounts of dirty data.)
