SPL: MMT it's been an hour since the last reap #12
I stepped back to the 1.5.2 release and am still seeing the same kind of hangs, although it has not been long enough to tell whether the initial message above is reported on reboot.
Those are harmless messages.
Also, did you pick up the SPL change that was made ~24 hours ago?
No, I've not grabbed those. Let me take a look. I went back to the older driver in hopes things would improve, but to no avail. What concerned me was that it wasn't releasing memory after long periods of time and some degree of inactivity. I probably should make some kind of shell script that forces it to feel the pressure and release...
I see. "An hour since last reap" is behaving normally; that is not changing for now. It's just idle chit-chat in the logs. Brendon
If you're keen, try the code in the SPL branch 'vmem_experiments'; we would like some feedback as to how that works for you.
Absolutely. Can you tell me what to expect and what the goal is? From the look of the commits it seems fairly stable, so I don't think it's likely to produce panics, is it?
You should expect 🌈 💐 ☀️
/me switches to HERE WE GO!
I've been running it for a month without reboot, and it seems OK. It just alters some allocator details and maybe works better; I don't want to cloud your judgement by making claims. Do whatever you did to create your performance stalls, or whatever they were.
Did you try it?
I've actually got a heavy copy process going on it right now that should have prompted the issue I was having earlier. I've cranked it up and have 6 rsyncs going full bore, and while it has maxed out my memory usage, my system is still semi-usable. I have two different volumes copying via these multiple threads onto the raid-z. So yeah, so far so good... I wish I could cut down the memory consumption further, though. Like I said, it's consuming everything and using swap at the moment on a 64GB system (
OK, sounds promising. If you want to limit memory use, why don't you set the ARC max?
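For reference, capping the ARC is a one-line tunable in zsysctl.conf. The sketch below is an example only: the tunable prefix is the one quoted later in this thread, while the file path and the 8 GiB value are my assumptions, not something stated here.

```
# zsysctl.conf (path assumed) -- example only: cap the ARC at 8 GiB
kstat.zfs.darwin.tunable.zfs_arc_max=8589934592
```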
Well, that's what is confusing @brendonhumphrey; I do have
That is a similar setting to what I have, and the machine is obediently keeping memory use in check. You haven't got any other settings in zsysctl.conf, have you? I accidentally inherited something from ilovezfs' example script that blocked zfs_arc_max from working. I'm not near the machine, so I can't quote the exact setting. You could also do sysctl -a | grep spl. One of the sysctls (there's only a few, so it should be obvious which one) tells how much memory we have allocated in total. You can use that to check that it is O3X that is occupying the memory.
Hmm, that's the only line in
OK, you really don't want this setting (assuming you pulled it in from the .example file): 10 Mar 2015; ilovezfs Also, kstat.zfs.misc.arcstats.c_max is the actual ARC max, i.e. check it after setting the tunable.
Interesting. So essentially I should use: Or should BOTH be in there? What's the purpose of this new setting in comparison with the old one, w.r.t. this "meta" vs. actual limit?
Ah, no. Sorry if I wasn't clear. If you have You can verify what the ARC max is after setting the tunable by checking kstat.zfs.misc.arcstats.c_max. Also, the total memory in use is kstat.spl.misc.spl_misc.os_mem_alloc.
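A quick way to check both values is shown below. This is a sketch assuming macOS with O3X loaded; the two sysctl names are the ones quoted in this thread, and the helper function is mine, just to make the byte counts readable.

```shell
#!/bin/sh
# Sketch, assuming macOS with OpenZFS-on-OSX loaded. The sysctl names are the
# ones mentioned in this thread:
#
#   sysctl kstat.zfs.misc.arcstats.c_max         # effective ARC max, in bytes
#   sysctl kstat.spl.misc.spl_misc.os_mem_alloc  # total memory allocated by SPL
#
# Small helper to render a "name: bytes" sysctl line in GiB:
to_gib() {
  awk -F': ' '{ printf "%.1f GiB\n", $2 / 1073741824 }'
}

# Demonstration with a canned line (a real check would pipe sysctl output):
echo "kstat.zfs.misc.arcstats.c_max: 2147483648" | to_gib   # 2.0 GiB
```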
Alright, so at present I only have
I guess that last setting must have been the wrong move, since the
Try this script. DON'T adjust to taste, don't mess around. Tell us the results of running EXACTLY this as superuser, as soon as you can after boot time, assuming your recent problematic workload eventually causes problems again.
IF you see the same problem after running this as superuser near boot-time, then paste in the exact output of
(and surround the result in a pair of lines containing three backticks, for readability).
So, FYI, I've done the following: Created
And created
Well, you could just have run it by hand (sudo scriptname) when you get a login prompt, but it should (should!) work when executed early by launchd. Make sure it does run via launchd: you can use the StandardErrorPath and StandardOutPath keys in your plist, and also grep SPL /var/log/system.log and check the output of "sysctl kstat.zfs.darwin.tunable" to make sure the values reflect the script. Unfortunately, if you're using launchd you may have a race with org.openzfsonosx.zconfigd. (It's perfectly fine to run it by hand before there are more than 2GB in your ARC, and certainly before you start running your problem-reproducing workload. Running it by hand gives you confidence that the tunable values have been updated by the script.)
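For the launchd route, a minimal plist might look like the sketch below. The label, script path, and log paths are placeholders of my own, not names from this thread; StandardOutPath and StandardErrorPath are the keys mentioned above, and RunAtLoad makes launchd execute the script once at load time.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <!-- Placeholder label: pick your own reverse-DNS name -->
  <key>Label</key>
  <string>local.zfs.tunables</string>
  <!-- Placeholder path to the script from this thread -->
  <key>ProgramArguments</key>
  <array>
    <string>/usr/local/bin/smallarc.sh</string>
  </array>
  <key>RunAtLoad</key>
  <true/>
  <!-- Capture output so you can confirm the script actually ran -->
  <key>StandardOutPath</key>
  <string>/var/log/smallarc.out.log</string>
  <key>StandardErrorPath</key>
  <string>/var/log/smallarc.err.log</string>
</dict>
</plist>
```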
Yes, understood. So far it's staying a lot more sane...
If it's making a difference this soon with a workload like "6 rsyncs going full bore" -- and on the assumption that each rsync is writing to a different dataset or zvol -- then the issue is probably that you're generating an enormous amount of dirty data (limited by the smallest of 80% of system memory; ~10% maximum per active target dataset; or 4 GiB per target dataset or zvol), which takes time to drain, during which your reads into ARC will stall and the ARC cannot shrink back in response to userland pressure. The "smallarc" script I gave you leads to a maximum of 512 MiB of dirty data in the ARC per target dataset or zvol, summing to a maximum of about 10 GiB total. You could even consider dialling down kstat.zfs.darwin.tunable.zfs_dirty_data_max further (e.g. to 128 MiB) if your write loads will always be throughput-sensitive rather than latency-sensitive. (However, if you have only one target zvol or dataset for all six rsyncs, it could be that something else is going on in addition to huge amounts of dirty data.)
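The limits described above can be sketched as simple arithmetic. This is an illustration only -- the helpers mirror the limits as stated in this thread, they are not code from ZFS, and the function names are made up for the sketch.

```python
# Illustrative arithmetic only: per-dataset dirty data is capped by the
# smallest of ~10% of system memory and a per-dataset cap (default 4 GiB),
# and the total across datasets by 80% of system memory.
GIB = 1024 ** 3

def per_dataset_dirty_cap(system_mem, per_dataset_cap=4 * GIB):
    """Smallest of ~10% of system memory and the per-dataset cap."""
    return min(system_mem // 10, per_dataset_cap)

def total_dirty_cap(system_mem, n_datasets, per_dataset_cap=4 * GIB):
    """Sum of per-dataset caps, bounded overall by 80% of system memory."""
    total = n_datasets * per_dataset_dirty_cap(system_mem, per_dataset_cap)
    return min(total, system_mem * 8 // 10)

# The 64 GB system from this thread, with six rsync targets:
mem = 64 * GIB
print(per_dataset_dirty_cap(mem) / GIB)    # 4.0 -- the 4 GiB cap binds (10% of 64 GiB is 6.4 GiB)
print(total_dirty_cap(mem, 6) / GIB)       # 24.0

# With the "smallarc" 512 MiB per-target setting, six targets sum to only:
print(total_dirty_cap(mem, 6, 512 * 1024 ** 2) / GIB)   # 3.0
# (The ~10 GiB total figure quoted above would be the script's overall
# ceiling; with only six targets, the per-target sum binds first.)
```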
I have been noticing that my ZFS volume has become somewhat unresponsive, and upon rebooting I received the following messages:
I'm a little concerned because, even after rebooting, I am still running into some hangs in the middle of copying files. I'm using the most recent source from master at present.