hdd-spindown doesn't suspend disks anymore #5
I was thinking about using Arch as my NAS OS yesterday. |
You can try the command: Then use the command to check whether the disk is in standby mode? : You can also use this command : |
Thanks to lynix for developing this tool, I just need it! |
Unfortunately I've only got one machine left that has rotating disks (migrated anything else to NVMe SSDs) and that one is running I'll try to put something together in order to test with a recent kernel. |
I've tried with a recent Arch installation on Kernel @nick-s-b could you please install the version from branch debug/issue-5 and see what the debugging output I have added tells you? This way we can see whether the I/O counters are not read correctly or whether it's a bug in the logic. |
@lynix Thank you for the response! I've installed the issue-5 branch, restarted the service, and opened a folder on the HDD in question to spin up the disk. I then closed the folder and the file manager and haven't touched the disk since. All of this happened at 14:23 (according to the logs below). Here's the full journal output:
From the above, the service was restarted and the disk activated at 14:23, so the disk should have been spun down at 14:33 (10 minutes later, which is the disk timeout I specified). However, this did not happen.
My .rc file has these two lines in it:
and I'm still on 5.5.5. (can't reboot right now... have a process that has been running for a few days and will run for another couple of days).
I then used
and then the logs had this output few minutes later:
Edit 2:
Thank you so much for looking into this! |
Looking at your traces I see that the read counter (first number) is constantly increasing. That's why the disk is not put to sleep. Interestingly, in your last trace this is still the case. If the drive remained suspended during that trace, then the read requests have all been served from cache, which is a weakness of my approach to determining drive activity. So you need to find out which process keeps reading from that disk. There is a kernel option exported via sysfs to dump all I/O accesses to dmesg, but this can get dangerous: writing these dmesg entries to disk causes further entries, and you end up with self-amplification. I'd also consider adding an option to hdd-spindown.sh to only check the write counter, which remains constant in your case. But that would have the downside of putting the drive to sleep too often for read-focused workloads. |
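To make the counter comparison above concrete, here is a minimal sketch in Python (hdd-spindown.sh itself is a Bash script, so this is not its actual code). It assumes the standard `/proc/diskstats` layout, where the fourth and eighth whitespace-separated fields are reads completed and writes completed. The function names `read_io_counters` and `is_idle` and the `writes_only` switch are hypothetical and only illustrate the idea of optionally ignoring the read counter.

```python
from typing import Dict, Tuple


def read_io_counters(diskstats_text: str) -> Dict[str, Tuple[int, int]]:
    """Map device name -> (reads completed, writes completed)."""
    counters = {}
    for line in diskstats_text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            continue  # skip malformed or truncated lines
        # Layout: major, minor, name, reads completed, reads merged,
        # sectors read, ms reading, writes completed, ...
        counters[fields[2]] = (int(fields[3]), int(fields[7]))
    return counters


def is_idle(prev: Tuple[int, int], curr: Tuple[int, int],
            writes_only: bool = False) -> bool:
    """True if the counters did not move between two snapshots.

    With writes_only=True the read counter is ignored (hypothetical
    option, mirroring the idea discussed in this thread).
    """
    if writes_only:
        return prev[1] == curr[1]
    return prev == curr
```

On a real system the input would be the contents of `/proc/diskstats`, read once per polling interval. As noted above, `writes_only=True` would treat a read-only workload as idle.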
@lynix Thank you! I'll look into it myself as well. I'll try to determine what's accessing the disk. I'll also update the kernel since that might be the cause. I don't think I've changed anything in my day-to-day use. I still use the same text editor, same DM, same FM, etc. One thing that has changed is that I now have two web browsers open at all times for development. Could it be that Firefox Dev is causing this, since I started using it heavily right around the update? I don't know. I'll try to narrow it down. One thing that's so weird about this is that the disk doesn't spin up at all after being put to standby, yet these read counters keep increasing. I'm hoping this is a kernel issue since it will either get fixed or it will be the new normal. Thanks again. I'll report in a few days. |
I would like to report that reading the drive's S.M.A.R.T. data causes the read value to go up. I'm going to try stopping the systemd smartmontools service to see what happens. PS: I'm also getting this problem (Linux 5.6).
Apr 15 03:48:24 chrholly smartd[1413]: Device: /dev/sdf [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 203 to 196
Apr 15 04:18:23 chrholly smartd[1413]: Device: /dev/sdf [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 196 to 185
Stopping the service now
Heh that fixed it |
@nick-s-b disable your smartmontools service. |
@USBhost hmmm... mine's not even running:
But yeah, |
Do we know more about this by now? I had the same phenomenon after upgrading from kernel 5.4, the "read" values in /proc/diskstats went up without data actually being read by any process. |
@bedouin67 I could never get it to run again for me and have uninstalled it. I might give it a try after I do an upgrade this weekend. |
I'm happy to accept pull requests for ignoring the read counter, if SMART readouts really make that one constantly increase. However I must admit I don't have any rotating disks at hand anymore, so I will not be able to test anything. |
I don't think it's a good idea to ignore the read counter and only take the write counter for the check. I was playing a video for testing exactly this and the write counter never changes:
This issue is on my Arch install with Kernel 5.11, however, on my Pi4 running Raspbian kernel 5.10.17-v7l+ it still works fine. |
Looking at the Kernel documentation for
Maybe the sector counters are not triggered by SMART queries? If so, we could use them to determine drive activity. |
Sector counters show the same behaviour as I/O for me:
|
psutil.disk_io_counters(perdisk=True) |
"smartctl -i -n standby /dev/sdx" |
Run it during initialization to see whether, and by how much, the I/O count increases, then correct the count by that amount. This makes it compatible with different kernels. |
When initializing, run it a few more times to make sure the measured value is correct. |
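One possible reading of this calibration idea, sketched in Python under stated assumptions: the status probe (for example the `smartctl -i -n standby /dev/sdx` call quoted above) is passed in as a callable, and the counter is read through another callable, so nothing here calls smartctl itself. The names `calibrate_probe_cost` and `had_real_activity` are hypothetical, and taking the minimum over several runs is an assumption about how "make sure the data is correct" could be done.

```python
from typing import Callable, List


def calibrate_probe_cost(read_counter: Callable[[], int],
                         probe: Callable[[], None],
                         runs: int = 3) -> int:
    """Estimate how many I/Os a single status probe adds to the counter."""
    costs: List[int] = []
    for _ in range(runs):
        before = read_counter()
        probe()
        costs.append(read_counter() - before)
    # Assumption: the smallest observed delta is the probe's own cost;
    # anything above it is other activity that happened during calibration.
    return min(costs)


def had_real_activity(prev: int, curr: int,
                      probes_run: int, probe_cost: int) -> bool:
    """True if the counter grew by more than the probes can explain."""
    return (curr - prev) > probes_run * probe_cost
```

On a kernel where the probe does not touch the counters, the calibrated cost is simply 0 and the check reduces to a plain comparison.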
Probably script logic.
|
A better way: it seems that the I/O count also changes after the spindown command is executed, so the script should retrieve and record the current I/O count immediately afterwards. To prevent the log from growing indefinitely, we can add a flag that indicates a change in I/O. This flag is not as reliable as the actually detected disk status, so it should only be used as a condition for logging. |
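The suggestion above is terse, so the following Python sketch is only one interpretation of it, not code from hdd-spindown or any fork. The class `DiskState` and the functions `observe` and `after_spindown` are hypothetical names. The counter is re-recorded right after the spindown command, and a separate `changed` flag only decides whether a sample is worth logging.

```python
from dataclasses import dataclass


@dataclass
class DiskState:
    last_count: int
    changed: bool = False  # only used as a condition for logging


def observe(state: DiskState, count: int) -> bool:
    """Record a new counter sample; return True if it should be logged."""
    state.changed = count != state.last_count
    state.last_count = count
    return state.changed


def after_spindown(state: DiskState, count: int) -> None:
    """Re-baseline right after the spindown command was issued.

    Assumption: I/O caused by the command itself should not be
    counted as drive activity in the next polling round.
    """
    state.last_count = count
    state.changed = False
```

Keeping the flag separate from the spin-down decision follows the comment's point that it is less reliable than the detected disk status.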
@rankaiyx Thanks for 5 notification mails within two hours ;) I'm not sure I can follow your explanations, specifically the latest one with the flag. Either way, this is something I would consider too 'complex' to implement or add myself without the ability to test anything. And, as I said above, I don't have any rotating disks anymore. Feel free to fork the project and go ahead with the extended counter detection logic. I guess I will put this instance of the project to archived mode. |
I'm having the same issue on my machine (running the most recent stable kernel) and I think other spindown solutions are suffering from the same problem. According to the hd-idle readme, on kernels > 5.4 monitoring tools will alter the disk read/write count, so they moved that logic to the partition level where these values will stay the same instead. I haven't had a look at the hdd-spindown code so far (because everything has been working great :) but this sounds like a nice project for a lazy weekend, so I might try my hand at this. |
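A rough Python sketch of the partition-level idea mentioned above, based only on what this comment reports about the hd-idle readme. It assumes the usual `/proc/diskstats` layout (device name in the third field, reads completed in the fourth, writes completed in the eighth). The function name `partition_io_total` is hypothetical, and this is not the code from any fork.

```python
from typing import Dict, Iterable


def partition_io_total(diskstats_text: str,
                       partitions: Iterable[str]) -> Dict[str, int]:
    """Return reads + writes completed for each requested partition."""
    wanted = set(partitions)
    totals = {}
    for line in diskstats_text.splitlines():
        fields = line.split()
        if len(fields) < 14 or fields[2] not in wanted:
            continue
        # fields[3] = reads completed, fields[7] = writes completed
        totals[fields[2]] = int(fields[3]) + int(fields[7])
    return totals
```

Comparing the result for `["sda1"]` between two polls, rather than the whole-disk `sda` line, is the change in approach: if the claim holds, monitoring tools move the whole-disk counters but not the partition ones.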
Well, I think I made it work, my disks have been happily spinning down for the past 24 hours (finally!). I rewrote some parts to use There's still some debug output left in my fork, but I changed the documentation so it's ready for a test drive at the very least :) Please feel free to report any issues (on my repo, to keep lynix' notifications to a minimum). Once I'm confident enough that I actually created a working piece of code, I'll clean the debug output and make a pull request. NB If it didn't before, hdd-spindown will now require bash version 4, because I'm using associative arrays for the partitions. P.s. I'm not a programmer, not an expert either, I might have added bugs that will make your cat explode or your disks explode - or both! It works for me™, though :) |
Great news @bocki! I'll happily give your PR another pair of eyes and merge when looking good. |
Nice! I took a look at your fix, and it looks great. |
The sdXN names may change, so it may be better to use UUIDs. |
Has been tested. It works. Cheers! |
Yes, changing partition names definitely is an issue for my current approach. I hadn't thought of that and so far I don't see a simple solution. While getting sdXY names automatically is pretty trivial, it gets harder if you use other things than classic block devices. In my case I'm using bcache - these "partitions" show up in UUIDs would likely be the easiest approach, however they don't look as nice (not sure if aesthetics alone is an argument, though :). It would definitely add some bloat to the script, either way. Maybe someone else can chime in? |
function dev_stats() { If a disk has only one partition (This is usually the case on NAS), it may be possible to modify it in this way. |
I tried @bocki's branch, and it WORKS! I hope @lynix can approve the pull request.
|
There is a defect in his repair in some cases. Take a closer look at the above discussion. |
It should work, but it may take a test to be sure. |
Looks good to me, it has worked as expected for 3 days. I will post here if something goes wrong. |
I managed to nearly trash one of my drives by ignoring the fact that block device names can change on reboot, so I finally added the option to add partitions by their UUIDs. It's probably not the best piece of code but it works for me™️ (for multiple partitions too!). I'd appreciate any feedback. I updated the original pull request, or you can just use my repo to get the code :) |
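For readers wondering how a UUID can be mapped back to the current device name, here is a minimal Python sketch. It assumes the usual udev symlinks under `/dev/disk/by-uuid`; the function name `device_for_uuid` and its `by_uuid_dir` parameter are hypothetical, and this is not the Bash code from the pull request.

```python
import os


def device_for_uuid(uuid: str,
                    by_uuid_dir: str = "/dev/disk/by-uuid") -> str:
    """Return the current kernel name (e.g. 'sdb1') for a filesystem UUID."""
    link = os.path.join(by_uuid_dir, uuid)
    if not os.path.islink(link):
        raise FileNotFoundError(f"no such UUID link: {link}")
    # The symlink points at the device node; its basename is the name
    # that appears in /proc/diskstats for plain block devices.
    return os.path.basename(os.path.realpath(link))
```

Resolving the name on every polling round, instead of once at startup, would avoid acting on a stale sdXY name after devices are re-enumerated.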
It seems that on kernel 6.0 this issue has been resolved, running |
Hi,
I'm not sure if it's just me, but hdd-spindown has stopped suspending disks on my machine after a recent Arch update. I have tried to track it down but was unable to figure out what's stopping it. I think it has something to do with the new Linux kernel: after upgrading to 5.5.5, hdd-spindown doesn't suspend disks anymore. All the logs look normal and there are no errors reported. My .rc file is super simple: an ID for a single disk and a timeout. It has worked fine for over a year, so something must have changed in the kernel and in the way hdd-spindown tracks disk activity.