This repository has been archived by the owner on Oct 12, 2020. It is now read-only.
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Add udev-md-raid-safe-timeouts.rules
These udev rules attempt to set a safe kernel controller
timeout for commodity disks that contain RAID level 1 or
higher partitions and either lack SCTERC capability or
have it disabled.
No attempt is made to change the SCTERC settings on devices
which support it.
This attempts to mitigate the problem described here:
https://raid.wiki.kernel.org/index.php/Timeout_Mismatch
http://strugglers.net/~andy/blog/2015/11/09/linux-software-raid-and-drive-timeouts/
where the kernel controller may time out on a read from a
disk after the default timeout of 30 seconds and consequently
cause mdraid to regard the disk as dead and eject it from the
RAID array.
The mitigation is to set the timeout to 180 seconds for disks
which contain a RAID level 1 or higher partition.
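For illustration only, the effect of the mitigation on a single disk can be sketched in Python. This is not the udev rule added by this commit; it assumes the usual sysfs location of the SCSI command timeout (/sys/block/<device>/device/timeout), and the function name and the sysfs_root parameter are hypothetical, introduced so the sketch can be tried against a scratch directory.

```python
from pathlib import Path

# 180 seconds is the value named in the commit message; the kernel default is 30.
SAFE_TIMEOUT_SECONDS = 180


def set_controller_timeout(device: str,
                           seconds: int = SAFE_TIMEOUT_SECONDS,
                           sysfs_root: Path = Path("/sys/block")) -> int:
    """Write `seconds` to <sysfs_root>/<device>/device/timeout.

    Returns the value that was in the file before the write.
    `sysfs_root` is a hypothetical parameter for testing; on a real
    system the default is used and root privileges are required.
    """
    timeout_file = Path(sysfs_root) / device / "device" / "timeout"
    previous = int(timeout_file.read_text().strip())
    timeout_file.write_text(f"{seconds}\n")
    return previous
```

Calling set_controller_timeout("sda") as root would raise the timeout for /dev/sda until the next reboot or re-plug, which is why the commit applies the setting from udev instead of as a one-off.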
Signed-off-by: Jonathan G. Underwood <jonathan.underwood@gmail.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
Comment on b96c193:
This is not wired up in the Makefile's install-udev target; any reason for that?
Also, wouldn't it be better to have rules that enable SCT ERC too, via smartctl -l scterc,70,70?
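As a rough sketch of what enabling SCT ERC would involve, the smartctl invocation can be wrapped as below. It relies on smartctl's `-l scterc,READTIME,WRITETIME` option, whose values are in tenths of a second (70 means 7.0 seconds). The function names are hypothetical, and nothing like this is part of the commit.

```python
import subprocess
from typing import List


def scterc_command(device: str, read_ds: int = 70, write_ds: int = 70) -> List[str]:
    """Build the smartctl argument list that sets SCT ERC on `device`.

    `read_ds` and `write_ds` are in deciseconds (70 == 7.0 seconds).
    """
    return ["smartctl", "-l", f"scterc,{read_ds},{write_ds}", device]


def enable_scterc(device: str, read_ds: int = 70, write_ds: int = 70) -> bool:
    """Run smartctl and report whether it exited successfully.

    Needs root and a drive that supports SCT ERC; a non-zero exit
    status is returned as False rather than raised.
    """
    result = subprocess.run(scterc_command(device, read_ds, write_ds),
                            capture_output=True, text=True, check=False)
    return result.returncode == 0
```

On many drives the setting does not survive a power cycle, so it would have to be reapplied at each boot or hotplug, for example from a udev rule.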
Comment on b96c193:
Note that this repo is not where development happens these days - the mdadm list is the best place for discussion.
As to changing scterc settings: I took the approach that mdadm wasn't in the business of changing hardware firmware settings, so didn't add that into the patch I pushed upstream. However, I have experimented with that here:
https://github.com/jonathanunderwood/mdraid-safe-timeouts
I'd consider pushing more of that to mdadm if the devs felt that changing scterc settings was something code shipped with mdadm should be doing.