RAS errors #10
Comments
Rasdaemon just collects what the Linux kernel reports, so these are not caused by a misconfiguration of the tool. UPDATE: I forgot to mention that rasdaemon's support for disk errors is pretty new. Some of the reported errors may simply be due to it not yet collecting the data properly.
That sounds like a real error to me. It seems that you're having some trouble with your disk, or maybe the system upgrade installed a defective kernel. In any case, to be safe, I would recommend that you back up your important data.
Btw, you can see the errors it recorded with:
How could I verify this?
So far SMART on the disk reports all ok. The disk is just about two years old, so if there's a problem I'd like to find out asap since it may still be under warranty. I do have backups, so my biggest worry is potential disruption. Mainly I'd like to understand what is going on. I could try reinstalling the kernel if that might help. The errors are nearly always within about a minute of booting, and almost never at any other time. I compared the error timestamps with ras-mc-ctl --errors
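The boot-time correlation described above can be checked mechanically. The following is a minimal sketch, assuming the error timestamps and the boot time have already been parsed into `datetime` objects; the function name `errors_near_boot`, the 60-second window, and the sample values are illustrative and do not come from this thread or from rasdaemon.

```python
from datetime import datetime, timedelta


def errors_near_boot(error_times, boot_time, window=timedelta(seconds=60)):
    """Split error timestamps into those within `window` after boot and the rest."""
    near, later = [], []
    for t in error_times:
        # An error counts as boot-related if it falls in [boot_time, boot_time + window].
        if timedelta(0) <= t - boot_time <= window:
            near.append(t)
        else:
            later.append(t)
    return near, later


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    boot = datetime(2020, 3, 1, 8, 0, 0)
    errors = [boot + timedelta(seconds=30), boot + timedelta(hours=2)]
    near, later = errors_near_boot(errors, boot)
    print(len(near), "near boot;", len(later), "later")
```

If most recorded errors land in the first list across several boots, that supports the observation that they are tied to startup rather than to normal disk activity.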
For completeness, as noted in the Stack Exchange question:
Hard to tell what's happening... Yeah, it could be due to some problem with FUSE and encrypted disks, or something else that won't be a big issue. Anyway, a couple of weeks ago I lost a less-than-one-year-old 480GB Sandisk SSD on my laptop. I had SMART installed, and I didn't get any prior notice about problems there. Unfortunately, it didn't have rasdaemon installed, so I have no way of knowing whether there were any earlier warnings. So right now I'm in a paranoid state about SSD reliability ;-)
FWIW, I'm getting these every ~2s
Looks like there is a need to understand the meaning of the errors, otherwise they will simply be ignored. Right now I'm clueless, and given that I see nothing wrong, I'm not about to replace the drive. I doubt warranty departments would accept nondescript errors as a valid reason when performance and other diagnostics check out just fine.
I don't have the disk errors at boot anymore with a different motherboard and CPU. I swapped out the motherboard of this PC and kept everything else the same: the same drives and SATA cables as before. I'm also still using the same Linux installation; I didn't reinstall. So I now think these errors aren't about the drives. Maybe it's something about the chipset or board. The motherboard where I had the errors at boot has an Intel Z77 chipset with an i5-3570K CPU; the board without errors has an AMD X470 chipset and an R7-2700X CPU.
Interesting possibility. However, I do have errors with an AMD X470 chipset (Asus CH7 WIFI motherboard) and a Ryzen 7 2700X CPU.
One specific issue in this case is that the original error is not logged, i.e., we see
How do you map those device IDs to actual devices? There are no devices with such IDs present in
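One possible way to approach the mapping question, sketched under the assumption that the reported IDs are Linux `major:minor` block device numbers (the thread does not confirm this): on Linux, sysfs exposes `/sys/dev/block/<major>:<minor>` symlinks that point at the corresponding device directory. The helper name `resolve_block_device` and the `sysfs_root` parameter are hypothetical, introduced here only for illustration.

```python
import os


def resolve_block_device(dev_id, sysfs_root="/sys/dev/block"):
    """Return the kernel name (e.g. 'sda') for a 'major:minor' id, or None if absent."""
    major, sep, minor = dev_id.strip().partition(":")
    if not sep or not major.isdigit() or not minor.isdigit():
        raise ValueError(f"expected 'major:minor', got {dev_id!r}")
    # sysfs names each entry '<major>:<minor>' and symlinks it to the device directory.
    link = os.path.join(sysfs_root, f"{int(major)}:{int(minor)}")
    if not os.path.exists(link):
        return None  # no registered block device carries this number
    return os.path.basename(os.path.realpath(link))


if __name__ == "__main__":
    # Example id only; substitute the ids seen in the error records.
    print(resolve_block_device("8:0"))
```

A `None` result would be consistent with an ID that does not belong to a registered block device, which may be why such IDs cannot be found in the usual device listings.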
I started using rasdaemon a few months ago, and I have been seeing disk errors and a system log entry of "rasdaemon: Can't get traces from ras:aer_event" at every boot, as reported here: https://unix.stackexchange.com/questions/553527/how-to-diagnose-rasdaemon-disk-errors

Are these errors indications of real issues, is it a misconfiguration of rasdaemon, or is it a bug with rasdaemon or something else?

After a recent system update (including upgrading to kernel 5.5.7) the errors now indicate some crashes at boot. The system is still stable and I would not have known of any issue without the reports in the system log:
ras errors in journalctl