Disk issues on yevaud #143
Comments
|
I couldn't find a way in the Areca BIOS to forcibly mark channel 5 as failed, so for now the machine is up, but with rendering stopped and the low zoom array (which uses that disk) unmounted. |
|
That disk also logged a SMART report last night, around the time I did the initial reboot. It was only a single pending sector though: |
|
It seems like this is so flaky now that it's not much use for tile serving? We knew that the Areca controller was on its way out since the battery died a while back. There's an 8-port LSI JBOD controller in there which we can use instead. We would have plenty of ports for 2x OS, 1x 1TB database, 2x 1TB tiles-high and 3x ?TB tiles-low, if that's what we wanted to do. Back when we updated orm #88 we replaced the high-zoom array with SSDs to improve latency. Did we see any tangible benefit from that? |
|
I don't know about orm, but scorch (which is all SSD) is definitely doing well given it only has 8 CPU cores compared to 12 in orm and yevaud. |
|
Are we able to remove the Areca controller completely and wire up the backplane / disks to the LSI JBOD controller? We have at least 2x 500GB SATA SSD ex poldi. |
|
I have pulled IDE channel 5 disk Western Digital WD3000HLHX serial number WD-WXG1C30V9532. |
|
Does this mean that the server is now functional again and the issue can be closed, or is it just a test run? https://munin.openstreetmap.org/openstreetmap/yevaud.openstreetmap/uptime.html |
|
Not quite. The machine is functional and catching up on the replication backlog, but another disk is throwing warnings and should be replaced or removed. |
|
PS: I installed 2x Samsung 840 Pro 512GB disks into the tile-low array. |
|
Closing this issue. Going to create a new one for |
There seem to be some disk issues on yevaud. They started pretty much as soon as I tried to deploy the stylesheet update last night, with the machine becoming unresponsive and the serial console showing what appeared to be disk-related errors.
It was a few hours before I noticed. I then power cycled it, only for it to fall over again after about ten minutes, spewing errors about rejected writes on the swap device.
I paused it on pingdom and it then stayed up overnight and completed the low zoom render, but within a few hours of being brought back online this morning it went down again.
I couldn't see any SMART errors, but the BIOS on the Areca RAID controller is reporting read errors on IDE channel 5 at around the relevant times - that disk is a Western Digital WD3000HLHX serial number WD-WXG1C30V9532.