Hi, this seems to be a general issue; I've noticed it on three different versions of Ubuntu. If the device that a flashcache device sits on becomes unavailable, the flashcache instance goes into a loop, which ultimately drives the load average up to the point where the box ceases to function.
I'm currently running against network-mounted images served through qemu-nbd, which presents (for example) /dev/nbd0 as a block device. If I run FlashCache on top of this it works fine ... but if I kill the process that makes nbd0 available (i.e. qemu-nbd), then FlashCache goes mad.
Oct 6 14:13:25 node1 kernel: [ 1586.367220] block nbd0: Attempted send on closed socket
Oct 6 14:13:25 node1 kernel: [ 1586.367296] end_request: I/O error, dev nbd0, sector 0
Oct 6 14:13:25 node1 kernel: [ 1586.367365] Buffer I/O error on device nbd0, logical block 0
Oct 6 14:13:25 node1 kernel: [ 1586.367450] block nbd0: Attempted send on closed socket
Oct 6 14:13:25 node1 kernel: [ 1586.367549] end_request: I/O error, dev nbd0, sector 2
Oct 6 14:13:25 node1 kernel: [ 1586.367624] Buffer I/O error on device nbd0, logical block 1
;; etc ... apparently infinite loop.
While I can understand it getting upset, one issue on one device takes out an entire machine hosting 20 virtual machines ... which is not good.
Is there any way to isolate faults such that FlashCache stops gracefully rather than ending the World?
(the same seems to be true when running against AoE targets....)
Would that also apply to writethrough (wt) mode?
We're running a patched version of flashcache in wt mode that bypasses the SSD after the first failure (it basically marks all IO as uncacheable on an IO error). However, we have cases where the kernel reports read requests being sent to the broken SSD after some time, and the server eventually freezes. Could that come from the same source you have in mind? If so, could you point me to that part of the code, please?
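Roughly, the patch behaves like this simplified sketch (the flag and helper names are illustrative, not our actual code):

```c
/* Illustrative sketch of the "bypass SSD on first failure" patch --
 * the flag and helper names are hypothetical, not the actual code. */
#include <linux/bio.h>

static int ssd_failed;                 /* set once, on the first SSD error */

static void on_ssd_io_error(void)
{
	ssd_failed = 1;                /* bypass the SSD from here on */
}

static int is_uncacheable(struct bio *bio)
{
	if (ssd_failed)
		return 1;              /* treat all IO as uncacheable */
	/* ... the normal uncacheable checks would follow here ... */
	return 0;
}
```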
Arne - This looping only applies to the writeback case; specifically, it happens when we get a disk error in the path of a block being cleaned.
This is the scenario: we pick a block for cleaning and submit the write to disk. We get a disk error, and kcopyd_callback() calls into do_pending_error(). The block remains dirty, because it wasn't written out successfully. dirty_writeback() continues cleaning the set, and after sweeping through the blocks it will re-clean the same block.
The problem here is that if the disk device is removed, we simply cannot clean any blocks at all. So the cleaning logic will pick off dirty blocks and try to clean them constantly.
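In simplified form, the sweep behaves roughly like this (the type and helpers here are hypothetical stand-ins, not the literal flashcache code):

```c
/* Illustrative sketch of the writeback cleaning loop -- the type and
 * the two helpers are hypothetical stand-ins. */
struct cacheblock { int dirty; };

static struct cacheblock *pick_dirty_block(void);    /* hypothetical */
static int submit_disk_write(struct cacheblock *b);  /* hypothetical */

static void clean_set(void)
{
	struct cacheblock *blk;

	while ((blk = pick_dirty_block()) != NULL) {
		if (submit_disk_write(blk) == 0)
			blk->dirty = 0;   /* written out; block is clean */
		/*
		 * On error (kcopyd_callback() -> do_pending_error())
		 * the block stays dirty, so the next sweep picks the
		 * same block again.  With the disk gone, every write
		 * fails and the loop never terminates.
		 */
	}
}
```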
The fix here would be to mark a block as "errored" on the cleaning path, so we can skip cleaning it. The problem, of course, is that such blocks will never be cleaned, and the cache will eventually be full of dirty blocks that cannot be cleaned.
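Sketched against the same hypothetical helpers, that fix would look roughly like this:

```c
/* Sketch of the proposed fix: remember that a block errored and have
 * the picker skip it, so the sweep terminates even with a dead disk.
 * Hypothetical names, not real flashcache code. */
struct cacheblock { int dirty; int error; };

/* hypothetical: returns the next dirty block that has error == 0 */
static struct cacheblock *pick_cleanable_block(void);
static int submit_disk_write(struct cacheblock *b);  /* hypothetical */

static void clean_set(void)
{
	struct cacheblock *blk;

	while ((blk = pick_cleanable_block()) != NULL) {
		if (submit_disk_write(blk) == 0)
			blk->dirty = 0;
		else
			blk->error = 1;   /* skipped on future sweeps */
	}
	/*
	 * The downside: errored blocks are never retried, so the cache
	 * slowly fills with dirty blocks that can never be cleaned.
	 */
}
```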
I was aware of this from day 1, but I didn't think disk device removal while the filesystem was still mounted was an important thing to address :)
Thanks, Mohan. I imagined that this did not apply to wt mode, as there is nothing to clean really.
In fact, I believe the problem we see is related to the way I implemented the bypass: I was relying on the uncacheable check in flashcache_read() and overlooked that the cache-hit check is done before it. So the scenario is: block A is requested and gets cached; the SSD fails; block B (or even A) is requested, the IO fails, and SSD bypassing is activated; block A is requested again, fc thinks it's cached and goes to flashcache_read_hit rather than doing the uncacheable check (where it would see that the SSD should be ignored), and so it sends the request to the faulty SSD.
I moved the bypass check to flashcache_map now :-)
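Roughly, the fixed ordering looks like this (a simplified sketch with made-up helper names, not the actual source):

```c
/* Simplified sketch -- with the bypass check first in the map
 * function, a stale cache hit can no longer route IO to a dead SSD.
 * All helper names here are hypothetical. */
#include <linux/bio.h>

static int ssd_failed;                          /* set on first SSD error */

static int submit_to_disk(struct bio *bio);     /* hypothetical */
static int cache_lookup_hit(struct bio *bio);   /* hypothetical */
static int read_hit_from_ssd(struct bio *bio);  /* hypothetical */
static int read_from_disk(struct bio *bio);     /* hypothetical */

static int map_request(struct bio *bio)
{
	if (ssd_failed)                  /* bypass check now comes first */
		return submit_to_disk(bio);

	if (cache_lookup_hit(bio))       /* the hit check that previously
					  * ran before the bypass check */
		return read_hit_from_ssd(bio);

	return read_from_disk(bio);
}
```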
Do you happen to know why too many SCSI errors grind the server to a halt? Is it syslog/console buffer overflow?