flashcache crashing #87

Closed
madpenguin opened this Issue Oct 6, 2012 · 5 comments

@madpenguin

Hi, this seems to be a generic issue and I've noticed it on 3 different versions of Ubuntu. If the device on which a flashcache device sits becomes unavailable, then the flashcache instance goes into a loop, which ultimately drives the load average up to the point where the box ceases to function.

Example:

I'm currently running against network-mounted images, run through qemu-nbd, which presents (for example) /dev/nbd0 as a block device. If I run FlashCache on top of this it works fine, but if I kill the process that makes nbd0 available (i.e. qemu-nbd), then FlashCache goes mad.
;;
Oct 6 14:13:25 node1 kernel: [ 1586.367220] block nbd0: Attempted send on closed socket
Oct 6 14:13:25 node1 kernel: [ 1586.367296] end_request: I/O error, dev nbd0, sector 0
Oct 6 14:13:25 node1 kernel: [ 1586.367365] Buffer I/O error on device nbd0, logical block 0
Oct 6 14:13:25 node1 kernel: [ 1586.367450] block nbd0: Attempted send on closed socket
Oct 6 14:13:25 node1 kernel: [ 1586.367549] end_request: I/O error, dev nbd0, sector 2
Oct 6 14:13:25 node1 kernel: [ 1586.367624] Buffer I/O error on device nbd0, logical block 1
;;
;; etc ... apparently infinite loop.
;;

While I can understand it getting upset, one issue with one device takes out an entire machine hosting 20 virtual machines, which is not good.

Is there any way to isolate faults such that FlashCache stops gracefully rather than ending the World?

(the same seems to be true when running against AoE targets....)

tia
MP

@mohans

Contributor

mohans commented Oct 6, 2012

This is happening in one case in the block cleaning path: if the device goes offline, flashcache will continually try to write out the block in a loop. I'll take a look at the code next week and see how easy it is to fix this.
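
To make the shape of that loop concrete, here is a minimal, self-contained C sketch. It is not flashcache source; clean_set(), write_dirty_block(), and the block states are hypothetical stand-ins. While the backing device is offline every write fails, the block stays dirty, and each subsequent sweep retries it.

#include <stdbool.h>

enum block_state { BLOCK_CLEAN, BLOCK_DIRTY };

/* Stand-in for writing one dirty block back to disk; fails while the disk is offline. */
static bool write_dirty_block(int dbn, bool disk_online)
{
    (void)dbn;
    return disk_online;
}

/* One sweep of the cleaner over a cache set. */
static void clean_set(enum block_state set[], int n, bool disk_online)
{
    for (int i = 0; i < n; i++) {
        if (set[i] != BLOCK_DIRTY)
            continue;
        if (write_dirty_block(i, disk_online))
            set[i] = BLOCK_CLEAN;
        /* On error the block stays BLOCK_DIRTY, so the next sweep retries it. */
    }
}

int main(void)
{
    enum block_state set[4] = { BLOCK_DIRTY, BLOCK_DIRTY, BLOCK_DIRTY, BLOCK_DIRTY };

    /* With the backing disk offline, no sweep ever makes progress. */
    for (int sweep = 0; sweep < 3; sweep++)
        clean_set(set, 4, /* disk_online = */ false);
    return 0;
}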


@arnewiebalck

arnewiebalck commented Oct 18, 2012

Mohan,

Would that also apply to write-through (wt) mode?

We're running a patched version of flashcache in wt mode that bypasses the SSD upon the first failure (it basically marks all IO as uncacheable on an IO error). However, we have cases where the kernel reports read requests being sent to the broken SSD after some time, and the server eventually freezes. Could that come from the same source you have in mind? If so, could you point me to that part of the code, please?

Thanks,
Arne

@mohans

Contributor

mohans commented Oct 18, 2012

Arne - This looping only applies to the writeback case. Specifically, when we get a disk error in the path of a block being cleaned.

This is the scenario: we pick a block for cleaning and submit the write to disk. We get a disk error, and kcopyd_callback() calls into do_pending_error(). The block remains dirty because it wasn't written out successfully. dirty_writeback() continues cleaning the set, and after sweeping through the blocks it will re-clean the same block.

The problem here is that if the disk device is removed, we simply cannot clean any blocks at all. So the cleaning logic will pick off dirty blocks and try to clean them constantly.

The fix here would be to mark a block as "errored" on the cleaning path, so we can skip cleaning it. The problem, of course, is that such blocks will never be cleaned, and the cache will eventually be full of dirty blocks that cannot be cleaned.

I was aware of this from day 1, but I didn't think disk device removal while the filesystem was still mounted was an important thing to address :)
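
A minimal, self-contained C sketch of that proposed fix follows, again with hypothetical stand-in names (cache_block, BLOCK_ERROR, and write_block_to_disk() are illustrations, not the real flashcache identifiers): a block whose clean fails is marked errored and skipped on later sweeps, at the cost, as noted above, of errored blocks staying dirty forever.

#include <stdbool.h>
#include <stdio.h>

#define SET_SIZE 8

enum block_state { BLOCK_CLEAN, BLOCK_DIRTY, BLOCK_ERROR };

struct cache_block {
    int dbn;                 /* disk block number backing this cache block */
    enum block_state state;
};

/* Stand-in for the disk write issued by the cleaner; false means I/O error. */
static bool write_block_to_disk(const struct cache_block *blk, bool disk_online)
{
    (void)blk;
    return disk_online;
}

/* One sweep over a cache set, cleaning dirty blocks but skipping errored ones. */
static void clean_set(struct cache_block set[], bool disk_online)
{
    for (int i = 0; i < SET_SIZE; i++) {
        if (set[i].state != BLOCK_DIRTY)
            continue;                    /* BLOCK_ERROR blocks are not retried */
        if (write_block_to_disk(&set[i], disk_online)) {
            set[i].state = BLOCK_CLEAN;
        } else {
            set[i].state = BLOCK_ERROR;  /* remember the failure; stop re-cleaning */
            fprintf(stderr, "clean failed for dbn %d, marking errored\n", set[i].dbn);
        }
    }
}

int main(void)
{
    struct cache_block set[SET_SIZE];
    for (int i = 0; i < SET_SIZE; i++)
        set[i] = (struct cache_block){ .dbn = i, .state = BLOCK_DIRTY };

    clean_set(set, false);  /* disk yanked: every clean fails, blocks become BLOCK_ERROR */
    clean_set(set, false);  /* second sweep finds nothing to retry, so no busy loop */
    return 0;
}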

@arnewiebalck

arnewiebalck commented Oct 18, 2012

Thanks, Mohan. I imagined that this did not apply to wt mode, as there is nothing to clean really.

In fact, I believe the problem we see is related to the way I implemented the bypass: I was relying on the uncacheable check in flashcache_read() and overlooked that the cache-hit check is done before it. So the scenario is: block A is requested and cached; the SSD fails; block B (or even A) is requested and that IO fails, so SSD bypassing is activated; block A is requested again; fc thinks it's cached and goes to flashcache_read_hit rather than reaching the uncacheable check (where it would see that the SSD should be ignored), and so it sends the request to the faulty SSD.

I moved the bypass check to flashcache_map now :-)
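
A minimal, self-contained C sketch of that ordering issue, with hypothetical names (cache_ctx, map_read_old(), and map_read_new() are illustrations, not the real flashcache code paths): if the bypass flag is only consulted after the cache-hit lookup, a cached block is still routed to the failed SSD; checking the flag at the top of the map entry point catches it first.

#include <stdbool.h>
#include <stdio.h>

struct cache_ctx {
    bool ssd_failed;      /* set once the first SSD I/O error is seen */
    bool block_cached;    /* pretend the requested block has a valid cache entry */
};

enum route { ROUTE_SSD, ROUTE_DISK };

/* Old placement: the hit check runs before the uncacheable/bypass check. */
static enum route map_read_old(const struct cache_ctx *c)
{
    if (c->block_cached)
        return ROUTE_SSD;   /* read hit: the request still goes to the faulty SSD */
    if (c->ssd_failed)
        return ROUTE_DISK;  /* bypass check is reached too late to help */
    return ROUTE_DISK;
}

/* New placement: bypass check at the top of the map entry point. */
static enum route map_read_new(const struct cache_ctx *c)
{
    if (c->ssd_failed)
        return ROUTE_DISK;  /* a failed SSD is never touched */
    if (c->block_cached)
        return ROUTE_SSD;
    return ROUTE_DISK;
}

int main(void)
{
    struct cache_ctx c = { .ssd_failed = true, .block_cached = true };
    printf("old placement: %s\n", map_read_old(&c) == ROUTE_SSD ? "SSD (bad)" : "disk");
    printf("new placement: %s\n", map_read_new(&c) == ROUTE_SSD ? "SSD (bad)" : "disk");
    return 0;
}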

Do you happen to know why too many SCSI errors grind the server to a halt? Is that syslog/console buffer overflow?

@mohans

Contributor

mohans commented Oct 18, 2012

I think in his case he is simply yanking the disk device out from under flashcache (?), so flashcache's block cleaning just goes crazy, getting errors back from the disk writes it issues and promptly continuing to clean.

Syslog spew may also be a contributor. But the bigger fix is to make flashcache stop cleaning blocks that have already hit disk errors?


This issue was closed.
