Disable recovery when there's not enough space #59
Comments
Hi, I wondered whether any progress has been made on this? We see the same issue here on a 3-node cluster. Thanks.
Sorry, I'm not working on this right now. I'll solve it ASAP.
No problem :) We are looking at using sheepdog for a project, and this was one of the things we noticed could be an issue in our testing. Let me know when you have implemented the fix; I'm happy to help test it with you :)
Thanks a lot for your help!
Hi, I just wondered if any progress has been made on this?
Hi @gadago, sorry for my late reply. I created a branch for this problem: https://github.com/sheepdog/sheepdog/tree/recovery-diskfull Could you check it? If you pass the new option -F to sheep, your cluster will stop itself when a recovery process could fill the disk.
sheep can corrupt its cluster by diskfull with recovery process. For avoiding this problem, this patch adds a new option -F to sheep. If this command is passed to the sheep process, every sheep process of the cluster stops itself if there is a possibility of diskfull during recovery. Fixes #59 Signed-off-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
sheep can corrupt its cluster by diskfull with recovery process. For avoiding this problem, this patch adds a new option -F to sheep. If this command is passed to the sheep process, every sheep process of the cluster skips recovery if there is a possibility of diskfull during recovery. Fixes #59 Signed-off-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
sheep can corrupt its cluster by diskfull with recovery process. For avoiding this problem, this patch adds a new option -F to dog cluster format. If this command is passed during cluster formatting, every sheep process of the cluster skips recovery if there is a possibility of diskfull during recovery. Fixes #59 Signed-off-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
sheep can corrupt its cluster by diskfull with recovery process. For avoiding this problem, this patch adds a new option -F to dog cluster format. If this command is passed during cluster formatting, every sheep process of the cluster skips recovery if there is a possibility of diskfull during recovery. Fixes sheepdog#59 Signed-off-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
This is the simplest example: a cluster with 3 nodes and --copies 2.
All nodes have about 80-90% of their space used.
When I kill a node, the cluster tries to re-replicate the copies that were on the lost node, but there is obviously not enough space.
I think sheepdog should behave like this:
(This is like mdadm showing 'clean,degraded' when a disk is missing.)
dog node info
Id Size Used Avail Use%
0 4.6 GB 4.1 GB 479 MB 89%
1 5.0 GB 3.8 GB 1.1 GB 77%
2 5.0 GB 4.1 GB 894 MB 82%
Total 15 GB 12 GB 2.5 GB 83%
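To make the shortfall concrete, here is a rough sketch of the kind of check that could be done before recovery starts. It is not sheepdog's actual logic: the NodeUsage class and recovery_fits function are hypothetical names, and it assumes that every object on the lost node has to be re-created on one of the survivors, so the space needed is roughly the lost node's used space. The numbers are the Used and Avail columns from the dog node info output above.

```python
from dataclasses import dataclass

MB_PER_GB = 1024


@dataclass
class NodeUsage:
    """One row of `dog node info` (hypothetical container, not a sheepdog type)."""
    node_id: int
    used_mb: float
    avail_mb: float


def recovery_fits(nodes, lost_id):
    """Estimate whether the survivors can absorb the lost node's data.

    Assumption: all objects stored on the lost node must be re-replicated
    onto the remaining nodes, so the space needed equals its used space.
    Returns (fits, needed_mb, free_mb).
    """
    lost = next(n for n in nodes if n.node_id == lost_id)
    needed_mb = lost.used_mb
    free_mb = sum(n.avail_mb for n in nodes if n.node_id != lost_id)
    return needed_mb <= free_mb, needed_mb, free_mb


# Used / Avail figures as reported by `dog node info` before the kill.
nodes = [
    NodeUsage(0, 4.1 * MB_PER_GB, 479),
    NodeUsage(1, 3.8 * MB_PER_GB, 1.1 * MB_PER_GB),
    NodeUsage(2, 4.1 * MB_PER_GB, 894),
]

fits, needed_mb, free_mb = recovery_fits(nodes, lost_id=2)
print(f"need ~{needed_mb:.0f} MB, survivors have ~{free_mb:.0f} MB free, fits={fits}")
```

With these figures, losing node 2 would need roughly 4.1 GB on nodes that have only about 1.6 GB free between them, so a check like this would report that recovery cannot fit.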
df -h /mnt/sheep/0
/dev/sda6 4,7G 4,2G 479M 90% /mnt/sheep/0
dog cluster info
Cluster status: running, auto-recovery enabled
Cluster created at Sat Oct 4 10:34:30 2014
Epoch Time Version
2014-10-04 10:34:30 1 [192.168.10.4:7000, 192.168.10.5:7000, 192.168.10.6:7000]
root@test004:~# dog cluster info -v
Cluster status: running, auto-recovery enabled
Cluster store: plain with 2 redundancy policy
Cluster vnode mode: node
Cluster created at Sat Oct 4 10:34:30 2014
dog node kill 2
dog node info
Id Size Used Avail Use%
0 4.6 GB 4.6 GB 2.7 MB 99%
1 5.0 GB 5.0 GB 1.5 MB 99%
Total 9.6 GB 9.6 GB 4.2 MB 99%
/var/lib/sheepdog/sheep.log
Oct 04 10:37:39 ERROR [rw 4593] prealloc(385) failed to preallocate space, No space left on device
Oct 04 10:37:39 ERROR [rw 4593] err_to_sderr(108) diskfull, oid=fd38150000005b
Oct 04 10:37:39 ALERT [rw 4593] recover_replication_object(404) cannot access any replicas of fd38150000005b at epoch 1
Oct 04 10:37:39 ALERT [rw 4593] recover_replication_object(405) clients may see old data
Oct 04 10:37:39 ERROR [rw 4593] recover_replication_object(412) can not recover oid fd38150000005b
Oct 04 10:37:39 ERROR [rw 4593] recover_object_work(576) failed to recover object fd38150000005b
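For anyone trying to reproduce this, a small script can pull the affected object IDs out of sheep.log. This is only a sketch: the regular expressions are inferred from the few log lines quoted above, not from any documented log format, and the summarize function is a hypothetical helper.

```python
import re

# Line layout inferred from the excerpt above (an assumption, not a spec):
# "<Mon DD HH:MM:SS> <LEVEL> [<thread>] <function>(<line>) <message>"
LOG_LINE = re.compile(
    r"^\w{3} \d{2} \d{2}:\d{2}:\d{2}\s+(?P<level>[A-Z]+)\s+"
    r"\[[^\]]+\]\s+(?P<func>\w+)\(\d+\)\s+(?P<msg>.*)$"
)
DISKFULL = re.compile(r"diskfull, oid=([0-9a-f]+)")
UNRECOVERED = re.compile(r"failed to recover object ([0-9a-f]+)")


def summarize(lines):
    """Return (diskfull_oids, unrecovered_oids) found in the given log lines."""
    diskfull, unrecovered = set(), set()
    for raw in lines:
        m = LOG_LINE.match(raw.strip())
        if not m:
            continue  # skip lines that do not follow the assumed layout
        msg = m.group("msg")
        hit = DISKFULL.search(msg)
        if hit:
            diskfull.add(hit.group(1))
        hit = UNRECOVERED.search(msg)
        if hit:
            unrecovered.add(hit.group(1))
    return diskfull, unrecovered


# Three of the lines quoted in this report.
sample = [
    "Oct 04 10:37:39 ERROR [rw 4593] prealloc(385) failed to preallocate space, No space left on device",
    "Oct 04 10:37:39 ERROR [rw 4593] err_to_sderr(108) diskfull, oid=fd38150000005b",
    "Oct 04 10:37:39 ERROR [rw 4593] recover_object_work(576) failed to recover object fd38150000005b",
]

diskfull_oids, unrecovered_oids = summarize(sample)
print("diskfull:", sorted(diskfull_oids))
print("unrecovered:", sorted(unrecovered_oids))
```

To run it against a real log, pass an open file object instead of the sample list, for example summarize(open("/var/lib/sheepdog/sheep.log")).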
dog vdi check
Server has no space for new objects
Sheepdog daemon version 0.8.0_353_g4d282d3