
Disable recovery when there's not enough space #59

Closed
mitake opened this issue Aug 1, 2015 · 6 comments
mitake commented Aug 1, 2015

Here is a simple example: a cluster with 3 nodes and --copies 2.
All nodes are about 80-90% full.
When I kill a node, the cluster tries to replicate the missing copies from the lost node, but there is obviously not enough space.
I think sheepdog should behave like this:

  • as soon as there is not enough space on the cluster to replicate the loss of any of the nodes, recovery has to be disabled.
  • if a node dies, the cluster is still able to work, but it has to show a 'degraded' state in dog cluster info.
    (This is similar to mdadm showing 'clean,degraded' when a disk is missing.)

dog node info
Id Size Used Avail Use%
0 4.6 GB 4.1 GB 479 MB 89%
1 5.0 GB 3.8 GB 1.1 GB 77%
2 5.0 GB 4.1 GB 894 MB 82%
Total 15 GB 12 GB 2.5 GB 83%

df -h /mnt/sheep/0
/dev/sda6 4,7G 4,2G 479M 90% /mnt/sheep/0

dog cluster info
Cluster status: running, auto-recovery enabled
Cluster created at Sat Oct 4 10:34:30 2014
Epoch Time Version
2014-10-04 10:34:30 1 [192.168.10.4:7000, 192.168.10.5:7000, 192.168.10.6:7000]
root@test004:~# dog cluster info -v
Cluster status: running, auto-recovery enabled
Cluster store: plain with 2 redundancy policy
Cluster vnode mode: node
Cluster created at Sat Oct 4 10:34:30 2014

dog node kill 2

dog node info
Id Size Used Avail Use%
0 4.6 GB 4.6 GB 2.7 MB 99%
1 5.0 GB 5.0 GB 1.5 MB 99%
Total 9.6 GB 9.6 GB 4.2 MB 99%

/var/lib/sheepdog/sheep.log
Oct 04 10:37:39 ERROR [rw 4593] prealloc(385) failed to preallocate space, No space left on device
Oct 04 10:37:39 ERROR [rw 4593] err_to_sderr(108) diskfull, oid=fd38150000005b
Oct 04 10:37:39 ALERT [rw 4593] recover_replication_object(404) cannot access any replicas of fd38150000005b at epoch 1
Oct 04 10:37:39 ALERT [rw 4593] recover_replication_object(405) clients may see old data
Oct 04 10:37:39 ERROR [rw 4593] recover_replication_object(412) can not recover oid fd38150000005b
Oct 04 10:37:39 ERROR [rw 4593] recover_object_work(576) failed to recover object fd38150000005b

dog vdi check
Server has no space for new objects

Sheepdog daemon version 0.8.0_353_g4d282d3

@mitake mitake added the sheep label Aug 1, 2015
@mitake mitake self-assigned this Aug 1, 2015

gadago commented Sep 22, 2015

Hi,

I wondered whether there has been any progress on this? We see the same issue here on a 3-node cluster.

Thanks,


mitake commented Sep 22, 2015

Sorry, I'm not working on this now. I'll solve this ASAP.


gadago commented Sep 22, 2015

No problem :)

We are looking at using sheepdog for a project and this was one of the things we noticed could be an issue in our testing.

Let me know when you have implemented the fix, and I'd be happy to help you test it :)


mitake commented Sep 22, 2015

Thanks a lot for your help!


gadago commented Oct 26, 2015

Hi,

I just wondered if any progress has been made on this?


mitake commented Oct 28, 2015

Hi @gadago, sorry for my late reply.

I created a branch for this problem: https://github.com/sheepdog/sheepdog/tree/recovery-diskfull

Could you check it? If you pass the new -F option to sheep, the cluster will stop itself when the recovery process could cause a disk-full condition.

cc @sirio81 @atw-abe

mitake added a commit that referenced this issue Oct 28, 2015
sheep can corrupt its cluster when the recovery process fills a disk. To
avoid this problem, this patch adds a new option -F to sheep. If this
option is passed to the sheep process, every sheep process in the
cluster stops itself when there is a possibility of a disk filling up
during recovery.

Fixes #59

Signed-off-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
mitake added a commit that referenced this issue Nov 24, 2015
sheep can corrupt its cluster when the recovery process fills a disk. To
avoid this problem, this patch adds a new option -F to sheep. If this
option is passed to the sheep process, every sheep process in the
cluster skips recovery when there is a possibility of a disk filling up
during recovery.

Fixes #59

Signed-off-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
mitake added a commit that referenced this issue Dec 23, 2015
sheep can corrupt its cluster when the recovery process fills a disk. To
avoid this problem, this patch adds a new option -F to dog cluster
format. If this option is passed during cluster formatting, every
sheep process in the cluster skips recovery when there is a possibility
of a disk filling up during recovery.

Fixes #59

Signed-off-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
tmenjo pushed a commit to tmenjo/sheepdog that referenced this issue Apr 22, 2016
sheep can corrupt its cluster when the recovery process fills a disk. To
avoid this problem, this patch adds a new option -F to dog cluster
format. If this option is passed during cluster formatting, every
sheep process in the cluster skips recovery when there is a possibility
of a disk filling up during recovery.

Fixes sheepdog#59

Signed-off-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
mitake added a commit that referenced this issue Apr 25, 2016
sheep can corrupt its cluster when the recovery process fills a disk. To
avoid this problem, this patch adds a new option -F to dog cluster
format. If this option is passed during cluster formatting, every
sheep process in the cluster skips recovery when there is a possibility
of a disk filling up during recovery.

Fixes #59

Signed-off-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>