
Large snapshots prevent the addition of new managers to the cluster #2374

Closed
nishanttotla opened this issue Sep 15, 2017 · 6 comments

@nishanttotla
Contributor

nishanttotla commented Sep 15, 2017

This issue has been seen in a couple of production clusters and is considered critical. We must fix it in SwarmKit.

Summary

When the raft snapshot becomes larger than 4 MB, adding a new manager to the cluster becomes problematic. This is because the default gRPC message size limit is 4 MB, so sending the snapshot to the newly joining manager fails. As a result, the new manager does not end up with proper cluster state. This can also happen if a manager in an existing cluster falls behind and needs to receive a snapshot from a raft peer.
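For context on where the limit comes from, here is a minimal grpc-go illustration (not SwarmKit code): with no explicit options, both the server and the dialing client cap received messages at 4 MiB, so a snapshot above that size is rejected with a ResourceExhausted error instead of being delivered.

```go
// Minimal sketch, not SwarmKit code: grpc-go's default maximum size for
// received messages is 4 MiB on both ends of a connection.
package main

import "google.golang.org/grpc"

// With no explicit options, incoming messages larger than 4 MiB are
// rejected with codes.ResourceExhausted before the handler sees them.
func newDefaultServer() *grpc.Server {
	return grpc.NewServer()
}

// The dialing side applies the same 4 MiB default to messages it receives,
// so an oversized snapshot fails in either direction.
func dialDefault(addr string) (*grpc.ClientConn, error) {
	return grpc.Dial(addr, grpc.WithInsecure())
}
```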

What Makes the Snapshot Large

Running a large number of services/tasks, possibly connected to many networks, can increase the size of the snapshot. If the task history retention limit is particularly high, a lot of old tasks can stick around, bloating it further. Having a large number of (possibly large) secrets can also cause this problem.

Possible Fixes

There are several possible fixes that have been discussed. Let's use this issue to discuss pros and cons.

  1. Increase the gRPC message size limit to something higher and more reasonable. (How to decide on that limit is unclear.)
  2. Stream the snapshot instead of trying to send it as one gRPC message. (A rough sketch follows below.)
  3. Don't keep task history in the raft log, since it is not as critical. (This may alleviate the problem but not necessarily fix it.)
  4. Compress the snapshot when writing it to disk, and have a new manager decompress it upon reception. (This may alleviate the problem but not necessarily fix it; a sketch of this also follows below.)

We may have to do a combination of these things.
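
To make option 2 concrete, here is a rough sketch of streaming the snapshot in fixed-size chunks and reassembling it on the receiving side; SnapshotChunk, chunkSender, and chunkSize are hypothetical names for illustration, not SwarmKit's actual raft transport API.

```go
// Hedged sketch of option 2: no single gRPC message ever exceeds the limit
// because the snapshot travels as a stream of small chunks.
package main

// SnapshotChunk is a hypothetical wire message carrying one slice of the
// serialized snapshot plus a flag marking the final chunk.
type SnapshotChunk struct {
	Data []byte
	Last bool
}

// chunkSender stands in for the client side of a hypothetical
// streaming snapshot RPC.
type chunkSender interface {
	Send(*SnapshotChunk) error
}

const chunkSize = 1 << 20 // 1 MiB per chunk, well under the 4 MiB default

// streamSnapshot splits the snapshot into chunks and sends them in order;
// the receiver concatenates Data fields until it sees Last.
func streamSnapshot(snapshot []byte, stream chunkSender) error {
	if len(snapshot) == 0 {
		// Still signal completion so the receiver does not wait forever.
		return stream.Send(&SnapshotChunk{Last: true})
	}
	for off := 0; off < len(snapshot); off += chunkSize {
		end := off + chunkSize
		if end > len(snapshot) {
			end = len(snapshot)
		}
		if err := stream.Send(&SnapshotChunk{
			Data: snapshot[off:end],
			Last: end == len(snapshot),
		}); err != nil {
			return err
		}
	}
	return nil
}
```

For option 4, compressing the snapshot bytes is straightforward with the standard library; note that this only shrinks the payload, it does not bound it.

```go
// Hedged sketch of option 4: gzip the serialized snapshot before writing it
// to disk (or sending it), and decompress it on the receiving side.
package main

import (
	"bytes"
	"compress/gzip"
	"io"
)

// compressSnapshot returns the gzip-compressed form of the snapshot bytes.
func compressSnapshot(snapshot []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(snapshot); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decompressSnapshot reverses compressSnapshot on the receiving manager.
func decompressSnapshot(compressed []byte) ([]byte, error) {
	zr, err := gzip.NewReader(bytes.NewReader(compressed))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}
```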

cc @wsong @anshulpundir @stevvooe @aluzzardi @aaronlehmann @jlhawn

@anshulpundir
Contributor

@wsong and I discussed this; whichever fix we go with should also include better error reporting for the related failure scenarios.

@aaronlehmann
Collaborator

aaronlehmann commented Sep 15, 2017 via email

@anshulpundir
Contributor

anshulpundir commented Sep 15, 2017

3 is really just a better separation of critical vs. non-critical data, IMO. 4 is an optimization. 1 can possibly be done in the short term. 2 is the long-term approach.

@anshulpundir anshulpundir self-assigned this Sep 15, 2017
@anshulpundir
Contributor

anshulpundir commented Sep 15, 2017

Opened #2375 to increase the gRPC message size to 128 MB, in case it is needed.
I haven't had a chance to test it yet, though.
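
For reference, raising the limits with the standard grpc-go options looks roughly like the sketch below; this is an illustration of the approach, not the actual diff in #2375, and the 128 MB value simply mirrors the number above.

```go
// Hedged sketch: raise the gRPC send/receive message size limits on both
// the server and the dialing side so a large snapshot fits in one message.
package main

import "google.golang.org/grpc"

const maxMsgSize = 128 << 20 // 128 MiB, illustrative value

func newManagerServer() *grpc.Server {
	return grpc.NewServer(
		grpc.MaxRecvMsgSize(maxMsgSize),
		grpc.MaxSendMsgSize(maxMsgSize),
	)
}

func dialManager(addr string) (*grpc.ClientConn, error) {
	return grpc.Dial(addr,
		grpc.WithInsecure(),
		grpc.WithDefaultCallOptions(
			grpc.MaxCallRecvMsgSize(maxMsgSize),
			grpc.MaxCallSendMsgSize(maxMsgSize),
		),
	)
}
```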

@anshulpundir
Contributor

The short-term fix has landed. Reducing the priority to P1, since the long-term solution is not as urgent.

@anshulpundir
Contributor

anshulpundir commented Dec 6, 2017

Fixed in #2458
