
Error generated when messages size is too big #2733

Closed
nmengin opened this issue Aug 29, 2018 · 4 comments · Fixed by #2869

Comments

@nmengin
Contributor

nmengin commented Aug 29, 2018

Kind

Enhancement

Description

When a huge number of Swarmkit objects (services, configs and secrets) are created at the same time, this error appears:

level=error msg="agent: session failed" backoff=100ms error="rpc error: code = ResourceExhausted desc = grpc: received message larger than max (7263689 vs. 4194304)"
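To make the numbers in that message easier to read, here is a minimal Go sketch that pulls the two sizes out of the error text and expresses them in MiB. The helper name `parseSizes` and the regular expression are illustrative assumptions, not part of swarmkit or gRPC; the only fact relied on is that 4194304 bytes is 4 MiB, the default maximum receive size of a gRPC-Go client.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// sizeRe matches the "(<received> vs. <limit>)" suffix of the gRPC error text.
var sizeRe = regexp.MustCompile(`\((\d+) vs\. (\d+)\)`)

// parseSizes (hypothetical helper) extracts the received size and the
// configured limit, both in bytes, from a ResourceExhausted error message.
func parseSizes(msg string) (received, limit int64, err error) {
	m := sizeRe.FindStringSubmatch(msg)
	if m == nil {
		return 0, 0, fmt.Errorf("no size pair found in %q", msg)
	}
	if received, err = strconv.ParseInt(m[1], 10, 64); err != nil {
		return 0, 0, err
	}
	if limit, err = strconv.ParseInt(m[2], 10, 64); err != nil {
		return 0, 0, err
	}
	return received, limit, nil
}

func main() {
	msg := `rpc error: code = ResourceExhausted desc = grpc: received message larger than max (7263689 vs. 4194304)`
	received, limit, err := parseSizes(msg)
	if err != nil {
		panic(err)
	}
	const mib = 1 << 20 // bytes per MiB
	fmt.Printf("limit: %d MiB, received: %.2f MiB, over by %d bytes\n",
		limit/mib, float64(received)/mib, received-limit)
}
```

In other words, the message that failed was roughly 7 MiB against a 4 MiB limit.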

This error seems to happen on all cluster nodes in two different use cases:

  • When a big swarmkit object is created/updated, or when a lot of swarmkit objects are created/updated at the same time (UC1 below).
  • When the cluster leader goes down (UC2 below).

UC1

The error seems to happen because the gRPC client connection used for the Assignments method cannot receive messages larger than 4194304 bytes (4 MiB).
This method is called through the ConnectionBroker client connection, which is defined in two functions:

  • Here to define the local connection to use on the leader
  • Here to define the remote connection to use by the other nodes

UC2

The error seems to happen because the gRPC client connection used in the Raft Transport (the structure which manages remote raft peers and sends messages to them) cannot receive messages larger than 4194304 bytes (4 MiB).

The Transport ClientConnection is initialized here.

Proposition

I propose adding an option to the Node Config that lets users supply a slice of grpc.DialOption to be passed to the internal ClientConnections.
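A rough sketch of how such an option could be threaded through, written with stand-in types so that it runs without the gRPC module. Everything named here (`NodeConfig`, `DialOption`, `WithMaxRecvMsgSize`, `connSettings`, `newConnSettings`) is hypothetical and only mirrors the shape of the proposal; with the real library, the corresponding option would presumably be `grpc.WithDefaultCallOptions(grpc.MaxCallRecvMsgSize(n))`.

```go
package main

import "fmt"

// connSettings stands in for the internal state of a gRPC client connection.
// Hypothetical: the real settings live inside google.golang.org/grpc.
type connSettings struct {
	maxRecvMsgSize int
}

// DialOption mirrors the functional-option shape of grpc.DialOption
// (stand-in type, not the real one).
type DialOption func(*connSettings)

// WithMaxRecvMsgSize is a hypothetical helper playing the role of
// grpc.WithDefaultCallOptions(grpc.MaxCallRecvMsgSize(n)).
func WithMaxRecvMsgSize(n int) DialOption {
	return func(s *connSettings) { s.maxRecvMsgSize = n }
}

// NodeConfig is a hypothetical stand-in for the node configuration,
// extended with the user-supplied dial options from the proposal.
type NodeConfig struct {
	DialOptions []DialOption
}

// defaultMaxRecvMsgSize is the 4 MiB limit seen in the error message.
const defaultMaxRecvMsgSize = 4 << 20

// newConnSettings applies the configured options on top of the default,
// the way each internal client connection would at dial time.
func newConnSettings(cfg NodeConfig) connSettings {
	s := connSettings{maxRecvMsgSize: defaultMaxRecvMsgSize}
	for _, opt := range cfg.DialOptions {
		opt(&s)
	}
	return s
}

func main() {
	cfg := NodeConfig{DialOptions: []DialOption{WithMaxRecvMsgSize(128 << 20)}}
	broker := newConnSettings(cfg) // connection broker (UC1)
	raft := newConnSettings(cfg)   // raft transport (UC2)
	fmt.Println(broker.maxRecvMsgSize, raft.maxRecvMsgSize)
}
```

The point of passing the same slice to both call sites is that the connection broker and the raft transport would pick up the same limit without each needing its own setting.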

@kjelle

kjelle commented Oct 1, 2018

Hello. We experience this problem, using docker 18.06.1-ce in swarm mode with api 1.38. The docker manager tries to call /v1.38/tasks which generates an error:
error="rpc error: code = ResourceExhausted desc = grpc: received message larger than max (6908543 vs. 4194304)" rpc=/docker.swarmkit.v1.Control/ListTasks

@Marc3001

Same here with our swarm cluster. We have ~350 services running and we are not able to list tasks because the gRPC message exceeds the limit.
Also, sometimes our worker is not able to respond to any of our managers' requests, and the worker logs are filled with these errors:

level=error msg="agent: session failed" backoff=100ms error="rpc error: code = ResourceExhausted desc = grpc: received message larger than max (4823143 vs. 4194304)" module=node/agent node.id=sfygnj6xhjm420khg0pe6nn9x
level=info msg="manager selected by agent for new session: { }" module=node/agent node.id=sfygnj6xhjm420khg0pe6nn9x
level=info msg="waiting 90.674605ms before registering session" module=node/agent node.id=sfygnj6xhjm420khg0pe6nn9x

After that, we are not able to update services or start/stop swarm tasks.

@Marc3001

Something I don't get: since 17.06 (#2378), the gRPC max size should be 128 MB, not 4 MB, right?
Why do we still have a 4 MB max size on some messages?

@kjelle

kjelle commented Oct 26, 2018

I just commented on moby/moby#37941 (comment) - we have messages of ~319 MB...

@Marc3001 setting a high max size really doesn't fix the problem. We experience this problem with messages far bigger than 128 MB...

thaJeztah added a commit to thaJeztah/docker that referenced this issue Sep 12, 2019
…3 branch)

full diff: moby/swarmkit@4fb9e96...bbe3418

changes included:

- moby/swarmkit#2889 [19.03 backport] Fix update out of sequence and increase max recv gRPC message size for nodes and secrets

Which relates to

- moby#39531 integration-cli: fix swarm tests flakiness
- docker-archive#345 [19.03 backport] integration-cli: fix swarm tests flakiness

And includes backports of

- moby/swarmkit#2808 Fix flaky tests
- moby/swarmkit#2866 Swap gometalinter for golangci-lint
- moby/swarmkit#2869 Increase max recv gRPC message size to initialize connection broker
 - related / similar to moby#38103 / docker-archive#102 cluster: set bigger grpc limit for array requests
 - related / similar to moby#39306 Increase max recv gRPC message size for nodes and secrets
 - fixes moby/swarmkit#2733 Error generated when messages size is too big
- moby/swarmkit#2870 Fix update out of sequence

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
docker-jenkins pushed a commit to docker-archive/docker-ce that referenced this issue Sep 12, 2019
…3 branch)

full diff: moby/swarmkit@4fb9e96...bbe3418

changes included:

- moby/swarmkit#2889 [19.03 backport] Fix update out of sequence and increase max recv gRPC message size for nodes and secrets

Which relates to

- moby/moby#39531 integration-cli: fix swarm tests flakiness
- docker-archive/engine#345 [19.03 backport] integration-cli: fix swarm tests flakiness

And includes backports of

- moby/swarmkit#2808 Fix flaky tests
- moby/swarmkit#2866 Swap gometalinter for golangci-lint
- moby/swarmkit#2869 Increase max recv gRPC message size to initialize connection broker
 - related / similar to moby/moby#38103 / docker-archive/engine#102 cluster: set bigger grpc limit for array requests
 - related / similar to moby/moby#39306 Increase max recv gRPC message size for nodes and secrets
 - fixes moby/swarmkit#2733 Error generated when messages size is too big
- moby/swarmkit#2870 Fix update out of sequence

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Upstream-commit: f7dbee3eeaa1dd218116f85b8f60361acbd5b214
Component: engine
thaJeztah added a commit to thaJeztah/docker that referenced this issue Oct 9, 2019
…v18.09)

full diff: moby/swarmkit@142a737...5c86095

- moby/swarmkit#2892 [18.09 backport] Remove hardcoded IPAM config subnet value for ingress network
    - backport of moby/swarmkit#2890 Remove hardcoded IPAM config subnet value for ingress network
    - fixes [ENGORC-2651](https://docker.atlassian.net/browse/ENGORC-2651)
- moby/swarmkit#2836 [18.09 backport] Switch to go 1.11
    - backport of moby/swarmkit#2752 Switch to go 1.11
- moby/swarmkit#2901 [18.09 backport] Bump to golang 1.12.9
    - backport of moby/swarmkit#2880 Bump to golang 1.12.9
- moby/swarmkit#2900 [18.09 backport] Fix update out of sequence and increase max recv gRPC message size for nodes and secrets
    - backport of moby/swarmkit#2762 Increased wait time on test utils WaitForCluster and WatchTaskCreate
    - backport of moby/swarmkit#2771 Allow using Configs as CredentialSpecs
        - **second commit only** (attempt to fix weirdly broken tests)
    - backport of moby/swarmkit#2808 Fix flaky tests
    - backport of moby/swarmkit#2866 Swap gometalinter for golangci-lint
    - backport of moby/swarmkit#2869 Increase max recv gRPC message size to initialize connection broker
        - related / similar to moby#38103 / docker-archive#102 cluster: set bigger grpc limit for array requests
        - related / similar to moby#39306 Increase max recv gRPC message size for nodes and secrets
        - fixes moby/swarmkit#2733 Error generated when messages size is too big
    - backport of moby/swarmkit#2870 Fix update out of sequence

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
docker-jenkins pushed a commit to docker-archive/docker-ce that referenced this issue Oct 23, 2019
…v18.09)

full diff: moby/swarmkit@142a737...5c86095

- moby/swarmkit#2892 [18.09 backport] Remove hardcoded IPAM config subnet value for ingress network
    - backport of moby/swarmkit#2890 Remove hardcoded IPAM config subnet value for ingress network
    - fixes [ENGORC-2651](https://docker.atlassian.net/browse/ENGORC-2651)
- moby/swarmkit#2836 [18.09 backport] Switch to go 1.11
    - backport of moby/swarmkit#2752 Switch to go 1.11
- moby/swarmkit#2901 [18.09 backport] Bump to golang 1.12.9
    - backport of moby/swarmkit#2880 Bump to golang 1.12.9
- moby/swarmkit#2900 [18.09 backport] Fix update out of sequence and increase max recv gRPC message size for nodes and secrets
    - backport of moby/swarmkit#2762 Increased wait time on test utils WaitForCluster and WatchTaskCreate
    - backport of moby/swarmkit#2771 Allow using Configs as CredentialSpecs
        - **second commit only** (attempt to fix weirdly broken tests)
    - backport of moby/swarmkit#2808 Fix flaky tests
    - backport of moby/swarmkit#2866 Swap gometalinter for golangci-lint
    - backport of moby/swarmkit#2869 Increase max recv gRPC message size to initialize connection broker
        - related / similar to moby/moby#38103 / docker-archive/engine#102 cluster: set bigger grpc limit for array requests
        - related / similar to moby/moby#39306 Increase max recv gRPC message size for nodes and secrets
        - fixes moby/swarmkit#2733 Error generated when messages size is too big
    - backport of moby/swarmkit#2870 Fix update out of sequence

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Upstream-commit: e06f07ef337ab890f211397d6b408b75a2512dc5
Component: engine