Set grpc max message size to 128MB. #2375
Conversation
Codecov Report
@@            Coverage Diff             @@
##           master    #2375      +/-   ##
==========================================
+ Coverage   60.35%   60.45%   +0.09%
==========================================
  Files         128      128
  Lines       26267    26268       +1
==========================================
+ Hits        15854    15880      +26
+ Misses       9011     8993      -18
+ Partials     1402     1395       -7
I'd like to know the full side effects of doing this. My hunch is that message encoding gets affected for larger message size limits. Also related: https://stackoverflow.com/questions/34128872/google-protobuf-maximum-size
Ping @stevvooe.
I'd also recommend that we make this configurable. The default value for this parameter can stay at 128MB or whatever, but it'd be nice if we read this value out of
Agreed @wsong
I'd like to also :)
I think encoding and decoding depend on the actual message size and the structure of the data. For example, a deeply nested struct would be more costly to encode/decode. I don't think changing the limit affects encoding/decoding performance. @nishanttotla
I'm not sure about that - when moving to the "proper" solution (streaming), the config option won't make sense anymore. I don't know if it makes sense to introduce a new option in a patch release just so we can deprecate it immediately after.
manager/manager.go
Outdated
@@ -231,6 +231,7 @@ func New(config *Config) (*Manager, error) {
 	grpc.Creds(config.SecurityConfig.ServerTLSCreds),
 	grpc.StreamInterceptor(grpc_prometheus.StreamServerInterceptor),
 	grpc.UnaryInterceptor(grpc_prometheus.UnaryServerInterceptor),
+	grpc.MaxMsgSize(1024 * 1024 * 128),
nit: This could be a constant
Use shift syntax: 128 << 20
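Combining the two review suggestions, the change could look roughly like the sketch below; the constant name maxGRPCMsgSize and the newServer wrapper are hypothetical, not from this PR.

```go
package main

import "google.golang.org/grpc"

// maxGRPCMsgSize is a hypothetical name for the suggested constant;
// 128 << 20 equals 1024 * 1024 * 128, i.e. 128 MiB.
const maxGRPCMsgSize = 128 << 20

// newServer shows the option applied when constructing the gRPC server.
func newServer(opts ...grpc.ServerOption) *grpc.Server {
	opts = append(opts, grpc.MaxMsgSize(maxGRPCMsgSize))
	return grpc.NewServer(opts...)
}

func main() {
	_ = newServer()
}
```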
Overall looks good. What is going to happen when we reach 128MB? Any way we could log a proper error message?
I think some log messages should be included to bubble up errors from hitting the gRPC limit (either in this PR or as an immediate follow up). If we increase the limit to 128MB, then is it worth the engineering effort to implement streaming in addition? If we do implement streaming later, then we'd want to bring the limit down again. This may cause backward compatibility issues. I agree with @aluzzardi that if we introduce this option, it doesn't make sense to immediately deprecate it.
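For context on surfacing the error: grpc-go fails an RPC with a ResourceExhausted status when a message exceeds the configured limit, so a caller-side check could turn that into a clearer log line. A minimal sketch under that assumption; the helper name and log wording are hypothetical, not from this PR.

```go
package main

import (
	"log"

	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// logIfTooLarge is a hypothetical helper: it checks whether an RPC error is
// the ResourceExhausted status grpc-go returns for over-limit messages and
// logs a more actionable message for operators.
func logIfTooLarge(err error) {
	if status.Code(err) == codes.ResourceExhausted {
		log.Printf("gRPC message exceeded the max message size (128 MiB): %v", err)
	}
}

func main() {}
```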
We just met and agreed not to add a knob to adjust the max gRPC message size.
-1 on making this configurable.
Yes, this is a horrible thing to have as a scaling limit and it needs to be addressed. Besides patching this as a workaround, the next focus should be actually fixing this to handle arbitrary snapshot sizes. Increasing the message size limit and calling it good is not an acceptable engineering solution. Increasing this limit exposes us to message floods on all services, not just the raft service.

Is there anything we can add to log messages that are over some threshold, so that we can have an idea of the level of impact this is having? Prom counters bucketed by size might be appropriate. Such large messages will have an effect on stability at around 10MB and up with protobuf. The allocations required for processing a 128MB message may cause stability issues. I'd really like to know if there are messages larger than 1MB transiting swarm gRPC endpoints, as those are likely causing "buffer bloat" and allocation spikes.

We really need to have an easy way for users to understand when they are impacted by this problem.
Yeah, after meeting offline, I agree that this should not be configurable. Adding a metric to track large (i.e. > 1MB) gRPC messages is definitely a good idea, but it should probably go in a separate PR. Logging the length of every single message before sending it might have some unforeseen performance implications.
Not really what I was suggesting:

```go
if len(msg) > 1 << 20 {
	// log it.
}
```

Better would be to have bucketed counters (unrolled below):

```go
if len(msg) >= 1 << 20 {
	gt1MB++
}
if len(msg) >= 4 << 20 {
	gt4MB++
}
if len(msg) >= 10 << 20 {
	gt10MB++
}
if len(msg) >= 16 << 20 {
	gt16MB++
}
// ...
```

Notice that this is a cumulative counter setup, rather than bucketed, which allows easy aggregates.
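Prometheus histogram buckets are cumulative in exactly this way, so the same idea could be expressed as a single histogram metric. A minimal sketch; the metric name, help string, and bucket boundaries below are illustrative, not from this PR.

```go
package main

import "github.com/prometheus/client_golang/prometheus"

// msgSizeBytes is a hypothetical histogram of gRPC message sizes; Prometheus
// histogram buckets are cumulative, matching the counters sketched above.
var msgSizeBytes = prometheus.NewHistogram(prometheus.HistogramOpts{
	Name:    "swarm_grpc_message_size_bytes",
	Help:    "Size in bytes of gRPC messages handled by swarm endpoints.",
	Buckets: []float64{1 << 20, 4 << 20, 10 << 20, 16 << 20, 64 << 20, 128 << 20},
})

func init() {
	prometheus.MustRegister(msgSizeBytes)
}

// observeMessageSize records one message's size against the histogram.
func observeMessageSize(msg []byte) {
	msgSizeBytes.Observe(float64(len(msg)))
}

func main() {}
```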
Sure, but let's make sure it gets backported: right now, this condition is hard to diagnose and debug.
Force-pushed from 41c7e7d to 58aa594.
Manually tested for now: a positive test case at around 120 MiB and a negative one at around 240 MiB. Will explore the metrics and error reporting in a separate PR.
Hmm... looks like the linter changed a few hours ago: golang/lint#319. I'm seeing these errors from the CircleCI job:
@anshulpundir @nishanttotla the CI is failing on lint. Do you know why?
Can we pin the linter to a specific version to prevent that?
Alternatively, it may be faster to just open a separate PR to fix those places in the code; it looks like there might not be too many.
Doing this right now.
Signed-off-by: Anshul Pundir <anshul.pundir@docker.com>
Force-pushed from 58aa594 to 7af6543.
* Fix linter errors.
  Signed-off-by: Anshul Pundir <anshul.pundir@docker.com>
  (cherry picked from commit aa2c48b)
  Signed-off-by: Andrew Hsu <andrewhsu@docker.com>
* Set grpc max message size to 128MB. (#2375)
  Signed-off-by: Anshul Pundir <anshul.pundir@docker.com>
  (cherry picked from commit b1bcc05)
  Signed-off-by: Andrew Hsu <andrewhsu@docker.com>
* run vndr again because it now deletes unused files
  Signed-off-by: Andrew Hsu <andrewhsu@docker.com>
Signed-off-by: Anshul Pundir <anshul.pundir@docker.com> (cherry picked from commit b1bcc05) Signed-off-by: Victor Vieux <victorvieux@gmail.com>
Signed-off-by: Anshul Pundir <anshul.pundir@docker.com> (cherry picked from commit b1bcc05) Signed-off-by: Anshul Pundir <anshul.pundir@docker.com>
Signed-off-by: Anshul Pundir <anshul.pundir@docker.com>