-
Notifications
You must be signed in to change notification settings - Fork 665
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
envoy listener exact balancer #3314
Conversation
Codecov Report
@@ Coverage Diff @@
## main #3314 +/- ##
==========================================
- Coverage 75.71% 75.47% -0.25%
==========================================
Files 98 98
Lines 6559 6588 +29
==========================================
+ Hits 4966 4972 +6
- Misses 1485 1509 +24
+ Partials 108 107 -1
|
@sunjayBhatia have you worked with this Envoy setting? |
Not directly, no. I believe this feature is used for balancing connections on worker threads within Envoy so it might make some sense to expose for an environment that has many long lived connections since I guess you have some prior contextual knowledge the kernel does not that help in scheduling. From reading the issue this was useful to help solve some specific tail latency issues. @iyacontrol I think before accepting this change we would want some more context on why, what this solves for the submitter, what are the symptoms, reproduction maybe, etc. as well as testing, docs, etc. |
Yes, thanks for this PR, but in general, we like an issue to document the problem before accepting a PR to fix that problem. In this case, I'd like to know what the tradeoffs are. Should we just be using the exact balancer all the time? In general, we prefer not to add a configuration setting unless there is a strong need. |
thanks @stevesloka @sunjayBhatia @youngnick ! 1: Add the parameter settings of envoy's listener, which can meet the needs of more people in more scenarios. We use grpc extensively in our company. I have been using the exact balancer in our production environment for a month,the effect is very good. |
i will add some docs. |
related issue #3331 |
Thanks for this PR @iyacontrol, and thanks for being patient while I clarified what we're doing here. I agree that we should add the configurability, and it should be a config-file item, but I'm not sure |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to make sure we have the right name for the config item, once we put it in, it's very difficult to change.
Also, any chance we could get a test in internal/xdscache
somewhere?
@@ -383,6 +383,15 @@ type NetworkParameters struct { | |||
XffNumTrustedHops uint32 `yaml:"num-trusted-hops"` | |||
} | |||
|
|||
// ListenerParameters hold various configurable listener values. | |||
type ListenerParameters struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feels like these are really upstream settings, not listener settings. How about we call it something like upstream
, or upstreamConfig
or similar? @skriss @stevesloka @sunjayBhatia, any thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the balancing is done on which Envoy worker thread will accept and do work on a connection, I would say tying "upstream" into the naming makes this seem like an upstream load balancing parameter which it isn't
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe remove "Use" and have the name be something to the effect of "evenly distribute accepted connections between Envoy workers"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let us read the envoy docs. there is some params of listener:
"name": "...",
"address": "{...}",
"filter_chains": [],
"use_original_dst": "{...}",
"per_connection_buffer_limit_bytes": "{...}",
"metadata": "{...}",
"drain_type": "...",
"listener_filters": [],
"listener_filters_timeout": "{...}",
"continue_on_listener_filters_timeout": "...",
"transparent": "{...}",
"freebind": "{...}",
"socket_options": [],
"tcp_fast_open_queue_length": "{...}",
"traffic_direction": "...",
"udp_listener_config": "{...}",
"api_listener": "{...}",
"connection_balance_config": "{...}",
"reuse_port": "...",
"access_log": []
}
Exact-balance is one of the parameters, I think contour may be added to support other parameter settings in the future.
So i think the listenterParams is appropriate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense, I guess we're adding listener
then.
I think we should call the inner parameter connectionBalancer
and have the only valid value be exact
. Other values should be ignored, with an error logged. Putting the wrong value should not be fatal.
Yes, I know, this is a more complicated way to do what's a boolean anyway, but this way it's more descriptive, shorter, and allows for more flexibility in the future.
So, the stanza in YAML should look like:
listener:
connectionBalancer: exact
What do you think? @sunjayBhatia @iyacontrol?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that listener
is the right top-level name, but in that case, I'd like to see the actual field name a little more clear.
pkg/config/parameters.go
Outdated
|
||
// See https://www.envoyproxy.io/docs/envoy/latest/api-v2/api/v2/listener.proto#envoy-api-msg-listener-connectionbalanceconfig | ||
// for more information. | ||
UseExactConnectionBalancer bool `yaml:"use-exact-connection-balancer"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I commented earlier, I think this should be called ConnectionBalancer
, and be a string where the only valid value is exact
. The YAML tag should be connectionBalancer
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @youngnick .
i agree with you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM aside from one small nonblocking nit.
|
||
// See https://www.envoyproxy.io/docs/envoy/latest/api-v2/api/v2/listener.proto#envoy-api-msg-listener-connectionbalanceconfig | ||
// for more information. | ||
ConnectionBalancer string `yaml:"connection-balancer"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I generally prefer k8s-style camelCase
YAML names, but I see we have plenty of hyphenated ones in our config filie. So this is a small nit, I'll leave it up to other maintainers to make a call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lookd good overall, just a few minor nits and we probably still need some documentation added here: https://github.com/projectcontour/contour/blob/1afeff96f27ec0dfe75746194dbf08da66a1182d/site/docs/main/configuration.md#configuration-file (a listener entry at the top level of the config description table and a sub section describing the configurable listener fields)
pkg/config/parameters.go
Outdated
// ListenerParameters hold various configurable listener values. | ||
type ListenerParameters struct { | ||
// ConnectionBalancer. If the value is exact, the listener will use the exact connection balancer | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Remove this extra line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done it
cmd/contour/serve.go
Outdated
@@ -272,6 +272,12 @@ func doServe(log logrus.FieldLogger, ctx *serveContext) error { | |||
return fmt.Errorf("error parsing request timeout: %w", err) | |||
} | |||
|
|||
// connection balancer | |||
if ok := ctx.Config.Listener.ConnectionBalancer == "exact" || ctx.Config.Listener.ConnectionBalancer == ""; !ok { | |||
log.Warnf("invalid connection balancer value %s. the connection balancer only support exact for now.", ctx.Config.Listener.ConnectionBalancer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
log.Warnf("invalid connection balancer value %s. the connection balancer only support exact for now.", ctx.Config.Listener.ConnectionBalancer) | |
log.Warnf("Invalid listener connection balancer value %q. Only "exact" connection balancing is supported for now.", ctx.Config.Listener.ConnectionBalancer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done it.
hi @sunjayBhatia i will add docs after pr is merged. |
d8451e0
to
3dc51b5
Compare
hi @sunjayBhatia can you resolve the conflicts ? |
You’ll have to merge the main branch with your PR branch and fix the merge conflicts |
done it. hi @sunjayBhatia can you approval the pr? |
I think we would usually like to keep the docs and code change together as one PR for consistency, also to ensure if we cut a release we have features documented already |
the docs has been added. |
site/docs/main/configuration.md
Outdated
|
||
| Field Name | Type| Default | Description | | ||
|------------|-----|----------|-------------| | ||
| connection-balancer | string | `""` | This field specifies the listener connection balancer. If the value is `exact`, the listener will use the exact connection balancer. See [the Envoy documentation][14] for more information. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| connection-balancer | string | `""` | This field specifies the listener connection balancer. If the value is `exact`, the listener will use the exact connection balancer. See [the Envoy documentation][14] for more information. | | |
| connection-balancer | string | `""` | This field specifies the listener connection balancer. If the value is `exact`, the listener will use the exact connection balancer to balance connections between threads in a single Envoy process. See [the Envoy documentation][14] for more information. | |
Small suggestion to make it clearer that this is Envoy-internal, and not for load-balancing to upstreams/backends.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done it. hi @youngnick @sunjayBhatia maybe this pr can be merged. Any other questions?
Re-enables the ExternalName service integration test and uses in-cluster services as the targets of the ExternalName services to avoid flaky dependencies on external domains. Closes projectcontour#3320. Signed-off-by: Steve Kriss <krisss@vmware.com> Signed-off-by: iyacontrol <gaohj2015@yeah.net>
Ciphers are validated against the allowed ciphers for Envoy. Contour will exit on startup if any invalid ciphers are present in the config file. If no ciphers are provided, Contour will use the defaults that exist now. Also allows ECDHE-ECDSA-AES(128|256)-GCM-SHA256 ciphers to be passed standalone to help with FIPS support Fixes projectcontour#2880 Signed-off-by: Sunjay Bhatia <sunjayb@vmware.com> Signed-off-by: iyacontrol <gaohj2015@yeah.net>
Updates the API reference docs generator to v0.3.0, primarily to get a fix that sorts API groups/versions in the output so it's deterministic. Signed-off-by: Steve Kriss <krisss@vmware.com> Signed-off-by: iyacontrol <gaohj2015@yeah.net>
…3350) Updates projectcontour#3349 Signed-off-by: Steve Sloka <slokas@vmware.com> Signed-off-by: iyacontrol <gaohj2015@yeah.net>
In projectcontour#3350, duplicate labels were added, this resolves that to make sure the labels are unique. Fixes projectcontour#3368 Signed-off-by: Steve Sloka <slokas@vmware.com> Signed-off-by: iyacontrol <gaohj2015@yeah.net>
design: Design for HTTPProxy Conflict Resolution Updates: projectcontour#2019 projectcontour#3039 projectcontour#3071 Signed-off-by: Steve Sloka <slokas@vmware.com> Signed-off-by: iyacontrol <gaohj2015@yeah.net>
Updates projectcontour#2554. Signed-off-by: Amey Bhide <amey15@gmail.com> Signed-off-by: iyacontrol <gaohj2015@yeah.net>
…roxy (projectcontour#3291) Sets the SNI on any TCPProxy.Service which references an externalName type service as well as having the upstream protocol of "tls". Updates projectcontour#2517 Signed-off-by: Steve Sloka <slokas@vmware.com> Signed-off-by: iyacontrol <gaohj2015@yeah.net>
Upgrades controller-runtime to v0.8.2 in preparation for updating the Gateway API dependency. Updates projectcontour#3370. Signed-off-by: Steve Kriss <krisss@vmware.com> Signed-off-by: iyacontrol <gaohj2015@yeah.net>
Adds support for configuring an external Rate Limit Service (RLS) that is used for making global rate limit decisions. Adds support to HTTPProxy for global rate limit policies, that generate descriptors for requests to be sent to the RLS for rate-limit decisions. Closes projectcontour#370. Signed-off-by: Steve Kriss <krisss@vmware.com> Signed-off-by: iyacontrol <gaohj2015@yeah.net>
Signed-off-by: iyacontrol <gaohj2015@yeah.net>
Signed-off-by: iyacontrol <gaohj2015@yeah.net>
Signed-off-by: iyacontrol <gaohj2015@yeah.net>
Signed-off-by: iyacontrol <gaohj2015@yeah.net>
The CI job was missing the `yamllint` & `flags` tasks to verify the yaml format as well as flags. This adds those back in as well as fixes the one issue found with the comment on the ratelimit example. Signed-off-by: Steve Sloka <slokas@vmware.com> Signed-off-by: iyacontrol <gaohj2015@yeah.net>
Signed-off-by: iyacontrol <gaohj2015@yeah.net>
…our#3378) Adds a simple integration test based upon the httpproxy/003-path-condition-match.yaml test that already exists for HTTPProxy resources, but aligns to GatewayAPI. Updates projectcontour#2287 Signed-off-by: Steve Sloka <slokas@vmware.com> Signed-off-by: iyacontrol <gaohj2015@yeah.net>
Service APIs have been renamed to Gateway API. This replaces the service-apis module with the gateway-api one and updates related references. Updates projectcontour#3349 Signed-off-by: Steve Sloka <slokas@vmware.com> Signed-off-by: iyacontrol <gaohj2015@yeah.net>
…tour#3382) The site proof check which runs on each run checks for invalid links, however, the links to envoyproxy.io seem to fail the site check. This adds the envoyproxy.io site to the ignore list which will skip this site from getting checked. Signed-off-by: Steve Sloka <slokas@vmware.com> Signed-off-by: iyacontrol <gaohj2015@yeah.net>
Link to variables not constants section Signed-off-by: Sunjay Bhatia <sunjayb@vmware.com> Signed-off-by: iyacontrol <gaohj2015@yeah.net>
…contour#3380) Only cache Gateways that match what's configured for Contour. Updates projectcontour#3351 Signed-off-by: Steve Sloka <slokas@vmware.com> Signed-off-by: iyacontrol <gaohj2015@yeah.net>
Re-applies projectcontour#3237 and projectcontour#3154 Blocked on projectcontour#3300 and projectcontour#3301 and should be merged for 1.13.0 release Signed-off-by: Sunjay Bhatia <sunjayb@vmware.com> Signed-off-by: iyacontrol <gaohj2015@yeah.net>
…ectcontour#3376) Fixes: projectcontour#3323 Signed-off-by: Amey Bhide <amey15@gmail.com> Signed-off-by: iyacontrol <gaohj2015@yeah.net>
8365b99
to
2f8b880
Compare
… exact-balance
Hi @youngnick All checks have passed. |
envoy issues: 4602
envoy docs: docs
When long keepalive connections, the listener should use the exact connection balancer. Otherwise, the cores of cpus are not balanced.