Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

envoy listener exact balancer #3314

Merged
merged 66 commits into from Feb 23, 2021
Merged

Conversation

iyacontrol
Copy link
Contributor

envoy issues: 4602
envoy docs: docs

When long keepalive connections, the listener should use the exact connection balancer. Otherwise, the cores of cpus are not balanced.

@iyacontrol iyacontrol requested a review from a team as a code owner February 4, 2021 10:11
@iyacontrol iyacontrol requested review from stevesloka and danehans and removed request for a team February 4, 2021 10:11
@codecov
Copy link

codecov bot commented Feb 4, 2021

Codecov Report

Merging #3314 (9326983) into main (64db908) will decrease coverage by 0.24%.
The diff coverage is 3.44%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #3314      +/-   ##
==========================================
- Coverage   75.71%   75.47%   -0.25%     
==========================================
  Files          98       98              
  Lines        6559     6588      +29     
==========================================
+ Hits         4966     4972       +6     
- Misses       1485     1509      +24     
+ Partials      108      107       -1     
Impacted Files Coverage Δ
cmd/contour/serve.go 0.00% <0.00%> (ø)
internal/xdscache/v3/listener.go 93.33% <16.66%> (-2.26%) ⬇️
internal/dag/cache.go 91.51% <0.00%> (+0.60%) ⬆️
internal/k8s/log.go 69.56% <0.00%> (+6.52%) ⬆️

@stevesloka
Copy link
Member

@sunjayBhatia have you worked with this Envoy setting?

@sunjayBhatia
Copy link
Member

Not directly, no. I believe this feature is used for balancing connections on worker threads within Envoy so it might make some sense to expose for an environment that has many long lived connections since I guess you have some prior contextual knowledge the kernel does not that help in scheduling. From reading the issue this was useful to help solve some specific tail latency issues.

@iyacontrol I think before accepting this change we would want some more context on why, what this solves for the submitter, what are the symptoms, reproduction maybe, etc. as well as testing, docs, etc.

@youngnick
Copy link
Member

Yes, thanks for this PR, but in general, we like an issue to document the problem before accepting a PR to fix that problem.

In this case, I'd like to know what the tradeoffs are. Should we just be using the exact balancer all the time? In general, we prefer not to add a configuration setting unless there is a strong need.

@iyacontrol
Copy link
Contributor Author

thanks @stevesloka @sunjayBhatia @youngnick !

1: Add the parameter settings of envoy's listener, which can meet the needs of more people in more scenarios.
2: Envoy threading model . The existing architecture will not work well if Envoy is deployed in a scenario in which there are very few connections that require substantial resources to handle. This is because there is no guarantee that connections will be evenly spread between workers. This can be solved by implementing worker connection balancing in which a worker is capable of forwarding a connection to another worker for handling.

We use grpc extensively in our company. I have been using the exact balancer in our production environment for a month,the effect is very good.

@iyacontrol
Copy link
Contributor Author

Not directly, no. I believe this feature is used for balancing connections on worker threads within Envoy so it might make some sense to expose for an environment that has many long lived connections since I guess you have some prior contextual knowledge the kernel does not that help in scheduling. From reading the issue this was useful to help solve some specific tail latency issues.

@iyacontrol I think before accepting this change we would want some more context on why, what this solves for the submitter, what are the symptoms, reproduction maybe, etc. as well as testing, docs, etc.

i will add some docs.

@iyacontrol
Copy link
Contributor Author

iyacontrol commented Feb 6, 2021

related issue #3331

@youngnick
Copy link
Member

Thanks for this PR @iyacontrol, and thanks for being patient while I clarified what we're doing here.

I agree that we should add the configurability, and it should be a config-file item, but I'm not sure listeners is the right place to put it. I'll do a proper review and raise this there though.

Copy link
Member

@youngnick youngnick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to make sure we have the right name for the config item, once we put it in, it's very difficult to change.

Also, any chance we could get a test in internal/xdscache somewhere?

@@ -383,6 +383,15 @@ type NetworkParameters struct {
XffNumTrustedHops uint32 `yaml:"num-trusted-hops"`
}

// ListenerParameters hold various configurable listener values.
type ListenerParameters struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feels like these are really upstream settings, not listener settings. How about we call it something like upstream, or upstreamConfig or similar? @skriss @stevesloka @sunjayBhatia, any thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the balancing is done on which Envoy worker thread will accept and do work on a connection, I would say tying "upstream" into the naming makes this seem like an upstream load balancing parameter which it isn't

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe remove "Use" and have the name be something to the effect of "evenly distribute accepted connections between Envoy workers"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let us read the envoy docs. there is some params of listener:

"name": "...",
 "address": "{...}",
 "filter_chains": [],
 "use_original_dst": "{...}",
 "per_connection_buffer_limit_bytes": "{...}",
 "metadata": "{...}",
 "drain_type": "...",
 "listener_filters": [],
 "listener_filters_timeout": "{...}",
 "continue_on_listener_filters_timeout": "...",
 "transparent": "{...}",
 "freebind": "{...}",
 "socket_options": [],
 "tcp_fast_open_queue_length": "{...}",
 "traffic_direction": "...",
 "udp_listener_config": "{...}",
 "api_listener": "{...}",
 "connection_balance_config": "{...}",
 "reuse_port": "...",
 "access_log": []
}

Exact-balance is one of the parameters, I think contour may be added to support other parameter settings in the future.

So i think the listenterParams is appropriate

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense, I guess we're adding listener then.

I think we should call the inner parameter connectionBalancer and have the only valid value be exact. Other values should be ignored, with an error logged. Putting the wrong value should not be fatal.

Yes, I know, this is a more complicated way to do what's a boolean anyway, but this way it's more descriptive, shorter, and allows for more flexibility in the future.

So, the stanza in YAML should look like:

listener:
  connectionBalancer: exact

What do you think? @sunjayBhatia @iyacontrol?

@youngnick youngnick self-requested a review February 15, 2021 02:54
Copy link
Member

@youngnick youngnick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that listener is the right top-level name, but in that case, I'd like to see the actual field name a little more clear.


// See https://www.envoyproxy.io/docs/envoy/latest/api-v2/api/v2/listener.proto#envoy-api-msg-listener-connectionbalanceconfig
// for more information.
UseExactConnectionBalancer bool `yaml:"use-exact-connection-balancer"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I commented earlier, I think this should be called ConnectionBalancer, and be a string where the only valid value is exact. The YAML tag should be connectionBalancer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @youngnick .

i agree with you.

Copy link
Member

@youngnick youngnick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM aside from one small nonblocking nit.


// See https://www.envoyproxy.io/docs/envoy/latest/api-v2/api/v2/listener.proto#envoy-api-msg-listener-connectionbalanceconfig
// for more information.
ConnectionBalancer string `yaml:"connection-balancer"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I generally prefer k8s-style camelCase YAML names, but I see we have plenty of hyphenated ones in our config filie. So this is a small nit, I'll leave it up to other maintainers to make a call.

Copy link
Member

@sunjayBhatia sunjayBhatia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lookd good overall, just a few minor nits and we probably still need some documentation added here: https://github.com/projectcontour/contour/blob/1afeff96f27ec0dfe75746194dbf08da66a1182d/site/docs/main/configuration.md#configuration-file (a listener entry at the top level of the config description table and a sub section describing the configurable listener fields)

// ListenerParameters hold various configurable listener values.
type ListenerParameters struct {
// ConnectionBalancer. If the value is exact, the listener will use the exact connection balancer

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Remove this extra line

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done it

@@ -272,6 +272,12 @@ func doServe(log logrus.FieldLogger, ctx *serveContext) error {
return fmt.Errorf("error parsing request timeout: %w", err)
}

// connection balancer
if ok := ctx.Config.Listener.ConnectionBalancer == "exact" || ctx.Config.Listener.ConnectionBalancer == ""; !ok {
log.Warnf("invalid connection balancer value %s. the connection balancer only support exact for now.", ctx.Config.Listener.ConnectionBalancer)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
log.Warnf("invalid connection balancer value %s. the connection balancer only support exact for now.", ctx.Config.Listener.ConnectionBalancer)
log.Warnf("Invalid listener connection balancer value %q. Only "exact" connection balancing is supported for now.", ctx.Config.Listener.ConnectionBalancer)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done it.

@iyacontrol
Copy link
Contributor Author

iyacontrol commented Feb 18, 2021

Lookd good overall, just a few minor nits and we probably still need some documentation added here: https://github.com/projectcontour/contour/blob/1afeff96f27ec0dfe75746194dbf08da66a1182d/site/docs/main/configuration.md#configuration-file (a listener entry at the top level of the config description table and a sub section describing the configurable listener fields)

hi @sunjayBhatia i will add docs after pr is merged.

@iyacontrol
Copy link
Contributor Author

hi @sunjayBhatia can you resolve the conflicts ?

@sunjayBhatia
Copy link
Member

hi @sunjayBhatia can you resolve the conflicts ?

You’ll have to merge the main branch with your PR branch and fix the merge conflicts

@iyacontrol
Copy link
Contributor Author

iyacontrol commented Feb 20, 2021

fix the merge conflicts

done it. hi @sunjayBhatia can you approval the pr?

@sunjayBhatia
Copy link
Member

Lookd good overall, just a few minor nits and we probably still need some documentation added here: https://github.com/projectcontour/contour/blob/1afeff96f27ec0dfe75746194dbf08da66a1182d/site/docs/main/configuration.md#configuration-file (a listener entry at the top level of the config description table and a sub section describing the configurable listener fields)

hi @sunjayBhatia i will add docs after pr is merged.

I think we would usually like to keep the docs and code change together as one PR for consistency, also to ensure if we cut a release we have features documented already

@iyacontrol
Copy link
Contributor Author

Lookd good overall, just a few minor nits and we probably still need some documentation added here: https://github.com/projectcontour/contour/blob/1afeff96f27ec0dfe75746194dbf08da66a1182d/site/docs/main/configuration.md#configuration-file (a listener entry at the top level of the config description table and a sub section describing the configurable listener fields)

hi @sunjayBhatia i will add docs after pr is merged.

I think we would usually like to keep the docs and code change together as one PR for consistency, also to ensure if we cut a release we have features documented already

the docs has been added.


| Field Name | Type| Default | Description |
|------------|-----|----------|-------------|
| connection-balancer | string | `""` | This field specifies the listener connection balancer. If the value is `exact`, the listener will use the exact connection balancer. See [the Envoy documentation][14] for more information. |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
| connection-balancer | string | `""` | This field specifies the listener connection balancer. If the value is `exact`, the listener will use the exact connection balancer. See [the Envoy documentation][14] for more information. |
| connection-balancer | string | `""` | This field specifies the listener connection balancer. If the value is `exact`, the listener will use the exact connection balancer to balance connections between threads in a single Envoy process. See [the Envoy documentation][14] for more information. |

Small suggestion to make it clearer that this is Envoy-internal, and not for load-balancing to upstreams/backends.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done it. hi @youngnick @sunjayBhatia maybe this pr can be merged. Any other questions?

@youngnick youngnick added this to the 1.13.0 milestone Feb 22, 2021
@youngnick youngnick added this to In progress in Contour Project Board via automation Feb 22, 2021
@youngnick youngnick moved this from In progress to 1.13 Release in Contour Project Board Feb 22, 2021
skriss and others added 23 commits February 23, 2021 11:42
Re-enables the ExternalName service integration test
and uses in-cluster services as the targets of the
ExternalName services to avoid flaky dependencies on
external domains.

Closes projectcontour#3320.

Signed-off-by: Steve Kriss <krisss@vmware.com>
Signed-off-by: iyacontrol <gaohj2015@yeah.net>
Ciphers are validated against the allowed ciphers for Envoy. Contour
will exit on startup if any invalid ciphers are present in the config
file. If no ciphers are provided, Contour will use the defaults that
exist now.

Also allows ECDHE-ECDSA-AES(128|256)-GCM-SHA256 ciphers to be passed standalone to help with FIPS support

Fixes projectcontour#2880

Signed-off-by: Sunjay Bhatia <sunjayb@vmware.com>
Signed-off-by: iyacontrol <gaohj2015@yeah.net>
Updates the API reference docs generator to v0.3.0, primarily
to get a fix that sorts API groups/versions in the output so
it's deterministic.

Signed-off-by: Steve Kriss <krisss@vmware.com>
Signed-off-by: iyacontrol <gaohj2015@yeah.net>
…3350)

Updates projectcontour#3349

Signed-off-by: Steve Sloka <slokas@vmware.com>
Signed-off-by: iyacontrol <gaohj2015@yeah.net>
In projectcontour#3350, duplicate labels were added, this resolves that to make sure the labels are unique.

Fixes projectcontour#3368

Signed-off-by: Steve Sloka <slokas@vmware.com>
Signed-off-by: iyacontrol <gaohj2015@yeah.net>
design: Design for HTTPProxy Conflict Resolution

Updates: projectcontour#2019 projectcontour#3039 projectcontour#3071

Signed-off-by: Steve Sloka <slokas@vmware.com>
Signed-off-by: iyacontrol <gaohj2015@yeah.net>
Updates projectcontour#2554.

Signed-off-by: Amey Bhide <amey15@gmail.com>
Signed-off-by: iyacontrol <gaohj2015@yeah.net>
…roxy (projectcontour#3291)

Sets the SNI on any TCPProxy.Service which references an externalName type service as well as having the upstream protocol of "tls".

Updates projectcontour#2517

Signed-off-by: Steve Sloka <slokas@vmware.com>
Signed-off-by: iyacontrol <gaohj2015@yeah.net>
Upgrades controller-runtime to v0.8.2 in preparation for updating
the Gateway API dependency.

Updates projectcontour#3370.

Signed-off-by: Steve Kriss <krisss@vmware.com>
Signed-off-by: iyacontrol <gaohj2015@yeah.net>
Adds support for configuring an external Rate Limit
Service (RLS) that is used for making global rate
limit decisions. Adds support to HTTPProxy for
global rate limit policies, that generate descriptors
for requests to be sent to the RLS for rate-limit
decisions.

Closes projectcontour#370.

Signed-off-by: Steve Kriss <krisss@vmware.com>
Signed-off-by: iyacontrol <gaohj2015@yeah.net>
Signed-off-by: iyacontrol <gaohj2015@yeah.net>
Signed-off-by: iyacontrol <gaohj2015@yeah.net>
Signed-off-by: iyacontrol <gaohj2015@yeah.net>
Signed-off-by: iyacontrol <gaohj2015@yeah.net>
The CI job was missing the `yamllint` & `flags` tasks to verify the yaml format as well as flags.
This adds those back in as well as fixes the one issue found with the comment on the ratelimit example.

Signed-off-by: Steve Sloka <slokas@vmware.com>
Signed-off-by: iyacontrol <gaohj2015@yeah.net>
Signed-off-by: iyacontrol <gaohj2015@yeah.net>
…our#3378)

Adds a simple integration test based upon the httpproxy/003-path-condition-match.yaml test that already
exists for HTTPProxy resources, but aligns to GatewayAPI.

Updates projectcontour#2287

Signed-off-by: Steve Sloka <slokas@vmware.com>
Signed-off-by: iyacontrol <gaohj2015@yeah.net>
Service APIs have been renamed to Gateway API. This replaces the service-apis module with the gateway-api one and updates related references.

Updates projectcontour#3349

Signed-off-by: Steve Sloka <slokas@vmware.com>
Signed-off-by: iyacontrol <gaohj2015@yeah.net>
…tour#3382)

The site proof check which runs on each run checks for invalid links, however, the links
to envoyproxy.io seem to fail the site check. This adds the envoyproxy.io site to the ignore
list which will skip this site from getting checked.

Signed-off-by: Steve Sloka <slokas@vmware.com>
Signed-off-by: iyacontrol <gaohj2015@yeah.net>
Link to variables not constants section

Signed-off-by: Sunjay Bhatia <sunjayb@vmware.com>
Signed-off-by: iyacontrol <gaohj2015@yeah.net>
…contour#3380)

Only cache Gateways that match what's configured for Contour.

Updates projectcontour#3351

Signed-off-by: Steve Sloka <slokas@vmware.com>
Signed-off-by: iyacontrol <gaohj2015@yeah.net>
Re-applies projectcontour#3237 and projectcontour#3154

Blocked on projectcontour#3300 and projectcontour#3301 and should be merged for 1.13.0 release

Signed-off-by: Sunjay Bhatia <sunjayb@vmware.com>
Signed-off-by: iyacontrol <gaohj2015@yeah.net>
…ectcontour#3376)

Fixes: projectcontour#3323
Signed-off-by: Amey Bhide <amey15@gmail.com>
Signed-off-by: iyacontrol <gaohj2015@yeah.net>
@iyacontrol
Copy link
Contributor Author

This is all good to go, @iyacontrol, except that the commits need a DCO signoff (a git commit -s), then we can get this one merged. If you can do this in the next twenty four hours, we can get this into Contour 1.13.

Hi @youngnick All checks have passed.

@youngnick youngnick merged commit f6323d3 into projectcontour:main Feb 23, 2021
@iyacontrol iyacontrol deleted the exact-balance branch February 23, 2021 04:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-note Denotes a PR that will be considered when it comes time to generate release notes.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants