Replace CTLB workaround with a config option #8139

Merged
merged 7 commits into from Oct 25, 2023

Conversation

sridhartigera

@sridhartigera sridhartigera commented Oct 18, 2023

Description

Related issues/PRs

Todos

  • Tests
  • Documentation
  • Release note

Release Note

ebpf: Config option added for host networked NAT. Change in the configs related to connect time load balancing.

Reminder for the reviewer

Make sure that this PR has the correct labels and milestone set.

Every PR needs one docs-* label.

  • docs-pr-required: This change requires a change to the documentation that has not been completed yet.
  • docs-completed: This change has all necessary documentation completed.
  • docs-not-required: This change has no user-facing impact and requires no docs.

Every PR needs one release-note-* label.

  • release-note-required: This PR has user-facing changes. Most PRs should have this label.
  • release-note-not-required: This PR has no user-facing changes.

Other optional labels:

  • cherry-pick-candidate: This PR should be cherry-picked to an earlier release. For bug fixes only.
  • needs-operator-pr: This PR is related to install and requires a corresponding change to the operator.

@marvin-tigera marvin-tigera added this to the Calico v3.27.0 milestone Oct 18, 2023
@marvin-tigera marvin-tigera added the release-note-required and docs-pr-required labels Oct 18, 2023
@sridhartigera sridhartigera marked this pull request as ready for review October 23, 2023 18:35
@sridhartigera sridhartigera requested a review from a team as a code owner October 23, 2023 18:35
// BPFConnectTimeLoadBalancing when in BPF mode, controls whether Felix installs the connect-time load
// balancer. The connect-time load balancer is required for the host to be able to reach Kubernetes services
// and it improves the performance of pod-to-service connections. When set to TCP, connect-time load balancing
// is available only for services with TCP ports. [Default: Enabled]
Contributor

IMO, our default should be TCP.

// is available only for services with TCP ports. [Default: Enabled]
BPFConnectTimeLoadBalancing *BPFConnectTimeLBType `json:"bpfConnectTimeLoadBalancing,omitempty" validate:"omitempty,oneof=TCP Enabled Disabled"`
// BPFHostNetworkedNATWithoutCTLB when in BPF mode, controls whether Felix does a NAT without CTLB. This along with BPFConnectTimeLoadBalancing
// determines the CTLB behavior. [Default: Disabled]
Contributor

The default should be Enabled.

The end result is that DNS, which uses UDP, does not break when a backend dies, while TCP does not pay the performance hit.
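
To illustrate the combination the reviewer is suggesting (connect-time load balancing limited to TCP, with host-networked NAT covering everything else), here is a minimal sketch. The type `ctlbMode`, the function `servicePath`, and the returned labels are hypothetical and are not Felix code; the mapping is only one reading of the option descriptions quoted in this review.

```go
package main

import "fmt"

// ctlbMode is a hypothetical stand-in for the three values accepted by
// bpfConnectTimeLoadBalancing in this PR: Enabled, TCP, Disabled.
type ctlbMode string

const (
	ctlbEnabled  ctlbMode = "Enabled"
	ctlbTCP      ctlbMode = "TCP"
	ctlbDisabled ctlbMode = "Disabled"
)

// servicePath reports which mechanism would translate a service connection
// made by a host-networked process. This is an illustrative assumption,
// not the actual Felix decision logic.
func servicePath(ctlb ctlbMode, hostNAT bool, proto string) string {
	switch {
	case ctlb == ctlbEnabled:
		// Connect-time LB handles every protocol.
		return "connect-time LB"
	case ctlb == ctlbTCP && proto == "TCP":
		// TCP keeps the cheaper connect-time path.
		return "connect-time LB"
	case hostNAT:
		// Everything else falls back to NAT without CTLB.
		return "host-networked NAT"
	default:
		return "none"
	}
}

func main() {
	for _, proto := range []string{"TCP", "UDP"} {
		fmt.Printf("%s -> %s\n", proto, servicePath(ctlbTCP, true, proto))
	}
}
```

With the TCP setting plus host-networked NAT, UDP traffic such as DNS takes the NAT path in this sketch, which is the path the comment above says survives a backend dying, while TCP stays on the connect-time path.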

@@ -289,6 +289,8 @@ func StartNNodeTopology(n int, opts TopologyOptions, infra DatastoreInfra) (tc T
// host. So, disable CTLB handling for subsequent Felixes.
if i > 0 {
optsPerFelix[i].ExtraEnvVars["FELIX_BPFConnectTimeLoadBalancingEnabled"] = "false"
optsPerFelix[i].ExtraEnvVars["FELIX_BPFConnectTimeLoadBalancing"] = string(api.BPFConnectTimeLBDisabled)
optsPerFelix[i].ExtraEnvVars["FELIX_BPFHostNetworkedNATWithoutCTLB"] = string(api.BPFHostNetworkedNATEnabled)
Contributor

I think this should also be disabled. We do not want to install the workaround. CTLB is installed test-wide, and we do not want to install it multiple times. That is why the option FELIX_BPFConnectTimeLoadBalancing=false is overridden above. But this one should be kept as set by the test.

@@ -113,4 +113,12 @@ func setDefaults(fc *apiv3.FelixConfiguration) {
disabled := apiv3.FloatingIPsDisabled
fc.Spec.FloatingIPs = &disabled
}
if fc.Spec.BPFConnectTimeLoadBalancing == nil {
Contributor

This should also be the place where we decide what to do if the legacy BPFConnectTimeLoadBalancingEnabled is set, right? If it is and none of the new options are set, just set BPFConnectTimeLoadBalancing=true and BPFHostNetworkedNATWithoutCTLB=disabled, and then you do not need to worry about BPFConnectTimeLoadBalancingEnabled in Felix. If both the new option and the old one are set, then we need to either raise a warning or fail validation or something, right? Once we deprecate the legacy option, we would remove the code here.
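
A minimal sketch of the migration step described in the comment above, under stated assumptions: the struct `felixSpec`, the function `migrateLegacy`, and the error `errConflict` are hypothetical stand-ins, not the real `apiv3.FelixConfiguration` types. Failing on a conflict is just one of the options the reviewer lists (warn or fail validation), and the handling of a legacy value of false is an assumption, since the comment only covers the enabled case.

```go
package main

import (
	"errors"
	"fmt"
)

// felixSpec is a hypothetical, trimmed-down stand-in for the fields
// discussed in this thread.
type felixSpec struct {
	LegacyCTLBEnabled *bool   // legacy BPFConnectTimeLoadBalancingEnabled
	ConnectTimeLB     *string // new BPFConnectTimeLoadBalancing
	HostNetworkedNAT  *string // new BPFHostNetworkedNATWithoutCTLB
}

var errConflict = errors.New("legacy and new connect-time load balancing options are both set")

// migrateLegacy translates the legacy option into the new ones so that the
// rest of the code only has to look at the new fields.
func migrateLegacy(s *felixSpec) error {
	if s.LegacyCTLBEnabled == nil {
		return nil // nothing to migrate
	}
	if s.ConnectTimeLB != nil || s.HostNetworkedNAT != nil {
		// Assumption: treat a mix of old and new options as a validation error.
		return errConflict
	}
	if *s.LegacyCTLBEnabled {
		ctlb, nat := "Enabled", "Disabled"
		s.ConnectTimeLB, s.HostNetworkedNAT = &ctlb, &nat
		return nil
	}
	// Assumption: legacy "false" maps to Disabled and leaves the NAT option
	// to be defaulted elsewhere.
	ctlb := "Disabled"
	s.ConnectTimeLB = &ctlb
	return nil
}

func main() {
	legacy := true
	s := &felixSpec{LegacyCTLBEnabled: &legacy}
	if err := migrateLegacy(s); err != nil {
		fmt.Println("invalid:", err)
		return
	}
	fmt.Println(*s.ConnectTimeLB, *s.HostNetworkedNAT)
}
```

Keeping the translation in one place means the legacy field can be dropped later by deleting a single function.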

if config.BPFConnTimeLB == string(apiv3.BPFConnectTimeLBDisabled) &&
config.BPFHostNetworkedNAT == string(apiv3.BPFHostNetworkedNATDisabled) {
log.Warn("Access to services from host networked process wont work, forcing hostnetworked NAT to Enabled")
config.BPFHostNetworkedNAT = string(apiv3.BPFHostNetworkedNATEnabled)
Contributor

Would it make sense to just give a warning instead of forcing it?

// The above cases are invalid configuration. Revert to CTLB enabled.
if config.BPFHostNetworkedNAT == string(apiv3.BPFHostNetworkedNATEnabled) {
if config.BPFConnTimeLB == string(apiv3.BPFConnectTimeLBEnabled) {
log.Warn("Access to services may not work properly, reverting to default CTLB configuration")
Contributor

config.BPFHostNetworkedNAT == string(apiv3.BPFHostNetworkedNATEnabled) does not make sense, but it also would not do much harm. I would revert here to Disabled instead of flipping the other to TCP.

Member Author

I think flipping the ConnTimeLB to TCP keeps it in line with the defaults.
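
For reference, the two fix-ups discussed in the threads above can be sketched as one normalization function. This is an illustration under assumptions, not the merged Felix code: `normalize` and the constants are hypothetical names, and the second rule follows the author's preference (flip connect-time LB to TCP) rather than the reviewer's alternative (revert host-networked NAT to Disabled).

```go
package main

import "fmt"

const (
	enabled  = "Enabled"
	disabled = "Disabled"
	tcpOnly  = "TCP"
)

// normalize returns a usable (connect-time LB, host-networked NAT) pair plus
// any warnings raised while adjusting an invalid combination.
func normalize(ctlb, hostNAT string) (string, string, []string) {
	var warnings []string
	if ctlb == disabled && hostNAT == disabled {
		// Neither mechanism is active, so host-networked processes could not
		// reach services; force the NAT path on.
		warnings = append(warnings, "forcing host-networked NAT to Enabled")
		hostNAT = enabled
	}
	if ctlb == enabled && hostNAT == enabled {
		// Full connect-time LB together with NAT without CTLB is treated as
		// invalid here; fall back to TCP-only connect-time LB.
		warnings = append(warnings, "reverting connect-time LB to TCP")
		ctlb = tcpOnly
	}
	return ctlb, hostNAT, warnings
}

func main() {
	inputs := [][2]string{{disabled, disabled}, {enabled, enabled}, {tcpOnly, enabled}}
	for _, in := range inputs {
		ctlb, nat, warns := normalize(in[0], in[1])
		fmt.Printf("%v -> CTLB=%s hostNAT=%s warnings=%d\n", in, ctlb, nat, len(warns))
	}
}
```

Returning the warnings instead of logging them keeps the sketch free of a logging dependency; the snippets quoted in this review call log.Warn at the same points.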

@tomastigera tomastigera merged commit 1bd431c into projectcalico:master Oct 25, 2023
1 of 2 checks passed
Labels
docs-pr-required, release-note-required
3 participants