Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NE-822 Don't scale route weight on single service routes #377

Merged

Conversation

gcs278
Copy link
Contributor

@gcs278 gcs278 commented Mar 22, 2022

Change to not scale weight to 256 if there is only one service for the route. More information can be found here: NE-709 Impact of Server Weight on Memory Allocation

In a nutshell, scaling the weight to 256 is redundant for single services routes since all servers in the haproxy backend will always have the same weight, so therefore weight 1 == weight 256. Reducing the weight helps with reducing the static memory allocation.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 22, 2022
@openshift-ci openshift-ci bot requested review from candita and Miciah March 22, 2022 21:39
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 22, 2022
@gcs278 gcs278 force-pushed the NE822-Backend-Weight-Scale branch from 6b7042e to fc7e579 Compare March 22, 2022 21:39
@frobware
Copy link
Contributor

  • 28 failures to create the sandbox

/retest

@frobware
Copy link
Contributor

cluster bootstrap failed

/test e2e-upgrade

@gcs278 gcs278 force-pushed the NE822-Backend-Weight-Scale branch from fc7e579 to 76028a0 Compare March 23, 2022 13:40
@gcs278
Copy link
Contributor Author

gcs278 commented Mar 23, 2022

/retest

@gcs278
Copy link
Contributor Author

gcs278 commented Mar 24, 2022

/retest

2 similar comments
@gcs278
Copy link
Contributor Author

gcs278 commented Mar 24, 2022

/retest

@gcs278
Copy link
Contributor Author

gcs278 commented Mar 25, 2022

/retest

@gcs278 gcs278 force-pushed the NE822-Backend-Weight-Scale branch from 76028a0 to e24ca3a Compare March 25, 2022 17:51
@gcs278 gcs278 changed the title [WIP] NE-822 Don't scale route weight on single service routes NE-822 Don't scale route weight on single service routes Mar 30, 2022
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 30, 2022
@gcs278
Copy link
Contributor Author

gcs278 commented Mar 30, 2022

/hold

I think we should wait for perf&scale test.

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 30, 2022
@gcs278
Copy link
Contributor Author

gcs278 commented Mar 31, 2022

/retest

1 similar comment
@gcs278
Copy link
Contributor Author

gcs278 commented Apr 5, 2022

/retest

Comment on lines 1293 to 1297
for key := range serviceUnits {
serviceUnitNames[key] = 1
}
Copy link
Contributor

@Miciah Miciah Apr 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the service unit has 0 endpoints? We should check for that case:

Suggested change
for key := range serviceUnits {
serviceUnitNames[key] = 1
}
for key := range serviceUnits {
if r.numberOfEndpoints(key) > 0 {
serviceUnitNames[key] = 1
}
}

Granted, omitting the r.numberOfEndpoints(key) > 0 check might not change the ultimate result from the config template:

        {{- range $serviceUnitName, $weight := $cfg.ServiceUnitNames }}
          {{- if ge $weight 0 }}{{/* weight=0 is reasonable to keep existing connections to backends with cookies as we can see the HTTP headers */}}
            {{- with $serviceUnit := index $.ServiceUnits $serviceUnitName }}
              {{- range $idx, $endpoint := processEndpointsForAlias $cfg $serviceUnit (env "ROUTER_BACKEND_PROCESS_ENDPOINTS" "") }}
                {{/* [actual HAProxy config stuff here] */}}
            {{- end }}{{/* end range processEndpointsForAlias */}}
          {{- end }}{{/* end get serviceUnit from its name */}}
        {{- end }}{{/* end range over serviceUnitNames */}}

In the template, the effect is the same whether range $serviceUnitName, $weight := $cfg.ServiceUnitNames iterates 0 times or whether range $idx, $endpoint := processEndpointsForAlias $cfg $serviceUnit (env "ROUTER_BACKEND_PROCESS_ENDPOINTS" "") iterates 0 times. However, it could cause problems for the dynamic config manager, which has logic like this:

		// As the endpoints have changed, recalculate the weights.
		newWeights := r.calculateServiceWeights(cfg.ServiceUnits)

		// Get the weight for this service unit.
		weight, ok := newWeights[id]
		if !ok {
			weight = 0
		}

Anyway, from a strict correctness perspective, I believe calculateServiceWeights needs the r.numberOfEndpoints(key) > 0 check.

Please add a unit test case to ensure we don't regress:

--- a/pkg/router/template/router_test.go
+++ b/pkg/router/template/router_test.go
@@ -873,6 +873,16 @@ func TestCalculateServiceWeights(t *testing.T) {
 		serviceWeights  map[ServiceUnitKey]int32
 		expectedWeights map[ServiceUnitKey]int32
 	}{
+		{
+			name: "service with no endpoint",
+			serviceUnits: map[ServiceUnitKey][]Endpoint{
+				suKey1: {},
+			},
+			serviceWeights: map[ServiceUnitKey]int32{
+				suKey1: 100,
+			},
+			expectedWeights: map[ServiceUnitKey]int32{},
+		},
 		{
 			name: "equally weighted services with same number of endpoints",
 			serviceUnits: map[ServiceUnitKey][]Endpoint{

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, didn't think about a single service with no endpoints and exposing that datapath. Agreed it doesn't functionally changing anything, but keep our code clean for the future.

Done.

@gcs278 gcs278 force-pushed the NE822-Backend-Weight-Scale branch from e24ca3a to 9656da7 Compare April 6, 2022 22:04
@gcs278
Copy link
Contributor Author

gcs278 commented Apr 7, 2022

/retest

2 similar comments
@gcs278
Copy link
Contributor Author

gcs278 commented Apr 11, 2022

/retest

@gcs278
Copy link
Contributor Author

gcs278 commented Apr 12, 2022

/retest

@Miciah
Copy link
Contributor

Miciah commented Apr 22, 2022

/lgtm
/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 22, 2022
@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

13 similar comments
@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@quarterpin
Copy link

quarterpin commented Apr 25, 2022

label /qe-approved

@quarterpin
Copy link

/label qe-approved

@openshift-ci openshift-ci bot added the qe-approved Signifies that QE has signed off on this PR label Apr 25, 2022
@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

1 similar comment
@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@gcs278
Copy link
Contributor Author

gcs278 commented Apr 25, 2022

/retest

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

2 similar comments
@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@gcs278
Copy link
Contributor Author

gcs278 commented Apr 25, 2022

/skip

@gcs278
Copy link
Contributor Author

gcs278 commented Apr 25, 2022

/label px-approved
Parent Epic NE-709 has px not needed.

@openshift-ci openshift-ci bot added the px-approved Signifies that Product Support has signed off on this PR label Apr 25, 2022
@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

1 similar comment
@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Apr 26, 2022

@gcs278: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-metal-ipi-ovn-ipv6 9656da7 link false /test e2e-metal-ipi-ovn-ipv6

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-merge-robot openshift-merge-robot merged commit 585d783 into openshift:master Apr 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. docs-approved Signifies that Docs has signed off on this PR lgtm Indicates that a PR is ready to be merged. px-approved Signifies that Product Support has signed off on this PR qe-approved Signifies that QE has signed off on this PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants