Modify all health checks to be specified via enums #2078

Merged (2 commits), Jan 15, 2019

@siggy siggy commented Jan 14, 2019

Modify all health checks to be specified via enums

The set of health checks to be executed was determined by a combination
of check enums and boolean options.

This change modifies the health checks to be governed strictly by a set
of enums. This change does not add or remove any checks, but rather
moves checks into more granular categories, such that any set of checks
that are toggle-able are defined together under a single category.
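
As a rough illustration of checks being governed strictly by a set of enums, the sketch below filters a flat list of checkers by category. Only `checker`, its `category`/`description`/`check` fields, and `LinkerdPreInstallClusterCategory` appear in this PR's diff; `CategoryID`, `selectCheckers`, and the other category names and descriptions are hypothetical and are not taken from the linkerd codebase.

```go
package main

import "fmt"

// CategoryID is a hypothetical enum type; each value names one group of
// checks that is toggled on or off as a unit.
type CategoryID int

const (
	KubernetesAPICategory CategoryID = iota // hypothetical name
	LinkerdPreInstallClusterCategory        // name taken from the PR diff
	LinkerdDataPlaneCategory                // hypothetical name
)

// checker mirrors the fields visible in the PR diff.
type checker struct {
	category    CategoryID
	description string
	check       func() error
}

// selectCheckers (hypothetical helper) returns the checkers whose category
// is in the enabled set, preserving their original order. No boolean
// options are consulted: the enum set alone decides what runs.
func selectCheckers(all []checker, enabled []CategoryID) []checker {
	want := make(map[CategoryID]bool, len(enabled))
	for _, id := range enabled {
		want[id] = true
	}
	var out []checker
	for _, c := range all {
		if want[c.category] {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	ok := func() error { return nil } // stand-in for a real check
	all := []checker{
		{KubernetesAPICategory, "can query the Kubernetes API", ok},
		{LinkerdPreInstallClusterCategory, "can create ClusterRoles", ok},
		{LinkerdPreInstallClusterCategory, "can create ClusterRoleBindings", ok},
		{LinkerdDataPlaneCategory, "data plane proxies are ready", ok},
	}
	enabled := []CategoryID{KubernetesAPICategory, LinkerdPreInstallClusterCategory}
	for _, c := range selectCheckers(all, enabled) {
		fmt.Printf("%s: err=%v\n", c.description, c.check())
	}
}
```

With this shape, a caller that previously passed a boolean option would instead include or omit the corresponding category in the enabled set.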

This is a first step in cleaning up the linkerd check code, and moving towards #1471.

Next steps:

  • tightly couple category IDs to names
  • tightly couple checks to their parent categories
  • programmatic control over check ordering

Signed-off-by: Andrew Seigner siggy@buoyant.io

@siggy siggy self-assigned this Jan 14, 2019
@siggy siggy requested a review from klingerf January 14, 2019 18:57
@siggy siggy added this to In progress in Post 2.1 Polish via automation Jan 14, 2019
siggy added a commit that referenced this pull request Jan 14, 2019
The `linkerd check` command organized the various checks via loosely
coupled category IDs, category names, and checkers themselves, all with
ordering defined by consumers of this code.

This change removes category IDs in favor of category names, groups all
checkers by category, and enforces ordering at the `HealthChecker`
level.

Part of #1471, depends on #2078.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
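
The follow-up commit message above describes grouping checkers by named category and enforcing ordering at the `HealthChecker` level. A minimal sketch of that idea, assuming a slice of categories whose declaration order is the run order, might look like the following. The `category` struct, `Run`, `newHealthChecker`, and the category name strings are hypothetical; only `HealthChecker`, `checker`, and the two RBAC descriptions come from this page.

```go
package main

import "fmt"

// checker no longer carries a category ID; it belongs to its parent category.
type checker struct {
	description string
	check       func() error
}

// category (hypothetical) couples a name to the checkers it owns.
type category struct {
	name     string
	checkers []checker
}

// HealthChecker owns the ordering: categories run in slice order.
type HealthChecker struct {
	categories []category
}

// Run executes the enabled categories in the order HealthChecker defines,
// regardless of the order in which the caller lists them.
func (hc *HealthChecker) Run(enabled []string, observe func(categoryName, description string, err error)) {
	want := make(map[string]bool, len(enabled))
	for _, n := range enabled {
		want[n] = true
	}
	for _, cat := range hc.categories {
		if !want[cat.name] {
			continue
		}
		for _, c := range cat.checkers {
			observe(cat.name, c.description, c.check())
		}
	}
}

// newHealthChecker builds a small example set; the names are illustrative.
func newHealthChecker() *HealthChecker {
	ok := func() error { return nil } // stand-in for a real check
	return &HealthChecker{categories: []category{
		{"kubernetes-api", []checker{{"can query the Kubernetes API", ok}}},
		{"linkerd-pre-install-cluster", []checker{
			{"can create ClusterRoles", ok},
			{"can create ClusterRoleBindings", ok},
		}},
		{"linkerd-data-plane", []checker{{"data plane proxies are ready", ok}}},
	}}
}

func main() {
	hc := newHealthChecker()
	// The caller lists categories out of order; output still follows hc.categories.
	hc.Run([]string{"linkerd-data-plane", "kubernetes-api"},
		func(categoryName, description string, err error) {
			fmt.Printf("[%s] %s: err=%v\n", categoryName, description, err)
		})
}
```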
@siggy siggy added the area/cli label Jan 14, 2019
@siggy siggy moved this from In progress to Needs review in Post 2.1 Polish Jan 14, 2019
Post 2.1 Polish automation moved this from Needs review to Reviewer approved Jan 15, 2019

@klingerf klingerf left a comment

⭐️ This is great! Much easier to reason about now that those boolean variables are gone.

},
})

// TODO: refactor with LinkerdPreInstallSingleNamespaceChecks
roleType := "ClusterRole"
roleBindingType := "ClusterRoleBinding"

@klingerf klingerf Jan 15, 2019

Now that you've split the RBAC checks into multiple separate methods, I think it's clearer to hardcode everything, rather than worrying about code reuse. I'm inclined to just remove these local vars. Something like:

diff --git a/pkg/healthcheck/healthcheck.go b/pkg/healthcheck/healthcheck.go
index 24b9722e..31c99ce2 100644
--- a/pkg/healthcheck/healthcheck.go
+++ b/pkg/healthcheck/healthcheck.go
@@ -316,23 +316,19 @@ func (hc *HealthChecker) addLinkerdPreInstallClusterChecks() {
 		},
 	})
 
-	// TODO: refactor with LinkerdPreInstallSingleNamespaceChecks
-	roleType := "ClusterRole"
-	roleBindingType := "ClusterRoleBinding"
-
 	hc.checkers = append(hc.checkers, &checker{
 		category:    LinkerdPreInstallClusterCategory,
-		description: fmt.Sprintf("can create %ss", roleType),
+		description: "can create ClusterRoles",
 		check: func() error {
-			return hc.checkCanCreate("", "rbac.authorization.k8s.io", "v1beta1", roleType)
+			return hc.checkCanCreate("", "rbac.authorization.k8s.io", "v1beta1", "ClusterRole")
 		},
 	})
 
 	hc.checkers = append(hc.checkers, &checker{
 		category:    LinkerdPreInstallClusterCategory,
-		description: fmt.Sprintf("can create %ss", roleBindingType),
+		description: "can create ClusterRoleBindings",
 		check: func() error {
-			return hc.checkCanCreate("", "rbac.authorization.k8s.io", "v1beta1", roleBindingType)
+			return hc.checkCanCreate("", "rbac.authorization.k8s.io", "v1beta1", "ClusterRoleBinding")
 		},
 	})
 

Same goes for the checks in the addLinkerdPreInstallSingleNamespaceChecks func.

Awesome, yep, carry on!

@siggy siggy merged commit 0437341 into master Jan 15, 2019
Post 2.1 Polish automation moved this from Reviewer approved to Done Jan 15, 2019
@siggy siggy deleted the siggy/check-enums branch January 15, 2019 01:26
hawkw added a commit that referenced this pull request Aug 3, 2023
In 2.13, the default inbound and outbound HTTP request queue capacity
decreased from 10,000 requests to 100 requests (in PR #2078). This
change results in proxies shedding load much more aggressively while
under high load to a single destination service, resulting in increased
error rates in comparison to 2.12 (see #11055 for
details).

This commit changes the default HTTP request queue capacities for the
inbound and outbound proxies back to 10,000 requests, the way they were
in 2.12 and earlier. In manual load testing I've verified that
increasing the queue capacity results in a substantial decrease in 503
Service Unavailable errors emitted by the proxy: with a queue capacity
of 100 requests, the load test described [here] observed a failure rate
of 51.51% of requests, while with a queue capacity of 10,000 requests,
the same load test observed no failures.

Note that I did not modify the TCP connection queue capacities, or the
control plane request queue capacity. These were previously configured
by the same variable before #2078, but were split out into separate vars
in that change. I don't think the queue capacity limits for TCP
connection establishment or for control plane requests are currently
resulting in instability the way the decreased request queue capacity
is, so I decided to make a more focused change to just the HTTP request
queues for the proxies.

[here]: #11055 (comment)

---

* Increase HTTP request queue capacity (linkerd/linkerd2-proxy#2449)

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
@hawkw hawkw mentioned this pull request Aug 3, 2023