Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

agent: smarter readiness check #485

Merged

Conversation

ipochi
Copy link
Contributor

@ipochi ipochi commented Apr 12, 2023

This commit introduces a smarter way for the readiness check.

Currently as soon as the agent is up, it signals itself as ready regardless of whether its actually connected to the server or not.

This commit introduces a callback function and a connected field in ClientSet struct to update the status of the agent -> proxy-server connection at regular intervals.

Concern: This makes the assumption that the client is connected to at least one proxy server and therefore the status is ready.

I've thought about updating the readiness of the agent only when all proxy-servers are connected but wasn't sure about the implications from the kubernetes side, since this being a readiness check , kubernetes might treat the pod as not ready and not send any traffic.

EDIT: Since the agent pods are not behind a Kubernetes service, the readiness of the pod doesn't really matter.

Please share your reviews,concerns to the approach I've taken.

/cc: @jveski @cheftako @jkh52 @tallclair

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Apr 12, 2023
@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Apr 12, 2023
@k8s-ci-robot
Copy link
Contributor

Hi @ipochi. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Apr 12, 2023
@jkh52
Copy link
Contributor

jkh52 commented Apr 14, 2023

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Apr 14, 2023
cmd/agent/app/server.go Outdated Show resolved Hide resolved
cmd/agent/app/server.go Outdated Show resolved Hide resolved
pkg/agent/clientset.go Outdated Show resolved Hide resolved
@ipochi ipochi force-pushed the imran/smarter-readiness-check branch from 135bc49 to a61723a Compare June 1, 2023 13:01
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jun 1, 2023
@ipochi
Copy link
Contributor Author

ipochi commented Jun 1, 2023

/test pull-apiserver-network-proxy-test

cmd/agent/app/server.go Outdated Show resolved Hide resolved
pkg/agent/readiness.go Outdated Show resolved Hide resolved
pkg/agent/readiness.go Outdated Show resolved Hide resolved
tests/readiness_test.go Outdated Show resolved Hide resolved
tests/readiness_test.go Outdated Show resolved Hide resolved
@jkh52
Copy link
Contributor

jkh52 commented Jun 1, 2023

This looks pretty close.

@mihivagyok can you verify that this meets the requirements of #491?

@ipochi ipochi force-pushed the imran/smarter-readiness-check branch from a61723a to 6fd7d33 Compare June 2, 2023 05:43
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 2, 2023
@ipochi ipochi force-pushed the imran/smarter-readiness-check branch from 6fd7d33 to 1b6c4be Compare June 2, 2023 05:44
tests/readiness_test.go Outdated Show resolved Hide resolved
tests/readiness_test.go Outdated Show resolved Hide resolved
@jkh52
Copy link
Contributor

jkh52 commented Jul 26, 2023

/retest

@@ -117,6 +119,7 @@ func (o *GrpcProxyAgentOptions) Flags() *pflag.FlagSet {
flags.StringVar(&o.AgentIdentifiers, "agent-identifiers", o.AgentIdentifiers, "Identifiers of the agent that will be used by the server when choosing agent. N.B. the list of identifiers must be in URL encoded format. e.g.,host=localhost&host=node1.mydomain.com&cidr=127.0.0.1/16&ipv4=1.2.3.4&ipv4=5.6.7.8&ipv6=:::::&default-route=true")
flags.BoolVar(&o.WarnOnChannelLimit, "warn-on-channel-limit", o.WarnOnChannelLimit, "Turns on a warning if the system is going to push to a full channel. The check involves an unsafe read.")
flags.BoolVar(&o.SyncForever, "sync-forever", o.SyncForever, "If true, the agent continues syncing, in order to support server count changes.")
flags.BoolVar(&o.EnableAgentServerReadiness, "agent-server-readiness", o.EnableAgentServerReadiness, "If true, the agent readiness probe checks if its connected to atleast one server.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

flags.BoolVar(&o.EnableAgentServerReadiness, "agent-server-readiness",

Did you add this flag based on a misunderstanding with @cheftako comments (about changing liveness behavior)? I don't remember saying that we need a flag to enable the new readiness behavior (I am ok with not flag guarding the readiness change).

If we keep the flag: since this is the agent binary, "agent" in name seems somewhat redundant. Maybe "--enable-connected-readiness" or similar?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think there was a confusion on that. I too don't think a flag is necessary.
I'm happy to remove the commit related to addition of the flag.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed the commit.

readinessHandler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
fmt.Fprintf(w, "ok")
var failedChecks []string
Copy link
Contributor

@jkh52 jkh52 Jul 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section is very close to fully generic. It would be nice to organize it somewhere different than cmd/agent to keep that minimal. It would also be good to add some unit test coverage (for example: exercise the verbose codepath).

I'm ok with deferring these suggestions, to unblock making the feature available.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, I'd like to get this merged as is.

The addition of unit tests, while useful would require more thoughts and time and the result would be a very trivial check and not worth the effort and time.

This commit introduces a smarter way for the readiness check.

Currently as soon as the agent is up, it signals itself as ready
regardless of whether its actually connected to the server or not.

This commit introduces a readiness check in the Agent which reports
true if its connected to atleast one proxy server.

Signed-off-by: Imran Pochi <imranpochi@microsoft.com>
@ipochi
Copy link
Contributor Author

ipochi commented Aug 8, 2023

@jkh52 @cheftako

friendly ping. Shall we merge this ?

@jkh52
Copy link
Contributor

jkh52 commented Aug 8, 2023

@jkh52 @cheftako

friendly ping. Shall we merge this ?

Did you see the feedback: But can you make small adjustments to make it more obvious to future readers? For example, only introduce readiness.go (not healthz.go), and avoid code naming like "HealthZ" (which some readers may think implies liveness health).

I'm actually OK with merging as-is, and we can always address the above separately. Adding a hold just so that @cheftako can also look.

/hold
/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm "Looks good to me", indicates that a PR is ready to be merged. labels Aug 8, 2023
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ipochi, jkh52

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Signed-off-by: Imran Pochi <imranpochi@microsoft.com>
@ipochi ipochi force-pushed the imran/smarter-readiness-check branch from f5170c1 to 581b1f8 Compare August 9, 2023 14:09
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 9, 2023
@jkh52
Copy link
Contributor

jkh52 commented Aug 9, 2023

/lgtm

Feel free to remove the hold in a day or two, if no other reviewers have feedback.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 9, 2023
@ipochi
Copy link
Contributor Author

ipochi commented Aug 9, 2023

/lgtm

Feel free to remove the hold in a day or two, if no other reviewers have feedback.

@jkh52 How does one remove hold ?

@jkh52
Copy link
Contributor

jkh52 commented Aug 9, 2023

/lgtm
Feel free to remove the hold in a day or two, if no other reviewers have feedback.

@jkh52 How does one remove hold ?

I believe it is "/remove-hold"

@ipochi
Copy link
Contributor Author

ipochi commented Aug 10, 2023

/remove-hold

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 10, 2023
@k8s-ci-robot k8s-ci-robot merged commit 218753a into kubernetes-sigs:master Aug 10, 2023
6 checks passed
@ipochi ipochi deleted the imran/smarter-readiness-check branch August 10, 2023 14:21
@jkh52 jkh52 mentioned this pull request Aug 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants