Remove pkg/controller/ready #733

imjasonh · 2021-04-14T15:40:59Z

This package wrote a file to report that the controller was live and ready, which was unnecessary.

This change also removes some unused code in pkg/controller/controller.go

Fixes #732

/kind cleanup

Submitter Checklist

[n/a] Includes tests if functionality changed/was added
[n/a] Includes docs if changes are user-facing
[y] Set a kind label on this PR
[y] Release notes block has been filled in, or marked NONE

See the contributor guide for details on coding conventions, GitHub and Prow interactions, and the code review process.

Release Notes

The controller no longer configures a readiness probe or liveness probe.

/assign @SaschaSchwarze0

SaschaSchwarze0 · 2021-04-15T09:16:31Z

deploy/500-controller.yaml

-          livenessProbe:
-            exec:
-              command:
-                - stat
-                - /tmp/shipwright-build-ready
-            initialDelaySeconds: 5
-            periodSeconds: 10
-          readinessProbe:
-            exec:
-              command:
-                - stat
-                - /tmp/shipwright-build-ready
-            initialDelaySeconds: 5
-            periodSeconds: 10


Here is an alternative using the metrics endpoint:

          ports:
            - containerPort: 8383
              name: metrics-port
          livenessProbe:
            httpGet:
              path: /metrics
              port: metrics-port
            initialDelaySeconds: 5
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /metrics
              port: metrics-port
            initialDelaySeconds: 5
            periodSeconds: 10

We will likely need to use this downstream as we require a probe. It is probably as good as the Tekton solution.
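For anyone who wants to check by hand what the httpGet probes in the snippet above would observe, here is a minimal Go sketch. It relies on the Kubernetes rule that an HTTP probe passes for any status code from 200 up to, but not including, 400. The function names `probeSucceeded` and `checkEndpoint` are hypothetical, and the test server in `main` is only a stand-in for the controller's metrics listener on port 8383.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// probeSucceeded mirrors the success rule for httpGet probes:
// any status code in the range [200, 400) counts as a pass.
func probeSucceeded(status int) bool {
	return status >= 200 && status < 400
}

// checkEndpoint (hypothetical helper) performs a single GET against url and
// reports whether a probe with the given timeout would pass.
func checkEndpoint(url string, timeout time.Duration) (bool, error) {
	client := &http.Client{Timeout: timeout}
	resp, err := client.Get(url)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	return probeSucceeded(resp.StatusCode), nil
}

func main() {
	// Stand-in for the controller's /metrics endpoint so the sketch runs
	// without a cluster. The response body is made up.
	mux := http.NewServeMux()
	mux.HandleFunc("/metrics", func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprintln(w, "# placeholder metrics output")
	})
	srv := httptest.NewServer(mux)
	defer srv.Close()

	ok, err := checkEndpoint(srv.URL+"/metrics", time.Second)
	fmt.Println("probe would pass:", ok, "error:", err)
}
```

Against a real pod, one could port-forward 8383 and point `checkEndpoint` at `http://localhost:8383/metrics` instead of the test server.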

imjasonh · 2021-04-15

Do you require both a liveness probe and a readiness probe?

SaschaSchwarze0 · 2021-04-15

I do not know; I would need to check. But you are right, we should probably define only the liveness probe here because there is no Service defined. Downstream we have our own deployment YAML anyway, so if we have to specify a readiness probe, we can always deviate from what is defined upstream.

imjasonh · 2021-04-15

If you have your own downstream deployment anyway, I think I'd prefer to remove both probes here for now, until we decide there's real value in adding them. Does that sound okay to you?

SaschaSchwarze0 · 2021-04-19

I think I am okay with that. The risk is that we then use something downstream (a probe based on the metrics endpoint) that is not "tested" upstream. But if it stopped working (for whatever reason) and could not be repaired, I think we could fairly quickly introduce a new basic liveness endpoint like Tekton does.
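As a rough idea of what such a fallback "basic liveness endpoint" could look like, here is a minimal Go sketch using only the standard library. It is not Tekton's actual implementation and not code from this PR; the `/healthz` path, the helper names, and the port mentioned in the comment are all assumptions.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// newHealthMux (hypothetical) returns a mux with a single liveness route.
// The handler answers 200 as long as the process can serve HTTP at all.
func newHealthMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
		fmt.Fprintln(w, "ok")
	})
	return mux
}

// healthStatus (hypothetical) sends an in-memory GET to the handler and
// returns the status code, which is what an httpGet probe would evaluate.
func healthStatus(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	mux := newHealthMux()
	fmt.Println("GET /healthz ->", healthStatus(mux, "/healthz"))

	// In a real controller binary one would serve the mux on a port of
	// one's choosing, for example:
	//   go http.ListenAndServe(":8080", mux)
	// and point the livenessProbe at that port and path.
}
```

A dedicated endpoint like this would decouple the probe from the metrics listener, at the cost of one more port to configure in the deployment.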

qu1queee · 2021-04-21

We need both probes to meet some required standards/policies downstream. I would prefer to keep both upstream so that we can test them continuously; if a downstream vendor does not require them, they can remove them. The PR looks good to me.

xiujuan95 · 2021-04-16T02:31:37Z

Hi @imjasonh, just for your info: I am not sure whether liveness/readiness probes work well for non-leader pods when we use the metrics endpoint for these two checks.

The Tekton controller uses the metrics endpoint to check liveness/readiness and it works normally. This is because Tekton uses an active/active HA mode for leader election. That means each Tekton controller pod can be a leader; different pods just hold different buckets at the same time.

However, on the build side we use a leader-with-lease HA mode: all requests are handled by a single controller pod, and the other pods are on standby. We also had an issue before when we used the metrics endpoint for liveness/readiness probes: #276.

SaschaSchwarze0 · 2021-04-16T06:09:58Z

@xiujuan95 in our downstream dev environment, the build controller has been running with the httpGet-based probes since yesterday and there has not been an alert yet, so the probes must be working. You can also check the metrics endpoint of the non-leading pods in our environment. The endpoints are present (they just do not contain build-related metrics, which is expected).

qu1queee · 2021-04-21T10:49:59Z

@xiujuan95 it's a good point, I was also wondering why it works now. I think something changed in the controller-runtime package that allows us to serve the /metrics endpoint in both the leader and the passive pods.
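The behavior described in this thread (every pod answers on /metrics, but only the leader reports build-related metrics) can be modeled with a small Go sketch. This is an illustration under assumptions, not controller-runtime code: the `replica` type, the metric names `process_up` and `builds_total`, and the helper functions are all hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync/atomic"
)

// replica is a hypothetical stand-in for one controller pod.
type replica struct {
	leader atomic.Bool  // true only while this pod holds the lease
	builds atomic.Int64 // work counted by the leader's reconcile loop
}

// metricsHandler always answers 200, so an httpGet probe passes on leader
// and standby pods alike. Build metrics appear only on the leader.
func (r *replica) metricsHandler() http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprintln(w, "process_up 1")
		if r.leader.Load() {
			fmt.Fprintf(w, "builds_total %d\n", r.builds.Load())
		}
	})
}

// scrape sends an in-memory GET /metrics and returns status code and body.
func scrape(h http.Handler) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/metrics", nil))
	return rec.Code, rec.Body.String()
}

// hasBuildMetrics reports whether the scraped body contains build metrics.
func hasBuildMetrics(body string) bool {
	return strings.Contains(body, "builds_total")
}

func main() {
	leader, standby := &replica{}, &replica{}
	leader.leader.Store(true)
	leader.builds.Add(3)

	for name, r := range map[string]*replica{"leader": leader, "standby": standby} {
		code, body := scrape(r.metricsHandler())
		fmt.Printf("%s: status=%d buildMetrics=%v\n", name, code, hasBuildMetrics(body))
	}
}
```

In this model the probe result depends only on the HTTP listener being up, which matches the observation that standby pods pass the probes while exposing no build-related metrics.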

qu1queee

/lgtm

qu1queee · 2021-04-23T10:25:05Z

@imjasonh @SaschaSchwarze0 what's the state of this PR?

SaschaSchwarze0

/approve

openshift-ci-robot · 2021-04-26T11:28:05Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: SaschaSchwarze0

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [SaschaSchwarze0]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci-robot assigned SaschaSchwarze0 Apr 14, 2021

openshift-ci-robot added the release-note and kind/cleanup labels Apr 14, 2021

openshift-ci-robot requested review from adambkaplan and gabemontero April 14, 2021 15:41

SaschaSchwarze0 reviewed Apr 15, 2021

Remove pkg/controller/ready

c3d73a8

This package wrote a file to report that the controller was live and ready, which was unnecessary. Instead, we'll use the /metrics endpoint to determine liveness and readiness. This change also removes some unused code in pkg/controller/controller.go

imjasonh force-pushed the remove-ready branch from a829ac2 to c3d73a8 Compare April 15, 2021 13:45

qu1queee self-requested a review April 21, 2021 08:57

qu1queee approved these changes Apr 21, 2021

openshift-ci-robot assigned qu1queee Apr 21, 2021

openshift-ci-robot added the lgtm label Apr 21, 2021

SaschaSchwarze0 approved these changes Apr 26, 2021

openshift-ci-robot added the approved label Apr 26, 2021

openshift-merge-robot merged commit abad4c3 into shipwright-io:master Apr 26, 2021

adambkaplan added this to the release-v0.5.0 milestone Jun 10, 2021
