Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove pkg/controller/ready #733

Merged
merged 1 commit into from
Apr 26, 2021

Conversation

imjasonh
Copy link
Contributor

This package wrote a file to report that the controller was live and ready, which was unnecessary.

This change also removes some unused code in pkg/controller/controller.go

Fixes #732

/kind cleanup

Submitter Checklist

  • [n/a] Includes tests if functionality changed/was added
  • [n/a] Includes docs if changes are user-facing
  • [y] Set a kind label on this PR
  • [y] Release notes block has been filled in, or marked NONE

See the contributor guide
for details on coding conventions, github and prow interactions, and the code review process.

Release Notes

The controller no longer configures a readiness probe or liveness probe.

/assign @SaschaSchwarze0

@openshift-ci-robot openshift-ci-robot added release-note Label for when a PR has specified a release note kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. labels Apr 14, 2021
Comment on lines 35 to 48
livenessProbe:
exec:
command:
- stat
- /tmp/shipwright-build-ready
initialDelaySeconds: 5
periodSeconds: 10
readinessProbe:
exec:
command:
- stat
- /tmp/shipwright-build-ready
initialDelaySeconds: 5
periodSeconds: 10
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is an alternative using the metrics endpoint:

        ports:
        - containerPort: 8383
          name: metrics-port
        livenessProbe:
          httpGet:
            path: /metrics
            port: metrics-port
          initialDelaySeconds: 5
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /metrics
            port: metrics-port
          initialDelaySeconds: 5
          periodSeconds: 10

We will likely need to use this downstream as we require a probe. It is probably as good as the Tekton solution.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you require both a liveness probe and a readiness probe?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not know, would need to check. But, you are right, we probably should here only define the liveness probe because there is no service defined. In downstream we have our own deployment yaml anyway, so, if we would have to specify a readiness probe, we can always deviate from what is defined in upstream.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you have your own downstream deployment anyway, I think I'd prefer to remove both probes here for now, until we decide there's real value in adding them. Does that sound okay to you?

Copy link
Member

@SaschaSchwarze0 SaschaSchwarze0 Apr 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I am okay. The risk then is that we use something in downstream (a probe based on the metrics endpoint) that is not "tested" upstream. But, if it would stop working (for whatever reason) and not being repairable, then I think we could fairly quickly introduce a new basic liveness endpoint like Tekton does.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need both probes for some required standards/policies downstream. I will prefer to keep both in upstream so that we can continuously test them, if any vendor downstream does not require them, then they should remove it. PR looks good for me.

This package wrote a file to report that the controller was live and
ready, which was unnecessary. Instead, we'll use the /metrics endpoint
to determine liveness and readiness.

This change also removes some unused code in
pkg/controller/controller.go
@xiujuan95
Copy link
Contributor

Hi, @imjasonh maybe just for your info, I am not sure if liveness/readiness probes can work well for non-leader pods or not when we use metrics endpoint do these two checks.

Tekton-controller is using metrics endpoint to check liveness/readiness and it works normally. This is because tekton is using active/active HA mode for leader-election. That means each tekton controller pod can be the leader, only different pods hold different buckets at the same time.

However, for build side, we are using Leader-with-lease HA mode. And all requests are held by a unique controller pod. Other pods are stand by. Also before we have an issue when we use metrics endpoint to do liveness/readiness probes: #276.

@SaschaSchwarze0
Copy link
Member

Hi, @imjasonh maybe just for your info, I am not sure if liveness/readiness probes can work well for non-leader pods or not when we use metrics endpoint do these two checks.

Tekton-controller is using metrics endpoint to check liveness/readiness and it works normally. This is because tekton is using active/active HA mode for leader-election. That means each tekton controller pod can be the leader, only different pods hold different buckets at the same time.

However, for build side, we are using Leader-with-lease HA mode. And all requests are held by a unique controller pod. Other pods are stand by. Also before we have an issue when we use metrics endpoint to do liveness/readiness probes: #276.

@xiujuan95 in our downstream dev environment, the build controller is running with the httpGet based probes since yesterday and there has not been an alert yet = the probes must be working. You may also check the metrics endpoint of non-leading pods in our environment. They are present (they just do not contain build-related metrics which is expected).

@qu1queee qu1queee self-requested a review April 21, 2021 08:57
@qu1queee
Copy link
Contributor

@xiujuan95 its a good point, I was also wondering why it works now. I think something changed in the controller-runtime pkg that allowed us to serve the /metrics endpoint in both the leader and passive pods.

Copy link
Contributor

@qu1queee qu1queee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Apr 21, 2021
@qu1queee
Copy link
Contributor

@imjasonh @SaschaSchwarze0 whats the state of this PR?

Copy link
Member

@SaschaSchwarze0 SaschaSchwarze0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

@openshift-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: SaschaSchwarze0

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 26, 2021
@openshift-merge-robot openshift-merge-robot merged commit abad4c3 into shipwright-io:master Apr 26, 2021
@adambkaplan adambkaplan added this to the release-v0.5.0 milestone Jun 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm Indicates that a PR is ready to be merged. release-note Label for when a PR has specified a release note
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remove pkg/controller/ready
7 participants