
jsonnet/control-plane.libsonnet: Remove etcd rules #1233

Merged: 4 commits merged into openshift:master from the remove-etcd branch on Jun 23, 2021


@lilic (Contributor) commented Jun 21, 2021

As requested, we are moving the etcd rules to the cluster-etcd-operator (CEO): openshift/cluster-etcd-operator#613.

@openshift/openshift-team-monitoring please take a look.
cc @hexfusion

@lilic (Contributor, Author) commented Jun 21, 2021

make: *** [assets/control-plane/etcd-prometheus-rule.yaml] Error 1
{"component":"entrypoint","error":"wrapped process failed: exit status 2","file":"prow/entrypoint/run.go:80","func":"k8s.io/test-infra/prow/entrypoint.Options.Run","level":"error","msg":"Error executing test process","severity":"error","time":"2021-06-21T12:25:02Z"}

🤷‍♀️ I haven't had a chance to look into this yet. Does anyone have a hint?

@simonpasquier (Contributor)

Hmm, it seems to fail on assets/control-plane/etcd-prometheus-rule.yaml, which contains exactly the rules you're removing, right?

@lilic (Contributor, Author) commented Jun 21, 2021

Is it maybe checking

ControlPlaneEtcdPrometheusRule = "control-plane/etcd-prometheus-rule.yaml"

and do I need to remove this as well?

@simonpasquier (Contributor)

@lilic you've spotted it :)

@simonpasquier (Contributor)

ASSETS=$(shell grep -oh '[^"]*/.*\.yaml' pkg/manifests/manifests.go | sed 's/^/assets\//')

@lilic (Contributor, Author) commented Jun 21, 2021

Ah nice, that explains it!

@lilic (Contributor, Author) commented Jun 21, 2021

I guess we need to delete the ControlPlaneEtcdPrometheusRule first if it exists.
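"Delete it first if it exists" usually means treating a not-found result as success, so the cleanup is safe on clusters that never had the old rule and on every later reconcile. A minimal sketch of that idea; `ruleStore`, `errNotFound`, `memStore` and the rule name are hypothetical stand-ins, not the operator's real client API or object names:

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical stand-in for a Kubernetes "not found" API error.
var errNotFound = errors.New("not found")

// ruleStore is a hypothetical, minimal view of a client that deletes
// PrometheusRule objects; it is not the operator's actual client.
type ruleStore interface {
	Delete(namespace, name string) error
}

// deleteIfExists removes a rule and treats "already gone" as success,
// so running it repeatedly is harmless.
func deleteIfExists(s ruleStore, namespace, name string) error {
	err := s.Delete(namespace, name)
	if errors.Is(err, errNotFound) {
		return nil
	}
	if err != nil {
		return fmt.Errorf("deleting %s/%s: %w", namespace, name, err)
	}
	return nil
}

// memStore is an in-memory fake keyed by "namespace/name".
type memStore map[string]bool

func (m memStore) Delete(namespace, name string) error {
	key := namespace + "/" + name
	if !m[key] {
		return errNotFound
	}
	delete(m, key)
	return nil
}

func main() {
	// "etcd-rules" is an illustrative name, not taken from the PR.
	store := memStore{"openshift-monitoring/etcd-rules": true}
	fmt.Println(deleteIfExists(store, "openshift-monitoring", "etcd-rules")) // <nil>: deleted
	fmt.Println(deleteIfExists(store, "openshift-monitoring", "etcd-rules")) // <nil>: already gone
}
```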

@lilic (Contributor, Author) commented Jun 21, 2021

@simonpasquier please take another look, thanks!

@@ -1428,46 +1428,6 @@ func TestPrometheusK8sControlPlaneRulesFiltered(t *testing.T) {
}
}

func TestPrometheusEtcdRulesFiltered(t *testing.T) {
@lilic (Contributor, Author)

No clue what this is; I think it's legacy?

Contributor

Haha, not completely, at least not until the etcd service monitor moves to CEO too :)
But yes, it's fine to remove the tests.

if caFound && len(caContent) > 0 &&
	certFound && len(certContent) > 0 &&
	keyFound && len(keyContent) > 0 {
	trueBool := true
	c.ClusterMonitoringConfiguration.EtcdConfig.Enabled = &trueBool
}
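The condition above turns on etcd monitoring only when the CA, the certificate and the key were all found and are non-empty. A self-contained sketch of that check; the struct types and the map keys "ca", "cert" and "key" are hypothetical stand-ins, not the operator's real definitions or secret keys:

```go
package main

import "fmt"

// Hypothetical, trimmed-down stand-ins for the operator's configuration types.
type EtcdConfig struct {
	Enabled *bool
}

type ClusterMonitoringConfiguration struct {
	EtcdConfig *EtcdConfig
}

// enableEtcdIfCertsPresent sets EtcdConfig.Enabled to true only when all
// three TLS entries are present and non-empty; otherwise it changes nothing.
func enableEtcdIfCertsPresent(c *ClusterMonitoringConfiguration, data map[string]string) {
	nonEmpty := func(k string) bool {
		v, found := data[k]
		return found && len(v) > 0
	}
	if nonEmpty("ca") && nonEmpty("cert") && nonEmpty("key") {
		if c.EtcdConfig == nil {
			c.EtcdConfig = &EtcdConfig{}
		}
		enabled := true
		c.EtcdConfig.Enabled = &enabled
	}
}

func main() {
	c := &ClusterMonitoringConfiguration{}
	enableEtcdIfCertsPresent(c, map[string]string{"ca": "x", "cert": "y", "key": ""})
	fmt.Println(c.EtcdConfig == nil) // true: the empty key keeps etcd disabled
	enableEtcdIfCertsPresent(c, map[string]string{"ca": "x", "cert": "y", "key": "z"})
	fmt.Println(*c.EtcdConfig.Enabled) // true
}
```

Folding the found-and-non-empty test into one helper keeps the three checks symmetric, mirroring the `caFound && len(caContent) > 0` pattern.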

@simonpasquier (Contributor)

/lgtm
/hold
Holding so that you, @lilic, can merge this whenever you want or need to.

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 21, 2021
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jun 21, 2021
openshift-ci bot commented Jun 21, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lilic, simonpasquier


Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 21, 2021
@lilic (Contributor, Author) commented Jun 22, 2021

Watchdog alert had missing intervals during the run, which may be a sign of a Prometheus outage in violation of the prometheus query SLO of 100% uptime during upgrade

/retest

@lilic (Contributor, Author) commented Jun 22, 2021

/retest

unrelated failures

@lilic (Contributor, Author) commented Jun 22, 2021

alert KubePodCrashLooping fired for 2209 seconds with labels: {container="thanos-query", endpoint="https-main", job="kube-state-metrics", namespace="openshift-monitoring", pod="thanos-querier-d4d687cc-g4fzq", service="kube-state-metrics", severity="warning"}

I don't think it's related to my change, but this keeps firing. 🤔

@lilic (Contributor, Author) commented Jun 22, 2021

/retest

openshift-ci bot commented Jun 22, 2021

@lilic: The following test failed, say /retest to rerun all failed tests:

Test name                      Commit    Rerun command
ci/prow/e2e-aws-single-node    9719e18   /test e2e-aws-single-node


@lilic (Contributor, Author) commented Jun 23, 2021

/retest

@lilic (Contributor, Author) commented Jun 23, 2021

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 23, 2021
@openshift-merge-robot openshift-merge-robot merged commit 2c7394e into openshift:master Jun 23, 2021
@lilic lilic deleted the remove-etcd branch June 23, 2021 12:33
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.

3 participants