Bug 1809353: Exclude Kubernetes control plane rules when running on IBM Cloud #705

Merged

csrwng
Contributor

@csrwng csrwng commented Mar 12, 2020

If the current cluster's platform is IBMCloud, Kubernetes control plane rules are skipped.
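
A minimal sketch of the behavior described above, not the code from this PR. The `RuleGroup` type and the `filterGroups` and `isControlPlaneGroup` helpers are hypothetical stand-ins. Only the group name `kubernetes-system-apiserver` appears in the diff excerpt; the other two names are assumptions for illustration.

```go
package main

import "fmt"

// RuleGroup is a simplified, hypothetical stand-in for a Prometheus rule group.
type RuleGroup struct {
	Name string
}

// ibmCloudPlatform mirrors the "IBMCloud" platform string shown in the diff.
const ibmCloudPlatform = "IBMCloud"

// isControlPlaneGroup reports whether a group holds control plane rules.
// Only the first name is taken from the PR excerpt; the rest are assumed.
func isControlPlaneGroup(name string) bool {
	switch name {
	case "kubernetes-system-apiserver",
		"kubernetes-system-controller-manager",
		"kubernetes-system-scheduler":
		return true
	}
	return false
}

// filterGroups returns the groups unchanged unless the platform is IBMCloud,
// in which case control plane groups are dropped.
func filterGroups(platform string, groups []RuleGroup) []RuleGroup {
	if platform != ibmCloudPlatform {
		return groups
	}
	kept := make([]RuleGroup, 0, len(groups))
	for _, g := range groups {
		if isControlPlaneGroup(g.Name) {
			continue
		}
		kept = append(kept, g)
	}
	return kept
}

func main() {
	groups := []RuleGroup{
		{Name: "kubernetes-system-apiserver"},
		{Name: "node.rules"},
	}
	fmt.Println(len(filterGroups("AWS", groups)))
	fmt.Println(len(filterGroups(ibmCloudPlatform, groups)))
}
```

On any other platform the input is returned untouched, so the filter only changes behavior for IBM Cloud clusters.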

@openshift-ci-robot openshift-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Mar 12, 2020
@csrwng
Contributor Author

csrwng commented Mar 12, 2020

Alternate implementation to #687
@s-urbaniak ptal
/cc @derekwaynecarr

@s-urbaniak
Contributor

/retest

@s-urbaniak
Contributor

/test e2e-aws-operator

@s-urbaniak
Contributor

/assign brancz ptal, this is the short-term solution that was ratified by @derekwaynecarr.
/hold

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 13, 2020
@lilic
Contributor

lilic commented Mar 13, 2020

/assign @brancz

schedulerRulesFound := false
for _, g := range r.Spec.Groups {
	switch g.Name {
	case "kubernetes-system-apiserver":
Contributor

@brancz brancz Mar 16, 2020

What about etcd? And what about the ServiceMonitors that are provisioned to monitor these components in the first place? Won't they now show "down" targets?

Contributor Author

Service monitors that are part of control plane operators are skipped via annotation: https://github.com/openshift/cluster-kube-apiserver-operator/blob/4de583dffa47c25adde871e7361386b0d55c674a/manifests/0000_90_kube-apiserver-operator_03_servicemonitor.yaml#L7
Etcd rules are disabled because the etcd-metric-client secret is not placed in the openshift-config namespace:

s, err := o.client.GetSecret("openshift-config", "etcd-metric-client")
if err != nil {
	klog.Warningf("Error loading etcd client secrets for Prometheus. Proceeding with etcd disabled. Error: %v", err)
}
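
The fallback in the snippet above can be sketched in isolation as follows. This is an illustration under stated assumptions, not the operator's code: the `secretGetter` interface, the `fakeClient` type, and the `etcdMonitoringEnabled` helper are hypothetical, and the standard library `log` package stands in for `klog`.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// secretGetter is a hypothetical interface standing in for the operator's client.
type secretGetter interface {
	GetSecret(namespace, name string) (map[string][]byte, error)
}

// fakeClient serves secrets from an in-memory map keyed by "namespace/name".
type fakeClient struct {
	secrets map[string]map[string][]byte
}

func (f fakeClient) GetSecret(namespace, name string) (map[string][]byte, error) {
	s, ok := f.secrets[namespace+"/"+name]
	if !ok {
		return nil, errors.New("secret not found")
	}
	return s, nil
}

// etcdMonitoringEnabled reports whether the etcd client secret is available.
// A missing secret is logged as a warning and treated as "etcd disabled"
// rather than as a fatal error.
func etcdMonitoringEnabled(c secretGetter) bool {
	_, err := c.GetSecret("openshift-config", "etcd-metric-client")
	if err != nil {
		log.Printf("Error loading etcd client secrets for Prometheus. Proceeding with etcd disabled. Error: %v", err)
		return false
	}
	return true
}

func main() {
	// No secret present, as described for IBM Cloud in the comment above.
	empty := fakeClient{secrets: map[string]map[string][]byte{}}
	fmt.Println(etcdMonitoringEnabled(empty))
}
```

Treating the missing secret as a soft failure is what lets the rest of the monitoring stack come up when etcd metrics are unavailable.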

Contributor

I see, OK.

@@ -219,6 +220,10 @@ var (
	PrometheusTrustedCABundlePath = PrometheusTrustedCABundleDir + "ca-bundle.crt"
)

var (
	IBMCloudPlatformType configv1.PlatformType = "IBMCloud"
Contributor

Don't redefine constants as vars; that's horribly dangerous.

Contributor Author

Thx! fixed
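
The concern in this thread can be illustrated with a small sketch. It assumes `configv1.PlatformType` is a string-based type, as the string literal in the diff suggests, and uses a local `PlatformType` stand-in so the example is self-contained. The `mutablePlatformType` variable and `isIBMCloud` helper are hypothetical.

```go
package main

import "fmt"

// PlatformType is a local stand-in for configv1.PlatformType (assumed string-based).
type PlatformType string

// As a typed constant, the value cannot be reassigned by any package.
const IBMCloudPlatformType PlatformType = "IBMCloud"

// As a package-level var, the same value can be overwritten at runtime,
// silently changing every comparison that relies on it.
var mutablePlatformType PlatformType = "IBMCloud"

// isIBMCloud compares against the constant, so its result is stable.
func isIBMCloud(p PlatformType) bool {
	return p == IBMCloudPlatformType
}

func main() {
	mutablePlatformType = "AWS" // compiles without complaint
	// IBMCloudPlatformType = "AWS" // rejected at compile time: constants are not assignable

	fmt.Println(isIBMCloud("IBMCloud"))
	fmt.Println(mutablePlatformType == IBMCloudPlatformType)
}
```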

@openshift-ci-robot openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 16, 2020
@openshift-ci-robot openshift-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 17, 2020
@brancz
Contributor

brancz commented Mar 17, 2020

/retest

@csrwng
Contributor Author

csrwng commented Mar 17, 2020

/retest

@brancz
Contributor

brancz commented Mar 18, 2020

This is mostly a long-term maintenance question. The team is already way beyond full capacity, and this adds yet another dimension of bugs that are inevitably going to occur. Who takes care of that?

@brancz
Contributor

brancz commented Mar 18, 2020

Can we have a ROKS component in bugzilla for example that these types of bugs would be assigned to?

@s-urbaniak
Contributor

Also cc'ing @bparees to ask whether existing components/repositories like cluster-monitoring-operator can be assigned to initiatives like ROKS rather than to our team. I didn't find a ROKS component in Bugzilla.

@bparees
Contributor

bparees commented Mar 18, 2020

> Can we have a ROKS component in bugzilla for example that these types of bugs would be assigned to?

I don't know that we need a ROKS BZ component, but it sounds like we need a ROKS-enablement cross-team initiative with an identified owner and tracking Jira stories.

IMHO this is similar to any other cloud provider enablement, though I realize there are significantly more structural differences in ROKS vs. just another cloud provider.

I don't view this change as a bug fix.

@derekwaynecarr
Member

@bparees see epic and jira tracking:

https://issues.redhat.com/browse/CO-716

An agreed-upon solution is required this week and needs a backport to 4.3.z, per our discussion last week. We are happy to improve testing in any form for any managed service as a follow-on to ensure regressions do not occur. This could include managed-service-specific testing scenarios in the existing OpenShift e2e suite.

@bparees
Contributor

bparees commented Mar 18, 2020

Turns out there is a Jira epic for this: https://issues.redhat.com/browse/CO-716 and I think this work is specifically tied to https://issues.redhat.com/browse/CO-756

but i'd reiterate that this is basically another platform so having its own BZ component doesn't make sense.

@lilic
Contributor

lilic commented Mar 18, 2020

> but i'd reiterate that this is basically another platform so having its own BZ component doesn't make sense.

Where do you think it should go then? Not under monitoring :)

Note that we are not against this feature or against merging this PR. We only want to ensure that we are not the ones supporting this feature going forward, including the backports, and that there is a component to which we can reassign any Bugzilla bugs we might get.

@s-urbaniak
Contributor

The problem from our side is long-term maintainability. If, at any point in time, a BZ is opened reporting that something doesn't work in a ROKS environment, we are effectively flying blind when trying to fix it.

Having a ROKS component owner/team would solve that specific concern. Someone who can verify and fix bugs in that environment.

@bparees
Contributor

bparees commented Mar 18, 2020

@lilic @s-urbaniak let's take this discussion offline

@s-urbaniak
Contributor

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Mar 18, 2020
@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: csrwng, s-urbaniak

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 18, 2020
@csrwng
Contributor Author

csrwng commented Mar 18, 2020

/retitle Bug 1809353: Exclude Kubernetes control plane rules when running on IBM Cloud

@openshift-ci-robot openshift-ci-robot changed the title Exclude Kubernetes control plane rules when running on IBM Cloud Bug 1809353: Exclude Kubernetes control plane rules when running on IBM Cloud Mar 18, 2020
@openshift-ci-robot
Contributor

@csrwng: This pull request references Bugzilla bug 1809353, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.5.0) matches configured target release for branch (4.5.0)
  • bug is in the state NEW, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

In response to this:

Bug 1809353: Exclude Kubernetes control plane rules when running on IBM Cloud

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. label Mar 18, 2020
@csrwng
Contributor Author

csrwng commented Mar 18, 2020

/hold cancel

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@csrwng
Contributor Author

csrwng commented Mar 18, 2020

/retest

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
10 participants