Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

USHIFT-1590: skip failing invariants in MicroShift #28193

Merged
merged 5 commits into from Sep 6, 2023

Conversation

pacevedom
Copy link
Contributor

No description provided.

@openshift-ci-robot
Copy link

openshift-ci-robot commented Aug 16, 2023

@pacevedom: This pull request references USHIFT-1590 which is a valid jira issue.

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Aug 16, 2023
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 16, 2023
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 17, 2023
@pacevedom
Copy link
Contributor Author

/retest-required

@pacevedom
Copy link
Contributor Author

/cc @ingvagabund @dgoodwin @deads2k

Comment on lines 69 to 73
if !isMicroShift {
if err := sampler.TearDownInClusterMonitors(w.adminRESTConfig); err != nil {
return err
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if !isMicroShift {
if err := sampler.TearDownInClusterMonitors(w.adminRESTConfig); err != nil {
return err
}
}
if isMicroShift {
return nil
}
if err := sampler.TearDownInClusterMonitors(w.adminRESTConfig); err != nil {
return err
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

} else {
infra, err := configClient.ConfigV1().Infrastructures().Get(context.Background(), "cluster", metav1.GetOptions{})
isMicroShift, err := exutil.IsMicroShiftCluster(kubeClient)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use early returns to eliminate nesting. This is hard to follow.

} else {
infra, err := configClient.ConfigV1().Infrastructures().Get(context.Background(), "cluster", metav1.GetOptions{})
isMicroShift, err := exutil.IsMicroShiftCluster(kubeClient)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps this should become a getPlatformType(restConfig) (platformType, error) to make it more clear. I think this is trying to

  1. fail if a kubeclient cannot be created
  2. if the cluster lacks an infrastructure CRD, return empty, no error
  3. if the cluster has an infrastructure CRD, error if there is no CR instance
  4. if the cluster has a CR instance, return the platform type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -30,6 +31,15 @@ func (w *kubeletLogCollector) CollectData(ctx context.Context, storageDir string
if err != nil {
return nil, nil, err
}
// MicroShift does not have a proper journal for the node logs api.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

!

That's unfortunate. How are users going to collect node logs for kubelet bugs.

cc @rphillips

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its under a different service, the journal belongs to microshift which holds the whole control plane logging output, not just kubelet. All the supportability docs are prepared for it (as well as an sos plugin)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the output:

$ oc get --raw "/api/v1/nodes/microshift-dev/proxy/logs"
<pre>
<a href="README">README</a>
<a href="anaconda/">anaconda/</a>
<a href="audit/">audit/</a>
<a href="btmp">btmp</a>
<a href="btmp-20230810">btmp-20230810</a>
<a href="chrony/">chrony/</a>
<a href="containers/">containers/</a>
<a href="crio/">crio/</a>
<a href="cron">cron</a>
<a href="cron-20230704">cron-20230704</a>
<a href="cron-20230810">cron-20230810</a>
<a href="cron-20230817">cron-20230817</a>
<a href="cron-20230824">cron-20230824</a>
<a href="dnf.librepo.log">dnf.librepo.log</a>
<a href="dnf.log">dnf.log</a>
<a href="dnf.log.1">dnf.log.1</a>
<a href="dnf.rpm.log">dnf.rpm.log</a>
<a href="firewalld">firewalld</a>
<a href="hawkey.log">hawkey.log</a>
<a href="hawkey.log-20230704">hawkey.log-20230704</a>
<a href="hawkey.log-20230810">hawkey.log-20230810</a>
<a href="hawkey.log-20230817">hawkey.log-20230817</a>
<a href="hawkey.log-20230824">hawkey.log-20230824</a>
<a href="insights-client/">insights-client/</a>
<a href="kdump.log">kdump.log</a>
<a href="kube-apiserver/">kube-apiserver/</a>
<a href="lastlog">lastlog</a>
<a href="libvirt/">libvirt/</a>
<a href="maillog">maillog</a>
<a href="maillog-20230704">maillog-20230704</a>
<a href="maillog-20230810">maillog-20230810</a>
<a href="maillog-20230817">maillog-20230817</a>
<a href="maillog-20230824">maillog-20230824</a>
<a href="messages">messages</a>
<a href="messages-20230704">messages-20230704</a>
<a href="messages-20230810">messages-20230810</a>
<a href="messages-20230817">messages-20230817</a>
<a href="messages-20230824">messages-20230824</a>
<a href="openvswitch/">openvswitch/</a>
<a href="ovn/">ovn/</a>
<a href="ovn-kubernetes/">ovn-kubernetes/</a>
<a href="pods/">pods/</a>
<a href="private/">private/</a>
<a href="qemu-ga/">qemu-ga/</a>
<a href="rhsm/">rhsm/</a>
<a href="secure">secure</a>
<a href="secure-20230704">secure-20230704</a>
<a href="secure-20230810">secure-20230810</a>
<a href="secure-20230817">secure-20230817</a>
<a href="secure-20230824">secure-20230824</a>
<a href="spooler">spooler</a>
<a href="spooler-20230704">spooler-20230704</a>
<a href="spooler-20230810">spooler-20230810</a>
<a href="spooler-20230817">spooler-20230817</a>
<a href="spooler-20230824">spooler-20230824</a>
<a href="sssd/">sssd/</a>
<a href="swtpm/">swtpm/</a>
<a href="tallylog">tallylog</a>
<a href="wtmp">wtmp</a>
</pre>

Test is looking for journal and then filters by the systemd unit for kubelet. In microshift the kubelet is part of microshift systemd unit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like you need to fix this code then, not simply skip it.

@openshift-merge-robot openshift-merge-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Aug 31, 2023
@pacevedom
Copy link
Contributor Author

/retest-required

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Sep 5, 2023

@pacevedom: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-openstack-ovn 13857ff link false /test e2e-openstack-ovn
ci/prow/e2e-aws-ovn-single-node-upgrade 13857ff link false /test e2e-aws-ovn-single-node-upgrade
ci/prow/e2e-gcp-ovn-rt-upgrade 13857ff link false /test e2e-gcp-ovn-rt-upgrade
ci/prow/e2e-metal-ipi-sdn 13857ff link false /test e2e-metal-ipi-sdn
ci/prow/e2e-aws-ovn-single-node-serial 13857ff link false /test e2e-aws-ovn-single-node-serial

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@neisw
Copy link
Contributor

neisw commented Sep 5, 2023

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Sep 5, 2023
@pacevedom
Copy link
Contributor Author

/label acknowledge-critical-fixes-only

@openshift-ci openshift-ci bot added the acknowledge-critical-fixes-only Indicates if the issuer of the label is OK with the policy. label Sep 5, 2023
return platform, fmt.Errorf("error checking MicroShift cluster: %v", err)
}
if isMicroShift {
return platform, nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will be empty, not nil, correct? You sure that's what you want?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the platform is only needed when deploying on Azure, and thats not the case for MicroShift.

@deads2k
Copy link
Contributor

deads2k commented Sep 5, 2023

This has made obvious, but will not address shortcomings for addressing CI failures. I'm willing to allow it, but microshift failure resolution will suffer for lack of this.

/approve
/hold

holding for @pacevedom to acknowledge the shortcoming. You may release on acknowledgement without a fix if you wish.

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 5, 2023
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Sep 5, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: deads2k, neisw, pacevedom

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 5, 2023
@pacevedom
Copy link
Contributor Author

As we already talked about it over Slack, posting here a summary.
The shortcomings account for addressing failures in CI and how to debug them. Since MicroShift does not work/is not deployed like OpenShift there are a few key differences.
In this case, for example, the logs for kubelet and other components are to be found in the journal for microshift itself, not in /var/log/ which is where one of the tests is looking.
To address these limitations, a step in CI is included to run an sos report including microshift specific plugins.
Will work in the future to try to overcome some of these limitations where it makes sense.

/hold cancel because of the reasons above.

@pacevedom
Copy link
Contributor Author

/unhold

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 6, 2023
@openshift-merge-robot openshift-merge-robot merged commit c3d0116 into openshift:master Sep 6, 2023
17 of 22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
acknowledge-critical-fixes-only Indicates if the issuer of the label is OK with the policy. approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants