Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug 1878129: add node collection for oc adm inspect #2081

Merged
merged 1 commit into from
Sep 18, 2020

Conversation

deads2k
Copy link
Contributor

@deads2k deads2k commented Sep 14, 2020

MCO holds kubelet config, which leads to node creation. This adds node collection for related resources because the MCO configuration has a direct impact on the node resources.

@openshift-ci-robot openshift-ci-robot added bugzilla/severity-medium Referenced Bugzilla bug's severity is medium for the branch this PR is targeting. bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. labels Sep 14, 2020
@openshift-ci-robot
Copy link
Contributor

@deads2k: This pull request references Bugzilla bug 1878129, which is invalid:

  • expected the bug to target the "4.6.0" release, but it targets "4.7.0" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

bug 1878129: add node collection for oc adm inspect

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@yuqi-zhang
Copy link
Contributor

Just to make sure, this should effectively only affect must-gather (via oc adm inspect clusteroperators) and not any other MCO functionality right?

Does this also mean the MCO is now the effective "owner" of a node object? In normal cluster operation does anyone "sync" node objects?

@kikisdeliveryservice
Copy link
Contributor

It doesn't really make sense to me to have the nodes grouped under MCO instead of a first class resource like @cgwalters saidin the BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1878129#c6

I think this needs more discussion bc this doesn't feel like the right choice.

/hold

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 14, 2020
@deads2k
Copy link
Contributor Author

deads2k commented Sep 15, 2020

Does this also mean the MCO is now the effective "owner" of a node object? In normal cluster operation does anyone "sync" node objects?

It doesn't mean that at all. This just means the data is gathered so that the node team can debug.

Just to make sure, this should effectively only affect must-gather (via oc adm inspect clusteroperators) and not any other MCO functionality right?

correct.

@deads2k
Copy link
Contributor Author

deads2k commented Sep 15, 2020

It doesn't really make sense to me to have the nodes grouped under MCO instead of a first class resource like @cgwalters saidin the BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1878129#c6

Resources are inspected and related resources gathered. The method to hardcode collection is to add them to a relatedresource. Doing it in generic client handling (used by RHACM for instance or for inspecting a customer namespace) does not make sense.

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 15, 2020
@deads2k
Copy link
Contributor Author

deads2k commented Sep 15, 2020

/retest

Copy link
Contributor

@yuqi-zhang yuqi-zhang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If its only for collection purposes I am +1

Also the BZ is targetted 4.7, did you want this in 4.6

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 15, 2020
@deads2k
Copy link
Contributor Author

deads2k commented Sep 15, 2020

If its only for collection purposes I am +1

Also the BZ is targetted 4.7, did you want this in 4.6

Yes. Looks like someone may not have understood the impact and ease of the change and moved it out.

@cgwalters
Copy link
Member

Are there written down rules around what data oc adm inspect should gather? Node objects today include the pulled images, which could indirectly leak some customer data potentially. Maybe that happens elsewhere too.

Anyways tentative
/approve
but I still wonder whether this should just be something that oc adm inspect gathers on its own by default, particularly if it's just oc adm inspect clusteroperator and not specifically oc adm inspect clusteroperator/machine-config.

(On the flip side though debugging a lot of MCO problems is going to really want the node objects too)

@deads2k
Copy link
Contributor Author

deads2k commented Sep 15, 2020

Are there written down rules around what data oc adm inspect should gather? Node objects today include the pulled images, which could indirectly leak some customer data potentially. Maybe that happens elsewhere too.

The enhancement describes what is collected: https://github.com/openshift/enhancements/blob/master/enhancements/oc/inspect.md#proposal

It also includes information about what data is elided from output. Since the images would also be listed in the pods themselves, I don't see image metadata as different.

@deads2k
Copy link
Contributor Author

deads2k commented Sep 15, 2020

but I still wonder whether this should just be something that oc adm inspect gathers on its own by default, particularly if it's just oc adm inspect clusteroperator and not specifically oc adm inspect clusteroperator/machine-config.

Inspect is a tool for pulling information related to what is requested. That relationship should be described by the resources themselves, especially since usage like oc adm inspect ns/foo is a valid, and very useful use-case for someone trying to get information about an NS.

We will not add information about particular resources for particular operators, since the operators have a structured spot to include the relationship.

If the MCO isn't able to add this information, group-b can provide a stop-gap, but it would be really disappointing if an additional piece of debugging information, most tightly coupled to the MCO wasn't able to be added where it is needed

@cgwalters
Copy link
Member

It also includes information about what data is elided from output. Since the images would also be listed in the pods themselves, I don't see image metadata as different.

I don't quite see this listed in the enhancement but basically - we're not gathering user projects (i.e. non-openshift-* namespaces) and the pods from those right? But enumerating nodes includes images from user pods too (images I think would generally be less concerning than pods but still)

(This data probably leaks into other operators or pod logs; it's not quite clear to me that oc adm inspect explicitly has this as a goal, but I think it should)

@cgwalters
Copy link
Member

/lgtm
based on above and Jerry's approval
/bugzilla refresh

@openshift-ci-robot
Copy link
Contributor

@cgwalters: This pull request references Bugzilla bug 1878129, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.6.0) matches configured target release for branch (4.6.0)
  • bug is in the state NEW, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

In response to this:

/lgtm
based on above and Jerry's approval
/bugzilla refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. and removed bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. labels Sep 16, 2020
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters, deads2k, yuqi-zhang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [cgwalters,yuqi-zhang]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Sep 16, 2020
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

10 similar comments
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit bf97141 into openshift:master Sep 18, 2020
@openshift-ci-robot
Copy link
Contributor

@deads2k: All pull requests linked via external trackers have merged:

Bugzilla bug 1878129 has been moved to the MODIFIED state.

In response to this:

bug 1878129: add node collection for oc adm inspect

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/severity-medium Referenced Bugzilla bug's severity is medium for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants