Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug 1995785: crio: complete crio default config #2723

Merged

Conversation

haircommander
Copy link
Member

@haircommander haircommander commented Aug 19, 2021

Instead of relying on /etc/crio/crio.conf (shipped in the cri-o rpm) for fields, we should import all relevant fields into MCO.
This will allow us to eventaully use MCO to remove /etc/crio/crio.conf (and have the rpm stop shipping it).

We do this because there exists a feature where rpm-ostree pins a file if it was ever managed by an RPM, and doesn't update it.
however, we want to be able to change crio configuration post-installation, and having a pinned crio config could interfere with that.

Finally, even though conmon path ("conmon" option) is set to the default, we include it here, to work around the aforementioned bug manifesting
in CRI-O failing to come up.

Signed-off-by: Peter Hunt pehunt@redhat.com

- What I did

- How to verify it

- Description for the changelog

@haircommander
Copy link
Member Author

relevant line is

./bin/%{service_name} \
      --selinux \ 
      --cgroup-manager "systemd" \
      --storage-driver "overlay" \
      --cni-plugin-dir "%{_libexecdir}/cni" \
      --storage-opt "overlay.override_kernel_check=1" \
      --metrics-port 9537 \
      --enable-metrics \
      config > %{service_name}.conf


this enables all not currently enabled items in that command

@cgwalters
Copy link
Member

For completeness this is related to https://bugzilla.redhat.com/show_bug.cgi?id=1993385#c3

@mrunalp
Copy link
Member

mrunalp commented Aug 19, 2021

This looks fine but the commit message could use some more detail.

@haircommander haircommander force-pushed the crio-complete-config branch 2 times, most recently from aa8bad7 to 52f42dc Compare August 19, 2021 18:19
Instead of relying on /etc/crio/crio.conf (shipped in the cri-o rpm) for fields, we should import all relevant fields into MCO.
This will allow us to eventaully use MCO to remove /etc/crio/crio.conf (and have the rpm stop shipping it).

We do this because there exists a strange interaction with rpm-ostree and old version of MCO, where all files in `/etc/ are treated as `%config(noreplace)`.
In this case, from ostree's PoV the file was "manually" modified (by the MCO), then the MCO stopped modifying it. ostree doesn't know that though, so the file became "unmanaged state".
However, this unmanaged state causes updates to the rpm to not translate to the disk, and there remains a stale `conmon = "/usr/libexec/crio/conmon"` line in /etc/crio/crio.conf,
which causes nodes to not come up.

Instead of allowing nodes to not come up, we populate the MCO template with all relevant fields in the cri-o rpm (including "conmon" option, even though it's being set to the default),
which will override the stale config, and allow us to eventually remove it from the rpm and remove it from disk in MCO.

Signed-off-by: Peter Hunt <pehunt@redhat.com>
@mrunalp
Copy link
Member

mrunalp commented Aug 19, 2021

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Aug 19, 2021
@cgwalters
Copy link
Member

I built a 4.9 image for this as registry.ci.openshift.org/coreos/openshift49-bz1993385-followup

@cgwalters
Copy link
Member

cgwalters commented Aug 19, 2021

We do this because there exists a feature where rpm-ostree pins a file if it was ever managed by an RPM, and doesn't update it.

Not worth a respin, but that's not quite correct. It's more that all files in /etc are treated as %config(noreplace) by ostree. And in this case, from ostree's PoV the file was "manually" modified (by the MCO), then the MCO stopped modifying it. ostree doesn't know that though, so the file became "unmanaged state".

Or to rephrase, the problem is that the file was edited compared to the RPM default, and then stopped being modified; not that it was "managed by an RPM".

@cgwalters
Copy link
Member

/cherrypick release-4.8

@openshift-cherrypick-robot

@cgwalters: once the present PR merges, I will cherry-pick it on top of release-4.8 in a new PR and assign it to you.

In response to this:

/cherrypick release-4.8

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@haircommander
Copy link
Member Author

updated the commit, ptal @cgwalters

@haircommander haircommander changed the title crio: complete crio default config Bug 1995785: crio: complete crio default config Aug 19, 2021
@openshift-ci openshift-ci bot added bugzilla/severity-unspecified Referenced Bugzilla bug's severity is unspecified for the PR. bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. and removed lgtm Indicates that a PR is ready to be merged. labels Aug 19, 2021
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Aug 19, 2021

@haircommander: This pull request references Bugzilla bug 1995785, which is invalid:

  • expected the bug to target the "4.9.0" release, but it targets "---" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

Bug 1995785: crio: complete crio default config

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@rphillips
Copy link
Contributor

/bugzilla refresh
/lgtm

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Aug 19, 2021

@rphillips: An error was encountered querying GitHub for users with public email (schoudha@redhat.com) for bug 1995785 on the Bugzilla server at https://bugzilla.redhat.com. No known errors were detected, please see the full error message for details.

Full error message. Post "http://ghproxy/graphql": dial tcp 172.30.229.2:80: i/o timeout

Please contact an administrator to resolve this issue, then request a bug refresh with /bugzilla refresh.

In response to this:

/bugzilla refresh
/lgtm

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Aug 19, 2021
@mrunalp mrunalp added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 19, 2021
@rphillips
Copy link
Contributor

/bugzilla refresh

@openshift-ci openshift-ci bot removed the bugzilla/severity-unspecified Referenced Bugzilla bug's severity is unspecified for the PR. label Aug 19, 2021
@openshift-ci openshift-ci bot added bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. and removed bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. labels Aug 19, 2021
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Aug 19, 2021

@rphillips: This pull request references Bugzilla bug 1995785, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.9.0) matches configured target release for branch (4.9.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

No GitHub users were found matching the public email listed for the QA contact in Bugzilla (schoudha@redhat.com), skipping review request.

In response to this:

/bugzilla refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Aug 19, 2021

[APPROVALNOTIFIER] This PR is APPROVED

Approval requirements bypassed by manually added approval.

This pull-request has been approved by: haircommander, mrunalp, rphillips

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@mtrmac
Copy link
Contributor

mtrmac commented Aug 19, 2021

Instead of allowing nodes to not come up, we populate the MCO template with all relevant fields in the cri-o rpm (including "conmon" option, even though it's being set to the default),

Do we need to think about how these “all relevant fields” will be kept up-to-date? I.e. is there a risk of CRI-O upstream, and/or the RPM maintainer, adjusting the default config file, and MCO not being updated?

Alternatively, would it be possible to somehow convince ostree to return /etc/crio/crio.conf to fully-managed state, coming from the RPM, and only have MCO ship its intentional overrides in /etc/crio/crio.conf.d?

Or are the relevant maintainers actually paying more attention to the MCO than to the upstream and/or RPM configuration, so this is not really a concern at all?

@cgwalters
Copy link
Member

Alternatively, would it be possible to somehow convince ostree to return /etc/crio/crio.conf to fully-managed state, coming from the RPM, and only have MCO ship its intentional overrides in /etc/crio/crio.conf.d?

Absolutely, that's easy: we can change the MCO to cp {/usr,}/etc/crio/crio.conf. However, this only works on ostree systems and unfortunately we need to consider traditional yum systems too at the moment, which don't have /usr/etc. (IMO, this is one of the cooler features of ostree)

This path though is aligned with the yum/rpm case of the semantics of %config(noreplace) - on a traditional yum system, reverting the file to the default will also return it to being managed by RPM. There just isn't a convenient way to do it without redownloading the RPM.

There's code in the MCO which tries to simulate this with .mcdorig files, but I think that's buggy.

I think this is the second or third bug we've had of this form. I think the most robust fix is: crio ships defaults in /usr/lib/crio.conf or just in the binary itself, and there is no /etc/crio/crio.conf at all by default. The MCO or admins can write drop-ins in /etc/crio.conf.d, and that's it.

Also, related to this: when the MCO writes its drop in, it should only write values that actually changed, and not write the whole config file.

@wking
Copy link
Member

wking commented Aug 19, 2021

upgrade failed with:

[sig-network] pods should successfully create sandboxes by not timing out

That's not unique to this PR, and sounds orthogonal. I think we should:

/override ci/prow/e2e-agnostic-upgrade

to get this landed and unblock the backport chain for this FastFix bug.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Aug 19, 2021

@wking: wking unauthorized: /override is restricted to Repo administrators, approvers in top level OWNERS file.

In response to this:

upgrade failed with:

[sig-network] pods should successfully create sandboxes by not timing out

That's not unique to this PR, and sounds orthogonal. I think we should:

/override ci/prow/e2e-agnostic-upgrade

to get this landed and unblock the backport chain for this FastFix bug.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Aug 19, 2021

@haircommander: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Rerun command
ci/prow/e2e-ovn-step-registry 7d79caa link /test e2e-ovn-step-registry
ci/prow/e2e-aws-workers-rhel7 7d79caa link /test e2e-aws-workers-rhel7
ci/prow/e2e-metal-ipi 7d79caa link /test e2e-metal-ipi
ci/prow/e2e-aws-disruptive 7d79caa link /test e2e-aws-disruptive

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@kikisdeliveryservice
Copy link
Contributor

Given the urgency and fact that the e2e test is required by basically red. Let's merge to unblock

/override ci/prow/e2e-agnostic-upgrade

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Aug 19, 2021

@kikisdeliveryservice: Overrode contexts on behalf of kikisdeliveryservice: ci/prow/e2e-agnostic-upgrade

In response to this:

Given the urgency and fact that the e2e test is required by basically red. Let's merge to unblock

/override ci/prow/e2e-agnostic-upgrade

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-cherrypick-robot

@cgwalters: new pull request created: #2726

In response to this:

/cherrypick release-4.8

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Aug 19, 2021

@haircommander: All pull requests linked via external trackers have merged:

Bugzilla bug 1995785 has been moved to the MODIFIED state.

In response to this:

Bug 1995785: crio: complete crio default config

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@kikisdeliveryservice
Copy link
Contributor

/bugzilla refresh

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Aug 19, 2021

@kikisdeliveryservice: Bugzilla bug 1995785 is in an unrecognized state (MODIFIED) and will not be moved to the MODIFIED state.

In response to this:

/bugzilla refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@kikisdeliveryservice kikisdeliveryservice added bugzilla/severity-urgent Referenced Bugzilla bug's severity is urgent for the branch this PR is targeting. and removed bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. labels Aug 19, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/severity-urgent Referenced Bugzilla bug's severity is urgent for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

10 participants