Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cri-o: set manage_ns_lifecycle to true #1568

Merged

Conversation

haircommander
Copy link
Member

- What I did
change the entry in crio.conf template to manage ns lifecycle
As it is more secure and gives cri-o more control of namespace lifecycle. Also change the outdated config name value

- How to verify it

- Description for the changelog

CRI-O now manages namespace lifecycle

As it is more secure and gives cri-o more control of namespace lifecycle. Also change the outdated config name value

Signed-off-by: Peter Hunt <pehunt@redhat.com>
@kikisdeliveryservice
Copy link
Contributor

/retest

2 similar comments
@sinnykumari
Copy link
Contributor

/retest

@haircommander
Copy link
Member Author

/retest

@haircommander
Copy link
Member Author

/test e2e-aws

@haircommander
Copy link
Member Author

/retest

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 27, 2020
@kikisdeliveryservice
Copy link
Contributor

@mtrmac @umohnani8 PTAL and lift hold/add LGTM once you review

/hold

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 27, 2020
@mtrmac
Copy link
Contributor

mtrmac commented Mar 27, 2020

I really have no expertise in this aspect of CRI-O right now.

@umohnani8
Copy link
Contributor

/lgtm
/hold cancel

@openshift-ci-robot openshift-ci-robot added lgtm Indicates that a PR is ready to be merged. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Mar 30, 2020
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: haircommander, kikisdeliveryservice, umohnani8

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [kikisdeliveryservice]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit f6bd286 into openshift:master Mar 30, 2020
@dcbw
Copy link
Contributor

dcbw commented Mar 31, 2020

Seems to be breaking most of the ovn-kubernetes CI tests in both GCP and AWS with errors like:

Mar 30 14:13:23.587106 ip-10-0-134-138 crio[1259]: 2020-03-30T14:13:23Z [error] Multus: [openshift-controller-manager-operator/openshift-controller-manager-operator-77d79b5558-jwdvj]: error adding container to network "ovn-kubernetes": delegateAdd: error invoking DelegateAdd - "ovn-k8s-cni-overlay": error in getting result from AddNetwork: CNI request failed with status 400: '[openshift-controller-manager-operator/openshift-controller-manager-operator-77d79b5558-jwdvj] failed to configure pod interface: failed to open netns "/var/run/crio/ns/39e62710-ef69-46ec-bb04-1a82e1855e1c/net": failed to Statfs "/var/run/crio/ns/39e62710-ef69-46ec-bb04-1a82e1855e1c/net": no such file or directory

@dulek
Copy link

dulek commented Mar 31, 2020

@dcbw: It's breaking Kuryr too, please advice if we should make sure Kuryr is configured to expect the netns on /var/run/crio/ns or in old location as this will get reverted. Supporting both at the same time might be problematic and would make Kuryr mount more stuff than it needs.

@dulek
Copy link

dulek commented Mar 31, 2020

Ah, also it's worth asking about upgrade strategy here. Would 4.4->4.5 upgrade relocate all the network namespaces?

dulek added a commit to dulek/cluster-network-operator that referenced this pull request Mar 31, 2020
openshift/machine-config-operator#1568 moved pod namespaces from
/proc into /var/run/crio. As Kuryr needs access to them in order to
manipulate interfaces, we need to mount the new directory and this
commit does that.

Most likely the same change needs to be done for ovn-kubernetes, but
it's a bit out of my expertise.
@haircommander
Copy link
Member Author

Eventually, we will want this change (using /proc is inherently racy and risky). If needed, we can revert this to fix OVNKubernetes, but if the fix is as simple as adding a mount as it is for Kuryr, we should do that and keep this change. WDYT @dcbw

@dulek
Copy link

dulek commented Mar 31, 2020

Eventually, we will want this change (using /proc is inherently racy and risky). If needed, we can revert this to fix OVNKubernetes, but if the fix is as simple as adding a mount as it is for Kuryr, we should do that and keep this change. WDYT @dcbw

For some reason adding a mount is not solving the problem - there still are some permissions issues even though Kuryr runs as root in a privileged pod.

Also we do need to understand the upgrade strategy here. Will we be sure DEL requests for the pods that already existed during 4.4->4.5 upgrade will point to new directory where netns are or the old one?

@stbenjam
Copy link
Member

Do AWS or GCP CI tests not use ovn-kubernetes? This has broken baremetal IPI, as well. Is there any workaround we could do to make this work for us? Or any chance to revert if there won't be a quick fix? Thank you!

@haircommander
Copy link
Member Author

what has broken with baremetal IPI? I will revert this fix, though I'd like to compile all the failures it caused to try to mitigate them when we eventually make this switch

@haircommander
Copy link
Member Author

reverted here #1600

@stbenjam
Copy link
Member

what has broken with baremetal IPI? I will revert this fix, though I'd like to compile all the failures it caused to try to mitigate them when we eventually make this switch

I had a bunch of pods stuck at ContainerCreating:

NAMESPACE                                               NAME                                                              READY   STATUS              RESTARTS   AGE
openshift-apiserver-operator                            openshift-apiserver-operator-7df89f684f-n7fmt                     0/1     ContainerCreating   0          123m

master-0 journal was scrolling:

Mar 31 17:36:23 master-0.ostest.test.metalkube.org hyperkube[1499]: E0331 17:36:23.005806    1499 pod_workers.go:191] Error syncing pod 1444419f-0998-4719-a7b9-ada5bad599c2 ("openshift-service-catalog-controller-manager-operator-644c62t4c_openshift-service-catalog-controller-manager-operator(1444419f-0998-4719-a7b9-ada5bad599c2)"), skipping: failed to "CreatePodSandbox" for "openshift-service-catalog-controller-manager-operator-644c62t4c_openshift-service-catalog-controller-manager-operator(1444419f-0998-4719-a7b9-ada5bad599c2)" with CreatePodSandboxError: "CreatePodSandbox for pod \"openshift-service-catalog-controller-manager-operator-644c62t4c_openshift-service-catalog-controller-manager-operator(1444419f-0998-4719-a7b9-ada5bad599c2)\" failed: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_openshift-service-catalog-controller-manager-operator-644c62t4c_openshift-service-catalog-controller-manager-operator_1444419f-0998-4719-a7b9-ada5bad599c2_0(b3310a438b1f61e4fa616137406708a6a83acdc4b0bb8ec9d54ac31c88b97630): Multus: [openshift-service-catalog-controller-manager-operator/openshift-service-catalog-controller-manager-operator-644c62t4c]: error adding container to network \"ovn-kubernetes\": delegateAdd: error invoking DelegateAdd - \"ovn-k8s-cni-overlay\": error in getting result from AddNetwork: CNI request failed with status 400: '[openshift-service-catalog-controller-manager-operator/openshift-service-catalog-controller-manager-operator-644c62t4c] failed to configure pod interface: failed to open netns \"/var/run/crio/ns/f8feafeb-c38d-43db-b10a-dc3be9ed2b24/net\": failed to Statfs \"/var/run/crio/ns/f8feafeb-c38d-43db-b10a-dc3be9ed2b24/net\": no such file or directory\n'"

We do have OpenShift CI now, I've opened openshift/release#8051 to enable it on this repository (non-blocking, opt-in only) if someone here wants to review. The e2e-metal-ipi job is nice since all networks are default IPv6, and it uses ovn-kubernetes by default.

dulek added a commit to dulek/cluster-network-operator that referenced this pull request Apr 29, 2020
openshift/machine-config-operator#1568 moved pod namespaces from
/proc into /var/run/crio. As Kuryr needs access to them in order to
manipulate interfaces, we need to mount the new directory and this
commit does that.

Most likely the same change needs to be done for ovn-kubernetes, but
it's a bit out of my expertise.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet