Bug 1831866: cri-o: manage ns lifecycle, again! #1689

Merged

haircommander
Member

@haircommander haircommander commented Apr 27, 2020

- What I did
Changed the entry in the crio.conf template so that CRI-O manages namespace lifecycle. This is more secure and gives CRI-O more control over the lifecycle of pod namespaces.

This is attempting to do what #1568 did, but now we've hopefully ironed out the issues that caused the need for #1600
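The change itself is a single switch in the crio.conf template. As a rough illustration of what to look for in the rendered file, here is a minimal Python sketch. It assumes the option is `manage_ns_lifecycle` in the `[crio.runtime]` table (as in CRI-O 1.18-era configs); the helper name `manage_ns_lifecycle_enabled` is hypothetical, and the check is deliberately line-based rather than a full TOML parser.

```python
def manage_ns_lifecycle_enabled(conf_text: str) -> bool:
    """Return True if the [crio.runtime] table sets manage_ns_lifecycle = true.

    Hypothetical helper. Naive, line-based check: it understands only
    simple `key = value` lines and table headers, not full TOML.
    """
    section = None
    enabled = False
    for raw_line in conf_text.splitlines():
        # Drop trailing comments (naive: assumes no '#' inside quoted values).
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            # Table header such as [crio.runtime]; remember which table we are in.
            section = line.strip("[]").strip()
            continue
        if section != "crio.runtime" or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key == "manage_ns_lifecycle":
            # Last assignment wins, mirroring how a later line would override.
            enabled = value.lower() == "true"
    return enabled


if __name__ == "__main__":
    sample = """
[crio.runtime]
# Example entry; option name assumed from CRI-O 1.18-era crio.conf.
manage_ns_lifecycle = true

[crio.network]
"""
    print(manage_ns_lifecycle_enabled(sample))
```

On a node, one could feed it the contents of the rendered crio.conf (commonly /etc/crio/crio.conf, but the exact path depends on the node image).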

- How to verify it

- Description for the changelog

CRI-O now manages namespace lifecycle

Switch CRI-O to manage namespace lifecycle again, after ironing out some details with third-party networking plugins.

Signed-off-by: Peter Hunt <pehunt@redhat.com>
@haircommander
Member Author

haircommander commented Apr 27, 2020

btw @stbenjam @dulek: this broke y'all last time 😄, and hopefully it no longer will.

@mrunalp
Member

mrunalp commented Apr 27, 2020

/hold to inspect the artifacts before merging.

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 27, 2020
@stbenjam
Member

e2e-metal-ipi passed, so I think this time it's good.

@haircommander
Member Author

/retest

I'm running the e2e-network-stress test with clusterbot. That should give us an idea of how this is doing.

I also verified that this PR is working as expected, though I only poked through the artifacts briefly

@haircommander
Member Author

job test e2e-network-stress openshift/machine-config-operator#1689 succeeded
Assuming cluster bot did the right thing, hooray 😄

@haircommander
Member Author

haircommander commented Apr 28, 2020

does this work?

/test e2e-network-stress

edit: no

@openshift-ci-robot
Contributor

@haircommander: The specified target(s) for /test were not found.
The following commands are available to trigger jobs:

  • /test e2e-aws
  • /test e2e-aws-disruptive
  • /test e2e-aws-scaleup-rhel7
  • /test e2e-gcp-op
  • /test e2e-gcp-upgrade
  • /test e2e-metal-ipi
  • /test e2e-openstack
  • /test e2e-ovirt
  • /test e2e-vsphere
  • /test images
  • /test unit
  • /test verify

Use /test all to run all jobs.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

dulek added a commit to dulek/cluster-network-operator that referenced this pull request Apr 29, 2020
openshift/machine-config-operator#1689 moves pod namespaces from
/proc into /var/run/netns. As Kuryr needs access to them in order to
manipulate interfaces, we need to mount the new directory and this
commit does that.
@dulek

dulek commented Apr 29, 2020

Alright, so along with openshift/cluster-network-operator#562 this seems to work just fine. :)

@haircommander
Member Author

/retest

dulek added a commit to dulek/cluster-network-operator that referenced this pull request May 5, 2020
openshift/machine-config-operator#1689 moves pod namespaces from
/proc into /run/netns. As Kuryr needs access to them in order to
manipulate interfaces, we need to mount the new directory and this
commit does that.

Note that CNI will pass /var/run/netns in netns paths, but /var/run is a
symlink to /run, so it should be just fine.
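The symlink argument in this commit message can be checked on a toy filesystem. The Python sketch below is an illustration under stated assumptions, not Kuryr or CNI code: it recreates a `run/netns` directory and a relative `var/run -> ../run` symlink inside a temporary directory (the namespace file name `example-ns` and both helper names are hypothetical) and confirms that the `/var/run/netns/...` spelling resolves to the same file as the `/run/netns/...` one.

```python
import os
import tempfile


def resolves_to_same_file(path_a: str, path_b: str) -> bool:
    """True if both paths have the same canonical path after following symlinks."""
    return os.path.realpath(path_a) == os.path.realpath(path_b)


def var_run_alias_works() -> bool:
    """Build a toy run/ + var/run layout under a temp dir and check the alias."""
    with tempfile.TemporaryDirectory() as root:
        # Stand-in for /run/netns holding one pinned namespace file.
        netns_dir = os.path.join(root, "run", "netns")
        os.makedirs(netns_dir)
        pinned = os.path.join(netns_dir, "example-ns")  # hypothetical name
        open(pinned, "w").close()

        # Stand-in for the host's /var/run -> ../run symlink.
        os.makedirs(os.path.join(root, "var"))
        os.symlink("../run", os.path.join(root, "var", "run"))

        # The netns path as CNI would spell it, via var/run.
        via_var_run = os.path.join(root, "var", "run", "netns", "example-ns")
        return os.path.exists(via_var_run) and resolves_to_same_file(via_var_run, pinned)


if __name__ == "__main__":
    print(var_run_alias_works())
```

This only exercises path resolution; it does not model the bind mounts that pin the namespaces, nor how the host directory is mounted into the Kuryr pod.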
@haircommander
Member Author

/retest

@haircommander haircommander changed the title cri-o: manage ns lifecycle, again! Bug 1831866: cri-o: manage ns lifecycle, again! May 5, 2020
@openshift-ci-robot openshift-ci-robot added bugzilla/severity-unspecified Referenced Bugzilla bug's severity is unspecified for the PR. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. labels May 5, 2020
@openshift-ci-robot
Contributor

@haircommander: This pull request references Bugzilla bug 1831866, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.5.0) matches configured target release for branch (4.5.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

@haircommander
Member Author

/bugzilla refresh

@openshift-ci-robot
Contributor

@haircommander: This pull request references Bugzilla bug 1831866, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.5.0) matches configured target release for branch (4.5.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

@haircommander
Member Author

We've gotten a +1 from the Kuryr and metal IPI teams, tested with OVS and OVN, and run network stress tests.

PTAL @umohnani8 @mrunalp I believe this is ready.

JacobTanenbaum pushed a commit to JacobTanenbaum/cluster-network-operator that referenced this pull request May 7, 2020
@haircommander
Member Author

/retest

@haircommander
Member Author

could not wait for build: the build machine-config-operator failed after 3m30s with reason DockerBuildFailed: Docker build strategy has failed.
/retest

@mrunalp
Member

mrunalp commented May 8, 2020

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label May 8, 2020
@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

8 similar comments

@haircommander
Member Author

haircommander commented May 12, 2020

@runcom @kikisdeliveryservice @sinnykumari @yuqi-zhang can we skip gcp-op? It passed organically in CI before the timeouts started happening, and I ran the tests manually on a GCP cluster with a bumped timeout and they passed. I'd like this to get some soak time before the 4.5 freeze so we have time to react to any issues.

@yuqi-zhang
Contributor

I can confirm that it did pass at some point. I'm going to go ahead and override

@yuqi-zhang
Contributor

/override e2e-gcp-op

@yuqi-zhang
Contributor

/override ci/prow/e2e-gcp-op

@openshift-ci-robot
Contributor

@yuqi-zhang: /override requires a failed status context to operate on.
The following unknown contexts were given:

  • e2e-gcp-op

Only the following contexts were expected:

  • ci/prow/e2e-aws
  • ci/prow/e2e-aws-scaleup-rhel7
  • ci/prow/e2e-gcp-op
  • ci/prow/e2e-gcp-upgrade
  • ci/prow/e2e-metal-ipi
  • ci/prow/images
  • ci/prow/unit
  • ci/prow/verify
  • tide

@openshift-ci-robot
Contributor

@yuqi-zhang: Overrode contexts on behalf of yuqi-zhang: ci/prow/e2e-gcp-op

@haircommander
Member Author

I can confirm that it did pass at some point. I'm going to go ahead and override

thanks!

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

6 similar comments

@sinnykumari
Contributor

Since the e2e-gcp-op test already passed earlier, attempting to override again to get this merged.
/override ci/prow/e2e-gcp-op

@openshift-ci-robot
Contributor

@sinnykumari: Overrode contexts on behalf of sinnykumari: ci/prow/e2e-gcp-op

@openshift-merge-robot openshift-merge-robot merged commit 3be112d into openshift:master May 13, 2020
@openshift-ci-robot
Contributor

@haircommander: All pull requests linked via external trackers have merged: openshift/machine-config-operator#1689. Bugzilla bug 1831866 has been moved to the MODIFIED state.

openshift-cherrypick-robot pushed a commit to openshift-cherrypick-robot/cluster-network-operator that referenced this pull request Jun 29, 2020
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/severity-medium Referenced Bugzilla bug's severity is medium for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged.