Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug 1798191: Bump to ovn2.12 #445

Merged
merged 1 commit into from Feb 5, 2020

Conversation

pecameron
Copy link
Contributor

New paths for components

SDN-643

Signed-off-by: Phil Cameron pcameron@redhat.com

@openshift-ci-robot openshift-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jan 15, 2020
@pecameron
Copy link
Contributor Author

/test e2e-aws-ovn

@pecameron
Copy link
Contributor Author

/test e2e-aws-ovn

@openshift-ci-robot openshift-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jan 20, 2020
@pecameron
Copy link
Contributor Author

/test e2e-aws-ovn

1 similar comment
@pecameron
Copy link
Contributor Author

/test e2e-aws-ovn

@pecameron
Copy link
Contributor Author

/test e2e-aws-ovn
/retest

@pecameron pecameron changed the title [WIP] bump to ovn2.12 Bump to ovn2.12 Jan 21, 2020
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 21, 2020
@pecameron
Copy link
Contributor Author

/test e2e-aws-ovn

@pecameron
Copy link
Contributor Author

pecameron commented Jan 21, 2020

@alexanderConstantinescu @rcarrillocruz @dcbw PTAL
This only has effect when ovn-kube image has ovn2.12
I need to get this in to get ovn-kubernetes PR72 to pass ci
This worked in a local test on AWS.

@@ -70,7 +70,7 @@ spec:

exec ovn-northd \
--no-chdir "-vconsole:${OVN_LOG_LEVEL}" -vfile:off \
--pidfile=/var/run/openvswitch/ovn-northd.pid \
--pidfile=ovn-northd.pid \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the man pages (at least for 2.11) this will make it put the pidfile in .

However, do we actually need to be writing the pid files at all? It looks like the only pid file we ever read is the ovn-nbctl daemon's, and the others are "write-only" and we could just remove the --pidfile arguments...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed. Dropped unneeded pid files.

@@ -330,7 +336,7 @@ spec:
fi

# start nbctl daemon for caching
export OVN_NB_DAEMON=$(ovn-nbctl --pidfile=/run/openvswitch/ovnk-nbctl.pid \
export OVN_NB_DAEMON=$(ovn-nbctl --pidfile=ovnk-nbctl.pid \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the container's preStop hook still does kill $(cat /run/openvswitch/ovnk-nbctl.pid)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -394,7 +402,11 @@ spec:
hostPath:
path: /var/lib/ovn/data
- name: run-openvswitch
emptyDir: {}
hostPath:
path: /var/run/openvswitch
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes it so that the container's /run/openvswitch would be preserved between runs of different containers, which I don't think we want...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, if I am not mistaken (and I think I would like to "call a friend" on this one: @dcbw ). Those locations contain the OVN DB. Which we do not want to preserve between runs because:

  1. I am pretty sure we can not upgrade and will crash when going from non-HA -> HA and vice versa (if we would ever, not sure we would though as non-HA is only in version 4.1/4.2 - tech-preview?)
  2. Our source of truth is k8s and we sync directly with it on startup anyways, and thus re-write the DB.

So, yes, I don't think we want this.

Why did you add this though @pecameron ? Is it required for 2.12?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The unix socket needs to be everywhere it is referenced. This fixed the problem.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This actually passes tests. I will back out the change.

@pecameron
Copy link
Contributor Author

/test e2e-aws-ovn

@pecameron pecameron changed the title Bump to ovn2.12 [WIP] (need to test with image that has ovn2.12) Bump to ovn2.12 Jan 23, 2020
@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 23, 2020
@pecameron
Copy link
Contributor Author

/test e2e-aws-ovn

@pecameron
Copy link
Contributor Author

No nodes, no pods....
/test e2e-aws-ovn

@pecameron
Copy link
Contributor Author

"Error: ResourceLimitExceeded: Task limit exceeded"
/test e2e-aws-ovn

@pecameron
Copy link
Contributor Author

tcp 10.0.140.52:6443: connect: connection refused
/test e2e-aws-ovn

@pecameron
Copy link
Contributor Author

Cant't reach api server
/test e2e-aws-ovn

@pecameron
Copy link
Contributor Author

/test e2e-aws-ovn

@pecameron
Copy link
Contributor Author

/retest
/test e2e-aws-ovn

1 similar comment
@pecameron
Copy link
Contributor Author

/retest
/test e2e-aws-ovn

@pecameron
Copy link
Contributor Author

level=error msg="Cluster operator openshift-etcd Degraded is True with ConfigObservationDegradedError: ConfigObservationDegraded: error looking up self: lookup _etcd-server-ssl._tcp.ci-op-nd4331wx-583c6.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: read udp 10.129.0.21:43654->172.30.0.10:53: i/o timeout\nConfigObservationDegraded: error looking up self: lookup _etcd-server-ssl._tcp.ci-op-nd4331wx-583c6.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: read udp 10.129.0.21:55834->172.30.0.10:53: i/o timeout"

/retest
/test e2e-aws-ovn

@pecameron
Copy link
Contributor Author

Different set of failures this time. Trying again.
/retest
/test e2e-aws-ovn

@pecameron
Copy link
Contributor Author

/retest
/test e2e-aws-ovn

@pecameron
Copy link
Contributor Author

e2e-aws-ovn
error: Couldn't find resource for "crd-publish-openapi-test-foo.example.com/v1,
"alertname":"KubeAPILatencyHigh",
E kube-apiserver failed contacting the API: Get https://api.ci-op-h891bjwp-583c6.origin-ci-int-aws.dev.rhcloud.com:6443
The container could not be located when the pod was terminated
error deleting EBS volume "vol-0c95094660511ec2a" since volume is currently attached to "i-04c55ee4d769dbb2d"

@pecameron
Copy link
Contributor Author

@dcbw 445 passed tests. @alexanderConstantinescu @danwinship @rcarrillocruz Can we get this in? PTAL. I need it to test PR72 ovn2.12 image.

@pecameron pecameron changed the title Bump to ovn2.12 Bug 1798191 Bump to ovn2.12 Feb 4, 2020
New paths for components
pidfile changes.

SDN-643

Signed-off-by: Phil Cameron <pcameron@redhat.com>
@pecameron
Copy link
Contributor Author

/test e2e-aws-ovn

@pecameron
Copy link
Contributor Author

Back to a flaky CI....
release "release-latest" failed: the pod ci-op-kztttv4m/release-latest failed after 35s
/test e2e-aws-ovn

@dcbw
Copy link
Contributor

dcbw commented Feb 4, 2020

/test e2e-gcp-ovn

@dcbw
Copy link
Contributor

dcbw commented Feb 4, 2020

level=info msg="Credentials loaded from the \"default\" profile in file \"/etc/openshift-installer/.awscred\""
level=fatal msg="failed to fetch Terraform Variables: failed to fetch dependency of \"Terraform Variables\": failed to fetch dependency of \"Bootstrap Ignition Config\": failed to fetch dependency of \"Common Manifests\": failed to generate asset \"DNS Config\": getting public zone for \"origin-ci-int-aws.dev.rhcloud.com\": listing hosted zones: Throttling: Rate exceeded\n\tstatus code: 400, request id: 8de95699-cd1b-4fb2-b67b-d297e4222f61"
2020/02/04 20:51:09 Container setup in pod e2e-aws-ovn failed, exit code 1, reason Error

/test e2e-aws-ovn

@dcbw
Copy link
Contributor

dcbw commented Feb 4, 2020

/lgtm
/approve

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Feb 4, 2020
@pecameron pecameron changed the title Bug 1798191 Bump to ovn2.12 Bug 1798191: Bump to ovn2.12 Feb 4, 2020
@openshift-ci-robot
Copy link
Contributor

@pecameron: This pull request references Bugzilla bug 1798191, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

Bug 1798191: Bump to ovn2.12

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. label Feb 4, 2020
@dcbw
Copy link
Contributor

dcbw commented Feb 4, 2020

/approve

@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dcbw, pecameron

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 4, 2020
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

1 similar comment
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit 9dc3987 into openshift:master Feb 5, 2020
@openshift-ci-robot
Copy link
Contributor

@pecameron: All pull requests linked via external trackers have merged. Bugzilla bug 1798191 has been moved to the MODIFIED state.

In response to this:

Bug 1798191: Bump to ovn2.12

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot
Copy link
Contributor

@pecameron: The following test failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
ci/prow/e2e-gcp-ovn 2711b32 link /test e2e-gcp-ovn

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants