NP-877: Live migration suite e2e #28462

Merged

Conversation

martinkennelly
Contributor

@martinkennelly martinkennelly commented Dec 14, 2023

/hold
/cc

Add live migration suite and e2e test cases.
This can be executed by:

export TEST_SDN_LIVE_MIGRATION_OPTIONS=target-cni=OVNKubernetes
./openshift-tests run openshift/network/live-migration

Prerequisite: ensure the TechPreviewNoUpgrade feature set [1] is enabled before executing the suite.

[1] https://docs.openshift.com/container-platform/4.14/nodes/clusters/nodes-cluster-enabling-features.html#nodes-cluster-enabling-features-cli_nodes-cluster-enabling
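For anyone who wants to drive the suite from a Go helper rather than a shell, the invocation above can be sketched as follows. This is only an illustration: `buildSuiteCommand` is a hypothetical name, and the sketch assumes the `openshift-tests` binary sits in the current directory, as in the command above. It prepares the command without starting it.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// buildSuiteCommand prepares (but does not start) the live migration suite run
// described above. Hypothetical helper; it assumes ./openshift-tests exists in
// the working directory.
func buildSuiteCommand(targetCNI string) *exec.Cmd {
	cmd := exec.Command("./openshift-tests", "run", "openshift/network/live-migration")
	// Inherit the caller's environment and add the option the suite reads.
	cmd.Env = append(os.Environ(), "TEST_SDN_LIVE_MIGRATION_OPTIONS=target-cni="+targetCNI)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	return cmd
}

func main() {
	cmd := buildSuiteCommand("OVNKubernetes")
	// Only print the command here; call cmd.Run() to actually execute it
	// against a cluster that has the feature set enabled.
	fmt.Println("would run:", cmd.Args)
}
```

Calling `cmd.Run()` on the returned value starts the suite; the sketch stops short of that so it can be tried without a cluster.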

@openshift-ci openshift-ci bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Dec 14, 2023
Contributor

openshift-ci bot commented Dec 14, 2023

@martinkennelly: GitHub didn't allow me to request PR reviews from the following users: martinkennelly.

Note that only openshift members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/hold
/cc

Add live migration suite and e2e test cases.
This can be executed by:

./openshift-tests run openshift/network/live-migration

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot added the vendor-update Touching vendor dir or related files label Dec 14, 2023
@openshift-trt-bot

Job Failure Risk Analysis for sha: 1c7567d

Job Name Failure Risk
pull-ci-openshift-origin-master-e2e-openstack-ovn IncompleteTests
Tests for this run (18) are below the historical average (1684): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-master-e2e-metal-ipi-sdn IncompleteTests
Tests for this run (17) are below the historical average (1440): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-master-e2e-metal-ipi-ovn-ipv6 IncompleteTests
Tests for this run (17) are below the historical average (1576): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-master-e2e-gcp-ovn-upgrade IncompleteTests
Tests for this run (20) are below the historical average (745): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-master-e2e-gcp-ovn-rt-upgrade IncompleteTests
Tests for this run (20) are below the historical average (628): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-master-e2e-gcp-ovn IncompleteTests
Tests for this run (19) are below the historical average (1560): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-master-e2e-gcp-csi IncompleteTests
Tests for this run (19) are below the historical average (610): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-master-e2e-aws-ovn-upgrade IncompleteTests
Tests for this run (21) are below the historical average (687): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-master-e2e-aws-ovn-single-node-upgrade IncompleteTests
Tests for this run (20) are below the historical average (1931): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-master-e2e-aws-ovn-single-node-serial IncompleteTests
Tests for this run (18) are below the historical average (625): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-master-e2e-aws-ovn-single-node IncompleteTests
Tests for this run (18) are below the historical average (1463): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-master-e2e-aws-ovn-serial IncompleteTests
Tests for this run (19) are below the historical average (697): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-master-e2e-aws-ovn-fips IncompleteTests
Tests for this run (19) are below the historical average (1706): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-master-e2e-aws-ovn-cgroupsv2 IncompleteTests
Tests for this run (18) are below the historical average (1813): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-master-e2e-aws-csi IncompleteTests
Tests for this run (19) are below the historical average (657): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-master-e2e-agnostic-ovn-cmd IncompleteTests
Tests for this run (19) are below the historical average (658): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)

@openshift-trt-bot

Job Failure Risk Analysis for sha: d11618c

Job Name Failure Risk
pull-ci-openshift-origin-master-e2e-openstack-ovn IncompleteTests
Tests for this run (18) are below the historical average (1663): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-master-e2e-metal-ipi-sdn IncompleteTests
Tests for this run (17) are below the historical average (1423): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-master-e2e-metal-ipi-ovn-ipv6 IncompleteTests
Tests for this run (17) are below the historical average (1563): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-master-e2e-gcp-ovn-upgrade IncompleteTests
Tests for this run (20) are below the historical average (738): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-master-e2e-gcp-ovn-rt-upgrade IncompleteTests
Tests for this run (20) are below the historical average (621): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-master-e2e-gcp-ovn IncompleteTests
Tests for this run (19) are below the historical average (1549): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-master-e2e-gcp-csi IncompleteTests
Tests for this run (19) are below the historical average (603): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-master-e2e-aws-ovn-upgrade IncompleteTests
Tests for this run (21) are below the historical average (678): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-master-e2e-aws-ovn-single-node-upgrade IncompleteTests
Tests for this run (20) are below the historical average (1917): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-master-e2e-aws-ovn-single-node-serial IncompleteTests
Tests for this run (18) are below the historical average (621): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-master-e2e-aws-ovn-single-node IncompleteTests
Tests for this run (18) are below the historical average (1446): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-master-e2e-aws-ovn-serial IncompleteTests
Tests for this run (19) are below the historical average (691): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-master-e2e-aws-ovn-fips IncompleteTests
Tests for this run (19) are below the historical average (1692): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-master-e2e-aws-ovn-cgroupsv2 IncompleteTests
Tests for this run (18) are below the historical average (1791): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-master-e2e-aws-csi IncompleteTests
Tests for this run (19) are below the historical average (650): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-master-e2e-agnostic-ovn-cmd IncompleteTests
Tests for this run (19) are below the historical average (650): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)

@openshift-trt-bot

Job Failure Risk Analysis for sha: f54dc19

Job Name Failure Risk
pull-ci-openshift-origin-master-e2e-aws-ovn-serial IncompleteTests
Tests for this run (99) are below the historical average (662): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-master-e2e-aws-ovn-fips IncompleteTests
Tests for this run (99) are below the historical average (1623): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)

@martinkennelly martinkennelly force-pushed the live-migration-suite-e2e branch 2 times, most recently from e04d5ec to 3f4533f Compare December 15, 2023 07:42
@openshift-trt-bot

Job Failure Risk Analysis for sha: 3f4533f

Job Name Failure Risk
pull-ci-openshift-origin-master-e2e-openstack-ovn IncompleteTests
Tests for this run (2) are below the historical average (1497): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-master-e2e-metal-ipi-ovn-ipv6 IncompleteTests
Tests for this run (19) are below the historical average (1443): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-master-e2e-aws-ovn-upgrade IncompleteTests
Tests for this run (2) are below the historical average (611): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-master-e2e-aws-ovn-single-node-upgrade IncompleteTests
Tests for this run (2) are below the historical average (1728): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-master-e2e-aws-ovn-single-node-serial IncompleteTests
Tests for this run (2) are below the historical average (561): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-master-e2e-aws-ovn-single-node IncompleteTests
Tests for this run (2) are below the historical average (1302): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-master-e2e-aws-ovn-serial IncompleteTests
Tests for this run (2) are below the historical average (637): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-master-e2e-aws-ovn-fips IncompleteTests
Tests for this run (2) are below the historical average (1582): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)

@martinkennelly martinkennelly changed the title WIP: Live migration suite e2e Live migration suite e2e Dec 20, 2023
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 20, 2023
@openshift-trt-bot

Job Failure Risk Analysis for sha: 4d5a4b7

Job Name Failure Risk
pull-ci-openshift-origin-master-e2e-aws-ovn-single-node-upgrade IncompleteTests
Tests for this run (26) are below the historical average (1543): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)

@martinkennelly
Contributor Author

martinkennelly commented Jan 4, 2024

Test [bz-monitoring][invariant] alert/Watchdog must have no gaps or changes is failing consistently with:
Watchdog alert not found

@martinkennelly
Contributor Author

martinkennelly commented Jan 10, 2024

Depends on updates here openshift/cluster-network-operator#2179 done

@martinkennelly
Contributor Author

/unhold

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 10, 2024
@martinkennelly
Contributor Author

martinkennelly commented Jan 10, 2024

Depends on updates here openshift/cluster-network-operator#2179

I am waiting for that PR to merge so I can update this PR, but I'd love a review to make sure I am going in the right direction. done

@martinkennelly
Contributor Author

No functional changes - just a go get of openshift/cluster-network-operator to consume the latest update from openshift/cluster-network-operator#2179

I am happy with this PR now.

@martinkennelly
Contributor Author

@neisw Any more comments?

@neisw
Contributor

neisw commented Jan 11, 2024

Nope, I believe you have worked through Devan's last question. Will give him a chance to respond but I can tag it when you are ready if he doesn't get it.

@neisw
Contributor

neisw commented Jan 11, 2024

/lgtm
/hold

release the hold when you are ready

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 11, 2024
@openshift-ci openshift-ci bot added lgtm Indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Jan 11, 2024
@martinkennelly
Contributor Author

/unhold

If there are any additional comments, we can follow up with a patch. Hope that's OK, Devan.

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 12, 2024
@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD fd0ae1d and 2 for PR HEAD 34863ad in total

@martinkennelly
Contributor Author

Bug created to improve monitors - they should not flake when a set of pods go from Running -> Pending: https://issues.redhat.com/browse/OCPBUGS-27059

Tests are disabled by default even if called in the 'all'
suite because environment variable
'TEST_SDN_LIVE_MIGRATION_OPTIONS' must be set to a
known value, otherwise tests are skipped.

E.g. TEST_SDN_LIVE_MIGRATION_OPTIONS=target-cni=OVNKubernetes

Signed-off-by: Martin Kennelly <mkennell@redhat.com>
Signed-off-by: Martin Kennelly <mkennell@redhat.com>
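The skip behaviour described in the commit message above can be sketched roughly as below. This is not the code from the PR: `targetCNI` and `skipReason` are hypothetical helpers, the set of known targets contains only the one value named in this thread, and the sketch handles only the single `target-cni=<name>` form shown in the example.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// envVar is the variable named in the commit message above.
const envVar = "TEST_SDN_LIVE_MIGRATION_OPTIONS"

// knownTargets is an assumption: only OVNKubernetes is named in this thread,
// and the real suite may accept other values.
var knownTargets = map[string]bool{"OVNKubernetes": true}

// targetCNI extracts the CNI name from a value such as
// "target-cni=OVNKubernetes". Hypothetical helper; it handles only this
// single-option form.
func targetCNI(raw string) (string, error) {
	key, value, found := strings.Cut(strings.TrimSpace(raw), "=")
	if !found || key != "target-cni" {
		return "", fmt.Errorf("unrecognised option %q", raw)
	}
	if !knownTargets[value] {
		return "", fmt.Errorf("unknown target CNI %q", value)
	}
	return value, nil
}

// skipReason returns a non-empty explanation when the tests should be skipped,
// and an empty string when they should run.
func skipReason(raw string) string {
	if strings.TrimSpace(raw) == "" {
		return envVar + " is not set"
	}
	if _, err := targetCNI(raw); err != nil {
		return err.Error()
	}
	return ""
}

func main() {
	if reason := skipReason(os.Getenv(envVar)); reason != "" {
		fmt.Println("skipping live migration tests:", reason)
		return
	}
	fmt.Println("running live migration tests")
}
```

Returning a reason string rather than a boolean lets the skip message say why the tests did not run, which helps when the suite is pulled in through 'all'.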
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Jan 12, 2024
@martinkennelly
Contributor Author

@neisw I needed to rebase because #28473 altered the go.* files. I didn't make any changes to my first commit and only altered the "Live Migration: go mod tidy && go mod vendor" commit.

@dgoodwin
Contributor

/lgtm
/approve

All good here, thanks for looking into that test flake.

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jan 12, 2024
Contributor

openshift-ci bot commented Jan 12, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dgoodwin, martinkennelly, neisw, pliurh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit dcba6b5 into openshift:master Jan 12, 2024
19 of 23 checks passed
Contributor

openshift-ci bot commented Jan 12, 2024

@martinkennelly: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-gcp-csi 6500ca1 link false /test e2e-gcp-csi
ci/prow/e2e-aws-ovn-single-node-serial 6500ca1 link false /test e2e-aws-ovn-single-node-serial

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@martinkennelly
Contributor Author

/cherry-pick release-4.15

@openshift-cherrypick-robot

@martinkennelly: #28462 failed to apply on top of branch "release-4.15":

Applying: Live migration: add suite and e2e tests
Applying: Live migration: go mod tidy && go mod vendor
.git/rebase-apply/patch:429: new blank line at EOF.
+
.git/rebase-apply/patch:2954: new blank line at EOF.
+
warning: 2 lines add whitespace errors.
Using index info to reconstruct a base tree...
M	go.mod
M	go.sum
M	vendor/modules.txt
Falling back to patching base and 3-way merge...
Auto-merging vendor/modules.txt
CONFLICT (content): Merge conflict in vendor/modules.txt
Removing vendor/github.com/spf13/cast/.travis.yml
Removing vendor/github.com/google/btree/.travis.yml
Auto-merging go.sum
CONFLICT (content): Merge conflict in go.sum
Auto-merging go.mod
CONFLICT (content): Merge conflict in go.mod
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0002 Live migration: go mod tidy && go mod vendor
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

In response to this:

/cherry-pick release-4.15


martinkennelly added a commit to martinkennelly/release that referenced this pull request Jan 19, 2024
For details of the CNI migration feature
see the feature enhancement PR [1].

Within o/origin, there is a suite for live migration.
The suite accepts one environment variable.
The value of this environment variable controls
which CNI we are migrating to and optionally, if we
want to trigger rollback. Rollback will only start
once live migration to the target CNI is complete.
If the environment variable's value is blank, the live
migration test will be skipped. For more info,
see the o/origin live migration PR [2].

Note: enabling of eip feature tests depends on fixing
a bug [3]. A bug to track this enabling has also been
created [4].

[1] openshift/enhancements#1064
[2] openshift/origin#28462
[3] https://issues.redhat.com/browse/OCPBUGS-26196
[4] https://issues.redhat.com/browse/NP-884

Signed-off-by: Martin Kennelly <mkennell@redhat.com>
openshift-merge-bot bot pushed a commit to openshift/release that referenced this pull request Jan 23, 2024
skordas pushed a commit to skordas/release that referenced this pull request Jan 23, 2024
memodi pushed a commit to memodi/release that referenced this pull request Mar 14, 2024
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. vendor-update Touching vendor dir or related files

7 participants