[release-4.6] Bug 1979585: Validate the status of the etcd snapshot during backup and restore #624

Merged
merged 1 commit into from Aug 4, 2021

Conversation

hexfusion
Contributor

Manual cherry-pick of #617 into release-4.6. This PR omits the refactor and focuses only on validating the status of the etcd snapshot.

Signed-off-by: Sam Batschelet <sbatsche@redhat.com>
@openshift-ci
Contributor

openshift-ci bot commented Jul 6, 2021

@hexfusion: No Bugzilla bug is referenced in the title of this pull request.
To reference a bug, add 'Bug XXX:' to the title of this pull request and request another bug refresh with /bugzilla refresh.

In response to this:

Validate the status of the etcd snapshot during backup and restore

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 6, 2021
@hexfusion hexfusion changed the title Validate the status of the etcd snapshot during backup and restore [release-4.6]: Validate the status of the etcd snapshot during backup and restore Jul 6, 2021
@openshift-ci
Contributor

openshift-ci bot commented Jul 6, 2021

@hexfusion: No Bugzilla bug is referenced in the title of this pull request.
To reference a bug, add 'Bug XXX:' to the title of this pull request and request another bug refresh with /bugzilla refresh.

In response to this:

[release-4.6]: Validate the status of the etcd snapshot during backup and restore

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@hexfusion hexfusion changed the title [release-4.6]: Validate the status of the etcd snapshot during backup and restore [release-4.6] Bug 1976287: Validate the status of the etcd snapshot during backup and restore Jul 6, 2021
@openshift-ci openshift-ci bot added bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. labels Jul 6, 2021
@openshift-ci
Contributor

openshift-ci bot commented Jul 6, 2021

@hexfusion: This pull request references Bugzilla bug 1976287, which is invalid:

  • expected the bug to target the "4.6.z" release, but it targets "4.7.z" instead
  • expected the bug to be in one of the following states: NEW, ASSIGNED, ON_DEV, POST, POST, but it is VERIFIED instead
  • expected dependent Bugzilla bug 1965024 to target a release in 4.7.0, 4.7.z, but it targets "4.8.0" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

[release-4.6] Bug 1976287: Validate the status of the etcd snapshot during backup and restore

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@hexfusion
Contributor Author

/bugzilla refresh

@openshift-ci
Contributor

openshift-ci bot commented Jul 6, 2021

@hexfusion: This pull request references Bugzilla bug 1976287, which is invalid:

  • expected the bug to target the "4.6.z" release, but it targets "4.7.z" instead
  • expected the bug to be in one of the following states: NEW, ASSIGNED, ON_DEV, POST, POST, but it is VERIFIED instead
  • expected dependent Bugzilla bug 1965024 to target a release in 4.7.0, 4.7.z, but it targets "4.8.0" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

/bugzilla refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@hexfusion hexfusion changed the title [release-4.6] Bug 1976287: Validate the status of the etcd snapshot during backup and restore [release-4.6] Bug 1979585: Validate the status of the etcd snapshot during backup and restore Jul 6, 2021
@openshift-ci
Contributor

openshift-ci bot commented Jul 6, 2021

@hexfusion: This pull request references Bugzilla bug 1979585, which is invalid:

  • expected dependent Bugzilla bug 1965024 to target a release in 4.7.0, 4.7.z, but it targets "4.8.0" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

[release-4.6] Bug 1979585: Validate the status of the etcd snapshot during backup and restore

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@hexfusion
Contributor Author

/bugzilla refresh

@openshift-ci
Contributor

openshift-ci bot commented Jul 6, 2021

@hexfusion: This pull request references Bugzilla bug 1979585, which is invalid:

  • expected dependent Bugzilla bug 1965024 to target a release in 4.7.0, 4.7.z, but it targets "4.8.0" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

/bugzilla refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@hexfusion
Contributor Author

/test list

@openshift-ci
Contributor

openshift-ci bot commented Jul 6, 2021

@hexfusion: The specified target(s) for /test were not found.
The following commands are available to trigger jobs:

  • /test configmap-scale
  • /test e2e-agnostic
  • /test e2e-agnostic-upgrade
  • /test e2e-aws
  • /test e2e-azure
  • /test e2e-disruptive
  • /test e2e-gcp
  • /test e2e-metal-assisted
  • /test e2e-metal-ipi
  • /test e2e-operator
  • /test images
  • /test unit
  • /test verify
  • /test verify-deps

Use /test all to run the following jobs:

  • pull-ci-openshift-cluster-etcd-operator-release-4.6-e2e-agnostic
  • pull-ci-openshift-cluster-etcd-operator-release-4.6-e2e-agnostic-upgrade
  • pull-ci-openshift-cluster-etcd-operator-release-4.6-e2e-operator
  • pull-ci-openshift-cluster-etcd-operator-release-4.6-images
  • pull-ci-openshift-cluster-etcd-operator-release-4.6-unit
  • pull-ci-openshift-cluster-etcd-operator-release-4.6-verify
  • pull-ci-openshift-cluster-etcd-operator-release-4.6-verify-deps

In response to this:

/test list

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@hexfusion
Contributor Author

/test e2e-disruptive
/hold for disruptive test

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 6, 2021
@hexfusion
Contributor Author

/bugzilla refresh

@openshift-ci openshift-ci bot added bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. and removed bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. labels Jul 6, 2021
@openshift-ci
Contributor

openshift-ci bot commented Jul 6, 2021

@hexfusion: This pull request references Bugzilla bug 1979585, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

6 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.6.z) matches configured target release for branch (4.6.z)
  • bug is in the state NEW, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)
  • dependent bug Bugzilla bug 1976287 is in the state VERIFIED, which is one of the valid states (VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), CLOSED (CURRENTRELEASE))
  • dependent Bugzilla bug 1976287 targets the "4.7.z" release, which is one of the valid target releases: 4.7.0, 4.7.z
  • bug has dependents

Requesting review from QA contact:
/cc @geliu2016

In response to this:

/bugzilla refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot requested a review from geliu2016 July 6, 2021 13:27
@hexfusion hexfusion linked an issue Jul 6, 2021 that may be closed by this pull request
@geliu2016

Tried with 4.6.0-0.ci.test-2021-07-07-085525-ci-ln-yvd9x3k-latest, worked as expected:

sh-4.4# du * -s -h
92M snapshot_2021-07-07_094321.db
68K static_kuberesources_2021-07-07_094321.tar.gz
sh-4.4# truncate -s 126k snapshot_2021-07-07_094321.db
sh-4.4# du * -s -h
128K snapshot_2021-07-07_094321.db
68K static_kuberesources_2021-07-07_094321.tar.gz
sh-4.4# sudo -E /usr/local/bin/cluster-restore.sh /home/core/assets/backup
a0795d2b98740189bbadea50043442eab57d43cf07f640b6d3d23b4a490e018e
etcdctl version: 3.4.9
API version: 3.4
panic: freepages: failed to get all reachable pages (page 0: invalid type: unknown<00>)

goroutine 90 [running]:
go.etcd.io/etcd/vendor/go.etcd.io/bbolt.(*DB).freepages.func2(0xc0002320c0)
/go/src/go.etcd.io/etcd/vendor/go.etcd.io/bbolt/db.go:1003 +0xe5
created by go.etcd.io/etcd/vendor/go.etcd.io/bbolt.(*DB).freepages
/go/src/go.etcd.io/etcd/vendor/go.etcd.io/bbolt/db.go:1001 +0x1b5
Backup integrity verification failed. Backup appears corrupted. Aborting!
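
The check exercised in the transcript above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code merged in this PR: the function name verify_snapshot and the ETCDCTL override variable are hypothetical and are not taken from cluster-restore.sh. The sketch relies only on `etcdctl snapshot status`, which exits non-zero when it cannot read the snapshot file.

```shell
#!/usr/bin/env bash
# Sketch only: names below are illustrative assumptions, not from cluster-restore.sh.

# ETCDCTL may be overridden (for example in tests); defaults to etcdctl on PATH.
ETCDCTL="${ETCDCTL:-etcdctl}"

# verify_snapshot FILE
# Returns 0 if `etcdctl snapshot status` can read FILE, non-zero otherwise.
verify_snapshot() {
  local snapshot="$1"

  # Reject a missing or zero-length file before invoking etcdctl at all.
  if [ ! -s "${snapshot}" ]; then
    echo "Snapshot ${snapshot} is missing or empty." >&2
    return 1
  fi

  # A truncated or corrupted file makes etcdctl exit non-zero
  # (it may panic, as in the log above).
  if ! "${ETCDCTL}" snapshot status "${snapshot}" -w table; then
    echo "Backup integrity verification failed. Backup appears corrupted. Aborting!" >&2
    return 1
  fi
}
```

A restore or backup script could call `verify_snapshot "${SNAPSHOT_FILE}" || exit 1` before touching any cluster state, so that a damaged snapshot is rejected early rather than partway through a restore.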

@hexfusion
Contributor Author

/test e2e-disruptive

@kalexand-rh

/retest

Contributor

@lilic lilic left a comment

/lgtm

/retest

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jul 13, 2021
@sdodson
Member

sdodson commented Jul 14, 2021

(patch manager) Please remove the hold if you wish for this PR to be approved.

@geliu2016
Copy link

/lgtm

@openshift-ci
Contributor

openshift-ci bot commented Jul 15, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: geliu2016, hexfusion, lilic

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@hexfusion
Contributor Author

The test failure has me nervous here; it needs a deeper look.

@mrunalp
Member

mrunalp commented Jul 23, 2021

Patch manager: I don't see any updates on the test failures

@marun
Contributor

marun commented Jul 23, 2021

@mrunalp The disruptive job is not required for merge. It is known not to pass consistently, hence it is optional.

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 23, 2021
@marun
Contributor

marun commented Jul 23, 2021

/retest

@marun
Contributor

marun commented Jul 23, 2021

/hold

Never mind, the most recent failure does look suspicious.

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 23, 2021
@marun
Contributor

marun commented Jul 24, 2021

@hexfusion Disruptive is failing because the DR test updates were never backported to 4.6. Do you want me to attempt a backport, or would manual testing suffice?

@hexfusion
Contributor Author

I just wanted to verify manually and have not had the chance, but I guess getting the test passing would help next time.

@marun
Contributor

marun commented Jul 30, 2021

/retest

Disruptive should now be capable of passing against release-4.6.

@ecordell

ecordell commented Aug 4, 2021

[patch-manager] ⌛ This pull request was not picked by the patch manager for the current z-stream window and has to wait for the next window.

skipped

  • Score: 0.00
  • Reason: skipping because "do-not-merge/hold" label found

NOTE: This message was automatically generated, if you have questions please ask on #forum-release

@marun
Contributor

marun commented Aug 4, 2021

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 4, 2021
@ecordell

ecordell commented Aug 4, 2021

Approving since the hold was removed

@ecordell ecordell added the cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. label Aug 4, 2021
@openshift-ci openshift-ci bot merged commit 6a3f3b5 into openshift:release-4.6 Aug 4, 2021
@openshift-ci
Contributor

openshift-ci bot commented Aug 4, 2021

@hexfusion: All pull requests linked via external trackers have merged:

Bugzilla bug 1979585 has been moved to the MODIFIED state.

In response to this:

[release-4.6] Bug 1979585: Validate the status of the etcd snapshot during backup and restore

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@hexfusion hexfusion deleted the cp-status-check branch August 5, 2021 12:46
Labels
  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • bugzilla/severity-high: Referenced Bugzilla bug's severity is high for the branch this PR is targeting.
  • bugzilla/valid-bug: Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting.
  • cherry-pick-approved: Indicates a cherry-pick PR into a release branch has been approved by the release branch manager.
  • lgtm: Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Backport #603 to 4.6
8 participants