API-1617: Add test which ensures the SNO cluster can recover after being suspended/shutdown #41714

Merged: 4 commits, Aug 8, 2023

@vrutkovs vrutkovs commented Jul 27, 2023

Ensures that the SNO cluster can be restored after:

  • the VM was suspended and resumed, or
  • the machine was shut down and powered on again

once one of the following periods has passed:

  • 30 days
  • 90 days
  • 180 days
  • 360 days

Current status:

  • 119 days seems to be the maximum period; beyond that, a component (possibly openshift-apiserver) can't authenticate to kube-apiserver because a certificate has expired:
    2024-01-23T12:48:28.078309356+00:00 stderr F E0123 12:48:28.078211 15 authentication.go:70] "Unable to authenticate the request" err="[x509: certificate has expired or is not yet valid: current time 2024-01-23T12:48:28Z is after 2023-11-24T12:20:41Z, verifying certificate SN=4550266744792405343, SKID=, AKID=BB:36:AC:22:3C:D3:10:A0:BA:08:9C:B4:DA:45:F3:F4:3D:45:B6:01 failed: x509: certificate has expired or is not yet valid: current time 2024-01-23T12:48:28Z is after 2023-11-24T12:20:41Z]"
  • cert-rotation controller generates a new signer cert:
    cabundle.go:64] Updated ca-bundle.crt configmap [#0]: "openshift-kube-apiserver-operator_aggregator-client-signer@1706381630" [] issuer="<self-signed>" (Jan 27 18:53:49 2024 to Jan 28 06:53:50 2024 (now=Jul 31 18:42:00 2023))/openshift-config-managed with:
    but apparently no installer pod is created to place it on the node
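
As a rough cross-check of the 119-day figure, the two timestamps in the x509 error above can be compared directly. The sketch below is illustrative only: it assumes the quoted log line came from a job that moved the clock forward by 180 days (`ASSUMED_SKEW` is that assumption, not something the log states), and the helper names are made up for this example.

```python
import re
from datetime import datetime, timedelta, timezone

# Excerpt of the error message quoted in the status above.
LOG_ERR = (
    "x509: certificate has expired or is not yet valid: "
    "current time 2024-01-23T12:48:28Z is after 2023-11-24T12:20:41Z"
)

STAMP = r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z)"
PATTERN = re.compile(rf"current time {STAMP} is after {STAMP}")


def parse_utc(stamp: str) -> datetime:
    """Parse a 'YYYY-MM-DDTHH:MM:SSZ' timestamp as an aware UTC datetime."""
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def expiry_gap(message: str):
    """Return (observed time, certificate notAfter, time past expiry)."""
    match = PATTERN.search(message)
    if match is None:
        raise ValueError("no x509 expiry timestamps found in message")
    observed, not_after = (parse_utc(s) for s in match.groups())
    return observed, not_after, observed - not_after


observed, not_after, overdue = expiry_gap(LOG_ERR)

# Assumption: the log was produced after a 180-day time jump.
ASSUMED_SKEW = timedelta(days=180)
start = observed - ASSUMED_SKEW
remaining_at_start = not_after - start

print(f"certificate overdue by {overdue.days} days at the time of the error")
print(f"full days of validity left before the jump: {remaining_at_start.days}")
```

Under that assumption the certificate had 119 full days of validity left before the jump and was 60 days past expiry when the error was logged, which is consistent with 119 days being the longest period that still works.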

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jul 27, 2023
openshift-ci-robot commented Jul 27, 2023

@vrutkovs: This pull request references API-1603 which is a valid jira issue.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@vrutkovs vrutkovs changed the title API-1603: Add test which ensures the cluster can recover after being suspended/shutdown API-1603: Add test which ensures the SNO cluster can recover after being suspended/shutdown Jul 27, 2023
@vrutkovs

/pj-rehearse

openshift-ci-robot commented Jul 27, 2023

@vrutkovs: This pull request references API-1603 which is a valid jira issue.

In response to this:

Ensures that the SNO cluster can be restored after:

  • VM was suspended and resumed or
  • machine was shut down and powered on again

when the following period has passed:

  • 30 days
  • 90 days
  • 180 days
  • 360 days

Current status:

  • 180 days seems to be the max period, as otherwise something(?) can't connect to kube-apiserver:
    2024-01-23T12:48:28.078309356+00:00 stderr F E0123 12:48:28.078211      15 authentication.go:70] "Unable to authenticate the request" err="[x509: certificate has expired or is not yet valid: current time 2024-01-23T12:48:28Z is after 2023-11-24T12:20:41Z, verifying certificate SN=4550266744792405343, SKID=, AKID=BB:36:AC:22:3C:D3:10:A0:BA:08:9C:B4:DA:45:F3:F4:3D:45:B6:01 failed: x509: certificate has expired or is not yet valid: current time 2024-01-23T12:48:28Z is after 2023-11-24T12:20:41Z]"

@vrutkovs

/pj-rehearse periodic-ci-openshift-release-master-nightly-4.14-e2e-metal-ovn-sno-cert-rotation-suspend-180d
/pj-rehearse periodic-ci-openshift-release-master-nightly-4.14-e2e-metal-ovn-sno-cert-rotation-shutdown-180d

openshift-ci-robot commented Jul 27, 2023

@vrutkovs: This pull request references API-1603 which is a valid jira issue.

In response to this:

Ensures that the SNO cluster can be restored after:

  • VM was suspended and resumed or
  • machine was shut down and powered on again

when the following period has passed:

  • 30 days
  • 90 days
  • 180 days
  • 360 days

Current status:

  • 90 days seems to be the max period, as otherwise something(?) can't connect to kube-apiserver:
    2024-01-23T12:48:28.078309356+00:00 stderr F E0123 12:48:28.078211      15 authentication.go:70] "Unable to authenticate the request" err="[x509: certificate has expired or is not yet valid: current time 2024-01-23T12:48:28Z is after 2023-11-24T12:20:41Z, verifying certificate SN=4550266744792405343, SKID=, AKID=BB:36:AC:22:3C:D3:10:A0:BA:08:9C:B4:DA:45:F3:F4:3D:45:B6:01 failed: x509: certificate has expired or is not yet valid: current time 2024-01-23T12:48:28Z is after 2023-11-24T12:20:41Z]"

@vrutkovs

/pj-rehearse periodic-ci-openshift-release-master-nightly-4.14-e2e-metal-ovn-sno-cert-rotation-shutdown-180d

@vrutkovs

/pj-rehearse periodic-ci-openshift-release-master-nightly-4.14-e2e-metal-ovn-sno-cert-rotation-suspend-180d

openshift-ci-robot commented Jul 31, 2023

@vrutkovs: This pull request references API-1603 which is a valid jira issue.

In response to this:

Ensures that the SNO cluster can be restored after:

  • VM was suspended and resumed or
  • machine was shut down and powered on again

when the following period has passed:

  • 30 days
  • 90 days
  • 180 days
  • 360 days

Current status:

  • 90 days seems to be the max period, as otherwise openshift-apiserver(?) can't connect to kube-apiserver:
    2024-01-23T12:48:28.078309356+00:00 stderr F E0123 12:48:28.078211 15 authentication.go:70] "Unable to authenticate the request" err="[x509: certificate has expired or is not yet valid: current time 2024-01-23T12:48:28Z is after 2023-11-24T12:20:41Z, verifying certificate SN=4550266744792405343, SKID=, AKID=BB:36:AC:22:3C:D3:10:A0:BA:08:9C:B4:DA:45:F3:F4:3D:45:B6:01 failed: x509: certificate has expired or is not yet valid: current time 2024-01-23T12:48:28Z is after 2023-11-24T12:20:41Z]"

@vrutkovs vrutkovs changed the title API-1603: Add test which ensures the SNO cluster can recover after being suspended/shutdown API-1617: Add test which ensures the SNO cluster can recover after being suspended/shutdown Jul 31, 2023
openshift-ci-robot commented Jul 31, 2023

@vrutkovs: This pull request references API-1617 which is a valid jira issue.

@vrutkovs

/pj-rehearse

@openshift-ci-robot

[REHEARSALNOTIFIER]
@vrutkovs: the pj-rehearse plugin accommodates running rehearsal tests for the changes in this PR. Expand 'Interacting with pj-rehearse' for usage details. The following rehearsable tests have been affected by this change:

Test name | Repo | Type | Reason
periodic-ci-openshift-release-master-nightly-4.14-e2e-metal-ovn-sno-cert-rotation-shutdown-60d | N/A | periodic | Periodic changed
periodic-ci-openshift-release-master-nightly-4.14-e2e-metal-ovn-sno-cert-rotation-suspend-180d | N/A | periodic | Periodic changed
periodic-ci-openshift-release-master-nightly-4.14-e2e-metal-ovn-sno-cert-rotation-shutdown-360d | N/A | periodic | Periodic changed
periodic-ci-openshift-release-master-nightly-4.14-e2e-metal-ovn-sno-cert-rotation-suspend-60d | N/A | periodic | Periodic changed
periodic-ci-openshift-release-master-nightly-4.14-e2e-metal-ovn-sno-cert-rotation-suspend-360d | N/A | periodic | Periodic changed
periodic-ci-openshift-release-master-nightly-4.14-e2e-metal-ovn-sno-cert-rotation-shutdown-180d | N/A | periodic | Periodic changed
periodic-ci-openshift-release-master-nightly-4.14-e2e-metal-ovn-sno-cert-rotation-shutdown-90d | N/A | periodic | Periodic changed
periodic-ci-openshift-release-master-nightly-4.14-e2e-metal-ovn-sno-cert-rotation-suspend-90d | N/A | periodic | Periodic changed
periodic-ci-openshift-release-master-nightly-4.14-e2e-metal-ovn-sno-cert-rotation-shutdown-30d | N/A | periodic | Periodic changed
periodic-ci-openshift-release-master-nightly-4.14-e2e-metal-ovn-sno-cert-rotation-suspend-30d | N/A | periodic | Periodic changed

Interacting with pj-rehearse

Comment: /pj-rehearse to run up to 10 rehearsals
Comment: /pj-rehearse skip to opt-out of rehearsals
Comment: /pj-rehearse {test-name}, with each test separated by a space, to run one or more specific rehearsals
Comment: /pj-rehearse more to run up to 20 rehearsals
Comment: /pj-rehearse max to run up to 35 rehearsals
Comment: /pj-rehearse auto-ack to run up to 10 rehearsals, and add the rehearsals-ack label on success
Comment: /pj-rehearse abort to abort all active rehearsals

Once you are satisfied with the results of the rehearsals, comment: /pj-rehearse ack to unblock merge. When the rehearsals-ack label is present on your PR, merge will no longer be blocked by rehearsals.
If you would like the rehearsals-ack label removed, comment: /pj-rehearse reject to re-block merging.

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 31, 2023
@vrutkovs

/pj-rehearse

openshift-ci bot commented Jul 31, 2023

@vrutkovs: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
ci/rehearse/periodic-ci-openshift-release-master-nightly-4.14-e2e-metal-ovn-sno-cert-rotation-shutdown-360d | a146aec | link | unknown | /pj-rehearse periodic-ci-openshift-release-master-nightly-4.14-e2e-metal-ovn-sno-cert-rotation-shutdown-360d
ci/rehearse/periodic-ci-openshift-release-master-nightly-4.14-e2e-metal-ovn-sno-cert-rotation-suspend-360d | a146aec | link | unknown | /pj-rehearse periodic-ci-openshift-release-master-nightly-4.14-e2e-metal-ovn-sno-cert-rotation-suspend-360d
ci/rehearse/periodic-ci-openshift-release-master-nightly-4.14-e2e-metal-ovn-sno-cert-rotation-suspend-180d | a146aec | link | unknown | /pj-rehearse periodic-ci-openshift-release-master-nightly-4.14-e2e-metal-ovn-sno-cert-rotation-suspend-180d
ci/rehearse/periodic-ci-openshift-release-master-nightly-4.14-e2e-metal-ovn-sno-cert-rotation-shutdown-180d | a146aec | link | unknown | /pj-rehearse periodic-ci-openshift-release-master-nightly-4.14-e2e-metal-ovn-sno-cert-rotation-shutdown-180d

@vrutkovs

/pj-rehearse ack

@openshift-ci-robot openshift-ci-robot added the rehearsals-ack Signifies that rehearsal jobs have been acknowledged label Jul 31, 2023
@dinhxuanvu dinhxuanvu left a comment

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Aug 8, 2023
openshift-ci bot commented Aug 8, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dinhxuanvu, vrutkovs

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit 5bbc787 into openshift:master Aug 8, 2023
26 of 30 checks passed
openshift-ci bot commented Aug 8, 2023

@vrutkovs: Updated the following 2 configmaps:

  • ci-operator-master-configs configmap in namespace ci at cluster app.ci using the following files:
    • key openshift-release-master__nightly-4.14.yaml using file ci-operator/config/openshift/release/openshift-release-master__nightly-4.14.yaml
  • job-config-master-periodics configmap in namespace ci at cluster app.ci using the following files:
    • key openshift-release-master-periodics.yaml using file ci-operator/jobs/openshift/release/openshift-release-master-periodics.yaml

Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
  • lgtm: Indicates that a PR is ready to be merged.
  • rehearsals-ack: Signifies that rehearsal jobs have been acknowledged

4 participants