MGMT-12552: Day-2 agent stuck with status_info rebooting although the node is already part of the cluster #4610

Merged
merged 1 commit into from Nov 15, 2022

eranco74
Contributor

@eranco74 eranco74 commented Nov 13, 2022

The following applies to kube-API only:
  • The host UpdateInstallProgress will not update day-2 hosts to HostStatusAddedToExistingCluster if the host stage is Rebooting.
  • The host UpdateInstallProgress will update day-2 hosts to HostStatusAddedToExistingCluster if the host stage is Done.
  • The agent controller will keep updating day-2 agents if the agent stage is Rebooting, Configuring, or Joined.
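The rules above can be sketched roughly as follows. This is a minimal illustration for the kube-API case only, not the code changed in this PR: the `Stage` and `Status` types, their string values, and the helpers `nextDay2Status` and `keepUpdatingDay2Agent` are hypothetical stand-ins. Leaving the status unchanged for Configuring and Joined is also an assumption, since the description only covers Rebooting and Done.

```go
package main

import "fmt"

// Stage and Status are local stand-ins for the service's model types.
// The string values are illustrative assumptions.
type Stage string
type Status string

const (
	StageRebooting   Stage = "Rebooting"
	StageConfiguring Stage = "Configuring"
	StageJoined      Stage = "Joined"
	StageDone        Stage = "Done"

	StatusInstalling             Status = "installing"
	StatusAddedToExistingCluster Status = "added-to-existing-cluster"
)

// nextDay2Status (hypothetical helper) returns the status a day-2 host
// should have after reporting the given stage in kube-API mode.
// Only Done moves the host to added-to-existing-cluster; every other
// stage, including Rebooting, leaves the current status untouched.
func nextDay2Status(current Status, stage Stage) Status {
	if stage == StageDone {
		return StatusAddedToExistingCluster
	}
	return current
}

// keepUpdatingDay2Agent (hypothetical helper) reports whether the agent
// controller should keep updating a day-2 agent in the given stage.
func keepUpdatingDay2Agent(stage Stage) bool {
	switch stage {
	case StageRebooting, StageConfiguring, StageJoined:
		return true
	default:
		return false
	}
}

func main() {
	stages := []Stage{StageRebooting, StageConfiguring, StageJoined, StageDone}
	for _, s := range stages {
		fmt.Printf("%-12s status=%s keepUpdating=%t\n",
			s, nextDay2Status(StatusInstalling, s), keepUpdatingDay2Agent(s))
	}
}
```

Under these assumptions a host reporting Rebooting keeps its current status and its agent keeps being updated, so it can still reach Done and be marked as added to the existing cluster.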

  • Should this PR be tested by the reviewer? no
  • Is this PR relying on CI for an e2e test run? no
  • Should this PR be tested in a specific environment? kube-api
  • Any logs, screenshots, etc that can help with the review process? no


List all the issues related to this PR

https://issues.redhat.com/browse/MGMT-12552

  • New Feature
  • Enhancement
  • Bug fix
  • Tests
  • Documentation
  • CI/CD

What environments does this code impact?

  • Automation (CI, tools, etc)
  • Cloud
  • Operator Managed Deployments
  • None

How was this code tested?

  • assisted-test-infra environment
  • dev-scripts environment
  • Reviewer's test appreciated
  • Waiting for CI to do a full test run
  • Manual (Elaborate on how it was tested)
  • No tests needed

Checklist

  • Title and description added to both, commit and PR.
  • Relevant issues have been associated (see CONTRIBUTING guide)
  • This change does not require a documentation update (docstring, docs, README, etc)
  • Does this change include unit-tests (note that code changes require unit-tests)

Reviewers Checklist

  • Are the title and description (in both PR and commit) meaningful and clear?
  • Is there a bug required (and linked) for this change?
  • Should this PR be backported?

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 13, 2022
@openshift-ci

openshift-ci bot commented Nov 13, 2022

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci openshift-ci bot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Nov 13, 2022
@openshift-ci

openshift-ci bot commented Nov 13, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: eranco74

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 13, 2022
… node is already part of the cluster

The following applies to kube-API only:
The host UpdateInstallProgress will not update day-2 hosts to HostStatusAddedToExistingCluster if the host stage is Rebooting.
The host UpdateInstallProgress will update day-2 hosts to HostStatusAddedToExistingCluster if the host stage is Done.
The agent controller will keep updating day-2 agents if the agent stage is Rebooting, Configuring, or Joined.
@openshift-ci openshift-ci bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Nov 13, 2022
@eranco74 eranco74 marked this pull request as ready for review November 13, 2022 12:39
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 13, 2022
@codecov

codecov bot commented Nov 13, 2022

Codecov Report

Merging #4610 (ecec245) into master (e676974) will increase coverage by 0.02%.
The diff coverage is 100.00%.

Additional details and impacted files


@@            Coverage Diff             @@
##           master    #4610      +/-   ##
==========================================
+ Coverage   67.07%   67.10%   +0.02%     
==========================================
  Files         199      200       +1     
  Lines       28062    28086      +24     
==========================================
+ Hits        18823    18847      +24     
- Misses       7556     7557       +1     
+ Partials     1683     1682       -1     
| Impacted Files | Coverage Δ | |
|---|---|---|
| ...nternal/controller/controllers/agent_controller.go | 78.78% <100.00%> (-0.16%) | ⬇️ |
| internal/host/host.go | 74.65% <100.00%> (+0.15%) | ⬆️ |
| internal/bminventory/inventory.go | 69.37% <0.00%> (ø) | |
| internal/operators/lvm/lvm_operator.go | 78.04% <0.00%> (ø) | |
| ...grations/20221110174000_rename_kernel_arguments.go | 83.33% <0.00%> (ø) | |
| internal/oc/release.go | 78.36% <0.00%> (+1.16%) | ⬆️ |
| internal/migrations/migrations.go | 89.65% <0.00%> (+1.19%) | ⬆️ |
| internal/host/common.go | 88.88% <0.00%> (+3.17%) | ⬆️ |

@tsorya
Contributor

tsorya commented Nov 13, 2022

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Nov 13, 2022
@eranco74
Contributor Author

/retest-required

@eranco74
Contributor Author

/override edge-e2e-metal-assisted-capi

@openshift-ci

openshift-ci bot commented Nov 14, 2022

@eranco74: /override requires failed status contexts, check run or a prowjob name to operate on.
The following unknown contexts/checkruns were given:

  • edge-e2e-metal-assisted-capi

Only the following failed contexts/checkruns were expected:

  • ci/prow/edge-ci-index
  • ci/prow/edge-e2e-ai-operator-ztp
  • ci/prow/edge-e2e-metal-assisted
  • ci/prow/edge-e2e-metal-assisted-capi
  • ci/prow/edge-images
  • ci/prow/edge-lint
  • ci/prow/edge-subsystem-aws
  • ci/prow/edge-subsystem-kubeapi-aws
  • ci/prow/edge-unit-test
  • ci/prow/edge-verify-generated-code
  • ci/prow/images
  • ci/prow/mce-images
  • pull-ci-openshift-assisted-service-cloud_hotfix_releases-images
  • pull-ci-openshift-assisted-service-master-edge-ci-index
  • pull-ci-openshift-assisted-service-master-edge-e2e-ai-operator-ztp
  • pull-ci-openshift-assisted-service-master-edge-e2e-metal-assisted
  • pull-ci-openshift-assisted-service-master-edge-e2e-metal-assisted-capi
  • pull-ci-openshift-assisted-service-master-edge-images
  • pull-ci-openshift-assisted-service-master-edge-lint
  • pull-ci-openshift-assisted-service-master-edge-subsystem-aws
  • pull-ci-openshift-assisted-service-master-edge-subsystem-kubeapi-aws
  • pull-ci-openshift-assisted-service-master-edge-unit-test
  • pull-ci-openshift-assisted-service-master-edge-verify-generated-code
  • pull-ci-openshift-assisted-service-master-mce-images
  • tide

If you are trying to override a checkrun that has a space in it, you must put a double quote on the context.

In response to this:

/override edge-e2e-metal-assisted-capi

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci

openshift-ci bot commented Nov 14, 2022

@eranco74: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| ci/prow/edge-e2e-metal-assisted-capi | ecec245 | link | false | /test edge-e2e-metal-assisted-capi |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@eranco74
Contributor Author

operator-ztp seems flaky:

[2022-11-14 12:56:12] + oc apply -f /home/assisted-service/deploy/operator/ztp/generated/baremetalHost.yaml
[2022-11-14 12:56:14] secret/ostest-extraworker-0-bmc-secret created
[2022-11-14 12:56:14] Error from server (InternalError): error when creating "/home/assisted-service/deploy/operator/ztp/generated/baremetalHost.yaml": Internal error occurred: failed calling webhook "baremetalhost.metal3.io": failed to call webhook: Post "https://baremetal-operator-webhook-service.openshift-machine-api.svc:443/validate-metal3-io-v1alpha1-baremetalhost?timeout=10s": no endpoints available for service "baremetal-operator-webhook-service"

/retest-required

@openshift-merge-robot openshift-merge-robot merged commit 7eda5a1 into openshift:master Nov 15, 2022
danielerez pushed a commit to danielerez/assisted-service that referenced this pull request Oct 15, 2023
… node is already part of the cluster (openshift#4610)

The following applies to kube-API only:
The host UpdateInstallProgress will not update day-2 hosts to HostStatusAddedToExistingCluster if the host stage is Rebooting.
The host UpdateInstallProgress will update day-2 hosts to HostStatusAddedToExistingCluster if the host stage is Done.
The agent controller will keep updating day-2 agents if the agent stage is Rebooting, Configuring, or Joined.