MGMT-16814: Pass both IP family URLs to ironic agent #6047

Closed: wanted to merge 4 commits


@carbonin (Member) commented Mar 4, 2024

When deploying a single-stack spoke cluster from a dual-stack hub, it's hard for us to determine which callback URLs to send to the ironic agent. If we choose wrong, the agent never registers and gets stuck trying to send data to ironic on the hub.

As of https://issues.redhat.com/browse/OCPBUGS-24579, the ironic agent can accept a comma-separated list of URLs and selects the correct one at runtime based on the host's network configuration.
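To make the runtime selection concrete, here is a minimal Go sketch of how a consumer could pick one URL from such a list based on the IP families the host has. It is a hypothetical illustration only: the real selection logic lives in the ironic agent (which is written in Python) and may differ. `selectCallbackURL`, the `hostHasIPv4`/`hostHasIPv6` inputs, and the example addresses are all assumptions.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"strings"
)

// selectCallbackURL is a HYPOTHETICAL helper: it returns the first URL in a
// comma-separated list whose host is an IP literal of a family the host has.
// It is not the ironic agent's actual implementation.
func selectCallbackURL(urlList string, hostHasIPv4, hostHasIPv6 bool) (string, error) {
	for _, raw := range strings.Split(urlList, ",") {
		raw = strings.TrimSpace(raw)
		u, err := url.Parse(raw)
		if err != nil {
			return "", fmt.Errorf("invalid URL %q: %w", raw, err)
		}
		// Hostname() strips the port and any IPv6 brackets.
		ip := net.ParseIP(u.Hostname())
		if ip == nil {
			// A DNS name can't be classified by family here; accept it as-is.
			return raw, nil
		}
		isV4 := ip.To4() != nil
		if (isV4 && hostHasIPv4) || (!isV4 && hostHasIPv6) {
			return raw, nil
		}
	}
	return "", fmt.Errorf("no URL in %q matches the host's IP families", urlList)
}

func main() {
	// Example hub URLs (hypothetical addresses) for an IPv6-only host.
	urls := "https://192.168.111.5:6385,https://[fd2e:6f44:5dd8::5]:6385"
	chosen, err := selectCallbackURL(urls, false, true)
	fmt.Println(chosen, err)
}
```

Picking by address family rather than by reachability keeps the sketch deterministic; a real agent could also probe each URL.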

This PR provides that comma-separated list of URLs to the ironic agent if the agent is of a version that includes the fix (currently OCP 4.16+).

As a side effect, we also need to track the version of the ironic agent we're embedding, so that we don't pass a list of URLs to an agent that expects only a single one. This is simple enough for now, but may become more complicated as the fix is backported (we may need to track multiple releases for each Y-stream, depending on how far back the fix goes). It also means users need to provide the agent version for image overrides and for the env-configurable defaults.
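A minimal Go sketch of the version gate described above, assuming the agent version is available as an OCP-style `major.minor[.patch]` string. `supportsMultipleURLs`, `ironicURLsForAgent`, and the 4.16 threshold constants are hypothetical names chosen for illustration, not the code in this PR.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// HYPOTHETICAL threshold: per the PR description, the fix ships in OCP 4.16+.
const (
	minMultiURLMajor = 4
	minMultiURLMinor = 16
)

// parseMajorMinor extracts major and minor from versions such as "4.16" or "4.16.0".
func parseMajorMinor(v string) (int, int, error) {
	parts := strings.SplitN(v, ".", 3)
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("malformed version %q", v)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return 0, 0, fmt.Errorf("malformed major in %q: %w", v, err)
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return 0, 0, fmt.Errorf("malformed minor in %q: %w", v, err)
	}
	return major, minor, nil
}

// supportsMultipleURLs reports whether an agent of this version accepts a list.
func supportsMultipleURLs(agentVersion string) bool {
	major, minor, err := parseMajorMinor(agentVersion)
	if err != nil {
		// Unknown version: fall back to the safe, single-URL behavior.
		return false
	}
	return major > minMultiURLMajor || (major == minMultiURLMajor && minor >= minMultiURLMinor)
}

// ironicURLsForAgent returns the value to hand to the agent.
func ironicURLsForAgent(agentVersion string, urls []string) string {
	if len(urls) == 0 {
		return ""
	}
	if supportsMultipleURLs(agentVersion) {
		return strings.Join(urls, ",")
	}
	// Older agents expect exactly one URL, so the caller still has to guess;
	// this sketch simply takes the first one.
	return urls[0]
}

func main() {
	urls := []string{"https://192.168.111.5:6385", "https://[fd2e:6f44:5dd8::5]:6385"}
	fmt.Println(ironicURLsForAgent("4.16.0", urls))
	fmt.Println(ironicURLsForAgent("4.15.3", urls))
}
```

If the fix is backported, a single major/minor threshold stops being enough: each Y-stream would need its own minimum z-stream, which is the maintenance concern discussed in the comments below.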

List all the issues related to this PR

https://issues.redhat.com/browse/MGMT-16814

  • New Feature
  • Enhancement
  • Bug fix
  • Tests
  • Documentation
  • CI/CD

What environments does this code impact?

  • Automation (CI, tools, etc)
  • Cloud
  • Operator Managed Deployments
  • None

How was this code tested?

  • assisted-test-infra environment
  • dev-scripts environment
  • Reviewer's test appreciated
  • Waiting for CI to do a full test run
  • Manual (Elaborate on how it was tested)
  • No tests needed

I don't have the environment to test this, but QE is going to run my image through a test.

Checklist

  • Title and description added to both, commit and PR.
  • Relevant issues have been associated (see [CONTRIBUTING] guide)
  • This change does not require a documentation update (docstring, docs, README, etc) - docs updated
  • Does this change include unit-tests (note that code changes require unit-tests)

Reviewers Checklist

  • Are the title and description (in both PR and commit) meaningful and clear?
  • Is there a bug required (and linked) for this change?
  • Should this PR be backported?

This is needed in order to determine whether the agent image in use
supports particular features; specifically, whether it can take multiple
URLs in the case of a dual-stack hub.
@carbonin carbonin requested a review from eranco74 March 4, 2024 20:24
@openshift-ci-robot commented Mar 4, 2024

@carbonin: This pull request references MGMT-16814 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.16.0" version, but no target version was set.


@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Mar 4, 2024
@openshift-ci openshift-ci bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Mar 4, 2024
@openshift-ci bot commented Mar 4, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: carbonin


@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 4, 2024
@codecov bot commented Mar 4, 2024

Codecov Report

Attention: Patch coverage is 88.23529% with 4 lines in your changes missing coverage. Please review.

Project coverage is 68.39%. Comparing base (ee0189b) to head (98910e7).


@@            Coverage Diff             @@
##           master    #6047      +/-   ##
==========================================
- Coverage   68.39%   68.39%   -0.01%     
==========================================
  Files         239      239              
  Lines       35454    35472      +18     
==========================================
+ Hits        24248    24260      +12     
- Misses       9100     9104       +4     
- Partials     2106     2108       +2     
Files                                                    Coverage Δ
...rnal/controller/controllers/infraenv_controller.go    64.13% <ø> (ø)
internal/ignition/ironic.go                              100.00% <100.00%> (ø)
...ler/controllers/preprovisioningimage_controller.go    81.41% <87.50%> (+0.41%) ⬆️

... and 3 files with indirect coverage changes

@carbonin (Member, Author) commented Mar 4, 2024

I'm a bit concerned about maintaining this ...

If, for example, the ironic fix gets backported to both 4.15 and 4.14, we'd need to maintain two different versions for this mode to be available.

I wonder if it would be acceptable to just fix the existing case that should have handled the dual-stack hub + IPv6-only spoke (I'm only assuming there is a bug there, because there is no must-gather to confirm it).
My guess is that, since we're using the machine networks to determine whether the cluster is dual-stack, we're running into an issue because machine networks wouldn't be set before nodes are discovered with baremetal spokes, right?

@eranco74 @avishayt what do you think?
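A hypothetical Go sketch of the failure mode guessed at above: if dual-stack detection relies only on machine network CIDRs, an empty list (as it might be before baremetal spoke hosts are discovered) can never be classified as dual-stack. `isDualStack` and the example CIDRs are illustrative assumptions, not the assisted-service implementation.

```go
package main

import (
	"fmt"
	"net"
)

// isDualStack is a HYPOTHETICAL helper: it reports whether the given machine
// network CIDRs include both an IPv4 and an IPv6 network. With no CIDRs set
// it can only report false, which is the suspected gap.
func isDualStack(machineNetworkCIDRs []string) (bool, error) {
	var hasV4, hasV6 bool
	for _, cidr := range machineNetworkCIDRs {
		ip, _, err := net.ParseCIDR(cidr)
		if err != nil {
			return false, fmt.Errorf("invalid CIDR %q: %w", cidr, err)
		}
		if ip.To4() != nil {
			hasV4 = true
		} else {
			hasV6 = true
		}
	}
	return hasV4 && hasV6, nil
}

func main() {
	// Before host discovery: no machine networks, so no dual-stack signal.
	fmt.Println(isDualStack(nil))
	// After discovery on a dual-stack network (example CIDRs).
	fmt.Println(isDualStack([]string{"192.168.111.0/24", "fd2e:6f44:5dd8::/64"}))
}
```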

@carbonin (Member, Author) commented Mar 4, 2024

/hold

See previous comment for reasoning.

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 4, 2024
@openshift-ci bot commented Mar 4, 2024

@carbonin: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name                                        Commit   Required  Rerun command
ci/prow/edge-e2e-ai-operator-disconnected-capi   98910e7  false     /test edge-e2e-ai-operator-disconnected-capi
ci/prow/e2e-agent-compact-ipv4                   98910e7  false     /test e2e-agent-compact-ipv4


@carbonin (Member, Author) commented Mar 5, 2024

Closing this in favor of #6048.

@carbonin carbonin closed this Mar 5, 2024

2 participants