Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allocate Ironic port when node exists and has no port allocated. #817

Merged
merged 2 commits into from May 10, 2021

Conversation

s3rj1k
Copy link
Member

@s3rj1k s3rj1k commented Mar 16, 2021

This PR fixes issues below:

  • While node is created in Ironic DB and some misfortune happens to Ironic pod or DB pod, Ironic does not allocate port created node, this in turn breaks inspection with Ironic erroring on absence of allocated port, operator does not handle this case and stuck in erroring reconcile loop until back-off timer is triggered.

When node has no allocated port to it, Ironic returns this error

No lookup attributes were found, inspector won't be able to find it after introspection, consider creating ironic ports or providing an IPMI address

IPMI address part is just wrong, BMC creds are checked in Registering state and are validated, also they where validated manually just in case.

After checking sources that return this error in: https://github.com/openstack/ironic-inspector/blob/stable/victoria/ironic_inspector/introspect.py#L102-L119
We can clearly see that this error is indeed because of not allocated port to node (no MAC), this also is verified by this unix pipe magic for specified host.

kubectl get bmh worker-1 -o jsonpath='{.spec.bootMACAddress}' | xargs -rI{} --verbose openstack baremetal port show --address '{}' -f value --fields node_uuid | xargs -rI{} --verbose openstack baremetal node show {} -f value --fields name

@s3rj1k
Copy link
Member Author

s3rj1k commented Mar 16, 2021

/assign @stbenjam

@metal3-io-bot
Copy link
Contributor

Hi @s3rj1k. Thanks for your PR.

I'm waiting for a metal3-io member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@metal3-io-bot metal3-io-bot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Mar 16, 2021
@s3rj1k
Copy link
Member Author

s3rj1k commented Mar 16, 2021

cc: @dhellmann Can you please also look at this, thanks.

@dtantsur
Copy link
Member

I was under impression that we always create a port before inspection, otherwise e.g. Redfish inspection would not work.

@s3rj1k
Copy link
Member Author

s3rj1k commented Mar 17, 2021

@dtantsur We allocate port only when node was not created before, for existing node we expect that port was created.
This PR fixes this exact edge case when node was created but for some unknown reason port was not allocated for that node.

@dtantsur
Copy link
Member

/ok-to-test
/test-integration

@metal3-io-bot metal3-io-bot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Mar 17, 2021
@s3rj1k
Copy link
Member Author

s3rj1k commented Mar 18, 2021

/test-integration

@s3rj1k
Copy link
Member Author

s3rj1k commented Mar 18, 2021

@dtantsur All tests are green :)

@s3rj1k
Copy link
Member Author

s3rj1k commented Mar 19, 2021

@stbenjam @dhellmann @dtantsur Are we safe to merge?

@zaneb
Copy link
Member

zaneb commented Mar 24, 2021

@s3rj1k can you describe what the actual change is in the first patch? Git makes the diff really hard to interpret but after going through it as closely as I can using a visual diff tool, I was unable it find any change in logic. It looks like a straight refactor (although arguably an improvement in readability). The commit message only says that something was broken and is now fixed but I can't figure out what. What's the actual code path that is different?

@s3rj1k
Copy link
Member Author

s3rj1k commented Mar 24, 2021

@zaneb Yea, its a diff mess

So basically I am introducing var introspectionStatusNotFound bool flag and refactoring an if case to make it more readable

doing this

	status, err := introspection.GetIntrospectionStatus(p.inspector, ironicNode.UUID).Extract()
	if err != nil {
		if _, isNotFound := err.(gophercloud.ErrDefault404); isNotFound {
			introspectionStatusNotFound = true
			p.log.Info("inspection result not found")
		} else {
			result, err = transientError(errors.Wrap(err, "failed to extract hardware inspection status"))
			return
		}
	}

	if introspectionStatusNotFound { ... }

Old case code is mostly unchanged in { ... }, this fixes an issue with forced introspection, before, because of complicated flow logic it was hard to spot.

if nodes.ProvisionState(ironicNode.ProvisionState) == nodes.InspectFail && !force {

in before PR never gets executed

@s3rj1k
Copy link
Member Author

s3rj1k commented Mar 29, 2021

@zaneb Did you get a chance to look at this again?

@zaneb
Copy link
Member

zaneb commented Mar 30, 2021

I'm not following how that code fails to get executed:
If GetIntrospectionStatus() on line 653 returns a 404 Not Found then
err will be non-nil on line 654
isNotFound will be true on line 655
If the node's provision state is InspectFail, we'll hit the default case of the switch on line 661
The block starting at line 662 will execute or not depending on whether the force flag is set.

AFAICT the logic is the same after the patch, only the line numbers change:
If GetIntrospectionStatus() on line 655 returns a 404 Not Found then
isNotFound will be true on line 657
introspectionStatusNotFound will be true on line 666
If the node's provision state is InspectFail, we'll hit the default case of the switch on line 672
The block starting at line 673 will execute or not depending on whether the force flag is set.

There's no difference.

On closer inspection (no pun intended) there does appear to be a bug here though: I'm pretty sure I meant to put a return statement after err = nil on line 669. There'd be no point setting it to nil when we're going to fall through and overwrite it anyway, so that seems like a mistake on my part. Your patch doesn't change that though.

@s3rj1k
Copy link
Member Author

s3rj1k commented Mar 30, 2021

@zaneb I could split it out into two different PRs, I noticed some work around Introspection in #607 so after this PR seems like it would be needed to recheck first part of my PR.

So that do you think? Leaving only a second part of a patch sounds good?

@metal3-io-bot metal3-io-bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Mar 30, 2021
@s3rj1k
Copy link
Member Author

s3rj1k commented Mar 30, 2021

@zaneb, so I've removed first part of PR, currently it only deals with Ironic port creation, hoping this would help in getting this merged.

@s3rj1k
Copy link
Member Author

s3rj1k commented Mar 30, 2021

/test-integration

@s3rj1k
Copy link
Member Author

s3rj1k commented Mar 30, 2021

@dtantsur Can you please take a look at this PR, hopefully approve it :)

@zaneb
Copy link
Member

zaneb commented Mar 30, 2021

I'm pretty sure I meant to put a return statement after err = nil on line 669.

I opened #840 to fix this.

So that do you think? Leaving only a second part of a patch sounds good?

Sounds good to me, but this leaves only the part I don't understand well enough to review.

@s3rj1k
Copy link
Member Author

s3rj1k commented Mar 30, 2021

Sounds good to me, but this leaves only the part I don't understand well enough to review.

I've highlighted @dtantsur on rebased PR, I think he could help with Ironic internals

@s3rj1k s3rj1k force-pushed the IRONIC_PORT branch 2 times, most recently from 4537026 to fd3b303 Compare April 12, 2021 12:42
@metal3-io-bot metal3-io-bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Apr 12, 2021
@s3rj1k
Copy link
Member Author

s3rj1k commented Apr 12, 2021

/test-integration

@s3rj1k
Copy link
Member Author

s3rj1k commented Apr 12, 2021

@zaneb Can you please take another look, PR is rebased

@s3rj1k
Copy link
Member Author

s3rj1k commented Apr 15, 2021

@zaneb @stbenjam kindly remainder, PR waits review, thanks.

@s3rj1k
Copy link
Member Author

s3rj1k commented Apr 21, 2021

/cc @dtantsur
/cc @stbenjam

@s3rj1k
Copy link
Member Author

s3rj1k commented Apr 22, 2021

/retest

Copy link
Member

@zaneb zaneb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks correct, but expensive.

pkg/provisioner/ironic/ironic.go Outdated Show resolved Hide resolved
pkg/provisioner/ironic/ironic.go Show resolved Hide resolved
@s3rj1k
Copy link
Member Author

s3rj1k commented Apr 26, 2021

/test-integration

@s3rj1k s3rj1k requested a review from zaneb April 26, 2021 10:21
@s3rj1k
Copy link
Member Author

s3rj1k commented Apr 30, 2021

/recheck

Copy link
Member

@zaneb zaneb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remain worried about the extra overhead of this on every reconcile, but it doesn't seem avoidable.
/approve

pkg/provisioner/ironic/ironic.go Outdated Show resolved Hide resolved
pkg/provisioner/ironic/ironic.go Outdated Show resolved Hide resolved
pkg/provisioner/ironic/ironic.go Show resolved Hide resolved
pkg/provisioner/ironic/ironic.go Outdated Show resolved Hide resolved
pkg/provisioner/ironic/ironic.go Show resolved Hide resolved
Signed-off-by: s3rj1k <evasive.gyron@gmail.com>
@metal3-io-bot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: s3rj1k, zaneb

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@metal3-io-bot metal3-io-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 3, 2021
@s3rj1k
Copy link
Member Author

s3rj1k commented May 3, 2021

/test-integration

@s3rj1k s3rj1k requested a review from zaneb May 3, 2021 16:49
Signed-off-by: s3rj1k <evasive.gyron@gmail.com>
@s3rj1k
Copy link
Member Author

s3rj1k commented May 3, 2021

/test-integration

@dtantsur
Copy link
Member

/lgtm

@metal3-io-bot metal3-io-bot added the lgtm Indicates that a PR is ready to be merged. label May 10, 2021
@metal3-io-bot metal3-io-bot merged commit a3c360b into metal3-io:master May 10, 2021
@s3rj1k s3rj1k deleted the IRONIC_PORT branch May 10, 2021 13:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants