Check provisioner ready status when applying reconcile logic #574

Merged
merged 1 commit into from
Jul 30, 2020

Conversation

andfasano
Member

This PR aims to improve the reconcile loop by checking the status of the provisioner's dependencies before trying to reconcile the state: if the provisioner is not ready, the current request is requeued.
In particular, it checks the availability of the Ironic and Ironic Inspector services (thanks to @stbenjam for the specific check logic, extracted from https://github.com/openshift-metal3/terraform-provider-ironic/blob/master/ironic/provider.go).
This approach should help especially during bootstrap in deployment scenarios where the Ironic services are launched at the same time as the operator, where the reconcile loop could start operating before the Ironic services are actually up and running.
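
As a rough illustration of the flow described above (not the code in this PR), the readiness gate can be sketched as follows. The `Result` struct is a simplified stand-in for `reconcile.Result`, and `ReadyChecker`, `reconcileHost`, `stubProvisioner`, and the 15-second delay are hypothetical names and values chosen for the example. Errors are wrapped with the standard library's `fmt.Errorf` rather than `errors.Wrap`.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Result is a simplified stand-in for controller-runtime's reconcile.Result.
type Result struct {
	Requeue      bool
	RequeueAfter time.Duration
}

// ReadyChecker is a hypothetical, minimal view of the provisioner interface.
type ReadyChecker interface {
	IsReady() (bool, error)
}

// notReadyRetryDelay is an assumed value, not taken from the PR.
const notReadyRetryDelay = 15 * time.Second

// reconcileHost runs work only when the provisioner reports ready;
// otherwise it asks for the request to be requeued after a delay.
func reconcileHost(p ReadyChecker, work func() error) (Result, error) {
	ready, err := p.IsReady()
	if err != nil {
		return Result{}, fmt.Errorf("failed to check services availability: %w", err)
	}
	if !ready {
		return Result{Requeue: true, RequeueAfter: notReadyRetryDelay}, nil
	}
	return Result{}, work()
}

// stubProvisioner returns fixed values so the flow can be exercised.
type stubProvisioner struct {
	ready bool
	err   error
}

func (s stubProvisioner) IsReady() (bool, error) { return s.ready, s.err }

func main() {
	noop := func() error { return nil }

	res, err := reconcileHost(stubProvisioner{ready: false}, noop)
	fmt.Println("not ready:", res.Requeue, res.RequeueAfter, err)

	res, err = reconcileHost(stubProvisioner{ready: true}, noop)
	fmt.Println("ready:", res.Requeue, err)

	_, err = reconcileHost(stubProvisioner{err: errors.New("connection refused")}, noop)
	fmt.Println("check failed:", err)
}
```

The point of the sketch is only the ordering: the readiness check happens before any state is reconciled, and a not-ready provisioner results in a requeue rather than an error.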

@metal3-io-bot metal3-io-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jul 6, 2020
@metal3-io-bot
Contributor

Hi @andfasano. Thanks for your PR.

I'm waiting for a metal3-io member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@metal3-io-bot metal3-io-bot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jul 6, 2020
@dhellmann
Member

/ok-to-test

@metal3-io-bot metal3-io-bot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jul 6, 2020
@dhellmann
Member

/test-integration

pkg/provisioner/demo/demo.go Outdated
pkg/controller/baremetalhost/baremetalhost_controller.go Outdated
pkg/controller/baremetalhost/baremetalhost_controller.go Outdated
pkg/provisioner/ironic/ironic.go Outdated
pkg/provisioner/ironic/ironic.go Outdated
@metal3-io-bot metal3-io-bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jul 7, 2020
@andfasano
Member Author

/test-integration

Member

@dhellmann dhellmann left a comment

This still feels like a lot of extra code to be able to make 2 HTTP calls. I know you mentioned doing it in parallel for performance, but I'm not sure the baremetal-operator is ever going to be the bottleneck in the system. Ironic can take ages to actually provision a host, for example. Maybe we can talk through the approach on a call Wednesday?

pkg/controller/baremetalhost/baremetalhost_controller.go Outdated
pkg/provisioner/demo/demo.go Outdated
pkg/provisioner/ironic/ironicdependencies.go Outdated
pkg/provisioner/provisionerdependencies.go Outdated
}

m.server = httptest.NewUnstartedServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
m.requests += r.Host + r.RequestURI + ";"
Member

This doesn't feel like very idiomatic Go to me - can we use slices instead?

Member Author

It was actually done by design, to improve test readability and handling: with this testing technique, events are accumulated in a string (using a separator), which makes it easier to set up the expected string and to read it in case of failure, and the sequence of events is represented naturally. It's an approach that works well especially when there are only a few events.
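
For comparison, here is a minimal sketch of the slice-based alternative raised in the review. It is not the mock server from this PR: `recordingServer`, `newRecordingServer`, `Requests`, and `get` are hypothetical names, and the mutex is an assumption made because `httptest` handlers may run concurrently.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
)

// recordingServer is a hypothetical test helper that keeps one slice
// entry per received request instead of concatenating into a string.
type recordingServer struct {
	mu       sync.Mutex
	requests []string
	server   *httptest.Server
}

func newRecordingServer() *recordingServer {
	m := &recordingServer{}
	m.server = httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		m.mu.Lock()
		m.requests = append(m.requests, r.Method+" "+r.RequestURI)
		m.mu.Unlock()
		w.WriteHeader(http.StatusOK)
	}))
	return m
}

// Requests returns a copy of the recorded requests, in arrival order.
func (m *recordingServer) Requests() []string {
	m.mu.Lock()
	defer m.mu.Unlock()
	return append([]string(nil), m.requests...)
}

// get issues a GET against the mock server and discards the body.
func (m *recordingServer) get(path string) error {
	resp, err := http.Get(m.server.URL + path)
	if err != nil {
		return err
	}
	return resp.Body.Close()
}

func main() {
	m := newRecordingServer()
	defer m.server.Close()

	for _, path := range []string{"/v1", "/v1/drivers"} {
		if err := m.get(path); err != nil {
			fmt.Println("request failed:", err)
			return
		}
	}
	fmt.Println(m.Requests())
}
```

A test can compare the returned slice against an expected slice, which keeps the ordering information that the string approach provides; a failure message still shows the whole sequence when the slice is printed.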

{
name: "IsReady",
ironic: newMockServer(6385).addDrivers(),
inspector: newMockServer(5050),
Member

Are the ports hard-coded so much that we have to do this? I'd feel better if we let Go choose unused system ports, which httptest can do.

If I'm running tests on my local dev box, I may very well have my own Ironic already running on these ports.
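
A minimal sketch of what letting `httptest` pick the ports could look like, assuming the endpoints can be handed to the code under test (the thread discusses why that is not straightforward in this package). `startFakeService` and `portOf` are hypothetical helpers; `httptest.NewServer` itself listens on a free loopback port chosen by the system.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/url"
)

// startFakeService starts a fake HTTP service that always answers with
// the given status. The listening port is chosen by the system.
func startFakeService(status int) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(status)
	}))
}

// portOf extracts the port from a server URL such as http://127.0.0.1:41234.
func portOf(rawURL string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	return u.Port(), nil
}

func main() {
	ironic := startFakeService(http.StatusOK)
	defer ironic.Close()
	inspector := startFakeService(http.StatusOK)
	defer inspector.Close()

	// The URLs would be passed to the code under test in place of
	// fixed endpoints on ports 6385 and 5050.
	fmt.Println("fake ironic endpoint:   ", ironic.URL)
	fmt.Println("fake inspector endpoint:", inspector.URL)
}
```

Because each server gets its own free port, the fakes do not collide with an Ironic instance already running on the developer's machine.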

Member Author

I think there's an impediment here:

func init() {
	// NOTE(dhellmann): Use Fprintf() to report errors instead of
	// logging, because logging is not configured yet in init().
	deployKernelURL = os.Getenv("DEPLOY_KERNEL_URL")
	if deployKernelURL == "" {
		fmt.Fprintf(os.Stderr, "Cannot start: No DEPLOY_KERNEL_URL variable set\n")
		os.Exit(1)
	}
	deployRamdiskURL = os.Getenv("DEPLOY_RAMDISK_URL")
	if deployRamdiskURL == "" {
		fmt.Fprintf(os.Stderr, "Cannot start: No DEPLOY_RAMDISK_URL variable set\n")
		os.Exit(1)
	}
	ironicEndpoint = os.Getenv("IRONIC_ENDPOINT")
	if ironicEndpoint == "" {
		fmt.Fprintf(os.Stderr, "Cannot start: No IRONIC_ENDPOINT variable set\n")
		os.Exit(1)
	}
	inspectorEndpoint = os.Getenv("IRONIC_INSPECTOR_ENDPOINT")
	if inspectorEndpoint == "" {
		fmt.Fprintf(os.Stderr, "Cannot start: No IRONIC_INSPECTOR_ENDPOINT variable set")
		os.Exit(1)
	}
}
The init() function runs before the test, so the (unexported) package variables that hold the env vars are already set by the time the newProvisioner call is made. I think that is why those env vars are set in the Makefile and also in hack/unit.sh. It could help to refactor the configuration initialization, maybe with a lazy setup.
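
One possible shape for such a lazy setup, sketched under assumptions: the `ironicConfig` struct and `loadConfig` function are hypothetical and do not exist in the repository, and only the four variable names are taken from the `init()` function quoted above. The lookup function is passed in, so production code could pass `os.Getenv` while a test passes its own values.

```go
package main

import "fmt"

// ironicConfig is a hypothetical container for the settings that the
// quoted init() stores in package-level variables.
type ironicConfig struct {
	deployKernelURL   string
	deployRamdiskURL  string
	ironicEndpoint    string
	inspectorEndpoint string
}

// loadConfig reads the required settings through getenv and returns an
// error instead of exiting, so callers decide when and how to load.
func loadConfig(getenv func(string) string) (ironicConfig, error) {
	var cfg ironicConfig
	required := []struct {
		name string
		dst  *string
	}{
		{"DEPLOY_KERNEL_URL", &cfg.deployKernelURL},
		{"DEPLOY_RAMDISK_URL", &cfg.deployRamdiskURL},
		{"IRONIC_ENDPOINT", &cfg.ironicEndpoint},
		{"IRONIC_INSPECTOR_ENDPOINT", &cfg.inspectorEndpoint},
	}
	for _, v := range required {
		*v.dst = getenv(v.name)
		if *v.dst == "" {
			return ironicConfig{}, fmt.Errorf("no %s variable set", v.name)
		}
	}
	return cfg, nil
}

func main() {
	// A map stands in for the process environment here; the values are
	// placeholders. Production code could pass os.Getenv instead.
	env := map[string]string{
		"DEPLOY_KERNEL_URL":         "http://example.test/kernel",
		"DEPLOY_RAMDISK_URL":        "http://example.test/ramdisk",
		"IRONIC_ENDPOINT":           "http://127.0.0.1:6385/v1/",
		"IRONIC_INSPECTOR_ENDPOINT": "http://127.0.0.1:5050/v1/",
	}
	lookup := func(key string) string { return env[key] }

	cfg, err := loadConfig(lookup)
	fmt.Println(cfg.ironicEndpoint, err)

	delete(env, "IRONIC_INSPECTOR_ENDPOINT")
	_, err = loadConfig(lookup)
	fmt.Println(err)
}
```

With this shape a test could build the configuration after starting its mock servers, which is what the fixed `init()` ordering currently prevents.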

Member

I'm not really convinced that a unit test should be making network calls at all.

pkg/provisioner/provisionerdependencies.go Outdated
@metal3-io-bot metal3-io-bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jul 8, 2020
@andfasano andfasano requested a review from dhellmann July 8, 2020 16:28
@metal3-io-bot metal3-io-bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jul 10, 2020
@andfasano
Member Author

/test-integration

pkg/provisioner/ironic/dependencies.go Outdated
pkg/provisioner/ironic/dependencies.go Outdated
pkg/provisioner/ironic/ironic_test.go
pkg/provisioner/ironic/dependencies.go Outdated
pkg/provisioner/ironic/dependencies.go Outdated
@andfasano
Member Author

/test unit

provStatus, err := prov.IsReady()
if err != nil {
return reconcile.Result{}, errors.Wrap(err,
fmt.Sprintf("failed to check services availability"))
Member

This could just be an inline string, since there are no additional variables in the Sprintf(). That can be fixed in a follow-up PR unless there are other changes needed for this one.

Member Author

Done

@metal3-io-bot metal3-io-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 14, 2020
Member

@zaneb zaneb left a comment

/approve

@@ -17,7 +17,7 @@ var provisionRequeueDelay = time.Second * 10

// Provisioner implements the provisioning.Provisioner interface
// and uses Ironic to manage the host.
-type fixtureProvisioner struct {
+type Provisioner struct {
Member

What's the point of renaming this?

Member Author

To allow setting the ready flag from the test

Member

The type is private and only accessible via an interface that doesn't have the SetIsReady() function, so renaming it makes it public but still requires a type coercion in the test. What if we move it up into pkg/provisioner instead? That would let us keep it private, and the tests could instantiate one directly to avoid the type coercion.

Member Author

If it is not a problem to move it, then I think that sounds fine.

Member Author

Renamed the struct, see comment below


// IsReady checks if the provisioning backend is available to accept
// all the incoming requests.
IsReady() (result Result, err error)
Member

I'm sceptical about returning a provisioner.Result here. It's even less obvious than usual what the 'Dirty' flag means. A function named something like IsReady should return a bool.
I realise this is done so that we can return a RequeueAfter value, but I'm not sure that we really need that to be customisable by the provisioner.

Member

I suggested the provisioner.Result in part to be consistent with the other methods, but I see your point.

Member Author

I reverted the IsReady() signature.

@metal3-io-bot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andfasano, dhellmann, zaneb

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@@ -28,22 +28,25 @@ type fixtureProvisioner struct {
publisher provisioner.EventPublisher
// state to manage the two-step adopt process
adopted bool
// status of the provisioner
ready bool
}

// New returns a new Ironic Provisioner
func New(host *metal3v1alpha1.BareMetalHost, bmcCreds bmc.Credentials, publisher provisioner.EventPublisher) (provisioner.Provisioner, error) {
Member

This function is public and you can add whatever arguments you like, so maybe rather than making the type public you could allow the caller to pass something here (e.g. number of times to return false before returning true from IsReady(), or a user-supplied function to call from IsReady() to get the result).

Member Author

Unfortunately, the New function must match the provisioner.Factory type, so I don't think it's advisable to modify its signature. Alternatively, I could add another factory method (e.g. NewExt, or some better name), to be used only from the test, with the required additional params, and keep the type private.

Member Author

I've added a new factory to allow injecting test parameters. With this approach, the fixture remains private and there is no need to move the file from its original location.
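
The idea of a second, test-only factory can be sketched roughly as below. This is not the code that was merged: the `Provisioner` interface is trimmed to a single method, `New` takes no arguments here (the real factory takes a host, credentials, and a publisher), and `NewMock` and `becomeReadyCounter` are hypothetical names. The counter follows the suggestion made earlier in the thread of returning false a given number of times before returning true.

```go
package main

import "fmt"

// Provisioner is a trimmed-down stand-in for provisioner.Provisioner.
type Provisioner interface {
	IsReady() (bool, error)
}

// fixtureProvisioner stays unexported; tests reach it only through
// the factories below.
type fixtureProvisioner struct {
	// becomeReadyCounter is how many IsReady calls report false
	// before the fixture starts reporting true.
	becomeReadyCounter int
}

// New keeps the regular factory shape (simplified here to no
// arguments) and returns a fixture that is ready immediately.
func New() Provisioner {
	return NewMock(0)
}

// NewMock is a hypothetical test-only factory that injects the number
// of not-ready answers to give first.
func NewMock(notReadyCalls int) Provisioner {
	return &fixtureProvisioner{becomeReadyCounter: notReadyCalls}
}

// IsReady reports false until the injected counter is used up.
func (p *fixtureProvisioner) IsReady() (bool, error) {
	if p.becomeReadyCounter > 0 {
		p.becomeReadyCounter--
		return false, nil
	}
	return true, nil
}

func main() {
	p := NewMock(2)
	for i := 0; i < 3; i++ {
		ready, err := p.IsReady()
		fmt.Println(i, ready, err)
	}
}
```

Since the test works with the interface value returned by the extra factory, no type assertion on the unexported struct is needed.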

@metal3-io-bot metal3-io-bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jul 29, 2020
@dhellmann
Member

/test-integration

This version looks OK to me. I'll leave it open for @zaneb to give the final approval since he also had some comments on earlier drafts.

@andfasano
Member Author

/test-integration

@zaneb
Member

zaneb commented Jul 30, 2020

I have some reservations about @stbenjam's comments, but we can always resolve that later; I think it's better not to hold up this work.
/lgtm

@metal3-io-bot metal3-io-bot added the lgtm Indicates that a PR is ready to be merged. label Jul 30, 2020
@metal3-io-bot metal3-io-bot merged commit d6eaf67 into metal3-io:master Jul 30, 2020
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

6 participants