Baremetal: Add diagnostic error message for ironic terraform errors #3950

Merged
merged 1 commit into from Jul 31, 2020

@kirankt kirankt commented Jul 23, 2020

baremetal: Add diagnostic error message for ironic terraform errors

This PR enables the installer to detect certain Ironic Terraform provider error messages and translate them into easy-to-understand errors in the installer output.

Examples:
Terraform errors such as "Error: could not contact Ironic API: timeout reached" and "Error: could not inspect: could not inspect node, node is currently 'inspect failed', last error was 'timeout reached while inspecting the node" will each get translated to their corresponding user-friendly error messages.
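
As a rough illustration of the approach described above, here is a minimal Go sketch that matches Terraform output against known patterns and returns a friendlier error. The `condition` fields (`match`, `reason`, `message`) mirror the struct visible in the diff for `pkg/terraform/diagnose.go`; the `diagnose` helper, the reason strings, and the message wording are assumptions made for this example, not the text merged in this PR.

```go
package main

import (
	"fmt"
	"regexp"
)

// condition pairs a pattern found in Terraform output with a short reason
// and a user-facing message. The field names follow the diff in this PR;
// the values below are hypothetical.
type condition struct {
	match   *regexp.Regexp
	reason  string
	message string
}

// conditions lists the known failure patterns. Patterns are built from the
// two example errors quoted in the PR description.
var conditions = []condition{{
	match:   regexp.MustCompile(`Error: could not contact Ironic API: .*`),
	reason:  "BaremetalIronicAPITimeout", // hypothetical reason string
	message: "Unable to reach the provisioning service (Ironic). Check the bootstrap VM and provisioning network.",
}, {
	match:   regexp.MustCompile(`Error: could not inspect: could not inspect node, node is currently 'inspect failed'`),
	reason:  "BaremetalIronicInspectTimeout", // hypothetical reason string
	message: "Timed out waiting for node inspection. Check the node's BMC settings and PXE boot configuration.",
}}

// diagnose returns a friendly error for the first condition whose pattern
// matches the Terraform output, or nil when nothing matches.
func diagnose(output string) error {
	for _, c := range conditions {
		if c.match.MatchString(output) {
			return fmt.Errorf("%s: %s", c.reason, c.message)
		}
	}
	return nil
}

func main() {
	out := `level=error msg="Error: could not contact Ironic API: timeout reached"`
	if err := diagnose(out); err != nil {
		fmt.Println(err)
	}
}
```

Returning `nil` for unmatched output lets a caller fall back to printing the raw Terraform error unchanged.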

@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 23, 2020
@kirankt
Contributor Author

kirankt commented Jul 23, 2020

/label platform/baremetal
/assign @stbenjam

@openshift-ci-robot openshift-ci-robot added the platform/baremetal IPI bare metal hosts platform label Jul 23, 2020
pkg/terraform/diagnose.go (outdated)
Contributor

@abhinavdahiya abhinavdahiya left a comment

Also, maybe include an example of an actual failure (from CI or a local run) in the commit message.

@@ -78,4 +78,9 @@ var conditions = []condition{{

reason: "GCPComputeBackendTimeout",
message: `GCP is experiencing backend service interuptions, the compute instance failed to create in reasonable time.`,
}, {
match: regexp.MustCompile(`Error: could not contact API: .*`),
Contributor

This regex match is a little too vague; IMO it could also pick up non-Ironic failures.

Contributor Author

It currently matches two error messages that look like these:

level=error msg="Error: could not contact API: timeout reached"
level=error
level=error msg="  on ../../../../tmp/openshift-install-431515935/masters/main.tf line 1, in resource \"ironic_node_v1\" \"openshift-master-host\":"
level=error msg="   1: resource \"ironic_node_v1\" \"openshift-master-host\" {"
level=error
level=error
level=error
level=error msg="Error: could not contact API: timeout reached"
level=error
level=error msg="  on ../../../../tmp/openshift-install-431515935/masters/main.tf line 1, in resource \"ironic_node_v1\" \"openshift-master-host\":"
level=error msg="   1: resource \"ironic_node_v1\" \"openshift-master-host\" {"
level=error
level=error
level=error
level=error msg="Error: could not contact API: context deadline exceeded"
level=error
level=error msg="  on ../../../../tmp/openshift-install-431515935/masters/main.tf line 1, in resource \"ironic_node_v1\" \"openshift-master-host\":"
level=error msg="   1: resource \"ironic_node_v1\" \"openshift-master-host\" {"
level=error

Member

Yeah, this is the most we can match on right now. We can fix this in the Terraform provider: openshift-metal3/terraform-provider-ironic#45

Member

That's been merged and I released v0.2.3. You should be able to bump the provider version in go.mod.
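
To make the concern about the pattern concrete, here is a small Go sketch comparing the generic pattern from the diff with a narrower one. The narrower pattern assumes the provider's message names Ironic, as in the "could not contact Ironic API" example in the PR description; the variable names `vague` and `specific` are invented for this illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// vague is the pattern shown in the reviewed diff. Nothing in it ties
	// the failure to Ironic.
	vague = regexp.MustCompile(`Error: could not contact API: .*`)

	// specific assumes the provider's error text includes "Ironic".
	specific = regexp.MustCompile(`Error: could not contact Ironic API: .*`)
)

func main() {
	samples := []string{
		"Error: could not contact API: timeout reached",
		"Error: could not contact Ironic API: context deadline exceeded",
	}
	for _, s := range samples {
		fmt.Printf("%q vague=%t specific=%t\n",
			s, vague.MatchString(s), specific.MatchString(s))
	}
}
```

With the narrower pattern, a message that does not mention Ironic is left for other conditions (or the raw Terraform error) to handle.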

Contributor Author

Thanks. Won't we still have two different errors for the Ironic API errors? One from context and the other one we print explicitly?

Contributor Author

Added examples in the commit and PR messages.

@kirankt kirankt changed the title [WIP} Baremetal: Add diagnostic error message for ironic terraform errors [WIP] Baremetal: Add diagnostic error message for ironic terraform errors Jul 23, 2020
@kirankt kirankt changed the title [WIP] Baremetal: Add diagnostic error message for ironic terraform errors Baremetal: Add diagnostic error message for ironic terraform errors Jul 28, 2020
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 28, 2020
@stbenjam
Member

/test e2e-metal-ipi

@abhinavdahiya
Contributor

/approve

@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abhinavdahiya

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 29, 2020
@kirankt
Contributor Author

kirankt commented Jul 30, 2020

/retest

@kirankt
Contributor Author

kirankt commented Jul 30, 2020

/retest

@kirankt
Contributor Author

kirankt commented Jul 30, 2020

/test e2e-metal-ipi

@kirankt
Contributor Author

kirankt commented Jul 30, 2020

/retest

@kirankt
Contributor Author

kirankt commented Jul 30, 2020

/test e2e-metal-ipi

@kirankt
Contributor Author

kirankt commented Jul 30, 2020

/test e2e-metal-ipi

@stbenjam
Member

/lgtm
/hold

This looks good to me. Remove the hold once e2e-metal-ipi passes.

@openshift-ci-robot openshift-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm Indicates that a PR is ready to be merged. labels Jul 30, 2020
@stbenjam
Member

/test e2e-metal-ipi

@stbenjam
Member

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 31, 2020
@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-ci-robot
Contributor

@kirankt: The following tests failed, say /retest to rerun all failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| ci/prow/e2e-openstack | 17be09f | link | /test e2e-openstack |
| ci/prow/e2e-aws-scaleup-rhel7 | 17be09f | link | /test e2e-aws-scaleup-rhel7 |
| ci/prow/e2e-crc | 17be09f | link | /test e2e-crc |
| ci/prow/e2e-aws-fips | 17be09f | link | /test e2e-aws-fips |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-merge-robot openshift-merge-robot merged commit dafddad into openshift:master Jul 31, 2020
Labels
approved: Indicates a PR has been approved by an approver from all required OWNERS files.
lgtm: Indicates that a PR is ready to be merged.
platform/baremetal: IPI bare metal hosts platform

6 participants