Baremetal: Add diagnostic error message for ironic terraform errors #3950
/label platform/baremetal
Also, maybe include an example of an actual failure (from CI or a local run) in the commit message.
pkg/terraform/diagnose.go
Outdated
@@ -78,4 +78,9 @@ var conditions = []condition{{
reason: "GCPComputeBackendTimeout",
message: `GCP is experiencing backend service interuptions, the compute instance failed to create in reasonable time.`,
}, {
match: regexp.MustCompile(`Error: could not contact API: .*`),
This regex is a little too vague; it could also pick up non-Ironic failures, imo.
It currently matches two error messages that look like these:
level=error msg="Error: could not contact API: timeout reached"
level=error
level=error msg=" on ../../../../tmp/openshift-install-431515935/masters/main.tf line 1, in resource \"ironic_node_v1\" \"openshift-master-host\":"
level=error msg=" 1: resource \"ironic_node_v1\" \"openshift-master-host\" {"
level=error
level=error
level=error
level=error msg="Error: could not contact API: timeout reached"
level=error
level=error msg=" on ../../../../tmp/openshift-install-431515935/masters/main.tf line 1, in resource \"ironic_node_v1\" \"openshift-master-host\":"
level=error msg=" 1: resource \"ironic_node_v1\" \"openshift-master-host\" {"
level=error
level=error
level=error
level=error msg="Error: could not contact API: context deadline exceeded"
level=error
level=error msg=" on ../../../../tmp/openshift-install-431515935/masters/main.tf line 1, in resource \"ironic_node_v1\" \"openshift-master-host\":"
level=error msg=" 1: resource \"ironic_node_v1\" \"openshift-master-host\" {"
level=error
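For illustration only (this is not code from the PR): a minimal Go sketch that runs the pattern from the diff against the two messages quoted above. The third input line is made up here to show that nothing in the pattern is specific to Ironic, which is the concern raised in the review.

```go
package main

import (
	"fmt"
	"regexp"
)

// broad is the pattern taken from the diff under review.
var broad = regexp.MustCompile(`Error: could not contact API: .*`)

func main() {
	lines := []string{
		// The two messages quoted in this thread.
		`level=error msg="Error: could not contact API: timeout reached"`,
		`level=error msg="Error: could not contact API: context deadline exceeded"`,
		// Hypothetical line, invented for illustration: same prefix,
		// but nothing that ties it to the Ironic provider.
		`level=error msg="Error: could not contact API: connection refused"`,
	}
	for _, l := range lines {
		// Print whether the broad pattern matches each line.
		fmt.Printf("%v  %s\n", broad.MatchString(l), l)
	}
}
```

All three lines match, so a diagnostic keyed on this pattern would fire for any error that happens to share the prefix.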
Yea, this is the maximum match. We can fix this in the terraform provider: openshift-metal3/terraform-provider-ironic#45
That's been merged and I released v0.2.3. You should be able to bump the go.mod
Thanks. Won't we still have two different errors for the Ironic API errors? One from context and the other one we print explicitly?
Added examples in the commit and PR messages.
/test e2e-metal-ipi
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: abhinavdahiya
The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/retest
This PR enables the installer to detect certain Ironic Terraform provider error messages and translate them into easy-to-understand errors in the installer output. Examples: Terraform errors such as "Error: could not contact Ironic API: timeout reached" and "Error: could not inspect: could not inspect node, node is currently 'inspect failed', last error was 'timeout reached while inspecting the node'" are each translated to a corresponding user-friendly error message.
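As a rough sketch of the translation described here (not the code merged in this PR): the `condition` shape with `match`, `reason`, and `message` fields follows the diff hunk from `pkg/terraform/diagnose.go`, while the `diagnose` helper, the reason names, and the message wording below are hypothetical and chosen only for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// condition mirrors the fields visible in the diff: a pattern, a short
// reason, and a user-facing message.
type condition struct {
	match   *regexp.Regexp
	reason  string
	message string
}

// Hypothetical entries: the patterns are derived from the error texts quoted
// in the PR description; the reasons and messages are placeholders.
var conditions = []condition{{
	match:   regexp.MustCompile(`Error: could not contact Ironic API: .*`),
	reason:  "BaremetalIronicAPITimeout",
	message: `Unable to contact the Ironic API in time.`,
}, {
	match:   regexp.MustCompile(`Error: could not inspect: .*`),
	reason:  "BaremetalIronicInspectTimeout",
	message: `Timed out waiting for node inspection to complete.`,
}}

// diagnose (hypothetical helper) returns the reason and message of the
// first condition whose pattern matches the Terraform output.
func diagnose(output string) (reason, message string, ok bool) {
	for _, c := range conditions {
		if c.match.MatchString(output) {
			return c.reason, c.message, true
		}
	}
	return "", "", false
}

func main() {
	out := `level=error msg="Error: could not contact Ironic API: timeout reached"`
	if reason, message, ok := diagnose(out); ok {
		fmt.Printf("%s: %s\n", reason, message)
	}
}
```

Including "Ironic" in the first pattern keeps it from matching unrelated "could not contact API" errors, which is why the message text was changed in the provider first.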
/test e2e-metal-ipi
/retest
/test e2e-metal-ipi
/lgtm This looks good to me. Remove the hold once e2e-metal-ipi passes.
/test e2e-metal-ipi
/hold cancel
/retest Please review the full test history for this PR and help us cut down flakes.
@kirankt: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.