MGMT-16721: handle hosts with no luks #5940
@paul-maidment: This pull request references MGMT-16721 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.16.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
@rccrdpccl please review
Force-pushed from 4328699 to 206bffe.
/retest
Force-pushed from 206bffe to 9d31b35.
What does it mean for Luks to be present in an ignition file but for Clevis to be nil? Is that a valid situation? What is the expected behavior from the service in that situation? To ignore it? To report an error? Please elaborate.
Could you please also update the issue and commit descriptions according to the answer to the question above? :)
/retest
Codecov Report
Attention:
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5940      +/-   ##
==========================================
+ Coverage   68.26%   68.32%   +0.05%
==========================================
  Files         236      236
  Lines       34913    34922       +9
==========================================
+ Hits        23833    23859      +26
+ Misses       9006     8995      -11
+ Partials     2074     2068       -6
I just observed that both luks and clevis are pointers and can have a value of nil.
I'm in favor of checking, but we perform a check so we can branch our behavior: in case of A, do X; in case of B, do Y. In this PR it's not clear to me how you decided on X and Y, as I don't understand the circumstances that lead to A and B, or the reasoning behind the correct behavior for the service in each of them. Or in other words, why
If you're not sure, even something like
I'll go with this suggestion and will also add test cases for luks == nil and luks.clevis == nil, just to ensure we have the best coverage. It seems to me that just because a cluster is imported does not necessarily imply that there needs to be disk encryption; maybe we are simply covering the case where an imported cluster does not have disk encryption? Certainly the case of missing Missing So for now, I will go with the comment you suggest. I'll also add a warning to the log about this scenario.
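To make the two nil cases concrete, here is a minimal Go sketch. It assumes simplified, hypothetical `Luks`, `Clevis` and `Tang` structs (not the real Ignition config types, which carry more fields) and a hypothetical `tangURLs` helper. It guards both pointers before dereferencing them and runs a small table covering `luks == nil`, `luks.Clevis == nil` and a populated config.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the Ignition LUKS types discussed in
// this PR; the real config structs carry many more fields.
type Tang struct {
	URL string
}

type Clevis struct {
	Tang []Tang
}

type Luks struct {
	Clevis *Clevis // may be nil even when Luks itself is set
}

// tangURLs collects tang server URLs, checking both pointers first so that
// neither a missing Luks nor a missing Clevis causes a nil pointer dereference.
func tangURLs(luks *Luks) []string {
	if luks == nil || luks.Clevis == nil {
		return nil
	}
	urls := make([]string, 0, len(luks.Clevis.Tang))
	for _, t := range luks.Clevis.Tang {
		urls = append(urls, t.URL)
	}
	return urls
}

func main() {
	cases := []struct {
		name string
		luks *Luks
		want int // expected number of tang URLs
	}{
		{"luks == nil", nil, 0},
		{"luks.Clevis == nil", &Luks{}, 0},
		{"one tang server", &Luks{Clevis: &Clevis{Tang: []Tang{{URL: "http://tang.example:7500"}}}}, 1},
	}
	for _, c := range cases {
		got := len(tangURLs(c.luks))
		fmt.Printf("%s: got %d, want %d\n", c.name, got, c.want)
	}
}
```

In a real test suite the same table would more naturally live in a `_test.go` file driven by the `testing` package.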
Force-pushed from 9d31b35 to 5921d6f.
Force-pushed from 40efb88 to f8a2ff2.
@@ -38,6 +38,11 @@ func (c *tangConnectivityCheckCmd) getTangServersFromHostIgnition(host *models.H
	if err != nil {
		return nil, err
	}
	if luks == nil || luks.Clevis == nil {
		// TODO: not sure how this could happen or whether it's possibly valid, for now pretend encryption is disabled just so we avoid a nil pointer dereference
		c.log.Warn("luks configuration is missing or incomplete in the host ignition, disk encryption will be assumed to be disabled.")
luks being nil is pretty common; it doesn't warrant a warning. We'd get a warning every time a cluster doesn't have encryption (at least, that's what I think would happen; I'm not sure, I need to look at ignitions from actual clusters).
OK, removed the log entry.
My comment was only about luks being nil, which doesn't warrant a warning.
But luks being non-nil and Clevis being nil does warrant a warning and a code comment. That's the case I've been discussing in all my comments so far: the one I don't understand and am not sure how we should react to.
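The distinction drawn here can be sketched as a three-way classification. This is an illustration under assumptions, not the PR's code: the structs are simplified and hypothetical, and the `warn` callback stands in for the service logger (`c.log` in the diff above). Only the ambiguous case, luks set but Clevis nil, emits a warning; a nil luks stays silent.

```go
package main

import (
	"fmt"
	"log"
)

// Hypothetical, simplified stand-ins for the Ignition LUKS types.
type Tang struct{ URL string }
type Clevis struct{ Tang []Tang }
type Luks struct{ Clevis *Clevis }

// encryptionState names the three shapes the host ignition can take.
type encryptionState int

const (
	noEncryption    encryptionState = iota // luks == nil: common, nothing to report
	nonClevisScheme                        // luks set, Clevis nil: encrypted, but not via clevis
	clevisTang                             // luks and Clevis set: tang servers can be checked
)

func (s encryptionState) String() string {
	switch s {
	case noEncryption:
		return "no encryption"
	case nonClevisScheme:
		return "encrypted without clevis"
	default:
		return "clevis/tang"
	}
}

// classify inspects the LUKS config and calls warn only for the ambiguous case.
func classify(luks *Luks, warn func(string)) encryptionState {
	switch {
	case luks == nil:
		return noEncryption // no warning: clusters without encryption are normal
	case luks.Clevis == nil:
		warn("luks is configured without clevis; skipping tang connectivity check")
		return nonClevisScheme
	default:
		return clevisTang
	}
}

func main() {
	warn := func(msg string) { log.Println("WARN:", msg) }
	for _, l := range []*Luks{nil, {}, {Clevis: &Clevis{Tang: []Tang{{URL: "http://tang.example:7500"}}}}} {
		fmt.Println(classify(l, warn))
	}
}
```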
OK, updated to deal with that scenario.
Force-pushed from ea7a108 to 6291ed4.
Force-pushed from 0a6659d to c4b2913.
Force-pushed from 3e2431f to 9880f3e.
/unhold
Not really; we're discussing validations for adding hosts to an existing cluster here. This entire ordeal of looking at the ignition to get tang details exists just so we can validate that the tang config specified in the target cluster actually works on this new day-2 host. We know how to recognize and validate tang, so we look into the clevis struct to get the tang servers (when those are present) and send those details to our agent. The agent attempts to connect to the tang servers and reports what it experienced, so that our validation can ensure tang works. If you see the cluster has encryption (i.e. has the luks struct) but it's not using the
The reasoning for skipping instead of giving an error is that the validation was best-effort in the first place; we're doing it as a "favor" to the user. We can validate tang, so we do. If the user uses some other scheme we don't recognize, we should still let them add the host to the cluster. If they run into issues, that's unfortunate, but it's none of our business, because it just might work. We can't assume it won't, because we don't know what it is.
Also, regardless of whether you simply
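The best-effort reasoning above can be expressed as a small decision function. This is a hypothetical sketch: `tangValidation`, its status values and the `reachable` map (standing in for what the agent reports back) are illustrative names, not assisted-service APIs. Having no tang servers to check, whether because there is no encryption or because the scheme isn't one we recognize, yields a non-blocking skip; only configured tang servers that the agent could not reach fail the check.

```go
package main

import "fmt"

// validationStatus is a hypothetical outcome type for this sketch.
type validationStatus string

const (
	statusPassed  validationStatus = "passed"
	statusFailed  validationStatus = "failed"
	statusSkipped validationStatus = "skipped" // non-blocking: the host may still join
)

// tangValidation decides a best-effort tang check. tangURLs are the servers
// found in the target cluster's ignition; reachable records, per URL, whether
// the agent on the new host managed to connect.
func tangValidation(tangURLs []string, reachable map[string]bool) validationStatus {
	if len(tangURLs) == 0 {
		// No encryption, or a scheme we don't know how to validate: skip.
		return statusSkipped
	}
	for _, u := range tangURLs {
		if !reachable[u] {
			return statusFailed
		}
	}
	return statusPassed
}

func main() {
	reachable := map[string]bool{"http://tang-a.example:7500": true}
	fmt.Println(tangValidation(nil, reachable))                                    // skipped
	fmt.Println(tangValidation([]string{"http://tang-a.example:7500"}, reachable)) // passed
	fmt.Println(tangValidation([]string{
		"http://tang-a.example:7500",
		"http://tang-b.example:7500",
	}, reachable)) // failed
}
```

Returning a skip rather than a failure mirrors the argument that an unrecognized scheme "just might work", so the service should not block the host on it.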
/hold
Holding while waiting for clarification on this comment: is this changing current behaviour, other than fixing the nil pointer case when luks is not present?
Regarding the validation of disk encryption requirements: it seems that we first fetch the host ignition and validate that luks is not nil and contains at least one entry. https://github.com/openshift/assisted-service/blob/master/internal/host/validator.go#L422-L425
the comment
With this aside, we can see how a Day-2 host without Clevis is treated: as though there is no disk encryption.
By "day 1 cluster", this comment actually means "target cluster".
Should we change the above comment to make this a lot clearer?
Yes, it's the same. The comment, however, makes the false statement that because clevis is nil, there's no encryption. That's wrong because it's too general: having no tang servers to validate doesn't mean there's no encryption. Its behavior is still correct, though, since its goal is to validate tang, and clevis being nil does mean there are no tang servers to validate.
Sure
Force-pushed from 9880f3e to 0a87210.
In short: zero behavioral change. Our existing validations expect these fields to be set; otherwise, we assume that disk encryption is disabled.
Force-pushed from 0a87210 to 6b6be5e.
In line 41 of internal/host/hostcommands/tang_connectivity_check_cmd.go, both `luks` and `luks.Clevis` are pointers that can be nil, and there was no check in place to avoid a nil pointer dereference. This PR fixes that by checking that neither of these pointers is nil before dereferencing them.
Force-pushed from 6b6be5e to 9a82bd4.
Thanks for clarifying @paul-maidment
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: paul-maidment, rccrdpccl The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/retest
@paul-maidment: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
[ART PR BUILD NOTIFIER] This PR has been included in build ose-agent-installer-api-server-container-v4.16.0-202402072112.p0.gbc922ba.assembly.stream.el8 for distgit ose-agent-installer-api-server.