Add node controller to HCCO Manager #1702
Conversation
/hold
cc @dagrayvid
[APPROVALNOTIFIER] This PR is APPROVED.
This pull-request has been approved by: enxebre. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Force-pushed from 9a0ef0e to 37399a9
@enxebre, I don't think we can label Nodes with the full namespace/name of the NodePool, since '/' characters aren't allowed in label values, for instance:
Unless I'm missing something, this doesn't account for that yet. Will we need 2 separate labels, one for namespace and one for name?
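For reference, Kubernetes label values must be at most 63 characters and alphanumeric-bounded, with only '-', '_' and '.' allowed in between; '/' is only legal in a label key prefix, never in the value. A stdlib-only sketch (approximating apimachinery's `validation.IsValidLabelValue`) illustrates why a combined namespace/name value would be rejected:

```go
package main

import (
	"fmt"
	"regexp"
)

// labelValueRE mirrors the Kubernetes label-value format: an optional
// alphanumeric-bounded run of alphanumerics, '-', '_' and '.'.
// Note that '/' is not in the set, and the empty value is allowed.
var labelValueRE = regexp.MustCompile(`^(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?$`)

func isValidLabelValue(v string) bool {
	return len(v) <= 63 && labelValueRE.MatchString(v)
}

func main() {
	fmt.Println(isValidLabelValue("clusters/my-pool")) // false: '/' is not allowed
	fmt.Println(isValidLabelValue("my-pool"))          // true
}
```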
Force-pushed from 37399a9 to c7b8dd9
Only the name is needed to filter Nodes; updated.
/hold cancel
@enxebre, for NTO we have 2 uses for these labels:
In this second case, I think we likely need both the name and the namespace, to make sure NTO and the NodePool controller distinguish between NodePools of the same name in separate namespaces. Perhaps we could separate name and namespace with '_', as this is supported in label values but not allowed in object names? Or we could have two labels, one for the name and one for the namespace of the NodePool.
/cc @csrwng as this is related to the comment thread here, and I would probably like to use the same convention if we end up going one of these routes.
The namespace is not needed; the relation goes in the other direction. The NodePool controller is aware of the control plane namespace where the config lives for each NodePool (in this case the config auto generated by the NTO).
Ack, that makes sense. As long as it is not supported to have two NodePools of the same name in different namespaces for the same hosted cluster, we should be fine with just the name.
/test e2e-aws-nested
Thanks @enxebre. Some comments.
)

const (
	nodePoolAnnotation = "hypershift.openshift.io/nodePool"
s/nodePoolAnnotation/nodePoolLabel/
should this be part of the API?
It's both an annotation and a label. I updated to move the label declaration into the API.
var apiErr *apierrors.StatusError
nodePoolName, err := r.nodeToNodePoolName(node)
if err != nil {
	if errors.As(err, &apiErr) && !apierrors.IsNotFound(err) {
s/IsNotFound(err)/IsNotFound(apiErr)/
I think it's right as it is, isn't it? I want to pass the err I got through the apierrors.IsNotFound check.
But the check would only work on a StatusError, no? (Otherwise I don't see the point of checking that err contains a StatusError.)
		// annotations are not in place yet, so we'll reconcile triggered by the event which sets them in the Node.
		return ctrl.Result{}, err
	} else {
		log.Error(err, "failed to get nodePool name from Node")
add node name to error args
It should come with the logger, as it's part of the request.
	error: true,
},
{
	name: "When Machine does not exist it should fail",
Seems that this describes the case above; this case looks like "When node has no annotations".
updated.
		},
	},
}
machineWithOutNodePoolAnnotation := &capiv1.Machine{
This doesn't seem to be used in any of the tests.
updated.
test/e2e/create_cluster_test.go
	t.Fatalf("failed to list nodes in guest cluster: %v", err)
}
for _, node := range nodes.Items {
	g.Expect(node.Labels["hypershift.openshift.io/nodePool"]).NotTo(BeEmpty())
Use constant (esp if you move it to the API)
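Defining the label key once as a shared constant (ideally in the API package, per the review) keeps the controller and the tests from drifting apart, and makes the "filter Nodes by NodePool" use case a one-liner. A hedged client-side sketch with plain structs (the real code would list through a client with a label selector; names here are illustrative):

```go
package main

import "fmt"

// NodePoolLabel is the label key the controller applies; defining it once
// avoids string literals drifting between controller and tests.
const NodePoolLabel = "hypershift.openshift.io/nodePool"

type node struct {
	name   string
	labels map[string]string
}

// nodesInPool approximates listing with a label selector, client-side.
func nodesInPool(nodes []node, pool string) []string {
	var out []string
	for _, n := range nodes {
		if n.labels[NodePoolLabel] == pool {
			out = append(out, n.name)
		}
	}
	return out
}

func main() {
	nodes := []node{
		{name: "node-a", labels: map[string]string{NodePoolLabel: "pool-1"}},
		{name: "node-b", labels: map[string]string{NodePoolLabel: "pool-2"}},
	}
	fmt.Println(nodesInPool(nodes, "pool-1")) // [node-a]
}
```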
Force-pushed from c7b8dd9 to 6e484d5
/lgtm
Looks like we're hitting AWS limits:
@enxebre: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
What this PR does / why we need it:
What:
This introduces a feature to reconcile Nodes with an opinionated Hypershift label pointing to the owning NodePool.
Why:
This is useful to list and filter Nodes by NodePool. It gives all operators a single, consistent bidirectional mapping for NodePool<->Node without having to reimplement the same logic, vendor clients, and consume compute resources. E.g. NTO tuneD.
How:
Change the NodePool controller to propagate the NodePool annotation down to Machines.
Introduce a Node controller within the HCCO manager:
- Watches Nodes.
- Finds the Machine by using the appropriate CAPI annotations.
- Finds the NodePool annotation.
- Applies the annotation as a label on the Node.
In the future we might extend this to support label propagation into Nodes via the NodePool API.
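The flow described above can be sketched end-to-end with plain structs. This is a simplified illustration, not the controller's code: the annotation keys and the map-based Machine lookup are stand-ins (CAPI identifies a Node's Machine via annotations such as cluster.x-k8s.io/machine; in the real controller these are client Gets):

```go
package main

import "fmt"

// Illustrative stand-ins for the real keys and objects.
const (
	machineAnnotation  = "cluster.x-k8s.io/machine" // set on the Node by CAPI
	nodePoolAnnotation = "hypershift.openshift.io/nodePool"
	nodePoolLabel      = "hypershift.openshift.io/nodePool"
)

type object struct {
	annotations map[string]string
	labels      map[string]string
}

// reconcileNode sketches the flow: resolve the owning Machine from the
// Node's CAPI annotation, read the NodePool annotation the NodePool
// controller propagated onto the Machine, and mirror it as a Node label.
func reconcileNode(node *object, machines map[string]*object) error {
	machineName, ok := node.annotations[machineAnnotation]
	if !ok {
		return fmt.Errorf("node has no machine annotation yet")
	}
	machine, ok := machines[machineName]
	if !ok {
		return fmt.Errorf("machine %q not found", machineName)
	}
	pool, ok := machine.annotations[nodePoolAnnotation]
	if !ok {
		return fmt.Errorf("machine %q has no nodePool annotation", machineName)
	}
	if node.labels == nil {
		node.labels = map[string]string{}
	}
	node.labels[nodePoolLabel] = pool
	return nil
}

func main() {
	machines := map[string]*object{
		"machine-1": {annotations: map[string]string{nodePoolAnnotation: "my-pool"}},
	}
	node := &object{annotations: map[string]string{machineAnnotation: "machine-1"}}
	if err := reconcileNode(node, machines); err != nil {
		panic(err)
	}
	fmt.Println(node.labels[nodePoolLabel]) // my-pool
}
```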
Which issue(s) this PR fixes (optional, use fixes #<issue_number>(, fixes #<issue_number>, ...) format, where issue_number might be a GitHub issue, or a Jira story):
ref https://issues.redhat.com/browse/HOSTEDCP-545
Checklist