New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for fetching node collected information #5030
Conversation
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project, in which case you'll need to sign a Contributor License Agreement (CLA) at https://cla.developers.google.com/. If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address. Check the information on your CLA or see this help article on setting the email on your git commits. Once you've done that, please reply here to let us know. If you signed the CLA as a corporation, please let us know the company's name. |
@thockin can you help me to find the relevant people for reviewing this? Thanks! |
(Note: these information should probably be consolidate either in info or spec) |
cc @rjnagal |
type NodeInfo struct { | ||
TypeMeta `json:",inline"` | ||
// The number of cores in this machine. | ||
NumCores int `json:"numCores"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is this different from Capacity
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cores and memory can be consolidated inside NodeSpec but I am planning to return more information (I have another pull here on cadvisor: google/cadvisor#560).
(And as far as I understood Capacity is currently not updated with the information from the host)
Anyway I think it's just a matter of deciding where to return this information (spec/info), I mentioned it in a previous comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry s/NodeSpec/ResourceList/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, use Capacity.
@smarterclayton @bgrant0607 is it normal that I keep failing the CLA check? (also on other projects, e.g. cadvisor) |
@bgrant0607 did we decide if node-provided info is going to be part of the API eg. node/pod stats go here too? The only other issue is to provide a distinction between cloud-provider capacity info and kubelet node info - which one to use when. We were toying with the idea of overriding capacity information we get from cloud provider with the node discovered value once kubelet returns it. |
@rjnagal +1 to override the cloud provider info with the kubelet discovered ones (or at least return both, if not confusing) |
@simon3z All Redhat employees currently fail the new CLA bot. Don't worry about it. Known issue. |
Yes, capacity reported by the node should override info from the cloud provider and other defaults. Observed usage should not go into the Node or Pod. If we expect capacity to not frequently change, I think NodeStatus is a logical place to put it. We should cache it in Kubelet rather than query cAdvisor for every request. Also note: #4562 |
@bgrant0607 with regard to google/cadvisor#560 do you think that MachineID and SystemUUID should be reported in the spec/capacity together with "cpu" and "memory"? If so, I'll remove NodeInfo from the Node struct.
I am not reporting any usage stat (in this patch), so it should be good to go, right? (or am I missing something)
I'll try to get this done as well, is it a must to get this merged? |
@simon3z I meant that I prefer using NodeStatus rather than a new API path+object (NodeInfo). As for NodeStatus vs. NodeSpec, it belongs in NodeStatus if it's populated via observation 100% of the time on all platforms. Only resources belong in capacity. Do you need MachineID and SystemUUID? cc @ddysher |
@rjnagal In case it wasn't clear, node-provided info does belong in NodeStatus. It doesn't have to be info originating from the cloudprovider only. |
I was using NodeInfo for consistency (and simplicity) of what kubelet returns.
Yes, that's my primary goal. So, let's see if this is ok:
@bgrant0607 is that ok? Or do you expect kubelet to return somehow capacity+NodeStatus (not the new NodeInfo)? I don't like this second approach because it will need the definition of a "nil" value for each NodeStatus entry so that it won't override values that kubelet can't fetch/return. |
@simon3z We're in the middle of changing Kubelet's APIs and how information is communicated to the apiserver. We're changing from a "pull" to a "push" model. PodInfo will go away, and Kubelet will just post PodStatus and NodeStatus. |
@@ -885,6 +894,9 @@ type Node struct { | |||
|
|||
// Status describes the current status of a Node | |||
Status NodeStatus `json:"status,omitempty"` | |||
|
|||
// Info are information collected on the Node | |||
Info NodeInfo `json:"info,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This violates the v1beta3 convention. NodeInfo cannot go here. It must go into NodeStatus.
If you want to go forward with NodeInfo on Kubelet for now rather than waiting for Kubelet to post back to apiserver, you'll need a different type to be embedded in NodeStatus. We should not be adding new Cpu and Memory capacity fields. Instead, we should move Capacity to NodeStatus in v1beta3. |
@bgrant0607 I basically almost re-implemented the pach(es) entirely following (I hope) your suggestions. Now I use NodeInfo only for the communication between kubelet and nodecontroller:
Then both Capacity and NodeIdentity (new) are properly updated in the Node Spec and NodeStatus. With regard to push vs pull. Are you planning to use watches? (Communication initiated by nodecontroller) Or will you allow kubelet to talk to apiserver? How long for this change to happen? I can contribute to the effort if you need help. |
@@ -788,6 +788,13 @@ type NodeSpec struct { | |||
ExternalID string `json:"externalID,omitempty"` | |||
} | |||
|
|||
type NodeIdentity struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please don't mind the missing comments/descriptions (here and below) it's still a work in progress.
@ddysher for DecodeInto:
|
Probably not but I must be careful when I switch to just use GetNodeInfo, the specific errors that I get (connection refused, etc..) must be properly handled in order to get the same behaviors of the old /healtz. I am a little worried of regressions there so I would handle that later if possible. |
@@ -221,6 +221,7 @@ func (s *NodeController) SyncNodeStatus() error { | |||
} | |||
nodes = s.DoChecks(nodes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry @ddysher this should have been removed (replaced by s.UpdateNodesStatus).
I'll take a closer look when you feel comfortable, looks like you are actively working on it right now :) |
56477ce
to
492a8ed
Compare
@bgrant0607 I updated the patches following your suggestions and it's much better but now it seems that NodeInfo/nodeInfo/NodeSystemInfo are a little bit confusing in their use/type. What do you think? If instead you're fine with it then I think we're ready to merge. Thanks! |
Why do we need an inline struct this is eaiser // NodeStatus is information about the current status of a node.
type NodeStatus struct {
// NodePhase is the current lifecycle phase of the node.
Phase NodePhase `json:"phase,omitempty"`
// Conditions is an array of current node conditions.
Conditions []NodeCondition `json:"conditions,omitempty"`
// Queried from cloud provider, if available.
Addresses []NodeAddress `json:"addresses,omitempty"`
// NodeInfo is the information collected on the node.
NodeInfo NodeInfo `json:"nodeInfo,omitempty"`
}
// NodeInfo is the information collected on the node.
type NodeInfo struct {
TypeMeta `json:",inline"`
// Capacity represents the available resources of a node
Capacity ResourceList `json:"capacity,omitempty"`
// MachineID is the machine-id reported by the node
MachineID string `json:"machineID"`
// SystemUUID is the system-uuid reported by the node
SystemUUID string `json:"systemUUID"`
} When you get information from kubelet, set all fields in Status.NodeInfo, as well as Spec.Capacity. If desirable, we should document Spec.Capacity as deprecated in favor of Status.NodeInfo.Capacity. |
@ddysher I want Capacity at the top level in NodeStatus. NodeInfo is a temporary nuisance that's a result of NodeStatus not being posted by Kubelet. The sooner you do that, the sooner we can clean this up. :-) |
We could also call the NodeStatus field SystemInfo rather than NodeInfo. |
@bgrant0607 I rebased but it seems that the coverall check is stuck (travis passed). Do you think we can merge this? |
Signed-off-by: Federico Simoncelli <fsimonce@redhat.com>
Signed-off-by: Federico Simoncelli <fsimonce@redhat.com>
Signed-off-by: Federico Simoncelli <fsimonce@redhat.com>
Travis passing is good enough, but you need to move/remove cpu frequency first. |
@bgrant0607 sorry! I mistakenly reintroduced it after rebasing :( |
LGTM |
Github says you need to rebase. |
Sorry, could you please squash the commits? |
And please comment on the PR after you do so, so I get notified. |
@bgrant0607 should be done (sorry I am between meetings and travels so I hope I didn't do any miskate) |
@bgrant0607: With this PR the API will have two capacity fields, one in NodeSpec and one in NodeStatus. Is this intentional? |
I'll be able to get to the kubelet post PR this week; was doing some clean up in kubelet earlier. |
@vishh No, there is only a single Capacity field. Look carefully at the struct type names. It's debatable whether Capacity should be moved from Spec to Status in v1beta3, though. LGTM for now. |
Add support for fetching node collected information
I bet this will confuse others as well. How about we rename struct NodeInfo to struct NodeInfoResult, like the PodStatusResult? |
I'd be fine with that, but I'd be much happier to just nuke it entirely. |
This patch adds the support for collecting node information (mainly cadvisor-reported details, e.g. at the moment NumCores and MemoryCapacity.
If this is acceptable (and as usual I am not attached to this specific implementation, so feel free to suggest even a completely different solution), then I will add more information.