Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for fetching node collected information #5030

Merged
merged 3 commits into from Mar 10, 2015

Conversation

simon3z
Copy link
Contributor

@simon3z simon3z commented Mar 4, 2015

This patch adds the support for collecting node information (mainly cadvisor-reported details, e.g. at the moment NumCores and MemoryCapacity.

If this is acceptable (and as usual I am not attached to this specific implementation, so feel free to suggest even a completely different solution), then I will add more information.

@googlebot
Copy link

Thanks for your pull request.

It looks like this may be your first contribution to a Google open source project, in which case you'll need to sign a Contributor License Agreement (CLA) at https://cla.developers.google.com/.

If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address. Check the information on your CLA or see this help article on setting the email on your git commits.

Once you've done that, please reply here to let us know. If you signed the CLA as a corporation, please let us know the company's name.

@simon3z
Copy link
Contributor Author

simon3z commented Mar 4, 2015

@thockin can you help me to find the relevant people for reviewing this? Thanks!

@simon3z
Copy link
Contributor Author

simon3z commented Mar 4, 2015

(Note: these information should probably be consolidate either in info or spec)

@vishh
Copy link
Contributor

vishh commented Mar 4, 2015

cc @rjnagal

type NodeInfo struct {
TypeMeta `json:",inline"`
// The number of cores in this machine.
NumCores int `json:"numCores"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this different from Capacity?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cores and memory can be consolidated inside NodeSpec but I am planning to return more information (I have another pull here on cadvisor: google/cadvisor#560).

(And as far as I understood Capacity is currently not updated with the information from the host)

Anyway I think it's just a matter of deciding where to return this information (spec/info), I mentioned it in a previous comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry s/NodeSpec/ResourceList/

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, use Capacity.

@bgrant0607 bgrant0607 self-assigned this Mar 4, 2015
@simon3z
Copy link
Contributor Author

simon3z commented Mar 4, 2015

@smarterclayton @bgrant0607 is it normal that I keep failing the CLA check? (also on other projects, e.g. cadvisor)

@rjnagal
Copy link
Contributor

rjnagal commented Mar 4, 2015

@bgrant0607 did we decide if node-provided info is going to be part of the API eg. node/pod stats go here too? The only other issue is to provide a distinction between cloud-provider capacity info and kubelet node info - which one to use when. We were toying with the idea of overriding capacity information we get from cloud provider with the node discovered value once kubelet returns it.

@simon3z
Copy link
Contributor Author

simon3z commented Mar 4, 2015

@rjnagal +1 to override the cloud provider info with the kubelet discovered ones (or at least return both, if not confusing)

@bgrant0607
Copy link
Member

@simon3z All Redhat employees currently fail the new CLA bot. Don't worry about it. Known issue.

@bgrant0607
Copy link
Member

@rjnagal @vishh @simon3z

Yes, capacity reported by the node should override info from the cloud provider and other defaults.

Observed usage should not go into the Node or Pod.

If we expect capacity to not frequently change, I think NodeStatus is a logical place to put it. We should cache it in Kubelet rather than query cAdvisor for every request.

Also note: #4562

@simon3z
Copy link
Contributor Author

simon3z commented Mar 4, 2015

Yes, capacity reported by the node should override info from the cloud provider and other defaults.

@bgrant0607 with regard to google/cadvisor#560 do you think that MachineID and SystemUUID should be reported in the spec/capacity together with "cpu" and "memory"?

If so, I'll remove NodeInfo from the Node struct.

Observed usage should not go into the Node or Pod.

I am not reporting any usage stat (in this patch), so it should be good to go, right? (or am I missing something)

If we expect capacity to not frequently change, I think NodeStatus is a logical place to put it. We should cache it in Kubelet rather than query cAdvisor for every request.

I'll try to get this done as well, is it a must to get this merged?

@bgrant0607
Copy link
Member

@simon3z I meant that I prefer using NodeStatus rather than a new API path+object (NodeInfo). As for NodeStatus vs. NodeSpec, it belongs in NodeStatus if it's populated via observation 100% of the time on all platforms.

Only resources belong in capacity.

Do you need MachineID and SystemUUID?

cc @ddysher

@bgrant0607
Copy link
Member

@rjnagal In case it wasn't clear, node-provided info does belong in NodeStatus. It doesn't have to be info originating from the cloudprovider only.

@simon3z
Copy link
Contributor Author

simon3z commented Mar 4, 2015

@simon3z I meant that I prefer using NodeStatus rather than a new API path+object (NodeInfo)

I was using NodeInfo for consistency (and simplicity) of what kubelet returns.

Do you need MachineID and SystemUUID?

Yes, that's my primary goal.

So, let's see if this is ok:

  • nodecontroller would get a NodeInfo from kubelet (/api/v1beta1/nodeInfo)
  • nodecontroller would use the collected NodeInfo to populate "cpu", "memory" (and in the future "cpu_mhz", etc.) in capacity, and machineID and systemUUID in NodeStatus

@bgrant0607 is that ok?

Or do you expect kubelet to return somehow capacity+NodeStatus (not the new NodeInfo)?

I don't like this second approach because it will need the definition of a "nil" value for each NodeStatus entry so that it won't override values that kubelet can't fetch/return.

@bgrant0607
Copy link
Member

@simon3z We're in the middle of changing Kubelet's APIs and how information is communicated to the apiserver. We're changing from a "pull" to a "push" model. PodInfo will go away, and Kubelet will just post PodStatus and NodeStatus.

@@ -885,6 +894,9 @@ type Node struct {

// Status describes the current status of a Node
Status NodeStatus `json:"status,omitempty"`

// Info are information collected on the Node
Info NodeInfo `json:"info,omitempty"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This violates the v1beta3 convention. NodeInfo cannot go here. It must go into NodeStatus.

@bgrant0607
Copy link
Member

If you want to go forward with NodeInfo on Kubelet for now rather than waiting for Kubelet to post back to apiserver, you'll need a different type to be embedded in NodeStatus. We should not be adding new Cpu and Memory capacity fields. Instead, we should move Capacity to NodeStatus in v1beta3.

@simon3z
Copy link
Contributor Author

simon3z commented Mar 5, 2015

@bgrant0607 I basically almost re-implemented the pach(es) entirely following (I hope) your suggestions.

Now I use NodeInfo only for the communication between kubelet and nodecontroller:

type NodeIdentity struct {
    MachineID string `json:"machineID"`
    SystemUUID string `json:"systemUUID"`
}

// NodeInfo is the information collected on the node.
type NodeInfo struct {
    TypeMeta `json:",inline"`
    // Capacity represents the available resources of a node
    Capacity ResourceList `json:"capacity,omitempty"`
    Identity NodeIdentity `json:"identity,omitempty"`
}

Then both Capacity and NodeIdentity (new) are properly updated in the Node Spec and NodeStatus.

With regard to push vs pull. Are you planning to use watches? (Communication initiated by nodecontroller) Or will you allow kubelet to talk to apiserver?

How long for this change to happen? I can contribute to the effort if you need help.

@@ -788,6 +788,13 @@ type NodeSpec struct {
ExternalID string `json:"externalID,omitempty"`
}

type NodeIdentity struct {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't mind the missing comments/descriptions (here and below) it's still a work in progress.

@simon3z
Copy link
Contributor Author

simon3z commented Mar 5, 2015

Why NodeInfo must be an apiobject?

@ddysher for DecodeInto:

func (c *HTTPKubeletClient) getEntity(host, path string, entity runtime.Object) (*http.Response, error) {
    [...]
    err = latest.Codec.DecodeInto(body, entity)
    return response, err
}

@simon3z
Copy link
Contributor Author

simon3z commented Mar 5, 2015

Do you still need DoChecks?

Probably not but I must be careful when I switch to just use GetNodeInfo, the specific errors that I get (connection refused, etc..) must be properly handled in order to get the same behaviors of the old /healtz. I am a little worried of regressions there so I would handle that later if possible.

@@ -221,6 +221,7 @@ func (s *NodeController) SyncNodeStatus() error {
}
nodes = s.DoChecks(nodes)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry @ddysher this should have been removed (replaced by s.UpdateNodesStatus).

@ddysher
Copy link
Contributor

ddysher commented Mar 5, 2015

I'll take a closer look when you feel comfortable, looks like you are actively working on it right now :)

@simon3z simon3z force-pushed the nodeinfo branch 2 times, most recently from 56477ce to 492a8ed Compare March 6, 2015 09:41
@simon3z
Copy link
Contributor Author

simon3z commented Mar 8, 2015

@bgrant0607 I updated the patches following your suggestions and it's much better but now it seems that NodeInfo/nodeInfo/NodeSystemInfo are a little bit confusing in their use/type. What do you think? If instead you're fine with it then I think we're ready to merge. Thanks!

@ddysher
Copy link
Contributor

ddysher commented Mar 9, 2015

Why do we need an inline struct NodeSystemInfo if the unit of information change between nodecontroller and kubelet is NodeInfo? If there is no compelling reason, let's not do NodeInfo/nodeInfo/NodeSystemInfo, that's really confusing and didn't add much.

this is eaiser

 // NodeStatus is information about the current status of a node.
 type NodeStatus struct {
    // NodePhase is the current lifecycle phase of the node.
    Phase NodePhase `json:"phase,omitempty"`
    // Conditions is an array of current node conditions.
    Conditions []NodeCondition `json:"conditions,omitempty"`
    // Queried from cloud provider, if available.
    Addresses []NodeAddress `json:"addresses,omitempty"`
    // NodeInfo is the information collected on the node.
    NodeInfo NodeInfo `json:"nodeInfo,omitempty"`
}

// NodeInfo is the information collected on the node.
type NodeInfo struct {
    TypeMeta `json:",inline"`
    // Capacity represents the available resources of a node
    Capacity ResourceList `json:"capacity,omitempty"`
    // MachineID is the machine-id reported by the node
    MachineID string `json:"machineID"`
    // SystemUUID is the system-uuid reported by the node
    SystemUUID string `json:"systemUUID"`
 }

When you get information from kubelet, set all fields in Status.NodeInfo, as well as Spec.Capacity. If desirable, we should document Spec.Capacity as deprecated in favor of Status.NodeInfo.Capacity.

@bgrant0607
Copy link
Member

@ddysher I want Capacity at the top level in NodeStatus. NodeInfo is a temporary nuisance that's a result of NodeStatus not being posted by Kubelet. The sooner you do that, the sooner we can clean this up. :-)

@bgrant0607
Copy link
Member

We could also call the NodeStatus field SystemInfo rather than NodeInfo.

@simon3z
Copy link
Contributor Author

simon3z commented Mar 10, 2015

@bgrant0607 I rebased but it seems that the coverall check is stuck (travis passed). Do you think we can merge this?

Signed-off-by: Federico Simoncelli <fsimonce@redhat.com>
Signed-off-by: Federico Simoncelli <fsimonce@redhat.com>
Signed-off-by: Federico Simoncelli <fsimonce@redhat.com>
@bgrant0607
Copy link
Member

Travis passing is good enough, but you need to move/remove cpu frequency first.

@simon3z
Copy link
Contributor Author

simon3z commented Mar 10, 2015

@bgrant0607 sorry! I mistakenly reintroduced it after rebasing :(

@bgrant0607
Copy link
Member

LGTM

@bgrant0607
Copy link
Member

Github says you need to rebase.

@bgrant0607 bgrant0607 added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 10, 2015
@bgrant0607
Copy link
Member

Sorry, could you please squash the commits?

@bgrant0607
Copy link
Member

And please comment on the PR after you do so, so I get notified.

@bgrant0607 bgrant0607 removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 10, 2015
@simon3z
Copy link
Contributor Author

simon3z commented Mar 10, 2015

@bgrant0607 should be done (sorry I am between meetings and travels so I hope I didn't do any miskate)

@vishh
Copy link
Contributor

vishh commented Mar 10, 2015

@bgrant0607: With this PR the API will have two capacity fields, one in NodeSpec and one in NodeStatus. Is this intentional?

@ddysher
Copy link
Contributor

ddysher commented Mar 10, 2015

I'll be able to get to the kubelet post PR this week; was doing some clean up in kubelet earlier.

@bgrant0607
Copy link
Member

@vishh No, there is only a single Capacity field. Look carefully at the struct type names. It's debatable whether Capacity should be moved from Spec to Status in v1beta3, though.

LGTM for now.

bgrant0607 added a commit that referenced this pull request Mar 10, 2015
Add support for fetching node collected information
@bgrant0607 bgrant0607 merged commit 9aa7449 into kubernetes:master Mar 10, 2015
@ddysher
Copy link
Contributor

ddysher commented Mar 10, 2015

I bet this will confuse others as well. How about we rename struct NodeInfo to struct NodeInfoResult, like the PodStatusResult?

@bgrant0607
Copy link
Member

I'd be fine with that, but I'd be much happier to just nuke it entirely.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants