
Modify nodes to register directly with the master. #6949

Merged
2 commits merged into kubernetes:master on May 19, 2015

Conversation

@roberthbailey (Contributor)

Ref #6087.
Fixes #7495.

@roberthbailey (Contributor, Author)

/cc @bgrant0607

In pkg/tools/etcd_helper.go:

@@ -165,12 +165,14 @@ func (h *EtcdHelper) ExtractObjToList(key string, listObj runtime.Object) error
}

nodes := make([]*etcd.Node, 0)
nodes = append(nodes, response.Node)
if !IsEtcdNotFound(err) {
@roberthbailey (Contributor, Author)

@smarterclayton I found that if there is an etcd not found error here (type 100) then response will be null so response.Node or response.EtcdIndex (below) will cause this goroutine to panic.

(Contributor)

Derek just fixed this in another pull.

(Contributor)

#6938 covers it


@roberthbailey (Contributor, Author)

I've rebased on top of that PR.
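
For readers following along, a minimal sketch of the kind of guard being discussed, assuming the IsEtcdNotFound helper shown in the diff above (illustrative only, not the actual change from #6938):

```go
// Illustrative sketch: check the error before touching response, since a
// "key not found" error (etcd error code 100) leaves response nil, and
// dereferencing response.Node or response.EtcdIndex would panic.
if err != nil && !IsEtcdNotFound(err) {
	return err
}
nodes := make([]*etcd.Node, 0)
if err == nil {
	// Only dereference the response when the key actually exists.
	nodes = append(nodes, response.Node)
}
```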

@roberthbailey (Contributor, Author)

@derekwaynecarr @pires @justinsb I haven't been able to test this on other cloud providers -- can you take a look and see if you think it will break anything?

@smarterclayton (Contributor)

What are the security implications of this? As an admin, how do I keep clients from interfering with other nodes? Does this require the cloud provider on the kubelet to have security credentials to the cluster (very scary)? Can I still disable the cloud provider on the kubelet?

@roberthbailey (Contributor, Author)

Security implications: A node with credentials can add itself to the cluster. Right now, nodes can do this because they have bearer tokens.

Right now we don't have any authz in the apiserver. Once we do, we should add policies that only allow nodes to update their own data (and not that of other nodes). Any node today has free rein to update anything in the apiserver, because it has full read/write access through the secured port with the node bearer token.

As a first cut I just moved the cloud provider bits from the nodecontroller to the kubelet, but I think we should be able to remove most or all of them. For GCE (and likely AWS) we may still want to make a call into the cloud provider to hit the local metadata server and get the external IP address of a node. But calling into the cloud provider to determine system resources seems silly.

Yes, you can still disable the cloud provider on the kubelet. When you run ./hack/test-cmd.sh it runs the kubelet without a cloud provider (https://github.com/GoogleCloudPlatform/kubernetes/blob/master/hack/test-cmd.sh#L72-L83) and all tests still pass.

@smarterclayton (Contributor)

> Right now we don't have any authz in the apiserver.

The policy model @deads2k has created for OpenShift (which we'll evaluate bringing into kube post-1.0) should be able to handle this. We also have the pattern to properly give kubelets a unique identity, although we do not yet have the ideal policies.

> Once we do, we should add policies that only allow nodes to update their own data (and not that of other nodes).

I just wanted to make sure this didn't change the detente. Thanks.

// syncNodeStatus periodically synchronizes node status to master.
func (kl *Kubelet) syncNodeStatus() {
if kl.kubeClient == nil {
return
}
if err := kl.registerNode(); err != nil {
glog.Errorf("Failed to register node: %s. Giving up.", err)
(Contributor)

We should use util.HandleError here

@roberthbailey (Contributor, Author)

I didn't realize that existed -- cool. I've passed it a new error so that we keep the custom text on top of the error text returned from registerNode.
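
For illustration, a hedged sketch of what the updated call might look like, assuming util.HandleError accepts an error and fmt is imported (not necessarily the exact code that landed):

```go
// Sketch: keep the custom "failed to register" text on top of the error
// returned from registerNode, and route it through the shared handler
// instead of calling glog.Errorf directly.
if err := kl.registerNode(); err != nil {
	util.HandleError(fmt.Errorf("failed to register node: %v. Giving up.", err))
	return
}
```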

@pires (Contributor) commented Apr 17, 2015

@roberthbailey right now I can only test on GCE since I no longer use AWS.

Now, from what I understand of your changes, you're moving cloud-provider logic from the controller-manager (poll) to the kubelet (push). If that's correct, I think this is great, because it can replace what I and others (cc @AntonioMeireles @wkharold) currently do with kube-register.

After reading @smarterclayton I too believe there are risks, but right now, as a sysadmin, I'm OK with it.

@roberthbailey (Contributor, Author)

@pires I'm trying to move the node creation from being done in a non-obvious way (a control loop that talks to the cloud provider and uses a regular expression to decide which nodes should be part of the cluster) to an explicit call from a node to join a specific master.

Right now this call "just works" because the kubelet already has credentials to talk to the master, but in the future (with dynamic clustering) the node will first ask to join by sending a CSR to the master to get itself credentials to POST and create a node entry. This will give sysadmins the ability to set policies on whether nodes can join automatically or whether they need to be manually approved to join a cluster. But that's post-1.0. For now, if a node has credentials, it will automatically join the cluster.

Some of the fields for the node may still make sense to fill in from the node controller (which is why I cc'd @bgrant0607 as I thought he'd have an opinion here) but I think that most of the ones we are setting are better known by the kubelet itself (e.g. node resources, hostname, ip addresses).
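
To make the "explicit call" concrete, here is a rough, hypothetical sketch of kubelet-side registration. It assumes the client exposes Nodes().Create and that an already-existing Node object (e.g. after a kubelet restart) should not be treated as a failure; it is not the exact code in this PR:

```go
// Hypothetical sketch of self-registration: build a Node from locally known
// facts and POST it to the apiserver the kubelet already has credentials for.
node := &api.Node{
	ObjectMeta: api.ObjectMeta{Name: kl.hostname},
}
if _, err := kl.kubeClient.Nodes().Create(node); err != nil {
	if !apierrors.IsAlreadyExists(err) {
		return fmt.Errorf("unable to register node %q with the apiserver: %v", kl.hostname, err)
	}
	// The node entry already exists; treat re-registration as success.
}
return nil
```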

@pires (Contributor) commented Apr 17, 2015

@roberthbailey just to be sure, this would remove the need for tools such as kube-register, right?

@roberthbailey (Contributor, Author)

@pires I think so. Would it be possible for you to test this CL on CoreOS and find out?

@AntonioMeireles (Contributor)

@roberthbailey given the nature of this, @kelseyhightower and his pals at CoreOS are best qualified to weigh in. Anyway, I may be able to test this CL early next week...

@pires (Contributor) commented Apr 17, 2015

@roberthbailey sure. Will you be available on IRC for debugging?

@roberthbailey (Contributor, Author)

I'll hop on right now.

@pires (Contributor) commented Apr 17, 2015

Removed kube-register from the master node, replaced the kubelet with @roberthbailey's version and... it works!

$ kubectl get minions
NAME           LABELS        STATUS
172.17.8.102   Schedulable   <none>    Ready
172.17.8.103   Schedulable   <none>    Ready

@bgrant0607 (Member)

cc @gmarek @davidopp

@bgrant0607 (Member)

Nodes shouldn't need to call the cloud provider.

We have discussed making it possible to contact the local metadata server for clouds that provide that, but I'd like to avoid that, since it's not universally available.

We need to make some changes to the Node object. The node name is currently overloaded for at least 2 purposes: object name and cloud provider ("external") ID. It's often set to the hostname or address, which adds to the confusion about the name requirements and usage.

The only information currently required to create a node is the externalID, which is automatically defaulted to the node name if not provided -- that part I don't mind. However, I'd like the externalID to be used to look up the node with the cloud provider, rather than looking up the external ID using the node name:
https://github.com/GoogleCloudPlatform/kubernetes/blob/master/pkg/cloudprovider/nodecontroller/nodecontroller.go#L640

Addresses are also looked up via the cloud provider. It's true that the node knows at least a subset of its addresses, though not necessarily external ones. The node controller could fill in additional addresses after creation (late initialization).

We don't really have a good way of knowing which address is preferred for master-to-node communication (/validate, log access, exec, port forwarding, proxy, redirect), and we don't know which name/address is specified in the node cert, to the extent that's relevant -- #6985.

Some master component will allocate the PodCIDR via late initialization.

Capacity can be populated after creation, also.

@bgrant0607 (Member)

This should also allow us to get rid of the --machines flag, which @eparis should also like.

kube-register allows users to specify fleet node labels to select which nodes to add to Kubernetes. That's still useful, so I don't think this obsoletes kube-register. I wouldn't mind adding fleet/kube-register as a cloudprovider -- #2890 @kelseyhightower .

@bgrant0607 (Member)

So maybe the one cloudprovider bit the node needs is the ability to get its externalID, which it should get from a local metadata server, or maybe just use the hostname if no cloudprovider.
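
As a concrete illustration of that fallback, a small self-contained sketch; the GCE metadata endpoint is used purely as an example here, and a real implementation would presumably sit behind a pluggable mechanism rather than a hard-coded URL:

```go
package main

import (
	"fmt"
	"io/ioutil"
	"net/http"
	"os"
	"time"
)

// externalID asks the local metadata server for an instance ID and falls back
// to the hostname when no metadata server (i.e. no cloud provider) is reachable.
func externalID() string {
	client := &http.Client{Timeout: 2 * time.Second}
	req, err := http.NewRequest("GET", "http://metadata.google.internal/computeMetadata/v1/instance/id", nil)
	if err == nil {
		req.Header.Set("Metadata-Flavor", "Google")
		if resp, err := client.Do(req); err == nil {
			defer resp.Body.Close()
			if resp.StatusCode == http.StatusOK {
				if body, err := ioutil.ReadAll(resp.Body); err == nil && len(body) > 0 {
					return string(body)
				}
			}
		}
	}
	// No cloud provider / metadata server: just use the hostname.
	hostname, _ := os.Hostname()
	return hostname
}

func main() {
	fmt.Println(externalID())
}
```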

ObjectMeta: api.ObjectMeta{Name: kl.hostname},
Status: api.NodeStatus{
Capacity: api.ResourceList{
api.ResourceCPU: *resource.NewMilliQuantity(1000, resource.DecimalSI),
(Member)

Kubelet already has code for populating this with real values.
https://github.com/GoogleCloudPlatform/kubernetes/blob/master/pkg/kubelet/kubelet.go#L1768

@roberthbailey (Contributor, Author)

Why ask the cloud provider to determine the node resources when the node can just self-report them? How will that work for deployments without a cloud provider?

Also, see https://github.com/GoogleCloudPlatform/kubernetes/blob/master/pkg/cloudprovider/gce/gce.go#L512

(Member)

No good reason -- legacy cruft. We must get the values from the node.

(Member)

I think this code was in NodeController so there would be some capacity in NodeStatus between the time the node is created in the master, and the first time the node reports its status.

(Member)

Fine with leaving it this way in this PR, but we should clean up at some point. Please leave a TODO if you don't think you can get to it soon.
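
For context, a rough sketch of self-reported capacity along the lines of the kubelet code referenced above; GetCachedMachineInfo is a hypothetical accessor standing in for however the kubelet exposes cAdvisor's MachineInfo:

```go
// Hypothetical sketch: populate capacity from what the node itself reports
// (via cAdvisor machine info) instead of asking the cloud provider.
info, err := kl.GetCachedMachineInfo() // hypothetical accessor
if err != nil {
	return err
}
node.Status.Capacity = api.ResourceList{
	api.ResourceCPU:    *resource.NewMilliQuantity(int64(info.NumCores)*1000, resource.DecimalSI),
	api.ResourceMemory: *resource.NewQuantity(int64(info.MemoryCapacity), resource.BinarySI),
}
```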

@roberthbailey (Contributor, Author)

> kube-register allows users to specify fleet node labels to select which nodes to add to Kubernetes.

Could this be done instead by using fleet node labels to choose which --api_server argument we pass to the kubelet on each node? It doesn't make sense for a kubelet to connect to an apiserver and watch for pods to run (and send events) if it isn't actually part of the cluster.

"time"

"github.com/GoogleCloudPlatform/kubernetes/pkg/api"
apierrors "github.com/GoogleCloudPlatform/kubernetes/pkg/api/errors"
"github.com/GoogleCloudPlatform/kubernetes/pkg/client"
"github.com/GoogleCloudPlatform/kubernetes/pkg/client/record"
"github.com/GoogleCloudPlatform/kubernetes/pkg/cloudprovider"
(Member)

You should be able to delete the cloudprovider here.

@roberthbailey (Contributor, Author)

Now that I re-added the code to handle deleting nodes that no longer exist, we are still using the cloud provider.

@bgrant0607 (Member)

Actually, I'm liking this more. I think we should eliminate cloudprovider stuff from the nodecontroller and move it out of the cloudprovider subtree, as discussed in #4851.

The node should get any info it needs from the metadata server, if present, or from a local config file, fleet, whatever -- the mechanism should be pluggable. Calling the existing cloudprovider API could be a stopgap.

We'll likely need to resurrect a cloudprovider-oriented nodecontroller in the future in order to automatically repair and provision nodes (cluster auto-scaling), but that's ok.

@roberthbailey (Contributor, Author)

Just finished running e2e tests:

Ran 43 of 49 Specs in 1678.866 seconds
SUCCESS! -- 43 Passed | 0 Failed | 1 Pending | 5 Skipped PASS

@mbforbes (Contributor)

(Needs rebase)

 - Delete nodes when they are no longer ready and don't exist in the
cloud provider.
 - Label each node with its hostname.
 - Add flag to skip node registration.
 - Add a test for registering an existing node.
@roberthbailey (Contributor, Author)

Rebased again.

@erictune (Member)

reviewing now

@roberthbailey (Contributor, Author)

Thanks @erictune

@erictune (Member)

LGTM

erictune added a commit that referenced this pull request on May 19, 2015: Modify nodes to register directly with the master.
@erictune merged commit 1f4172d into kubernetes:master on May 19, 2015
@mbforbes (Contributor)

/xref should fix #8315 (linking for tracking)

availableCIDRs.Delete(node.Spec.PodCIDR)
// reconcilePodCIDRs looks at each node and assigns it a valid CIDR
// if it doesn't currently have one.
func (nc *NodeController) reconcilePodCIDRs(nodes *api.NodeList) {
(Member)

Should it be called reconcileNodeCIDRs?

@roberthbailey (Contributor, Author)

Yes, probably should have renamed it while I was in here...
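
For readers skimming the diff, a rough, hypothetical sketch of the reconciliation pattern this function implements; generateCIDRs, clusterCIDR, and the set iteration are assumptions for illustration, not the PR's actual API:

```go
// Hypothetical sketch: drop CIDRs that are already assigned, then give each
// node without a PodCIDR one of the remaining unused CIDRs.
func (nc *NodeController) reconcilePodCIDRs(nodes *api.NodeList) {
	availableCIDRs := generateCIDRs(nc.clusterCIDR, len(nodes.Items)) // assumed helper returning a string set
	for _, node := range nodes.Items {
		availableCIDRs.Delete(node.Spec.PodCIDR)
	}
	for i := range nodes.Items {
		node := &nodes.Items[i]
		if node.Spec.PodCIDR != "" {
			continue
		}
		for cidr := range availableCIDRs {
			// Take any unused CIDR for this node and mark it as consumed.
			node.Spec.PodCIDR = cidr
			availableCIDRs.Delete(cidr)
			break
		}
		if _, err := nc.kubeClient.Nodes().Update(node); err != nil {
			glog.Errorf("Unable to assign node %q a pod CIDR: %v", node.Name, err)
		}
	}
}
```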

Development

Successfully merging this pull request may close these issues.

Automatically label every node with its hostname