This repository has been archived by the owner on Dec 5, 2017. It is now read-only.

Ports could not be allocated #131

Closed

brandon-adams opened this issue Jan 25, 2015 · 16 comments

Comments

@brandon-adams

Hi all,

I'm trying to launch a simple pod, using a pre-built docker image. The kubernetes-mesos scheduler is giving me an error related to port allocation:

I0125 21:48:55.076816 5499 pod_task.go:155] Could not schedule pod jboss-master-1: 2 ports could not be allocated
I0125 21:48:55.077076 5499 pod_task.go:143] Evaluating port range {31000:32000} 9990
I0125 21:48:55.077290 5499 pod_task.go:143] Evaluating port range {31000:32000} 8080

I'm wondering if that port range is something that's configurable. The port that's throwing the error is the container port, which I'd like to avoid changing. Thanks.

@jdef
Copy link

jdef commented Jan 26, 2015

The available port range is determined by the resources offered by the slaves you have set up. The default port range for slaves is 31000-32000. Container ports should not be restricted by the ports resource range, only host ports. Can you paste a copy of your pod configuration?
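
If you do need a different host-port range, it is configured on the slaves rather than in the framework. A minimal sketch, assuming a stock mesos-slave invocation; the ZooKeeper address and the cpu/mem figures are placeholders:

# Advertise a custom ports resource range from each slave (adjust values to your cluster).
mesos-slave --master=zk://$ZK_HOST:2181/mesos \
            --resources='cpus:4;mem:8192;ports:[31000-32000]'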

@brandon-adams
Author

@jdef Turns out I was mistaken about the cause of the error. I was trying to assign a port on the host machine which was outside of the allowable range. Changing that made everything move smoothly enough.

This has raised a similar issue, however. It seems like when I try to launch a replication controller, it tries to use the same host ports for each replicated container within each pod. That is to say, if I describe a replicationController with two replicas, one container each, with the template describing the port bindings, it creates one, then spins indefinitely trying to bind to the same host port for the other replica. Is this expected behavior?

@jdef

jdef commented Jan 26, 2015

Since host ports are mesos-managed resources, if you define them in a replication controller's pod template then you can only scale that controller up to match the number of slaves you have (assuming that nothing else has allocated the host port that you specified in the replication controller pod template).
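
To make this concrete, here is a minimal v1beta1 pod sketch that pins a host port; the id and label are made up, and the image and port are borrowed from the controller config further down this thread. Every copy of such a pod consumes port 31008 on its slave, so two replicas of it need at least two slaves with 31008 unallocated:

{
  "id": "jboss-pinned-port",
  "kind": "Pod",
  "apiVersion": "v1beta1",
  "desiredState": {
    "manifest": {
      "version": "v1beta1",
      "id": "jboss-pinned-port",
      "containers": [{
        "name": "jboss-wildfly",
        "image": "vnguyen/jboss-wildfly-admin",
        "ports": [{"containerPort": 8080, "hostPort": 31008}]
      }]
    }
  },
  "labels": {"name": "jboss"}
}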

@brandon-adams
Author

So if I understand you correctly, I always need to have at least as many minions/slaves as replicated pods? I've only tried a controller of size 2 with a cluster of size 3, and the error still persists. So that means it should try to place one pod on each slave, and allocate identical ports on those slaves. Is this where a service would come into play? Would I have the service describe the port mapping, so that only the host on which the service lives would have the external-facing port declared?

@jdef

jdef commented Jan 27, 2015

Hmm. Controller of size 2 with 3 slaves should work. Would you mind sending a copy of your replication controller configs? Also, would you be willing to dump your logs somewhere that I could review them?

A bit about services:

The Kubernetes service model, by default, allocates an IP per service, where the IP is pulled from somewhere in the CIDR range specified by the -portal_net parameter. Ports declared for services are currently invisible to mesos scheduling. A simple way to think of a kubernetes service is as a combination of a pod-selector-filter and a load balancer: the service specification identifies the pods that will handle traffic (the selector), and the realization of that service acts as a load balancer, proxying traffic between the service 'portal' and the pods (iptables + kube-proxy). Right now service portals are fronted by iptables NAT rules, so there are no actual NICs with IP addresses in the -portal_net range.

I've commented on the service spec structure below, with respect to k8sm:

service {
  port           // port advertised to other pods via the SERVICENAME_PORT variable
  selector       // identify the pods that back this service
  portalIP       // (don't set this) managed by apiserver, IP pulled from the -portal_net range
  proxyPort      // (optional) ephemeral, target of iptables NAT rules
  publicIPs      // (optional) IP address(es) of load balancer(s)
  containerPort  // (optional) should match a selectedPod.container[x].port[y].{ hostPort | name },
                 // so it's an int or string; unspecified => use first hostPort of first container
                 // in matching pods
  ...
}
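
For illustration, a minimal v1beta1 service sketch along those lines; the id and port values are made up, and the selector matches the "name": "jboss" label used in the pod configs later in this thread. Note that, per the above, the service port itself is not a mesos-managed resource:

{
  "id": "jboss-http",
  "kind": "Service",
  "apiVersion": "v1beta1",
  "port": 8081,
  "containerPort": 8080,
  "selector": {"name": "jboss"}
}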

@brandon-adams
Author

Sure.
Here's my replication controller.

Here's a piece of my scheduler log file. This is only a snippet; right now my log file is about 25 MB. You can see it just continuously cycling between the three slaves, trying to allocate ports already in use. When I look at the slaves, I can see the actual pods running totally fine.

It may be worth noting that I'm also running into #135 when I start single pods as well as replication controllers. So is it possible that the framework is seeing the pod as a failure, trying to fill in the required number of replicas, having those fail, and just looping?

Edit: I went back into the log file and found where it starts to try and schedule the pods. I went down a bit so you can see more of the looping.

@brandon-adams
Author

As an update, I updated the framework to the latest pull from git and saw some changes. Unfortunately, my error still persists. Before, when creating pods, they immediately entered the Failed state. After the update, I am able to see them enter the Pending state before ultimately still failing. I'm still not sure what's causing them to enter the Failed state. I think that this is the issue behind everything here.

@mikesplain

I'm seeing the same thing too; I built everything clean and am still seeing the issues from #135. I'm going to dig and see what I can find.

@jdef

jdef commented Jan 28, 2015

Looks like the jboss container is flapping because of a missing directory. From the docker log for the jboss container:

[Host Controller] 22:17:14,126 ERROR [stderr] (main) java.lang.IllegalStateException: JBAS016504: Could not create log directory: /opt/jboss/wildfly/domain/log
[Host Controller] 22:17:14,127 ERROR [stderr] (main)    at org.jboss.as.host.controller.HostControllerEnvironment.<init>(HostControllerEnvironment.java:415)
[Host Controller] 22:17:14,128 ERROR [stderr] (main)    at org.jboss.as.host.controller.Main.determineEnvironment(Main.java:435)
[Host Controller] 22:17:14,128 ERROR [stderr] (main)    at org.jboss.as.host.controller.Main.boot(Main.java:129)
[Host Controller] 22:17:14,129 ERROR [stderr] (main)    at org.jboss.as.host.controller.Main.create(Main.java:124)
[Host Controller] 22:17:14,129 ERROR [stderr] (main)    at org.jboss.as.host.controller.Main.main(Main.java:113)
[Host Controller] 22:17:14,130 ERROR [stderr] (main)    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[Host Controller] 22:17:14,130 ERROR [stderr] (main)    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
[Host Controller] 22:17:14,130 ERROR [stderr] (main)    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[Host Controller] 22:17:14,131 ERROR [stderr] (main)    at java.lang.reflect.Method.invoke(Method.java:606)
[Host Controller] 22:17:14,131 ERROR [stderr] (main)    at org.jboss.modules.Module.run(Module.java:312)
[Host Controller] 22:17:14,132 ERROR [stderr] (main)    at org.jboss.modules.Main.main(Main.java:460)

I added a couple of volume mounts so that the process would not die immediately, and both tasks seem to run OK for me (they still complain about missing things, but they don't die instantly):

 {
    "id": "jboss-controller",
    "kind": "ReplicationController",
    "apiVersion": "v1beta1",
    "desiredState": {
      "replicas": 2,
      "replicaSelector": {"name": "jboss"},
      "podTemplate": {
        "desiredState": {
           "manifest": {
             "version": "v1beta1",
             "id": "jboss-wildfly-controller",
             "volumes": [
               { "name": "wildfly-log", "source": { "emptyDir": {} } },
               { "name": "wildfly-servers", "source": { "emptyDir": {} } }
             ],
             "containers": [{
               "name": "jboss-wildfly",
               "image": "vnguyen/jboss-wildfly-admin",
               "ports": [{"containerPort": 8080, "hostPort": 31008}, {"containerPort": 9990, "hostPort": 31009}],
               "volumeMounts": [
                 { "name": "wildfly-log", "path": "/opt/jboss/wildfly/domain/log" },
                 { "name": "wildfly-servers", "path": "/opt/jboss/wildfly/domain/servers" }
               ]
             }]
           }
         },
         "labels": {"name": "jboss"}
        }},
    "labels": {"name": "jboss-wildfly"}
}

Something else to check: does your mesos master report any orphan tasks? If so, they may be falsely consuming resources (like host ports).
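
One way to check for orphans, assuming your Mesos version exposes orphan_tasks in the master's state.json; $mesosmaster is a placeholder and 5050 is the default master port:

:; curl -s http://$mesosmaster:5050/master/state.json | \
     python -c 'import json,sys; print(json.load(sys.stdin).get("orphan_tasks", []))'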

@brandon-adams
Author

I've never gotten an error like that from running the container I'm using. In fact, the containers themselves are definitely being created without error. The entire pod is being created, just reported as Failed, so the controller tries to create it again, thus running into the port conflict. The pod is reported as Failed no matter what container I use. So far I've tried a few: JBoss, RabbitMQ, nginx, MongoDB, and others.

My slave is reporting the resource as consumed, and the docker daemon is showing the containers as running, but kubectl is showing the pod as Failed.

I tried your config file just to be sure, and it did the same thing.

@jdef

jdef commented Jan 29, 2015

Can you try querying etcd and the kubelet APIs? For example:

:; curl http://$servicehost:4001/v2/keys/registry/pods/default/
{"action":"get","node":{"key":"/registry/pods/default","dir":true,"nodes":[{"key":"/registry/pods/default/nginx-id-01","value":"{\"kind\":\"Pod\",\"id\":\"nginx-id-01\",\"uid\":\"da4a6804-a775-11e4-9155-04012f416701\",\"creationTimestamp\":\"2015-01-29T05:15:34Z\",\"resourceVersion\":15,\"apiVersion\":\"v1beta1\",\"namespace\":\"default\",\"labels\":{\"cluster\":\"gce\",\"name\":\"foo\"},\"desiredState\":{\"manifest\":{\"version\":\"v1beta2\",\"id\":\"\",\"volumes\":null,\"containers\":[{\"name\":\"nginx-01\",\"image\":\"dockerfile/nginx\",\"ports\":[{\"hostPort\":31000,\"containerPort\":80,\"protocol\":\"TCP\"}],\"livenessProbe\":{\"httpGet\":{\"path\":\"/index.html\",\"port\":\"8081\"},\"initialDelaySeconds\":30},\"imagePullPolicy\":\"\"}],\"restartPolicy\":{\"always\":{}}}},\"currentState\":{\"manifest\":{\"version\":\"\",\"id\":\"\",\"volumes\":null,\"containers\":null,\"restartPolicy\":{}},\"status\":\"Waiting\",\"host\":\"10.132.189.240\"}}","modifiedIndex":17,"createdIndex":15}],"modifiedIndex":15,"createdIndex":15}}

:; curl http://$slaveip:10250/podInfo?podID=nginx-id-01\&podNamespace=default

{"net":{"state":{"running":{"startedAt":"2015-01-29T05:15:40.220382019Z"}},"restartCount":0,"podIP":"172.17.36.49","image":"kubernetes/pause:latest"},"nginx-01":{"state":{"running":{"startedAt":"2015-01-29T05:15:40.463949269Z"}},"restartCount":0,"image":"dockerfile/nginx"}}

... and your minions...

:; curl http://$servicehost:8888/api/v1beta2/minions
{
  "kind": "MinionList",
  "creationTimestamp": null,
  "selfLink": "/api/v1beta2/minions",
  "resourceVersion": 24,
  "apiVersion": "v1beta2",
  "items": [
    {
      "id": "10.132.189.243",
      "uid": "b7af1eb2-a775-11e4-9155-04012f416701",
      "creationTimestamp": "2015-01-29T05:14:36Z",
      "selfLink": "/api/v1beta2/minions/10.132.189.243",
      "resourceVersion": 6,
      "resources": {}
    },
    {
      "id": "10.132.189.240",
      "uid": "b7b11029-a775-11e4-9155-04012f416701",
      "creationTimestamp": "2015-01-29T05:14:36Z",
      "selfLink": "/api/v1beta2/minions/10.132.189.240",
      "resourceVersion": 7,
      "resources": {}
    },
    {
      "id": "10.132.189.242",
      "uid": "b7b36c72-a775-11e4-9155-04012f416701",
      "creationTimestamp": "2015-01-29T05:14:36Z",
      "selfLink": "/api/v1beta2/minions/10.132.189.242",
      "resourceVersion": 8,
      "resources": {}
    }
  ]
}

@brandon-adams
Author

curl http://$servicehost:4001/v2/keys/registry/pods/default/

{"action":"get","node":{"key":"/registry/pods/default","dir":true,"nodes":[{"key":"/registry/pods/default/jboss-pod","value":"{\"kind\":\"Pod\",\"id\":\"jboss-pod\",\"uid\":\"26bc871c-a7bb-11e4-a805-fa163e3c002e\",\"creationTimestamp\":\"2015-01-29T13:31:37Z\",\"resourceVersion\":199,\"apiVersion\":\"v1beta1\",\"namespace\":\"default\",\"labels\":{\"name\":\"jboss\"},\"desiredState\":{\"manifest\":{\"version\":\"v1beta2\",\"id\":\"\",\"volumes\":null,\"containers\":[{\"name\":\"wildfly\",\"image\":\"vnguyen/jboss-wildfly-admin\",\"ports\":[{\"hostPort\":31000,\"containerPort\":8080,\"protocol\":\"TCP\"},{\"hostPort\":31010,\"containerPort\":9990,\"protocol\":\"TCP\"}],\"imagePullPolicy\":\"\"}],\"restartPolicy\":{\"always\":{}},\"dnsPolicy\":\"ClusterFirst\"}},\"currentState\":{\"manifest\":{\"version\":\"\",\"id\":\"\",\"volumes\":null,\"containers\":null,\"restartPolicy\":{}},\"status\":\"Waiting\",\"host\":\"mesos-slave-2\"}}","modifiedIndex":200,"createdIndex":199}],"modifiedIndex":10,"createdIndex":10}}

curl http://mesos-slave-2:10250/podInfo?podID=jboss-pod\&podNamespace=default

{"net":{"state":{"running":{"startedAt":"2015-01-29T13:31:38Z"}},"restartCount":1,"podIP":"172.17.0.4","image":"kubernetes/pause:latest","containerID":"docker://ec1892cd54f4db2be49025556db0b467501210a5289ffe9439c0a8a0a8cbc597"},"wildfly":{"state":{"running":{"startedAt":"2015-01-29T13:31:38Z"}},"restartCount":1,"image":"vnguyen/jboss-wildfly-admin","containerID":"docker://036a01d5975cd74bf89f28f61b60b83c0ba79920b848500ec7866184c4c6186f"}}

curl http://$servicehost:8888/api/v1beta2/minions

{
  "kind": "MinionList",
  "creationTimestamp": null,
  "selfLink": "/api/v1beta2/minions",
  "resourceVersion": 207,
  "apiVersion": "v1beta2",
  "items": [
    {
      "id": "23.23.23.56",
      "uid": "adc3b379-a720-11e4-a805-fa163e3c002e",
      "creationTimestamp": "2015-01-28T19:05:52Z",
      "selfLink": "/api/v1beta2/minions/23.23.23.56",
      "resourceVersion": 60,
      "hostIP": "23.23.23.56",
      "resources": {},
      "status": {}
    },
    {
      "id": "23.23.23.50",
      "uid": "a5372e50-a716-11e4-a805-fa163e3c002e",
      "creationTimestamp": "2015-01-28T17:54:03Z",
      "selfLink": "/api/v1beta2/minions/23.23.23.50",
      "resourceVersion": 7,
      "hostIP": "23.23.23.50",
      "resources": {},
      "status": {}
    },
    {
      "id": "23.23.23.51",
      "uid": "3d1f3cc4-a721-11e4-a805-fa163e3c002e",
      "creationTimestamp": "2015-01-28T19:09:52Z",
      "selfLink": "/api/v1beta2/minions/23.23.23.51",
      "resourceVersion": 62,
      "hostIP": "23.23.23.51",
      "resources": {},
      "status": {}
    }
  ]
}

@jdef

jdef commented Jan 29, 2015

I suspect that the problem may be that the -hostname flag of the slave is an actual hostname instead of an IP address. The minion list shows only IP addresses and no hostnames, which probably tricks the kubernetes apiserver into thinking there's a problem with the minion (see https://github.com/GoogleCloudPlatform/kubernetes/blob/v0.8.2/pkg/master/pod_cache.go#L141).

Try converting your slaves to use -hostname=$(their ip address).
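
A sketch of what that restart might look like, treating $ZK_HOST and $SLAVE_IP as placeholders and keeping whatever other flags you already pass:

# Re-register the slave under its IP address instead of its hostname.
mesos-slave --master=zk://$ZK_HOST:2181/mesos --hostname=$SLAVE_IP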

The real "bug" here is probably the way I've implemented the mesos cloud provider to use IP addresses and not hostnames (https://github.com/mesosphere/kubernetes-mesos/blob/master/pkg/cloud/mesos/client.go#L40).

@jdef jdef added the class/bug label Jan 29, 2015
@brandon-adams
Author

@jdef That was it, I can see running pods in the list now. Replication works now as well. That seems like a relatively easy fix for as much headache as it was giving me. I guess this also solves #135. Thanks for the help!

@jdef

jdef commented Jan 29, 2015

You're welcome, and thanks for reporting the problem.

@mikesplain

Ahh yes that worked for me too. Thanks @jdef!
