
Ports values were swapped in the UI based on Mesos data #4781

Closed
marcomonaco opened this issue Dec 6, 2016 · 5 comments

marcomonaco commented Dec 6, 2016

copy of https://mesosphere.atlassian.net/browse/DCOS-9629
reported by @lloesche

I have a service that uses persistent volumes and two tcp ports.
When the service is restarted, it comes back on the same host using the same random host ports. However, the mapping of container to host port might be shuffled.
E.g.
container 9090 -> host 21926
container 9093 -> host 21927
RESTART
container 9090 -> host 21927
container 9093 -> host 21926

However, the DC/OS UI shows the host-to-container port mapping incorrectly every now and then.
It's correct after the initial start but sometimes wrong after the ports have been shuffled by a restart.
[screenshot: screen shot 2016-09-06 at 19 04 13]
[attachment: state.json.txt]

Example /service/marathon/v2/apps/prometheus/server output fetched when the screenshot above was taken.

{
  "app": {
    "id": "/prometheus/server",
    "cmd": null,
    "args": null,
    "user": null,
    "env": {
      "PAGERDUTY_KEY": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
      "NODE_EXPORTER_SRV": "_node-exporter.prometheus._tcp.marathon.mesos"
    },
    "instances": 1,
    "cpus": 0.5,
    "mem": 2048,
    "disk": 0,
    "gpus": 0,
    "executor": "",
    "constraints": [],
    "uris": [],
    "fetch": [],
    "storeUrls": [],
    "backoffSeconds": 1,
    "backoffFactor": 1.15,
    "maxLaunchDelaySeconds": 3600,
    "container": {
      "type": "DOCKER",
      "volumes": [
        {
          "containerPath": "prometheus",
          "mode": "RW",
          "persistent": {
            "size": 1024
          }
        }
      ],
      "docker": {
        "image": "lloesche/prometheus-dcos",
        "network": "BRIDGE",
        "portMappings": [
          {
            "containerPort": 9090,
            "hostPort": 0,
            "servicePort": 10001,
            "protocol": "tcp",
            "labels": {
              "VIP_0": "/prometheus/server:9090"
            }
          },
          {
            "containerPort": 9093,
            "hostPort": 0,
            "servicePort": 10002,
            "protocol": "tcp",
            "labels": {
              "VIP_1": "/prometheus/server:9093"
            }
          }
        ],
        "privileged": false,
        "parameters": [],
        "forcePullImage": true
      }
    },
    "healthChecks": [
      {
        "path": "/metrics",
        "protocol": "HTTP",
        "portIndex": 0,
        "gracePeriodSeconds": 300,
        "intervalSeconds": 60,
        "timeoutSeconds": 20,
        "maxConsecutiveFailures": 3,
        "ignoreHttp1xx": false
      }
    ],
    "readinessChecks": [],
    "dependencies": [],
    "upgradeStrategy": {
      "minimumHealthCapacity": 0.5,
      "maximumOverCapacity": 0
    },
    "labels": {},
    "acceptedResourceRoles": null,
    "ipAddress": null,
    "version": "2016-09-06T16:07:36.977Z",
    "residency": {
      "relaunchEscalationTimeoutSeconds": 3600,
      "taskLostBehavior": "WAIT_FOREVER"
    },
    "secrets": {},
    "taskKillGracePeriodSeconds": null,
    "ports": [
      10001,
      10002
    ],
    "portDefinitions": [
      {
        "port": 10001,
        "protocol": "tcp",
        "labels": {}
      },
      {
        "port": 10002,
        "protocol": "tcp",
        "labels": {}
      }
    ],
    "requirePorts": false,
    "versionInfo": {
      "lastScalingAt": "2016-09-06T16:07:36.977Z",
      "lastConfigChangeAt": "2016-09-06T13:33:54.912Z"
    },
    "tasksStaged": 0,
    "tasksRunning": 1,
    "tasksHealthy": 1,
    "tasksUnhealthy": 0,
    "deployments": [],
    "tasks": [
      {
        "id": "prometheus_server.9f5e60f6-7435-11e6-b7c8-70b3d5800001",
        "slaveId": "cdf14879-60f4-484e-8179-b8962c61322e-S5",
        "host": "167.114.254.10",
        "state": "TASK_RUNNING",
        "startedAt": "2016-09-06T16:07:51.070Z",
        "stagedAt": "2016-09-06T16:07:42.513Z",
        "ports": [
          21927,
          21926
        ],
        "version": "2016-09-06T16:07:36.977Z",
        "ipAddresses": [
          {
            "ipAddress": "172.17.0.2",
            "protocol": "IPv4"
          }
        ],
        "localVolumes": [
          {
            "containerPath": "prometheus",
            "persistenceId": "prometheus_server#prometheus#9f5debc5-7435-11e6-b7c8-70b3d5800001"
          }
        ],
        "appId": "/prometheus/server",
        "healthCheckResults": [
          {
            "alive": true,
            "consecutiveFailures": 0,
            "firstSuccess": "2016-09-06T16:08:37.051Z",
            "lastFailure": null,
            "lastSuccess": "2016-09-06T17:02:38.127Z",
            "lastFailureCause": null,
            "taskId": "prometheus_server.9f5e60f6-7435-11e6-b7c8-70b3d5800001"
          }
        ]
      }
    ],
    "lastTaskFailure": {
      "appId": "/prometheus/server",
      "host": "167.114.254.10",
      "message": "Container terminated",
      "state": "TASK_FAILED",
      "taskId": "prometheus_server.9f5e60f6-7435-11e6-b7c8-70b3d5800001",
      "timestamp": "2016-09-06T16:07:38.346Z",
      "version": "2016-09-06T16:07:36.977Z",
      "slaveId": "cdf14879-60f4-484e-8179-b8962c61322e-S5"
    }
  }
}

The Load Balancer UI shows the correct mapping, by the way; it's only the Service view that doesn't.
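
For anyone hitting this, a minimal way to cross-check which mapping is real is to compare Marathon's per-task ports with the port_mappings the Mesos master reports. A sketch in Python (the cluster URL, auth token, and the exact /mesos/master/state.json nesting are assumptions, not taken from this issue):

import json
import urllib.request

DCOS_URL = "https://dcos.example.com"   # hypothetical cluster URL
TOKEN = "..."                           # DC/OS auth token, if the cluster requires one
APP_ID = "/prometheus/server"

def get(path):
    # Fetch a JSON document through the admin router.
    req = urllib.request.Request(DCOS_URL + path,
                                 headers={"Authorization": "token=" + TOKEN})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Marathon's view: host ports per task, in portMappings order.
app = get("/service/marathon/v2/apps" + APP_ID)["app"]
for task in app["tasks"]:
    print("Marathon", task["id"], "ports:", task["ports"])

# Mesos master's view: the docker port_mappings recorded for each task.
state = get("/mesos/master/state.json")
for framework in state.get("frameworks", []):
    for task in framework.get("tasks", []):
        docker = task.get("container", {}).get("docker", {})
        for pm in docker.get("port_mappings", []):
            print("Mesos", task["id"], pm["container_port"], "->", pm["host_port"])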

@marcomonaco marcomonaco added the bug label Dec 6, 2016
@marcomonaco marcomonaco added this to the Marathon 1.4 milestone Dec 6, 2016
@jdef jdef added the gui label Dec 7, 2016

jdef commented Dec 7, 2016

It sounds like this is a DC/OS UI bug rather than a Marathon bug.


unterstein commented Dec 7, 2016

I think the data source for this display is the groups API (the UI always uses the groups endpoint), inside the nodes under $appId > container. I don't think the UI manipulates this data. @wavesoft @orlandohohmeier please confirm :)
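
For reference, a quick way to look at exactly what that endpoint returns is something like the sketch below (the Marathon base URL and the embed parameters are assumptions about a standard setup, not taken from this issue):

import json
import urllib.request

MARATHON = "http://localhost:8080"  # hypothetical Marathon base URL

# Embed apps and their tasks so both the configured portMappings and the
# per-task host ports show up in one response.
url = (MARATHON + "/v2/groups"
       "?embed=group.groups&embed=group.apps&embed=group.apps.tasks")
with urllib.request.urlopen(url) as resp:
    root = json.load(resp)

def walk(group):
    for app in group.get("apps", []):
        mappings = (app.get("container", {})
                       .get("docker", {})
                       .get("portMappings", []) or [])
        print(app["id"], "containerPorts:",
              [m["containerPort"] for m in mappings])
        for task in app.get("tasks", []):
            print("  task", task["id"], "host ports:", task.get("ports"))
    for sub in group.get("groups", []):
        walk(sub)

walk(root)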

unterstein commented

Talked to @lloesche; he said that this happens only for resident tasks and that it is a UI issue. The API worked as expected. I tried to reproduce this on a current testing/master cluster but it did not happen within 20 restarts.
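
The restart loop for that was roughly of the following shape (a sketch, assuming a Marathon reachable at http://localhost:8080; the app id and the sleep interval are placeholders):

import time
import urllib.request

MARATHON = "http://localhost:8080"   # hypothetical Marathon base URL
APP_ID = "/my-resident-app"          # hypothetical resident app id

for i in range(20):
    # POST /v2/apps/{appId}/restart kills the running tasks and relaunches them.
    req = urllib.request.Request(MARATHON + "/v2/apps" + APP_ID + "/restart",
                                 data=b"", method="POST")
    with urllib.request.urlopen(req) as resp:
        print("restart", i + 1, "->", resp.status)
    # Give the resident task time to come back before the next restart.
    time.sleep(60)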

@unterstein unterstein self-assigned this Dec 8, 2016
@unterstein unterstein removed the ready label Dec 8, 2016

unterstein commented Dec 8, 2016

With the following app definition this behavior is reproducible:

{
  "id": "/sleepy",
  "cmd": "sleep 1000",
  "cpus": 1,
  "mem": 128,
  "disk": 100,
  "instances": 1,
  "executor": null,
  "fetch": null,
  "constraints": null,
  "acceptedResourceRoles": null,
  "user": null,
  "container": {
    "docker": {
      "image": "ubuntu",
      "forcePullImage": false,
      "privileged": false,
      "portMappings": [
        {
          "containerPort": 80,
          "protocol": "tcp"
        },
        {
          "containerPort": 443,
          "protocol": "tcp"
        }
      ],
      "network": "BRIDGE"
    },
    "type": "DOCKER",
    "volumes": [
      {
        "containerPath": "data",
        "persistent": {
          "size": 100
        },
        "mode": "RW"
      }
    ]
  },
  "updateStrategy": {
    "maximumOverCapacity": 0,
    "minimumHealthCapacity": 0
  },
  "residency": {
    "relaunchEscalationTimeoutSeconds": 10,
    "taskLostBehavior": "WAIT_FOREVER"
  },
  "healthChecks": null,
  "env": null
}
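
That definition can be posted straight to Marathon; a minimal sketch (assuming the JSON above is saved as sleepy.json and Marathon answers at http://localhost:8080):

import json
import urllib.request

MARATHON = "http://localhost:8080"  # hypothetical Marathon base URL

# Read the app definition shown above and POST it to /v2/apps.
with open("sleepy.json", "rb") as f:
    body = f.read()

req = urllib.request.Request(MARATHON + "/v2/apps", data=body, method="POST",
                             headers={"Content-Type": "application/json"})
with urllib.request.urlopen(req) as resp:
    print("HTTP", resp.status, "created", json.load(resp).get("id"))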

UI:

{
    "type": "DOCKER",
    "docker": {
        "image": "ubuntu",
        "network": "BRIDGE",
        "port_mappings": [
            {
                "host_port": 24781,
                "container_port": 80,
                "protocol": "tcp"
            },
            {
                "host_port": 24782,
                "container_port": 443,
                "protocol": "tcp"
            }
        ],
        "privileged": false,
        "force_pull_image": false
    }
}

Docker daemon:

$ docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                                           NAMES
eb9d70971e7f        ubuntu              "/bin/sh -c 'sleep 10"   48 seconds ago      Up 48 seconds       0.0.0.0:24782->80/tcp, 0.0.0.0:24781->443/tcp   mesos-a6893e50-40a4-48cb-b4c2-b9e63bd2a08d-S0.2afc67d1-a6c9-44ae-92c5-784cdc56fcfa
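
The mismatch above (the data shown in the UI pairs host 24781 with container 80, while Docker pairs host 24782 with container 80) can also be confirmed straight from the Docker daemon, e.g. with a sketch like this (the container id is taken from the docker ps output above):

import json
import subprocess

CONTAINER = "eb9d70971e7f"  # container id from the docker ps output above

# NetworkSettings.Ports maps "containerPort/proto" to the bound host ports.
out = subprocess.check_output(
    ["docker", "inspect", "--format", "{{json .NetworkSettings.Ports}}", CONTAINER])
ports = json.loads(out)
for container_port, bindings in sorted(ports.items()):
    for binding in bindings or []:
        print(container_port, "->", binding["HostIp"] + ":" + binding["HostPort"])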

@unterstein unterstein reopened this Dec 8, 2016
unterstein commented

Ok, investigated again:

Marathon re-uses task IDs for resident tasks, which means that the Mesos state.json will return something like this:

[
  {
    "id": "m2.6d62fd63-be00-11e6-ba0c-d23a4f1e7619",
    ...
    "port_mappings": [
      {
        "host_port": 17294,
        "container_port": 80,
        "protocol": "tcp"
      },
      {
        "host_port": 17295,
        "container_port": 443,
        "protocol": "tcp"
      }
    ]
  },
  {
    "id": "m2.6d62fd63-be00-11e6-ba0c-d23a4f1e7619",
    ...
    "port_mappings": [
      {
        "host_port": 17295,
        "container_port": 80,
        "protocol": "tcp"
      },
      {
        "host_port": 17294,
        "container_port": 443,
        "protocol": "tcp"
      }
    ]
  }
]

and when the user clicks on a particular task in the UI, the UI tries to request the information for the task with the id m2.6d62fd63-be00-11e6-ba0c-d23a4f1e7619. But this id is present multiple times within this JSON array, and it is not possible to decide which entry to choose to display the data. Closing this in favor of this GH issue, which addresses the root cause: #4819
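
A minimal sketch of how that ambiguity shows up when grouping a dumped state.json by task id (the file name and the frameworks/tasks nesting follow the usual Mesos master state layout, assumed here):

import json
from collections import defaultdict

# state.json as dumped from the Mesos master (file name is an assumption).
with open("state.json") as f:
    state = json.load(f)

tasks_by_id = defaultdict(list)
for framework in state.get("frameworks", []):
    for task in framework.get("tasks", []) + framework.get("completed_tasks", []):
        tasks_by_id[task["id"]].append(task)

# Resident tasks re-use their id across launches, so a lookup by id alone
# can return several records, each with its own port_mappings.
for task_id, records in tasks_by_id.items():
    if len(records) > 1:
        print(task_id, "appears", len(records), "times in state.json")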

@mesosphere mesosphere locked and limited conversation to collaborators Mar 27, 2017