This repository has been archived by the owner on Jan 8, 2024. It is now read-only.

Task not found response from VGM #39

Closed
ghost opened this issue Jul 11, 2017 · 8 comments

Comments

@ghost

ghost commented Jul 11, 2017

I'm using DC/OS 1.9 and Vault 0.7.3. When deploying my container, an entrypoint sh script runs and tries to grab the wrapped token from VGM, but I only get a "task not found" error. If I wait for the deploy to finish and then issue the same command manually from one of the nodes, VGM works as expected and returns the wrapped token. Any ideas on this?

@nemosupremo
Owner

  1. Can you dump the environment in your entry point script to ensure that the MESOS_TASK_ID variable is coming through?

  2. Another possibility is a race condition between when the task actually starts and when its RUNNING status is recognized in Mesos. See this comment for how we try to work around this. If this is the issue you are facing, you could just sleep for a bit before trying to get a token.
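
As an alternative to a single fixed sleep, the request could be retried a few times. The following is only a sketch: `retry` and `fetch_token` are hypothetical helper names, not part of Gatekeeper, and treating a body containing `"ok":false` as a failure is an assumption based on the error response shown in this thread.

```shell
#!/bin/sh
# retry ATTEMPTS DELAY COMMAND [ARGS...]
# Runs COMMAND until it exits 0 or ATTEMPTS runs have failed,
# sleeping DELAY seconds between attempts.
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while :; do
        "$@" && return 0
        if [ "$n" -ge "$attempts" ]; then
            return 1
        fi
        n=$((n + 1))
        sleep "$delay"
    done
}

# Hypothetical helper: prints the response body on success, returns
# non-zero if curl fails or the body contains "ok":false (assumption).
# VGM_URL and MESOS_TASK_ID are expected to be set in the environment.
fetch_token() {
    resp=$(curl -s --request POST "$VGM_URL" \
        -d '{"task_id":"'"$MESOS_TASK_ID"'"}') || return 1
    case $resp in
        *'"ok":false'*) return 1 ;;
    esac
    printf '%s\n' "$resp"
}

# Example call in an entrypoint (left commented out so the sketch
# runs without a network): five attempts, two seconds apart.
# cubby_tok=$(retry 5 2 fetch_token) || exit 1
```

The attempt count and delay are arbitrary; they would need tuning to how long Mesos takes to report the task as running in a given cluster.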

@ghost
Author

ghost commented Jul 11, 2017

  1. This is the output. I echo the MESOS_TASK_ID right at the beginning:

    Starting task php5-base.8dd913f4-65fb-11e7-b921-22898b90ba0e
    php5-base.8dd913f4-65fb-11e7-b921-22898b90ba0e
    ...
    sleeping for 60 seconds
    {"status":"Unsealed","ok":false,"error":"No such task."}
    
  2. I've already tried sleeping for as long as 60 seconds. Nothing changes. Here is part of my script.

     echo $MESOS_TASK_ID;
     echo $VAULT_URL;
     echo $VGM_URL;
    
     #get the temp token from VGM, extract the token with jq and clean the quotes with tr
     #cubbytok=`curl --request POST "$VGM_URL" -d '{"task_id":"$MESOS_TASK_ID"}' | jq '.token' | tr -d '"'`;
     echo "sleeping for 60 seconds";
     sleep 60s;
     cubby_tok=`curl --request POST "$VGM_URL" -d '{"task_id":"$MESOS_TASK_ID"}'`;
     echo $cubby_tok;
    

@ghost
Author

ghost commented Jul 11, 2017

The task is also running. I did a curl -iv http://leader.mesos:5050/state.json while it was sleeping and the statuses array turns out OK. See below (the id is for another deployed instance).

   {
      "id": "php5-base.41f631bd-6606-11e7-b921-22898b90ba0e",
      "name": "php5-base",
      "framework_id": "44852eb3-ac9c-4e15-9a50-f30097b65f3b-0000",
      "executor_id": "",
      "slave_id": "44852eb3-ac9c-4e15-9a50-f30097b65f3b-S2",
      "state": "TASK_RUNNING",
      "resources": {
        "disk": 0.0,
        "mem": 1024.0,
        "gpus": 0.0,
        "cpus": 0.2,
        "ports": "[28782-28782]"
      },
      "statuses": [
        {
          "state": "TASK_RUNNING",
          "timestamp": 1499756273.33255,
          "container_status": {
            "container_id": {
              "value": "2bac9204-d7d1-4de6-9924-e1d698e668bf"
            },
            "network_infos": [
              {
                "ip_addresses": [
                  {
                    "ip_address": "10.0.0.134"
                  }
                ]
              }
            ]
          }
        }
      ],
      "discovery": {
        "visibility": "FRAMEWORK",
        "name": "php5-base",
        "ports": {
          "ports": [
            {
              "number": 28782,
              "name": "default",
              "protocol": "tcp"
            }
          ]
        }
      },
      "container": {
        "type": "DOCKER",
        "docker": {
          "image": "10.11.0.10/php5-base",
          "network": "HOST",
          "privileged": false,
          "parameters": [
            {
              "key": "label",
              "value": "MESOS_TASK_ID=php5-base.41f631bd-6606-11e7-b921-22898b90ba0e"
            }
          ],
          "force_pull_image": true
        }
      }
    }

@nemosupremo
Owner

Can you also post the Gatekeeper logs?

@ghost
Author

ghost commented Jul 11, 2017

I think I found it. The frameworks > framework > tasks array is empty. It appears that the DC/OS - Mesos guys moved them under completed_tasks and now the state.json looks like this:

...
"frameworks": [
  {
    "id": "44852eb3-ac9c-4e15-9a50-f30097b65f3b-0004",
    "name": "executor",
    "pid": "scheduler-b14bd8bc-f4c5-4a69-a4a3-8a4358fd1016@10.0.1.4:44865",
    "used_resources": {
      "disk": 0.0,
      "mem": 0.0,
      "gpus": 0.0,
      "cpus": 0.0
    },
    "offered_resources": {
      "disk": 0.0,
      "mem": 0.0,
      "gpus": 0.0,
      "cpus": 0.0
    },
    "capabilities": [],
    "hostname": "ip-....ec2.internal",
    "webui_url": "http://10...4:17867/",
    "active": true,
    "connected": true,
    "recovered": false,
    "user": "root",
    "failover_timeout": 0.0,
    "checkpoint": true,
    "registered_time": 1499329105.63113,
    "unregistered_time": 0.0,
    "principal": "jenkins",
    "resources": {
      "disk": 0.0,
      "mem": 0.0,
      "gpus": 0.0,
      "cpus": 0.0
    },
    "role": "*",
    "tasks": [],
    "unreachable_tasks": [],
    "completed_tasks": [
      {**!!!THE TASK IS HERE!!!**}
    ],
    "offers": [],
    "executors": []
  },
  ...

@nemosupremo
Owner

I'm not sure if the DC/OS guys did this. It looks like it may have been a change in Mesos 1.2: MESOS-6619.

Although it isn't immediately clear to me why they would name that field completed_tasks if the task is currently running, or why the tasks array is now empty.

@ghost
Author

ghost commented Jul 11, 2017

You're right, my bad. It's in the tasks array.

@ghost
Author

ghost commented Jul 11, 2017

Found it after checking the logs. It was a mistake on my side. I forgot to add some quotes to the curl post payload.

Incorrect:

cubby_tok=`curl --request POST "$VGM_URL" -d '{"task_id":"$MESOS_TASK_ID"}' | jq '.token' | tr -d '"'`;

Correct:

cubby_tok=`curl --request POST "$VGM_URL" -d '{"task_id":"'"$MESOS_TASK_ID"'"}' | jq '.token' | tr -d '"'`;
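
The difference between the two commands comes down to shell quoting: inside single quotes, `$MESOS_TASK_ID` is sent literally and never expanded. A minimal sketch of the three variants follows; the task ID value is a placeholder, and the `printf` form is just one alternative way to build the same payload.

```shell
#!/bin/sh
# Placeholder task ID for illustration only.
MESOS_TASK_ID="php5-base.example-task"

# Single quotes: the shell passes $MESOS_TASK_ID through literally.
bad='{"task_id":"$MESOS_TASK_ID"}'

# Close the single quotes, expand inside double quotes, reopen single quotes.
good='{"task_id":"'"$MESOS_TASK_ID"'"}'

# Same result, arguably easier to read: let printf fill in the value.
alt=$(printf '{"task_id":"%s"}' "$MESOS_TASK_ID")

printf '%s\n' "$bad" "$good" "$alt"
```

Neither form JSON-escapes the value, so this only holds for IDs without quotes or backslashes. Separately, `jq -r '.token'` prints the raw string without surrounding quotes, which would make the trailing `tr -d '"'` unnecessary.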

@ghost ghost closed this as completed Jul 11, 2017
This issue was closed.