job stopped and not restarted when vault timeout #8556

Open
chris93111 opened this issue Jul 29, 2020 · 1 comment

Comments

chris93111 commented Jul 29, 2020

Nomad version

Nomad v0.11.3 (8918fc8)

Operating system and Environment details

redhat 7

Issue

When a Vault request times out, the job goes down and Nomad does not try to restart it or recheck whether Vault is available again. The job has to be restarted manually to work again.

Reproduction steps

Start the job, then stop Vault.

Job file (if appropriate)

group "node" {
  restart {
    attempts = 3
    delay    = "120s"
  }

  task "web" {
    template {
      data = <<EOF
SECRET_KEY = "{{with secret "config/production"}}{{.Data.data.secret_key}}{{end}}"
DATABASE_NAME = "{{with secret "config/production"}}{{.Data.data.pg_db_name}}{{end}}"
DATABASE_PASSWORD = "{{with secret "config/production"}}{{.Data.data.pg_db_pass}}{{end}}"
DATABASE_USER = "{{with secret "config/production"}}{{.Data.data.pg_db_user}}{{end}}"
DATABASE_PORT = "{{with secret "config/production"}}{{.Data.data.pg_db_port}}{{end}}"
DATABASE_HOST = "{{with secret "config/production"}}{{.Data.data.pg_db_host}}{{end}}"
ADMIN_PASSWORD = "{{with secret "config/production"}}{{.Data.data.admin_default_pass}}{{end}}"
VERSION = "{{with secret "config/production"}}{{.Data.data.version}}{{end}}"
EOF

      destination = "secrets/file.env"
      change_mode = "restart"
      env         = true
    }
  }
}

Nomad Server logs (if appropriate)

Jul 28 09:38:11 nomad01 nomad[14097]: 2020/07/28 09:38:11.533272 [WARN] (view) catalog.nodes: Unexpected response code: 500 (No known Consul servers) (retry attempt 2 after "500ms")
Jul 28 09:38:11 nomad01 nomad[14097]: (view) catalog.nodes: Unexpected response code: 500 (No known Consul servers) (retry attempt 2 after "500ms")
Jul 28 09:38:12 nomad01 nomad[14097]: (view) catalog.nodes: Unexpected response code: 500 (No known Consul servers) (retry attempt 3 after "1s")
Jul 28 09:38:12 nomad01 nomad[14097]: 2020/07/28 09:38:12.035696 [WARN] (view) catalog.nodes: Unexpected response code: 500 (No known Consul servers) (retry attempt 3 after "1s")
Jul 28 09:38:13 nomad01 nomad[14097]: 2020/07/28 09:38:13.037049 [WARN] (view) catalog.nodes: Unexpected response code: 500 (No known Consul servers) (retry attempt 4 after "2s")
Jul 28 09:38:13 nomad01 nomad[14097]: (view) catalog.nodes: Unexpected response code: 500 (No known Consul servers) (retry attempt 4 after "2s")
Jul 28 09:38:15 nomad01 nomad[14097]: 2020/07/28 09:38:15.038490 [WARN] (view) catalog.nodes: Unexpected response code: 500 (No known Consul servers) (retry attempt 5 after "4s")
Jul 28 09:38:15 nomad01 nomad[14097]: (view) catalog.nodes: Unexpected response code: 500 (No known Consul servers) (retry attempt 5 after "4s")
Jul 28 09:38:19 nomad01 nomad[14097]: 2020/07/28 09:38:19.039867 [WARN] (view) catalog.nodes: Unexpected response code: 500 (No known Consul servers) (retry attempt 6 after "8s")
Jul 28 09:38:19 nomad01 nomad[14097]: (view) catalog.nodes: Unexpected response code: 500 (No known Consul servers) (retry attempt 6 after "8s")
Jul 28 09:38:27 nomad01 nomad[14097]: (view) catalog.nodes: Unexpected response code: 500 (No known Consul servers) (retry attempt 7 after "16s")
Jul 28 09:38:27 nomad01 nomad[14097]: 2020/07/28 09:38:27.043118 [WARN] (view) catalog.nodes: Unexpected response code: 500 (No known Consul servers) (retry attempt 7 after "16s")
Jul 28 09:38:43 nomad01 nomad[14097]: (view) catalog.nodes: Unexpected response code: 500 (No known Consul servers) (retry attempt 8 after "32s")
Jul 28 09:38:43 nomad01 nomad[14097]: 2020/07/28 09:38:43.047283 [WARN] (view) catalog.nodes: Unexpected response code: 500 (No known Consul servers) (retry attempt 8 after "32s")
Jul 28 09:39:15 nomad01 nomad[14097]: 2020/07/28 09:39:15.048652 [WARN] (view) catalog.nodes: Unexpected response code: 500 (No known Consul servers) (retry attempt 9 after "1m0s")
Jul 28 09:39:15 nomad01 nomad[14097]: (view) catalog.nodes: Unexpected response code: 500 (No known Consul servers) (retry attempt 9 after "1m0s")
Jul 28 09:39:37 nomad01 nomad[14097]: client.fingerprint_mgr.vault: Vault is unavailable
Jul 28 09:39:37 nomad01 nomad[14097]: 2020-07-28T09:39:37.461+0200 [INFO] client.fingerprint_mgr.vault: Vault is unavailable
Jul 28 09:39:43 nomad01 nomad[14097]: 2020-07-28T09:39:43.708+0200 [INFO] client: node registration complete
Jul 28 09:43:15 nomad01 nomad[14097]: 2020/07/28 09:43:15.639889 [INFO] (runner) stopping
Jul 28 09:43:15 nomad01 nomad[14097]: 2020-07-28T09:43:15.839+0200 [INFO] client.gc: marking allocation for GC: alloc_id=cdab424e-0d9c-ec2b-8a49-43f93ba26dae
Jul 28 09:43:15 nomad01 nomad[14097]: client.gc: marking allocation for GC: alloc_id=cdab424e-0d9c-ec2b-8a49-43f93ba26dae
Jul 28 09:44:21 nomad01 nomad[14097]: nomad.vault: failed to revoke tokens. Will reattempt until TTL: error="failed to revoke token (alloc: "c7fdac8a-694c-1cf5-1112-ff64db63a165", node: "783807b5-
Jul 28 09:44:21 nomad01 nomad[14097]: 2020-07-28T09:44:21.005+0200 [WARN] nomad.vault: failed to revoke tokens. Will reattempt until TTL: error="failed to revoke token (alloc: "c7fdac8a-694c-1cf5-
Jul 28 09:44:21 nomad01 nomad[14097]: 2020-07-28T09:44:21.005+0200 [WARN] nomad.vault: failed to revoke tokens. Will reattempt until TTL: error="failed to revoke token (alloc: "23fee0b1-0700-2e64-
Jul 28 09:44:21 nomad01 nomad[14097]: 2020-07-28T09:44:21.005+0200 [WARN] nomad.vault: failed to revoke tokens. Will reattempt until TTL: error="failed to revoke token (alloc: "cdab424e-0d9c-ec2b-
Jul 28 09:44:21 nomad01 nomad[14097]: 2020-07-28T09:44:21.005+0200 [WARN] nomad.vault: failed to revoke tokens. Will reattempt until TTL: error="failed to revoke token (alloc: "bcfe9a88-6ac3-f2d1-
Jul 28 09:44:21 nomad01 nomad[14097]: nomad.vault: failed to revoke tokens. Will reattempt until TTL: error="failed to revoke token (alloc: "23fee0b1-0700-2e64-831b-f29a3fcf0387", node: "d003b8ed-
Jul 28 09:44:21 nomad01 nomad[14097]: 2020-07-28T09:44:21.005+0200 [WARN] nomad.vault: failed to revoke tokens. Will reattempt until TTL: error="failed to revoke token (alloc: "859ebf6b-7c18-8ff1-
Jul 28 09:44:21 nomad01 nomad[14097]: nomad.vault: failed to revoke tokens. Will reattempt until TTL: error="failed to revoke token (alloc: "cdab424e-0d9c-ec2b-8a49-43f93ba26dae", node: "0d509c3e-
Jul 28 09:44:21 nomad01 nomad[14097]: nomad.vault: failed to revoke tokens. Will reattempt until TTL: error="failed to revoke token (alloc: "bcfe9a88-6ac3-f2d1-ce7a-9a4ed47956e2", node: "d7051094-
Jul 28 09:44:21 nomad01 nomad[14097]: nomad.vault: failed to revoke tokens. Will reattempt until TTL: error="failed to revoke token (alloc: "859ebf6b-7c18-8ff1-d624-ceadcc48b86d", node: "3a3d4f32-
Jul 28 09:57:14 nomad01 nomad[14097]: nomad.vault: failed to revoke tokens. Will reattempt until TTL: error="failed to revoke token (alloc: "aae00bd5-34fa-3713-4b43-8436e0c4c6bf", node: "3a3d4f32-
Jul 28 09:57:14 nomad01 nomad[14097]: 2020-07-28T09:57:14.729+0200 [WARN] nomad.vault: failed to revoke tokens. Will reattempt until TTL: error="failed to revoke token (alloc: "aae00bd5-34fa-3713-
Jul 28 09:57:14 nomad01 nomad[14097]: 2020-07-28T09:57:14.730+0200 [WARN] nomad.vault: failed to revoke tokens. Will reattempt until TTL: error="failed to revoke token (alloc: "936703ca-7462-ea0d-
Jul 28 09:57:14 nomad01 nomad[14097]: nomad.vault: failed to revoke tokens. Will reattempt until TTL: error="failed to revoke token (alloc: "936703ca-7462-ea0d-4580-29f595559c55", node: "d7051094-
Jul 28 09:57:47 nomad01 nomad[14097]: nomad.vault: failed to revoke tokens. Will reattempt until TTL: error="failed to revoke token (alloc: "e4e74e06-560e-5b4f-3f41-6b213c0a8e1b", node: "d003b8ed-
Jul 28 09:57:47 nomad01 nomad[14097]: 2020-07-28T09:57:47.516+0200 [WARN] nomad.vault: failed to revoke tokens. Will reattempt until TTL: error="failed to revoke token (alloc: "e4e74e06-560e-5b4f-
Jul 28 09:59:39 nomad01 nomad[14097]: 2020-07-28T09:59:39.270+0200 [WARN] nomad.vault: failed to revoke tokens. Will reattempt until TTL: error="failed to revoke token (alloc: "e4a07819-fcc6-9b7f-
Jul 28 09:59:39 nomad01 nomad[14097]: nomad.vault: failed to revoke tokens. Will reattempt until TTL: error="failed to revoke token (alloc: "e4a07819-fcc6-9b7f-2a98-522f99758b3b", node: "783807b5-
Jul 28 10:03:40 nomad01 nomad[14097]: 2020-07-28T10:03:40.706+0200 [INFO] client.fingerprint_mgr.vault: Vault is available
Jul 28 10:03:40 nomad01 nomad[14097]: client.fingerprint_mgr.vault: Vault is available
Jul 28 10:03:46 nomad01 nomad[14097]: client: node registration complete
Jul 28 10:03:46 nomad01 nomad[14097]: 2020-07-28T10:03:46.727+0200 [INFO] client: node registration complete
Jul 28 10:49:09 nomad01 nomad[14097]: client.gc: garbage collecting allocation: alloc_id=cdab424e-0d9c-ec2b-8a49-43f93ba26dae reason="forced collection"
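
For reference, the `catalog.nodes` retry intervals in the server log above double on each attempt (500ms, 1s, 2s, ... 32s) and then level off at 1m0s. Below is a minimal sketch of that schedule. The 250ms base and the 1 minute cap are assumptions inferred only from these log lines, not taken from the Nomad or consul-template source, and `retry_delay` is a hypothetical helper name. These are Consul catalog retries; the Vault read errors in the client log may follow a different policy.

```python
from datetime import timedelta

# Assumed values, inferred from the log lines above (not from any source code).
BASE = timedelta(milliseconds=250)  # attempt 2 is logged with "500ms"
CAP = timedelta(minutes=1)          # attempt 9 is logged with "1m0s"


def retry_delay(attempt: int) -> timedelta:
    """Delay logged for a given retry attempt: doubles each time, capped at CAP."""
    return min(BASE * (2 ** (attempt - 1)), CAP)


# Print the schedule for the attempts that appear in the log (2 through 9).
for attempt in range(2, 10):
    print(attempt, retry_delay(attempt).total_seconds())
```

With these assumed values the computed delays line up with the logged ones for attempts 2 through 9.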

Nomad Client logs (if appropriate)

Jul 28 09:55:55 node01 nomad[15942]: (view) vault.read(config/production): vault.read(config/production): Get "http:/vault:8200/v1/daXXXXXXXXXXXXXXXXXX i/o timeout
Jul 28 09:55:55 node01 nomad[15942]: 2020/07/28 09:55:55.751116 [WARN] (view) vault.read(config/production): vault.read(config/production): Get "http://vault
Jul 28 09:55:55 node01 nomad[15942]: 2020/07/28 09:55:55.751118 [WARN] (view) vault.read(config/production): vault.read(config/production): Get "http://vault
Jul 28 09:55:55 node01 nomad[15942]: 2020/07/28 09:55:55.751210 [ERR] (runner) watcher reported error: vault.read(config/production): vault.read(config/produc XXXXXXXXXXX i/o timeout
Jul 28 09:55:55 node01 nomad[15942]: (view) vault.read(config/production): vault.read(config/production): Get "http:/vault:8200/v1/da
Jul 28 09:55:55 node01 nomad[15942]: (view) vault.read(config/production): vault.read(config/production): Get "http:/vault:8200/v1/da
Jul 28 09:55:55 node01 nomad[15942]: (runner) watcher reported error: vault.read(config/production): vault.read(config/production): Get "http://vault
Jul 28 09:56:00 node01 nomad[15942]: client.driver_mgr.docker: stopped container: container_id=d7005b117baa2520d93913e676adba5fb7d6336ee218af6fba228af90e2c8616 driver=docker
Jul 28 09:56:00 node01 nomad[15942]: 2020-07-28T09:56:00.944+0200 [INFO] client.driver_mgr.docker: stopped container: container_id=d7005b117baa2520d93913e676adba5fb7d6336ee218af6fba228af90e2c8616
Jul 28 09:56:00 node01 nomad[15942]: (runner) stopping
Jul 28 09:56:00 node01 nomad[15942]: 2020/07/28 09:56:00.990275 [INFO] (runner) stopping
Jul 28 09:56:01 node01 nomad[15942]: (runner) stopping
Jul 28 09:56:01 node01 nomad[15942]: 2020/07/28 09:56:01.296966 [INFO] (runner) stopping
Jul 28 09:56:01 node01 nomad[15942]: 2020/07/28 09:56:01.297036 [INFO] (runner) received finish
Jul 28 09:56:01 node01 nomad[15942]: (runner) received finish
Jul 28 09:56:03 node01 nomad[15942]: 2020/07/28 09:56:03.506544 [INFO] (runner) stopping
Jul 28 09:56:03 node01 nomad[15942]: (runner) stopping
Jul 28 09:56:03 node01 nomad[15942]: 2020/07/28 09:56:03.506680 [INFO] (runner) received finish
Jul 28 09:56:03 node01 nomad[15942]: 2020-07-28T09:56:03.506+0200 [INFO] client.gc: marking allocation for GC: alloc_id=aae00bd5-34fa-3713-4b43-8436e0c4c6bf
Jul 28 09:56:03 node01 nomad[15942]: (runner) received finish

chris93111 (author) commented

#2689

Projects
Status: Needs Roadmapping