v0.20.0 -once doesn't wait for templates to finish #1196

Closed
errnoh opened this issue Mar 5, 2019 · 14 comments

@errnoh

errnoh commented Mar 5, 2019

Consul Template version

consul-template v0.20.0 (b709612)

Configuration

reload_signal = "SIGHUP"
kill_signal = "SIGINT"
max_stale = "10m"
log_level = "warn"
pid_file = "/tmp/consul-template.pid"

wait {
  min = "5s"
  max = "10s"
}

vault {
  address = "http://127.0.0.1:8200"
  grace = "5m"
  unwrap_token = false
  renew_token = false

  retry {
    enabled = true
    attempts = 0
    backoff = "250ms"
    max_backoff = "1m"
  }
}

template {
        contents = "{{ with secret \"pki_test/issue/foobar\" \"common_name=foobar.baz.consul\" \"alt_names=localhost\" \"ttl=10h\" \"ip_sans=192.168.1.2,127.0.0.1\" }}{{ .Data.certificate }}\n{{ .Data.issuing_ca }}{{ end }}"
        destination      = "/tmp/foobar.pem"
        perms = 0644
}
# Template embedded in the config, but same result happens even if using a template file.

Command

consul-template -once -log-level=trace -config="/tmp/consul-template.hcl"

Debug output

2019/03/05 12:15:40.814135 [INFO] (runner) creating watcher
2019/03/05 12:15:40.814995 [INFO] (runner) starting
2019/03/05 12:15:40.815926 [INFO] creating pid file at "/tmp/consul-template.pid"
2019/03/05 12:15:40.816414 [DEBUG] (runner) running initial templates
2019/03/05 12:15:40.816695 [DEBUG] (runner) initiating run
2019/03/05 12:15:40.817154 [DEBUG] (runner) checking template c8d01b3b00a0457ee4318bc007dc95f4
2019/03/05 12:15:40.817833 [DEBUG] (runner) was not watching 1 dependencies
2019/03/05 12:15:40.818064 [DEBUG] (watcher) adding vault.write(pki_test/issue/foobar -> 3386df62)
2019/03/05 12:15:40.818956 [TRACE] (watcher) vault.write(pki_test/issue/foobar -> 3386df62) starting
2019/03/05 12:15:40.819121 [DEBUG] (runner) diffing and updating dependencies
2019/03/05 12:15:40.819860 [DEBUG] (runner) enabling global quiescence for "c8d01b3b00a0457ee4318bc007dc95f4"
2019/03/05 12:15:40.819892 [DEBUG] (runner) watching 1 dependencies
2019/03/05 12:15:40.820053 [TRACE] (view) vault.write(pki_test/issue/foobar -> 3386df62) starting fetch
2019/03/05 12:15:40.820116 [TRACE] vault.write(pki_test/issue/foobar -> 3386df62): PUT /v1/pki_test/issue/foobar?stale=true&wait=1m0s
2019/03/05 12:15:41.028365 [TRACE] (view) vault.write(pki_tests/issue/foobar -> 3386df62) marking successful data response
2019/03/05 12:15:41.028447 [TRACE] (view) vault.write(pki_test/issue/foobar -> 3386df62) successful contact, resetting retries
2019/03/05 12:15:41.028484 [TRACE] (view) vault.write(pki_test/issue/foobar -> 3386df62) received data
2019/03/05 12:15:41.028518 [DEBUG] (runner) receiving dependency vault.write(pki_test/issue/foobar -> 3386df62)
2019/03/05 12:15:41.028549 [DEBUG] (runner) initiating run
2019/03/05 12:15:41.028572 [DEBUG] (runner) checking template c8d01b3b00a0457ee4318bc007dc95f4
2019/03/05 12:15:41.028798 [DEBUG] (runner) diffing and updating dependencies
2019/03/05 12:15:41.028813 [DEBUG] (runner) vault.write(pki_test/issue/foobar -> 3386df62) is still needed
2019/03/05 12:15:41.028825 [DEBUG] (runner) watching 1 dependencies
2019/03/05 12:15:41.028833 [DEBUG] (runner) all templates rendered
2019/03/05 12:15:41.028842 [INFO] (runner) once mode and all templates rendered
2019/03/05 12:15:41.028849 [INFO] (runner) stopping
2019/03/05 12:15:41.028856 [DEBUG] (runner) stopping watcher
2019/03/05 12:15:41.028863 [DEBUG] (watcher) stopping all views
2019/03/05 12:15:41.028870 [TRACE] (watcher) stopping vault.write(pki_test/issue/foobar -> 3386df62)
2019/03/05 12:15:41.028923 [DEBUG] removing pid file at "/tmp/consul-template.pid"

Running without -once pauses for five seconds after the "all templates rendered" line and then continues with the following:

2019/03/05 12:18:50.846151 [DEBUG] (runner) all templates rendered
2019/03/05 12:18:55.846404 [DEBUG] (runner) received template "c8d01b3b00a0457ee4318bc007dc95f4" from quiescence
2019/03/05 12:18:55.846435 [DEBUG] (runner) initiating run
2019/03/05 12:18:55.846445 [DEBUG] (runner) checking template c8d01b3b00a0457ee4318bc007dc95f4
2019/03/05 12:18:55.846682 [DEBUG] (runner) rendering "(dynamic)" => "/tmp/foobar.pem"
2019/03/05 12:18:55.849289 [INFO] (runner) rendered "(dynamic)" => "/tmp/foobar.pem"
2019/03/05 12:18:55.849309 [DEBUG] (runner) diffing and updating dependencies
2019/03/05 12:18:55.849319 [DEBUG] (runner) vault.write(pki_test/issue/foobar -> 3386df62) is still needed
2019/03/05 12:18:55.849326 [DEBUG] (runner) enabling global quiescence for "c8d01b3b00a0457ee4318bc007dc95f4"
2019/03/05 12:18:55.849332 [DEBUG] (runner) watching 1 dependencies
2019/03/05 12:18:55.849336 [DEBUG] (runner) all templates rendered

This results in the template being properly rendered to the target location.
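
That five-second pause lines up with the wait { min = "5s" max = "10s" } block in the config: the runner holds the render until the dependency has been quiet for the min duration, or until max is reached. For illustration only, here is a minimal, self-contained Go sketch of that general min/max quiescence pattern; it assumes nothing about consul-template's actual implementation.

package main

import (
	"fmt"
	"time"
)

// quiescence is an illustrative stand-in: every dependency update resets the
// min timer, while the max timer (armed once) caps the total delay.
type quiescence struct {
	min      time.Duration
	minTimer *time.Timer
	maxTimer *time.Timer
}

func newQuiescence(min, max time.Duration) *quiescence {
	return &quiescence{
		min:      min,
		minTimer: time.NewTimer(min),
		maxTimer: time.NewTimer(max),
	}
}

// tick would be called whenever fresh data arrives for the watched dependency;
// it pushes the render out until the value has been quiet for min.
func (q *quiescence) tick() {
	q.minTimer.Reset(q.min)
}

func main() {
	// Mirrors wait { min = "5s" max = "10s" } from the config above.
	q := newQuiescence(5*time.Second, 10*time.Second)
	start := time.Now()

	// With no further updates (no more q.tick() calls), the min timer fires
	// after 5s -- the same pause seen before the "received template ... from
	// quiescence" line in the daemon-mode log.
	select {
	case <-q.minTimer.C:
		fmt.Println("quiet for the min wait, rendering after", time.Since(start))
	case <-q.maxTimer.C:
		fmt.Println("max wait reached, rendering after", time.Since(start))
	}
}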

Expected behavior

/tmp/foobar.pem is written to disk. This works fine in v0.19.5.

Actual behavior

consul-template -once exits before the template is rendered.

Steps to reproduce

  1. Set up a simple PKI
  2. Try to render the template with consul-template -once using version v0.20.0

References

@errnoh
Author

errnoh commented Mar 5, 2019

Tested this against master (ebab455); it's not working there either.

@errnoh
Author

errnoh commented Mar 5, 2019

git bisect: ef54f96 is the first bad commit

@errnoh
Author

errnoh commented Mar 5, 2019

@pearkes any ideas what could be causing this?

@errnoh
Author

errnoh commented Mar 9, 2019

Did some testing and it turns out -once returns before draining quiescenceCh:

errnoh@a9939ab

(The branch is not really PR quality, but provides a proof-of-concept for fixing the bug.)

EDIT:
One of the main issues seems to be that allTemplatesRendered() treats events flagged ForQuiescence as already rendered, and the run loop has an exit condition when once is set and allTemplatesRendered() returns true.
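
To make that concrete, here is a self-contained toy in Go that only mimics the loop shape described in this comment (once mode exiting as soon as allTemplatesRendered() reports true, even though a render is still pending on quiescenceCh); it is an illustration, not consul-template's actual code.

package main

import (
	"fmt"
	"time"
)

func main() {
	once := true
	quiescenceCh := make(chan string, 1)

	// The only template was deferred for quiescence: its render request
	// will not show up on quiescenceCh until the 5s min wait has passed.
	go func() {
		time.Sleep(5 * time.Second)
		quiescenceCh <- "c8d01b3b00a0457ee4318bc007dc95f4"
	}()

	// Stand-in for the check described above: it counts the deferred
	// (ForQuiescence) event as already rendered.
	allTemplatesRendered := func() bool { return true }

	for {
		if once && allTemplatesRendered() {
			fmt.Println("once mode and all templates rendered -> stopping")
			return // exits immediately, before the deferred render ever fires
		}
		fmt.Println("rendered from quiescence:", <-quiescenceCh)
	}
}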

@errnoh
Author

errnoh commented Mar 9, 2019

Another view of the issue is this: errnoh@633c4f9 (this change fixes my issue and enables -once to render everything properly before returning).

(But this probably causes issues somewhere else in the code.)

@pearkes
Contributor

pearkes commented Mar 14, 2019

@errnoh Interesting. Yeah, when I tried similar approaches I was met with breaking other things that assumed this loop. The test suite is pretty solid, so running it on your branch should validate things well beyond this use case.

@adjavaherian

adjavaherian commented Apr 15, 2019

Seeing the same behavior with --once. It looks like the process completes without actually writing output, while the same config produces output without the --once flag. Interestingly, -once won't complain about write permissions on the output either: in this case /etc/haproxy/haproxy.cfg should not be writable, and consul-template recognizes this without the -once flag.

2019/04/15 15:45:38.195442 [INFO] (runner) creating watcher
2019/04/15 15:45:38.210924 [INFO] (runner) starting
2019/04/15 15:45:38.210936 [DEBUG] (runner) running initial templates
2019/04/15 15:45:38.210941 [DEBUG] (runner) initiating run
2019/04/15 15:45:38.210976 [DEBUG] (runner) checking template 16d5900abcded9dbd7
2019/04/15 15:45:38.211208 [DEBUG] (runner) was not watching 1 dependencies
2019/04/15 15:45:38.211226 [DEBUG] (watcher) adding kv.block(web/webapps/busybox)
2019/04/15 15:45:38.211232 [TRACE] (watcher) kv.block(web/webapps/busybox) starting
2019/04/15 15:45:38.211238 [DEBUG] (runner) checking template cf132b31147013b714
2019/04/15 15:45:38.211325 [DEBUG] (runner) was not watching 1 dependencies
2019/04/15 15:45:38.211331 [DEBUG] (watcher) adding kv.block(web/test)
2019/04/15 15:45:38.211336 [TRACE] (watcher) kv.block(web/test) starting
2019/04/15 15:45:38.211341 [DEBUG] (runner) diffing and updating dependencies
2019/04/15 15:45:38.211351 [DEBUG] (runner) enabling global quiescence for "10590030d69abcded9dbd7"
2019/04/15 15:45:38.211357 [DEBUG] (runner) enabling global quiescence for "cf1320c71147013b714"
2019/04/15 15:45:38.211361 [DEBUG] (runner) watching 2 dependencies
2019/04/15 15:45:38.211404 [TRACE] (view) kv.block(web/test) starting fetch
2019/04/15 15:45:38.211424 [TRACE] kv.block(web/test): GET /v1/kv/web/test?stale=true&wait=1m0s
2019/04/15 15:45:38.211897 [TRACE] (view) kv.block(web/webapps/busybox) starting fetch
2019/04/15 15:45:38.211912 [TRACE] kv.block(web/webapps/busybox): GET /v1/kv/web/webapps/busybox?stale=true&wait=1m0s
2019/04/15 15:45:38.228984 [TRACE] kv.block(web/webapps/busybox): returned "10.x.x.x:80"
2019/04/15 15:45:38.228998 [TRACE] (view) kv.block(web/webapps/busybox) marking successful data response
2019/04/15 15:45:38.229006 [TRACE] (view) kv.block(web/webapps/busybox) successful contact, resetting retries
2019/04/15 15:45:38.229011 [TRACE] (view) kv.block(web/webapps/busybox) received data
2019/04/15 15:45:38.229018 [DEBUG] (runner) receiving dependency kv.block(web/webapps/busybox)
2019/04/15 15:45:38.229028 [DEBUG] (runner) initiating run
2019/04/15 15:45:38.229032 [DEBUG] (runner) checking template 16d0c605ded9dbd7
2019/04/15 15:45:38.229143 [DEBUG] (runner) checking template cf1320c713b714
2019/04/15 15:45:38.229219 [DEBUG] (runner) missing data for 1 dependencies
2019/04/15 15:45:38.229225 [DEBUG] (runner) diffing and updating dependencies
2019/04/15 15:45:38.229229 [DEBUG] (runner) kv.block(web/webapps/busybox) is still needed
2019/04/15 15:45:38.229233 [DEBUG] (runner) kv.block(web/test) is still needed
2019/04/15 15:45:38.229238 [DEBUG] (runner) watching 2 dependencies
2019/04/15 15:45:38.229295 [TRACE] kv.block(web/test): returned "foozballzzzz"
2019/04/15 15:45:38.229301 [TRACE] (view) kv.block(web/test) marking successful data response
2019/04/15 15:45:38.229307 [TRACE] (view) kv.block(web/test) successful contact, resetting retries
2019/04/15 15:45:38.229311 [TRACE] (view) kv.block(web/test) received data
2019/04/15 15:45:38.229317 [DEBUG] (runner) receiving dependency kv.block(web/test)
2019/04/15 15:45:38.229321 [DEBUG] (runner) initiating run
2019/04/15 15:45:38.229324 [DEBUG] (runner) checking template 16d0c6059ed9dbd7
2019/04/15 15:45:38.229415 [DEBUG] (runner) checking template cf1320c785d73b714
2019/04/15 15:45:38.229488 [DEBUG] (runner) diffing and updating dependencies
2019/04/15 15:45:38.229494 [DEBUG] (runner) kv.block(web/webapps/busybox) is still needed
2019/04/15 15:45:38.229498 [DEBUG] (runner) kv.block(web/test) is still needed
2019/04/15 15:45:38.229503 [DEBUG] (runner) watching 2 dependencies
2019/04/15 15:45:38.229506 [DEBUG] (runner) all templates rendered
2019/04/15 15:45:38.229510 [INFO] (runner) once mode and all templates rendered
2019/04/15 15:45:38.229513 [INFO] (runner) stopping
2019/04/15 15:45:38.229517 [DEBUG] (runner) stopping watcher
2019/04/15 15:45:38.229520 [DEBUG] (watcher) stopping all views
2019/04/15 15:45:38.229524 [TRACE] (watcher) stopping kv.block(web/webapps/busybox)
2019/04/15 15:45:38.229528 [TRACE] (watcher) stopping kv.block(web/test)

Note: the hashes above have been munged. To be clear, this works fine in daemon mode.

@karl-tpio

I'm hitting the same issue. I have a few consul-template config files that are meant to fetch certificates from vault/pki and write them to disk.

## There are no files on disk, as expected
root@com:/etc/vault.d/tls# pwd
/etc/vault.d/tls
root@com:/etc/vault.d/tls# ls -lah
total 8.0K
drwxr-x--- 2 vault vault 4.0K Apr 25 21:35 .
drwxr-x--- 3 vault vault 4.0K Apr 25 21:25 ..

## Run consul-template to get the files on disk
root@com:/etc/vault.d/tls# /usr/local/bin/consul-template -once -config=/etc/consul-template.d/config.hcl -config=/etc/consul-template.d/templates/vault-server-tls.hcl -config=/etc/consul-template.d/templates/consul-agent-tls.hcl
2019/04/25 21:37:11.808507 [INFO] consul-template v0.20.0 (b709612c)
2019/04/25 21:37:11.808523 [INFO] (runner) creating new runner (dry: false, once: true)


2019/04/25 21:37:11.808782 [DEBUG] (runner) final config: <omitted, can provide if needed>


2019/04/25 21:37:11.808848 [INFO] (runner) creating watcher
2019/04/25 21:37:11.808872 [DEBUG] (watcher) adding vault-agent.token
2019/04/25 21:37:11.808881 [TRACE] (watcher) vault-agent.token starting
2019/04/25 21:37:11.809033 [TRACE] (view) vault-agent.token starting fetch
2019/04/25 21:37:11.809046 [TRACE] vault-agent.token: READ /root/.vault-token
2019/04/25 21:37:11.809075 [TRACE] vault-agent.token: reported change
2019/04/25 21:37:11.809118 [TRACE] (view) vault-agent.token marking successful data response
2019/04/25 21:37:11.809124 [TRACE] (view) vault-agent.token successful contact, resetting retries
2019/04/25 21:37:11.809128 [TRACE] (view) vault-agent.token received data
2019/04/25 21:37:11.809137 [INFO] (runner) starting
2019/04/25 21:37:11.809141 [DEBUG] (runner) running initial templates
2019/04/25 21:37:11.809143 [DEBUG] (runner) initiating run
2019/04/25 21:37:11.809166 [DEBUG] (runner) checking template 5dbdded7c8b61ac3d1a3d48772b341e4
2019/04/25 21:37:11.809455 [DEBUG] (runner) was not watching 1 dependencies
2019/04/25 21:37:11.809464 [DEBUG] (watcher) adding vault.write(pki/issue/vault-server-node -> 3be23df0)
2019/04/25 21:37:11.809472 [TRACE] (watcher) vault.write(pki/issue/vault-server-node -> 3be23df0) starting
2019/04/25 21:37:11.809481 [DEBUG] (runner) checking template ef69817fd374d52a02e650a5cf6370ae
2019/04/25 21:37:11.809586 [DEBUG] (runner) missing data for 1 dependencies
2019/04/25 21:37:11.809595 [DEBUG] (runner) checking template 24eb969ef5b8d0e163f317ebb41cafe5
2019/04/25 21:37:11.809648 [TRACE] (view) vault.write(pki/issue/vault-server-node -> 3be23df0) starting fetch
2019/04/25 21:37:11.809708 [TRACE] vault.write(pki/issue/vault-server-node -> 3be23df0): PUT /v1/pki/issue/vault-server-node?stale=true&wait=1m0s
2019/04/25 21:37:11.809711 [DEBUG] (runner) missing data for 1 dependencies
2019/04/25 21:37:11.809875 [DEBUG] (runner) diffing and updating dependencies
2019/04/25 21:37:11.809916 [DEBUG] (runner) enabling global quiescence for "5dbdded7c8b61ac3d1a3d48772b341e4"
2019/04/25 21:37:11.809960 [DEBUG] (runner) enabling global quiescence for "ef69817fd374d52a02e650a5cf6370ae"
2019/04/25 21:37:11.810003 [DEBUG] (runner) enabling template-specific quiescence for "24eb969ef5b8d0e163f317ebb41cafe5"
2019/04/25 21:37:11.810044 [DEBUG] (runner) watching 2 dependencies
2019/04/25 21:37:11.810087 [DEBUG] (runner) initiating run
2019/04/25 21:37:11.810123 [DEBUG] (runner) checking template 5dbdded7c8b61ac3d1a3d48772b341e4
2019/04/25 21:37:11.810294 [DEBUG] (runner) missing data for 1 dependencies
2019/04/25 21:37:11.810341 [DEBUG] (runner) checking template ef69817fd374d52a02e650a5cf6370ae
2019/04/25 21:37:11.810514 [DEBUG] (runner) missing data for 1 dependencies
2019/04/25 21:37:11.810574 [DEBUG] (runner) checking template 24eb969ef5b8d0e163f317ebb41cafe5
2019/04/25 21:37:11.810747 [DEBUG] (runner) missing data for 1 dependencies
2019/04/25 21:37:11.810793 [DEBUG] (runner) diffing and updating dependencies
2019/04/25 21:37:11.810831 [DEBUG] (runner) vault.write(pki/issue/vault-server-node -> 3be23df0) is still needed
2019/04/25 21:37:11.810867 [DEBUG] (runner) watching 2 dependencies
2019/04/25 21:37:12.369992 [TRACE] (view) vault.write(pki/issue/vault-server-node -> 3be23df0) marking successful data response
2019/04/25 21:37:12.370170 [TRACE] (view) vault.write(pki/issue/vault-server-node -> 3be23df0) successful contact, resetting retries
2019/04/25 21:37:12.370183 [TRACE] (view) vault.write(pki/issue/vault-server-node -> 3be23df0) received data
2019/04/25 21:37:12.370203 [DEBUG] (runner) receiving dependency vault.write(pki/issue/vault-server-node -> 3be23df0)
2019/04/25 21:37:12.370217 [DEBUG] (runner) initiating run
2019/04/25 21:37:12.370232 [DEBUG] (runner) checking template 5dbdded7c8b61ac3d1a3d48772b341e4
2019/04/25 21:37:12.370928 [DEBUG] (runner) checking template ef69817fd374d52a02e650a5cf6370ae
2019/04/25 21:37:12.371178 [DEBUG] (runner) checking template 24eb969ef5b8d0e163f317ebb41cafe5
2019/04/25 21:37:12.371410 [DEBUG] (runner) diffing and updating dependencies
2019/04/25 21:37:12.372053 [DEBUG] (runner) vault.write(pki/issue/vault-server-node -> 3be23df0) is still needed
2019/04/25 21:37:12.372131 [DEBUG] (runner) watching 2 dependencies
2019/04/25 21:37:12.372201 [DEBUG] (runner) all templates rendered
2019/04/25 21:37:12.372261 [INFO] (runner) once mode and all templates rendered
2019/04/25 21:37:12.372313 [INFO] (runner) stopping
2019/04/25 21:37:12.372359 [DEBUG] (runner) stopping watcher
2019/04/25 21:37:12.372435 [DEBUG] (watcher) stopping all views
2019/04/25 21:37:12.372480 [TRACE] (watcher) stopping vault-agent.token
2019/04/25 21:37:12.372540 [TRACE] (watcher) stopping vault.write(pki/issue/vault-server-node -> 3be23df0)

<process quits>

## At this point, I'd expect that `/etc/vault.d/tls` would have three files in it (ca.pem, cert.pem, cert.key). No such luck; there are no files written!

## Config file dump:
root@com:/etc/vault.d/tls# cat /etc/consul-template.d/config.hcl
# basic consul-template config; points to bootstrap vault/consul
consul {
  address = "https://bootstrap:8501"

  retry {
    enabled = true
    attempts = 12
    backoff = "250ms"
    max_backoff = "1m"
  }
  ssl {
    enabled = true
    verify = true
    server_name = "bootstrap"
  }
}


reload_signal = "SIGHUP"
kill_signal = "SIGINT"

max_stale = "10m"

# TODO: revert when i have the -once flag figured out!
log_level = "trace"

wait {
  min = "5s"
  max = "10s"
}

vault {
  address = "https://bootstrap.my-corp.com:8200"
  vault_agent_token_file = "/root/.vault-token"

  ssl {
    enabled = true
    verify  = true
  }
}
root@com:/etc/vault.d/tls# cat /etc/consul-template.d/templates/vault-server-tls.hcl

<three template blocks that all look like the one below>


template {
  # Write this...
  contents = "{{ with secret \"pki/issue/vault-server-node\" \"alt_names=vault.service.us-west-1.consul\" }}{{ .Data.private_key }}{{ end }}"

  # ... to here
  destination = "/etc/vault.d/tls/key.pem"

  command = "chown -R vault:vault /etc/vault.d/tls/"

  command_timeout = "60s"

  # Be loud and cause problems if there's a problem contacting vault!
  error_on_missing_key = true

  perms = 0600

  # If we're rendering a new certificate out... it's because the old one expired or was revoked.
  backup = false

  wait {
    min = "2s"
    max = "10s"
  }
}
root@com:/etc/vault.d/tls# cat /etc/consul-template.d/templates/consul-agent-tls.hcl
<nearly identical to the above>

Please let me know if there's anything else I can do to help.

@sodabrew

sodabrew commented May 3, 2019

@pearkes @errnoh What would help in moving this issue and the two suggested-hack changesets to PR-quality status?

eikenb added the bug label Jun 15, 2019
eikenb added this to the v0.20.1 milestone Jun 15, 2019
@eikenb
Contributor

eikenb commented Jun 25, 2019

Hey @errnoh and everyone.

While playing with this, I got to thinking about when you would want to use wait and once together, and I haven't really been able to come up with a good reason. Assuming wait was added to prevent template re-rendering (and possible subsequent reloading of the process) when a watched value is flapping... how does this make sense when using once?

Basically I'm trying to come up with a reason why the fix for this couldn't just be to make -once disable waits/quiescence. Thanks.
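
A minimal sketch of that proposal, using hypothetical names (WaitConfig, enableQuiescence) rather than consul-template's actual types: in once mode the wait/quiescence timers are simply never armed, so the template renders on the first complete fetch and the run loop can exit right after.

package main

import (
	"fmt"
	"time"
)

// WaitConfig and quiescence are illustrative stand-ins, not consul-template's types.
type WaitConfig struct{ Min, Max time.Duration }

type quiescence struct{ min, max time.Duration }

func newQuiescence(min, max time.Duration) *quiescence {
	return &quiescence{min: min, max: max}
}

// enableQuiescence sketches the proposal: with -once, skip quiescence entirely
// so nothing is deferred past the initial render.
func enableQuiescence(once bool, wait *WaitConfig) *quiescence {
	if once || wait == nil {
		return nil // -once ignores wait {} altogether
	}
	return newQuiescence(wait.Min, wait.Max)
}

func main() {
	w := &WaitConfig{Min: 5 * time.Second, Max: 10 * time.Second}
	fmt.Println("once mode arms quiescence:", enableQuiescence(true, w) != nil)    // false
	fmt.Println("daemon mode arms quiescence:", enableQuiescence(false, w) != nil) // true
}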

@sodabrew

As for my comment on the related cases, I think there’s a complex of problems here ever since the change queue code was refactored around the 0.19.2 release.

Personally, I agree that -once could disable -wait; this would put the onus of splay times back onto a caller script, for example.

But I don’t think it solves any of the related problems where the quiesce times aren’t being respected or rendering gets stuck, etc.

@eikenb
Contributor

eikenb commented Jun 25, 2019

I 100% agree that there are more problems here than just these once/wait issues. I'm not looking to address them here for a couple of reasons. First, I don't think I've had enough time working with the code to really feel confident about tackling that just yet; I'm working up to it. Second, I do think that having once disable wait/quiescence is the best way to address this bug, as it eliminates a code path (which simplifies things) and is closer to what I'd expect the behavior to be.

Thanks. The feedback is much appreciated.

@eikenb
Contributor

eikenb commented Jun 26, 2019

As for my comment on the related cases, I think there’s a complex of problems here ever since the change queue code was refactored around the 0.19.2 release.

Are there any particular issues, aside from these wait/once ones (#1196, #1207, #1209), that you would call out as fallout from this refactor? Are there behaviors or problems that don't have issues filed for them yet? Thanks.

@sodabrew

sodabrew commented Jul 1, 2019

Yes, see comment #1043 (comment) for some more details.

Other relevant issues: #988, #991, #995, #998, #1140, #1174, #1195.
