Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prometheus does not reload certificates on config reload #4155

Closed
roidelapluie opened this Issue May 11, 2018 · 20 comments

Comments

Projects
None yet
2 participants
@roidelapluie
Copy link
Contributor

roidelapluie commented May 11, 2018

Proposal

When rolling out now SSL private keys or keychain, a SIGHUP in not enough to make Prometheus rename the new ones.

Bug Report

What did you do?

Change private key or key chain and SIGHUP prometheus

What did you expect to see?

Prometheus updates private key/keychain

What did you see instead? Under which circumstances?

Prometheus uses the old keys/keychain

  • Prometheus version:

    2.2.1

@brian-brazil

This comment has been minimized.

Copy link
Member

brian-brazil commented May 11, 2018

Would you mind checking this on master? The code has changed recently.

@roidelapluie

This comment has been minimized.

Copy link
Contributor Author

roidelapluie commented Jun 12, 2018

I can not reproduce this

@roidelapluie

This comment has been minimized.

Copy link
Contributor Author

roidelapluie commented Jun 12, 2018

(even in 2.0.0 and 2.2.1). I dont know what happened

@brian-brazil

This comment has been minimized.

Copy link
Member

brian-brazil commented Jun 12, 2018

Thanks for the followup.

@roidelapluie

This comment has been minimized.

Copy link
Contributor Author

roidelapluie commented Jun 15, 2018

This is driving me crazy.

  • It works on my machine
  • It does not work on my 2 prod prometheus servers
@brian-brazil

This comment has been minimized.

Copy link
Member

brian-brazil commented Jun 15, 2018

Could you be bind mounting files rather than directories?

@roidelapluie roidelapluie reopened this Jun 15, 2018

@roidelapluie

This comment has been minimized.

Copy link
Contributor Author

roidelapluie commented Jun 15, 2018

Nope, here it is:

When the target is in a static_config in the prometheus.yml, SIGHUP is sufficient.

When the target is in a sd_file config, then restart is needed. Reproducible.

@roidelapluie

This comment has been minimized.

Copy link
Contributor Author

roidelapluie commented Jun 15, 2018

Now I will try with 2.3.0

@roidelapluie

This comment has been minimized.

Copy link
Contributor Author

roidelapluie commented Jun 15, 2018

Still the case.

It looks like we do not re-read the certificates when there is just file_sd in the job. When I add a static_config and a file_sd, certificate is reloaded.

@roidelapluie

This comment has been minimized.

Copy link
Contributor Author

roidelapluie commented Jun 15, 2018

@brian-brazil Could you be bind mounting files rather than directories?

What do you mean? We are using plain files, on xfs or ext4

@roidelapluie

This comment has been minimized.

Copy link
Contributor Author

roidelapluie commented Jun 15, 2018

Screenshot: One job with sd_file and one with sd_static.

sd_static reloaded the cert, sd_file did not. The target and CA is the same.

foo

@brian-brazil

This comment has been minimized.

Copy link
Member

brian-brazil commented Jun 15, 2018

No, you want to bind mount directories. Bind mounts are tied to inodes, so if you replace the file with a new one any bind mounts won't get updated.

@roidelapluie

This comment has been minimized.

Copy link
Contributor Author

roidelapluie commented Jun 15, 2018

well we do not mount anything really. and we use wildcards for file discovery.

@brian-brazil

This comment has been minimized.

Copy link
Member

brian-brazil commented Jun 15, 2018

When you are changing the files, are you changing the filenames in the config file - or just the ca/keys on disk but with the same filenames?

@roidelapluie

This comment has been minimized.

Copy link
Contributor Author

roidelapluie commented Jun 15, 2018

We have that either when we change the CA "in place" OR when we change the CA "in place" and add a new target.

@roidelapluie

This comment has been minimized.

Copy link
Contributor Author

roidelapluie commented Jun 15, 2018

scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    scheme: https
    tls_config:
      cert_file: cert.pem
      key_file: key.pem
      ca_file: tls/ca.pem
    file_sd_configs:
          - files:
                    - tls_sql-agent_*.yml

tls_sql-agent_1.yml

- targets:                                                                                                                                
  - 127.0.0.1:9100
@brian-brazil

This comment has been minimized.

Copy link
Member

brian-brazil commented Jun 15, 2018

So I think this is an interaction with #4214. It only works for static_configs due to a bug in static_configs, which is now fixed by #4245.

The code as it stands does not ever reload tls files, they're only loaded when a new target is created. So as it stands if you want to change the CA or cert, you need to change the filename in the config.

@roidelapluie

This comment has been minimized.

Copy link
Contributor Author

roidelapluie commented Jun 15, 2018

Thank you @brian-brazil !!! Renaming the CA seems to be a workaround.

@roidelapluie

This comment has been minimized.

Copy link
Contributor Author

roidelapluie commented Jun 15, 2018

NB: we have 100's of jobs who uses the same CA. Might matter for the implementation of the fix.

@brian-brazil

This comment has been minimized.

Copy link
Member

brian-brazil commented Jun 15, 2018

This is more a feature request, and I'm not sure how practical it'll be to implement it as this is unlikely to interact well with reusing http connections across scrapes nor with us avoiding recreating scrape loops on a HUP unless the config file changes. Either it'll end up just working, or you'll always need to change the filename and then HUP.

roidelapluie added a commit to roidelapluie/common that referenced this issue Jun 16, 2018

Ensure configuration reload upon certificate change
Fixes prometheus/prometheus#4155

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

roidelapluie added a commit to roidelapluie/common that referenced this issue Jun 16, 2018

Ensure configuration reload upon certificate change
Fixes prometheus/prometheus#4155

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

roidelapluie added a commit to roidelapluie/common that referenced this issue Jun 16, 2018

Reload TLS certificates when needed
Fixes prometheus/prometheus#4155

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

roidelapluie added a commit to roidelapluie/common that referenced this issue Jun 16, 2018

Reload TLS certificates when needed
Fixes prometheus/prometheus#4155

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.