
several instances behind layer 4 load-balancer / new "event" when challenges are created #237

Closed
daum3ns opened this issue Jan 28, 2021 · 6 comments


@daum3ns

daum3ns commented Jan 28, 2021

As discussed a few times, it is quite hard to set up mod_md when running multiple instances behind a load balancer.

In our current setup, certificates are renewed manually and then distributed via our configuration management tool.
Now we want to enable mod_md to automate certificate renewals and have OCSP handled correctly.

The main problem is:

  • Since the certificate renewal is initiated by Apache and there is a load balancer in front, the follow-up requests for the challenge are not guaranteed to end up on the same host.

I see two possible approaches to this problem:

  1. Share the knowledge between the hosts, so that it doesn't matter which one answers the challenge.
  2. Ensure that the follow-up requests always reach the correct host.

1
The latest work on the event-like behavior of MDMessageCmd, especially the new "renewing" event, allows sharing the knowledge between the hosts via NFS, as @root360-AndreasUlm described here: #234 (comment)

When NFS is not an option, however, the problem remains: the "renewing" event is triggered before the challenges are created, so at that point the script it calls cannot distribute them among the nodes.

With a new "challenge available" event, our workflow would look like this:

  • A node starts the renewal process; the script is then called with the new "challenge" event.
  • The script triggers synchronization of the MDStoreDir among the nodes.
  • By the time the script returns, all nodes are ready to answer the challenge.
  • After the certificates have been created successfully, the node that answered the challenge triggers the synchronization again.
  • The nodes are then reloaded one by one.
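A script for this workflow might simply dispatch on the event it is called with. The following is a minimal Python sketch, not mod_md's documented interface: it assumes MDMessageCmd passes the event reason as the first argument and the MDomain name as the second, that a challenge event's reason may carry extra ':'-separated details, and that the event is named "challenge-setup" (the name used in v2.3.7). The sync_store and schedule_reload hooks are hypothetical placeholders for your own synchronization and reload tooling.

```python
import sys
from typing import Callable, List


def handle_event(argv: List[str],
                 sync_store: Callable[[], bool],
                 schedule_reload: Callable[[str], None]) -> int:
    """Dispatch one MDMessageCmd invocation and return the exit code.

    Assumption: argv[1] is the event reason, argv[2] the MDomain name.
    """
    if len(argv) < 3:
        print("usage: md-message <reason> <domain>", file=sys.stderr)
        return 2
    reason, domain = argv[1], argv[2]
    # Keep only the event name; drop any ':'-separated details after it.
    event = reason.split(":", 1)[0]
    if event == "challenge-setup":
        # Push the freshly written challenge files to all peers and only
        # return once every node is able to answer the challenge.
        return 0 if sync_store() else 1
    if event == "renewed":
        # Distribute the new certificate, then reload the nodes one by one.
        if not sync_store():
            return 1
        schedule_reload(domain)
        return 0
    # Events this script does not care about are simply acknowledged.
    return 0
```

In a real script the entry point would be something like `sys.exit(handle_event(sys.argv, my_sync, my_reload))`, so that the exit code reaches mod_md.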

Then, with slightly different renewal windows per node, we can avoid starting this whole process on two nodes simultaneously, while still being able to renew certificates as long as one host is up and running.

So I think this would be a solution.
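The staggered windows could be derived from a single base window. This is a sketch assuming windows are expressed as durations (mod_md's MDRenewWindow directive also accepts a percentage of the certificate lifetime); the function names and the 2-day stagger used below are illustrative choices, not mod_md settings.

```python
from datetime import datetime, timedelta
from typing import List


def renewal_start(not_after: datetime, window: timedelta) -> datetime:
    """Renewal begins once the remaining validity drops below `window`."""
    return not_after - window


def staggered_windows(base: timedelta, stagger: timedelta,
                      nodes: int) -> List[timedelta]:
    """Give node i a renewal window that is `i * stagger` shorter than node 0's.

    Node 0 tries first; each following node only starts if the certificate
    is still not renewed `stagger` later (for example because node 0 is down).
    """
    windows = [base - i * stagger for i in range(nodes)]
    if windows and windows[-1] <= timedelta(0):
        raise ValueError("stagger too large for the number of nodes")
    return windows
```

With a 30-day base window and a 2-day stagger, three nodes would get 30, 28 and 26 days, so node 1 only starts if the certificate is still unrenewed two days after node 0 should have started.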

2

The second approach seems almost impossible. I was thinking about some combination of:

  • a new directive, set on all other nodes, that defines an "ACME master node"
  • an ap_internal_redirect to a special location (which exists on each MD-enabled vhost) whenever a "non-master node" gets a challenge request
  • a ProxyPass in that location to the "ACME master node"

But I'm not sure whether mod_md can detect an "unexpected" challenge request. It is also a problem if the "master node" is unavailable.
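The check such a node would need can be sketched in Python. This is purely hypothetical logic: mod_md has no "ACME master node" directive, and has_local_token stands in for however a node would look up its challenge files. A request counts as "unexpected" when it targets the ACME HTTP-01 path but the token is not known locally.

```python
from typing import Callable

# Well-known path used by ACME HTTP-01 challenges.
ACME_PREFIX = "/.well-known/acme-challenge/"


def route_challenge(host: str, path: str,
                    has_local_token: Callable[[str, str], bool],
                    is_master: bool) -> str:
    """Decide how a node handles a request (hypothetical routing logic)."""
    if not path.startswith(ACME_PREFIX):
        return "normal"  # not a challenge request at all
    token = path[len(ACME_PREFIX):]
    if not token or "/" in token:
        return "not-found"  # malformed challenge path
    if has_local_token(host, token):
        return "local"  # this node created the challenge and can answer it
    # Unknown token: the "unexpected" challenge request. Only a non-master
    # node can forward it; the master itself has nowhere to send it.
    return "not-found" if is_master else "proxy-to-master"
```

The last branch shows the availability problem mentioned above: if the master is down, forwarded requests have no node that can answer them.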

@icing
Owner

icing commented Jan 28, 2021

I think cluster setups are very specific, and as long as the server is not aware of them, what can a little module do?

Therefore I prefer the approach where MDMessageCmd gets invoked when a challenge has been created (and maybe also when a challenge is done; I have to look into the code more for this). The script only needs to return once the cluster sync is done.

As you wrote, ideally only one cluster node attempts a renewal at any given time. To prevent simultaneous efforts, the new "renewing" event can suppress it. How you realize that behaviour on your cluster is something you can judge better than me.

Maybe you could synchronize that info as well and attach a timestamp to it, so that a crashed node will not block renewals indefinitely.
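A timestamped lock of this kind could look like the Python sketch below. It assumes a lock file on storage that all nodes can see or that is synchronized between them, and that a non-zero exit for the "renewing" event suppresses the renewal attempt; the file format and function names are hypothetical. The read-then-replace step is not atomic across nodes, so two nodes racing at the same moment could both acquire it. Staggered renewal windows make that unlikely, and a cluster with a real locking primitive should use that instead.

```python
import json
import os
import time
from typing import Optional


def try_acquire(lock_path: str, node: str, ttl: float,
                now: Optional[float] = None) -> bool:
    """Claim the cluster-wide renewal lock for `node`.

    The lock file stores the holder and a timestamp. A lock older than `ttl`
    seconds is treated as stale, so a crashed node cannot block renewals
    forever. The current holder may refresh its own lock.
    """
    now = time.time() if now is None else now
    try:
        with open(lock_path) as f:
            lock = json.load(f)
        if lock["node"] != node and now - lock["ts"] < ttl:
            return False  # another node holds a fresh lock
    except (FileNotFoundError, ValueError, KeyError, TypeError):
        pass  # no lock yet, or an unreadable one: treat it as free
    tmp = f"{lock_path}.{node}.tmp"
    with open(tmp, "w") as f:
        json.dump({"node": node, "ts": now}, f)
    os.replace(tmp, lock_path)  # atomic rename on a single POSIX filesystem
    return True


def on_renewing(lock_path: str, node: str, ttl: float = 6 * 3600) -> int:
    """Exit code for the "renewing" event: non-zero skips this node's attempt."""
    return 0 if try_acquire(lock_path, node, ttl) else 1
```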

@icing
Owner

icing commented Feb 2, 2021

In the v2.3.7 BETA release, I added the challenge-setup event, which calls MDMessageCmd after the challenge files have been created but before the ACME server is asked to verify the challenges.

That means MDMessageCmd can distribute the files in your cluster, and the module will continue once the command is done. For the exact event sent, please see the description in the README.md.

@daum3ns
Author

daum3ns commented Sep 8, 2021

@icing While testing, I see the following behaviour:
when the script returns an error (i.e. a non-zero exit code), mod_md still continues to renew the certificate and calls the script again shortly after with the "renewed" event. Is this intentional? In my case, the script returns an error if distribution to another node failed for some reason, so I would like to abort the process at that point.
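Such a distribution step could try every peer and fail if any copy fails, so that the exit code reflects whether the whole cluster can answer the challenge. A minimal Python sketch; copy_to is a hypothetical hook (for example a wrapper around rsync) that returns True on success.

```python
import sys
from typing import Callable, Iterable


def distribute(peers: Iterable[str], copy_to: Callable[[str], bool]) -> int:
    """Copy the challenge data to every peer and return an exit code.

    Returns 0 only if every copy succeeded and 1 otherwise, so the caller
    (e.g. MDMessageCmd) sees the failure instead of continuing blindly.
    """
    failed = []
    for peer in peers:
        try:
            ok = copy_to(peer)
        except OSError as exc:
            # A network or filesystem error counts as a failed copy.
            print(f"{peer}: {exc}", file=sys.stderr)
            ok = False
        if not ok:
            failed.append(peer)
    if failed:
        print("distribution failed for: " + ", ".join(failed), file=sys.stderr)
        return 1
    return 0
```

Trying all peers instead of stopping at the first failure means the error message lists every unreachable node.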

@icing
Owner

icing commented Sep 8, 2021

You are correct. Right now, 'challenge-setup' is an event without a return code: a listener on that event triggers the script, but the script's return code does not propagate back into the renewal process.

I agree that this should be changed.

icing pushed a commit that referenced this issue Sep 17, 2021
…!= 0 exit),

   the renewal process is aborted and an error is reported for the MDomain.
   As discussed in #237, this allows scripts that distribute information
   in a cluster to abort early without bothering an ACME server to validate
   a DNS name that will not work. The common retry logic will make another
   attempt in the future, as with other failures.
@icing
Owner

icing commented Sep 17, 2021

I changed this in master and also added a test case for it. Will make a new release soon.

@icing
Owner

icing commented Sep 17, 2021

Released in v2.4.7.
