Race condition? #12

Closed
xdgc opened this Issue Nov 23, 2016 · 6 comments

Comments

Projects
None yet
2 participants
@xdgc

xdgc commented Nov 23, 2016

Thanks for your work on this. It looks very promising.

I'm running -portable on Linux. I've gotten it built and am able to run a command to register and request some certs from the staging service. However, during this run I get the following:

./acme-client -s -vvNnm -C /acme/challenges -c /acme/public -k /acme/private/domain.pem -f /acme/account/account.pem example.com
acme-client: /acme/account/account.pem: generated RSA account key
acme-client: /acme/private/domain.pem: generated RSA domain key
acme-client: https://acme-staging.api.letsencrypt.org/directory: directories
acme-client: acme-staging.api.letsencrypt.org: DNS: 104.96.225.80
acme-client: acme-staging.api.letsencrypt.org: DNS: 2001:418:1416:284::3d5
acme-client: acme-staging.api.letsencrypt.org: DNS: 2001:418:1416:29d::3d5
acme-client: transfer buffer: [{ "key-change": "https://acme-staging.api.letsencrypt.org/acme/key-change", "new-authz": "https://acme-staging.api.letsencrypt.org/acme/new-authz", "new-cert": "https://acme-staging.api.letsencrypt.org/acme/new-cert", "new-reg": "https://acme-staging.api.letsencrypt.org/acme/new-reg", "revoke-cert": "https://acme-staging.api.letsencrypt.org/acme/revoke-cert" }] (372 bytes)
acme-client: https://acme-staging.api.letsencrypt.org/acme/new-authz: req-auth: [mydomain]
acme-client: acme-staging.api.letsencrypt.org: cached
acme-client: acme-staging.api.letsencrypt.org: cached
acme-client: transfer buffer: [{ "type": "urn:acme:error:unauthorized", "detail": "No registration exists matching provided key", "status": 403 }] (120 bytes)
acme-client: bad exit: netproc(15477): 1

It appears that the client is trying to authorize to request certificates before it's registered.

I've only begun digging into what's happening, but in the course of diagnosing, I encountered a Heisenbug: some of my debug printfs caused the whole routine to work. Reducing this, if I put a usleep(500) at the top of xrun(), I get a successful run every time. If I remove the usleep(), it fails every time. (I am clearing and replacing the whole of /acme with each run.)

This begins to smell of a race condition, so I thought I'd go ahead and report it even though I don't have a solution yet.

@kristapsdz

This comment has been minimized.

Show comment
Hide comment
@kristapsdz

kristapsdz Nov 23, 2016

Owner

Yes, I think you're right. I'll implement a fix as soon as I can. (Basically, the key is created and then the process checks if it's there: if it is, the -N or -n flags are ignored.)

As a workaround, you can create the keys yourself and not use the -N or -n flags for now. This is only blowing up now because fork+exec causes the check to occur twice.

Owner

kristapsdz commented Nov 23, 2016

Yes, I think you're right. I'll implement a fix as soon as I can. (Basically, the key is created and then the process checks if it's there: if it is, the -N or -n flags are ignored.)

As a workaround, you can create the keys yourself and not use the -N or -n flags for now. This is only blowing up now because fork+exec causes the check to occur twice.

@xdgc

This comment has been minimized.

Show comment
Hide comment
@xdgc

xdgc Nov 23, 2016

That's what it seemed like, but orchestrating a fix among the co-processes is probably better left to you than me. :) Thanks, much appreciated!

xdgc commented Nov 23, 2016

That's what it seemed like, but orchestrating a fix among the co-processes is probably better left to you than me. :) Thanks, much appreciated!

kristapsdz pushed a commit to kristapsdz/acme-client that referenced this issue Nov 24, 2016

kristaps
Add a fix for
kristapsdz/acme-client-portable#12 that pushes
the check for file pre-existence with -n and -N only into the main
process, stripping out the arguments before passing them to the child.
This fixes a race condition.
@kristapsdz

This comment has been minimized.

Show comment
Hide comment
@kristapsdz

kristapsdz Nov 24, 2016

Owner

I just merged the above fix (in the main codebase) into the portable one. It pushes the check into the main process, splicing out the arguments on failure. Does this fix your issue?

Owner

kristapsdz commented Nov 24, 2016

I just merged the above fix (in the main codebase) into the portable one. It pushes the check into the main process, splicing out the arguments on failure. Does this fix your issue?

@xdgc

This comment has been minimized.

Show comment
Hide comment
@xdgc

xdgc Nov 25, 2016

Yes, this is working for me. Thanks for the quick fix!

xdgc commented Nov 25, 2016

Yes, this is working for me. Thanks for the quick fix!

@xdgc xdgc closed this Nov 25, 2016

@kristapsdz

This comment has been minimized.

Show comment
Hide comment
@kristapsdz

kristapsdz Nov 25, 2016

Owner

Still needs work, though, for situations like -nN.

Owner

kristapsdz commented Nov 25, 2016

Still needs work, though, for situations like -nN.

@kristapsdz kristapsdz reopened this Nov 25, 2016

@kristapsdz

This comment has been minimized.

Show comment
Hide comment
@kristapsdz

kristapsdz Nov 25, 2016

Owner

Works now for all possible invocations. Can you test it again? If it works fine, close out and it'll be in the next release.

Owner

kristapsdz commented Nov 25, 2016

Works now for all possible invocations. Can you test it again? If it works fine, close out and it'll be in the next release.

@kristapsdz kristapsdz closed this Dec 2, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment