Race condition between keygen and update, resulting in "Key derivation key not available!" #52

Closed
cbiedl opened this issue Nov 24, 2020 · 8 comments


@cbiedl

cbiedl commented Nov 24, 2020

TIL: The units named in the two After=... declarations in tangd.socket are started in parallel. So if tangd-update starts checking @jwkdir@ before tangd-keygen has written both files, the .jws in @cachedir@ will be incomplete. This happened here on relatively slow armhf hardware.

In that situation, an attempt to use that tang server with "clevis encrypt tang" will trigger the message "Key derivation key not available!". The Debian bug report is https://bugs.debian.org/975343

As a solution, I suggest moving the

Requires=tangd-keygen.service
After=tangd-keygen.service

from tangd.socket to tangd-update.service. That worked for me.

Related, the entire logic around the keygen script seems a little fragile if operation is interrupted mid-way:

Writing the data to a temporary file first and then atomically moving it to the final location, as seen in the update script, avoids the creation of zero-sized files. Alternatively, that job could already be done by jose, see latchset/jose#88.
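To illustrate the temp-file-then-rename pattern, here is a minimal sketch in Python. The tang scripts themselves are shell, so this is not their implementation; `write_atomically` and the `.tmp-` prefix are hypothetical names chosen for the example.

```python
import os
import tempfile


def write_atomically(path, data):
    """Write bytes to `path` so that readers never observe a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the target directory so the final rename stays
    # on one filesystem; a same-filesystem rename is atomic on POSIX.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the content to disk before renaming
        # Note: mkstemp creates the file with mode 0600; adjust with
        # os.chmod before the rename if other permissions are needed.
        os.replace(tmp_path, path)
    except BaseException:
        # On any failure, leave no temp file behind and re-raise.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

With this pattern, an interruption leaves either no file at the final name or a complete one, never a zero-sized one. A reader that only looks for `*.jwk` names would also not pick up the dot-prefixed temp file.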

Still, in case of an interruption, key generation will not be resumed, since the ConditionDirectoryNotEmpty= in tangd-keygen.service will no longer apply. Perhaps there is a systemd way to deal with that; I'd just touch a "key-created" semaphore in @jwkdir@ and, as a next step, merge keygen and update into a single script, since detecting whether a key has to be created is easy then. But perhaps I missed a use case here.
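The "key-created" semaphore idea could look roughly like the following Python sketch. It is only an illustration of the marker-file pattern, not tang code: `ensure_keys` and the `generate_keys` callback are hypothetical, and how leftover files from an interrupted run are handled is left to that callback.

```python
import os

MARKER = "key-created"  # marker file name taken from the suggestion above


def ensure_keys(jwkdir, generate_keys):
    """Run generate_keys(jwkdir) unless an earlier run completed.

    Returns True if key generation ran, False if the marker was present.
    """
    marker = os.path.join(jwkdir, MARKER)
    if os.path.exists(marker):
        return False
    # No marker: either a first run or an interrupted one. The callback is
    # assumed (hypothetically) to cope with any leftover files.
    generate_keys(jwkdir)
    # Create the marker only after generation returned without raising, so
    # an interruption before this point causes a retry on the next start.
    with open(marker, "w"):
        pass
    return True
```

Because the decision depends on the marker rather than on whether the directory is empty, a half-finished run is retried instead of being mistaken for a completed one.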

@cbiedl
Author

cbiedl commented Nov 26, 2020

Strike that, I missed the fact the dependencies have been reworked in commit 7778512
So I'll have to re-check why this still happens.

@sergio-correia
Collaborator

> Strike that, I missed the fact the dependencies have been reworked in commit 7778512
> So I'll have to re-check why this still happens.

Did you have that commit in your build? It's not part of any release, so it's likely you don't have it applied there.

@cbiedl
Author

cbiedl commented Nov 28, 2020

Yeah, I realized too late that this bug report was based on the latest release. However, it still happens if I include that commit, so it's certainly still an issue. At the moment, however, I'm stuck figuring out what triggers the tangd-update invocation now. It does happen, but why?

@sergio-correia
Collaborator

Could you try #53 and report back, please?

@cbiedl
Author

cbiedl commented Nov 28, 2020

Will do in a moment, just want to share the analysis I did:

Starting with @jwkdir@ and @cachedir@ empty.

Requesting an advertisement using

echo foo | clevis encrypt tang '{"url": "http://<tang-server>/"}' > bar.txt

which results in an error:

Unable to fetch advertisement: 'http://<tang-server>/adv/'!

This is the output of "journalctl -o short-precise", with timestamps of the created files merged, ordered by time.

Nov 28 11:38:25.272465 freekeh systemd[1]: Starting Tang Server key generation script...

-rw-r--r-- 1 root root 354 2020-11-28 11:38:25.401147740 +0100 @dbdir@/OCWpHA3_4-E9mUpyW_0N3-Crcoc.jwk

Nov 28 11:38:25.457995 freekeh systemd[1]: Starting Tang Server key update script...

So update is started ...

-rw-r--r-- 1 root root 349 2020-11-28 11:38:25.531138284 +0100 @dbdir@/tPfTOakfqz0qMjjloGn6v989ttc.jwk

... before keygen finished the job. Also, update will ignore the second .jwk (not shown here)

Nov 28 11:38:25.557283 freekeh systemd[1]: tangd-keygen.service: Succeeded.
Nov 28 11:38:25.575785 freekeh systemd[1]: Started Tang Server key generation script.
Nov 28 11:38:25.597302 freekeh systemd[1]: Started Tang Server (<client>:54812).

And now the server is started although update is still running.

Nov 28 11:38:25.630481 freekeh tangd[2709]: <client> GET /adv/ => 404 (src/tangd.c:70)

... hence the 404

Nov 28 11:38:25.639115 freekeh systemd[1]: tangd@1-<server>:80-<client>:54812.service: Succeeded.

-rw-r--r-- 1 root root 618 2020-11-28 11:38:25.761121554 +0100 @cachedir@/default.jws

The update script needed 180ms to parse the .jwk and create the first .jws. This is the earliest moment at which the server could run. But to be safe, it should rather start only after update has finished the job, which took ...

(...)
lrwxrwxrwx 1 root root  44 2020-11-28 11:38:28.280938310 +0100 @cachedir@/MQHhVWoc4arWBQYK1wHf69l4dnXsX7fC2nZ7JAem1-3zyU53bUBjcOqtMCaB8WFRrvXX6g9pSz2AcgWRpWMAbw.jwk -> @dbdir@/tPfTOakfqz0qMjjloGn6v989ttc.jwk

... another 2520ms.

Nov 28 11:38:28.558664 freekeh systemd[1]: tangd-update.service: Succeeded.
Nov 28 11:38:28.561176 freekeh systemd[1]: Started Tang Server key update script.

So far I'm not convinced that using the systemd semantics to resolve dependencies is the best idea.
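The key gaps in the timeline above can be recomputed from the quoted timestamps. The following Python sketch does only that arithmetic; the event names are labels invented for this example, and the file timestamps are truncated from nanoseconds to microseconds.

```python
from decimal import Decimal

# Seconds past 11:38:00 on Nov 28, copied from the journal lines and file
# timestamps quoted above. The dictionary keys are made-up labels.
EVENTS = {
    "update_started": Decimal("25.457995"),
    "second_jwk_written": Decimal("25.531138"),
    "adv_request_404": Decimal("25.630481"),
    "default_jws_written": Decimal("25.761121"),
    "last_cache_symlink": Decimal("28.280938"),
}


def gap_ms(earlier, later):
    """Return the time from event `earlier` to event `later` in milliseconds."""
    return (EVENTS[later] - EVENTS[earlier]) * 1000


for earlier, later in [
    ("update_started", "second_jwk_written"),
    ("adv_request_404", "default_jws_written"),
    ("default_jws_written", "last_cache_symlink"),
]:
    print(f"{earlier} -> {later}: {gap_ms(earlier, later):.0f} ms")
```

This gives roughly 73 ms between the start of update and the second .jwk being written, roughly 131 ms between the 404 response and default.jws appearing, and roughly 2520 ms between default.jws and the last cache symlink.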

@cbiedl
Author

cbiedl commented Nov 28, 2020 via email

@sergio-correia
Collaborator

sergio-correia commented Dec 10, 2020

@cbiedl: can we close this, now that #53 has been merged?

@cbiedl
Author

cbiedl commented Dec 24, 2020

ACK, none of this still applies.

@cbiedl cbiedl closed this as completed Dec 24, 2020