New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cryptsetup: keyfile activation at boot time sometimes fails with "Invalid argument" (after an unclean shutdown) #11090

Closed
intelfx opened this Issue Dec 8, 2018 · 9 comments

Comments

2 participants
@intelfx
Copy link
Contributor

intelfx commented Dec 8, 2018

systemd version the issue has been seen with

$ systemctl --version
systemd 239
+PAM +AUDIT -SELINUX -IMA -APPARMOR +SMACK -SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=hybrid

Used distribution

Arch

Expected behaviour you didn't see

LUKS volume is expected to activate on boot.

Unexpected behaviour you saw

$ LC_TIME=C systemctl status systemd-cryptsetup@storage1.service
● systemd-cryptsetup@storage1.service - Cryptography Setup for storage1
   Loaded: loaded (/etc/crypttab; generated)
   Active: failed (Result: exit-code) since Sat 2018-12-08 06:26:33 MSK; 25min ago
     Docs: man:crypttab(5)
           man:systemd-cryptsetup-generator(8)
           man:systemd-cryptsetup@.service(8)
  Process: 269 ExecStart=/usr/lib/systemd/systemd-cryptsetup attach storage1 /dev/disk/by-uuid/6ff0637e-bb58-4ecb-bd0e-e1e6cb9bc7ce /etc/admin/keyfiles/storage1  (code=exited, status=1/FAILURE)
 Main PID: 269 (code=exited, status=1/FAILURE)
       IP: 0B in, 0B out
      CPU: 14ms

Dec 08 06:26:28 stratofortress systemd-cryptsetup[269]: Set cipher aes, mode xts-plain64, key size 256 bits for device /dev/disk/by-uuid/6ff0637e-bb58-4ecb-bd0e-e1e6cb9bc7ce.
Dec 08 06:26:28 stratofortress systemd-cryptsetup[269]: Failed to activate with key file '/etc/admin/keyfiles/storage1': Invalid argument
<...>

At the same time, similar volumes storage{2..4} activate correctly. Sometimes all 4 volumes activate correctly (i. e. the problem is not 100% reproducible).

Steps to reproduce the problem

  1. Make a regular LUKS volume on top of bare partition with a keyfile on root filesystem, ex.:

    $ sudo cat /etc/crypttab
    [sudo] пароль для operator:
    BOB says:  You seem to have forgotten your passwd, enter another!
    [sudo] пароль для operator:
    # Configuration for encrypted block devices.
    # See crypttab(5) for details.
    
    # <name>       <device>                                     <password>              <options>
    storage1       UUID=6ff0637e-bb58-4ecb-bd0e-e1e6cb9bc7ce    /etc/admin/keyfiles/storage1
    storage2       UUID=19a4f5a6-7ef9-4b76-ae95-bc2c4f03d8fe    /etc/admin/keyfiles/storage2
    storage3       UUID=4781976a-58b9-40ad-b490-d7269304b56a    /etc/admin/keyfiles/storage3
    storage4       UUID=dfdac573-0f19-40eb-ad29-1d0a6599b222    /etc/admin/keyfiles/storage4
    
  2. Boot cleanly, ensure the volume is opened.

  3. Reboot uncleanly, ex.:

    $ echo b | sudo tee /proc/sysrq-trigger
    

@intelfx intelfx changed the title cryptsetup: keyfile activation at boot time sometimes fails with "Invalid argument" cryptsetup: keyfile activation at boot time sometimes fails with "Invalid argument" (after an unclean shutdown) Dec 8, 2018

@intelfx

This comment has been minimized.

Copy link
Contributor Author

intelfx commented Dec 8, 2018

Clarification: after some experimentation, looks like this failure mode is triggered after an unclean shutdown (but, again, not always).

@poettering

This comment has been minimized.

Copy link
Member

poettering commented Dec 10, 2018

Urks, crypt_activate_by_keyfile_offset() returns that, i.e. libcryptsetup, and we don#t really know why... Can you set SYSTEMD_LOG_LEVEL= for systemd-cryptsetup@.service, and reproduce this? This should generate debug logging for both our code and libcryptsetup itself and hopefully helps us tracking down what is going on?

@intelfx

This comment has been minimized.

Copy link
Contributor Author

intelfx commented Jan 7, 2019

@poettering I finally caught this again.

$ LC_ALL=C journalctl -b-1 -u systemd-cryptsetup@storage1
-- Logs begin at Sat 2019-01-05 14:00:12 MSK, end at Mon 2019-01-07 19:19:40 MSK. --
Jan 07 19:09:22 stratofortress systemd-cryptsetup[324]: Set cipher aes, mode xts-plain64, key size 256 bits for device /dev/disk/by-uuid/6ff0637e-bb58-4ecb-bd0e-e1e6cb9bc7ce.
Jan 07 19:10:23 stratofortress systemd-cryptsetup[324]: Failed to acquire read lock on device /dev/disk/by-uuid/6ff0637e-bb58-4ecb-bd0e-e1e6cb9bc7ce.
Jan 07 19:10:23 stratofortress systemd-cryptsetup[324]: Failed to activate with key file '/etc/admin/keyfiles/storage1': Device or resource busy
Jan 07 19:10:55 stratofortress systemd-cryptsetup[324]: Failed to add password to keyring: Invalid argument
Jan 07 19:10:55 stratofortress systemd-cryptsetup[324]: Failed to acquire read lock on device /dev/disk/by-uuid/6ff0637e-bb58-4ecb-bd0e-e1e6cb9bc7ce.
Jan 07 19:10:55 stratofortress systemd-cryptsetup[324]: crypt_load() failed on device /dev/disk/by-uuid/6ff0637e-bb58-4ecb-bd0e-e1e6cb9bc7ce.
Jan 07 19:10:55 stratofortress systemd-cryptsetup[324]: Failed to activate: Device or resource busy
Jan 07 19:10:55 stratofortress systemd[1]: systemd-cryptsetup@storage1.service: Main process exited, code=exited, status=1/FAILURE
Jan 07 19:10:55 stratofortress systemd[1]: systemd-cryptsetup@storage1.service: Failed with result 'exit-code'.
Jan 07 19:10:55 stratofortress systemd[1]: Failed to start Cryptography Setup for storage1.
Jan 07 19:10:55 stratofortress systemd[1]: systemd-cryptsetup@storage1.service: Consumed 6.286s CPU time, received 0B IP traffic, sent 0B IP traffic
$ LC_ALL=C journalctl -b-1 -u systemd-cryptsetup@storage2
-- Logs begin at Sat 2019-01-05 14:00:12 MSK, end at Mon 2019-01-07 19:20:12 MSK. --
Jan 07 19:09:22 stratofortress systemd-cryptsetup[326]: WARNING: Locking directory /run/cryptsetup is missing!
Jan 07 19:09:22 stratofortress systemd-cryptsetup[326]: Set cipher aes, mode xts-plain64, key size 256 bits for device /dev/disk/by-uuid/19a4f5a6-7ef9-4b76-ae95-bc2c4f03d8fe.
Jan 07 19:10:23 stratofortress systemd-cryptsetup[326]: Failed to acquire read lock on device /dev/disk/by-uuid/19a4f5a6-7ef9-4b76-ae95-bc2c4f03d8fe.
Jan 07 19:10:23 stratofortress systemd-cryptsetup[326]: Failed to activate with key file '/etc/admin/keyfiles/storage2': Device or resource busy
Jan 07 19:10:23 stratofortress systemd[1]: systemd-cryptsetup@storage2.service: Main process exited, code=killed, status=15/TERM
Jan 07 19:10:23 stratofortress systemd[1]: systemd-cryptsetup@storage2.service: Failed with result 'signal'.
Jan 07 19:10:23 stratofortress systemd[1]: Stopped Cryptography Setup for storage2.
Jan 07 19:10:23 stratofortress systemd[1]: systemd-cryptsetup@storage2.service: Consumed 6.476s CPU time, received 0B IP traffic, sent 0B IP traffic
Jan 07 19:11:53 stratofortress systemd[1]: Dependency failed for Cryptography Setup for storage2.
Jan 07 19:11:53 stratofortress systemd[1]: systemd-cryptsetup@storage2.service: Job systemd-cryptsetup@storage2.service/start failed with result 'dependency'.
$ LC_ALL=C journalctl -b-1 -u systemd-cryptsetup@storage3
-- Logs begin at Sat 2019-01-05 14:00:12 MSK, end at Mon 2019-01-07 19:20:43 MSK. --
Jan 07 19:09:22 stratofortress systemd-cryptsetup[325]: Set cipher aes, mode xts-plain64, key size 256 bits for device /dev/disk/by-uuid/4781976a-58b9-40ad-b490-d7269304b56a.
Jan 07 19:10:23 stratofortress systemd-cryptsetup[325]: Failed to acquire read lock on device /dev/disk/by-uuid/4781976a-58b9-40ad-b490-d7269304b56a.
Jan 07 19:10:23 stratofortress systemd-cryptsetup[325]: Failed to activate with key file '/etc/admin/keyfiles/storage3': Device or resource busy
Jan 07 19:10:23 stratofortress systemd[1]: systemd-cryptsetup@storage3.service: Main process exited, code=killed, status=15/TERM
Jan 07 19:10:23 stratofortress systemd[1]: systemd-cryptsetup@storage3.service: Failed with result 'signal'.
Jan 07 19:10:23 stratofortress systemd[1]: Stopped Cryptography Setup for storage3.
Jan 07 19:10:23 stratofortress systemd[1]: systemd-cryptsetup@storage3.service: Consumed 6.566s CPU time, received 0B IP traffic, sent 0B IP traffic
Jan 07 19:11:53 stratofortress systemd[1]: Dependency failed for Cryptography Setup for storage3.
Jan 07 19:11:53 stratofortress systemd[1]: systemd-cryptsetup@storage3.service: Job systemd-cryptsetup@storage3.service/start failed with result 'dependency'.
$ LC_ALL=C journalctl -b-1 -u systemd-cryptsetup@storage4
-- Logs begin at Sat 2019-01-05 14:00:12 MSK, end at Mon 2019-01-07 19:20:57 MSK. --
Jan 07 19:09:22 stratofortress systemd-cryptsetup[327]: Set cipher aes, mode xts-plain64, key size 256 bits for device /dev/disk/by-uuid/dfdac573-0f19-40eb-ad29-1d0a6599b222.
Jan 07 19:10:23 stratofortress systemd-cryptsetup[327]: Failed to acquire read lock on device /dev/disk/by-uuid/dfdac573-0f19-40eb-ad29-1d0a6599b222.
Jan 07 19:10:23 stratofortress systemd-cryptsetup[327]: Failed to activate with key file '/etc/admin/keyfiles/storage4': Device or resource busy
Jan 07 19:10:55 stratofortress systemd-cryptsetup[327]: Failed to add password to keyring: Invalid argument
Jan 07 19:10:55 stratofortress systemd-cryptsetup[327]: Failed to acquire read lock on device /dev/disk/by-uuid/dfdac573-0f19-40eb-ad29-1d0a6599b222.
Jan 07 19:10:55 stratofortress systemd-cryptsetup[327]: crypt_load() failed on device /dev/disk/by-uuid/dfdac573-0f19-40eb-ad29-1d0a6599b222.
Jan 07 19:10:55 stratofortress systemd-cryptsetup[327]: Failed to activate: Device or resource busy
Jan 07 19:10:55 stratofortress systemd[1]: systemd-cryptsetup@storage4.service: Main process exited, code=exited, status=1/FAILURE
Jan 07 19:10:55 stratofortress systemd[1]: systemd-cryptsetup@storage4.service: Failed with result 'exit-code'.
Jan 07 19:10:55 stratofortress systemd[1]: systemd-cryptsetup@storage4.service: Consumed 6.473s CPU time, received 0B IP traffic, sent 0B IP traffic
Jan 07 19:11:53 stratofortress systemd[1]: Dependency failed for Cryptography Setup for storage4.
Jan 07 19:11:53 stratofortress systemd[1]: systemd-cryptsetup@storage4.service: Job systemd-cryptsetup@storage4.service/start failed with result 'dependency'.
@intelfx

This comment has been minimized.

Copy link
Contributor Author

intelfx commented Jan 8, 2019

I believe that's simply due to a race between systemd-cryptsetup@.service and systemd-tmpfiles-setup.service, which is supposed to create /run/cryptsetup:

$ grep -r cryptsetup /lib/tmpfiles.d 
/lib/tmpfiles.d/cryptsetup.conf:d /run/cryptsetup 0700 root root -

Note that cryptsetup started before tmpfiles:

$ LC_ALL=C journalctl -b-1 -t systemd | grep -E 'cryptsetup|tmpfiles'                                                                                                     
Jan 07 19:10:23 stratofortress systemd[1]: systemd-cryptsetup@storage2.service: Main process exited, code=killed, status=15/TERM                                          
Jan 07 19:10:23 stratofortress systemd[1]: systemd-cryptsetup@storage2.service: Failed with result 'signal'.                                                              
Jan 07 19:10:23 stratofortress systemd[1]: systemd-cryptsetup@storage2.service: Consumed 6.476s CPU time, received 0B IP traffic, sent 0B IP traffic                      
Jan 07 19:10:23 stratofortress systemd[1]: systemd-cryptsetup@storage3.service: Main process exited, code=killed, status=15/TERM                                          
Jan 07 19:10:23 stratofortress systemd[1]: systemd-cryptsetup@storage3.service: Failed with result 'signal'.                                                              
Jan 07 19:10:23 stratofortress systemd[1]: systemd-cryptsetup@storage3.service: Consumed 6.566s CPU time, received 0B IP traffic, sent 0B IP traffic                      
Jan 07 19:10:55 stratofortress systemd[1]: systemd-cryptsetup@storage1.service: Main process exited, code=exited, status=1/FAILURE                                        
Jan 07 19:10:55 stratofortress systemd[1]: systemd-cryptsetup@storage1.service: Failed with result 'exit-code'.                                                           
Jan 07 19:10:55 stratofortress systemd[1]: systemd-cryptsetup@storage1.service: Consumed 6.286s CPU time, received 0B IP traffic, sent 0B IP traffic                      
Jan 07 19:10:55 stratofortress systemd[1]: systemd-cryptsetup@storage4.service: Main process exited, code=exited, status=1/FAILURE                                        
Jan 07 19:10:55 stratofortress systemd[1]: systemd-cryptsetup@storage4.service: Failed with result 'exit-code'.                                                           
Jan 07 19:10:55 stratofortress systemd[1]: systemd-cryptsetup@storage4.service: Consumed 6.473s CPU time, received 0B IP traffic, sent 0B IP traffic                      
Jan 07 19:11:53 stratofortress systemd[1]: systemd-cryptsetup@storage2.service: Job systemd-cryptsetup@storage2.service/start failed with result 'dependency'.            
Jan 07 19:11:53 stratofortress systemd[1]: systemd-cryptsetup@storage3.service: Job systemd-cryptsetup@storage3.service/start failed with result 'dependency'.            
Jan 07 19:11:53 stratofortress systemd[1]: systemd-cryptsetup@storage4.service: Job systemd-cryptsetup@storage4.service/start failed with result 'dependency'.            
Jan 07 19:13:25 stratofortress systemd[1]: Removed slice system-systemd\x2dcryptsetup.slice.                                                                              
Jan 07 19:13:25 stratofortress systemd[1]: system-systemd\x2dcryptsetup.slice: Consumed 25.804s CPU time, received 0B IP traffic, sent 0B IP traffic                      
Jan 07 19:13:29 stratofortress systemd[1]: systemd-tmpfiles-setup.service: Consumed 32ms CPU time, received 0B IP traffic, sent 0B IP traffic                             
Jan 07 19:13:29 stratofortress systemd[1]: systemd-tmpfiles-setup-dev.service: Consumed 15ms CPU time, received 0B IP traffic, sent 0B IP traffic

Now the question: should we continue to use tmpfiles (and order systemd-cryptsetup@.service after systemd-tmpfiles-setup.service) or switch to RuntimeDirectory=?

@intelfx

This comment has been minimized.

Copy link
Contributor Author

intelfx commented Jan 8, 2019

Further thinking: I guess /run/cryptsetup is not just used by systemd-cryptsetup@.service, but also by the regular cryptsetup tools? If so, we have to stay with tmpfiles for this one. I'll send a PR, just in case this reasoning is correct.

Further thinking[2]: I don't see many units using After=systemd-tmpfiles-setup.service, won't this introduce too strict a dependency (maybe causing circulars in some setups)?

@poettering

This comment has been minimized.

Copy link
Member

poettering commented Jan 8, 2019

Hmm, so this sounds like a bug in cryptsetup: it should create the dir if it's missing.

tmpfiles runs after all mounts are established, since it is likely used to set stuff up on those mounts. This means that it also runs after the cryptsetup stuff is done, since otherwise those file systems cannot be mounted. and due to that cryptsetup must make sure it can deal with environments before tmpfiles.d ran. The rule is: anything that shall be able to run during early boot (i.e. before basic.target) cannot rely on tmpfiles.d generated stuff. Only regular services that run after basic.target can rely on that.

This should be filed against cryptsetup upstream hence.

@poettering

This comment has been minimized.

Copy link
Member

poettering commented Jan 8, 2019

(and no, adding deps between cryptsetup and tmpfiles is not appropriate, that would just create a cyclic dep for many people)

@intelfx

This comment has been minimized.

Copy link
Contributor Author

intelfx commented Jan 9, 2019

Fair enough — just like I suspected.

However, this adds code duplication at the very place we're trying to reduce it. Won't it be better to do the same thing we're doing for /dev — run tmpfiles for /run separately? Or, better yet, maybe let's extend systemd-tmpfiles-setup-dev.service to all API filesystems?

@poettering

This comment has been minimized.

Copy link
Member

poettering commented Jan 17, 2019

However, this adds code duplication at the very place we're trying to reduce it. Won't it be better to do the same thing we're doing for /dev — run tmpfiles for /run separately? Or, better yet, maybe let's extend systemd-tmpfiles-setup-dev.service to all API filesystems?

Well, I mean, tmpfiles is a definitely a useful tool, and there are plenty reasons to use it, for example because you need something privileged to create dirs properly owned so that later unpriv stuff can write to it. Or because you want some name to be taken early on to avoid nameclashes later and so on. However, I don't think cryptsetup really has a good reason to rely on tmpfiles for this, it can also just create the dir itself easily, since it is privileged, and since it generally runs early enough. Hence, use tmpfiles when you really have to, but not when you don't need it.

I mean, I think it's good to be robust, hence I think cryptsetup should just create whtever it needs, it's both simpler and more robust that way. They can keep the tmpfiles stuff in addition to the internal code to make the dir if they like, but it shouldn't be necessary to have it I think.

And hence, because doing dir creation internally if possible is preferable anyway, and in this case we can get away with it, I don't think we should duplicate the s-t-s-d.s efforts for this case, if you follow what I mean...

(But dunno, maybe I am missing something there, and there's a reason why they can't create the dir themselves, but afaics there is none.)

Anyway, let's close this here. Please report this to the cryptsetup folks. And if there's a really strong reason why they cannot just create the dir themselves if it is missing we can look into this again, and reopen, but I wouldn't expect this to happen. Hope that makes sense.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment