Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added argon2id as KDF. #14836

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

oromenahar
Copy link
Contributor

@oromenahar oromenahar commented May 6, 2023

Added Argon2 as a KDF which is memory and cpu intensive to prevent brute force attacks.
There are a few more options available now: keykdf, argon2_t_cost, argon2_m_cost, argon2_parallelism, argon2_version (hiddden). The options are stored on the disks and the load-key function uses the correct kdf based on the information stored on the disks.

Important note: Please be patient, it is my first PR on a big project and I tried my best, also I have just a little bit more than basic understanding of crypto.

Motivation and Context

PBKDF2 is a KDF which can be brute forced by ASICs or (GP)GPUs, encryption of data is getting more important but not all PCs or server provide a TPM to store secrets for the data encryption or the TPM can't be used for any reason. A KDF derives a key from a password provided by the user, which ZFS uses to enrypt the data. The KDF should be safe against several different types of attacks and Argon2 solves this problem. It is a modern Memory-Hard KDF which can use multiple CPUs and configurable RAM to prevent brute force attacks and it won the P-H-C. I think it should be possible to use a modern KDF directly in ZFS and not using a extra script which generates a raw key for ZFS.
Two issues for more information: 14762, 10764.

Description

This features adds some options to store the configuration of the Argon2 options on disk, there are four new options.
keykdf, argon2_t_cost, argon2_m_cost, argon2_parallelism and argon2_version which is hiddden. The keykdf-option stores a none, PBKDF2 or Argon2id, the argon2_t_cost-option is the equivalent to pbkdf2iters but for Argon2id. I think it's better to use a extra iter-option for this, because based on the other options you configure it can be pretty hard to derive a key. For example 35,000 rounds and 1<<20KiB of memory will run forever, but 10 rounds will derive a key in a 'small' amount of time based on the hardware. The option argon2_m_cost configures how much memory is need to perform the key derivation it is set in KiB. The option argon2_parallelism stores how many CPUs should be used by the algorithm. The option argon2_version just stores the Argon2-Version and is not visible to the user, just for the programmer. This version can be changed and it should be stored I think. I have choosed some default options for CPU and memory.

#define DEFAULT_ARGON2_T_COST 4
#define DEFAULT_ARGON2_M_COST (1<<19)
#define DEFAULT_ARGON2_PARALLELISM 4
#define DEFAULT_ARGON2_VERSION ARGON2_VERSION_13

Openssl -> Substantial features doesn't support Argon2 until now or within a short amount of time, for that reason another library is needed. I searched for other crypto libs, which support Argon2 and found Argon2Lib. It looks like, that Cryptsetup also uses a copy of this for there Argon2 support. There are some libraries in other languages, like C++ or Rust but I followed Cryptsetup and used the C-Lib Argon2Lib.

Discussion

A second KDF is not a big problem right now, but if more and more KDFs are added the PROP-List is getting longer. I think it's good to have just a few but state of the art encryption/KDF algorithm.
I thought about the default options for Argon2 but I'm not sure, for small hardware it could be to much but for bigger hardware it is not enough. If some datasets are allready open and in use, maybe there is not that much memory/CPU left to decrypt the next dataset. I think the default memory costs should be at 1<<20KiB (1GiB). Also I think the T_COST should be at least 10, but this need to be discussed with others as well.

TODO

  • Docs: Argon2 no boot option, because of complexity

How Has This Been Tested?

I tested it with virtual debian and arch-linux machines and virtual disks. I have tested to upgrade the pool from an older version and also tried to use a older version to read a dataset which was used but not upgraded by the new version. If the dataset is enrypted with a password and PBKDF2 it is backward compatible. I ran some of the zfs-test and zpool-test scripts as well, the scripts I've ran finished without any error. I will continue to run the tests. I don't know if this works under other operating systems.

Future work

  • include the ArgonLib to the source code to remove an extra dependency
  • check fips standard?

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

  • My code follows the OpenZFS code style requirements. (I 'm not sure, there are line > 80, but my editor doesn't think so?!)
  • I have updated the documentation accordingly.
  • I have read the contributing document.
  • I have added tests to cover my changes.
  • I have run the ZFS Test Suite with this change applied.
  • All commit messages are properly formatted and contain Signed-off-by.

Added Argon2 as a KDF which is memory and cpu intensive to prevent
brute force attacks.
There are a few more options available now: `keykdf`, `argon2_t_cost`,
`argon2_m_cost`, `argon2_parallelism`, `argon2_version` (hiddden).
The options are stored on the disks and the load-key function
uses the correct kdf based on the information stored on the disks.

Signed-off-by: Kay Pedersen <mail@mkwg.de>
@oromenahar oromenahar marked this pull request as draft May 6, 2023 13:10
@rincebrain
Copy link
Contributor

You don't seem to have added a pool feature flag for this, which would cause things to break pretty badly on older systems.

@robn
Copy link
Contributor

robn commented May 7, 2023

You don't seem to have added a pool feature flag for this, which would cause things to break pretty badly on older systems.

I don't think that's true. Nothing changes in the on-disk format; its just a few extra properties on the dataset. Older ZFS userspace won't necessarily be able to derive the key to unlock the dataset, but can still work with the pool, and the operator could just load the raw key in instead. For old pools on new userspace, it just means userspace needs to understand that a missing ZFS_PROP_KEY_KDF should be taken to to mean ZFS_KEY_KDF_PBKDF2.

@rincebrain
Copy link
Contributor

I'm going to strongly disagree that "it just fails to unlock the datasets and you could find a way to extract and feed in the raw key" is a reasonable "we don't need a feature flag" reason.

@robn
Copy link
Contributor

robn commented May 7, 2023

I'm going to strongly disagree that "it just fails to unlock the datasets and you could find a way to extract and feed in the raw key" is a reasonable "we don't need a feature flag" reason.

This does not change the on-disk format. It does not prevent the pool being imported, scrubbed, or datasets being sent and recieved. It does not prevent datasets without encryption or using a non-argon2 key method from being mounted (or prevent using an alternate method to derive the key or even a newer userspace). Stopping the pool from being imported at all seems heavy-handed, and "break pretty badly" seems exaggerated.

@rincebrain
Copy link
Contributor

It turns out, writing new values where they did not exist before is a format change.

It may be a compatible one, but "is not a format change" is inaccurate.

It's possible the project will change its stance, but historically, "cannot mount the filesystem if in use" would be sufficient criteria for a feature flag.

If you think that's too heavy handed, a more fine-grained feature approach might be nice, but currently, I would suggest this is definitely past the criteria of needing something to indicate a change, and it's not read-only compatible, so it would, indeed, require blocking import.

@robn
Copy link
Contributor

robn commented May 7, 2023

I'm not trying to relitigate anything; I'm still fairly new here, and I'm happy to accept that my understanding is wrong. My understanding comes from reading docs and code and seeing that missing features prevent pool import entirely, and so should be avoided unless obviously necessary. This seems especially true for encryption when one of the selling points is that you don't need the keys in order to to recieve streams and manage the pool. I'm more than comfortable accepting that may be a wrong or incomplete conclusion though.

I will note that similar observations that led me to not add a feature flag in #14249, and add #14577 to help in the future.

Per-dataset feature flags are probably what we want here, and for my thing too. I've got a sketch of that on a branch, which I'll see if I can polish up soon. I wouldn't wait for me though; like everyone, I have limited time at the moment.

@oromenahar
Copy link
Contributor Author

I'm not sure if a feature-flag is important or not. I totally understand @rincebrain that the datasets are limited on older ZFS versions. Not all of them can be read and I don't know if send and recv is working without any problems. I made some test to check what is working and what isn't working, but not enough.
If there are enough people who want Argon2 as a KDF and a feature flag is necessary I would add a feature-flag to the pool.
I mean kind of it is breaking backwards compatibility. Not that much, that the pool can not be used any more, but some parts on the pool can only be used with limits or some tricks (raw key and so on), like @robn suggested. For that reason I think a feature flag wouldn't be bad.

@behlendorf behlendorf added the Status: Design Review Needed Architecture or design is under discussion label May 8, 2023
@debdrup
Copy link
Contributor

debdrup commented Jun 29, 2023

It's not a blocker for now, but I'm just making a note of this:
There should probably be some kind of documentation that notes that this can't be used for FDE, since it's unlikely to be possible to implement anything like argon in a memory-constrained environment like a boot loader.

@oromenahar
Copy link
Contributor Author

Oh, yes you are right. I will create a TODO List and add it to the first post later. So we can track the TODO's that are important or necessary to finish the pull request.

@ongardie
Copy link

ongardie commented Feb 6, 2024

Hi @oromenahar, thanks for working on this. Any thoughts on this PR vs #15682? It looks to me (as an outsider) like this PR neatly exposes the important parameters, but then the PR seems to have lost steam. The other PR looks like it would need some more work to get to about this point.

[edit: fixed link]

@oromenahar
Copy link
Contributor Author

Hi @ongardie I am not happy with #15682 for the same reasons I'm not happy with my pr. For that reason this lost a little bit of steam.
I can add some tests and a feature flag pretty fast. And so on.
But for me adding more and more kdf-flags and properties doesn't make that much sense, especially the filesystem/kernel doesn't really work with these parameters. The filesystem just store it on disk. This will change over time because adding a new kdf has the same result like this one. You need to add unnecessary code to the kernel. The kernel just need a derived key. Yeah right now the kernel is checking that the PBKDF2 iterations are higher than a certain number of iterations. That's fine but not really necessary.
I wanna pass the data in a more abstract way to the kernel and wanna create a serialized byte array in the user space. This array should be passed to the filesystem. If we wanna decrypt the filesystem, we will get the byte array back, deserialize it and derive the key, which is than passed to the kernel to decrypt.
With this kind of interface you can add pretty easy a new kdf in the userspace only or add custom kdfs. That's some kind of possible today but I don't think in an easy way. For example using argon2 is possible by writing a custom script which derives a key from your password and feed it to zfs raw key encrypted dataset. That's work, it's not easy and you need to store the parameters anywhere and so on. This work need to be handled by zfs otherwise nobody gonna use it.
I'm currently working on multiple key slot version of zfs while I had that kind of thoughts a few month ago. That work had lost some steam for sure. But the reason is I wanna contact @tcaputi to ask him a few questions about master key encryption in zfs. The most work is done for that multi kdf and multi key slot version in a draft where we could talk about the design, not about the code quality for sure but the higher design. Just need a few quick answers from @tcaputi and time for that.
That are my thoughts to the key handling in zfs. I hope I can finish the work in a future which is not in the next century so multi key slots could be merged to zfs this year ;) but you know there is never time for cool and nice open source projects.
I'm happy to hear about any thoughts to the points I criticize and happy to discuss it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Design Review Needed Architecture or design is under discussion
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants