Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ZFS Encryption #4329

Closed
wants to merge 10 commits into from
Closed

ZFS Encryption #4329

wants to merge 10 commits into from

Conversation

tcaputi
Copy link
Contributor

@tcaputi tcaputi commented Feb 11, 2016

Native encryption in zfsonlinux (See issue #494)

The change incorporates 2 major pieces:

The first feature is a keystore that manages wrapping and encryption keys for encrypted datasets. The commands are similar to that of Solaris but with a few key enhancements to make it more predictable, more consistent, and require less manual maintenance. It is fully integrated with the existing zfs create functions and zfs clone functions. It also exposes a new set of commands via zfs key for managing the keystore. For more info on the issues with the Solaris implementation see my comments here and here. The keystore operates on a few rules.

  • All wrapping keys are 32 bytes (256 bits), even for 128 and 192 bit encryption types.
  • Encryption must be specified at dataset creation time.
  • Specifying a keysource while creating a dataset causes the dataset to become the root of an encryption tree.
  • All members of an encryption tree share the same wrapping key.
  • Each dataset can have up to 1 keychain (if it is encrypted) that is not shared with anybody.

The second feature is the actual data and metadata encryption. All user data in an encrypted dataset is stored encrypted on-disk. User-provided metadata is also encrypted, but metadata structures have been left plain so that scrubbing and resilvering still works without the keys loaded. The design was originallly inspired by this article but has been changed fairly significantly since.

Implementation details that should be looked at

  • Encrypting data going to disk requires creating a key_mapping_t during dsl_dataset_tryown(). I added a flag to this function for code that wishes to own the dataset, but that does not require encrypted data, such as the scrub functions. I did my best to confirm that all owners set this flag correctly, but someone should confirm them, just to be sure.
  • zfs send and zfs recv do not currently do anything special with regards to encryption. The format of the send file has not changed and zfs send requires the keys to be loaded in order to work. At some point there should probably be a way to do raw sends.
  • I altered the prototype of lzc_create() and lzc_clone() to support hidden arguments. I understand that the purpose of libzfs_core is to have a stable api interacting with the ZFS ioctls. However, these functions need to accept wrapping keys separately from the rest of their parameters because they need to use the (new) hidden_args framework to support hiding arguments from the logs. Without this, the wrapping keys would get printed to the zpool history.

EDIT 5/4/16: Updated to reflect the current state of the PR
EDIT 1/3/17: Updated to reflect the current state of the PR

@tcaputi
Copy link
Contributor Author

tcaputi commented Feb 12, 2016

I see my builds all failed because I left in a few bad asserts and I didn't have debugging enabled so they didn't cause an issue on my end. Oops. I will fix this tonight.

@tcaputi
Copy link
Contributor Author

tcaputi commented Feb 12, 2016

I also see that the style issues are causing the builds to fail. I will fix that tonight as well.

@ilovezfs
Copy link
Contributor

Does none of this belong in spl?

@tcaputi
Copy link
Contributor Author

tcaputi commented Feb 12, 2016

All of the code in module/icp (with the exception of the algs directory which has specific cipher implementations) acts as a single frameowrk and should be kept together. Personally, I view it as a standalone module that relies on the spl which is why I put it where it is. To me it seemed like it was along the same lines as the nvpair module. I can move it if the community feels there is a better place for it.

@ryao
Copy link
Contributor

ryao commented Feb 13, 2016

@ilovezfs Putting any of this into the SPL would require implementing new encryption routines because the Linux kernel GPL symbol exports its encryption routines and the SPL is not allowed to touch that.

@ilovezfs
Copy link
Contributor

@ryao Then perhaps SPL needs to be renamed LPL "License Porting Layer" since a piece of Solaris porting cannot go in the SPL. Ahem.

@tcaputi
Copy link
Contributor Author

tcaputi commented Feb 13, 2016

@ryao Thank you for mentioning that. I had forgotten about the licensing aspect.

@tcaputi
Copy link
Contributor Author

tcaputi commented Feb 16, 2016

does anyone know how to re-trigger the automated build tests? The issues from above should have been fixed by my reformatting commit, but I'd like to be sure.

@behlendorf
Copy link
Contributor

@tcaputi if you force update your branch on Github it will rerun all the builds and tests. Usually, I just rebase my work on master and then force update it.

As for putting this work in the SPL I'd prefer to keep in the ZFS source tree since it's all CDDL and originated from illumos. The SPL itself will in due course get moved in to a directory of the ZFS source tree for convenience so let's try and avoid adding more to it if we don't have too.

@tcaputi tcaputi force-pushed the master branch 2 times, most recently from 6e4b1a8 to 7557c83 Compare February 18, 2016 02:26
@tcaputi
Copy link
Contributor Author

tcaputi commented Feb 18, 2016

@behlendorf It doesn't look like the tests ran. I did a merge against upstream/master and then git push -f origin master to push to my master branch. Is this correct?

@l1k
Copy link
Contributor

l1k commented Feb 18, 2016

@tcaputi: You need to rebase instead of merge, so assuming your index and working tree are clean:

git reset --hard f702c83
git fetch upstream
git rebase -i upstream/master
git push -f origin master

@tcaputi
Copy link
Contributor Author

tcaputi commented Feb 18, 2016

I looked at the logs here. The builds are failing now because zfs requires the (few) changes I made to the SPL. I will make a PR for that too, but I suppose this won't work on the automated builds until then (which will probably take a little while)

@tcaputi
Copy link
Contributor Author

tcaputi commented Feb 18, 2016

I made a (small) corresponding PR against the spl: openzfs/spl#533

@behlendorf
Copy link
Contributor

@tcaputi if you add the line Requires-spl: refs/pull/533/head to commit message of the top patch in this stack then the builtbot will use that PR instead of master when checking out the spl for testing.

@tcaputi
Copy link
Contributor Author

tcaputi commented Feb 19, 2016

@behlendorf I will do that and fix the SPL PR tonight. Thanks for the advice and patience.

@tcaputi tcaputi force-pushed the master branch 3 times, most recently from 31f3656 to 9ec5427 Compare February 21, 2016 05:53
@tcaputi
Copy link
Contributor Author

tcaputi commented Feb 21, 2016

I see that there are a few problems on non x86_64 architectures. I will fix these soon, but I don't have any local machines to test against. I hope its ok if I end up hitting the build system a few more times for testing.

@behlendorf
Copy link
Contributor

@tcaputi by all means keep submitting things until the buildbot is happy. That's what it's there for.

* CDDL HEADER START
*
* The contents of this file are subject to the terms of the
* Common Development and Distribution License (the "License").
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The header in this file and any other new files that are not derived from existing files under the unbound CDDL (i.e. no version peg) should state Common Development and Distribution License, Version 1.0 only. That way Oracle cannot exercise sections 4.1 and 4.2 of the CDDL to make them available under an open source incompatible license. There are already a few files in the repository like this, such as ./lib/libspl/xdr.c.

I ran this by @ahrens before posting it. He agrees that we should be going with Version 1.0 only in new files.

@tcaputi
Copy link
Contributor Author

tcaputi commented Feb 26, 2016

@ryao Ok. Thanks. I just copies the license from an existing file. I'lll make the change.

@grahamperrin grahamperrin mentioned this pull request Feb 26, 2016
@tcaputi tcaputi force-pushed the master branch 3 times, most recently from 005638a to 55ef824 Compare March 7, 2016 04:27
@@ -727,26 +802,41 @@ dsl_dataset_disown(dsl_dataset_t *ds, void *tag)
ASSERT(ds->ds_dbuf != NULL);

mutex_enter(&ds->ds_lock);
if (ds->ds_dir && ds->ds_dir->dd_crypto_obj) {
(void) spa_keystore_remove_mapping(ds->ds_dir->dd_pool->dp_spa,
ds, ds);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that spa_keystore_add_mapping and spa_keystore_remove_mapping need to be paired. Since dsl_dataset_own can conditionally add, doesn't dsl_dataset_disown also need to conditionally remove? Otherwise we may remove when we didn't add.

Since there's only one owner, we should be able to keep track in the ds whether the owner did an add, in which case we should do a remove. That way we don't need to change the args to _disown.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As part of the dsl_dataset_hold_crypt() change I mentioned above, the dsl_dataset_rele() and dsl_dataset_disown() will also accept flags to properly manage releasing the keys.

int
dsl_dataset_own_obj(dsl_pool_t *dp, uint64_t dsobj,
dsl_dataset_hold_crypt(dsl_pool_t *dp, const char *name,
void *tag, dsl_dataset_t **dsp)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment to help the reader understand when they would use hold vs hold_crypt. E.g:

Hold the dataset and ensure that the decryption key is available (otherwise, the call will fail and return ENOKEY). Callers must use dsl_dataset_rele_crypt() to release this hold. If callers need access to the decrypted contents, they must use this routine, otherwise they can use dsl_dataset_hold().

Copy link
Contributor Author

@tcaputi tcaputi Jan 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am getting this rewritten now to accommodate this. The new version will have flags passed to dsl_dataset_rele_flags() and dsl_dataset_hold_flags() and friends. The non-crypt version will be a thin wrapper around this.

if (os->os_encrypted &&
(spa_keystore_lookup_key(os->os_spa,
os->os_dsl_dataset->ds_object, NULL, NULL) == 0))
key_needed = B_TRUE;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you know that this ownership needed the key? See similar comment in dsl_dataset_disown.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I already have this fixed locally (and the other one as well). I will push it with the rest of these fixes.

@@ -267,7 +267,7 @@ calculate_range(const char *dataset, err_type_t type, int level, char *range,
* size.
*/
if ((err = dmu_objset_own(dataset, DMU_OST_ANY,
B_TRUE, FTAG, &os)) != 0) {
B_TRUE, B_TRUE, FTAG, &os)) != 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that we actually need any decrypted data here. If we do require it to be decrypted, how does zinject get the keys?

@@ -179,7 +179,7 @@ object_from_path(const char *dataset, const char *path, struct stat64 *statbuf,
*/
sync();

err = dmu_objset_own(dataset, DMU_OST_ZFS, B_TRUE, FTAG, &os);
err = dmu_objset_own(dataset, DMU_OST_ZFS, B_TRUE, B_TRUE, FTAG, &os);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that we actually need any decrypted data here. If we do require it to be decrypted, how does zinject get the keys?

Copy link
Contributor Author

@tcaputi tcaputi Jan 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to err on the side of caution here, and so anything I wasn't sure of required the keys be loaded. I will change this to B_FALSE. (same with the other instance in zinject).

.ad
.sp .6
.RS 4n
Indicates that the zpool command will request encryption keys for all encrypted datasets it attempts to mount as it is bringing the pool online. This is equivalent to running \fBzfs mount\fR on each encrypted dataset immediately after the pool is imported. If any datasets have a \fBprompt\fR keysource this command will block waiting for the key to be entered. Otherwise, encrypted datasets will be left unavailable until the keys are loaded.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

equivalent to running \fBzfs mount\fR on each encrypted dataset

So, it mounts them? That seems counterintuitive. Seems like this should be equivalent to zfs key -l on all encrypted filesystems.

waiting for the key

I think this should be keys (plural).

Otherwise,

I think you mean, If \fB-l\fR is not specified,, not "If no datasets have a prompt keysource"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct on both points, will fix.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about this some more, on the surface it seems like -l does approximately nothing, because we already mount filesystems when the pool is imported, and presumably that will load keys for the filesystems we mount. I think the exception is filesystems with mountpoint=legacy or canmount=off | noauto. If that's right, I think it would be worth mentioning here - that -l is used to ensure that all keys are loaded, even for filesystems that are not mounted because they have mountpoint=legacy or canmount=off | noauto.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

zfs key -l and zpool import -l also will result in encrypted zvols appearing in /dev/, by the way. To me, these commands allow the user to access the dataset normally doing whatever without worrying about encryption anymore.

That said, I can verify the behavior of this command and make the comment more specific accordingly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some more context about the reason for zpool import -l after looking back at the code:

Thinking about this some more, on the surface it seems like -l does approximately nothing, because we already mount filesystems when the pool is imported, and presumably that will load keys for the filesystems we mount.

Right now, zpool import (without -l) will actually leave encrypted datasets unmounted. The reason for this was compatibility. I was afraid that someone might add an encrypted dataset to a pool that is mounted via an automated script. When the pool is re-imported, the script would hit the prompt for the encrypted dataset and hang indefinitely. As a result, I wanted people to have to opt into the automatic key loading.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now, zpool import (without -l) will actually leave encrypted datasets unmounted.

OK, let's document that in the manpage.

hierarchy, file size, file holes, and dedup tables. Key rotation is managed
internally by the ZFS kernel module and changing the user's key does not
require re-encrypting the entire dataset. Datasets can be scrubbed, resilvered,
moved, and deleted without the encryption keys being loaded (see the zfs key
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved

renamed?

zfs key

\fBzfs key\fR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will fix

provides additional protection against maliciously altered data. Deduplication
is still possible with encryption enabled but for security, datasets will only
dedup against themselves, their snapshots, and their clones. Encrypted data
cannot be embedded via the \fIembedded_data\fR feature.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This paragraph belongs in the zfs.8 manpage.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will fix

encryption datasets may be vulnerable to a CRIME-like attack if applications
accessing the data allow for it. Deduplication with encryption will leak
information about which blocks are equivalent in a dataset and will incur an
extra CPU cost per block written.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This paragraph belongs in the zfs.8 manpage.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will fix

internally by the ZFS kernel module and changing the user's key does not
require re-encrypting the entire dataset. Datasets can be scrubbed, resilvered,
moved, and deleted without the encryption keys being loaded (see the zfs key
subcommand for more info).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the information in here should also be in the zfs.8 manpage, perhaps in a new subsection of the DESCRIPTION (peer to Clones, Mount Points, etc). The zpool-features manpage should primarily document why you would want to enable this feature. See for example the documentation of the bookmarks feature. In this case:

Enabling this property allows setting the \fBencryption\fR property to values other than \fBoff\fR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will fix. Would you want me to move some of this information or make a copy of it in zfs.8 (perhaps with different wording)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think move it, and leave just the minimal description of what this property does (allows setting encryption), and when it becomes active and enabled.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. Will fix.

@tcaputi
Copy link
Contributor Author

tcaputi commented Jan 14, 2017

@ahrens Thanks for helping with the review. I should be able to address all of the comments and make another push early next week.


.LP
.nf
\fB\fBzfs key -l\fR \fIfilesystem\fR | \fIvolume\fR\fR
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be easier to read (e.g. in a script or an email) if we also have long opts for the "verbs" here:

zfs key --load
zfs key --unload
zfs key --change

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ZoL actually doesn't have any longopts handling at all. I can add it though, if it will help.

This (the wording of this command) is actually the biggest complaint I've received about the patch so far. I've heard from several people that they don't like that key isn't a verb like create or destroy. That said, I have been unable to come up with any alternatives that dont seem tedious (like zfs loadkey, zfs unloadkey, zfs changekey`. If you have any thoughts here now would probably be a good time while I'm cleaning up all of this other stuff.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had the same thoughts... nothing better jumps to mind but I'll keep pondering...

FWIW, I think that the tedious zfs loadkey etc are in keeping with the existing interface, which does have some collections of verbs that all operate on the same concepts (e.g. zfs allow, zfs unallow; and zfs hold, zfs release, zfs holds).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can change this if you'd like (its a bit tedius to make every one of these a separate funnction in both kernel and userspace. My only other qualm here is that people have asked for a few other key-related subcommands like zfs verifykey which would tell you if a key is correct (even if its already loaded).

Another one I was going to add while doing all these fixups is zfs linkkey which would tell a dataset to inherit the wrapping keys of a parent. Currently, you can break wrapping key inheritance with zfs key -c but there's no way to relink them later so this would add that functionality. So at that point there are now 5 key commands which seems like a lot of bloat in both zfs_cmd.c and zfs_ioctl.c.

I'm fine doing it either way. Let me know what you think is best. Maybe @behlendorf could weigh in here too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you do create lots of subcommands, I don't think that makes it necessary to create lots of ioctls; you can keep using one ioctl if that's cleaner/simpler.

zfs linkkey

Too bad key change doesn't really fit in with the property scheme, otherwise zfs inherit would be the obvious choice. I'd suggest that if we want this functionality, it may make sense to borrow the "inherit" terminology here, e.g. zfs inheritkey or zfs key --inherit.

.ne 2
.mk
.na
\fB\fBkeysource\fR=<\fBraw\fR | \fBhex\fR | \fBpassphrase\fR>,<\fBprompt\fR | \fBfile://\fR\fI<absolute file path\fR>>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get wanting to use a URI in case we later add extensions, like to contact some kind of keyserver. But it seems like it would be handy if we could specify an absolute path as a shortcut for file://<path>. This seems non-ambiguous since AFAIK a URI can't start with a /.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is another leftover from trying to stick to the Solaris UI. I actually kind of like the file URI, but that's just me. Maybe @behlendorf could be a tie-breaker here?

.ne 2
.mk
.na
\fB\fBkeysource\fR=<\fBraw\fR | \fBhex\fR | \fBpassphrase\fR>,<\fBprompt\fR | \fBfile://\fR\fI<absolute file path\fR>>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you do keysource=raw,prompt, is it really possible to type/paste the raw data (including nul characters) into the terminal? If not, I assume this is still a valid setting because you can pipe raw bytes into zfs key -l? That might be worth mentioning in the "STDIN" section of zfs key -l docs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It actually is possible to type paste raw characters in (at least in my terminal maybe not all). But yes, this is still valid for the use-case you described. I can add a sentence or 2 to the man page.

@grahamperrin
Copy link
Contributor

Back to basics for a moment, and apologies if I read these points out of context:

… can't create an unencrypted fs under an encrypted …

– and from related https://github.com/tcaputi/zfs/blob/2f7c9cb5e27c4feef62971d0c2f802735d60a4e8/man/man8/zfs.8#L1021:

… Regardless, all children of an encrypted dataset must also be encrypted.

Children and creation of children aside, for a moment.

What, if anything, exists – or should exist – to prevent simple mount of a non-encrypted filesystem at a point within an encrypted filesystem?

(Probably treat that as a question about documentation. Not an expression of paranoia.)

.ad
.sp .6
.RS 4n
Unloads a key from ZFS, removing the ability to access the dataset and all of its children that inherit the \fBencryption\fR property. This requires that the dataset is not currently open or mounted. When a key is unloaded the \fBkeystatus\fR property will be set to \fBunavailable\fR.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about \fBkeystatus\fR property is \fBunavailable\fR. It seems strange to say that it's "set" because it isn't a settable property.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will fix

.ad
.sp .6
.RS 4n
Allows a user to change the encryption key used to access a dataset. This command requires that the existing key for the dataset is already loaded into ZFS. This command may also be used to change the \fBpbkdf2iters\fR and / or \fBkeysource\fR properties as needed. If the dataset was previously inheriting the \fBencryption\fR property when this command is run it will now be locally set, indicating that this dataset must have its key loaded separately from the parent.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the rationale behind requiring the key already be loaded for zfs key -c, compared with zfs mount which will load the key for you? I think you mentioned in another comment that there was a concern about what state you're left in (loaded or not) when zfs key -c completes. Maybe it should be the same as if you do zfs mount; zfs unmount, which I think would leave the key loaded.

The philosophy around key loaded-ness seems to be that in general we attempt to load when the keys are needed, and unload only when explicitly requested (zfs key -u). In the current implementation, I'm imaging user interaction:

$ zfs key -c ...
Sorry, you can't change keys because they aren't loaded.  Run "zfs key -l ..." to load them.
# OK fine I'll copy/paste what you said to do, but if you know what I have to do, why didn't you do it for me?
$ zfs key -l ...
$ zfs key -c ...

Seems like we should have a good reason for annoying them in this case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a good middle ground is that the keys will remain in whatever state they were in before the zfs key -c command was run? Is that reasonable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See this comment I made above.

@tcaputi
Copy link
Contributor Author

tcaputi commented Jan 14, 2017

@grahamperrin

What, if anything, exists – or should exist – to prevent simple mount of a non-encrypted filesystem at a point within an encrypted filesystem?

Nothing prevents a mount like that from being done right now, but as far as I am aware that will not automount when the parent is mounted where as child datasets are.

I can look into adding a sentence or 2 to the docs about this

@mailinglists35
Copy link

if this PR good for testing by unpatient end-users? is the on-disk format finished?

@tcaputi
Copy link
Contributor Author

tcaputi commented Jan 17, 2017

@mailinglists35 and everyone else:

if this PR good for testing by unpatient end-users? is the on-disk format finished?

I would say patch 89b4e7a is good enough to test. This week I will be addressing @ahrens comments and I will be using buildbot here for testing, so I wouldn't use any of these commits until theyre back to being stable (I will squash them all at that time).

The on-disk format has not been verified yet. We are working to get there.

@lundman
Copy link
Contributor

lundman commented Jan 18, 2017

I'm still getting patches on a silver-platter once the dust settles, right? :)

@tcaputi
Copy link
Contributor Author

tcaputi commented Jan 18, 2017

@lundman Yes. Let me get all of these changes stable first (hopefully sometime this week) and then I'll squash them and I'll send you a nice and tidy patch


/*
* With the advent of encrypted data in the ARC it is now possible for
* legitimate errors to arise while transforming data into its desired format.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, the style/tone of comments should be to describe the code as it is. Consider someone reading this in 5 years time - the "advent" making it "now possible" will seem misplaced. Instead, consider something like:
Because the ARC can store encrypted data, errors (not due to bugs) may arise while transforming data into its desired format - specifically, when decrypting, the key may not be present, or the HMAC may not be correct, which signifies deliberate tampering with the on-disk state (assuming that the checksum was correct). The "error" parameter will be nonzero in this case, even if there is no associated zio.
Not sure if the technical details above are correct, but hopefully the style makes sense.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is correct. I will also fix the comment.

@@ -112,20 +121,21 @@ typedef enum arc_flags
ARC_FLAG_L2_WRITING = 1 << 11, /* write in progress */
ARC_FLAG_L2_EVICTED = 1 << 12, /* evicted during I/O */
ARC_FLAG_L2_WRITE_HEAD = 1 << 13, /* head of write list */
ARC_FLAG_ENCRYPT = 1 << 14, /* encrypted on disk */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might add something like may or may not be encrypted in memory, to emphasize this fact (lest anyone wonder if encrypted on disk is what you precisely meant).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will fix

* disk as they appear in the main pool. In order for this to work we
* need to pass around the encryption parameters so they can be used
* to write data to the L2ARC. This struct is only defined in the
* arc_buf_hdr_t if the L1 header is defined and the has the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: remove extraneous the.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will fix

@@ -144,7 +154,8 @@ typedef enum arc_flags

typedef enum arc_buf_flags {
ARC_BUF_FLAG_SHARED = 1 << 0,
ARC_BUF_FLAG_COMPRESSED = 1 << 1
ARC_BUF_FLAG_COMPRESSED = 1 << 1,
ARC_BUF_FLAG_ENCRYPTED = 1 << 2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment here contrasting this with ARC_FLAG_ENCRYPTED.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will fix

@@ -1331,13 +1391,30 @@ arc_buf_lsize(arc_buf_t *buf)
return (HDR_GET_LSIZE(buf->b_hdr));
}

boolean_t
arc_is_encrypted(arc_buf_t *buf)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment explaining what this means. In particular, this tells if the buf is stored encrypted in the ARC, and therefore ... [can't be read unless X flag passed to arc_read()? unless key loaded via foobar()?]. It returns false if the data is encrypted on disk but decrypted in memory.

(The same should type of comment should probably have been added above arc_get_compression.)

Copy link
Contributor Author

@tcaputi tcaputi Jan 29, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will fix. should i add the one for compression as well while I'm here? Or should that be left to a different PR?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with adding the compression comment while you're here.

@@ -208,7 +209,7 @@ typedef struct zio_cksum_salt {
* G gang block indicator
* B byteorder (endianness)
* D dedup
* X encryption (on version 30, which is not supported)
* X encryption
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should say set to zero; see diagram below for encrypted blocks, and then there should be a separate diagram that shows the layout for encrypted blocks, similar to how we explain embedded BP's (IIRC, checksum[2-3], fill, and dva[2] have different meanings when the X bit is set). The legend of the new diagram might omit fields that are already described here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will fix

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since I think the in-memory representations of the IV and salt (and MAC?) are treated as byte arrays, be sure to explain how those are encoded into the 64-bit words (e.g. low 8 bits of the word are the low byte of the array, etc). As you probably know, we can't simply treat the byte array as a uint64_t*, because of endianness concerns.

}

#define BP_GET_IV2(bp) BF64_GET((bp)->blk_fill, 32, 32)
#define BP_SET_IV2(bp, iv2) BF64_SET((bp)->blk_fill, 32, 32, iv2);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should assert that the BP is encrypted (you can use the comma operator to do the assertion in the GET, see BPE_GET_ETYPE() for an example).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will fix

}

#define BP_GET_IV2(bp) BF64_GET((bp)->blk_fill, 32, 32)
#define BP_SET_IV2(bp, iv2) BF64_SET((bp)->blk_fill, 32, 32, iv2);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be helpful (for readability) to add accessor macros for the other encryption-specific fields (even if they are entire words). Those macros could also assert that they are only used on encrypted BP's.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually have full functions for these (see zio_crypt_encode_params_bp() in zio_crypt.c for instance). Let me know if you think they should be changed to macros and brought to spa.h

@@ -269,6 +270,7 @@ struct spa {
spa_avz_action_t spa_avz_action; /* destroy/rebuild AVZ? */
uint64_t spa_errata; /* errata issues detected */
spa_stats_t spa_stats; /* assorted spa statistics */
spa_keystore_t spa_keystore; /* loaded crypto keys */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

align the comment with the adjacent ones.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will fix

@@ -86,7 +87,7 @@ typedef struct zil_header {
* number passed in the blk_cksum field of the blkptr_t
*/
typedef struct zil_chain {
uint64_t zc_pad;
uint64_t zc_mac; /* mac for encryption */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

align the comment with the adjacent ones

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will fix

.ad
.sp .6
.RS 4n
Indicates that the zpool command will request encryption keys for all encrypted datasets it attempts to mount as it is bringing the pool online. This is equivalent to running \fBzfs mount\fR on each encrypted dataset immediately after the pool is imported. If any datasets have a \fBprompt\fR keysource this command will block waiting for the key to be entered. Otherwise, encrypted datasets will be left unavailable until the keys are loaded.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now, zpool import (without -l) will actually leave encrypted datasets unmounted.

OK, let's document that in the manpage.

@@ -1331,13 +1391,30 @@ arc_buf_lsize(arc_buf_t *buf)
return (HDR_GET_LSIZE(buf->b_hdr));
}

boolean_t
arc_is_encrypted(arc_buf_t *buf)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with adding the compression comment while you're here.

/*
* After encrypting many blocks with the same salt we may start to run
* up against the theoretical limits of how much data can securely be
* encrypted a single key using the supported encryption modes. To
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: encrypted *with* a single key

* counteract this we generate a new salt after writing
* ZIO_CRYPT_MAX_SALT_USAGE blocks of data, tracked by zk_salt_count.
* The current value was chosen because it is approximately the number
* of blocks that would have to be written in order to acheive a
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spelling: achieve

* 1 / 1 trillion chance of having an IV collision. Developers looking to
* change this number should make sure they take into account the
* birthday problem in regards to IV generation and the limits of what the
* underlying mode can actually handle.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this comment should lay out the math behind this number. That would make it clear that the birthday problem is accounted for. Also we should simply say that we account for the birthday problem rather than condescend to future readers. E.g. This protects against a birthday attack. With n = 400 million blocks encrypted with the same key and salt, and d = the number of IV's = 2^96, the probability of two blocks using the same IV is: 1 - 1 * (1 - 1 / d ) * ( 1 - 2 / d ) * … * (1 - (n - 1) / d) which is approximated by ... (according to [citation]), which is 10^-12.


/* utility macros */
#define BITS_TO_BYTES(x) (((x) + 7) >> 3)
#define BYTES_TO_BITS(x) (x << 3)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the compiler can do the optimization for / 8. How about x * NBBY and (x + NBBY - 1) / NBBY.

#define DSL_CRYPTO_KEY_IV "DSL_CRYPTO_IV"
#define DSL_CRYPTO_KEY_MAC "DSL_CRYPTO_MAC"
#define DSL_CRYPTO_KEY_MASTER_BUF "DSL_CRYPTO_MASTER"
#define DSL_CRYPTO_KEY_HMAC_KEY_BUF "DSL_CRYPTO_HMAC_KEY"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason for the macro names and values to not be consistent, at least DSL_CRYPTO_KEY_XXX -> "DSL_CRYPTO_XXX" (this applies to the last 2 which differ in the _BUF suffix)

extern zio_crypt_info_t zio_crypt_table[ZIO_CRYPT_FUNCTIONS];

/* ZAP entry keys for DSL Encryption Keys stored on disk */
#define DSL_CRYPTO_KEY_CRYPT "DSL_CRYPTO_CRYPT"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"crypto crypt" is not the most descriptive name. Maybe CRYPTO_FUNC?

@@ -208,7 +209,7 @@ typedef struct zio_cksum_salt {
* G gang block indicator
* B byteorder (endianness)
* D dedup
* X encryption (on version 30, which is not supported)
* X encryption
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since I think the in-memory representations of the IV and salt (and MAC?) are treated as byte arrays, be sure to explain how those are encoded into the 64-bit words (e.g. low 8 bits of the word are the low byte of the array, etc). As you probably know, we can't simply treat the byte array as a uint64_t*, because of endianness concerns.

* encrypted, this stage determines how the encryption metadata is stored in
* the bp. Decryption is performed during ZIO_STAGE_READ_BP_INIT as a transform
* callback. Encryption is also mutually exclusive with nopwrite, because
* encrypted blocks with the same plaintext will not have matching ciphertexts.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about because blocks with the same plaintext will be encrypted with different salts and therefore different IV's (if dedup is off), and therefore have different ciphertexts.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will fix

zfeature_register(SPA_FEATURE_ENCRYPTION,
"com.datto:encryption", "encryption",
"Support for dataset level encryption",
0, encryption_deps);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The encryption feature is is deactivated when it's no longer used, but it isn't PER_DATASET? I'll look for where this is implemented... it may be simpler to use the PER_DATASET flag here.

Copy link
Contributor Author

@tcaputi tcaputi Feb 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the comments around this flag it looks like this might be the case. I'm not familiar with this flag but it looks like all I need to do is set ds->ds_feature_activation_needed[feature] = B_TRUE when the dataset starts using the feature. Is this correct? Currently, the implementation increments the feature count when it creates a key object in the spa and decrements it when that object is destroyed.

ZIO_CRYPT_DEFAULT, PROP_ONETIME, ZFS_TYPE_DATASET,
"on | off | aes-128-ccm | aes-192-ccm | aes-256-ccm | "
"aes-128-gcm | aes-192-gcm | aes-256-gcm", "ENCRYPTION",
crypto_table);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess you could argue that this property is inherited, but from the prop management infrastructure, it's ONETIME, so it should probably be registered in the "set once index properties" section

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will fix

boolean_t
zfs_prop_valid_keylocation(const char *str)
{
if (strlen(str) == 6 && strncmp("prompt", str, 6) == 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is equivalent to strcmp("prompt", str) == 0, but more brittle

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will fix

{
if (strlen(str) == 6 && strncmp("prompt", str, 6) == 0)
return (B_TRUE);
else if (strlen(str) > 8 && strncmp("file:///", str, 8) == 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than having the literal 8's, how about something like:

#define FILE_URI_PREFIX "file:///"
#define FILE_URI_PREFIX_LEN strlen(FILE_URI_PREFIX)

Also I'm not sure how valuable the "len>8" check is, since file:/// is just as wrong as file:///etc, but we aren't checking for that here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is something I was trying to think of ways to improve or possibly remove. Realistically, zfs_prop_valid_keylocation() is just a sanity check. The create, clone, and change-key code all will actually need to load the key from the given keylocation before they can even attempt to talk to the kernel. This is all done in userspace, however, so the kernel can't really be sure if the keylocation it receives is valid. As a result, I added this function as a quick smoke test for the kernel. Can you think of a better way to check this?

@@ -1527,13 +1536,14 @@ dmu_recv_begin_sync(void *arg, dmu_tx_t *tx)
const char *tofs = drba->drba_cookie->drc_tofs;
dsl_dataset_t *ds, *newds;
uint64_t dsobj;
int flags = DS_HOLD_FLAG_DECRYPT;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe name this dsflags for consistency with dmu_recv_begin_check()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only called it dsflags there because flags was already being used. Would you like me to change this everywhere?

@@ -2454,14 +2466,16 @@ receive_free(struct receive_writer_arg *rwa, struct drr_free *drrf)
static void
dmu_recv_cleanup_ds(dmu_recv_cookie_t *drc)
{
int flags = DS_HOLD_FLAG_DECRYPT;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dsflags?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see comment above

*/
dsl_dataset_disown(drc->drc_ds, dmu_recv_tag);
(void) spa_keystore_remove_mapping(dmu_tx_pool(tx)->dp_spa,
drc->drc_ds, drc->drc_ds);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is so that zfs receive unloads the key?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So that it releases the key mapping hold it created in dmu_recv_begin(). Loading and unloading is for wrapping keys, technically.

@@ -196,8 +197,11 @@ traverse_prefetch_metadata(traverse_data_t *td,
if (BP_GET_LEVEL(bp) == 0 && BP_GET_TYPE(bp) != DMU_OT_DNODE)
return;

if ((td->td_flags & TRAVERSE_NO_DECRYPT) && BP_IS_ENCRYPTED(bp))
zio_flags |= ZIO_FLAG_RAW;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be worth a comment somewhere that this is all about dnode blocks (since traversal doesn't read any other encrypted blocks).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will fix

int32_t i;
int32_t epb = BP_GET_LSIZE(bp) >> DNODE_SHIFT;
dnode_phys_t *child_dnp;

/*
* dnode blocks might have their bonus buffers encrypted, so
* we must be careful to honor TRAVERSE_NO_DECRYPT
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we assert that other cases here are not encrypted (and therefore it's OK to ignore NO_DECRYPT)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure. will fix.

@tcaputi
Copy link
Contributor Author

tcaputi commented Feb 9, 2017

Thank you very much to everyone who has looked at and commented on this PR. As some of you may have noticed, this page is getting a bit too big for github to handle effectively so I will be moving it to a new PR in a few minutes. I will link it here when it has been made.

@tcaputi tcaputi closed this Feb 9, 2017
@tcaputi tcaputi mentioned this pull request Feb 9, 2017
@tcaputi
Copy link
Contributor Author

tcaputi commented Feb 9, 2017

The new PR has been opened at #5769. Please leave any future comments and reviews there

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: Feature Feature request or new feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet