Skip to content
This repository was archived by the owner on Nov 7, 2019. It is now read-only.

8727 Native data and metadata encryption for zfs #489

Closed
wants to merge 2 commits into from

Conversation

lundman
Copy link

@lundman lundman commented Nov 1, 2017

Re-opening the ZOL crypto commit for final reviews and testing. Original upstream commit is openzfs/zfs#5769

This will replace, and close, PR #124

This commit contains all additional PRs, fixes and corrections from ZOL, in one commit.

The only difference to upstream, is userland ztest do not test crypto for IllumOS as the crypto framework is not compiled for userland. A decision should perhaps be made here to leave it as it is, or create and compile a library to link ztest with, or compile directly into ztest.

The ZOL message follows:

This change incorporates three major pieces:

The first change is a keystore that manages wrapping
and encryption keys for encrypted datasets. These
commands mostly involve manipulating the new
DSL Crypto Key ZAP Objects that live in the MOS. Each
encrypted dataset has its own DSL Crypto Key that is
protected with a user's key. This level of indirection
allows users to change their keys without re-encrypting
their entire datasets. The change implements the new
subcommands "zfs load-key", "zfs unload-key" and
"zfs change-key" which allow the user to manage their
encryption keys and settings. In addition, several new
flags and properties have been added to allow dataset
creation and to make mounting and unmounting more
convenient.

The second piece of this patch provides the ability to
encrypt, decyrpt, and authenticate protected datasets.
Each object set maintains a Merkel tree of Message
Authentication Codes that protect the lower layers,
similarly to how checksums are maintained. This part
impacts the zio layer, which handles the actual
encryption and generation of MACs, as well as the ARC
and DMU, which need to be able to handle encrypted
buffers and protected data.

The last addition is the ability to do raw, encrypted
sends and receives. The idea here is to send raw
encrypted and compressed data and receive it exactly
as is on a backup system. This means that the dataset
on the receiving system is protected using the same
user key that is in use on the sending side. By doing
so, datasets can be efficiently backed up to an
untrusted system without fear of data being
compromised.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Tom Caputi <tcaputi@datto.com>

Send / Recv Fixes following b52563

This patch fixes several issues discovered after
the encryption patch was merged:

Fixed a bug where encrypted datasets could attempt
to receive embedded data records.

Fixed a bug where dirty records created by the recv
code wasn't properly setting the dr_raw flag.

Fixed a typo where a dmu_tx_commit() was changed to
dmu_tx_abort()

Fixed a few error handling bugs unrelated to the
encryption patch in dmu_recv_stream()

Signed-off-by: Tom Caputi <tcaputi@datto.com>

Encryption patch follow-up

  • HKDF implementation moved to its own file and tests added to ensure
    correctness.

  • Ztest can now create and test encrypted datasets. This is currently
    disabled until issue ZOL #6526 is resolved, but otherwise functions as
    advertised.

  • Several small bug fixes discovered after enabling ztest to run on
    encrypted datasets.

  • Fixed coverity defects added by the encryption patch.

  • Updated man pages for encrypted send / receive behavior.

  • Fixed a bug where encrypted datasets could receive
    DRR_WRITE_EMBEDDED records.

  • Minor code cleanups / consolidation.

  • Includes fix in dmu_free_long_object_impl

IllumOS considerations:

Disable crypto tests in ztest
Unless permission is given to compile the crypto framework in userland
the crypto tests in ztest are disabled on IllumOS.

@ahrens
Copy link
Member

ahrens commented Nov 1, 2017

This is great! Can anyone share additional testing that they've done? Even just banging on it with manual testing would help build confidence. I know the Linux community did a bunch of pre-integration testing and shook out several issues with the PR.

@rmustacc
Copy link

rmustacc commented Nov 1, 2017

@ahrens I know that @arekinath has been doing a bunch of testing of this and should be able to provide some additional details about what he's seen and done.

@lundman lundman force-pushed the illumos-8727 branch 3 times, most recently from bbdfac9 to c174b02 Compare November 9, 2017 08:24
@lundman
Copy link
Author

lundman commented Nov 10, 2017

openzfs/zfs#6845

@behlendorf
Copy link

openzfs/zfs#6848

@lundman
Copy link
Author

lundman commented Nov 10, 2017

@behlendorf Thanks, now included.

@lundman
Copy link
Author

lundman commented Jan 11, 2018

This now includes the ZOL openzfs/zfs#6864 PR as well, so we can test zfs send --raw fixes.

@dariusc93
Copy link

Very surprise to even seen this PR. Hopefully it will get fixed up, tested, and merged.

@lundman lundman force-pushed the illumos-8727 branch 2 times, most recently from 5809edd to 22f72c9 Compare January 12, 2018 23:46
kithrup added a commit to truenas/os that referenced this pull request Jan 25, 2018
(openzfs/openzfs#489).

This does not build, but it compiles as reasonably could be
expected for the moment.  crypto decisions need to be made.
@GernotS
Copy link

GernotS commented Feb 5, 2018

I get a reproducible panic using this patch on smartos.
Steps to reproduce:

zpool create -f test c1t0d0
zfs create -o encryption=aes-256-ccm -o keyformat=passphrase -o compression=off test/ccm
dd if=/dev/zero of=/test/ccm/disk

wait untit it dumps, will need a few GB of data written
I can provide the dump if needed.

panic[cpu1]/thread=ffffff000fb16c40:
BAD TRAP: type=e (#pf Page fault) rp=ffffff000fb16970 addr=2e occurred in module "zfs" due to a NULL pointer dereference

sched:
#pf Page fault
Bad kernel fault at addr=0x2e
pid=0, pc=0xfffffffff7d49475, sp=0xffffff000fb16a60, eflags=0x10246
cr0: 8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 6f8<xmme,fxsr,pge,mce,pae,pse,de>
cr2: 2e
cr3: 1dc00000
cr8: c

    rdi: ffffff03d3e09c40 rsi:            6e800 rdx:                1
    rcx:               18  r8: ffffff03d3fa42e8  r9: ffffff03df7e461c
    rax:                0 rbx: ffffff03d3e09c40 rbp: ffffff000fb16a70
    r10: fffffffffb857b60 r11:                0 r12: ffffffffc001b020
    r13:                0 r14: ffffff03d3fa3000 r15:                0
    fsb:                0 gsb: ffffff03c9b89080  ds:               4b
     es:               4b  fs:                0  gs:              1c3
    trp:                e err:                0 rip: fffffffff7d49475
     cs:               30 rfl:            10246 rsp: ffffff000fb16a60
     ss:               38

ffffff000fb16850 unix:real_mode_stop_cpu_stage2_end+b30c ()
ffffff000fb16960 unix:trap+e08 ()
ffffff000fb16970 unix:_cmntrap+e6 ()
ffffff000fb16a70 zfs:arc_buf_size+15 ()
ffffff000fb16ab0 zfs:arc_evictable_space_increment+a8 ()
ffffff000fb16b00 zfs:remove_reference+65 ()
ffffff000fb16b40 zfs:arc_buf_destroy+77 ()
ffffff000fb16ba0 zfs:dbuf_destroy+38 ()
ffffff000fb16bc0 zfs:dbuf_evict_one+ae ()
ffffff000fb16c20 zfs:dbuf_evict_thread+16f ()
ffffff000fb16c30 unix:thread_start+8 ()

@lundman
Copy link
Author

lundman commented Feb 6, 2018

I was unable to replicate:

write: No space left on device
8245278720 bytes transferred in 1880.011589 secs (4385760 bytes/sec)

But that could simply be that I don't get dbuf eviction triggered.

However I did find some differences between the two versions, perhaps it will help.

@GernotS
Copy link

GernotS commented Feb 6, 2018

This change did not cure my issue yet.
I managed to write 15GB using dd until it crashed again.

@GernotS
Copy link

GernotS commented Feb 7, 2018

I´m sorry to report the latest commit didnt fix my problem yet, but it indeed cured a long open issue with compressed send/receive, see also https://www.illumos.org/issues/7252
This is now the first time I was able to use zfs send -c sucessfully for more than just a few GB of data.
Thanks for that Jorgen!

@lundman
Copy link
Author

lundman commented Feb 7, 2018

Neat, I didn't mention the commit today, since the ZOL tickets just talked about send|recv :)

Still unable to reproduce dnode_destroy panic, but the tester here seems to die though

@lundman
Copy link
Author

lundman commented Mar 16, 2019

OK there it is - have a look at it again. It is not awaiting any commits currently.

@lundman lundman force-pushed the illumos-8727 branch 4 times, most recently from 1f60bc0 to 8941fe3 Compare April 6, 2019 00:23
@Formalizer
Copy link

Hey guys :) I'd love to give it a try. Please let me know when it is a good time for testing!

@andy-js
Copy link

andy-js commented Apr 10, 2019

libzfs now has a bunch of extra dependencies but usr/src/lib/Makefile wasn't updated.

@@ -77,4 +77,7 @@ typeset -a properties=(
"feature@obsolete_counts"
"feature@zpool_checkpoint"
"feature@spacemap_v2"
"feature@large_dnode"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicate

@lundman
Copy link
Author

lundman commented Apr 16, 2019

OK, the latest thing we were waiting on is in, and squashed. It is time to test everyone.

@GernotS
Copy link

GernotS commented Apr 24, 2019 via email

@gdamore
Copy link

gdamore commented May 3, 2019

We can confirm GernotS's crash is quite reproducible. It is significant that we don't have:

openzfs/zfs@2c33b91

I think vdev_top_lookup() is failing for some reason, and perhaps the assumption is bad here. A test to disable the code to calculate the DDT stats seemed to prevent the crash, but of course the reported values for available space, etc. were all garbage as a result.

@gdamore
Copy link

gdamore commented May 3, 2019

Note also that it doesn't require ZFS send to trigger this, just writing to a dedup'ed dataset with crypto enabled will crash.

@trisk
Copy link

trisk commented May 3, 2019

ddt_stat_generate needs to skip the third dva in the each encrypted bp, which is actually the salt value in the encrypted blkptr_t layout. Or omit encrypted bps entirely.

@GernotS
Copy link

GernotS commented May 20, 2019

All my issues have been resolved now. Where do we stand in terms of integration?

thanks
regards
Gernot

@lundman
Copy link
Author

lundman commented May 23, 2019

Another complicated merge, master had prefetch in it. Hopefully I got it correctly.

@GernotS
Copy link

GernotS commented Jun 6, 2019

we are missing openzfs/zfs#8857

@lundman
Copy link
Author

lundman commented Jun 6, 2019

Yep, ready to add it when ZoL deems it OK

@GernotS
Copy link

GernotS commented Jun 7, 2019

small nit:
checking fs/zfs.h
fs/zfs.h: 1100: indent by spaces instead of tabs
*** Error code 1
dmake: Warning: Command failed for target `fs/zfs.check'

@GernotS
Copy link

GernotS commented Jun 10, 2019

May I ask for another rebase against master?

thanks a lot

@GernotS
Copy link

GernotS commented Jun 11, 2019

Doing a scrub with these bits creates permanent Errors in some snapshots of encrypted zvols and filesystems:
errors: Permanent errors have been detected in the following files:

    zones/crypto@2019-05-05_07.17.00--1y:<0x0>

zones/crypto is encrypted but empty, I can provide the snapshot but I guess this should be reproducible anyway.

@GernotS
Copy link

GernotS commented Jun 13, 2019

Scrub errors have been resolved, thanks!

Tom Caputi added 2 commits June 20, 2019 10:56
    This change incorporates three major pieces:

    The first change is a keystore that manages wrapping
    and encryption keys for encrypted datasets. These
    commands mostly involve manipulating the new
    DSL Crypto Key ZAP Objects that live in the MOS. Each
    encrypted dataset has its own DSL Crypto Key that is
    protected with a user's key. This level of indirection
    allows users to change their keys without re-encrypting
    their entire datasets. The change implements the new
    subcommands "zfs load-key", "zfs unload-key" and
    "zfs change-key" which allow the user to manage their
    encryption keys and settings. In addition, several new
    flags and properties have been added to allow dataset
    creation and to make mounting and unmounting more
    convenient.

    The second piece of this patch provides the ability to
    encrypt, decyrpt, and authenticate protected datasets.
    Each object set maintains a Merkel tree of Message
    Authentication Codes that protect the lower layers,
    similarly to how checksums are maintained. This part
    impacts the zio layer, which handles the actual
    encryption and generation of MACs, as well as the ARC
    and DMU, which need to be able to handle encrypted
    buffers and protected data.

    The last addition is the ability to do raw, encrypted
    sends and receives. The idea here is to send raw
    encrypted and compressed data and receive it exactly
    as is on a backup system. This means that the dataset
    on the receiving system is protected using the same
    user key that is in use on the sending side. By doing
    so, datasets can be efficiently backed up to an
    untrusted system without fear of data being
    compromised.

    Reviewed by: Matthew Ahrens <mahrens@delphix.com>
    Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Reviewed-by: Jorgen Lundman <lundman@lundman.net>
    Signed-off-by: Tom Caputi <tcaputi@datto.com>

Send / Recv Fixes following b52563

    This patch fixes several issues discovered after
    the encryption patch was merged:

    Fixed a bug where encrypted datasets could attempt
    to receive embedded data records.

    Fixed a bug where dirty records created by the recv
    code wasn't properly setting the dr_raw flag.

    Fixed a typo where a dmu_tx_commit() was changed to
    dmu_tx_abort()

    Fixed a few error handling bugs unrelated to the
    encryption patch in dmu_recv_stream()

    Signed-off-by: Tom Caputi <tcaputi@datto.com>

Encryption patch follow-up

* HKDF implementation moved to its own file and tests added to ensure
correctness.

* Ztest can now create and test encrypted datasets. This is currently
disabled until issue ZOL #6526 is resolved, but otherwise functions as
advertised.

* Several small bug fixes discovered after enabling ztest to run on
encrypted datasets.

* Fixed coverity defects added by the encryption patch.

* Updated man pages for encrypted send / receive behavior.

* Fixed a bug where encrypted datasets could receive
  DRR_WRITE_EMBEDDED records.

* Minor code cleanups / consolidation.

Disable crypto tests in ztest

* Includes fix in dmu_free_long_object_impl

Unless permission is given to compile the crypto framework in userland
the crypto tests in ztest are disabled on IllumOS.

Fix encryption root hierarchy issue

After doing a recursive raw receive, zfs userspace performs
a final pass to adjust the encryption root hierarchy as
needed. Unfortunately, the FORCE_INHERIT ioctl had a bug
which caused the encryption root to always be assigned to
the direct parent instead of the inheriting parent. This
patch simply fixes this issue

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alek Pinchuk <apinchuk@datto.com>
Signed-off-by: Tom Caputi <tcaputi@datto.com>

Closes ZOL # 6847
Closes ZOL # 6848

Encryption Stability and On-Disk Format Fixes

The on-disk format for encrypted datasets protects not only
the encrypted and authenticated blocks themselves, but also
the order and interpretation of these blocks. In order to
make this work while maintaining the ability to do raw
sends, the indirect bps maintain a secure checksum of all
the MACs in the block below it along with a few other
fields that determine how the data is interpreted.

Unfortunately, the current on-disk format erroneously
includes some fields which are not portable and thus cannot
support raw sends. It is not possible to easily work around
this issue due to a separate and much smaller bug which
causes indirect blocks for encrypted dnodes to not be
compressed, which conflicts with the previous bug. In
addition, the current code generates incompatible on-disk
formats on big endian and little endian systems due to an
issue with how block pointers are authenticated. Finally,
raw send streams do not currently include dn_maxblkid when
sending both the metadnode and normal dnodes which are
needed in order to ensure that we are correctly maintaining
the portable objset MAC.

This patch zero's out the offending fields when computing
the bp MAC and ensures that these MACs are always
calculated in little endian order (regardless of the host
system's byte order). This patch also registers an errata
for the old on-disk format, which we detect by adding a
"version" field to newly created DSL Crypto Keys. We allow
datasets without a version (version 0) to only be mounted
for read so that they can easily be migrated. We also now
include dn_maxblkid in raw send streams to ensure the MAC
can be maintained correctly.

Fixes ZOL # 6845

Signed-off-by: Tom Caputi <tcaputi@datto.com>

Fix for # 6916

When performing zil_claim() at pool import time, it is
important that encrypted datasets set os_next_write_raw
before writing to the zil_header_t. This prevents the code
from attempting to re-authenticate the objset_phys_t when
it writes it out, which is unnecessary because the
zil_header_t is not protected by either objset MAC and
impossible since the keys aren't loaded yet. Unfortunately,
one of the code paths did not set this flag, which causes
failed ASSERTs during 'zpool import -F'. This patch corrects
this issue.

Signed-off-by: Tom Caputi <tcaputi@datto.com>

Change os->os_next_write_raw to work per txg

Currently, os_next_write_raw is a single boolean used for determining
whether or not the next call to dmu_objset_sync() should write out
the objset_phys_t as a raw buffer. Since the boolean is not associated
with a txg, the work simply happens during the next txg, which is not
necessarily the correct one. In the current implementation this issue
was misdiagnosed, resulting in a small hack in dmu_objset_sync() which
seemed to resolve the problem.

This patch changes os_next_write_raw to be an array of booleans, one
for each txg in TXG_OFF and removes the hack.

Signed-off-by: Tom Caputi <tcaputi@datto.com>

Raw sends must be able to decrease nlevels

Currently, when a raw zfs send file includes a DRR_OBJECT record
that would decrease the number of levels of an existing object,
the object is reallocated with dmu_object_reclaim() which
creates the new dnode using the old object's nlevels. For non-raw
sends this doesn't really matter, but raw sends require that
nlevels on the receive side match that of the send side so that
the checksum-of-MAC tree can be properly maintained. This patch
corrects the issue by freeing the object completely before
allocating it again in this case.

This patch also corrects several issues with dnode_hold_impl()
and related functions that prevented dnodes (particularly
multi-slot dnodes) from being reallocated properly due to
the fact that existing dnodes were not being fully cleaned up
when they were freed.

Signed-off-by: Tom Caputi <tcaputi@datto.com>

Handle compressed buffers in __dbuf_hold_impl()

In __dbuf_hold_impl(), if a buffer is currently syncing and is still
referenced from db_data, a copy is made in case it is dirtied again in
the txg.  Previously, the buffer for the copy was simply allocated with
arc_alloc_buf() which doesn't handle compressed or encrypted buffers
(which are a special case of a compressed buffer).  The result was
typically an invalid memory access because the newly-allocated buffer
was of the uncompressed size.

This commit fixes the problem by handling the 2 compressed cases,
encrypted and unencrypted, respectively, with arc_alloc_raw_buf() and
arc_alloc_compressed_buf().

Although using the proper allocation functions fixes the invalid memory
access by allocating a buffer of the compressed size, another unrelated
issue made it impossible to properly detect compressed buffers in the
first place.  The header's compression flag was set to ZIO_COMPRESS_OFF
in arc_write() when it was possible that an attached buffer was actually
compressed.  This commit adds logic to only set ZIO_COMPRESS_OFF in
the non-ZIO_RAW case which wil handle both cases of compressed buffers
(encrypted or unencrypted).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>

Raw DRR_OBJECT records must write raw data

b1d2173 made it possible for empty metadnode blocks to be
compressed to a hole, fixing a bug that would cause invalid
metadnode MACs when a send stream attempted to free objects
and allowing the blocks to be reclaimed when they were no
longer needed. However, this patch also introduced a race
condition; if a txg sync occurred after a DRR_OBJECT_RANGE
record was received but before any objects were added, the
metadnode block would be compressed to a hole and lose all
of its encryption parameters. This would cause subsequent
DRR_OBJECT records to fail when they attempted to write
their data into an unencrypted block. This patch defers the
DRR_OBJECT_RANGE handling to receive_object() so that the
encryption parameters are set with each object that is
written into that block.

Reviewed-by: Kash Pande <kash@tripleback.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>

Fix bounds check in zio_crypt_do_objset_hmacs

The current bounds check in zio_crypt_do_objset_hmacs() does not
properly handle the possible sizes of the objset_phys_t and
can therefore read outside the buffer's memory. If that memory
happened to match what the check was actually looking for, the
objset would fail to be owned, complaining that the MAC was
invalid.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>

Raw receive should change key atomically

Currently, raw zfs sends transfer the encrypted master keys and
objset_phys_t encryption parameters in the DRR_BEGIN payload of
each send file. Both of these are processed as soon as they are
read in dmu_recv_stream(), meaning that the new keys are set
before the new snapshot is received. In addition to the fact that
this changes the user's keys for the dataset earlier than they
might expect, the keys were never reset to what they originally
were in the event that the receive failed. This patch splits the
processing into objset handling and key handling, the later of
which is moved to dmu_recv_end() so that they key change can be
done atomically.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>

Raw receives must compress metadnode blocks

Currently, the DMU relies on ZIO layer compression to free LO
dnode blocks that no longer have objects in them. However,
raw receives disable all compression, meaning that these blocks
can never be freed. In addition to the obvious space concerns,
this could also cause incremental raw receives to fail to mount
since the MAC of a hole is different from that of a completely
zeroed block.

This patch corrects this issue by adding a special case in
zio_write_compress() which will attempt to compress these blocks
to a hole even if ZIO_FLAG_RAW_ENCRYPT is set. This patch also
removes the zfs_mdcomp_disable tunable, since tuning it could
cause these same issues.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>

Add omitted set for os->os_next_write_raw

This one line patch adds adds a set to os->os_next_write_raw
that was omitted when the code was updated in 1b66810. Without
it, the code (in some instances) could attempt to write raw
encrypted data as regular unencrypted data without the keys
being loaded, triggering an ASSERT in zio_encrypt().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>

Prevent raw zfs recv -F if dataset is unencrypted

The current design of ZFS encryption only allows a dataset to
have one DSL Crypto Key at a time. As a result, it is important
that the zfs receive code ensures that only one key can be in use
at a time for a given DSL Directory. zfs receive -F complicates
this, since the new dataset is received as a clone of the existing
one so that an atomic switch can be done at the end. To prevent
confusion about which dataset is actually encrypted a check was
added to ensure that encrypted datasets cannot use zfs recv -F to
completely replace existing datasets. Unfortunately, the check did
not take into account unencrypted datasets being overriden by
encrypted ones as a case.

Along the same lines, the code also failed to ensure that raw
recieves could not be done on top of existing unencrypted
datasets, which causes amny problems since the new stream cannot
be decrypted.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>

fix zero-length encryption

dont swallow error!

Encrypted dnode blocks should be prefetched raw

Encrypted dnode blocks are always initially read as raw data and
converted to decrypted data when an encrypted bonus buffer is
needed. This allows the DMU to be used for things like fetching
the DMU master node without requiring keys to be loaded. However,
dbuf_issue_final_prefetch() does not currently read the data as
raw. The end result of this is that prefetched dnode blocks are
read twice from disk: once decrypted and then again as raw data.
This patch corrects the issue by adding the flag when appropriate.

Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>

Decryption error handling improvements

Currently, the decryption and block authentication code in
the ZIO / ARC layers is a bit inconsistent with regards to
the ereports that are produces and the error codes that are
passed to calling functions. This patch ensures that all of
these errors (which begin as ECKSUM) are converted to EIO
before they leave the ZIO or ARC layer and that ereports
are correctly generated on each decryption / authentication
failure.

In addition, this patch fixes a bug in zio_decrypt() where
ECKSUM never gets written to zio->io_error.

Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>

Remove ASSERT() in l2arc_apply_transforms()

The ASSERT was erroneously copied from the next section of code.
The buffer's size should be expanded from "psize" to "asize"
if necessary.

Reviewed-by: Tom Caputi <tcaputi@datto.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>

fail fraction

fixes for SNPRINTF_BLKPTR with encrypted BP's

mdb doesn't have dmu_ot[], so we need a different mechanism for its
SNPRINTF_BLKPTR() to determine if the BP is encrypted vs authenticated.

Additionally, since it already relies on BP_IS_ENCRYPTED (etc),
SNPRINTF_BLKPTR might as well figure out the "crypt_type" on its own,
rather than making the caller do so.

remove assert for testing

Make encrypted "zfs mount -a" failures consistent

Currently, "zfs mount -a" will print a warning and fail to mount
any encrypted datasets that do not have a key loaded. This patch
makes the behavior of this failure consistent with other failure
modes ("zfs mount -a" will silently continue, explict "zfs mount"
will print a message and return an error code.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>

Correct manpage for --raw

Move enum zio_encrypt into sys/fs/zfs.h

assertion failure in arc_release() during encrypted receive

receive_spill does not byte swap spill contents

In zfs receive, the function receive_spill should account
for spill block endian conversion as a defensive measure.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Signed-off-by: Paul Zuchowski <pzuchowski@datto.com>

add codes to truss

Correct swapped keylocation error messages

This patch corrects a small issue where two error messages
in the code that checks for invalid keylocations were
swapped.

Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>

Fix issues with raw sends of spill blocks

This patch fixes 2 issues in how spill blocks are processed during
raw sends. The first problem is that compressed spill blocks were
using the logical length rather than the physical length to
determine how much data to dump into the send stream. The second
issue is a typo that caused the spill record's object number to be
used where the objset's ID number was required. Both issues have
been corrected, and the payload_size is now printed in zstreamdump
for future debugging.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>

Fix race in dnode_check_slots_free()

Currently, dnode_check_slots_free() works by checking dn->dn_type
in the dnode to determine if the dnode is reclaimable. However,
there is a small window of time between dnode_free_sync() in the
first call to dsl_dataset_sync() and when the useraccounting code
is run when the type is set DMU_OT_NONE, but the dnode is not yet
evictable, leading to crashes. This patch adds the ability for
dnodes to track which txg they were last dirtied in and adds a
check for this before performing the reclaim.

This patch also corrects several instances when dn_dirty_link was
treated as a list_node_t when it is technically a multilist_node_t.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>

Fix issues found with zfs diff

Two deadlocks / ASSERT failures were introduced in a2c2ed1 which
would occur whenever arc_buf_fill() failed to decrypt a block of
data. This occurred because the call to arc_buf_destroy() which
was responsible for cleaning up the newly created buffer would
attempt to take out the hdr lock that it was already holding. This
was resolved by calling the underlying functions directly without
retaking the lock.

In addition, the dmu_diff() code did not properly ensure that keys
were loaded and mapped before begining dataset traversal. It turns
out that this code does not need to look at any encrypted values,
so the code was altered to perform raw IO only.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>

DMU objset should not be encrypted

Correct merge collision in dmu_ot table.

Correct minor differences in arc.c

Add support for decryption faults in zinject

This patch adds the ability for zinject to trigger decryption
and authentication faults in the ZIO and ARC layers. This
functionality is exposed via the new "decrypt" error type, which
may be provided for "data" object types.

This patch also refactors some of the core encryption / decryption
functions so that they have consistent prototypes, handle errors
consistently, and do not have unused arguments.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>

Raw receive functions must not decrypt data

This patch fixes a small bug found where receive_spill() sometimes
attempted to decrypt spill blocks when doing a raw receive. In
addition, this patch fixes another small issue in arc_buf_fill()'s
error handling where a decryption failure (which could be caused by
the first bug) would attempt to set the arc header's IO_ERROR flag
without holding the header's lock.

Signed-off-by: Tom Caputi <tcaputi@datto.com>

Update the correct abd in l2arc_read_done()

This patch fixes an issue where l2arc_read_done() would always
write data to b_pabd, even if raw encrypted data was requested.
This only occured in cases where the L2ARC device had a different
ashift than the main pool.

Signed-off-by: Tom Caputi <tcaputi@datto.com>

[PATCH] Make zvol update volsize operation synchronous.

There is a race condition when new transaction group is added
to dp->dp_dirty_datasets list by the zap_update in the
zvol_update_volsize.
Meanwhile, before these dirty data are synchronized, the receive process
can cause that dmu_recv_end_sync is executed. Then finally dirty data
are going to be synchronized but the synchronization ends with the NULL
pointer dereference error.

Signed-off-by: ab-oe <arkadiusz.bubala@open-e.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

[PATCH] Raw receive fix and encrypted objset security fix

This patch fixes two problems with the encryption code. First, the
current code does not correctly prohibit the DMU from updating
dn_maxblkid during object truncation within a raw receive. This
usually only causes issues when the truncating DRR_FREE record is
aggregated with DRR_FREE records later in the receive, so it is
relatively hard to hit.

Second, this patch fixes a security issue where reading blocks
within an encrypted object did not guarantee that the dnode block
itself had ever been verified against its MAC. Usually the
verification happened anyway when the bonus buffer was read, but
some use cases (notably zvols) might never perform the check.

Signed-off-by: Tom Caputi <tcaputi@datto.com>

Raw receive fix and encrypted objset security fix

This patch fixes two problems with the encryption code. First, the
current code does not correctly prohibit the DMU from updating
dn_maxblkid during object truncation within a raw receive. This
usually only causes issues when the truncating DRR_FREE record is
aggregated with DRR_FREE records later in the receive, so it is
relatively hard to hit.

Second, this patch fixes a security issue where reading blocks
within an encrypted object did not guarantee that the dnode block
itself had ever been verified against its MAC. Usually the
verification happened anyway when the bonus buffer was read, but
some use cases (notably zvols) might never perform the check.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Tom Caputi <tcaputi@datto.com>

Add ASSERT to debug encryption key mapping issues

This patch simply adds an ASSERT that confirms that the last
decrypting reference on a dataset waits until the dataset is
no longer dirty. This should help to debug issues where the
ZIO layer cannot find encryption keys after a dataset has been
disowned.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>

Fix coverity defects: CID 176037

CID 176037: Uninitialized scalar variable

This patch fixes an uninitialized variable defect caught by
coverity and introduced in 6983060

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>

ZFS send fails to dump objects larger than 128PiB

When dumping objects larger than 128PiB it's possible for do_dump() to
miscalculate the FREE_RECORD offset due to an integer overflow
condition: this prevents the receiving end from correctly restoring
the dumped object.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>

Fix hash_lock / keystore.sk_dk_lock lock inversion

The keystore.sk_dk_lock should not be held while performing I/O.
Drop the lock when reading from disk and update the code so
they the first successful caller adds the key.

Improve error handling in spa_keystore_create_mapping_impl().

Reviewed by: Thomas Caputi <tcaputi@datto.com>
Reviewed-by: RageLtMan <rageltman@sempervictus>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

Separate the error code for already unloaded key

Connected to the future commit: Adopt pyzfs from ClusterHQ
from ZOL (85ce3f4).

Refactor arc_hdr_realloc_crypt()

The arc_hdr_realloc_crypt() function is responsible for converting
a "full" arc header to an extended "crypt" header and visa versa.
This code was originally written with a bcopy() so that any new
members added to arc headers would automatically be included
without requiring a code change. However, in practice this (along
with small differences in kmem_cache implementations between
various platforms) has caused a number of hard-to-find problems in
ports to other operating systems. This patch solves this problem
by making all member copies explicit and adding ASSERTs for fields
that cannot be set during the transfer. It also manually resets the
old header after the reallocation is finished so it can be properly
reallocated and reused.

Signed-off-by: Tom Caputi <tcaputi@datto.com>

Do not call  dmu_objset_disown twice

In the error case. Mimic master branch style.

fix error handling in arc_read

spill blocks are metadata

zfs_receive_one needs to restore keylocation prop

Refcounted DSL Crypto Key Mappings

Since native ZFS encryption was merged, we have been fighting
against a series of bugs that come down to the same problem: Key
mappings (which must be present during all I/O operations) are
created and destroyed based on dataset ownership, but I/Os can
have traditionally been allowed to "leak" into the next txg after
the dataset is disowned.

In the past we have attempted to solve this problem by trying to
ensure that datasets are disowned ater all I/O is finished by
calling txg_wait_synced(), but we have repeatedly found edge cases
that need to be squashed and code paths that might incur a high
number of txg syncs. This patch attempts to resolve this issue
differently, by adding a reference to the key mapping for each txg
it is dirtied in. By doing so, we can remove many of the
unnecessary calls to txg_wait_synced() we have added in the past
and ensure we don't need to deal with this problem in the future.

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>

Add zfs_refcount_transfer_ownership_many()

When debugging is enabled and a zfs_refcount_t contains multiple holders
using the same key, but different ref_counts, the wrong reference_t may
be transferred.  Add a zfs_refcount_transfer_ownership_many() function,
like the existing zfs_refcount_*_many() functions, to match and transfer
the correct refcount_t;

This issue may occur when using encryption with refcount debugging
enabled.  An arc_buf_hdr_t can have references for both the
hdr->b_l1hdr.b_pabd and hdr->b_crypt_hdr.b_rabd both of which use
the hdr as the reference holder.  When unsharing the buffer the
p_abd should be transferred.

This issue does not impact production builds because refcount holders
are not tracked.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

Populate salt, iv, and mac for DRR_OBJECT_RANGE + indent

Stop 0-byte KM_SLEEP allocation in dmu_send.c

Port pool errata feature from ZOL

We need to detect and pass along at least 2 erratas with crypto so it
seemed best to port the errata framework entirely.

Fix handling of maxblkid for raw sends

Currently, the receive code can create an unreadable dataset from
a correct raw send stream. This is because it is currently
impossible to set maxblkid to a lower value without freeing the
associated object. This means truncating files on the send side
to a non-0 size could result in corruption. This patch solves this
issue by adding a new 'force' flag to dnode_new_blkid() which will
allow the raw receive code to force the DMU to accept the provided
maxblkid even if it is a lower value than the existing one.

For testing purposes the send_encrypted_files.ksh test has been
extended to include a variety of truncated files and multiple
snapshots. It also now leverages the xattrtest command to help
ensure raw receives correctly handle xattrs.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Signed-off-by: Tom Caputi <tcaputi@datto.com>

Add bookmark v2 on-disk feature

This patch adds the bookmark v2 feature to the on-disk format. This
feature will be needed for the upcoming redacted sends and for an
upcoming fix that for raw receives. The feature is not currently
used by any code and thus this change is a no-op, aside from the
fact that the user can now enable the feature.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Signed-off-by: Tom Caputi <tcaputi@datto.com>

Detect and prevent mixed raw and non-raw sends

Currently, there is an issue in the raw receive code where
raw receives are allowed to happen on top of previously
non-raw received datasets. This is a problem because the
source-side dataset doesn't know about how the blocks on
the destination were encrypted. As a result, any MAC in
the objset's checksum-of-MACs tree that is a parent of both
blocks encrypted on the source and blocks encrypted by the
destination will be incorrect. This will result in
authentication errors when we decrypt the dataset.

This patch fixes this issue by adding a new check to the
raw receive code. The code now maintains an "IVset guid",
which acts as an identifier for the set of IVs used to
encrypt a given snapshot. When a snapshot is raw received,
the destination snapshot will take this value from the
DRR_BEGIN payload. Non-raw receives and normal "zfs snap"
operations will cause ZFS to generate a new IVset guid.
When a raw incremental stream is received, ZFS will check
that the "from" IVset guid in the stream matches that of
the "from" destination snapshot. If they do not match, the
code will error out the receive, preventing the problem.

This patch requires an on-disk format change to add the
IVset guids to snapshots and bookmarks. As a result, this
patch has errata handling and a tunable to help affected
users resolve the issue with as little interruption as
possible.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Signed-off-by: Tom Caputi <tcaputi@datto.com>

Better user experience for errata 4

This patch attempts to address some user concerns that have arisen
since errata 4 was introduced.

* The errata warning has been made less scary for users without
  any encrypted datasets.

* The errata warning now clears itself without a pool reimport if
  the bookmark_v2 feature is enabled and no encrypted datasets
  exist.

* It is no longer possible to create new encrypted datasets without
  enabling the bookmark_v2 feature, thus helping to ensure that the
  errata is resolved.

Signed-off-by: Tom Caputi <tcaputi@datto.com>

Use 'printf %s' instead of 'echo -n' for compatibility

The ksh 'echo -n' behavior on Illumos and Linux differs.  For
compatibility with others platforms switch to "printf '%s' ".

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allanjude@freebsd.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Signed-off-by: Igor Kozhukhov <igor@dilos.org>

Fix issues with truncated files in raw sends

This patch fixes a few issues with raw receives involving
truncated files:

* dnode_reallocate() now calls dnode_set_blksz() instead of
  dnode_setdblksz(). This ensures that any remaining dbufs with
  blkid 0 are resized along with their containing dnode upon
  reallocation.

* One of the calls to dmu_free_long_range() in receive_object()
  needs to check that the object it is about to free some contents
  of hasn't been completely removed already by a previous call to
  dmu_free_long_object() in the same function.

* The same call to dmu_free_long_range() in the previous point
  needs to ensure it uses the object's current block size and
  not the new block size. This ensures the blocks of the object
  that are supposed to be freed are completely removed and not
  simply partially zeroed out.

This patch also adds handling for DRR_OBJECT_RANGE records to
dprintf_drr() for debugging purposes.

Signed-off-by: Tom Caputi <tcaputi@datto.com>

Update raw send documentation

This patch simply clarifies some of the limitations related to
raw sends in the man page. No functional changes.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jason Cohen <jwittlincohen@gmail.com>
Signed-off-by: Tom Caputi <tcaputi@datto.com>

Minor cleanup in zio_crypt.c

Fix new sentence, new line errors and trailing spaces in zfs.1m

Revert "Fix issues with truncated files in raw sends"

This partially reverts commit 5dbf8b4.  This change resolved
the issues observed with truncated files in raw sends.  However,
the required changes to dnode_allocate() introduced a regression
for non-raw streams which needs to be understood.

The additional debugging improvements from the original patch
were not reverted.

Reviewed-by: Tom Caputi <tcaputi@datto.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue 7378
Issue 8528
Issue 8540
Issue 8565
Close 8584

Fix issues with truncated files in raw sends

When receiving a raw send stream only reallocated objects
whose contents were not freed by the standard indicators
should call dmu_free_long_range().

Furthermore, if calling dmu_free_long_range() is required
then the objects current block size must be used and not
the new block size.

Two additional test cases were added to provided realistic
test coverage for processing reallocated objects which are
part of a raw receive.

Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes 8528
Closes 8607

Fix hierarchy misspellings

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reported-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Richard Laager <rlaager@wiktel.com>
Closes 8563
Closes 8622

Incorrect maximum DVA value in DDE_GET_NDVAS()

The conditional was reversed which caused garbage values to be used when
calculating dds_ref_dsize.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Tim Chase <tim@chase2k.com>
Closes 7234

Reinstate raw receive check when truncating

This patch re-adds a check that was removed in 369aa50. The check
confirms that a raw receive is not occuring before truncating an
object's dn_maxblkid. At the time, it was believed that all cases
that would hit this code path would be handled in other places,
but that was not the case.

Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #8852
Closes #8857

Always call rw_init in zio_crypt_key_unwrap

The error path in zio_crypt_key_unwrap would call zio_crypt_key_destroy which
calls rw_destroy(&key->zk_salt_lock); which has not yet been initialized.

We move the rw_init() call to the start of zio_crypt_key_unwrap instead.

Reviewed-by: Tom Caputi <tcaputi@datto.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: David Quigley <david.quigley@intel.com>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Issue #5078
@lundman
Copy link
Author

lundman commented Jun 25, 2019

illumos@eb63303

Thank you all for the support, testing and patience. \o/

@lundman lundman closed this Jun 25, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.