RFC: ZIL refactoring, ZIL Kinds, and ZIL-PMEM #12731

Open
wants to merge 58 commits into
base: master
58 commits
705db79
DO NOT UPSTREAM FIX PROPERLY INSTEAD: scripts/zimport.sh hard-coded p…
problame Dec 11, 2020
1bbf7b5
DO NOT UPSTREAM devvm script
problame Aug 26, 2021
d6005ac
dsl_scan: make zil_scan_arg_t::zsa_zh a const pointer
problame Aug 30, 2021
54f74f2
fixup "ztest: propagate -o to the zdb child process"
problame May 27, 2021
5d5ef1f
vdev_disk_ops: comment to remind about vdev_file_ops for libspl builds
problame Aug 26, 2021
f4afa49
kmod init: move zfs_dbgmsg_init to an earlier point in the procedure
problame Jan 21, 2021
6d519e8
linux: comment error code convention during module initialization
problame Jul 13, 2021
2ae6f86
zil: comment on P2ROUNDUP(lr_length) for WR_NEED_COPY records
problame Jan 4, 2021
36344cc
zil: zil_alloc_lwb: make bp parameter const
problame Oct 13, 2020
ffd2f59
zio_rewrite: make bookmark parameter const
problame Oct 13, 2020
732d95d
set -Werror={incompatible-pointer-types,implicit-function-declaration}
problame Oct 13, 2020
f80525b
zil: comment on how we set the lwb's zil_chain_t::zc_next_blk checksu…
problame Oct 13, 2020
18fd4e4
move zil calls from zvol.c to zvol_log.c, rendering zvol.c free from…
problame Oct 16, 2020
63d5e76
document objset_t::os_zil_header
problame Sep 25, 2020
08ec9c1
zfs_get_data: separate helper functions for WR_{INDIRECT,NEED_COPY}
problame Jan 4, 2021
75e4992
zvol_get_data: separate helper functions for WR_{INDIRECT,NEED_COPY}
problame Mar 1, 2021
241e36a
[1/5] zil: factor out physical ZIL traversal into zil_parse_phys
problame Oct 13, 2020
09d8559
[2/5] dmu_traverse: use zil_parse_phys & only traverse ZIL on explici…
problame Oct 13, 2020
8c5d348
[3/5] dsl_scan: adopt zil_parse_phys API
problame Oct 14, 2020
0d5b9b9
[4/5] zdb: switch to using zil_parse_phys
problame Jan 6, 2021
e38d9fa
[5/5] zil: make zil_parse private to zil.c
problame Jan 6, 2021
a190087
make zilog_t* opaque for dsl_{pool,scan}.h
problame Oct 16, 2020
c9c2c99
[ZIL Kinds] move the bulk of LWB-specific code into zil_lwb.c
problame Aug 29, 2021
8a2cd3b
[ZIL Kinds] move comment on on-disk format into zil_lwb.c
problame Nov 7, 2021
cf0f5da
[ZIL Kinds] move lwb-specific types, constants, and zilog_t into zil_…
problame Oct 3, 2021
4a81ed2
[ZIL Kinds] rename zil_* to zillwb_* and leave stubs in place for pub…
problame Jan 5, 2021
901893f
[ZIL Kinds] rename zil_parse_phys to zillwb_parse_phys & remove stub …
problame Aug 30, 2021
02a542f
[ZIL Kinds] rename zil_chain_t => zillwb_chain_t
problame Jan 6, 2021
cd172b5
[ZIL Kinds] rename struct zilog{,_lwb} & typedef struct zilog_lwb zil…
problame Jan 6, 2021
e90bcbc
make header consumable by bindgen
problame Nov 7, 2021
0615bbf
[ZIL Kinds] add & use ZL_{OS,SPA,POOL,HDR} macros to access parent st…
problame Jan 6, 2021
480d3ca
[ZIL Kinds] rename and wrap zil_header_{,lwb_}t in a new struct zil_h…
problame Jan 6, 2021
3ca5c7c
[ZIL Kinds] rename various ZIL_* macros
problame Jan 6, 2021
94d43ae
add a mechanism for aliasing module parameters
problame Jan 7, 2021
19d14a1
[ZIL Kinds] rename LWB-specific module parameters from zil_* to zfs_z…
problame Jan 7, 2021
a0d3db4
[ZIL Kinds] rename LWB-specific tracepoints & dtrace probes
problame Aug 30, 2021
aa8b684
[ZIL Kinds] rename LWB-specific kstats
problame Aug 30, 2021
7665ff5
[ZIL Kinds] rename kmem_cache_t instances
problame Aug 30, 2021
fd6be20
[ZIL Kinds] rename zil_block_buckets
problame Aug 30, 2021
2261936
[ZIL Kinds] zvol: adjust ZIL usage to fit the state machine that will…
problame Mar 1, 2021
bb6c01c
[ZIL Kinds] zfs_create: always call zil_replaying, to fit the state m…
problame Aug 26, 2021
56a40fc
[ZIL Kinds] implement ZIL kinds
problame Oct 1, 2021
b7939ab
[ZIL-PMEM][independent] spl-kmem: function for allocating aligned memory
problame Jan 20, 2021
30c0787
[ZIL-PMEM][independent] spl: semaphore and spinlock wrappers
problame Jan 20, 2021
b354bf6
[ZIL-PMEM][independent] libspl: add ability to configure a custom abo…
problame Aug 26, 2021
7a865c2
[ZIL-PMEM] zil: make it optional for ZIL kinds to support WR_INDIRECT
problame Aug 28, 2021
15ea926
[ZIL-PMEM] initial implementation of ZIL-PMEM
problame Aug 26, 2021
a4807d5
[1/3] do not cast zio_cksum_t* to fletcher_4_ctx_t* for scalar impls
problame Feb 24, 2021
64b6830
[2/3] introduce zfs_kfpu_ctx_t, zfs_kfpu_{enter,exit} APIs
problame Feb 24, 2021
c51b42b
[3/3] fletcher: add fletcher_4_native_kfpu_ctx
problame Feb 24, 2021
093d251
[ZIL-PMEM] use zfs_kfpu_ctx_t API to reduce number of XSAVEs
problame Aug 28, 2021
f6d3271
[1/3] zfs_log: refactor zfs_log_write
problame Aug 26, 2021
7ee25db
[2/3] dmu: add dmu_write_uioandcc_dbuf() and dmu_write_uioandcc_dnode()
problame Aug 26, 2021
ba722b2
[3/3] zfs_log / zvol_log: avoid dmu_read in {zfs,zvol}_log_write
problame Feb 25, 2021
7571a93
[ZIL-PMEM][independent] add zfs_percpu_counter_stat (GPL only)
problame Aug 27, 2021
7e27706
[ZIL-PMEM] add static stats for the thesis evaluation (uses zfs_percp…
problame Aug 27, 2021
4eea7cf
[ZIL-PMEM] implement ITXG bypass mode for ZVOLs
problame Aug 27, 2021
9460f7d
[ZIL-PMEM] dtrace probe / tracepoint to track delta of task birth & s…
problame Aug 27, 2021
24 changes: 12 additions & 12 deletions cmd/arc_summary/arc_summary3
@@ -66,7 +66,7 @@ SECTION_PATHS = {'arc': 'arcstats',
'l2arc': 'arcstats', # L2ARC stuff lives in arcstats
'vdev': 'vdev_cache_stats',
'zfetch': 'zfetchstats',
'zil': 'zil'}
'zil_lwb': 'zil_lwb'}

parser = argparse.ArgumentParser(description=DESCRIPTION)
parser.add_argument('-a', '--alternate', action='store_true', default=False,
@@ -898,24 +898,24 @@ def section_vdev(kstats_dict):
print()


def section_zil(kstats_dict):
def section_zil_lwb(kstats_dict):
"""Collect information on the ZFS Intent Log. Some of the information
taken from https://github.com/openzfs/zfs/blob/master/include/sys/zil.h
taken from https://github.com/openzfs/zfs/blob/master/include/sys/zil_lwb.h
"""

zil_stats = isolate_section('zil', kstats_dict)
zil_stats = isolate_section('zil_lwb', kstats_dict)

prt_1('ZIL committed transactions:',
f_hits(zil_stats['zil_itx_count']))
prt_i1('Commit requests:', f_hits(zil_stats['zil_commit_count']))
f_hits(zil_stats['zil_lwb_itx_count']))
prt_i1('Commit requests:', f_hits(zil_stats['zil_lwb_commit_count']))
prt_i1('Flushes to stable storage:',
f_hits(zil_stats['zil_commit_writer_count']))
f_hits(zil_stats['zil_lwb_commit_writer_count']))
prt_i2('Transactions to SLOG storage pool:',
f_bytes(zil_stats['zil_itx_metaslab_slog_bytes']),
f_hits(zil_stats['zil_itx_metaslab_slog_count']))
f_bytes(zil_stats['zil_lwb_itx_metaslab_slog_bytes']),
f_hits(zil_stats['zil_lwb_itx_metaslab_slog_count']))
prt_i2('Transactions to non-SLOG storage pool:',
f_bytes(zil_stats['zil_itx_metaslab_normal_bytes']),
f_hits(zil_stats['zil_itx_metaslab_normal_count']))
f_bytes(zil_stats['zil_lwb_itx_metaslab_normal_bytes']),
f_hits(zil_stats['zil_lwb_itx_metaslab_normal_count']))
print()


@@ -926,7 +926,7 @@ section_calls = {'arc': section_arc,
'spl': section_spl,
'tunables': section_tunables,
'vdev': section_vdev,
'zil': section_zil}
'zil_lwb': section_zil_lwb}


def main():
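
For scripts outside the tree that read these kstats directly, the rename above changes both the kstat name (`zil` to `zil_lwb`) and the counter prefix (`zil_` to `zil_lwb_`). A minimal sketch of a lookup that tolerates both namings follows. It assumes the kstats have already been parsed into plain nested dicts, which is not how `arc_summary3` stores them; `zil_counter` and the sample values are hypothetical and not part of this PR.

```python
def zil_counter(kstats, suffix):
    """Return a ZIL counter by suffix (e.g. 'itx_count').

    Tries the renamed 'zil_lwb' kstat first, then the pre-rename 'zil'
    kstat. Assumption: kstats maps kstat name -> {counter name: value}.
    """
    for section, prefix in (('zil_lwb', 'zil_lwb_'), ('zil', 'zil_')):
        stats = kstats.get(section)
        if stats is not None and prefix + suffix in stats:
            return stats[prefix + suffix]
    raise KeyError(suffix)


# Hypothetical sample data, one dict per naming scheme.
new_style = {'zil_lwb': {'zil_lwb_itx_count': 7, 'zil_lwb_commit_count': 3}}
old_style = {'zil': {'zil_itx_count': 5, 'zil_commit_count': 2}}

print(zil_counter(new_style, 'itx_count'))
print(zil_counter(old_style, 'commit_count'))
```

Checking the new name first keeps the helper correct on a module that exposes both kstats during a transition, if such a configuration exists.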
153 changes: 135 additions & 18 deletions cmd/zdb/zdb.c
@@ -58,7 +58,7 @@
#include <sys/dsl_pool.h>
#include <sys/dsl_bookmark.h>
#include <sys/dbuf.h>
#include <sys/zil.h>
#include <sys/zil_lwb.h>
#include <sys/zil_impl.h>
#include <sys/stat.h>
#include <sys/resource.h>
@@ -3812,8 +3812,12 @@ dump_objset(objset_t *os)
return;
}

if (dump_opt['i'] != 0 || verbosity >= 2)
dump_intent_log(dmu_objset_zil(os));
if (dump_opt['i'] != 0 || verbosity >= 2) {
EQUIV((os->os_dsl_dataset != NULL), (os->os_zil != NULL));
zilog_t *zilog = dmu_objset_zil(os);
if (zilog != NULL)
dump_intent_log(zilog);
}

if (dmu_objset_ds(os) != NULL) {
dsl_dataset_t *ds = dmu_objset_ds(os);
@@ -5221,17 +5225,14 @@ dump_size_histograms(zdb_cb_t *zcb)
}

static void
zdb_count_block(zdb_cb_t *zcb, zilog_t *zilog, const blkptr_t *bp,
zdb_count_block(zdb_cb_t *zcb, const blkptr_t *bp,
dmu_object_type_t type)
{
uint64_t refcnt = 0;
int i;

ASSERT(type < ZDB_OT_TOTAL);

if (zilog && zil_bp_tree_add(zilog, bp) != 0)
return;

spa_config_enter(zcb->zcb_spa, SCL_CONFIG, FTAG, RW_READER);

for (i = 0; i < 4; i++) {
@@ -5404,8 +5405,8 @@ zdb_blkptr_done(zio_t *zio)
}

static int
zdb_blkptr_cb(spa_t *spa, zilog_t *zilog, const blkptr_t *bp,
const zbookmark_phys_t *zb, const dnode_phys_t *dnp, void *arg)
zdb_blkptr_cb(spa_t *spa, const blkptr_t *bp, const zbookmark_phys_t *zb,
const dnode_phys_t *dnp, void *arg)
{
zdb_cb_t *zcb = arg;
dmu_object_type_t type;
@@ -5431,8 +5432,7 @@ zdb_blkptr_cb(spa_t *spa, zilog_t *zilog, const blkptr_t *bp,

type = BP_GET_TYPE(bp);

zdb_count_block(zcb, zilog, bp,
(type & DMU_OT_NEWTYPE) ? ZDB_OT_OTHER : type);
zdb_count_block(zcb, bp, (type & DMU_OT_NEWTYPE) ? ZDB_OT_OTHER : type);

is_metadata = (BP_GET_LEVEL(bp) != 0 || DMU_OT_IS_METADATA(type));

@@ -5698,7 +5698,7 @@ zdb_ddt_leak_init(spa_t *spa, zdb_cb_t *zcb)
ddt_bp_create(ddb.ddb_checksum,
&dde.dde_key, ddp, &blk);
if (p == DDT_PHYS_DITTO) {
zdb_count_block(zcb, NULL, &blk, ZDB_OT_DITTO);
zdb_count_block(zcb, &blk, ZDB_OT_DITTO);
} else {
zcb->zcb_dedup_asize +=
BP_GET_ASIZE(&blk) * (ddp->ddp_refcnt - 1);
@@ -6241,7 +6241,7 @@ count_block_cb(void *arg, const blkptr_t *bp, dmu_tx_t *tx)
(void) printf("[%s] %s\n",
"deferred free", blkbuf);
}
zdb_count_block(zcb, NULL, bp, ZDB_OT_DEFERRED);
zdb_count_block(zcb, bp, ZDB_OT_DEFERRED);
return (0);
}

@@ -6339,6 +6339,112 @@ deleted_livelists_dump_mos(spa_t *spa)
iterate_deleted_livelists(spa, dump_livelist_cb, NULL);
}

typedef struct {
zdb_cb_t *zcb;
uint64_t claim_txg;
uint64_t objset;
spa_t *spa;
} dump_block_stats_arg_t;

static int
dump_block_stats_zillwb_cb_block(const blkptr_t *bp, void *arg)
{
dump_block_stats_arg_t *dbsa = arg;
zbookmark_phys_t zb;
uint64_t claim_txg = dbsa->claim_txg;

if (BP_IS_HOLE(bp))
return (0);

if (claim_txg == 0 && bp->blk_birth >= spa_min_claim_txg(dbsa->spa))
return (-1);

SET_BOOKMARK(&zb, dbsa->objset, ZB_ZIL_OBJECT, ZB_ZIL_LEVEL,
bp->blk_cksum.zc_word[ZILLWB_ZC_SEQ]);

(void) zdb_blkptr_cb(dbsa->spa, bp, &zb, NULL, dbsa->zcb);

return (0);
}

static int
dump_block_stats_zillwb_cb_record(const lr_t *lrc, void *arg)
{
dump_block_stats_arg_t *dbsa = arg;
uint64_t claim_txg = dbsa->claim_txg;

if (lrc->lrc_txtype == TX_WRITE) {
lr_write_t *lr = (lr_write_t *)lrc;
blkptr_t *bp = &lr->lr_blkptr;
zbookmark_phys_t zb;

if (BP_IS_HOLE(bp))
return (0);

if (claim_txg == 0 || bp->blk_birth < claim_txg)
return (0);

SET_BOOKMARK(&zb, dbsa->objset, lr->lr_foid,
ZB_ZIL_LEVEL, lr->lr_offset / BP_GET_LSIZE(bp));

(void) zdb_blkptr_cb(dbsa->spa, bp, &zb, NULL, dbsa->zcb);
}

return (0);
}

static int
dump_block_stats_zillwb(spa_t *spa, uint64_t objset,
const zil_header_lwb_t *zh, void *arg)
{
zdb_cb_t *zcb = arg;
uint64_t claim_txg = zh->zh_claim_txg;

/*
* We only want to visit blocks that have been claimed but not yet
* replayed; plus blocks that are already stable in read-only mode.
*/
if (claim_txg == 0 && spa_writeable(spa))
return (0);

dump_block_stats_arg_t dbsa = {
.zcb = zcb,
.claim_txg = claim_txg,
.objset = objset,
.spa = spa,
};

zillwb_parse_phys(spa, zh, dump_block_stats_zillwb_cb_block,
dump_block_stats_zillwb_cb_record, &dbsa, B_FALSE,
ZIO_PRIORITY_SCRUB, NULL);

return (0);
}

static int
dump_block_stats_zil_header_cb(spa_t *spa, uint64_t objset,
const zil_header_t *zh, void *arg)
{
zh_kind_t kind = 0;
void const *zhk = NULL;
size_t zhk_size = 0;
VERIFY0(zil_kind_specific_data_from_header(spa, zh, &zhk, &zhk_size, NULL, &kind));
switch (kind) {
case ZIL_KIND_LWB:
VERIFY3S(zhk_size, ==, sizeof (zil_header_lwb_t));
return dump_block_stats_zillwb(spa, objset, zhk, arg);
case ZIL_KIND_PMEM:
VERIFY3S(zhk_size, ==, sizeof (zil_header_pmem_t));
return (0); /* no block pointers */
case ZIL_KIND_UNINIT:
/* fallthrough */
case ZIL_KIND_COUNT:
panic("unreachable: zil_kind=%s",
zil_kind_to_str(kind, NULL));
}
panic("unreachable");
}

static int
dump_block_stats(spa_t *spa)
{
@@ -6403,8 +6509,17 @@ dump_block_stats(spa_t *spa)
zcb.zcb_totalasize += metaslab_class_get_alloc(spa_dedup_class(spa));
zcb.zcb_totalasize +=
metaslab_class_get_alloc(spa_embedded_log_class(spa));
VERIFY0(metaslab_class_get_alloc(spa_exempt_class(spa)));
zcb.zcb_start = zcb.zcb_lastprint = gethrtime();
err = traverse_pool(spa, 0, flags, zdb_blkptr_cb, &zcb);

err = traverse_pool_no_zil(spa, 0, flags, zdb_blkptr_cb, &zcb);
/*
* TODO
* traversal errors have been ignored before we split traverse_pool
* into traverse_pool_{no_zil,zil_headers}. Maintain that behavior.
*/
(void) traverse_pool_zil_headers(spa, 0, flags,
dump_block_stats_zil_header_cb, &zcb);

/*
* If we've traversed the data blocks then we need to wait for those
@@ -6454,6 +6569,7 @@ dump_block_stats(spa_t *spa)
metaslab_class_get_alloc(spa_special_class(spa)) +
metaslab_class_get_alloc(spa_dedup_class(spa)) +
get_unflushed_alloc_space(spa);
VERIFY0(metaslab_class_get_alloc(spa_exempt_class(spa)));
total_found = tzb->zb_asize - zcb.zcb_dedup_asize +
zcb.zcb_removing_size + zcb.zcb_checkpoint_size;

@@ -6691,8 +6807,8 @@ typedef struct zdb_ddt_entry {

/* ARGSUSED */
static int
zdb_ddt_add_cb(spa_t *spa, zilog_t *zilog, const blkptr_t *bp,
const zbookmark_phys_t *zb, const dnode_phys_t *dnp, void *arg)
zdb_ddt_add_cb(spa_t *spa, const blkptr_t *bp, const zbookmark_phys_t *zb,
const dnode_phys_t *dnp, void *arg)
{
avl_tree_t *t = arg;
avl_index_t where;
@@ -6748,8 +6864,9 @@ dump_simulated_ddt(spa_t *spa)

spa_config_enter(spa, SCL_CONFIG, FTAG, RW_READER);

(void) traverse_pool(spa, 0, TRAVERSE_PRE | TRAVERSE_PREFETCH_METADATA |
TRAVERSE_NO_DECRYPT, zdb_ddt_add_cb, &t);
(void) traverse_pool_no_zil(spa, 0,
TRAVERSE_PRE | TRAVERSE_PREFETCH_METADATA | TRAVERSE_NO_DECRYPT,
zdb_ddt_add_cb, &t);

spa_config_exit(spa, SCL_CONFIG, FTAG);
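
The claim-txg filtering that this diff moves into `dump_block_stats_zillwb` and its two callbacks is easier to follow when written out as pure decision functions. The sketch below is a Python model of the conditions in the diff, not code from the PR. The function names and the `Action` enum are hypothetical, and reading the `-1` return of the block callback as "stop the walk" is an assumption about how `zillwb_parse_phys` treats non-zero callback results.

```python
from enum import Enum


class Action(Enum):
    SKIP = 'skip'      # callback returns 0 without counting the block
    COUNT = 'count'    # block is handed to zdb_blkptr_cb
    STOP = 'stop'      # callback returns -1 (assumed to end the walk)


def visit_log(claim_txg, pool_writeable):
    """Unclaimed logs are only walked when the pool is read-only."""
    return not (claim_txg == 0 and pool_writeable)


def log_block_action(is_hole, birth_txg, claim_txg, min_claim_txg):
    """Models dump_block_stats_zillwb_cb_block."""
    if is_hole:
        return Action.SKIP
    if claim_txg == 0 and birth_txg >= min_claim_txg:
        return Action.STOP
    return Action.COUNT


def write_record_action(is_hole, birth_txg, claim_txg):
    """Models the TX_WRITE branch of dump_block_stats_zillwb_cb_record."""
    if is_hole:
        return Action.SKIP
    if claim_txg == 0 or birth_txg < claim_txg:
        return Action.SKIP
    return Action.COUNT


print(visit_log(claim_txg=0, pool_writeable=True))
print(log_block_action(False, birth_txg=5, claim_txg=0, min_claim_txg=4))
print(write_record_action(False, birth_txg=12, claim_txg=10))
```

Note the asymmetry the model makes visible: with `claim_txg == 0`, log blocks may still be counted (when born before the minimum claim txg), while `TX_WRITE` block pointers are never counted.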
