Skip to content

Commit

Permalink
account for ashift when gathering buffers to be written to l2arc device
Browse files Browse the repository at this point in the history
The problem is that since OpenSolaris commit
illumos/illumos-gate@e14bb32
l2ad_hand is kept aligned based on ashift (which is derived from
the cache device's logical and physical block sizes).
So, the hand could be advanced by more than the sum of b_asize-s
of the written L2ARC buffers. This is because b_asize is a misnomer
at the moment as it does not always represent the allocated size:
if a buffer is compressed, then the compressed size is properly rounded,
but if the compression fails or it is not applied, then the original
size is kept and it could be smaller than what ashift requires.

For the same reasons arcstat_l2_asize and the reported used space
on the cache device could be smaller than the actual allocated size
if ashift > 9. That problem is not fixed by this change.

This change only ensures that l2ad_hand is not advanced by more
than target_sz. Otherwise we would overwrite active (unevicted)
L2ARC buffers. That problem is manifested as growing l2_cksum_bad
and l2_io_error counters.

This change also changes 'p' prefix to 'a' prefix in a few places
where variables represent allocated rather than physical size.

The resolved problem may also result in the reported allocated size
being greater than the cache device's capacity, because of the
overwritten buffers (more than one buffer claiming the same disk
space).

PR:	198242
PR:	195746 (possibly related)

Porting notes:

Rather difficult to track changes related to:

Illumos 5369 - arc flags should be an enum
and
Illumos 5408 - managing ZFS cache devices requires lots of RAM

hdr->b_l2hdr = l2hdr;
changed to
hdr->b_flags |= ARC_FLAG_HAS_L2HDR;

list_insert_head(dev->l2ad_buflist, hdr);
changed to
list_insert_head(&dev->l2ad_buflist, hdr);

References:
https://reviews.freebsd.org/D2764
openzfs#3400
openzfs#3433
openzfs#3451

Ported by: kernelOfTruth <kerneloftruth@gmail.com>
  • Loading branch information
avg-I authored and kernelOfTruth committed Jun 13, 2015
1 parent 06358ea commit a5041e1
Showing 1 changed file with 35 additions and 12 deletions.
47 changes: 35 additions & 12 deletions module/zfs/arc.c
Original file line number Diff line number Diff line change
Expand Up @@ -5723,8 +5723,7 @@ l2arc_write_buffers(spa_t *spa, l2arc_dev_t *dev, uint64_t target_sz,
boolean_t *headroom_boost)
{
arc_buf_hdr_t *hdr, *hdr_prev, *head;
uint64_t write_asize, write_psize, write_sz, headroom,
buf_compress_minsz;
uint64_t write_asize, write_sz, headroom, buf_compress_minsz;
void *buf_data;
boolean_t full;
l2arc_write_callback_t *cb;
Expand All @@ -5739,7 +5738,7 @@ l2arc_write_buffers(spa_t *spa, l2arc_dev_t *dev, uint64_t target_sz,
*headroom_boost = B_FALSE;

pio = NULL;
write_sz = write_asize = write_psize = 0;
write_sz = write_asize = 0;
full = B_FALSE;
head = kmem_cache_alloc(hdr_l2only_cache, KM_PUSHPAGE);
head->b_flags |= ARC_FLAG_L2_WRITE_HEAD;
Expand Down Expand Up @@ -5776,6 +5775,7 @@ l2arc_write_buffers(spa_t *spa, l2arc_dev_t *dev, uint64_t target_sz,
for (; hdr; hdr = hdr_prev) {
kmutex_t *hash_lock;
uint64_t buf_sz;
uint64_t buf_a_sz;

if (arc_warm == B_FALSE)
hdr_prev = multilist_sublist_next(mls, hdr);
Expand Down Expand Up @@ -5804,7 +5804,15 @@ l2arc_write_buffers(spa_t *spa, l2arc_dev_t *dev, uint64_t target_sz,
continue;
}

if ((write_sz + hdr->b_size) > target_sz) {
/*
* Assume that the buffer is not going to be compressed
* and could take more space on disk because of a larger
* disk block size.
*/
buf_sz = hdr->b_size;
buf_a_sz = vdev_psize_to_asize(dev->l2ad_vdev, buf_sz);

if ((write_asize + buf_a_sz) > target_sz) {
full = B_TRUE;
mutex_exit(hash_lock);
break;
Expand Down Expand Up @@ -5847,7 +5855,6 @@ l2arc_write_buffers(spa_t *spa, l2arc_dev_t *dev, uint64_t target_sz,
hdr->b_l2hdr.b_hits = 0;
hdr->b_l1hdr.b_tmp_cdata = hdr->b_l1hdr.b_buf->b_data;

buf_sz = hdr->b_size;
hdr->b_flags |= ARC_FLAG_HAS_L2HDR;

mutex_enter(&dev->l2ad_mtx);
Expand All @@ -5864,6 +5871,7 @@ l2arc_write_buffers(spa_t *spa, l2arc_dev_t *dev, uint64_t target_sz,
mutex_exit(hash_lock);

write_sz += buf_sz;
write_asize += buf_a_sz;
}

multilist_sublist_unlock(mls);
Expand All @@ -5882,6 +5890,20 @@ l2arc_write_buffers(spa_t *spa, l2arc_dev_t *dev, uint64_t target_sz,

mutex_enter(&dev->l2ad_mtx);

/*
* Note that elsewhere in this file arcstat_l2_asize
* and the used space on l2ad_vdev are updated using b_asize,
* which is not necessarily rounded up to the device block size.
* Too keep accounting consistent we do the same here as well:
* stats_size accumulates the sum of b_asize of the written buffers,
* while write_asize accumulates the sum of b_asize rounded up
* to the device block size.
* The latter sum is used only to validate the corectness of the code.
*/
uint64_t stats_size;
stats_size = 0;
write_asize = 0;

/*
* Now start writing the buffers. We're starting at the write head
* and work backwards, retracing the course of the buffer selector
Expand Down Expand Up @@ -5928,7 +5950,7 @@ l2arc_write_buffers(spa_t *spa, l2arc_dev_t *dev, uint64_t target_sz,

/* Compression may have squashed the buffer to zero length. */
if (buf_sz != 0) {
uint64_t buf_p_sz;
uint64_t buf_a_sz;

wzio = zio_write_phys(pio, dev->l2ad_vdev,
dev->l2ad_hand, buf_sz, buf_data, ZIO_CHECKSUM_OFF,
Expand All @@ -5939,13 +5961,14 @@ l2arc_write_buffers(spa_t *spa, l2arc_dev_t *dev, uint64_t target_sz,
zio_t *, wzio);
(void) zio_nowait(wzio);

write_asize += buf_sz;
stats_size += buf_sz;

/*
* Keep the clock hand suitably device-aligned.
*/
buf_p_sz = vdev_psize_to_asize(dev->l2ad_vdev, buf_sz);
write_psize += buf_p_sz;
dev->l2ad_hand += buf_p_sz;
buf_a_sz = vdev_psize_to_asize(dev->l2ad_vdev, buf_sz);
write_asize += buf_a_sz;
dev->l2ad_hand += buf_a_sz;
}
}

Expand All @@ -5955,8 +5978,8 @@ l2arc_write_buffers(spa_t *spa, l2arc_dev_t *dev, uint64_t target_sz,
ARCSTAT_BUMP(arcstat_l2_writes_sent);
ARCSTAT_INCR(arcstat_l2_write_bytes, write_asize);
ARCSTAT_INCR(arcstat_l2_size, write_sz);
ARCSTAT_INCR(arcstat_l2_asize, write_asize);
vdev_space_update(dev->l2ad_vdev, write_asize, 0, 0);
ARCSTAT_INCR(arcstat_l2_asize, stats_size);
vdev_space_update(dev->l2ad_vdev, stats_size, 0, 0);

/*
* Bump device hand to the device start if it is approaching the end.
Expand Down

0 comments on commit a5041e1

Please sign in to comment.