[superseded by #3451][RFC, l2arc accounting & checksum issue] l2arc-write-target-size.diff , cumulative patch #3433

kernelOfTruth
Contributor

l2arc-write-target-size.diff
suggested by Andriy Gapon ( @avg-I ) on the FreeBSD mailing list as a fix

which is a cumulation of the following patches:

consistently use asize for allocated size

In the context of l2arc code we do not need to explicitly use psize at all.
Logical size is used to track actual buffer size. Allocated size is
used to track disk space and offsets. Physical (compressed) size is
not needed.

l2arc_compress_buf: zero buffer tail only if compression succeeds

The compression is considered successful if the size of compressed
data is less than the original data size after rounding up to take
into account the vdev ashift. It does not make sense to have the
data compressed if all savings are lost to alignment.

This change also prevents a potential memory corruption if the
rounded compressed size is greater than the original size as
previously we would zero out the buffer tail before checking for
the condition.

A possible corruption scenario.
Original size is 10KB, so the data is stored in a 10KB buffer.
Thus, a 10KB buffer would also be allocated for the compressed data.
Let's assume that the data gets compressed to 8KB + 1B size
(reduction by almost 20%). Let's further assume that the ashift is 12.
So the compressed size would be rounded up to 12KB. The code would
zero out the trailing part of the buffer, so it would overwrite 2KB
beyond the end of the buffer.
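
The arithmetic of this scenario can be checked with a small standalone sketch. The helper names (`round_up_ashift`, `tail_overrun`, `compression_pays_off`) are illustrative assumptions and are not taken from the ZFS sources; they only model rounding to a `1 << ashift` boundary and the "check before zeroing" ordering described above.

```c
#include <assert.h>
#include <stdint.h>

/* Round size up to the next multiple of (1 << ashift). Hypothetical helper. */
static uint64_t
round_up_ashift(uint64_t size, unsigned ashift)
{
	uint64_t align;

	assert(ashift < 64);
	align = (uint64_t)1 << ashift;
	return ((size + align - 1) & ~(align - 1));
}

/*
 * Bytes that zeroing [csize, rounded) would write past a buffer of
 * buf_len bytes; 0 means the zeroing stays in bounds.
 */
static uint64_t
tail_overrun(uint64_t buf_len, uint64_t csize, unsigned ashift)
{
	uint64_t rounded = round_up_ashift(csize, ashift);

	return (rounded > buf_len ? rounded - buf_len : 0);
}

/* Compression is only worth keeping if the rounded size is still smaller. */
static int
compression_pays_off(uint64_t len, uint64_t csize, unsigned ashift)
{
	return (round_up_ashift(csize, ashift) < len);
}
```

With the numbers from the scenario, `tail_overrun(10 * 1024, 8 * 1024 + 1, 12)` evaluates to 2048 and `compression_pays_off(10 * 1024, 8 * 1024 + 1, 12)` returns 0, so testing the condition before zeroing the tail avoids the out-of-bounds write.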

l2arc_compress_buf: remove redundant check of an unsigned value

l2arc: restore correct rounding up of asize of compressed data

This rounding up was lost in a mismerge of illumos code.
See
r268075 MFV r267565
f95fd16
illumos/illumos-gate@5d7b4d4
It was originally introduced in r256889, 1c55b38

Revert "Transform the I/O when vdev_physical_ashift is greater than"

This reverts commit c9e947a.

Physical zio must not be silently modified.

account for ashift when choosing buffers to be written to l2arc device

If we don't account for that, then we might end up overwriting disk
area of buffers that have not been evicted yet, because l2arc_evict
operates in terms of disk addresses.
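
A minimal sketch of the idea, assuming a hypothetical `pick_buffers` helper and a plain array of buffer sizes rather than the real ARC lists: the running total is built from ashift-rounded sizes, because that is how far the device hand will actually move.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round size up to a multiple of (1 << ashift). Hypothetical helper. */
static uint64_t
aligned_size(uint64_t size, unsigned ashift)
{
	uint64_t align;

	assert(ashift < 64);
	align = (uint64_t)1 << ashift;
	return ((size + align - 1) & ~(align - 1));
}

/*
 * Count how many leading buffers fit into target_sz when each one is
 * charged its aligned on-disk size. *write_asize receives the total
 * space the selected buffers will occupy on the device.
 */
static size_t
pick_buffers(const uint64_t *sizes, size_t n, unsigned ashift,
    uint64_t target_sz, uint64_t *write_asize)
{
	uint64_t total = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		uint64_t asize = aligned_size(sizes[i], ashift);

		if (total + asize > target_sz)
			break;
		total += asize;
	}
	*write_asize = total;
	return (i);
}
```

For buffers of 512, 1536, 5000 and 512 bytes with ashift 12 and a 16384-byte target, only the first three fit (4096 + 4096 + 8192). Summing the unrounded sizes would have admitted all four, while the hand would have moved 20480 bytes, past the evicted region.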

References:
freebsd/freebsd-src@master...avg-I:review/l2arc-write-target-size
http://lists.freebsd.org/pipermail/freebsd-fs/2014-October/020242.html
[needs a more extensive list]

Ported-by:
kernelOfTruth kernelOfTruth@gmail.com

Fixes (supposedly):
zfsonlinux#3114
zfsonlinux#3400
[and potentially several others]

@kernelOfTruth kernelOfTruth changed the title [test/experimental, l2arc accounting & checksum issue] l2arc-write-target-size.diff [test/experimental, l2arc accounting & checksum issue] l2arc-write-target-size.diff , cumulative patch May 20, 2015
@sempervictus
Contributor

Thanks as always @kernelOfTruth.
It builds, i can tell you that much.
Aside from my personal host, i'm having some trouble testing this as it collides with abd_next and #3216 pretty badly.
Any chance i could ask you to do a new conjoined PR consisting of abd_next, #3216, and this PR? I could then put it into the full testing rotation @ SVIT without breaking our current evals of #3115 or the pretty much required ABD stack for our workhorses and production systems.

kernelOfTruth pushed a commit to kernelOfTruth/zfs that referenced this pull request May 21, 2015
adapted to openzfs#3216,

adaption to openzfs#2129 in
@ l2arc_compress_buf(l2arc_buf_hdr_t *l2hdr)

 		/*
 		 * Compression succeeded, we'll keep the cdata around for
 		 * writing and release it afterwards.
 		 */
+		if (rounded > csize) {
+			bzero((char *)cdata + csize, rounded - csize);
+			csize = rounded;
+		}

to

		/*
		 * Compression succeeded, we'll keep the cdata around for
		 * writing and release it afterwards.
		 */
		if (rounded > csize) {
			abd_zero_off(cdata, rounded - csize, csize);
			csize = rounded;
		}

ZFSonLinux:
openzfs#3114
openzfs#3400
openzfs#3433
@kernelOfTruth
Contributor Author

@sempervictus pushed to #3216

edit:

also pushed to https://github.com/kernelOfTruth/zfs/commits/tuxoko_zfs/abd_next_19.05.2015%2B3433

only one patch should be on top compared to current abd_next:
kernelOfTruth@2fd7e15

Take the buildbot results with a grain of salt since they, according to @behlendorf , currently don't cover L2ARC testing

So it is advised to give this some preliminary non-production beating & test-runs,

perhaps you could also do a checksum check (zdb & sha256sum) of pretty large files (e.g. larger than the ARC size - if available) - at the beginning of the tests when the files are on the zpool & vdevs, then after the extensive testing - just to be sure that the files are intact.

The FreeBSD, NetBSD, FreeNAS bug-tracker & mailing-list entries talk about checksum errors on the L2ARC after all - that false data should be dropped from the transfers - but still: better safe ...

kernelOfTruth referenced this pull request in kernelOfTruth/zfs May 22, 2015
adapted to abd_next (May 19th 2015)

@sempervictus
Contributor

Running the following stack on my test host presently:

  * origin/pr/3308
  ** 3.12 compat, 4.0 compat for super_operations shrinker callbacks
  * origin/pr/3421
  ** Kernel 4.1 changes
  * origin/pr/3396
  ** Illumos 5269 - zpool import slow
  * origin/pr/2012
  ** Add option to zpool status to print guids
  * origin/pr/2557
  ** Added version command to zpool program
  * origin/pr/2668
  ** Allow for "zfs receive" to skip existing snapshots
  * origin/pr/2784
  ** Illumos #4950 files sometimes can't be removed from a full filesystem
  * origin/pr/3166
  ** Make linking with and finding libblkid required
  * origin/pr/3169
  ** Add dfree_zfs for changing how Samba reports space
  * origin/pr/3307
  ** zfsdev_getminor() should check for invalid file handles
  * remotes/kernelOfTruth/tuxoko_zfs/abd_next_19.05.2015+3433
  ** l2arc-write-target-size.diff
  * master @ 65037d9b25c2bfa98d0aa5c9e34678127c03b345

I've wrapped around L2ARC once already, and it's working properly.
I skimmed the PR, but will need to look at the sources in context - I can't glean from the changes why in the world this would work suddenly. Still pushing data for a second wraparound, should know by morning.

@sempervictus
Contributor

Looks like the L2ARC is holding up. However, ARC itself isn't doing so well:

    time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  arcsz     c  
13:32:48     0     0      0     0    0     0    0     0    0   6.1G  2.3G  
root@zfs-test00:/tmp# cat /sys/module/zfs/parameters/zfs_arc_max
4294967296
root@zfs-test00:/tmp# cat /proc/spl/kstat/zfs/arcstats | grep size
size                            4    6549124784
hdr_size                        4    69960
data_size                       4    0
meta_size                       4    4935680
other_size                      4    1342728
anon_size                       4    2115584
mru_size                        4    808960
mru_ghost_size                  4    0
mfu_size                        4    2011136
mfu_ghost_size                  4    0
l2_size                         4    153742000128
l2_asize                        4    24358981632
l2_hdr_size                     4    6542776416
duplicate_buffers_size          4    0

Looks like the L2 headers are overrunning the ARC cache limits and OOMing the VMs running from the local ZVOLs. Not ideal, but at least the L2ARCs aren't showing insane capacities anymore.

EDIT: exporting the zpool results in arc_adapt pegging a core while freeing memory for ~20s. I'm assuming this is due to the L2 headers being purged.
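
To put the figures from the arcstats dump above side by side, here is a small arithmetic sketch. The constants are copied from that output; the helper names `excess_over` and `percent_of` are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the arcstats output in this comment. */
static const uint64_t ARC_MAX     = 4294967296ULL;	/* zfs_arc_max */
static const uint64_t ARC_SIZE    = 6549124784ULL;	/* size */
static const uint64_t L2_HDR_SIZE = 6542776416ULL;	/* l2_hdr_size */

/* Bytes by which `used` exceeds `limit`, or 0 if it does not. */
static uint64_t
excess_over(uint64_t used, uint64_t limit)
{
	return (used > limit ? used - limit : 0);
}

/* Share of `part` in `whole` as a truncated integer percentage. */
static unsigned
percent_of(uint64_t part, uint64_t whole)
{
	assert(whole != 0);
	return ((unsigned)(part * 100 / whole));
}
```

`excess_over(ARC_SIZE, ARC_MAX)` gives 2254157488 bytes over the configured limit, and `percent_of(L2_HDR_SIZE, ARC_SIZE)` gives 99, which matches the observation that the L2 headers account for nearly the entire ARC footprint.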

@kernelOfTruth
Contributor Author

@sempervictus thanks a lot for your report !

Okay, so now that L2ARC is working as intended ™ (I can't see where that regression might have come from related to that patchset - will have to look again later, when preparing those patches bit-by-bit),

it's time to fix everything up - then ZFS should be ready to take off =)

That huge memory consumption was supposed to be fixed by "Illumos - 5408 managing ZFS cache devices requires lots of RAM", I'll take a look at Illumos-gate whether I can find more fixes

Alright according to #3433 (comment) that patch stack, which you ran for this test, doesn't include the above mentioned fix from Illumos 5408.

@sempervictus Could you please give the patch stack from #3216 a test ?

That supposedly would show much more promising results related to the OOMing and ARC cache tripping over limits

and gives the confirmation that this patchset from @avg-I (thanks a lot !) in all cases is needed for the huge upcoming changes (ABD + #3115 ) and milestone (0.6.5 ? 0.7.0 ?) to have optimum stability and performance.

In any case I'll prepare a pull request with broken-out commits, patches ...

Following your EDIT: I'm sure it's related - in the past when my L2ARC was full (due to a test close to 100 GiB l2arc with 16 GiB RAM) and needed to be purged there also was some big load on the CPU

Thanks !

@sempervictus
Contributor

Thanks @kernelOfTruth, this is starting to come together.
I have noticed something disturbing from the patch stack however - l2arc_adapt is consuming more CPU than any of the VMs except one (large JVM procs and MySQL atop that one). The IOWait is growing commensurately with the CPU usage by the adapt thread, with 20% on the physical host registering as 70% (!!!) in the VMs. The heavy VM with the DB and JVM procs died from timeouts :(.

I'll build out #3216, but with large block having been adopted on some hosts, i can't test without destroying their pools.

I noticed @dweeezil pushed more commits to #3115, hopefully we can get that rebased against the large block feature and abd_next to create a stable stack which can be tested across current ZFS hosts.

Thanks again, will post more results as i get 'em.

@sempervictus
Contributor

Built a basic stack off of #3216:

  * origin/pr/2557
  ** Added version command to zpool program
  * origin/pr/3169
  ** Add dfree_zfs for changing how Samba reports space
  * origin/pr/3166
  ** Make linking with and finding libblkid required
  * origin/pr/3307
  ** zfsdev_getminor() should check for invalid file handles
  * origin/pr/2668
  ** Allow for "zfs receive" to skip existing snapshots
  * origin/pr/2012
  ** Add option to zpool status to print guids
  * master @ 65037d9b25c2bfa98d0aa5c9e34678127c03b345

Results are promising:

size                            4    4306210160
hdr_size                        4    284803296
data_size                       4    2625819648
meta_size                       4    103254016
other_size                      4    111221872
anon_size                       4    327857152
mru_size                        4    41394176
mru_ghost_size                  4    4263519744
mfu_size                        4    2359822336
mfu_ghost_size                  4    21323776
l2_size                         4    52562030592
l2_asize                        4    9820920320
l2_hdr_size                     4    1181111328
duplicate_buffers_size          4    0
root@zfs-test00:~# arcstat 
    time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  arcsz     c  
00:24:11     0     0      0     0    0     0    0     0    0   4.0G  4.0G

Will take a while to wrap around the L2ARC. I'll push a deep find or something to try and force more data into the ARC. If this holds up on the bonnie VM through the night i'll see about pushing to one of our SCST hosts since a couple are running sans L2ARC right now.

@avg-I
Contributor

avg-I commented May 27, 2015

I think that the changes to l2arc_write_buffers() might not be entirely correct.
The version of the patch found in this review request is probably more correct.
Note arcstat_l2_asize and vdev_space_update deltas.

The problem is that elsewhere throughout the ARC code those statistics are updated based on b_asize from the L2 header. b_asize is not an entirely correct name because its value does not account for ashift if the buffer is not compressed.

Changing b_asize to really be allocated size (rather than physical size as it is now) is a lot of work for which I didn't have time.
Another possibility is to change all the places where arcstat_l2_asize and vdev_space_update are updated, so that b_asize is passed through vdev_psize_to_asize(). I didn't do that either.
Ultimately, I decided to keep the current meaning of those two statistics. That meaning is not correct, but it's historically acceptable.

To summarize: the current code in this pull request updates arcstat_l2_asize and vdev_space_update inconsistently, so the stats would grow over time. I am not sure what would be a good test to observe that.
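
The drift described in the summary can be modelled with a toy counter. This is only a sketch under the assumption that the write path charges the ashift-rounded size while the evict path credits the unrounded b_asize; `toy_stat`, `toy_write`, `toy_evict` and `toy_drift` are hypothetical names, not the real kstat or ARC functions.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for a counter such as l2_asize; not the real kstat. */
struct toy_stat {
	int64_t value;
};

static uint64_t
round_to_ashift(uint64_t size, unsigned ashift)
{
	uint64_t align;

	assert(ashift < 64);
	align = (uint64_t)1 << ashift;
	return ((size + align - 1) & ~(align - 1));
}

/* Write path charges the rounded size ... */
static void
toy_write(struct toy_stat *st, uint64_t b_asize, unsigned ashift)
{
	st->value += (int64_t)round_to_ashift(b_asize, ashift);
}

/* ... while the evict path credits only the unrounded b_asize. */
static void
toy_evict(struct toy_stat *st, uint64_t b_asize)
{
	st->value -= (int64_t)b_asize;
}

/* Residue left in the counter after `cycles` write/evict pairs. */
static int64_t
toy_drift(uint64_t b_asize, unsigned ashift, unsigned cycles)
{
	struct toy_stat st = { 0 };
	unsigned i;

	for (i = 0; i < cycles; i++) {
		toy_write(&st, b_asize, ashift);
		toy_evict(&st, b_asize);
	}
	return (st.value);
}
```

In this model a 512-byte uncompressed buffer on an ashift-12 device leaves 3584 bytes behind per write/evict cycle, while a buffer that is already a multiple of 4096 leaves none. A test along these lines would need many small uncompressed buffers cycling through the cache device.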

@kernelOfTruth
Contributor Author

@avg-I Thank you very much for your detailed explanation !

I've pushed a pull-request to let the build-bots crunch on it & to have the commit available for testing

Now waiting to hear back from the upstream review ...

via @avg-I :

Is it possible to make me an owner of this request?
https://reviews.csiden.org/r/112/

CC: @skiselkov , @ahrens , @buffyg

@kernelOfTruth kernelOfTruth changed the title [test/experimental, l2arc accounting & checksum issue] l2arc-write-target-size.diff , cumulative patch [superseded][RFC, l2arc accounting & checksum issue] l2arc-write-target-size.diff , cumulative patch Jun 9, 2015
@kernelOfTruth kernelOfTruth changed the title [superseded][RFC, l2arc accounting & checksum issue] l2arc-write-target-size.diff , cumulative patch [superseded by #3451][RFC, l2arc accounting & checksum issue] l2arc-write-target-size.diff , cumulative patch Jun 9, 2015
kernelOfTruth pushed a commit to kernelOfTruth/zfs that referenced this pull request Jun 12, 2015
The problem is that since OpenSolaris commit
illumos/illumos-gate@e14bb32
l2ad_hand is kept aligned based on ashift (which is derived from
the cache device's logical and physical block sizes).
So, the hand could be advanced by more than the sum of b_asize-s
of the written L2ARC buffers. This is because b_asize is a misnomer
at the moment as it does not always represent the allocated size:
if a buffer is compressed, then the compressed size is properly rounded,
but if the compression fails or it is not applied, then the original
size is kept and it could be smaller than what ashift requires.

For the same reasons arcstat_l2_asize and the reported used space
on the cache device could be smaller than the actual allocated size
if ashift > 9. That problem is not fixed by this change.

This change only ensures that l2ad_hand is not advanced by more
than target_sz. Otherwise we would overwrite active (unevicted)
L2ARC buffers. That problem is manifested as growing l2_cksum_bad
and l2_io_error counters.

This change also changes 'p' prefix to 'a' prefix in a few places
where variables represent allocated rather than physical size.

The resolved problem may also result in the reported allocated size
being greater than the cache device's capacity, because of the
overwritten buffers (more than one buffer claiming the same disk
space).

PR:	198242
PR:	195746 (possibly related)

Porting notes:

Rather difficult to track changes related to:

Illumos 5369 - arc flags should be an enum
and
Illumos 5408 - managing ZFS cache devices requires lots of RAM

hdr->b_l2hdr = l2hdr;
changed to
hdr->b_flags |= ARC_FLAG_HAS_L2HDR;

list_insert_head(dev->l2ad_buflist, hdr);
changed to
list_insert_head(&dev->l2ad_buflist, hdr);

References:
https://reviews.freebsd.org/D2764
openzfs#3400
openzfs#3433
openzfs#3451

Ported by: kernelOfTruth <kerneloftruth@gmail.com>
@kernelOfTruth
Contributor Author

superseded by

#3451
#3491

kernelOfTruth pushed a commit to kernelOfTruth/zfs that referenced this pull request Jun 13, 2015
kernelOfTruth pushed a commit to kernelOfTruth/zfs that referenced this pull request Jun 13, 2015
kernelOfTruth pushed a commit to kernelOfTruth/zfs that referenced this pull request Jun 14, 2015
The problem is that since OpenSolaris commit
illumos/illumos-gate@e14bb32
l2ad_hand is kept aligned based on ashift (which is derived from
the cache device's logical and physical block sizes).
So, the hand could be advanced by more than the sum of b_asize-s
of the written L2ARC buffers. This is because b_asize is a misnomer
at the moment as it does not always represent the allocated size:
if a buffer is compressed, then the compressed size is properly rounded,
but if the compression fails or it is not applied, then the original
size is kept and it could be smaller than what ashift requires.

For the same reasons arcstat_l2_asize and the reported used space
on the cache device could be smaller than the actual allocated size
if ashift > 9. That problem is not fixed by this change.

This change only ensures that l2ad_hand is not advanced by more
than target_sz. Otherwise we would overwrite active (unevicted)
L2ARC buffers. That problem is manifested as growing l2_cksum_bad
and l2_io_error counters.

This change also changes 'p' prefix to 'a' prefix in a few places
where variables represent allocated rather than physical size.

The resolved problem may also result in the reported allocated size
being greater than the cache device's capacity, because of the
overwritten buffers (more than one buffer claiming the same disk
space).

PR:	198242
PR:	195746 (possibly related)

Porting notes:

Rather difficult to track changes related to:

Illumos 5369 - arc flags should be an enum
and
Illumos 5408 - managing ZFS cache devices requires lots of RAM

hdr->b_l2hdr = l2hdr;
changed to
hdr->b_flags |= ARC_FLAG_HAS_L2HDR;

list_insert_head(dev->l2ad_buflist, hdr);
changed to
list_insert_head(&dev->l2ad_buflist, hdr);

Account for the error message:
error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
uint64_t stats_size = 0;
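
For readers unfamiliar with this diagnostic, here is a generic illustration, not the actual `arc.c` hunk: under `-Werror=declaration-after-statement`, a declaration such as `uint64_t stats_size = 0;` has to move to the top of its block. The function `sum_sizes` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Rejected form (declaration follows a statement):
 *
 *	assert(sizes != NULL);
 *	uint64_t stats_size = 0;
 *
 * Accepted form: every declaration precedes the first statement.
 */
static uint64_t
sum_sizes(const uint64_t *sizes, size_t n)
{
	uint64_t stats_size = 0;	/* declared first */
	size_t i;

	assert(sizes != NULL);
	for (i = 0; i < n; i++)
		stats_size += sizes[i];
	return (stats_size);
}
```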

References:
https://reviews.freebsd.org/D2764
openzfs#3400
openzfs#3433
openzfs#3451

Ported by: kernelOfTruth <kerneloftruth@gmail.com>
kernelOfTruth pushed a commit to kernelOfTruth/zfs that referenced this pull request Jun 14, 2015
kernelOfTruth pushed a commit to kernelOfTruth/zfs that referenced this pull request Jun 24, 2015
behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Jun 24, 2015
If we don't account for that, then we might end up overwriting disk
area of buffers that have not been evicted yet, because l2arc_evict
operates in terms of disk addresses.

The discrepancy between the write size calculation and the actual
increment to l2ad_hand was introduced in commit 3a17a7a.

The change that introduced l2ad_hand alignment was almost correct
as the write size was accumulated as a sum of rounded buffer sizes.
See commit illumos/illumos-gate@e14bb32.

Also, we now consistently use asize / a_sz for the allocated size and
psize / p_sz for the physical size.  The latter accounts for a
possible size reduction because of the compression, whereas the
former accounts for a possible subsequent size expansion because of
the alignment requirements.

The code still assumes that either underlying storage subsystems or
hardware is able to do read-modify-write when an L2ARC buffer size is
not a multiple of a disk's block size.  This is true for 4KB sector disks
that provide 512B sector emulation, but may not be true in general.
In other words, we currently do not have any code to make sure that
an L2ARC buffer, whether compressed or not, which is used for physical
I/O has a suitable size.
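
The missing check mentioned here could look roughly like the following. This is a sketch only; `io_size_is_aligned` is a hypothetical helper and nothing like it is claimed to exist in the L2ARC code.

```c
#include <assert.h>
#include <stdint.h>

/* Nonzero if size is a whole number of (1 << ashift)-byte sectors. */
static int
io_size_is_aligned(uint64_t size, unsigned ashift)
{
	assert(ashift < 64);
	return ((size & (((uint64_t)1 << ashift) - 1)) == 0);
}
```

A 1536-byte buffer passes for 512B sectors (ashift 9) but not for native 4KB sectors (ashift 12), which is the case where the code relies on the device or the layers below to perform read-modify-write.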

Note that currently the cache device utilization is calculated based
on the physical size, not the allocated size.  The same applies to
l2_asize kstat. That is wrong, but this commit does not fix that.
The accounting problem was introduced partially in commit 3a17a7a
and partially in 3038a2b (accounting became consistent but in favour
of the wrong size).

Porting Notes:

Reworked to be C90 compatible and the 'write_psize' variable was
removed because it is now unused.

References:
  https://reviews.csiden.org/r/229/
  https://reviews.freebsd.org/D2764

Ported-by: kernelOfTruth <kerneloftruth@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#3400
Issue openzfs#3433
Issue openzfs#3451
behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Jun 25, 2015
Closes openzfs#3400
Closes openzfs#3433
Closes openzfs#3451
dweeezil pushed a commit to dweeezil/zfs that referenced this pull request Aug 20, 2015
Closes openzfs#3400
Closes openzfs#3433
Closes openzfs#3451

Cherry-pick ported by: Tim Chase <tim@chase2k.com>
Cherry-picked from ef56b07
dweeezil pushed a commit to dweeezil/zfs that referenced this pull request Aug 20, 2015
Closes openzfs#3400
Closes openzfs#3433
Closes openzfs#3451

Ported-to-0.6.4.2-by: Tim Chase <tim@chase2k.com>