Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Illumos #6214, concurrency race in manipulating ARC buffer flags during L2ARC writeout can cause pool corruption #3757

Closed
siebenmann opened this issue Sep 10, 2015 · 5 comments
Milestone

Comments

@siebenmann
Copy link
Contributor

This is a just-filed Illumos bug where a concurrency race in manipulating hdr->b_flags between L2ARC writeout and ARC eviction can cause corrupt metadata to get written to the primary pool, Illumos #6214 - zpools going south. I don't know if this applies to ZoL given the different kernel environments, but ZoL does have the base commit that introduced the problem in Illumos in b9541d6.

@behlendorf behlendorf added this to the 0.7.0 milestone Sep 10, 2015
@behlendorf
Copy link
Contributor

We'll definitely want to pull the upstream fix when it's available right away for this issue. The same kind of race should be possible in ZoL but may be very unlikely in practice.

@ryao
Copy link
Contributor

ryao commented Sep 10, 2015

@behlendorf It requires that the system use L2ARC, which does not get test converage on the buildbot. Consequently, we have no idea how likely/unlikely it is. I think we should revert b9541d6 as a precaution while it is still not in a tagged release.

@kernelOfTruth
Copy link
Contributor

Quickly skimming the discussion on https://www.reddit.com/r/sysadmin/comments/1hmvel/zfs_your_worst_experiences/

and the issues entry at https://www.illumos.org/issues/6214 - there's one case of corrupt metadata (haven't found more via google with a quick search, but from 2 years ago - so it clearly can't be attributed to these changes)

Are there any pointers to the reports ?

What platform did this is appear on (mainly Illumos ? *BSD ? ZOL ?) ?

What patchstack ? There were at least 2 subsequent fixes following that referenced commit, which fixed calculation and other problems with the l2arc - but I doubt they change anything in relation to zpool metadata corruption/issue

@behlendorf behlendorf modified the milestones: 0.6.5, 0.7.0 Sep 10, 2015
@behlendorf
Copy link
Contributor

@siebenmann thanks for bringing this to our attention. I think that even though this issue is exceptionally unlikely we need treat this as a blocker for 0.6.5 and delay the release. @ryao unfortunately we can't easily revert b9541d6 so we'll need to fix the root cause. Luckily it's my understanding that there's already fix in the works from the illumos guys.

@siebenmann
Copy link
Contributor Author

I believe that the current illumos-devel thread discussing the proposed fix for this starts here, if people want a head start before the fix lands in the illumos-gate repo. This initial illumos-devel message describes where to find the original thread with the problem report and discussion in the ZFS and OmniOS mailing lists. I don't know more than this, @kernelOfTruth, because I haven't been paying particular attention to the discussion of it on either list before now.

(What prompted my report here is this strong warning about L2ARCs on omnios-discuss from one of the OmniOS developers.)

behlendorf pushed a commit to behlendorf/zfs that referenced this issue Sep 11, 2015
Reintroduce b_compress to the l2arc_buf_hdr_t.  In commit b9541d6
the compression flags were moved to the generic b_flags in the
arc_buf_hdr_t.  This is a problem because l2arc_compress_buf()
may manipulate the compression flags and this can only be done
safely under the hash lock which is not held.  See Illumos 6214
for a detailed analysis of the race.

References:
  https://www.illumos.org/issues/6214
  http://cr.illumos.org/~webrev/sensille/6214_zpools_going_south/

Porting Notes:

HDR_GET_COMPRESS() macro was removed from arc_buf_info().

In l2arc_read_done() the new ASSERTs were changed to VERIFYs to
act as an extra layer of protection for production builds.

Issue openzfs#3757
behlendorf pushed a commit to behlendorf/zfs that referenced this issue Sep 11, 2015
Reintroduce b_compress to the l2arc_buf_hdr_t.  In commit b9541d6
the compression flags were moved to the generic b_flags in the
arc_buf_hdr_t.  This is a problem because l2arc_compress_buf()
may manipulate the compression flags and this can only be done
safely under the hash lock which is not held.  See Illumos 6214
for a detailed analysis of the race.

References:
  https://www.illumos.org/issues/6214
  http://cr.illumos.org/~webrev/sensille/6214_zpools_going_south/

Porting Notes:

HDR_GET_COMPRESS() macro was removed from arc_buf_info().

Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#3757
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants