3821 Race in rollback, zil close, and zil flush #208
Branch updated from d66461d to c79c5ea.
@ahrens does this change have any relation to the following bugs (filed by me)?
@avg-I There's a possibility that they are related, although it's not immediately obvious to me. This race was very subtle, so it may have had a wider impact than we initially analyzed.
@grwilson Having looked at the actual change, I think it is probably unrelated to those issues.
Reporting back: I was able to reproduce both problems with this patch applied.
Branch updated from c79c5ea to b860d8d.
Thanks @avg-I, I added you to the commit message as an official reviewer for this patch.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Andriy Gapon <avg@FreeBSD.org>
Effectively, we can't update the "synced" uberblock until all blocks
have been written to disk, the uberblocks have been updated on-disk,
and all ZILs have been cleaned.
The old code was updating it too early, which could let the ZIL
conclude that it was safe to proceed while the ZIL was being closed.
Now, a txg is only considered "synced" after all blocks have been
written to disk, the uberblocks have been updated on-disk, and all
ZILs have been cleaned. This closes the race that can occur between
zil_commit() and zil_clean().
Upstream bugs: DLPX-43104
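To make the ordering argument concrete, here is a minimal single-threaded C sketch. It is a toy model, not the actual illumos/OpenZFS code: `toy_pool_t`, the step names, `closer_sees_race()`, and `count_unsafe_windows()` are all hypothetical. It syncs one txg step by step and, after each step, checks what a concurrent closer (loosely standing in for zil_close()) would observe, counting the points where the txg already looks "synced" even though its blocks, uberblock, or ZIL cleaning are not yet complete.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified pool state; not the real spa_t or zilog_t. */
typedef struct {
    uint64_t blocks_written_txg; /* last txg whose blocks are on disk */
    uint64_t uberblock_txg;      /* last txg whose uberblock is on disk */
    uint64_t zil_cleaned_txg;    /* last txg for which the ZIL was cleaned */
    uint64_t synced_txg;         /* what other code treats as "synced" */
} toy_pool_t;

typedef enum { STEP_WRITE, STEP_UBERBLOCK, STEP_ZIL_CLEAN, STEP_PUBLISH } step_t;

#define NSTEPS 4

/* Old ordering (assumed for illustration): "synced" is published before ZIL cleaning. */
static const step_t old_order[NSTEPS] = {
    STEP_WRITE, STEP_UBERBLOCK, STEP_PUBLISH, STEP_ZIL_CLEAN
};

/* Fixed ordering: "synced" is published only after everything else is done. */
static const step_t new_order[NSTEPS] = {
    STEP_WRITE, STEP_UBERBLOCK, STEP_ZIL_CLEAN, STEP_PUBLISH
};

/* Apply one step of syncing `txg` to the toy pool. */
static void apply_step(toy_pool_t *p, step_t s, uint64_t txg) {
    switch (s) {
    case STEP_WRITE:     p->blocks_written_txg = txg; break;
    case STEP_UBERBLOCK: p->uberblock_txg = txg; break;
    case STEP_ZIL_CLEAN: p->zil_cleaned_txg = txg; break;
    case STEP_PUBLISH:   p->synced_txg = txg; break;
    }
}

/*
 * A closer that trusts the "synced" value and proceeds. Returns true if
 * it would act on an inconsistent state: txg looks synced, but some of
 * the work that "synced" is supposed to imply has not finished yet.
 */
static bool closer_sees_race(const toy_pool_t *p, uint64_t txg) {
    return p->synced_txg >= txg &&
        (p->blocks_written_txg < txg ||
         p->uberblock_txg < txg ||
         p->zil_cleaned_txg < txg);
}

/*
 * Sync one txg using the given step order and let the closer look at
 * the pool after every step. Returns how many of those observation
 * points are unsafe.
 */
static int count_unsafe_windows(const step_t *order, size_t nsteps, uint64_t txg) {
    toy_pool_t p = {0};
    int unsafe = 0;

    assert(txg > 0); /* txg 0 would look "synced" from the start */
    for (size_t i = 0; i < nsteps; i++) {
        apply_step(&p, order[i], txg);
        if (closer_sees_race(&p, txg))
            unsafe++;
    }
    return unsafe;
}
```

For txg 5, `count_unsafe_windows(old_order, NSTEPS, 5)` finds one unsafe window (between publishing "synced" and cleaning the ZIL), while `count_unsafe_windows(new_order, NSTEPS, 5)` finds none. The real code coordinates separate threads with locks and condition variables, which this sketch deliberately leaves out; it only shows why publishing "synced" last removes the window.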
Branch updated from b860d8d to 804e54b.
@zettabot go
@ahrens I think this is good to go, but please verify the zloop output before RTI-ing and merging this; I think that failure is also seen on master, but I'm not certain.
Yes, that's definitely a known problem with ztest. |