DocTruncate
This document describes the implementation of truncate operations in SLASH2.
There are two kinds of truncates:
- full truncate i.e. truncate to file offset position zero
- partial truncate i.e. truncate to a non-zero file offset position
In the MDS, a full truncate bumps the file generation number and records the old file ID (FID) and file generation number (fid+gen) in a logfile, which is eventually shipped to every sliod registered in the file's replica table. Once every sliod a logfile pertains to has received it and replied that the actions have been applied (that is, the old version's object files have been deleted), the logfile is deleted and the batch ID is advanced.
IO systems that are long offline, unavailable, or permanently taken out
of production will cause these logfiles to jam up, possibly resulting in
degradation of performance, depending on system activity and amount of
time.
Appropriate administrator action is to remove such IOS profiles from the
slcfg file on the MDS.
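
The reclaim bookkeeping described above can be sketched as a small in-memory model. This is only an illustration under stated assumptions, not the SLASH2 implementation: the `ReclaimLog` class, its method names, and the use of a single open batch are hypothetical. It shows why one sliod that never replies keeps the batch from advancing.

```python
class ReclaimLog:
    """Hypothetical model of one reclaim logfile batch on the MDS."""

    def __init__(self):
        self.batch_id = 0
        self.entries = []     # (fid, gen) pairs of superseded file versions
        self.pending = set()  # sliods that still owe a reply for this batch

    def record(self, fid, gen, replica_sliods):
        """Log the old fid+gen; every sliod in the replica table must apply it."""
        self.entries.append((fid, gen))
        self.pending |= set(replica_sliods)

    def ack(self, sliod):
        """A sliod reports that it deleted the old objects.

        Returns True when this was the last outstanding reply, at which point
        the logfile contents are dropped and the batch ID is advanced.
        """
        self.pending.discard(sliod)
        if self.entries and not self.pending:
            self.entries.clear()
            self.batch_id += 1
            return True
        return False
```

In this model, if `ios-b` is offline and never calls `ack`, the batch stays at its current ID indefinitely, which corresponds to the logfile jam described above.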
Actions that evoke this garbage collection mechanism are:
- full truncate
- unlink(2) syscall
- a clobbering rename(2) i.e. overwriting an existing file
Partial truncation is caused solely by the truncate(2) system call
when a nonzero value is specified.
First, if the file is already marked FCMH_MDS_IN_PTRUNC (which
signifies that it is already handling partial truncation resolution i.e.
the steps outlined here), failure (EAGAIN) is returned immediately to
the client issuing the SETATTR and the client is registered on an
in-memory list to be notified of completion after resolution has
occurred.
Only one partial truncation on a file can be happening at any given
time.
Next, the file is marked (still in-memory only, for now)
FCMH_MDS_IN_PTRUNC and the client is notified that this behavior is
taking place.
At this point, it is the client's responsibility to reissue this
SETATTR operation in case of communication failure, as the MDS provides
no guarantees yet that the operation will be recorded by a journal or
other persistent behavior tracker.
The client has not returned to the application yet and is just waiting
for notification from the MDS in its internal state machine.
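
The admission steps above can be sketched as follows. This is a minimal model, not the MDS code: the flag's numeric value, the `FileHandle` class, the `begin_ptrunc` function, and the `PTRUNC_STARTED` reply are all hypothetical names chosen for illustration. Only the `EAGAIN` reply and the in-memory waiter list come from the description.

```python
import errno

FCMH_MDS_IN_PTRUNC = 0x1          # flag name from the text; the value is made up
PTRUNC_STARTED = "ptrunc-started" # hypothetical reply to the first client


class FileHandle:
    """Hypothetical stand-in for the in-memory file handle on the MDS."""

    def __init__(self):
        self.flags = 0
        self.waiters = []  # clients to notify once resolution completes


def begin_ptrunc(fcmh, client):
    """Admit at most one partial truncation per file at a time."""
    if fcmh.flags & FCMH_MDS_IN_PTRUNC:
        # Already resolving a partial truncate: fail fast, remember the client.
        fcmh.waiters.append(client)
        return errno.EAGAIN
    # In-memory only at this stage; nothing has been journaled yet.
    fcmh.flags |= FCMH_MDS_IN_PTRUNC
    fcmh.waiters.append(client)
    return PTRUNC_STARTED
```

Because the flag lives only in memory at this stage, a client in this model cannot assume the request survives an MDS restart, which is why reissuing is left to the client.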
Next, clients holding leases on bmaps that contain or fall after the partial truncate file offset position (hereafter referred to as the ptrunc position) are instructed to release them. The MDS then waits for clients to relinquish all such leases, waiting at most the bmap timeout in the case of unresponsive clients.
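
Which leases are affected follows from simple offset arithmetic. The sketch below assumes a fixed bmap size of 128 MiB, which this document does not state, and represents leases as hypothetical `(client, bmap number)` pairs; the function names are illustrative only.

```python
BMAP_SIZE = 128 * 1024 * 1024  # assumed bmap size; not stated in this document


def first_affected_bmap(ptrunc_pos):
    """Number of the first bmap that contains or follows the ptrunc position."""
    return ptrunc_pos // BMAP_SIZE


def leases_to_release(leases, ptrunc_pos):
    """Select the (client, bmapno) leases that must be relinquished."""
    first = first_affected_bmap(ptrunc_pos)
    return [(client, bmapno) for client, bmapno in leases if bmapno >= first]
```

For example, with leases on bmaps 0, 1, and 2 and a ptrunc position a few kilobytes into bmap 1, only the leases on bmaps 1 and 2 are selected; the lease on bmap 0 is untouched.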
The next action is determined depending on the value of the ptrunc position:
1. if the ptrunc position falls cleanly between two bmaps, the following
   actions are taken:
   - bmaps after the ptrunc position are changed from VALID → GARBAGE, written to the MDFS, and journaled;
   - the new file size is saved in the file's sst_size and is written to the MDFS without journaling; and
   - the FCMH_MDS_IN_PTRUNC flag is cleared from the file
2. if the ptrunc position falls within a bmap, Update Scheduler work is queued to resolve the CRC recalculation for the affected sliver that must occur before processing can return to normal.

   TODO: When the ptrunc position falls between slivers the MDS should be able to handle this case as in (1.) above

   The residency states of all bmaps past the ptrunc position are changed from VALID → GARBAGE and the residency state of the bmap where the ptrunc position lies is marked VALID → TRUNCPNDG.

   At earliest convenience (although of higher priority than any replication activity), a randomly selected TRUNCPNDG marked IOS is asked to perform the CRC recalculation. When one IOS replies success, the bmap is marked TRUNCPNDG → VALID and other replicas are marked TRUNCPNDG → GARBAGE.
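
The residency-state transitions for both cases can be modeled in a few lines. This sketch covers only the state changes, not journaling, `sst_size` updates, or Update Scheduler queuing. The 128 MiB bmap size, the `Res` enum, the per-IOS list layout, and both function names are assumptions made for illustration; the IOS that performs the CRC recalculation is passed in explicitly rather than chosen at random.

```python
from enum import Enum

BMAP_SIZE = 128 * 1024 * 1024  # assumed bmap size; not stated in this document


class Res(Enum):
    VALID = "VALID"
    GARBAGE = "GARBAGE"
    TRUNCPNDG = "TRUNCPNDG"


def apply_ptrunc(replicas, ptrunc_pos):
    """Update residency states; replicas maps IOS name -> list of per-bmap states.

    Returns True when the position falls inside a bmap, meaning a CRC
    recalculation must still be carried out by one IOS.
    """
    bmapno, offset = divmod(ptrunc_pos, BMAP_SIZE)
    needs_crc = offset != 0
    for states in replicas.values():
        for n in range(bmapno, len(states)):
            if states[n] is not Res.VALID:
                continue  # only VALID replicas change state
            if n == bmapno and needs_crc:
                states[n] = Res.TRUNCPNDG  # bmap containing the ptrunc position
            else:
                states[n] = Res.GARBAGE    # bmaps wholly past the position
    return needs_crc


def resolve_truncpndg(replicas, bmapno, winner):
    """After `winner` reports success, keep its copy and discard the others."""
    for ios, states in replicas.items():
        if states[bmapno] is Res.TRUNCPNDG:
            states[bmapno] = Res.VALID if ios == winner else Res.GARBAGE
```

A position exactly on a bmap boundary returns `False` and leaves no `TRUNCPNDG` entries, matching the first case; any other position leaves one `TRUNCPNDG` bmap per replica until `resolve_truncpndg` is called.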
At this point, the MDS issues BMAP_WAKE notifications to the original
client as well as to any new clients that attempted SETATTR or
BMAP_LEASE requests since IN_PTRUNC was set.
If a connection to the MDS is ever lost, the clients are themselves
responsible for reestablishing and reissuing requests.
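
From the client's point of view, the protocol amounts to a retry loop. The sketch below is an assumption-laden illustration, not the SLASH2 client: the three callables, the `max_attempts` bound, and the choice to reissue the SETATTR after a wake-up are all hypothetical. It only shows the division of responsibility described above, where the client reconnects and reissues on its own.

```python
import errno


def truncate_with_retry(send_setattr, wait_for_wake, reconnect, max_attempts=5):
    """Issue a partial-truncate SETATTR until it succeeds or attempts run out.

    send_setattr():  sends the request, returns a reply code, and raises
                     ConnectionError if the MDS connection is lost
    wait_for_wake(): blocks until a BMAP_WAKE notification arrives
    reconnect():     reestablishes the MDS connection
    """
    for _ in range(max_attempts):
        try:
            rc = send_setattr()
        except ConnectionError:
            reconnect()      # the MDS gives no guarantee the request was kept
            continue
        if rc == errno.EAGAIN:
            wait_for_wake()  # another partial truncate is being resolved
            continue
        return rc
    raise TimeoutError("partial truncate did not complete")
```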