
Truncation Handling

Overview

This document describes the implementation of truncate operations in SLASH2.

There are two kinds of truncates:

full truncate, i.e. truncation to file offset zero
partial truncate, i.e. truncation to a nonzero file offset

Full truncation

In the MDS, a full truncate bumps the file generation number and records the old file ID (FID) and file generation number (fid+gen) in a logfile, which is eventually shipped to every sliod registered in the file's replica table. Once every sliod that a logfile pertains to has received it and replied that the actions have been applied (i.e. the object files of the old version have been deleted), the logfile is deleted and the batch ID is advanced.

I/O systems (IOS) that are offline for a long time, unavailable, or permanently taken out of production cause these logfiles to back up, which can degrade performance depending on system activity and on how long the backlog persists. The appropriate administrator action is to remove such IOS profiles from the slcfg file on the MDS.
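The bookkeeping described in the two paragraphs above can be sketched as follows. This is a minimal in-memory model, not the SLASH2 implementation: the names ReclaimBatch, ReclaimLog, pending_ios, full_truncate, and ack are hypothetical, and the real MDS keeps these records in on-disk logfiles.

```python
from dataclasses import dataclass, field


@dataclass
class ReclaimBatch:
    """One logfile of (fid, old_gen) pairs awaiting garbage collection."""
    batch_id: int
    entries: list = field(default_factory=list)     # (fid, old_gen) pairs
    pending_ios: set = field(default_factory=set)   # sliods that have not replied


class ReclaimLog:
    """Hypothetical stand-in for the MDS reclaim bookkeeping."""

    def __init__(self):
        self.batch_id = 0    # batch currently being filled
        self.batches = {}    # batch_id -> ReclaimBatch
        self.gen = {}        # fid -> current generation number

    def full_truncate(self, fid, replica_ios):
        """Bump the generation and log the old fid+gen for every replica IOS."""
        old_gen = self.gen.get(fid, 0)
        self.gen[fid] = old_gen + 1
        batch = self.batches.setdefault(self.batch_id, ReclaimBatch(self.batch_id))
        batch.entries.append((fid, old_gen))
        batch.pending_ios.update(replica_ios)
        return old_gen

    def ack(self, batch_id, ios):
        """Record that an IOS applied a batch; delete it once all have replied."""
        batch = self.batches[batch_id]
        batch.pending_ios.discard(ios)
        if batch.pending_ios:
            return False     # still waiting on at least one sliod
        del self.batches[batch_id]
        self.batch_id = max(self.batch_id, batch_id + 1)
        return True
```

In this model, an IOS that never calls ack keeps its batch in batches indefinitely, which corresponds to the logfile backlog caused by long-offline I/O systems.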

Actions that invoke this garbage collection mechanism are:

full truncate
unlink(2) syscall
a clobbering rename(2), i.e. one that overwrites an existing file

Partial truncation

Partial truncation is caused solely by the truncate(2) system call when a nonzero length is specified.

First, if the file is already marked FCMH_MDS_IN_PTRUNC (which signifies that partial truncation resolution, i.e. the steps outlined here, is already in progress for it), failure (EAGAIN) is returned immediately to the client issuing the SETATTR. That client is registered on an in-memory list so it can be notified once resolution has completed. Only one partial truncation can be in progress on a file at any given time.

Next, the file is marked FCMH_MDS_IN_PTRUNC (still in memory only, for now) and the client is notified that partial truncation resolution has begun. At this point, it is the client's responsibility to reissue the SETATTR operation in case of communication failure, as the MDS does not yet guarantee that the operation has been recorded in a journal or any other persistent tracker. The client has not yet returned to the application; its internal state machine is simply waiting for notification from the MDS.
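The first two steps can be sketched as a single entry check. This is an illustrative model only: the Fcmh class, the ptrunc_waiters list, the begin_ptrunc function, and the bit value given to FCMH_MDS_IN_PTRUNC are assumptions, and a return value of 0 stands in for whatever notification the MDS actually sends to the originating client.

```python
import errno

FCMH_MDS_IN_PTRUNC = 1 << 0    # bit value chosen for illustration only


class Fcmh:
    """Hypothetical in-memory file handle."""

    def __init__(self, fid):
        self.fid = fid
        self.flags = 0
        self.ptrunc_waiters = []   # clients to notify when resolution finishes


def begin_ptrunc(f, client):
    """Return 0 if this client's partial truncate was started, else EAGAIN."""
    if f.flags & FCMH_MDS_IN_PTRUNC:
        # Resolution is already in progress: remember the client so it can
        # be notified on completion, and fail the request for now.
        if client not in f.ptrunc_waiters:
            f.ptrunc_waiters.append(client)
        return errno.EAGAIN
    # Mark the file in memory only; nothing is journaled at this point, so
    # the client must reissue the request after a communication failure.
    f.flags |= FCMH_MDS_IN_PTRUNC
    f.ptrunc_waiters.append(client)
    return 0
```

For example, a first call for client "A" returns 0 and sets the flag, while a second call for client "B" on the same handle returns EAGAIN and leaves both clients on the waiter list.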

Next, clients holding leases on bmaps that contain or fall after the partial truncate file offset position (hereafter referred to as the ptrunc position) are instructed to release them. The MDS then waits for clients to relinquish all such leases, waiting at most the bmap timeout in the case of unresponsive clients.
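Which bmaps are affected follows from integer division of the offsets by the bmap size. The sketch below assumes a fixed bmap size of 128 MiB and a hypothetical helper name, bmaps_to_release; it also assumes that a ptrunc position at or beyond the current end of file affects no existing bmaps.

```python
BMAP_SIZE = 128 * 1024 * 1024   # assumed bmap size in bytes


def bmaps_to_release(ptrunc_pos, file_size, bmap_size=BMAP_SIZE):
    """Indices of bmaps that contain or follow the ptrunc position."""
    if file_size <= 0 or ptrunc_pos >= file_size:
        return []                              # nothing past the ptrunc position
    first = ptrunc_pos // bmap_size            # bmap holding the ptrunc position
    last = (file_size - 1) // bmap_size        # bmap holding the last byte
    return list(range(first, last + 1))
```

With a bmap size of 100 bytes and a 400-byte file, a ptrunc position of 150 selects bmaps 1 through 3, while a position of 200 (exactly on a bmap boundary) selects only bmaps 2 and 3.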

The next action depends on the ptrunc position:

1. if the ptrunc position falls cleanly between two bmaps, the following actions are taken:
- bmaps after the ptrunc position are changed from VALID → GARBAGE, written to the MDFS, and journaled;
- the new file size is saved in the file's sst_size and is written to the MDFS without journaling; and
- the FCMH_MDS_IN_PTRUNC flag is cleared from the file.

2. if the ptrunc position falls within a bmap, Update Scheduler work is queued to perform the CRC recalculation for the affected sliver, which must complete before processing can return to normal.

TODO: When the ptrunc position falls between slivers, the MDS should be able to handle this case as in (1.) above.

The residency states of all bmaps past the ptrunc position are changed from VALID → GARBAGE, and the residency state of the bmap in which the ptrunc position lies is changed from VALID → TRUNCPNDG.
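The two cases and their state changes can be sketched as one function. This is a simplified model with assumed names (apply_ptrunc, needs_crc_work): it tracks a single residency state per bmap rather than one per replica, and it leaves out the MDFS write and journaling steps.

```python
VALID, GARBAGE, TRUNCPNDG = "VALID", "GARBAGE", "TRUNCPNDG"


def apply_ptrunc(states, ptrunc_pos, bmap_size):
    """Return (new_states, needs_crc_work) for a list of per-bmap states."""
    new = list(states)
    idx, off = divmod(ptrunc_pos, bmap_size)
    # On a clean boundary the bmap at idx is itself past the ptrunc position;
    # otherwise it is cut partway through and only later bmaps become garbage.
    first_garbage = idx if off == 0 else idx + 1
    for i in range(first_garbage, len(new)):
        if new[i] == VALID:
            new[i] = GARBAGE
    needs_crc_work = off != 0 and idx < len(new) and new[idx] == VALID
    if needs_crc_work:
        new[idx] = TRUNCPNDG    # CRC recalculation must be scheduled
    return new, needs_crc_work
```

When needs_crc_work is False the file can be unmarked immediately, as in the first case; when it is True the bmap stays in TRUNCPNDG until the CRC recalculation has been resolved.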

At the earliest opportunity (and with higher priority than any replication activity), a randomly selected IOS marked TRUNCPNDG is asked to perform the CRC recalculation. When one IOS replies with success, its replica of the bmap is changed from TRUNCPNDG → VALID and the other replicas are changed from TRUNCPNDG → GARBAGE.
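A minimal sketch of this resolution step for a single bmap follows. The function name resolve_truncpndg and the recalc_crc callback are hypothetical, and trying the remaining candidates after a failure is an assumption of this sketch rather than something stated above.

```python
import random


def resolve_truncpndg(replicas, recalc_crc, rng=random):
    """Resolve one bmap.

    replicas   -- dict mapping IOS name to residency state, updated in place
    recalc_crc -- callable taking an IOS name, returning True on success
    Returns the IOS whose replica became VALID, or None if none succeeded.
    """
    candidates = [ios for ios, st in replicas.items() if st == "TRUNCPNDG"]
    rng.shuffle(candidates)              # random selection order
    for ios in candidates:
        if recalc_crc(ios):
            # The first success wins; every other pending replica is discarded.
            for other in candidates:
                replicas[other] = "VALID" if other == ios else "GARBAGE"
            return ios
    return None                          # all still TRUNCPNDG; retry later
```

Replicas that were not in TRUNCPNDG to begin with are left untouched, and a bmap for which no IOS succeeds remains pending.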

At this point, the MDS issues BMAP_WAKE notifications to the original client as well as to any other clients that attempted SETATTR or BMAP_LEASE requests since FCMH_MDS_IN_PTRUNC was set. If a client's connection to the MDS is ever lost, the client itself is responsible for reestablishing the connection and reissuing its requests.
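The completion step can be sketched as clearing the flag and walking the waiter list. The names finish_ptrunc and send_wake, and the bit value of the flag, are assumptions made for illustration; the sketch reports unreachable clients without retrying them, since such clients are expected to reconnect and reissue on their own.

```python
FCMH_MDS_IN_PTRUNC = 1 << 0    # bit value chosen for illustration only


def finish_ptrunc(flags, waiters, send_wake):
    """Clear the in-progress flag and send a wake to every waiting client.

    send_wake is a callable that delivers a BMAP_WAKE to one client and
    raises ConnectionError if that client cannot be reached.
    Returns (new_flags, unreachable_clients).
    """
    flags &= ~FCMH_MDS_IN_PTRUNC
    unreachable = []
    for client in waiters:
        try:
            send_wake(client)
        except ConnectionError:
            # No retry here: the client must reconnect and reissue.
            unreachable.append(client)
    return flags, unreachable
```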
