HAMMER VFS - Implement REDO recovery code

* Implement the primary REDO recovery mechanics and document the whole mess.

  REDO recovery essentially works using an expanded UNDO/REDO FIFO range. The nominal UNDO range, required for running UNDOs, is calculated first. If a REDO_SYNC record is found within this range, the record specifies the expanded FIFO start offset required to run REDOs.

  This is necessary because the inodes related to REDOs laid down in the FIFO are not necessarily flushed in the next flush sequence, so the recovery code may have to scan the UNDO/REDO FIFO backwards considerably beyond the nominal recovery range required to run UNDOs in order to find active REDOs.

  When a REDO_SYNC record is found, the recovery code expands the range by scanning backwards and validating the UNDO/REDO FIFO as it goes. It must make sure that the sequence space remains contiguous all the way back to the REDO_SYNC point.

  While doing the reverse scan the recovery code collects REDO_TERM_* records, which are used to mask earlier REDO_* records once their meta-data has been flushed. Only TERM records in the expanded range that are outside the nominal UNDO range matter. Any TERM records in the nominal UNDO range refer to meta-data which was undone by the stage1 UNDO recovery and so must be ignored (we want to run the related REDOs).

  The recovery code then does a forward scan through the entire expanded range of the UNDO/REDO FIFO, executing any REDO_* records it finds which have not been masked by later REDO_TERM_* records. It executes the REDOs using the live filesystem.

* Expand the REDO FIFO structure. I had forgotten to add a localization field; without it HAMMER doesn't know which PFS the REDO is referring to.

* Umount was improperly flushing the FIFO to the disk for read-only mounts. Fix it.

* The recovery code now detects whether any REDOs are present by looking for a REDO_SYNC record in the nominal UNDO recovery range. It will not run stage2 (the REDO pass) if it does not see this record.

* Properly generate a REDO_SYNC record in the UNDO space when generating only REDOs, as well as UNDOs. HAMMER was previously only generating the REDO_SYNC record when generating UNDOs.

* Generate a REDO_TRUNC record during a file flush if any records were previously queued with REDO, even if those records no longer exist (e.g. due to a truncation) and even if REDO is now turned off due to redo heuristic limits being exceeded. This is necessary in order for the recovery code to properly sequence REDOs and TRUNCations during recovery.

* For now be very verbose during redo recovery.

* Make sure that mount -o ro and mount -u -o rw work properly. The stage2 REDO cannot be run on a read-only mount because it requires a live filesystem. The operations are deferred until the mount is upgraded to rw.
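
The stage2 scan described in the first item can be sketched roughly as follows. This is a simplified model, not the code in this commit: the FIFO is a plain array rather than an on-disk ring, and `struct fifo_rec`, its `key` and `sync_start` fields, `MAX_TERMS`, and `redo_stage2()` are all hypothetical names invented for illustration. It also assumes a TERM masks a REDO when their keys match, and that the first REDO_SYNC found in the nominal range is the one to use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record kinds; the real FIFO has more types than this. */
enum rec_kind { REC_UNDO, REC_REDO_WRITE, REC_REDO_TERM, REC_REDO_SYNC };

struct fifo_rec {
	uint32_t	seqno;		/* FIFO sequence number */
	enum rec_kind	kind;
	uint64_t	key;		/* assumed REDO <-> TERM match key */
	size_t		sync_start;	/* REDO_SYNC only: expanded start index */
};

#define MAX_TERMS	64		/* arbitrary limit for the sketch */

/*
 * fifo[nom_beg..n) is the nominal UNDO range.  Indices of the REDOs that
 * would be executed are stored in ran[] (up to ran_max entries).
 * Returns the number of runnable REDOs, 0 if no REDO_SYNC was seen, or
 * -1 if the sequence space is broken or too many TERMs were collected.
 */
int
redo_stage2(const struct fifo_rec *fifo, size_t n, size_t nom_beg,
	    size_t *ran, size_t ran_max)
{
	struct { size_t idx; uint64_t key; } terms[MAX_TERMS];
	size_t nterms = 0, nran = 0, start = nom_beg, i, t;
	int have_sync = 0;

	assert(nom_beg <= n);

	/* Stage2 only runs if the nominal range holds a REDO_SYNC. */
	for (i = nom_beg; i < n && !have_sync; ++i) {
		if (fifo[i].kind == REC_REDO_SYNC) {
			start = fifo[i].sync_start;
			have_sync = 1;
		}
	}
	if (!have_sync)
		return (0);
	if (start > nom_beg)
		start = nom_beg;	/* nothing to expand */

	/*
	 * Reverse scan from the nominal range back to the sync point,
	 * validating contiguity and collecting TERMs.  TERMs inside the
	 * nominal range are never visited, so they are ignored.
	 */
	for (i = nom_beg; i > start; --i) {
		const struct fifo_rec *r = &fifo[i - 1];

		if ((uint32_t)(r->seqno + 1) != fifo[i].seqno)
			return (-1);
		if (r->kind == REC_REDO_TERM) {
			if (nterms == MAX_TERMS)
				return (-1);
			terms[nterms].idx = i - 1;
			terms[nterms].key = r->key;
			++nterms;
		}
	}

	/* Forward scan: run REDOs not masked by a later collected TERM. */
	for (i = start; i < n; ++i) {
		int masked = 0;

		if (fifo[i].kind != REC_REDO_WRITE)
			continue;
		for (t = 0; t < nterms; ++t) {
			if (terms[t].idx > i && terms[t].key == fifo[i].key)
				masked = 1;
		}
		if (masked)
			continue;
		if (nran < ran_max)
			ran[nran] = i;
		++nran;
	}
	return ((int)nran);
}
```

In this model a REDO in the expanded range is skipped only when a matching TERM follows it before the nominal range begins. A TERM that falls inside the nominal range leaves its REDO runnable, which mirrors the rule that meta-data undone by stage1 must be redone.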
randy1 · Mar 26, 2010 · c58123d
1 parent 8ac50aa
Showing 7 changed files with 586 additions and 36 deletions.
diff --git a/sys/vfs/hammer/hammer.h b/sys/vfs/hammer/hammer.h
@@ -869,6 +869,8 @@ struct hammer_mount {
 	hammer_tid_t	flush_tid2;		/* flusher tid sequencing */
 	int64_t copy_stat_freebigblocks;	/* number of free bigblocks */
 	u_int32_t	undo_seqno;		/* UNDO/REDO FIFO seqno */
+	u_int32_t	recover_stage2_seqno;	/* REDO recovery seqno */
+	hammer_off_t	recover_stage2_offset;	/* REDO recovery offset */
 
 	struct netexport export;
 	struct hammer_lock sync_lock;
@@ -896,6 +898,8 @@ typedef struct hammer_mount	*hammer_mount_t;
 #define HAMMER_MOUNT_CRITICAL_ERROR	0x0001
 #define HAMMER_MOUNT_FLUSH_RECOVERY	0x0002
 #define HAMMER_MOUNT_REDO_SYNC		0x0004
+#define HAMMER_MOUNT_REDO_RECOVERY_REQ	0x0008
+#define HAMMER_MOUNT_REDO_RECOVERY_RUN	0x0010
 
 struct hammer_sync_info {
 	int error;

diff --git a/sys/vfs/hammer/hammer_disk.h b/sys/vfs/hammer/hammer_disk.h
@@ -494,6 +494,8 @@ struct hammer_fifo_redo {
 	hammer_off_t		redo_offset;	/* logical offset in file */
 	int32_t			redo_data_bytes;
 	u_int32_t		redo_flags;
+	u_int32_t		redo_localization;
+	u_int32_t		redo_reserved;
 	u_int64_t		redo_mtime;	/* set mtime */
 };
 

diff --git a/sys/vfs/hammer/hammer_flusher.c b/sys/vfs/hammer/hammer_flusher.c
@@ -894,6 +894,8 @@ hammer_flusher_meta_halflimit(hammer_mount_t hmp)
 int
 hammer_flusher_haswork(hammer_mount_t hmp)
 {
+	if (hmp->ronly)
+		return(0);
 	if (hmp->flags & HAMMER_MOUNT_CRITICAL_ERROR)
 		return(0);
 	if (TAILQ_FIRST(&hmp->flush_group_list) ||	/* dirty inodes */

diff --git a/sys/vfs/hammer/hammer_mount.h b/sys/vfs/hammer/hammer_mount.h
@@ -59,8 +59,6 @@ struct hammer_mount_info {
 #define HMNT_MASTERID	0x00000002	/* master_id field set */
 #define HMNT_EXPORTREQ	0x00000004
 #define HMNT_UNDO_DIRTY	0x00000008
-#define HMNT_STAGE2	0x00000010	/* ran stage-2 recovery */
-#define HMNT_HASREDO	0x00000020	/* stage-2 must scan for REDO */
 
 #define HMNT_USERFLAGS	(HMNT_NOHISTORY | HMNT_MASTERID)