
[FEATURE] Use fsfreeze instead of sync before snapshot #2187

Open
yasker opened this issue Jan 15, 2021 · 30 comments
Labels
area/v1-data-engine (v1 data engine (iSCSI tgt)), area/volume-data-integrity (Volume Data integrity related), highlight (Important feature/issue to highlight), kind/feature (Feature request, new feature), priority/0 (Must be fixed in this release (managed by PO)), require/manual-test-plan (Require adding/updating manual test cases if they can't be automated)

@yasker
Member

yasker commented Jan 15, 2021

Is your feature request related to a problem? Please describe.
Currently, before a snapshot operation, we run a sync to flush the cache to the disk. It would be even safer if we also ran fsfreeze against the filesystem, to stop the filesystem from writing data to the disk before we take the snapshot. This would ensure the snapshot we take contains a stable filesystem image.

Describe the solution you'd like
Snapshot process:

  1. fsfreeze --freeze <fs_mount_point>, which should include the cache flush. See https://linux.die.net/man/8/fsfreeze
  2. Take a snapshot
  3. fsfreeze --unfreeze <fs_mount_point>

Note that fsfreeze only works for certain filesystems. For filesystems that don't support fsfreeze, sync will be used instead.
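For illustration, a minimal Go sketch of this flow, assuming the engine shells out to fsfreeze and falls back to sync when the freeze fails (the function and its signature are hypothetical, not the actual longhorn-engine API):

```go
package snapshot

import (
	"fmt"
	"os/exec"
	"syscall"
)

// snapshotWithFreeze freezes the filesystem mounted at mountPoint, takes the
// snapshot, then unfreezes it. If the freeze fails (e.g. the filesystem does
// not support fsfreeze), it falls back to a plain sync. takeSnapshot stands in
// for the real snapshot call.
func snapshotWithFreeze(mountPoint string, takeSnapshot func() error) error {
	frozen := false
	if err := exec.Command("fsfreeze", "--freeze", mountPoint).Run(); err != nil {
		// Fallback: flush caches the old way.
		syscall.Sync()
	} else {
		frozen = true
	}

	snapErr := takeSnapshot()

	if frozen {
		if err := exec.Command("fsfreeze", "--unfreeze", mountPoint).Run(); err != nil {
			return fmt.Errorf("snapshot finished but unfreezing %s failed: %v", mountPoint, err)
		}
	}
	return snapErr
}
```

The freeze window stays short because only the snapshot call sits between the freeze and the unfreeze.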


Additional context
Since we have to keep the freeze period as short as possible, it's better to do it inside the engine. One additional complexity is how the engine can determine where the block device is mounted.

@yasker yasker added this to the v1.1.1 milestone Jan 15, 2021
@yasker yasker added area/v1-data-engine v1 data engine (iSCSI tgt) kind/feature Feature request, new feature priority/0 Must be fixed in this release (managed by PO) require/manual-test-plan Require adding/updating manual test cases if they can't be automated labels Jan 15, 2021
@longhorn-io-github-bot

longhorn-io-github-bot commented Mar 3, 2021

Pre-merged Checklist

  • Are the reproduce steps/test steps documented?

  • Is there a workaround for the issue? If so, is it documented?

  • Does the PR include the explanation for the fix or the feature?

  • Does the PR include deployment change (YAML/Chart)? If so, have both YAML file and Chart been updated in the PR?

  • Is the backend code merged (Manager, Engine, Instance Manager, BackupStore etc)?
    The PR is at Freeze a volumes filesystem if mounted before taking a snapshot longhorn-engine#565
    Wrap underlying os.exec error so we can process the programs exit code go-iscsi-helper#38

  • Which areas/issues this PR might have potential impacts on?
    Area engine, snapshots, backups, mounted volumes
    Issues

  • If labeled: require/LEP Has the Longhorn Enhancement Proposal PR submitted?
    The LEP PR is at

  • If labeled: area/ui Has the UI issue filed or ready to be merged?
    The UI issue/PR is at

  • If labeled: require/doc Has the necessary document PR submitted or merged?
    The Doc issue/PR is at

  • If labeled: require/automation-e2e Has the end-to-end test plan been merged? Have QAs agreed on the automation test case?
    The automation skeleton PR is at
    The automation test case PR is at

  • If labeled: require/automation-engine Has the engine integration test been merged?
    The engine automation PR is at Freeze a volumes filesystem if mounted before taking a snapshot longhorn-engine#565

  • If labeled: require/manual-test-plan Has the manual test plan been documented?
    The updated manual test plan is at

  • If the fix introduces the code for backward compatibility Has a separate issue been filed with the label release/obsolete-compatibility?
    The compatibility issue is filed at

@innobead
Member

innobead commented Mar 8, 2021

ref: #2029 (comment)

@joshimoo
Contributor

joshimoo commented Mar 9, 2021

the manual test should be:

  • create a kubernetes pod + pvc that mounts a lh volume
  • dd if=/dev/urandom of=/mnt/<volume>
  • while dd is running, initiate a snapshot
  • verify the snapshot succeeds and the dd operation completes
  • create another snapshot/backup.

@khushboo-rancher @meldafrawi I am guessing this is already covered by our manual backup tests.

@khushboo-rancher
Contributor

Verified with Longhorn-master 03/19/2021

Validation - Pass

The test steps from #2187 (comment) work fine.

@joshimoo The snapshot and backup creation worked fine while writing was in progress, but this also works fine with v1.1.0. How can I make sure that fsfreeze is being exercised by these test steps?

Regarding the test cases, we generally do this kind of testing as part of our pre-release testing (e.g. basic volume operations while writing is in progress), but I can't locate the exact test case. I'll add this to our pre-release tests.

@khushboo-rancher
Contributor

Logs for Froze filesystem

[instance-manager-e-424ff07d] [pvc-43f5d029-511e-48ff-84bd-13545d3a7ef9-e-93cf8583] time="2021-03-19T15:12:59Z" level=info msg="Starting snapshot" snapshot= volume=pvc-43f5d029-511e-48ff-84bd-13545d3a7ef9 
[instance-manager-e-424ff07d] [pvc-43f5d029-511e-48ff-84bd-13545d3a7ef9-e-93cf8583] time="2021-03-19T15:13:00Z" level=info msg="Froze filesystem of volume mounted at /var/lib/kubelet/pods/7f3a44c7-fc27-4e08-ac8c-134eb79b3106/volumes/kubernetes.io~csi/pvc-43f5d029-511e-48ff-84bd-13545d3a7ef9/mount" volume=pvc-43f5d029-511e-48ff-84bd-13545d3a7ef9 
[instance-manager-r-8b09fc1c] [pvc-43f5d029-511e-48ff-84bd-13545d3a7ef9-r-9320ccbc] time="2021-03-19T15:13:00Z" level=info msg="Replica server starts to snapshot [d77debd9-7670-4cd1-ab7e-eecfd11c4026] volume, user created true, created time 2021-03-19T15:13:00Z, labels map[]" 
[instance-manager-r-8b09fc1c] time="2021-03-19T15:13:00Z" level=info msg="Starting to create disk" disk=d77debd9-7670-4cd1-ab7e-eecfd11c4026 
[instance-manager-r-8b09fc1c] time="2021-03-19T15:13:00Z" level=info msg="Finished creating disk" disk=d77debd9-7670-4cd1-ab7e-eecfd11c4026 
[instance-manager-e-424ff07d] [pvc-43f5d029-511e-48ff-84bd-13545d3a7ef9-e-93cf8583] time="2021-03-19T15:13:00Z" level=info msg="Starting to snapshot: 10.42.3.7:10000 d77debd9-7670-4cd1-ab7e-eecfd11c4026 UserCreated true Created at 2021-03-19T15:13:00Z, Labels map[]" 
[instance-manager-r-15523ad4] [pvc-43f5d029-511e-48ff-84bd-13545d3a7ef9-r-290693c4] time="2021-03-19T15:13:00Z" level=info msg="Replica server starts to snapshot [d77debd9-7670-4cd1-ab7e-eecfd11c4026] volume, user created true, created time 2021-03-19T15:13:00Z, labels map[]" 
[instance-manager-r-5bdc7ee4] [pvc-43f5d029-511e-48ff-84bd-13545d3a7ef9-r-cab61160] time="2021-03-19T15:13:00Z" level=info msg="Replica server starts to snapshot [d77debd9-7670-4cd1-ab7e-eecfd11c4026] volume, user created true, created time 2021-03-19T15:13:00Z, labels map[]" 
[instance-manager-r-5bdc7ee4] time="2021-03-19T15:13:00Z" level=info msg="Starting to create disk" disk=d77debd9-7670-4cd1-ab7e-eecfd11c4026 
[instance-manager-r-15523ad4] [pvc-43f5d029-511e-48ff-84bd-13545d3a7ef9-r-290693c4] time="2021-03-19T15:13:00Z" level=info msg="Starting to create disk" disk=d77debd9-7670-4cd1-ab7e-eecfd11c4026 
[instance-manager-r-5bdc7ee4] time="2021-03-19T15:13:00Z" level=info msg="Finished creating disk" disk=d77debd9-7670-4cd1-ab7e-eecfd11c4026 
[instance-manager-r-15523ad4] [pvc-43f5d029-511e-48ff-84bd-13545d3a7ef9-r-290693c4] time="2021-03-19T15:13:00Z" level=info msg="Finished creating disk" disk=d77debd9-7670-4cd1-ab7e-eecfd11c4026 
[instance-manager-e-424ff07d] [pvc-43f5d029-511e-48ff-84bd-13545d3a7ef9-e-93cf8583] time="2021-03-19T15:13:00Z" level=info msg="Starting to snapshot: 10.42.1.7:10000 d77debd9-7670-4cd1-ab7e-eecfd11c4026 UserCreated true Created at 2021-03-19T15:13:00Z, Labels map[]" 
[instance-manager-e-424ff07d] time="2021-03-19T15:13:00Z" level=info msg="Starting to snapshot: 10.42.2.7:10000 d77debd9-7670-4cd1-ab7e-eecfd11c4026 UserCreated true Created at 2021-03-19T15:13:00Z, Labels map[]" 
[instance-manager-e-424ff07d] time="2021-03-19T15:13:00Z" level=info msg="Finished to snapshot: 10.42.1.7:10000 d77debd9-7670-4cd1-ab7e-eecfd11c4026 UserCreated true Created at 2021-03-19T15:13:00Z, Labels map[]" 
[instance-manager-e-424ff07d] time="2021-03-19T15:13:00Z" level=info msg="Finished to snapshot: 10.42.2.7:10000 d77debd9-7670-4cd1-ab7e-eecfd11c4026 UserCreated true Created at 2021-03-19T15:13:00Z, Labels map[]" 
[instance-manager-e-424ff07d] [pvc-43f5d029-511e-48ff-84bd-13545d3a7ef9-e-93cf8583] time="2021-03-19T15:13:00Z" level=info msg="Finished to snapshot: 10.42.3.7:10000 d77debd9-7670-4cd1-ab7e-eecfd11c4026 UserCreated true Created at 2021-03-19T15:13:00Z, Labels map[]" 
[instance-manager-e-424ff07d] time="2021-03-19T15:13:00Z" level=info msg="Finished snapshot" snapshot= volume=pvc-43f5d029-511e-48ff-84bd-13545d3a7ef9 
[instance-manager-e-424ff07d] [pvc-43f5d029-511e-48ff-84bd-13545d3a7ef9-e-93cf8583] time="2021-03-19T15:13:00Z" level=info msg="Unfroze filesystem of volume mounted at /var/lib/kubelet/pods/7f3a44c7-fc27-4e08-ac8c-134eb79b3106/volumes/kubernetes.io~csi/pvc-43f5d029-511e-48ff-84bd-13545d3a7ef9/mount" volume=pvc-43f5d029-511e-48ff-84bd-13545d3a7ef9 

@joshimoo
Contributor

joshimoo commented Mar 19, 2021

@khushboo-rancher Your best bet is the Froze Filesystem log.

You can also add status reporting: dd if=/dev/urandom of=/mnt/<volume> conv=sync status=progress; this should force progress output. But all of this is only relevant if the freeze takes longer than a second, which it shouldn't :)

If you want to give this a really good test, you can spin up a node with 32 GB of memory, then create a 500 GB volume.
Mount this volume and fill it, then start a new write with dd if=/dev/urandom of=/mnt/<volume> status=progress bs=16M. With the big 16M dd buffer, you should build up a large dirty page cache thanks to the 32 GB of memory.

Before taking the snapshot, run cat /proc/meminfo | grep Dirty.
After the fsfreeze, run cat /proc/meminfo | grep Dirty again and keep the results.

The dirty pages will get flushed once the fsfreeze happens. The same behavior can be tested without a mounted filesystem; in that case we run a sync, which can take a while as well.

Here is some info on the page cache.
https://www.thomas-krenn.com/en/wiki/Linux_Page_Cache_Basics
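As a side note, here is a small Go helper (purely illustrative, not part of Longhorn) that reads the same Dirty/Writeback counters from /proc/meminfo that the grep above looks at:

```go
package meminfo

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// PrintDirtyCounters prints the Dirty and Writeback lines from /proc/meminfo,
// which should drop sharply right after fsfreeze flushes the page cache.
func PrintDirtyCounters() error {
	f, err := os.Open("/proc/meminfo")
	if err != nil {
		return err
	}
	defer f.Close()

	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		line := scanner.Text()
		if strings.HasPrefix(line, "Dirty:") || strings.HasPrefix(line, "Writeback:") {
			fmt.Println(line)
		}
	}
	return scanner.Err()
}
```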

@khushboo-rancher
Contributor

I checked the dirty page count after taking the snapshot; it went down after the fsfreeze.
[Screenshot: Screen Shot 2021-03-23 at 1.48.45 PM]

@innobead innobead added the highlight label Apr 9, 2021
@innobead innobead changed the title from "[FEATURE]Use fsfreeze instead of sync before snapshot" to "[FEATURE] Use fsfreeze instead of sync before snapshot" Apr 16, 2021
@innobead innobead reopened this Oct 8, 2021
@innobead
Member

innobead commented Oct 8, 2021

Reopened, due to #3125 reverting the implementation.

@ChipWolf

ChipWolf commented Mar 3, 2022

Is this still going to proceed?

@innobead innobead removed this from the v1.1.1 milestone Mar 3, 2022
@innobead innobead modified the milestones: v1.5.0, v1.6.0 Apr 13, 2023
@derekbit
Member

The issue mentioned in #3125 is due to fsfreeze freezing the mount point of whatever path it is given. Despite the man page stating the path should be the pathname of a mount point, it can actually be any path.

The potential race condition can be addressed by adding a device mapper (dm) device on top of the iSCSI block device and exporting that dm device as the Longhorn volume.

To take a snapshot of the volume, we can execute dmsetup suspend, and the filesystem is automatically frozen (code). In addition to preventing the race condition, the DM device can be used to live migrate a v1 volume to another instance-manager pod as described in #6001 (comment). Its disadvantage is that it cannot be applied to existing volumes.
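A rough Go sketch of that snapshot path, assuming the engine drives dmsetup through exec (the function name and wiring are illustrative only):

```go
package snapshot

import (
	"fmt"
	"os/exec"
)

// snapshotWithDMSuspend suspends the dm device sitting on top of the iSCSI
// block device (dmsetup suspend also freezes a mounted filesystem by default),
// takes the snapshot, then resumes the device. dmName is the device-mapper
// device name; takeSnapshot stands in for the real snapshot call.
func snapshotWithDMSuspend(dmName string, takeSnapshot func() error) error {
	if out, err := exec.Command("dmsetup", "suspend", dmName).CombinedOutput(); err != nil {
		return fmt.Errorf("dmsetup suspend %s: %v: %s", dmName, err, out)
	}
	// Always resume, even if the snapshot fails, so I/O is not stalled forever.
	defer exec.Command("dmsetup", "resume", dmName).Run()

	return takeSnapshot()
}
```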

Any feedback is appreciated. Thank you.

cc @shuo-wu @innobead @james-munson @WebberHuang1118 @Vicente-Cheng

@derekbit
Member

Is this still going to proceed?

@ChipWolf Yes. Any concerns or disadvantages with the current sync approach?

@ChipWolf

Is this still going to proceed?

@ChipWolf Yes. Any concerns or disadvantages with the current sync approach?

Stable snapshots, freezing disk writes

@derekbit
Member

derekbit commented Sep 12, 2023

Performance benchmark

  • AWS c5d.xlarge with direct NVMe disks

  • The device mapper doesn't introduce a significant performance impact.

  • Longhorn volume with single replica (v1.5.1)

    IOPS (Read/Write)
            Random:      15,277 / 9,014
        Sequential:     26,884 / 13,337

    Bandwidth in KiB/sec (Read/Write)
            Random:     180,662 / 78,641
        Sequential:     180,682 / 79,799

    Latency in ns (Read/Write)
            Random:     388,402 / 371,265
        Sequential:     324,336 / 372,944

  • Longhorn volume with single replica + device mapper linear device
    • dmsetup create testdisk --table "0 $(blockdev --getsz /dev/longhorn/testvol) linear /dev/longhorn/testvol 0"

    IOPS (Read/Write)
            Random:      15,394 / 9,001
        Sequential:     26,783 / 13,309

    Bandwidth in KiB/sec (Read/Write)
            Random:     180,626 / 78,629
        Sequential:     185,223 / 79,812

    Latency in ns (Read/Write)
            Random:     385,477 / 369,820
        Sequential:     318,747 / 367,316

cc @shuo-wu

@WebberHuang1118

The issue mentioned in #3125 is due to fsfreeze freezing the mount point of whatever path it is given. Despite the man page stating the path should be the pathname of a mount point, it can actually be any path.

The potential race condition can be addressed by adding a device mapper (dm) device on top of the iSCSI block device and exporting that dm device as the Longhorn volume.

To take a snapshot of the volume, we can execute dmsetup suspend, and the filesystem is automatically frozen (code). In addition to preventing the race condition, the DM device can be used to live migrate a v1 volume to another instance-manager pod as described in #6001 (comment). Its disadvantage is that it cannot be applied to existing volumes.

Any feedback is appreciated. Thank you.

cc @shuo-wu @innobead @james-munson @WebberHuang1118 @Vicente-Cheng

As far as I know, the suspended DM device should be at the top of the storage stack; according to the architecture, would there be another Longhorn device on top of it?

Since DM introduces an additional layer for processing bios, it could bring some performance degradation. However, if we only leverage the linear target, it can probably be ignored.

Aside from the above, LGTM, thanks.

@innobead
Member

innobead commented Sep 12, 2023

To take a snapshot of the volume, we can execute dmsetup suspend, and the filesystem is automatically frozen (code). In addition to preventing the race condition, the DM device can be used to live migrate a v1 volume to another instance-manager pod as described in #6001 (comment). Its disadvantage is that it cannot be applied to existing volumes.

If an existing volume gets reattached, then we are still able to apply this dm device, right?

Regarding live upgrade, this will be a big benefit for resource usage reduction, because it means there will no longer be two instance-manager pods in the long run if users enable live migration. We can do live migration in a later version; this version focuses on data consistency.

Also, the current implementation using sync applies to all filesystems on the host, so it can be expected to have extra performance costs.

After discussing with @derekbit: introducing this extra dm-device layer on top of the Longhorn volume should be independent of the volume type; it can apply to filesystem or raw block volumes.

@derekbit
Member

derekbit commented Sep 12, 2023

To take a snapshot of the volume, we can execute dmsetup suspend, and the filesystem is automatically frozen (code). In addition to preventing the race condition, the DM device can be used to live migrate a v1 volume to another instance-manager pod as described in #6001 (comment). Its disadvantage is that it cannot be applied to existing volumes.

If an existing volume gets reattached, then we are still able to apply this dm device, right?

After creating an iSCSI block device /dev/sdx, we will add a dm-linear device on top of the block device, then link the device to /dev/longhorn/<pvc name>. Thus, the block mappings are different from those in the current device implementation, and the dm solution cannot be applied to existing volumes.
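A hedged Go sketch of that attach path, shelling out to blockdev and dmsetup as in the benchmark command above (function names and paths are illustrative):

```go
package frontend

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// exportWithLinearDM puts a dm-linear device on top of the iSCSI block device
// (e.g. /dev/sdx) and links the dm device, not the raw iSCSI device, to
// /dev/longhorn/<volume name>.
func exportWithLinearDM(iscsiDev, volumeName string) error {
	// Size in 512-byte sectors, equivalent to `blockdev --getsz <dev>`.
	out, err := exec.Command("blockdev", "--getsz", iscsiDev).Output()
	if err != nil {
		return fmt.Errorf("blockdev --getsz %s: %v", iscsiDev, err)
	}
	sectors := strings.TrimSpace(string(out))

	// dmsetup create <name> --table "0 <sectors> linear <dev> 0"
	table := fmt.Sprintf("0 %s linear %s 0", sectors, iscsiDev)
	if out, err := exec.Command("dmsetup", "create", volumeName, "--table", table).CombinedOutput(); err != nil {
		return fmt.Errorf("dmsetup create %s: %v: %s", volumeName, err, out)
	}

	// Expose the dm device as the Longhorn volume.
	if err := os.MkdirAll("/dev/longhorn", 0755); err != nil {
		return err
	}
	return os.Symlink("/dev/mapper/"+volumeName, "/dev/longhorn/"+volumeName)
}
```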


@derekbit
Member

As far as I know, the suspended DM device should be at the top of the storage stack; according to the architecture, would there be another Longhorn device on top of it?

The new storage stack will look like #2187 (comment).

cc @WebberHuang1118

@innobead
Member

After creating an iSCSI block device /dev/sdx, we will add a dm-linear device on top of the block device, then link the device to /dev/longhorn/<pvc name>. Thus, the block mappings are different from those in the current device implementation, and the dm solution cannot be applied to existing volumes.

Does this mean we need two different ways to handle existing volumes and new volumes? If yes, how do we differentiate them?

@Vicente-Cheng

Vicente-Cheng commented Sep 13, 2023

Thanks @derekbit for the detailed explanation.
So briefly, the new design would be like #6001 (comment), right?
(meaning /dev/sdx would get a dm device added on top of the block device, and that dm device would be exported as the Longhorn device)

I do not have much concern about this design. It looks like the dm device and the suspend command would let us achieve the same effect as fsfreeze.

Just curious about the client I/O and the suspend period.
When we do dmsetup suspend, client I/O will stall.
Some sensitive clients will get errors or warnings if this suspension period takes a bit longer.
The same situation will also apply to #6001 (comment).

TL;DR, I am curious about the trade-off between the dm suspension period and client I/O.
Everything else looks good to me! Thanks!

@WebberHuang1118

WebberHuang1118 commented Sep 13, 2023

From my understanding,

During DM suspension, in-flight I/Os are held in the fs layer or the dm layer, depending on how far each I/O has progressed. After DM resume is issued, the held I/Os are processed again.

As for the suspension period, the impact depends on the application's implementation. For example, dd has no predefined timeout, so there are no side effects for it. But as @Vicente-Cheng mentioned, it could be a different situation for some sensitive applications.

However, @derekbit only plans to leverage DM suspend for table reloads of a linear target, which is a less complex operation (compared to taking snapshots, data migration, etc.). IMHO this should not require a long suspend period, but it is worth further verification.

Besides, since the DM linear target only performs a naive bio remap, it should be safe to add it to any storage layer dynamically. Maybe we can use this to provide backward compatibility.

Thanks.

@derekbit
Member

To take a snapshot of the volume, we can execute dmsetup suspend, and the filesystem is automatically frozen (code). In addition to preventing the race condition, the DM device can be used to live migrate a v1 volume to another instance-manager pod as described in #6001 (comment). Its disadvantage is that it cannot be applied to existing volumes.

If an existing volume gets reattached, then we are still able to apply this dm device, right?

After creating an iSCSI block device /dev/sdx, we will add a dm-linear device on top of the block device, then link the device to /dev/longhorn/<pvc name>. Thus, the block mappings are different from those in the current device implementation, and the dm solution cannot be applied to existing volumes.

Correcting my statement: I just discussed with dm expert @WebberHuang1118 and did a quick test. dm-linear is a simple device and won't change the data layout, so we can detach existing volumes and add the dm-linear device when reattaching them.

cc @innobead

@innobead
Member

This is great! Good cooperation @derekbit @WebberHuang1118

@derekbit
Member

derekbit commented Sep 26, 2023

When starting a v1 volume engine, there is a big controller lock protecting the controller creation, and WriteAt and ReadAt are also safeguarded by this lock. Unlike the v2 volume implementation, we cannot directly add a dm-linear device in the startFrontend function. Attempting to do so would result in a deadlock during dm-linear creation, because I/Os are issued to the iSCSI device while the dm-linear device is being created.

To address the issue, we have to:

  • modify startFrontend(): create an iSCSI block device and do not duplicate the iSCSI block device to /dev/longhorn/
  • introduce exposeFrontend(): add a dm-linear device on top of the iSCSI block device and duplicate the device /dev/mapper/<volume name> to /dev/longhorn/<volume name>. The function cannot be protected by the controller lock, to avoid a deadlock. However, will it introduce any issues?

Additionally, the introduced dm-linear device also impacts volume expansion and other operations, so the scope is much bigger than this ticket.

Before the implementation, it is essential to investigate the necessity of the big controller lock.
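A very rough Go sketch of the proposed split; the real controller, lock, and device helpers live in longhorn-engine, and every helper below is a hypothetical placeholder:

```go
package controller

import "sync"

// Illustrative sketch only: createISCSIDevice, createLinearDM, and
// linkLonghornDevice are hypothetical placeholders for the real helpers.

type Controller struct {
	lock       sync.Mutex
	volumeName string
	iscsiDev   string
}

func createISCSIDevice(volume string) (string, error) { return "/dev/sdx", nil }
func createLinearDM(volume, dev string) error         { return nil }
func linkLonghornDevice(volume string) error          { return nil }

// startFrontend runs under the big controller lock: it only creates the iSCSI
// block device and no longer duplicates it to /dev/longhorn/.
func (c *Controller) startFrontend() error {
	c.lock.Lock()
	defer c.lock.Unlock()
	dev, err := createISCSIDevice(c.volumeName)
	if err != nil {
		return err
	}
	c.iscsiDev = dev
	return nil
}

// exposeFrontend runs outside the controller lock: creating the dm-linear
// device issues I/O to the iSCSI device, which would deadlock if the lock
// (which also guards WriteAt/ReadAt) were held here.
func (c *Controller) exposeFrontend() error {
	if err := createLinearDM(c.volumeName, c.iscsiDev); err != nil {
		return err
	}
	// /dev/mapper/<volume name> -> /dev/longhorn/<volume name>
	return linkLonghornDevice(c.volumeName)
}
```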

cc @shuo-wu @Vicente-Cheng @WebberHuang1118 @innobead

@WebberHuang1118

Hi @derekbit,

Can you provide your implementation? I'd like to take a look at it, thanks.

@derekbit
Member

@WebberHuang1118
Not implemented because of the concerns mentioned in #2187 (comment).

@WebberHuang1118

Based on my understanding,
the LH block device is visible after controller.start()->controller.startFrontend()->Tgt.Startup().

Since the block device shouldn't receive any I/O before LH exposes it, the big controller lock in controller.Start() seems safe to remove. Is there any concern about holding the lock during controller start?

IIRC, a DM device flushes I/Os before table loading; maybe that's the reason @derekbit encountered the I/O deadlock.

Thanks.

@ChipWolf

ChipWolf commented Oct 26, 2023

I have applications that need to save data to disk before write operations are paused for a snapshot, and be notified when write operations resume. In the proposed change, will there be a way to do this, hooks or something?

Some of my applications, due to a specific upstream dependency, hold crucial data in memory and only save it to disk occasionally. These applications don't handle extended write pauses well, but they expose methods to suspend/resume write operations and to save any pending data to disk. As the volumes used by these applications are large, snapshots can take a while, making it important for our app to perform these operations before writes are suspended.

Ideally we'd want to receive a hook pre-snapshot that our app must acknowledge before the snapshot takes place, then another hook on completion of the snapshot.

Related/duplicate #2128 #2164

@ejweber
Contributor

ejweber commented Mar 28, 2024

Status update:

After discussing offline with @innobead and @derekbit, we have decided to pursue the fsfreeze approach (assuming we can prove its safety) for now, with the linear device potentially coming later if it helps improve live engine upgrade for v1 volumes as predicted.

I implemented an idea for making the fsfreeze approach safe in longhorn/longhorn-engine@34f4b2a. It opened file descriptors to all the volume's mount points, then chose one to verify (is it still a mount point) and freeze. The open descriptors prevented anyone (e.g. the CSI driver) from unmounting the file system before the freeze command was issued. @PhanLe1010, @shuo-wu, and I discovered some drawbacks:

  1. The file system can still be lazy unmounted (umount -l) while "locked" in this way. It is unlikely that anyone will lazy unmount the file system in the super brief time period between lock and freeze (since it doesn't appear that the CSI plugin or kubelet do it), but it is a concern.
  2. If the engine crashes mid-snapshot, it cannot unfreeze the file system. Subsequent attempts to unmount it will fail until some component tries to unfreeze it again.

For 1, I am implementing a suggestion from @PhanLe1010 that we use our own mount of the file system instead of a file descriptor on someone else's mount to do the "lock" in longhorn/longhorn-engine@fd512c3. We control the mount, so there is virtually no danger of us losing it before fsfreeze.
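A hedged Go sketch of that "own mount" idea; the real change is in longhorn/longhorn-engine@fd512c3, and everything below (names, error handling) is illustrative:

```go
package freeze

import (
	"os"
	"os/exec"

	"golang.org/x/sys/unix"
)

// freezeViaOwnMount mounts the Longhorn device at a private mount point we
// control, freezes the filesystem through that mount, and returns an unfreeze
// function. Because we own the mount, the workload's mount being unmounted
// (even lazily) between "lock" and freeze is no longer a concern.
func freezeViaOwnMount(device, fsType string) (unfreeze func() error, err error) {
	mnt, err := os.MkdirTemp("", "longhorn-freeze-")
	if err != nil {
		return nil, err
	}
	if err := unix.Mount(device, mnt, fsType, 0, ""); err != nil {
		os.Remove(mnt)
		return nil, err
	}
	if err := exec.Command("fsfreeze", "--freeze", mnt).Run(); err != nil {
		unix.Unmount(mnt, 0)
		os.Remove(mnt)
		return nil, err
	}
	return func() error {
		defer func() {
			unix.Unmount(mnt, 0)
			os.Remove(mnt)
		}()
		return exec.Command("fsfreeze", "--unfreeze", mnt).Run()
	}, nil
}
```

Freezing through our own mount freezes the shared superblock, so the workload's mount is frozen too, but we never depend on a mount we don't control.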

For 2, I am still working on it, but I think it will look something like a suggestion from @shuo-wu. When instance-manager starts up, it will briefly look for mounted Longhorn file systems and attempt to unfreeze them. (If instance-manager was previously down, there is no way these file systems are healthy.) Similarly, when an engine starts up, it will briefly look for mounts of itself and attempt to unfreeze one. We may need to periodically check for mounts of Longhorn file systems for which no engine is running in instance-manager as well, in case there is a scenario in which:

  • An engine crashes.
  • Its instance-manager does not crash.
  • Longhorn never tries to restart the engine.
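A sketch of what that startup recovery could look like: scan the host mounts for Longhorn-backed filesystems and attempt to unfreeze each one (illustrative only, not an agreed implementation):

```go
package recovery

import (
	"bufio"
	"os"
	"os/exec"
	"strings"
)

// unfreezeLeftoverMounts walks /proc/mounts looking for filesystems backed by
// /dev/longhorn/* and tries to unfreeze them. fsfreeze --unfreeze on a
// filesystem that is not frozen fails harmlessly, so errors are ignored.
func unfreezeLeftoverMounts() error {
	f, err := os.Open("/proc/mounts")
	if err != nil {
		return err
	}
	defer f.Close()

	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) < 2 || !strings.HasPrefix(fields[0], "/dev/longhorn/") {
			continue
		}
		mountPoint := fields[1]
		_ = exec.Command("fsfreeze", "--unfreeze", mountPoint).Run()
	}
	return scanner.Err()
}
```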

While I think this feature will be good to implement, I'm a bit worried users will run into corner cases we do not anticipate. I think we should make fsfreeze functionality at least globally configurable (defaulting to on) to give users an out if their environment or use case doesn't play well with it.

@ejweber
Contributor

ejweber commented Apr 3, 2024

Another interesting caveat is that both my method of identifying a file system to freeze and the one implemented in https://github.com/longhorn/longhorn-engine/pull/565/files tend to fall apart for block volumes.

Both implementations look for a file system with /dev/longhorn/... as its source. If the container consuming the block volume creates a file system on it and mounts it to some location:

  • Its source is /some/path/in/the/container.
  • The mount is completely invisible to longhorn-engine, since it does not exist in the host mount namespace.

This is probably fine, and maybe even good.

  • I don't think Kubernetes block volumes are generally used by workloads that create a file system on them and mount them. The Kubernetes docs suggest a more common use case is raw block access for databases.
  • If a workload IS partitioning a block volume and/or creating file systems on it, it probably assumes it has total control over it. It would be strange to seek out file systems that Longhorn didn't create itself for freezing.

@ejweber
Contributor

ejweber commented Apr 15, 2024

Implementation is currently held up by an ongoing investigation into a manual test failure that can leave the fsfreeze approach AND the dmsetup approach with unkillable workload processes and the need to reboot. Posting the results here as I work through it: https://github.com/longhorn/longhorn/wiki/Freezing-File-Systems-With-dmsetup-suspend-Versus-fsfreeze.
