Live merge over-extends the active volume - fix requires downtime #188

Closed
nirs opened this issue May 18, 2022 · 3 comments · Fixed by #189
Labels: bug, storage, virt
Milestone: ovirt-4.5.1
Assignee: nirs

Comments


nirs commented May 18, 2022

When extending the base volume before live merge, we use
a dumb calculation:

new_size = base_size + top_size + chunk_size

This calculation is correct only in the most extreme and practically
impossible case, when none of the clusters in top exist in base, and base
and top have no free space.

In practice, a typical live merge has only a small amount of data
in top, some of the clusters in top already exist in base, and base has
a lot of free space, so base does not need to be extended at all, or needs
only a small extension.

After the live merge completes, the base becomes the active image,
so we cannot reduce it to the optimal size. Because the top size is at least
one chunk, this extends base by at least 2 chunks on every live merge.
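
For concreteness, a small sketch (hypothetical helper, using the 2.5g chunk
size mentioned below) that reproduces the growth shown in the following table:

```python
GiB = 1024**3
CHUNK_SIZE = 2.5 * GiB       # the 2.5g chunk size mentioned below

def dumb_new_size(base_size, top_size, chunk_size=CHUNK_SIZE):
    # Current calculation: assumes no cluster in top exists in base and
    # that base and top have no free space.
    return base_size + top_size + chunk_size

base = 2.5 * GiB             # initial allocation of the active volume
for merge in range(1, 6):
    before = base
    # A freshly created top volume is allocated one chunk.
    base = dumb_new_size(before, top_size=2.5 * GiB)
    print(f"{merge}  {before / GiB:.1f}g  {base / GiB:.1f}g")
```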

Base size after live merge:

merge   before   after
1       2.5g     7.5g
2       7.5g     12.5g
3       12.5g    17.5g
4       17.5g    22.5g
5       22.5g    27.5g

The sad result is that the active volume grows on each live merge, until
it reaches the virtual size of the disk.

To reduce the active volume to the optimal size, the user needs to shut down
the VM and invoke the reduce disk action in the API, or create a snapshot
and perform a cold merge, since after a cold merge we reduce the volume to
the optimal size.

For internal volumes, the user can invoke the reduce volume API, but this is
not easy to do, and the user does not get any indication that there is
a problem.

This is not a new issue - the problem has existed since oVirt 3.5, which added
live snapshot support. But now the issue is much more important, because we
use active layer merge during VM backup, and we increased the chunk size
to 2.5g.

How to reproduce:

  1. Add a 30 GiB empty disk to a running VM
  2. Repeat 7 times: create a snapshot and delete it

Actual result:
Active volume size grows to the maximum size (~33 GiB)

Expected result:
Active volume size does not change (~2.62 GiB)
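
Step 2 can also be scripted; a rough sketch using the oVirt Python SDK
(connection details, VM name, and polling intervals are placeholders, and
error handling is omitted):

```python
import time

import ovirtsdk4 as sdk
import ovirtsdk4.types as types

# Placeholder connection details.
connection = sdk.Connection(
    url="https://engine.example.com/ovirt-engine/api",
    username="admin@internal",
    password="password",
    ca_file="ca.pem",
)

vms_service = connection.system_service().vms_service()
vm = vms_service.list(search="name=test-vm")[0]
snapshots_service = vms_service.vm_service(vm.id).snapshots_service()

for i in range(7):
    snapshot = snapshots_service.add(
        types.Snapshot(description=f"test {i}", persist_memorystate=False))
    snapshot_service = snapshots_service.snapshot_service(snapshot.id)

    # Wait until the snapshot is created.
    while snapshot_service.get().snapshot_status != types.SnapshotStatus.OK:
        time.sleep(5)

    # Delete the snapshot, triggering a live merge, and wait until it is gone.
    snapshot_service.remove()
    while True:
        try:
            snapshot_service.get()
        except sdk.NotFoundError:
            break
        time.sleep(5)

connection.close()
```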

@nirs nirs added the bug label May 18, 2022
@nirs nirs added this to the ovirt-4.5.1 milestone May 18, 2022
@nirs nirs self-assigned this May 18, 2022
nirs added a commit to nirs/vdsm that referenced this issue May 18, 2022
When extending the base volume before merge, we use a dumb calculation,
extending the base volume by top_size + chunk_size. This allocates way
too much space, which is typically not needed. For an active layer merge,
there is no way to reduce the volume after the merge without shutting
down the VM. The result is that the active volume grows on every merge,
until it consumes the maximum size.

Fix the issue by measuring the sub-chain from top to base before the
extend. This gives the exact size needed to commit the top volume into
the base volume, including the size required for the bitmaps that may be
in the top and base volumes.
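
For illustration, a rough sketch of getting this size with qemu-img measure
(not necessarily vdsm's exact implementation; it assumes the simple case where
base has no backing file, so measuring the whole chain under top is the same as
measuring the top..base sub-chain):

```python
import json
import subprocess

def measure_commit_size(top_path):
    # Ask qemu-img how much space a qcow2 image holding all data visible
    # through top (top plus its backing chain) would require.
    result = subprocess.run(
        ["qemu-img", "measure", "--output", "json",
         "-O", "qcow2", "-f", "qcow2", top_path],
        check=True, capture_output=True)
    info = json.loads(result.stdout)
    # "required" accounts for qcow2 metadata; recent qemu-img versions also
    # report the space needed for persistent bitmaps.
    return info["required"] + info.get("bitmaps", 0)
```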

In the case of an active layer merge, this measurement is a heuristic,
since the guest can write data during the measurement, or later during
the merge. We add one chunk of free space to minimize the chance of
pausing the VM during the merge. The only way to prevent pausing during
the merge is to monitor the base volume block threshold while the merge
runs. This was not possible in the past and can be done with current
libvirt, but vdsm's thin provisioning code is not ready for this yet.
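
For reference, current libvirt exposes this through virDomainSetBlockThreshold
and the BLOCK_THRESHOLD event; a minimal sketch with the python bindings
(device index and threshold are made up, and a running libvirt event loop is
assumed):

```python
import libvirt

def watch_base_allocation(conn, dom, threshold):
    # Called when the watched image passes the threshold; here vdsm would
    # extend the base volume and set a new threshold.
    def on_threshold(conn, dom, dev, path, threshold, excess, opaque):
        print(f"{dev} ({path}) passed {threshold}, excess={excess}")

    conn.domainEventRegisterAny(
        dom, libvirt.VIR_DOMAIN_EVENT_ID_BLOCK_THRESHOLD, on_threshold, None)
    # "vda[1]" addresses a specific image in vda's backing chain by index;
    # the right index depends on the domain XML.
    dom.setBlockThreshold("vda[1]", threshold)
```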

For an internal merge, the measurement is exact, and there is no need to leave
free space in the base volume since the top volume is read-only.
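
Putting the two cases together, the sizing rule above looks roughly like this
(a sketch, not vdsm's actual code):

```python
GiB = 1024**3

def required_base_size(measured, current_size, active_merge,
                       chunk_size=2.5 * GiB):
    # Active layer merge: leave one chunk of free space because the guest
    # keeps writing during the merge. Internal merge: the measured size is
    # exact since the top volume is read-only.
    size = measured + (chunk_size if active_merge else 0)
    # Never shrink here; reducing the volume is a separate operation.
    return max(size, current_size)
```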

Because we extend volumes using the current size and capacity, always
adding one chunk, the code is a little ugly, reporting the required size
without free space. I think this can be improved by adding a different API
to extend volumes to a known size.

Fixes: oVirt#188
Signed-off-by: Nir Soffer <nsoffer@redhat.com>

ahadas commented May 19, 2022

@nirs note that we already have a bug related to that: https://bugzilla.redhat.com/1993235


nirs commented May 19, 2022

@nirs note that we already have a bug related to that: https://bugzilla.redhat.com/1993235

Snapshot growing to more than the virtual size is expected if the snapshot is full. But
it is good that we have another report of this issue.

nirs added a commit to nirs/vdsm that referenced this issue May 19, 2022
nirs added a commit to nirs/vdsm that referenced this issue May 19, 2022
nirs added a commit to nirs/vdsm that referenced this issue May 19, 2022

nirs commented May 19, 2022

Downstream bug https://bugzilla.redhat.com/1993235 was closed as not a bug, but
comment 16 shows how the fix to this
issue can improve this use case.

nirs added a commit to nirs/vdsm that referenced this issue May 26, 2022
@nirs nirs closed this as completed in #189 May 26, 2022
nirs added a commit that referenced this issue May 26, 2022
erav pushed a commit to hbraha/vdsm that referenced this issue Jun 21, 2022