Live merge over-extends the active volume - fix requires downtime #188

Closed
nirs opened this issue May 18, 2022 · 3 comments · Fixed by #189
Labels: bug, storage, virt
Milestone: ovirt-4.5.1
Assignee: nirs

Comments


nirs commented May 18, 2022

When extending the base volume before live merge, we use
a dumb calculation:

new_size = base_size + top_size + chunk_size

This calculation is correct only in the most extreme and practically
impossible case, when none of the clusters in top exist in base, and base
and top have no free space.

In practice, a typical live merge has only a small amount of data
in top, some of the clusters in top already exist in base, and base has
a lot of free space, so base does not need to be extended at all, or needs
only a small extension.

After the live merge completes, the base becomes the active image,
so we cannot reduce it to the optimal size. Because the top size is at least
one chunk, this extends base by at least 2 chunks on every live merge.
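
For concreteness, a small sketch (hypothetical helper, using the 2.5g chunk
size mentioned below) that reproduces the growth shown in the following table:

```python
GiB = 1024**3
CHUNK_SIZE = 2.5 * GiB       # the 2.5g chunk size mentioned below

def dumb_new_size(base_size, top_size, chunk_size=CHUNK_SIZE):
    # Current calculation: assumes no cluster in top exists in base and
    # that base and top have no free space.
    return base_size + top_size + chunk_size

base = 2.5 * GiB             # initial allocation of the active volume
for merge in range(1, 6):
    before = base
    # A freshly created top volume is allocated one chunk.
    base = dumb_new_size(before, top_size=2.5 * GiB)
    print(f"{merge}  {before / GiB:.1f}g  {base / GiB:.1f}g")
```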

Base size after live merge:

merge   before   after
1       2.5g     7.5g
2       7.5g     12.5g
3       12.5g    17.5g
4       17.5g    22.5g
5       22.5g    27.5g

The sad result is that the active volume grows on each live merge, until
it reaches the virtual size of the disk.

To reduce the active volume to the optimal size, the user needs to shut down
the VM and invoke the reduce disk action in the API, or create a snapshot
and perform a cold merge, since after a cold merge we reduce the volume to
the optimal size.

For internal volumes, the user can invoke the reduce volume API, but this is
not easy to do, and the user does not get any indication that there is
a problem.

This is not a new issue - the problem has existed since oVirt 3.5, which added
live snapshot support. But now the issue is much more important, because we
use active layer merge during VM backup, and we increased the chunk size
to 2.5g.

How to reproduce:

  1. Add a 30 GiB empty disk to a running VM
  2. Repeat 7 times: create a snapshot and delete it

Actual result:
Active volume size grows to the maximum size (~33 GiB)

Expected result:
Active volume size does not change (~2.62 GiB)
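
Step 2 can also be scripted; a rough sketch using the oVirt Python SDK
(connection details, VM name, and polling intervals are placeholders, and
error handling is omitted):

```python
import time

import ovirtsdk4 as sdk
import ovirtsdk4.types as types

# Placeholder connection details.
connection = sdk.Connection(
    url="https://engine.example.com/ovirt-engine/api",
    username="admin@internal",
    password="password",
    ca_file="ca.pem",
)

vms_service = connection.system_service().vms_service()
vm = vms_service.list(search="name=test-vm")[0]
snapshots_service = vms_service.vm_service(vm.id).snapshots_service()

for i in range(7):
    snapshot = snapshots_service.add(
        types.Snapshot(description=f"test {i}", persist_memorystate=False))
    snapshot_service = snapshots_service.snapshot_service(snapshot.id)

    # Wait until the snapshot is created.
    while snapshot_service.get().snapshot_status != types.SnapshotStatus.OK:
        time.sleep(5)

    # Delete the snapshot, triggering a live merge, and wait until it is gone.
    snapshot_service.remove()
    while True:
        try:
            snapshot_service.get()
        except sdk.NotFoundError:
            break
        time.sleep(5)

connection.close()
```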

@nirs nirs added the bug label May 18, 2022
@nirs nirs added this to the ovirt-4.5.1 milestone May 18, 2022
@nirs nirs self-assigned this May 18, 2022
nirs added a commit to nirs/vdsm that referenced this issue May 18, 2022
When extending the base volume before merge, we use a dumb calculation,
extending the base volume by top_size + chunk_size. This allocates way
too much space, which is typically not needed. For an active layer merge,
there is no way to reduce the volume after the merge without shutting
down the VM. The result is that the active volume grows on every merge,
until it consumes the maximum size.

Fix the issue by measuring the sub-chain from top to base before the
extend. This gives the exact size needed to commit the top volume into
the base volume, including the size required for the bitmaps that may be
in the top and base volumes.
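
For illustration, a rough sketch of getting this size with qemu-img measure
(not necessarily vdsm's exact implementation; it assumes the simple case where
base has no backing file, so measuring the whole chain under top is the same as
measuring the top..base sub-chain):

```python
import json
import subprocess

def measure_commit_size(top_path):
    # Ask qemu-img how much space a qcow2 image holding all data visible
    # through top (top plus its backing chain) would require.
    result = subprocess.run(
        ["qemu-img", "measure", "--output", "json",
         "-O", "qcow2", "-f", "qcow2", top_path],
        check=True, capture_output=True)
    info = json.loads(result.stdout)
    # "required" accounts for qcow2 metadata; recent qemu-img versions also
    # report the space needed for persistent bitmaps.
    return info["required"] + info.get("bitmaps", 0)
```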

In the case of an active layer merge, this measurement is a heuristic,
since the guest can write data during the measurement, or later during
the merge. We add one chunk of free space to minimize the chance of
pausing the VM during the merge. The only way to prevent pausing during
the merge is to monitor the base volume block threshold while the merge
runs. This was not possible in the past and can be done with current
libvirt, but vdsm's thin provisioning code is not ready for this yet.
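
For reference, current libvirt exposes this through virDomainSetBlockThreshold
and the BLOCK_THRESHOLD event; a minimal sketch with the python bindings
(device index and threshold are made up, and a running libvirt event loop is
assumed):

```python
import libvirt

def watch_base_allocation(conn, dom, threshold):
    # Called when the watched image passes the threshold; here vdsm would
    # extend the base volume and set a new threshold.
    def on_threshold(conn, dom, dev, path, threshold, excess, opaque):
        print(f"{dev} ({path}) passed {threshold}, excess={excess}")

    conn.domainEventRegisterAny(
        dom, libvirt.VIR_DOMAIN_EVENT_ID_BLOCK_THRESHOLD, on_threshold, None)
    # "vda[1]" addresses a specific image in vda's backing chain by index;
    # the right index depends on the domain XML.
    dom.setBlockThreshold("vda[1]", threshold)
```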

For an internal merge, the measurement is exact, and there is no need to leave
free space in the base volume since the top volume is read-only.
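
Putting the two cases together, the sizing rule above looks roughly like this
(a sketch, not vdsm's actual code):

```python
GiB = 1024**3

def required_base_size(measured, current_size, active_merge,
                       chunk_size=2.5 * GiB):
    # Active layer merge: leave one chunk of free space because the guest
    # keeps writing during the merge. Internal merge: the measured size is
    # exact since the top volume is read-only.
    size = measured + (chunk_size if active_merge else 0)
    # Never shrink here; reducing the volume is a separate operation.
    return max(size, current_size)
```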

Because we extend volumes using the current size and capacity, always
adding one chunk, the code is a little ugly, reporting the required size
without free space. I think this can be improved by adding a different API
to extend volumes to a known size.

Fixes: oVirt#188
Signed-off-by: Nir Soffer <nsoffer@redhat.com>

ahadas commented May 19, 2022

@nirs note that we already have a bug related to that: https://bugzilla.redhat.com/1993235


nirs commented May 19, 2022

@nirs note that we already have a bug related to that: https://bugzilla.redhat.com/1993235

Snapshot growing to more than the virtual size is expected if the snapshot is full. But
it is good that we have another report of this issue.

nirs added a commit to nirs/vdsm that referenced this issue May 19, 2022
nirs added a commit to nirs/vdsm that referenced this issue May 19, 2022
nirs added a commit to nirs/vdsm that referenced this issue May 19, 2022

nirs commented May 19, 2022

Downstream bug https://bugzilla.redhat.com/1993235 was closed as not a bug, but
comment 16 shows how the fix to this
issue can improve this use case.

nirs added a commit to nirs/vdsm that referenced this issue May 26, 2022
@nirs nirs closed this as completed in #189 May 26, 2022
nirs added a commit that referenced this issue May 26, 2022
erav pushed a commit to hbraha/vdsm that referenced this issue Jun 21, 2022