
Eliminate the wait before extending a volume #124

Merged 8 commits into oVirt:master on May 11, 2022
Conversation

@nirs nirs commented Apr 11, 2022

When receiving a block threshold event or when pausing because of ENOSPC,
extend the drive as soon as possible.

This change decreases the wait before sending an extend request from 0.0-2.0
seconds to 10-30 milliseconds.

Here are test results from 4 runs, each writing 50 GiB to a thin disk at
~1300 MiB/s. Each run extends the disk 20 times. The VM was not paused
during the test.

| time        |  min  |  avg  |  max  |
|-------------|-------|-------|-------|
| total       |  0.77 |  1.15 |  1.39 |
| extend      |  0.55 |  0.92 |  1.14 |
| refresh     |  0.16 |  0.22 |  0.31 |
| wait        |  0.01 |  0.01 |  0.03 |

Fixes: #85

@nirs nirs added bug Issue is a bug or fix for a bug enhancement Enhancing the system by adding new feature or improving performance or reliability storage labels Apr 11, 2022
@nirs nirs requested review from almusil and tinez as code owners April 11, 2022 18:13
@nirs nirs marked this pull request as draft April 11, 2022 18:13
@nirs nirs added this to the oVirt 4.5.1 milestone Apr 14, 2022

@mz-pdm mz-pdm left a comment


doc/thin-provisioning.md should be updated.

@nirs nirs marked this pull request as ready for review April 26, 2022 13:45

nirs commented Apr 26, 2022

> doc/thin-provisioning.md should be updated.

This document is stale and needs a rewrite. I will post another PR for this.

@nirs nirs requested a review from aesteve-rh April 26, 2022 14:20
aesteve-rh previously approved these changes Apr 28, 2022

@mz-pdm mz-pdm left a comment


Not yet a complete review, I'll continue tomorrow.

@nirs nirs self-assigned this Apr 28, 2022

nirs commented May 4, 2022

/ost

@nirs nirs requested review from mz-pdm and aesteve-rh May 4, 2022 21:43
aesteve-rh previously approved these changes May 5, 2022

@mz-pdm mz-pdm left a comment


This version with per-device locks looks much better, good job. Overall it looks good to me; just one possible concern and several typos and other documentation issues.

@nirs nirs force-pushed the thinp-events branch 2 times, most recently from b7e4140 to e6f62ab Compare May 9, 2022 23:17
@nirs nirs added the verified Change was tested; please describe how it was tested in the PR label May 9, 2022

nirs commented May 9, 2022

/ost


nirs commented May 11, 2022

@mz-pdm it would be nice to have this in this week's build, to get more time for testing.


@mz-pdm mz-pdm left a comment


(There is no build this week.)


nirs commented May 11, 2022

> (There is no build this week.)

Thanks for reviewing!

nirs added 8 commits May 11, 2022 15:26
When monitoring drives, we have two expected threshold states:

- UNSET: If the drive has enough free space, set a block threshold so
  we will get an event when the drive should be extended. If the drive
  does not have enough free space, mark it as EXCEEDED and start an
  extend flow.

- EXCEEDED: The drive was marked as EXCEEDED after receiving a block
  threshold or enospc event, or we found that the drive does not have
  enough free space. We start an extend flow.

Extract _handle_unset() and _handle_exceeded(), each handling one
threshold state. _should_extend_volume() was split into two simpler
helpers, _can_extend_drive() and _drive_needs_extend().

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
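The split described above can be sketched as follows. This is a minimal illustration, not vdsm's actual code: the `Drive` class, the module-level helpers, and the size math are hypothetical stand-ins for the `_handle_unset()`, `_handle_exceeded()`, `_can_extend_drive()`, and `_drive_needs_extend()` methods.

```python
from enum import Enum


class BlockThreshold(Enum):
    UNSET = "unset"
    EXCEEDED = "exceeded"


class Drive:
    """Hypothetical stand-in for vdsm's drive object (sizes in MiB)."""

    def __init__(self, allocation, physical, free_threshold=512):
        self.threshold_state = BlockThreshold.UNSET
        self.allocation = allocation          # data written by the guest
        self.physical = physical              # current volume size
        self.free_threshold = free_threshold  # minimum free space required
        self.extend_requests = 0


def can_extend_drive(drive, max_size=22528):
    # A drive already extended to its maximum size cannot grow further.
    return drive.physical < max_size


def drive_needs_extend(drive):
    return drive.physical - drive.allocation < drive.free_threshold


def handle_exceeded(drive):
    # Start an extend flow (sending the actual request is omitted here).
    drive.extend_requests += 1


def handle_unset(drive):
    if not can_extend_drive(drive):
        return
    if drive_needs_extend(drive):
        drive.threshold_state = BlockThreshold.EXCEEDED
        handle_exceeded(drive)
    # else: register a new libvirt block threshold (omitted here)
```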
The tests were using the old configuration (1 GiB chunk size, 50%
utilization). This is correct, but makes it hard to simulate some cases.
Update to the current configuration (2.5 GiB chunk size, 20% utilization).

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
The recent change to log a warning when an exceeded drive cannot be
extended, because it was already extended to the maximum size, revealed
that we try to monitor and extend such a drive every 2 seconds.

Example failure flow with 20g drive:

1. Vdsm extends the drive to 22g and sets the threshold to 20g
2. The guest writes to offset 20g
3. Vdsm gets a block threshold event and marks the drive as exceeded
4. Every 2 seconds, vdsm tries to extend the drive and logs a warning:

    WARN  (periodic/0) [virt.vm] (vmId='d6eb4739-ccd3-4652-b6fc-7f4fd4a972ad')
    Drive already extended to maximum size ...

Why do we extend the drive to 22g if the drive capacity is 20g?
When using the qcow2 format, writing 20g of guest data requires
additional space for qcow2 metadata, so Vdsm allows up to
1.1 * capacity (22g).

This is not a new issue; it has existed since we started to use block
thresholds a few years ago. But since we did not log anything, the bug
was hidden, consuming CPU cycles and increasing the chance of filling up
the executor queue and delaying other tasks.

To fix this issue, introduce a new threshold state: `DISABLED`. In this
state the drive is not picked up for monitoring. If the drive is
resized, we change the threshold state back to `UNSET` so the drive will
be monitored again in the next monitor cycle.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
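A minimal sketch of the `DISABLED` state described above, assuming a simplified `Drive` class; the real logic lives in vdsm's thinp and storage modules, and these helper names are illustrative:

```python
from enum import Enum


class BlockThreshold(Enum):
    UNSET = "unset"
    EXCEEDED = "exceeded"
    DISABLED = "disabled"   # drive reached maximum size; not monitored


class Drive:
    def __init__(self, name, state=BlockThreshold.UNSET):
        self.name = name
        self.threshold_state = state


def monitored_drives(drives):
    # DISABLED drives are skipped, so a fully extended drive no longer
    # triggers an extend attempt and a warning every monitoring cycle.
    return [d for d in drives
            if d.threshold_state is not BlockThreshold.DISABLED]


def on_drive_resized(drive):
    # After a resize there may be room to grow again; switch back to
    # UNSET so the drive is monitored again in the next cycle.
    drive.threshold_state = BlockThreshold.UNSET
```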
We want to extend drives as soon as possible when receiving events from
libvirt. However, we may receive multiple events at the same time, or the
periodic monitor may wake up at the same time we want to handle an
event. Running the monitor concurrently can lead to duplicate extend
messages and confusing logs.

Handle multiple calls by locking the drive during monitoring. If
another thread tries to monitor a drive while the drive monitor lock is
held, it waits up to 0.5 seconds for the lock and skips the check on
timeout.

A new `thinp:monitor_timeout` option was added to control the timeout.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
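The lock-with-timeout behavior can be sketched with a plain `threading.Lock`; `monitor_drive` and the `Drive` class here are hypothetical, but `Lock.acquire(timeout=...)` is the standard-library mechanism such a monitor can use:

```python
import threading

MONITOR_TIMEOUT = 0.5  # seconds; the thinp:monitor_timeout option


class Drive:
    def __init__(self):
        # One lock per drive, so monitoring one drive never blocks
        # monitoring of another.
        self.monitor_lock = threading.Lock()


def monitor_drive(drive, check, timeout=MONITOR_TIMEOUT):
    # If another thread is monitoring this drive, wait up to `timeout`
    # seconds for the lock and skip the check on timeout.
    if not drive.monitor_lock.acquire(timeout=timeout):
        return False
    try:
        check(drive)
        return True
    finally:
        drive.monitor_lock.release()
```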
When a volume was refreshed, it was still marked as exceeded. If the
periodic monitor wakes up at this point, it can try to extend the drive
again. This is an unlikely event, but if it happens there is no way to
fix it without shutting down the VM.

Avoid the race by taking the drive monitor lock before the refresh, and
releasing the lock after we set a new block threshold. If the monitor
tries to access the drive at this point, it waits until the completion
callback finishes, or skips the check on timeout.

If we cannot acquire the drive monitor lock within 60.0 seconds, we
abort the refresh to avoid blocking the mailbox thread pool. This may
happen if the drive monitor lock is held by another monitor thread
blocked on inaccessible storage.

A new `thinp:refresh_timeout` option was added to control the timeout.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
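A sketch of the refresh flow under the drive monitor lock; the context manager and function names are illustrative, not vdsm's API:

```python
import threading
from contextlib import contextmanager

REFRESH_TIMEOUT = 60.0  # seconds; the thinp:refresh_timeout option


class RefreshTimeout(Exception):
    pass


@contextmanager
def drive_locked(lock, timeout):
    # Abort instead of blocking the caller forever, e.g. when another
    # monitor thread is stuck on inaccessible storage.
    if not lock.acquire(timeout=timeout):
        raise RefreshTimeout("timed out acquiring drive monitor lock")
    try:
        yield
    finally:
        lock.release()


def refresh_volume(drive, do_refresh, set_threshold, timeout=REFRESH_TIMEOUT):
    # Hold the drive monitor lock across the refresh and the new block
    # threshold, so the periodic monitor never acts on a stale EXCEEDED
    # state of an already refreshed volume.
    with drive_locked(drive.monitor_lock, timeout):
        do_refresh(drive)
        set_threshold(drive)
```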
To allow immediate extend when receiving a block threshold or enospc
event, we need to handle the case when the monitor wakes up soon after
we extended the drive, finds that the drive is exceeded, and tries to
extend it again. This creates duplicate extend requests that should be
harmless, but they make it hard to analyze extend stats.

To avoid this issue we keep the time of the last extend in the drive.
When the monitor handles an exceeded drive, it skips the drive if a
previous extend was started recently and the extend timeout has not
expired.

We also need to handle another case, when a drive was extended recently,
but setting the block threshold failed, and it should be extended again.
If the guest writes quickly, before the extend timeout expires, we want
to extend the drive immediately.

To allow immediate extend, add an "urgent" argument to
_handle_exceeded(). It is set when we find that the drive needs to be
extended in _handle_unset(). It will also be used when receiving block
threshold or enospc events.

A new `thinp:extend_timeout` option was added to control the timeout.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
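The skip-recent-extends check can be sketched as follows; `should_start_extend`, the `Drive` class, and the timeout value are illustrative (the actual default of `thinp:extend_timeout` is not stated here):

```python
import time

EXTEND_TIMEOUT = 10.0  # illustrative value for thinp:extend_timeout


class Drive:
    def __init__(self):
        self.last_extend_time = 0.0  # time.monotonic() of last extend


def should_start_extend(drive, urgent=False, now=None):
    # An urgent extend (block threshold or enospc event, or a drive
    # found under threshold in _handle_unset) is never delayed.
    if urgent:
        return True
    if now is None:
        now = time.monotonic()
    # Skip the drive if a previous extend started recently and the
    # extend timeout has not expired yet.
    return now - drive.last_extend_time >= EXTEND_TIMEOUT
```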
If the monitor is configured with an executor dispatch function, it
dispatches a call to extend the drive when getting a block threshold or
enospc event. The drive is then extended on the first available
executor worker.

Using a dispatch function keeps the thinp module decoupled from the
periodic module, makes testing easy, and prepares for removing the
periodic monitor and using the thinp executor.

The periodic monitor checks if the VM is ready for commands before
monitoring. Testing extending of 4 drives at the same time shows that
this check returns False randomly for no reason. Since this causes a
delay before extending the drive, we do not perform this check when
receiving an event.

We also do not check migration status or use it when handling libvirt
call failures, since it should only be needed for volume monitoring,
which is disabled on the migration destination VM.

When extending a drive we hold the drive monitor lock, to avoid
conflicts with the periodic monitor and extend completion threads.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
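The dispatch decoupling can be sketched like this; `VolumeMonitor` here is a toy stand-in that only records extends, with `dispatch` being any callable that runs a function on a worker (in vdsm, provided by the periodic module's executor):

```python
class VolumeMonitor:
    """Toy monitor illustrating event-driven extend via dispatch."""

    def __init__(self, dispatch=None):
        # dispatch(func) runs func on the first available worker; the
        # monitor itself knows nothing about the executor behind it.
        self._dispatch = dispatch
        self.extended = []

    def on_block_threshold(self, drive):
        drive.threshold_state = "exceeded"
        if self._dispatch is not None:
            # Extend immediately instead of waiting up to 2 seconds
            # for the next periodic monitoring cycle.
            self._dispatch(lambda: self._extend(drive))

    def _extend(self, drive):
        self.extended.append(drive)
```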
Expose periodic.dispatch() function allowing immediate dispatching of
calls on the periodic executor. This is useful when you want to handle
libvirt events on the periodic executor.

The first user of this facility is the thinp volume monitor. Now when we
receive a block threshold or enospc event, we use the periodic dispatch
to extend the relevant drive immediately. This eliminates the 0-2
second wait after receiving an event.

Here are test results from 4 runs, each writing 50 GiB to a thin disk at
~1300 MiB/s. Each run extends the disk 20 times. The VM was not paused
during the test.

| time        |  min  |  avg  |  max  |
|-------------|-------|-------|-------|
| total       |  0.77 |  1.15 |  1.39 |
| extend      |  0.55 |  0.92 |  1.14 |
| refresh     |  0.16 |  0.22 |  0.31 |
| wait        |  0.01 |  0.01 |  0.03 |

Fixes: oVirt#85
Signed-off-by: Nir Soffer <nsoffer@redhat.com>
@nirs nirs merged commit 12a65be into oVirt:master May 11, 2022
@nirs nirs deleted the thinp-events branch May 11, 2022 13:56
Successfully merging this pull request may close these issues.

Minimize wait after receiving block threshold event
5 participants