Eliminate the wait before extending a volume #124
Conversation
doc/thin-provisioning.md should be updated.
This document is stale and needs a rewrite. I will post another PR for this.
Not yet complete review, I'll continue tomorrow.
/ost
This version with per-device locks looks much better, good job. It looks overall good to me, just one possible concern and several typos and other documentation issues.
b7e4140 to e6f62ab
/ost
@mz-pdm it would be nice to have this in this week's build, to get more time for testing.
(There is no build this week.)
When monitoring drives, we have two expected threshold states:

- UNSET: If the drive has enough free space, set a block threshold so we get an event when the drive should be extended. If the drive does not have enough free space, mark it as EXCEEDED and start an extend flow.
- EXCEEDED: The drive was marked as EXCEEDED after receiving a block threshold or enospc event, or after we found that the drive does not have enough free space. We start an extend flow.

Extract _handle_unset() and _handle_exceeded(), each handling one threshold state. _should_extend_volume() was split into two simpler helpers, _can_extend_drive() and _drive_needs_extend().

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
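The two-state handling described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the actual vdsm code; the `BlockThreshold`, `Drive`, and helper names mirror the description but are assumptions.

```python
# Sketch of the two-state threshold handling (illustrative names).

class BlockThreshold:
    UNSET = "unset"
    EXCEEDED = "exceeded"


class Drive:
    def __init__(self, free_space, min_free_space):
        self.threshold_state = BlockThreshold.UNSET
        self.free_space = free_space
        self.min_free_space = min_free_space
        self.extend_requested = False
        self.threshold_set = False


def monitor_drive(drive):
    # Dispatch on the threshold state, one helper per state.
    if drive.threshold_state == BlockThreshold.UNSET:
        _handle_unset(drive)
    elif drive.threshold_state == BlockThreshold.EXCEEDED:
        _handle_exceeded(drive)


def _handle_unset(drive):
    # If the drive needs an extend, mark it EXCEEDED and start the
    # extend flow; otherwise arm a block threshold to get an event.
    if _drive_needs_extend(drive):
        drive.threshold_state = BlockThreshold.EXCEEDED
        _handle_exceeded(drive)
    else:
        drive.threshold_set = True


def _drive_needs_extend(drive):
    return drive.free_space < drive.min_free_space


def _handle_exceeded(drive):
    # In vdsm this starts the extend flow; here we only record it.
    drive.extend_requested = True
```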
The tests were using the old configuration (1 GiB chunk size, 50% utilization). This is correct, but makes it hard to simulate some cases. Update to the current configuration (2.5 GiB chunk size, 20% utilization).

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
The recent change to log a warning when an exceeded drive cannot be extended because it was already extended to the maximum size revealed that we try to monitor and extend such a drive every 2 seconds.

Example failure flow with a 20g drive:

1. Vdsm extends the drive to 22g and sets the threshold to 20g
2. The guest writes to offset 20g
3. Vdsm gets a block threshold event and marks the drive as exceeded
4. Every 2 seconds, vdsm tries to extend the drive and logs a warning:

WARN (periodic/0) [virt.vm] (vmId='d6eb4739-ccd3-4652-b6fc-7f4fd4a972ad') Drive already extended to maximum size ...

Why do we extend the drive to 22g if the drive capacity is 20g? When using the qcow2 format, writing 20g of guest data requires additional space for qcow2 metadata. Vdsm allows up to 1.1 * capacity (22g).

This is not a new issue - it has existed since we started using block thresholds a few years ago. But since we did not log anything, the bug was hidden, consuming CPU cycles and increasing the chance of filling up the executor queue and delaying other tasks.

To fix this issue, introduce a new threshold state: `DISABLED`. In this state the drive is not picked up for monitoring. If the drive is resized, we change the threshold state back to `UNSET` so the drive will be monitored again in the next monitor cycle.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
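A rough sketch of the `DISABLED` state described above, under assumed names (this is not the vdsm implementation): a drive already extended to the maximum allowed size (1.1 * capacity, to leave room for qcow2 metadata) is taken out of monitoring until it is resized.

```python
# Sketch of DISABLED state handling (illustrative, assumed names).

class Drive:
    def __init__(self, capacity, apparent_size):
        self.capacity = capacity
        self.apparent_size = apparent_size
        self.threshold_state = "unset"
        self.extend_started = False


def max_size(capacity):
    # Vdsm allows up to 1.1 * capacity for qcow2 metadata overhead.
    return int(capacity * 1.1)


def handle_exceeded(drive):
    if drive.apparent_size >= max_size(drive.capacity):
        # Instead of retrying (and warning) every 2 seconds,
        # take the drive out of monitoring.
        drive.threshold_state = "disabled"
    else:
        drive.extend_started = True


def handle_resize(drive, new_capacity):
    drive.capacity = new_capacity
    # Extending is possible again; monitor on the next cycle.
    drive.threshold_state = "unset"
```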
We want to extend drives as soon as possible when receiving events from libvirt. However, we may receive multiple events at the same time, or the periodic monitor may wake up at the same time we want to handle an event. Running the monitor concurrently can lead to double extend messages and confusing logs.

Handle multiple calls by locking the drive during monitoring. If another thread tries to monitor a drive while the drive monitor lock is held, it waits up to 0.5 seconds for the lock and skips the check on timeout. A new `thinp:monitor_timeout` option was added to control the timeout.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
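The per-drive lock with a short timeout could look like the sketch below. The 0.5 second default matches the commit message; the function name and signature are illustrative assumptions.

```python
import threading

MONITOR_TIMEOUT = 0.5  # thinp:monitor_timeout


def monitor_drive(drive, lock, check):
    # If another thread is already monitoring this drive, wait
    # briefly for the lock; on timeout skip this check rather than
    # queuing up duplicate work.
    if not lock.acquire(timeout=MONITOR_TIMEOUT):
        return False  # skipped; the other thread handles the drive
    try:
        check(drive)
        return True
    finally:
        lock.release()
```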
When a volume was refreshed, it was still marked as exceeded. If the periodic monitor wakes up at this point, it can try to extend the drive again. This is an unlikely event, but if it happens there is no way to fix it without shutting down the VM.

Avoid the race by taking the drive monitor lock before the refresh, and releasing the lock after we set a new block threshold. If the monitor tries to access the drive at this point, it waits until the completion callback finishes, or skips the check on timeout.

If we cannot acquire the drive monitor lock within 60.0 seconds, we abort the refresh to avoid blocking the mailbox thread pool. This may happen if the drive monitor lock is held by another monitor thread blocked on inaccessible storage. A new `thinp:refresh_timeout` option was added to control the timeout.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
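Guarding the refresh with the drive monitor lock might be sketched like this. Names are assumptions; `refresh` and `set_threshold` stand in for the real operations.

```python
import threading

REFRESH_TIMEOUT = 60.0  # thinp:refresh_timeout


def refresh_drive(drive, lock, refresh, set_threshold):
    # Abort rather than block the mailbox thread pool if the lock
    # is held too long, e.g. by a monitor blocked on bad storage.
    if not lock.acquire(timeout=REFRESH_TIMEOUT):
        raise TimeoutError("Timed out waiting for drive monitor lock")
    try:
        refresh(drive)
        # Set the new block threshold before releasing the lock, so
        # the monitor never sees a refreshed drive still marked as
        # exceeded.
        set_threshold(drive)
    finally:
        lock.release()
```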
To allow immediate extend when receiving block threshold or enospc events, we need to handle the case when the monitor wakes up soon after we extended the drive, finds that the drive is exceeded, and tries to extend it again. This creates duplicate extend requests that should be harmless, but they make it hard to analyze extend stats.

To avoid this issue we keep the time of the last extend in the drive. When the monitor handles an exceeded drive, it skips the drive if a previous extend was started recently and the extend timeout has not expired.

We also need to handle another case, when a drive was extended recently but setting the block threshold failed, and it should be extended again. If the guest writes quickly, before the extend timeout has expired, we want to extend the drive immediately.

To allow immediate extend, add an "urgent" argument to _handle_exceeded(). It is set when we find that the drive needs to be extended in _handle_unset(). It will also be used when receiving block threshold or enospc events.

A new `thinp:extend_timeout` option was added to control the timeout.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
If the monitor is configured with an executor dispatch function, it dispatches a call to extend the drive when getting a block threshold or enospc event. The drive will be extended on the first available executor worker. Using a dispatch function keeps the thinp module decoupled from the periodic module, makes testing easy, and prepares for removing the periodic monitor and using the thinp executor.

The periodic monitor checks if the VM is ready for commands before monitoring. Testing extending of 4 drives at the same time shows that this check returns False randomly for no reason. Since this causes a delay before extending the drive, we don't do this check when receiving an event. We also don't check migration status or use it when handling libvirt call failures, since it should not be needed for volume monitoring, which is disabled on the migration destination VM.

When extending a drive we hold the drive monitor lock, to avoid conflicts with the periodic monitor and extend completion threads.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
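The decoupling via a dispatch callable might look roughly like this. The class and method names are illustrative, not the vdsm API; the point is that the monitor never imports the periodic module directly.

```python
# Sketch: the monitor takes an injected dispatch callable.

class VolumeMonitor:
    def __init__(self, vm, dispatch=None):
        self._vm = vm
        self._dispatch = dispatch

    def on_block_threshold(self, drive):
        if self._dispatch is not None:
            # Run on the first available executor worker; no
            # "ready for commands" check on the event path.
            self._dispatch(lambda: self.extend_drive(drive))

    def extend_drive(self, drive):
        drive["extended"] = True
```

In tests, a synchronous stub such as `lambda f: f()` can stand in for the executor, which is part of what makes this design easy to test.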
Expose a periodic.dispatch() function allowing immediate dispatching of calls on the periodic executor. This is useful when you want to handle libvirt events on the periodic executor.

The first user of this facility is the thinp volume monitor. Now when we receive a block threshold or enospc event, we use the periodic dispatch to extend the relevant drive immediately. This eliminates the 0-2 seconds wait after receiving an event.

Here are test results from 4 runs, each writing 50 GiB to a thin disk at ~1300 MiB/s. Each run extends the disk 20 times. The VM was not paused during the test.

| time    | min  | avg  | max  |
|---------|------|------|------|
| total   | 0.77 | 1.15 | 1.39 |
| extend  | 0.55 | 0.92 | 1.14 |
| refresh | 0.16 | 0.22 | 0.31 |
| wait    | 0.01 | 0.01 | 0.03 |

Fixes: oVirt#85
Signed-off-by: Nir Soffer <nsoffer@redhat.com>
When receiving a block threshold event or when pausing because of ENOSPC,
extend the drive as soon as possible.
This change decreases the wait before sending an extend request from 0.0-2.0
seconds to 10-30 milliseconds.
Here are test results from 4 runs, each writing 50 GiB to a thin disk at
~1300 MiB/s. Each run extends the disk 20 times. The VM was not paused
during the test.
Fixes: #85