Prepare for eliminating delay on extend #157

nirs · 2022-05-01T11:30:21Z

This is the first part from #124, preparing for eliminating the the delay
before extending drives (#85). The changes were reviewed in the last 3 weeks
and should be ready for merge.

Fixes:

Fix handling of block threshold events
Fix handling of enospc events
Fix the way we force drive to exceeded
Fix extending of exceeded volume, was delayed checking stale allocation

Improvements:

Improve handling of improbable allocation
Validate allocation only when needed
Unify logging

Refactoring:

Simplify the way we update block info for drives
Simplify monitor_volumes, removing unhelpful return value and logging.
Remove VM.monitor_volumes, minimizing coupling with vm.py
Introduce on_enospc event handler, unifying the way we handle libvirt events.
Report changes in Drive event handlers
Update stale documentation
Remove stale tests infrastructure

nirs · 2022-05-01T11:38:40Z

/ost

nirs · 2022-05-01T13:10:18Z

Tested extend flows:

Extending 50 GiB disk at 600 MiB/s
Live storage migration of 30 GiB disk while writing at 300 MiB/s

lib/vdsm/virt/thinp.py

On each monitor cycle, we need to get block stats from libvirt and updated the block info for every drive that needs monitoring. Based on the block info, we decide if the drive should be extended. Separate the block info update from the extend check. This makes the code easier to change and understand. Signed-off-by: Nir Soffer <nsoffer@redhat.com>

Previously we returned True if some drives were extended, and ENOSPC handler logged a message if some drives were extended. This log is not very useful. Simplify the interface by not returning anything. This will allow handling the monitoring request in the executor thread pool, and it simplifies the monitoring code. Signed-off-by: Nir Soffer <nsoffer@redhat.com>

When a vm pauses because of I/O error, we need to mark the drive for extension. Look up the drive early and pass it to the helper for sending event, instead of looking up the drive in the helper. Signed-off-by: Nir Soffer <nsoffer@redhat.com>

When a VM is paused because of ENOSPC, call VolumeMonitor.on_enospc() instead of VolumeMonitor.monitor_volume(). This will make it easier to unify the way we handle block threshold events and I/O errors. Signed-off-by: Nir Soffer <nsoffer@redhat.com>

VM.monitor_volumes() was called only from the periodic job, which already access vm.volume_monitor for checking if monitoring is needed. Remove the unneeded method and change the periodic job and the tests to access vm.volume_monitor.monitor_volumes(). Signed-off-by: Nir Soffer <nsoffer@redhat.com>

We always use block threshold events. When the flag to disable block threshold event was removed, the comment was not updated. Signed-off-by: Nir Soffer <nsoffer@redhat.com>

Previously calling the event handler twice would reset the exceeded time and log bogus message. We should reset the exceeded time and log only when setting the block threshold to exceeded on the first time. Signed-off-by: Nir Soffer <nsoffer@redhat.com>

Previously we called monitor_volumes(), but this does nothing unless a drive block threshold is UNSET or EXCEEDED. If the guest is writing fast, it may exceed the block threshold before we set it, and this case we will never get a block threshold event from libvirt. This can lead to paused VM with full disk that will never be extended. Since the disk is full, resuming the VM fails immediately with ENOSPC. But since block threshold was already exceeded, qemu does not submit a block threshold event. The only way to recover is to shutdown the VM. Fixed by unifying the way we handle ENOSPC and block threshold event; We mark a drive block threshold as EXCEEDED in both cases. The periodic monitor will pick up the drive in the next monitoring cycle. Since on_enospc() cannot handle the case of ENOSPC without a drive, we call it only when know the drive. Fixing monitor_volumes() to handle this case is possible but require a big change, and I don't think this case actually happens. This change have nice side effects: - Avoid the issue of blocking the libvirt event loop when calling monitor_volumes() and storage APIs block. - Avoid noise in the logs when libvirt issue multiple ENOSPC events for the same drive. We log changes in drive block threshold only once. This change introduces up to 2 seconds delay when resuming a paused VM. However, since the VM already paused, the delay is not critical. This delay will be eliminated when we eliminate the delay when receiving block threshold event. Signed-off-by: Nir Soffer <nsoffer@redhat.com>

When finding that a drive need extension we force drive threshold state to exceeded to make sure we continue to monitor this drive until it is extended. This is good, but the way this was implemented is wrong, clearing the exceeded time. Use Drive.on_block_threshold() which does the right thing now. Signed-off-by: Nir Soffer <nsoffer@redhat.com>

When libvirt report that the threshold was exceeded: 2022-04-09 23:30:01,179+0300 INFO (libvirt/events) [virt.vm] (vmId='724d41da-0b01-4d58-8a73-113906da1565') Block threshold 3221225472 exceeded by 196608 for drive 'sdb[1]' querying block status may show that allocation is smaller than the threshold: 2022-04-09 23:30:01,186+0300 DEBUG (periodic/1) [virt.vm] (vmId='724d41da-0b01-4d58-8a73-113906da1565') Extension info for drive sdb volume e1b76177-d925-497d-93ff-05ccb980f32a: BlockInfo(index=1, name='sdb', allocation=3213557760, capacity=53687091200, physical=5368709120, threshold=0) This happens when trying to extend a disk immediately after receiving a block threshold event. Change the check to always extend drives when threshold state is EXCEEDED without checking the amount of free space. Since we always extend, we don't need the workaround for libvirt bug when allocation is not reported. Signed-off-by: Nir Soffer <nsoffer@redhat.com>

When handling block threshold or enospc events, return True if the event change the drive threshold state to EXCEEDED. This can be used by callers to detect when a drive should be extended immediately. Signed-off-by: Nir Soffer <nsoffer@redhat.com>

Move comments from the docstring into the function to keep the comments closer to the code they explain. Signed-off-by: Nir Soffer <nsoffer@redhat.com>

Because we always mark a drive for extension, there is no point to log the threshold state. Unify the text "Requesting an extension" to match similar logs about extending the volume and the replica. Signed-off-by: Nir Soffer <nsoffer@redhat.com>

FakeVM block_stats and query_block_stats are not used now. They were left by mistake when tests using them were removed. Signed-off-by: Nir Soffer <nsoffer@redhat.com>

If we detect an improbable allocation value, we paused the VM and raise. Improve this implementation to make it easier to work with. - Rename the exception to describe the issue better - Remove the error log since we raise. The top level error handler will log a traceback with this error. - Don't handle this error in monitor_volumes(), so the error will propagate to the top level error handler. - Use fstring to format the exception instead of fragile %. - Simplify the exception message, all the important details are already included in block_info named tuple. - Add the missing test to make sure this behavior is not modified by mistake. Signed-off-by: Nir Soffer <nsoffer@redhat.com>

We use the allocation reported by libvirt only in when checking drives before getting a block threshold event, so we better check the allocation only when we we need it. Getting improbable allocation in other cases does not affect anything. Signed-off-by: Nir Soffer <nsoffer@redhat.com>

nirs added enhancement Enhancing the system by adding new feature or improving performance or reliability cleanup Code change keeping current behavior storage virt labels May 1, 2022

nirs added this to the ovirt-4.5.1 milestone May 1, 2022

nirs requested review from bennyz and aesteve-rh May 1, 2022 11:30

nirs requested review from mz-pdm, almusil and tinez as code owners May 1, 2022 11:30

nirs mentioned this pull request May 1, 2022

Eliminate the wait before extending a volume #124

Merged

nirs added the verified Change was tested; please describe how it was tested in the PR label May 1, 2022

aesteve-rh previously approved these changes May 2, 2022

View reviewed changes

vjuranek reviewed May 2, 2022

View reviewed changes

lib/vdsm/virt/thinp.py Outdated Show resolved Hide resolved

lib/vdsm/virt/thinp.py Show resolved Hide resolved

nirs dismissed aesteve-rh’s stale review via 20c8ba0 May 2, 2022 16:42

nirs force-pushed the thinp-events-pr1 branch from 0e073c3 to 20c8ba0 Compare May 2, 2022 16:42

nirs requested a review from vjuranek May 2, 2022 16:51

nirs added 11 commits May 2, 2022 21:15

vm: Lookup drive earlier in onIOError

f089bbf

When a vm pauses because of I/O error, we need to mark the drive for extension. Look up the drive early and pass it to the helper for sending event, instead of looking up the drive in the helper. Signed-off-by: Nir Soffer <nsoffer@redhat.com>

thinp: Add on_enospc() handler

4706bbd

When a VM is paused because of ENOSPC, call VolumeMonitor.on_enospc() instead of VolumeMonitor.monitor_volume(). This will make it easier to unify the way we handle block threshold events and I/O errors. Signed-off-by: Nir Soffer <nsoffer@redhat.com>

thinp: Update stale comment

0bfd8c0

We always use block threshold events. When the flag to disable block threshold event was removed, the comment was not updated. Signed-off-by: Nir Soffer <nsoffer@redhat.com>

nirs added 5 commits May 2, 2022 21:15

thinp: Inline docstring comments

7d52fca

Move comments from the docstring into the function to keep the comments closer to the code they explain. Signed-off-by: Nir Soffer <nsoffer@redhat.com>

thinp: Unify log when extending

9aa3227

Because we always mark a drive for extension, there is no point to log the threshold state. Unify the text "Requesting an extension" to match similar logs about extending the volume and the replica. Signed-off-by: Nir Soffer <nsoffer@redhat.com>

tests: Remove unneeded fake implementation

115775d

FakeVM block_stats and query_block_stats are not used now. They were left by mistake when tests using them were removed. Signed-off-by: Nir Soffer <nsoffer@redhat.com>

nirs force-pushed the thinp-events-pr1 branch from 20c8ba0 to b0f8a8f Compare May 2, 2022 18:16

vjuranek approved these changes May 3, 2022

View reviewed changes

nirs assigned vjuranek, aesteve-rh and nirs and unassigned vjuranek and aesteve-rh May 3, 2022

aesteve-rh approved these changes May 3, 2022

View reviewed changes

bennyz approved these changes May 4, 2022

View reviewed changes

mz-pdm approved these changes May 4, 2022

View reviewed changes

nirs merged commit 0136927 into oVirt:master May 4, 2022

nirs deleted the thinp-events-pr1 branch May 4, 2022 19:15

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Prepare for eliminating delay on extend #157

Prepare for eliminating delay on extend #157

nirs commented May 1, 2022

nirs commented May 1, 2022

nirs commented May 1, 2022

Prepare for eliminating delay on extend #157

Prepare for eliminating delay on extend #157

Conversation

nirs commented May 1, 2022

nirs commented May 1, 2022

nirs commented May 1, 2022