Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prepare for eliminating delay on extend #157

Merged
merged 16 commits into from
May 4, 2022
Merged

Conversation

nirs
Copy link
Member

@nirs nirs commented May 1, 2022

This is the first part from #124, preparing for eliminating the the delay
before extending drives (#85). The changes were reviewed in the last 3 weeks
and should be ready for merge.

Fixes:

  • Fix handling of block threshold events
  • Fix handling of enospc events
  • Fix the way we force drive to exceeded
  • Fix extending of exceeded volume, was delayed checking stale allocation

Improvements:

  • Improve handling of improbable allocation
  • Validate allocation only when needed
  • Unify logging

Refactoring:

  • Simplify the way we update block info for drives
  • Simplify monitor_volumes, removing unhelpful return value and logging.
  • Remove VM.monitor_volumes, minimizing coupling with vm.py
  • Introduce on_enospc event handler, unifying the way we handle libvirt events.
  • Report changes in Drive event handlers
  • Update stale documentation
  • Remove stale tests infrastructure

@nirs nirs added enhancement Enhancing the system by adding new feature or improving performance or reliability cleanup Code change keeping current behavior storage virt labels May 1, 2022
@nirs nirs added this to the ovirt-4.5.1 milestone May 1, 2022
@nirs nirs requested review from bennyz and aesteve-rh May 1, 2022 11:30
@nirs nirs requested review from mz-pdm, almusil and tinez as code owners May 1, 2022 11:30
@nirs
Copy link
Member Author

nirs commented May 1, 2022

/ost

@nirs nirs added the verified Change was tested; please describe how it was tested in the PR label May 1, 2022
@nirs
Copy link
Member Author

nirs commented May 1, 2022

Tested extend flows:

  • Extending 50 GiB disk at 600 MiB/s
  • Live storage migration of 30 GiB disk while writing at 300 MiB/s

aesteve-rh
aesteve-rh previously approved these changes May 2, 2022
lib/vdsm/virt/thinp.py Outdated Show resolved Hide resolved
lib/vdsm/virt/thinp.py Show resolved Hide resolved
nirs added 11 commits May 2, 2022 21:15
On each monitor cycle, we need to get block stats from libvirt and
updated the block info for every drive that needs monitoring. Based on
the block info, we decide if the drive should be extended.

Separate the block info update from the extend check. This makes the
code easier to change and understand.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Previously we returned True if some drives were extended, and ENOSPC
handler logged a message if some drives were extended. This log is not
very useful.

Simplify the interface by not returning anything. This will allow
handling the monitoring request in the executor thread pool, and it
simplifies the monitoring code.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
When a vm pauses because of I/O error, we need to mark the drive for
extension. Look up the drive early and pass it to the helper for sending
event, instead of looking up the drive in the helper.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
When a VM is paused because of ENOSPC, call VolumeMonitor.on_enospc()
instead of VolumeMonitor.monitor_volume(). This will make it easier
to unify the way we handle block threshold events and I/O errors.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
VM.monitor_volumes() was called only from the periodic job, which
already access vm.volume_monitor for checking if monitoring is needed.
Remove the unneeded method and change the periodic job and the tests to
access vm.volume_monitor.monitor_volumes().

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
We always use block threshold events. When the flag to disable block
threshold event was removed, the comment was not updated.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Previously calling the event handler twice would reset the exceeded
time and log bogus message. We should reset the exceeded time and log
only when setting the block threshold to exceeded on the first time.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Previously we called monitor_volumes(), but this does nothing unless a
drive block threshold is UNSET or EXCEEDED. If the guest is writing
fast, it may exceed the block threshold before we set it, and this case
we will never get a block threshold event from libvirt.

This can lead to paused VM with full disk that will never be extended.
Since the disk is full, resuming the VM fails immediately with ENOSPC.
But since block threshold was already exceeded, qemu does not submit a
block threshold event. The only way to recover is to shutdown the VM.

Fixed by unifying the way we handle ENOSPC and block threshold event; We
mark a drive block threshold as EXCEEDED in both cases. The periodic
monitor will pick up the drive in the next monitoring cycle.

Since on_enospc() cannot handle the case of ENOSPC without a drive, we
call it only when know the drive. Fixing monitor_volumes() to handle
this case is possible but require a big change, and I don't think this
case actually happens.

This change have nice side effects:

- Avoid the issue of blocking the libvirt event loop when calling
  monitor_volumes() and storage APIs block.

- Avoid noise in the logs when libvirt issue multiple ENOSPC events for
  the same drive. We log changes in drive block threshold only once.

This change introduces up to 2 seconds delay when resuming a paused VM.
However, since the VM already paused, the delay is not critical. This
delay will be eliminated when we eliminate the delay when receiving
block threshold event.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
When finding that a drive need extension we force drive threshold state
to exceeded to make sure we continue to monitor this drive until it is
extended. This is good, but the way this was implemented is wrong,
clearing the exceeded time. Use Drive.on_block_threshold() which does
the right thing now.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
When libvirt report that the threshold was exceeded:

    2022-04-09 23:30:01,179+0300 INFO  (libvirt/events) [virt.vm]
    (vmId='724d41da-0b01-4d58-8a73-113906da1565') Block threshold
    3221225472 exceeded by 196608 for drive 'sdb[1]'

querying block status may show that allocation is smaller than the
threshold:

    2022-04-09 23:30:01,186+0300 DEBUG (periodic/1) [virt.vm]
    (vmId='724d41da-0b01-4d58-8a73-113906da1565') Extension info for
    drive sdb volume e1b76177-d925-497d-93ff-05ccb980f32a:
    BlockInfo(index=1, name='sdb', allocation=3213557760,
    capacity=53687091200, physical=5368709120, threshold=0)

This happens when trying to extend a disk immediately after receiving a
block threshold event.

Change the check to always extend drives when threshold state is
EXCEEDED without checking the amount of free space.

Since we always extend, we don't need the workaround for libvirt bug
when allocation is not reported.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
When handling block threshold or enospc events, return True if the event
change the drive threshold state to EXCEEDED. This can be used by
callers to detect when a drive should be extended immediately.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
nirs added 5 commits May 2, 2022 21:15
Move comments from the docstring into the function to keep the comments
closer to the code they explain.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Because we always mark a drive for extension, there is no point to log
the threshold state. Unify the text "Requesting an extension" to match
similar logs about extending the volume and the replica.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
FakeVM block_stats and query_block_stats are not used now. They were
left by mistake when tests using them were removed.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
If we detect an improbable allocation value, we paused the VM and raise.
Improve this implementation to make it easier to work with.

- Rename the exception to describe the issue better
- Remove the error log since we raise. The top level error handler will
  log a traceback with this error.
- Don't handle this error in monitor_volumes(), so the error will
  propagate to the top level error handler.
- Use fstring to format the exception instead of fragile %.
- Simplify the exception message, all the important details are already
  included in block_info named tuple.
- Add the missing test to make sure this behavior is not modified by
  mistake.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
We use the allocation reported by libvirt only in when checking drives
before getting a block threshold event, so we better check the
allocation only when we we need it. Getting improbable allocation in
other cases does not affect anything.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
@nirs nirs assigned vjuranek, aesteve-rh and nirs and unassigned vjuranek and aesteve-rh May 3, 2022
@nirs nirs merged commit 0136927 into oVirt:master May 4, 2022
@nirs nirs deleted the thinp-events-pr1 branch May 4, 2022 19:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cleanup Code change keeping current behavior enhancement Enhancing the system by adding new feature or improving performance or reliability storage verified Change was tested; please describe how it was tested in the PR virt
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants