Prepare for monitoring scratch disks #58

Merged · 16 commits · Feb 2, 2022

4 changes: 2 additions & 2 deletions CODEOWNERS
@@ -12,10 +12,10 @@ tests/storage/ @nirs @vjuranek
lib/dnf-plugins @mz-pdm
lib/vdsm/virt/ @mz-pdm
lib/vdsm/virt/backup.py @nirs
lib/vdsm/virt/drivemonitor.py @nirs @vjuranek
lib/vdsm/virt/livemerge.py @nirs @vjuranek
lib/vdsm/virt/periodic.py @mz-pdm @nirs
lib/vdsm/virt/secret.py @nirs @vjuranek
lib/vdsm/virt/thinp.py @nirs @vjuranek
lib/vdsm/virt/vm.py @mz-pdm @nirs @vjuranek
lib/vdsm/virt/vmdevices/lease.py @nirs @vjuranek @mz-pdm
lib/vdsm/virt/vmdevices/storage.py @nirs @vjuranek @mz-pdm
@@ -24,9 +24,9 @@ tests/virt/backup_test.py @nirs
tests/virt/cd_test.py @vjuranek
tests/virt/diskreplicate_test.py @nirs
tests/virt/drive_extension_test.py @nirs
tests/virt/drivemonitor_test.py @nirs
tests/virt/fakedomainadapter.py @nirs
tests/virt/livemerge_test.py @nirs
tests/virt/thinp_test.py @nirs
tests/virt/vmlease_test.py @nirs @mz-pdm
tests/virt/vmsecret_test.py @nirs
tests/virt/vmstorage_test.py @nirs @mz-pdm
28 changes: 14 additions & 14 deletions doc/thin-provisioning.md
@@ -12,7 +12,7 @@ implemented with block based storage.
Vdsm monitors thin provisioned drives or drives being replicated to thin
provisioned drives periodically. During startup, DriveWatermarkMonitor
is created and scheduled with the periodic executor to run
VM.monitor_drives every 2 seconds (configurable) on all VMs.
VM.monitor_volumes every 2 seconds (configurable) on all VMs.
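
A minimal sketch of this scheduling model (not the real periodic executor;
`vm.monitor_volumes` is the only name taken from the paragraph above, the
rest is illustrative):

```python
import threading

def schedule_volume_monitoring(list_vms, interval=2.0):
    """Poll all running VMs every `interval` seconds, in the spirit of
    DriveWatermarkMonitor on the periodic executor."""
    def run():
        for vm in list_vms():
            try:
                vm.monitor_volumes()
            except Exception:
                # A failure on one VM must not stop monitoring the others.
                pass
        # Re-arm the timer for the next cycle.
        timer = threading.Timer(interval, run)
        timer.daemon = True
        timer.start()

    run()
```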

For each VM, we fetch the drives that should be monitored. We have 2
cases:
@@ -39,7 +39,7 @@ failed, a VM may try to write behind the current disk size. In this case
qemu will pause the VM and we get a libvirt
VIR_DOMAIN_EVENT_ID_IO_ERROR_REASON event with ```ENOSPC``` reason.

When receiving such an event, we call VM.monitor_drives() on the paused VM.
When receiving such an event, we call VM.monitor_volumes() on the paused VM.
We are likely to find that one or more drives are too full, and trigger
an extend of the drives.
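
A hedged sketch of how such an event could be wired up with the libvirt
Python bindings (the VM registry and lookup are assumptions, not vdsm code):

```python
import libvirt

VMS = {}  # hypothetical registry: domain UUID -> VM wrapper


def on_io_error(conn, dom, src_path, dev_alias, action, reason, opaque):
    # libvirt reports the reason as a string; "enospc" means the guest
    # tried to write past the current size of a thin provisioned volume.
    if reason == "enospc":
        vm = VMS.get(dom.UUIDString())
        if vm is not None:
            vm.monitor_volumes()


def register_io_error_handler(conn):
    conn.domainEventRegisterAny(
        None,  # receive events for all domains
        libvirt.VIR_DOMAIN_EVENT_ID_IO_ERROR_REASON,
        on_io_error,
        None)
```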

@@ -96,7 +96,7 @@ because of many reasons:
- Storage may not be available when trying to refresh a logical volume

Because the system is based on periodic monitoring, the operation will
be retried on the next drive monitoring interval.
be retried on the next volume monitoring interval.


### Pre-extension and post-reduce
@@ -131,14 +131,14 @@ These configuration options control thin provisioning:

## Implementation in Vdsm 4.20.3 and onwards (oVirt >= 4.2)

We want to optimize drive monitoring using the "BLOCK_THRESHOLD" event
We want to optimize volume monitoring using the "BLOCK_THRESHOLD" event
provided by libvirt >= 3.2. Instead of checking periodically if a drive
should be extended, we will mark a drive for extension when receiving a
libvirt block threshold event.

Libvirt cannot yet deliver events for all the flows and the storage
configurations oVirt supports. Please check the documentation of
the DriveMonitor.monitored_drives method in the drivemonitor.py module
the VolumeMonitor.monitored_volumes method in the thinp.py module
to learn when Vdsm can use events, and when Vdsm must keep polling the
drives.
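
A simplified sketch of the event wiring (the real dispatch lives in
`clientIF.dispatchLibvirtEvents`, shown further down in this diff; the
VM registry passed as `opaque` is an assumption):

```python
import libvirt


def on_block_threshold(conn, dom, dev, path, threshold, excess, opaque):
    # Called when a guest write crosses the threshold set on a drive.
    # The drive is only marked here; the actual extension happens on the
    # next periodic volume monitoring cycle.
    vm = opaque.get(dom.UUIDString())
    if vm is not None:
        vm.volume_monitor.on_block_threshold(dev, path, threshold, excess)


def register_block_threshold_handler(conn, vms_by_uuid):
    conn.domainEventRegisterAny(
        None,  # receive events for all domains
        libvirt.VIR_DOMAIN_EVENT_ID_BLOCK_THRESHOLD,
        on_block_threshold,
        vms_by_uuid)
```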

@@ -161,7 +161,7 @@ drive. For example, with default configuration, if the drive allocation
is 3g, the threshold will be 2.5g.
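
The arithmetic behind that example, assuming the default
`volume_utilization_percent = 50` and a 1024 MiB extension chunk (the
exact formula is an assumption based on the numbers quoted above):

```python
MiB = 1024**2
GiB = 1024**3

utilization_percent = 50      # assumed default
chunk = 1024 * MiB            # assumed default extension chunk

allocation = 3 * GiB
free_margin = chunk * (100 - utilization_percent) // 100   # 512 MiB
threshold = allocation - free_margin

assert threshold == int(2.5 * GiB)
```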

If setting the block threshold for a drive fails, the system should retry
the operation on the next drive monitor cycle.
the operation on the next volume monitor cycle.


### Handling block threshold events
@@ -191,7 +191,7 @@ In some cases (e.g. live merge pivot), we need to temporarily stop
monitoring the drives. We resume monitoring the drives as soon as
possible.

If we receive a block threshold event while drive monitoring is disabled,
If we receive a block threshold event while volume monitoring is disabled,
we mark the drive for extension as usual, but the
extension request will not be handled until monitoring is enabled for
this drive.
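
A simplified sketch of that enable/disable behaviour (a hypothetical
stand-in, not the real `thinp.VolumeMonitor`):

```python
import threading


class TinyVolumeMonitor:
    """Events always mark the drive, but marked drives are only handed
    to the extension flow while monitoring is enabled."""

    def __init__(self):
        self._lock = threading.Lock()
        self._enabled = True
        self._needs_extend = set()

    def enable(self):
        with self._lock:
            self._enabled = True

    def disable(self):
        with self._lock:
            self._enabled = False

    def on_block_threshold(self, drive_name):
        # Record the event even while disabled, so it is not lost.
        with self._lock:
            self._needs_extend.add(drive_name)

    def pop_drives_to_extend(self):
        # Called by the periodic cycle; returns nothing while disabled.
        with self._lock:
            if not self._enabled:
                return set()
            drives, self._needs_extend = self._needs_extend, set()
            return drives
```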
@@ -202,7 +202,7 @@ this drive.
- Set the block thresholds when starting or recovering the VM
- When we get the threshold event, we mark the related drive for
extension
- Up to 2 seconds (configurable) later, the periodic drive monitoring
- Up to 2 seconds (configurable) later, the periodic volume monitoring
will trigger the extension flow
- When the extension flow ends, set a new block threshold in libvirt
for the extended drive
@@ -241,7 +241,7 @@ this drive.
during LSM.
- Keep the current flow with no changes.
- Events received during LSM will mark a drive for extension, but the
drive monitor ignores this state during LSM, since it must check the
volume monitor ignores this state during LSM, since it must check the
drive and/or the replica explicitly.
- When LSM is completed:
- If the new drive is chunked and the source drive was marked for
@@ -255,7 +255,7 @@ this drive.
- If the new drive is not chunked, and the drive was marked for
extension, clear the threshold, as it is not relevant any more.
- If LSM failed, and a drive was marked for extension during LSM, it
will be extended on the next drive monitor cycle.
will be extended on the next volume monitor cycle.


### Live Merge
@@ -264,7 +264,7 @@ this drive.
- Keep the block threshold event as is, so we don't miss an event during
pivot.
- If we receive a block threshold event during the pivot, the drive will
be marked for extension, but the drive monitoring code will ignore
be marked for extension, but the volume monitoring code will ignore
this because the drive is disabled for monitoring.
- Perform a pivot
- If pivot succeeded:
@@ -287,7 +287,7 @@ this drive.
- Enable monitoring for the drive
- If pivot failed, enable monitoring for the drive. If the drive was
marked for extension during the pivot, it will be extended on the next
drive monitoring cycle.
volume monitoring cycle.


### Live migration (no changes required)
@@ -312,7 +312,7 @@ this drive.
- Much less work for the periodic workers, checking only drives during
LSM, and extending drives marked for extension.
- Eliminates the major source of discarded workers
- Since drive monitor does nothing most of the time, delays in drive
- Since volume monitor does nothing most of the time, delays in drive
monitoring are unlikely, avoiding delays in extending drives, that may
lead to pausing a VM.

@@ -326,4 +326,4 @@ before to trigger extension.
### Future work

Avoid the delay between when a block threshold event is received and when
the drive is extended by waking up the drive monitor when an event is received.
the drive is extended by waking up the volume monitor when an event is received.
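
One possible shape for that change (an assumption, not implemented
behaviour): let the event handler wake the monitor thread instead of
waiting for the next 2 second tick.

```python
import threading


class WakeableMonitor:
    """Sketch of a monitor loop that can be woken early by an event."""

    def __init__(self, check, interval=2.0):
        self._check = check              # callable doing the monitoring
        self._interval = interval
        self._wakeup = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def notify(self):
        # Called from the block threshold event handler.
        self._wakeup.set()

    def _run(self):
        while True:
            # Wake up on the periodic interval or on an early notify().
            self._wakeup.wait(self._interval)
            self._wakeup.clear()
            self._check()
```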
2 changes: 1 addition & 1 deletion lib/vdsm/clientIF.py
@@ -701,7 +701,7 @@ def dispatchLibvirtEvents(self, conn, dom, *args):
v.onDeviceRemoved(device_alias)
elif eventid == libvirt.VIR_DOMAIN_EVENT_ID_BLOCK_THRESHOLD:
dev, path, threshold, excess = args[:-1]
v.drive_monitor.on_block_threshold(
v.volume_monitor.on_block_threshold(
dev, path, threshold, excess)
elif eventid == libvirt.VIR_DOMAIN_EVENT_ID_BLOCK_JOB_2:
drive, job_type, job_status, _ = args
14 changes: 9 additions & 5 deletions lib/vdsm/storage/sp.py
@@ -1479,11 +1479,15 @@ def extendVolume(self, sdUUID, volumeUUID, size):
# unchanged.
self._assert_sd_in_pool(sdUUID)

# Extend volume without refreshing its size. If the SPM host see the
# new size immediately after extension, this can cause data corruption
# during VM migration when the source host is SPM. Volume size will be
# refreshed in Vm.after_volume_extension(), which is a callback of disk
# extend command.
# Extend the volume without refreshing its size. If the SPM host
# sees the new size immediately after extension, this can cause
# data corruption during VM migration when the source host is
# SPM.
#
# The logical volume will be refreshed by the host requesting
# the extension when we send a reply, after refreshing the disk
# on the migration destination host.
#
# For more details see https://bugzilla.redhat.com/1983882
sdCache.produce(sdUUID).extendVolume(volumeUUID, size, refresh=False)

6 changes: 3 additions & 3 deletions lib/vdsm/virt/jobs/snapshot.py
@@ -213,7 +213,7 @@ def _thaw_vm(self):
def finalize_vm(self, memory_vol):
try:
self._thaw_vm()
self._vm.drive_monitor.enable()
self._vm.volume_monitor.enable()
if self._memory_params:
self._vm.cif.teardownVolumePath(memory_vol)
if config.getboolean('vars', 'time_sync_snapshot_enable'):
@@ -464,12 +464,12 @@ def vm_conf_for_memory_snapshot():
self._snapshot_job['vmDrives'] = vm_drives_serialized
_write_snapshot_md(self._vm, self._snapshot_job, self._lock)

# We need to stop the drive monitoring for two reasons, one is to
# We need to stop the volume monitor for two reasons, one is to
# prevent spurious libvirt errors about missing drive paths (since
# we're changing them), and also to avoid triggering a drive
# extension for the new volume with the apparent size of the old one
# (the apparentsize is updated as last step in updateDriveParameters)
self._vm.drive_monitor.disable()
self._vm.volume_monitor.disable()

try:
if self._should_freeze:
18 changes: 9 additions & 9 deletions lib/vdsm/virt/livemerge.py
@@ -350,7 +350,7 @@ def _refresh_base(self, drive, base_info):
base_info['uuid'], base_info['apparentsize'],
base_info['capacity'])

self._vm.refresh_drive_volume({
self._vm.refresh_volume({
'domainID': drive.domainID,
'imageID': drive.imageID,
'name': drive.name,
@@ -371,7 +371,7 @@ def _start_commit(self, drive, job):
self._persist_jobs()

# Check that libvirt exposes full volume chain information
actual_chain = self._vm.drive_get_actual_volume_chain(drive)
actual_chain = self._vm.query_drive_volume_chain(drive)
if actual_chain is None:
self._untrack_job(job.id)
raise exception.MergeFailed(
@@ -460,7 +460,7 @@ def _start_extend(self, drive, job):

# Current extend API extends to the next chunk based on current size. We
# need to lie about the current size to get a bigger allocation.
# TODO: Change extendDriveVolume so client can request a specific size.
# TODO: Change extend_volume so client can request a specific size.
max_alloc = job.extend["base_size"] + job.extend["top_size"]
capacity = job.extend["capacity"]

@@ -473,7 +473,7 @@
job_id=job.id,
attempt=job.extend["attempt"])

self._vm.extendDriveVolume(
self._vm.extend_volume(
drive, job.base, max_alloc, capacity, callback=callback)

def _retry_extend(self, job):
@@ -884,21 +884,21 @@ def tryPivot(self):
# we can correct our metadata following the pivot we should not
# attempt to monitor drives.
# TODO: Stop monitoring only for the live merge disk
self.vm.drive_monitor.disable()
self.vm.volume_monitor.disable()

self.vm.log.info("Requesting pivot to complete active layer commit "
"(job %s)", self.job.id)
try:
flags = libvirt.VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT
self.vm._dom.blockJobAbort(self.drive.name, flags=flags)
except libvirt.libvirtError as e:
self.vm.drive_monitor.enable()
self.vm.volume_monitor.enable()
self._mark_leaf_legal()
if e.get_error_code() != libvirt.VIR_ERR_BLOCK_COPY_ACTIVE:
raise JobPivotError(self.job.id, e)
raise JobNotReadyError(self.job.id)
except:
self.vm.drive_monitor.enable()
self.vm.volume_monitor.enable()
raise

self._waitForXMLUpdate()
Expand Down Expand Up @@ -942,7 +942,7 @@ def run(self):
"(job %s)", self.job.id)
self.vm.sync_volume_chain(self.drive)
if self.doPivot:
self.vm.drive_monitor.enable()
self.vm.volume_monitor.enable()
chain_after_merge = [vol['volumeID']
for vol in self.drive.volumeChain]
if self.job.top not in chain_after_merge:
@@ -986,7 +986,7 @@ def _waitForXMLUpdate(self):
# is ongoing. If we are still in this loop when the VM is powered
# off, the merge will be resolved manually by engine using the
# reconcileVolumeChain verb.
actual_chain = self.vm.drive_get_actual_volume_chain(self.drive)
actual_chain = self.vm.query_drive_volume_chain(self.drive)
if actual_chain is None:
raise RuntimeError(
"Cannot get actual volume chain for drive {} alias {}"
5 changes: 2 additions & 3 deletions lib/vdsm/virt/migration.py
@@ -514,9 +514,8 @@ def _regular_run(self):
self._recover(str(e))
self.log.exception("Failed to migrate")
finally:
# Enable drive monitor as it can be disabled during
# migration.
self._vm.drive_monitor.enable()
# Enable the volume monitor as it can be disabled during migration.
self._vm.volume_monitor.enable()

def _startUnderlyingMigration(self, startTime, machineParams):
if self.hibernating:
12 changes: 6 additions & 6 deletions lib/vdsm/virt/periodic.py
@@ -349,7 +349,7 @@ class UpdateVolumes(_RunnableOnVm):
def required(self):
return (super(UpdateVolumes, self).required and
# Avoid queries from storage during recovery process
self._vm.drive_monitor.enabled())
self._vm.volume_monitor.enabled())

def _execute(self):
for drive in self._vm.getDiskDevices():
@@ -374,15 +374,15 @@ def _execute(self):
self._vm.updateVmJobs()


class DriveWatermarkMonitor(_RunnableOnVm):
class VolumeWatermarkMonitor(_RunnableOnVm):

@property
def required(self):
return (super(DriveWatermarkMonitor, self).required and
self._vm.drive_monitor.monitoring_needed())
return (super(VolumeWatermarkMonitor, self).required and
self._vm.volume_monitor.monitoring_needed())

def _execute(self):
self._vm.monitor_drives()
self._vm.monitor_volumes()


class _ExternalDataMonitor(_RunnableOnVm):
@@ -451,7 +451,7 @@ def per_vm_operation(func, period):
# from QEMU. It accesses storage and/or QEMU monitor, so can block,
# thus we need dispatching.
per_vm_operation(
DriveWatermarkMonitor,
VolumeWatermarkMonitor,
config.getint('vars', 'vm_watermark_interval')),

per_vm_operation(