Prepare for monitoring scratch disks #58

nirs · 2022-01-31T16:29:55Z

Preparing for monitoring scratch disks, refactor the code to related
to monitoring drives so it can monitor volumes.

Work in progress, posted as draft for early review.

nirs · 2022-01-31T23:51:11Z

Tests pass, but it was not tested yet in real system.

vjuranek

There are a few typos in commit messages, but there's no god way how to comment on commit message.

lib/vdsm/virt/vm.py

nirs · 2022-02-01T14:55:47Z

There are a few typos in commit messages, but there's no god way how to comment on commit message.

You can add a comment in the top of the first change, I don't see a better way.

nirs · 2022-02-01T21:50:24Z

@vjuranek I improved the callback name and unified all callbacks related to monitoring and extending volumes.

Tested flows:

Starting a stopping vms
Plug and unplug thin block disk
Extend flow when vm disk becomes full
Live storage migration (block to block, block to file, file to file, file to block)
backup (full, incremental)

I also fixed many issues in the commit messages, hopefully all of them.

The tests revealed lvm logging issue, see also #61.

mz-pdm

OK from the virt side, except for the typos.

lib/vdsm/virt/vm.py

tests/virt/drive_extension_test.py

lib/vdsm/virt/vm.py

lib/vdsm/storage/sp.py

nirs · 2022-02-02T14:44:12Z

@vjuranek @mz-pdm all comments addressed, should be ready for merge.

mz-pdm

Thanks for the fixes.

nirs · 2022-02-02T17:29:02Z

/ost

nirs · 2022-02-02T17:47:06Z

/ost

This function is needed to amend block info when replicating a non-chunked drive to chunked drive, but it was abused to set Drive.blockinfo, used to log changes in extension info. Convert Drive.blockinfo to a property, and move the logging code to the property setter. Set Drive.block_info after separately from the call to amend the block info. This change allows amending block info only for drives during replication, which will make it easier handle scratch disks in the monitoring flows. Bug-Url: https://bugzilla.redhat.com/1913387 Signed-off-by: Nir Soffer <nsoffer@redhat.com>

In the past we could monitor only the active volume of a drive, so drive monitor was the right term. However, now we can monitor also: - Scratch disk volume - Internal volume during live merge (blockCommit) - Target volume during disk replication (blockCopy) Modifying the Drive object to keep monitoring info is hard and does not look like the right way. The plan is to keep monitoring info in the volume monitor. When we want to monitor a volume, we will call: VolumeMonitor.start_monitoring(volume) This will set a block threshold on the volume using libvirt, and keep the volume state needed to handle block threshold events. When we want to stop monitoring a volume, we will call: VolumeMonitor.stop_monitoring(volume) This will disable block threshold events and remove the volume from the monitor. When we get a block threshold event, the monitor will update the volume state. In the periodic job, we will query the volumes that need to be extended and extend them. When a drive changes the path, for example at the end of live storage migration or live merge, we will remove the old volume from the volume monitor, and add the new volume. Starting this work with renaming to make the intent of the code clear: - drivemonitor module to thinp - this module will include all the code needed for handling thin provisioned block volumes. - drivemonitor.DriveMonitor class to thinp.VolumeMonitor. - VolumeMonitor.monitored_drives to VolumeMonitor.monitored_volumes. - VM.drive_montior to VM.volume_monitor. - VM.montior_drives to VM.montior_volumes - periodic.DriveWatermarkMonitor to periodic.VolumeWatermarkMonitor. - "drive monitor" to "volume montior" in doc/thin-provisioning.md. Bug-Url: https://bugzilla.redhat.com/1913387 Signed-off-by: Nir Soffer <nsoffer@redhat.com>

This method became public for no reason when working on extending volumes during live merge. Make it private again. Bug-Url: https://bugzilla.redhat.com/1913387 Signed-off-by: Nir Soffer <nsoffer@redhat.com>

There is no reason to use __ prefix for the methods related to extending drives. Using single _ prefix is enough to mark them as private. Using __ is needed when you want to allow a subclass to override a super class method but keep the option to call the super implementation. Bug-Url: https://bugzilla.redhat.com/1913387 Signed-off-by: Nir Soffer <nsoffer@redhat.com>

I want to make _drive_volume_index public, since it is needed for moving thin provisioning code out of VM class, so we need good way to name drive related queries. Start by renaming existing queries. Bug-Url: https://bugzilla.redhat.com/1913387 Signed-off-by: Nir Soffer <nsoffer@redhat.com>

This method works with any volume (e.g. scratch disk) and do not accept a drive object, so there is no reason to include the drive in the name. Bug-Url: https://bugzilla.redhat.com/1913387 Signed-off-by: Nir Soffer <nsoffer@redhat.com>

This is needed when we start to monitor drive active volume, and I want to move that code to the thinp module. Rename to query_drive_volume_index() to match query_drive_volume_chain() so we can call it outside of the VM. Bug-Url: https://bugzilla.redhat.com/1913387 Signed-off-by: Nir Soffer <nsoffer@redhat.com>

This method is a ugly hack for monitoring drives during disk replication. This hack should be eliminated when finish volume monitoring work, but for now we need it public so we can move the monitoring code to thinp. Bug-Url: https://bugzilla.redhat.com/1913387 Signed-off-by: Nir Soffer <nsoffer@redhat.com>

Move VM.monitor_volumes() and VM.extend_drive_if_needed() to thinp.VolumeMonitor. This decouples the VM class from the VolumeMonitor class, which will make it easier to change the volume montior to work with volumes instead of drives. Bug-Url: https://bugzilla.redhat.com/1913387 Signed-off-by: Nir Soffer <nsoffer@redhat.com>

This should be used to extend any volume (e.g. scratch disk). Rename the public and private methods to reflect the intended use. Bug-Url: https://bugzilla.redhat.com/1913387 Signed-off-by: Nir Soffer <nsoffer@redhat.com>

Order the extend methods according the flow: - extend volume (entry point) - extending replica (used during live storage migration) - extending volume - refreshing destination volume (used in migration) - refreshing local volume - verifying extension - updating drive size - resume vm Most of these method should move to thinp module. Reordering them helps to understand the dependencies and will make a cleaner patch when we move them. Bug-Url: https://bugzilla.redhat.com/1913387 Signed-off-by: Nir Soffer <nsoffer@redhat.com>

We have very confusing _resume_if_needed() and maybe_resume() - which method should be called to resume a VM? Rename _resume_if_needed() to extend_volume_completed(), live merge _extend_completed callback. Now it is clear that this should be called only when extend flow completed, and other code should call maybe_resume(). This new method is public so we can move the extend code to the thinp module. Bug-Url: https://bugzilla.redhat.com/1913387 Signed-off-by: Nir Soffer <nsoffer@redhat.com>

This will be called from volume monitor when extend volume completed successfully, so the volume monitor does not have to care about VM internals. Bug-Url: https://bugzilla.redhat.com/1913387 Signed-off-by: Nir Soffer <nsoffer@redhat.com>

We had 2 confusing methods for refreshing the destination volume during migration. Since both of them are small and simple, and the private method should not be called in any other flow, merging them makes the code easier to follow. While merging the methods, streamline the way we raise on errors. Bug-Url: https://bugzilla.redhat.com/1913387 Signed-off-by: Nir Soffer <nsoffer@redhat.com>

Move the complex extending code to the thinp module. This will make it easier to monitor volumes instead of drives. The VM is not affected now by the way we monitor or extend volumes. The VM part during disk extension is now: - Call VolumeMonitor.monitor_volumes() periodically. This is done by the periodic module. - Call VolumeMonitor.extend_volume() when volumes need to be pre-extended. This is called when starting disk replication, and before starting the commit phase in live merge. This will be eliminated when we start to monitor the relevant volume. - Handle callbacks from VolumeMonitor._after_volume_extension: - Refresh migration volume - Refresh local volume - Resume if needed when extend volume completed Because extending volumes is implemented now by the volume monitor, we have to use the real object in live merge test. Bug-Url: https://bugzilla.redhat.com/1913387 Signed-off-by: Nir Soffer <nsoffer@redhat.com>

Unify the callbacks name with other callbacks. The callback is named {method_name}_completed to make it easy to find the callback when you look at the method, or the other way around. The storage system refereed to the private callback for explaining why we don't refresh extended volumes on the SPM. This is bad, requiring unrelated changes when renaming private methods. Change the text to explain what we do without referring to the actual method name. Bug-Url: https://bugzilla.redhat.com/1913387 Signed-off-by: Nir Soffer <nsoffer@redhat.com>

nirs · 2022-02-02T17:55:51Z

@rebased on master for trying ost workflow again.

nirs · 2022-02-02T17:55:57Z

/ost

sonarcloud · 2022-02-02T17:56:11Z

Kudos, SonarCloud Quality Gate passed!

0 Bugs
0 Vulnerabilities
0 Security Hotspots
26 Code Smells

No Coverage information
0.2% Duplication

nirs · 2022-02-02T18:01:15Z

/ost

nirs · 2022-02-02T18:21:10Z

/ost

nirs · 2022-02-02T18:29:16Z

/ost

nirs · 2022-02-02T18:47:49Z

OST check is broken, if fails with:

RuntimeError: GH pr/commit/run https://api.github.com/repos/oVirt/vdsm/pulls/58
had no artifacts with RPM files.

Looks like the integration with github does not work reliably yet, so we cannot use
use it yet.

nirs requested review from almusil, mz-pdm, tinez and vjuranek as code owners January 31, 2022 16:29

nirs marked this pull request as draft January 31, 2022 16:30

nirs force-pushed the thinp branch 4 times, most recently from 622d662 to 5fdb211 Compare January 31, 2022 20:39

nirs marked this pull request as ready for review January 31, 2022 23:50

vjuranek reviewed Feb 1, 2022

View reviewed changes

lib/vdsm/virt/vm.py Outdated Show resolved Hide resolved

nirs force-pushed the thinp branch from 15bd0a5 to 9fa5d14 Compare February 1, 2022 21:46

nirs requested a review from vjuranek February 1, 2022 22:30

vjuranek previously approved these changes Feb 2, 2022

View reviewed changes

mz-pdm previously approved these changes Feb 2, 2022

View reviewed changes

nirs dismissed stale reviews from mz-pdm and vjuranek via 8c9e864 February 2, 2022 14:43

nirs force-pushed the thinp branch from 9fa5d14 to 8c9e864 Compare February 2, 2022 14:43

nirs requested review from mz-pdm and vjuranek February 2, 2022 14:43

mz-pdm previously approved these changes Feb 2, 2022

View reviewed changes

nirs added the verified Change was tested; please describe how it was tested in the PR label Feb 2, 2022

nirs mentioned this pull request Feb 2, 2022

Cleanup VolumeMonitor API #64

Merged

nirs added 15 commits February 2, 2022 19:53

vm: Make VM._after_volume_extension private

12d7fb3

This method became public for no reason when working on extending volumes during live merge. Make it private again. Bug-Url: https://bugzilla.redhat.com/1913387 Signed-off-by: Nir Soffer <nsoffer@redhat.com>

vm: Rename VM.refresh_drive_volume

3195024

This method works with any volume (e.g. scratch disk) and do not accept a drive object, so there is no reason to include the drive in the name. Bug-Url: https://bugzilla.redhat.com/1913387 Signed-off-by: Nir Soffer <nsoffer@redhat.com>

vm: Rename VM.extendDriveVolume

dc76a07

This should be used to extend any volume (e.g. scratch disk). Rename the public and private methods to reflect the intended use. Bug-Url: https://bugzilla.redhat.com/1913387 Signed-off-by: Nir Soffer <nsoffer@redhat.com>

vm: Make _refresh_destination_volume() public

6ac1591

This will be called from volume monitor when extend volume completed successfully, so the volume monitor does not have to care about VM internals. Bug-Url: https://bugzilla.redhat.com/1913387 Signed-off-by: Nir Soffer <nsoffer@redhat.com>

nirs dismissed mz-pdm’s stale review via a635bd2 February 2, 2022 17:53

nirs force-pushed the thinp branch from 8c9e864 to a635bd2 Compare February 2, 2022 17:53

vjuranek approved these changes Feb 2, 2022

View reviewed changes

vjuranek merged commit 6d53bed into oVirt:master Feb 2, 2022

nirs deleted the thinp branch March 20, 2022 12:16

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Prepare for monitoring scratch disks #58

Prepare for monitoring scratch disks #58

nirs commented Jan 31, 2022

nirs commented Jan 31, 2022

vjuranek left a comment

nirs commented Feb 1, 2022

nirs commented Feb 1, 2022 •

edited

mz-pdm left a comment

nirs commented Feb 2, 2022

mz-pdm left a comment

nirs commented Feb 2, 2022

nirs commented Feb 2, 2022

nirs commented Feb 2, 2022

nirs commented Feb 2, 2022

sonarcloud bot commented Feb 2, 2022

nirs commented Feb 2, 2022

nirs commented Feb 2, 2022

nirs commented Feb 2, 2022

nirs commented Feb 2, 2022

Prepare for monitoring scratch disks #58

Prepare for monitoring scratch disks #58

Conversation

nirs commented Jan 31, 2022

nirs commented Jan 31, 2022

vjuranek left a comment

Choose a reason for hiding this comment

nirs commented Feb 1, 2022

nirs commented Feb 1, 2022 • edited

mz-pdm left a comment

Choose a reason for hiding this comment

nirs commented Feb 2, 2022

mz-pdm left a comment

Choose a reason for hiding this comment

nirs commented Feb 2, 2022

nirs commented Feb 2, 2022

nirs commented Feb 2, 2022

nirs commented Feb 2, 2022

sonarcloud bot commented Feb 2, 2022

nirs commented Feb 2, 2022

nirs commented Feb 2, 2022

nirs commented Feb 2, 2022

nirs commented Feb 2, 2022

nirs commented Feb 1, 2022 •

edited