Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prepare for monitoring scratch disks #58

Merged
merged 16 commits into from Feb 2, 2022
Merged

Prepare for monitoring scratch disks #58

merged 16 commits into from Feb 2, 2022

Conversation

nirs
Copy link
Member

@nirs nirs commented Jan 31, 2022

Preparing for monitoring scratch disks, refactor the code to related
to monitoring drives so it can monitor volumes.

Work in progress, posted as draft for early review.

@nirs nirs marked this pull request as draft January 31, 2022 16:30
@nirs nirs force-pushed the thinp branch 4 times, most recently from 622d662 to 5fdb211 Compare January 31, 2022 20:39
@nirs nirs marked this pull request as ready for review January 31, 2022 23:50
@nirs
Copy link
Member Author

nirs commented Jan 31, 2022

Tests pass, but it was not tested yet in real system.

Copy link
Member

@vjuranek vjuranek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a few typos in commit messages, but there's no god way how to comment on commit message.

lib/vdsm/virt/vm.py Outdated Show resolved Hide resolved
@nirs
Copy link
Member Author

nirs commented Feb 1, 2022

There are a few typos in commit messages, but there's no god way how to comment on commit message.

You can add a comment in the top of the first change, I don't see a better way.

@nirs
Copy link
Member Author

nirs commented Feb 1, 2022

@vjuranek I improved the callback name and unified all callbacks related to monitoring and extending volumes.

Tested flows:

  • Starting a stopping vms
  • Plug and unplug thin block disk
  • Extend flow when vm disk becomes full
  • Live storage migration (block to block, block to file, file to file, file to block)
  • backup (full, incremental)

I also fixed many issues in the commit messages, hopefully all of them.

The tests revealed lvm logging issue, see also #61.

@nirs nirs requested a review from vjuranek February 1, 2022 22:30
vjuranek
vjuranek previously approved these changes Feb 2, 2022
mz-pdm
mz-pdm previously approved these changes Feb 2, 2022
Copy link
Member

@mz-pdm mz-pdm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK from the virt side, except for the typos.

lib/vdsm/virt/vm.py Show resolved Hide resolved
lib/vdsm/virt/vm.py Show resolved Hide resolved
tests/virt/drive_extension_test.py Outdated Show resolved Hide resolved
lib/vdsm/virt/vm.py Outdated Show resolved Hide resolved
lib/vdsm/virt/vm.py Outdated Show resolved Hide resolved
lib/vdsm/virt/vm.py Outdated Show resolved Hide resolved
lib/vdsm/virt/vm.py Show resolved Hide resolved
lib/vdsm/storage/sp.py Outdated Show resolved Hide resolved
@nirs
Copy link
Member Author

nirs commented Feb 2, 2022

@vjuranek @mz-pdm all comments addressed, should be ready for merge.

mz-pdm
mz-pdm previously approved these changes Feb 2, 2022
Copy link
Member

@mz-pdm mz-pdm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fixes.

@nirs nirs added the verified Change was tested; please describe how it was tested in the PR label Feb 2, 2022
@nirs nirs mentioned this pull request Feb 2, 2022
@nirs
Copy link
Member Author

nirs commented Feb 2, 2022

/ost

1 similar comment
@nirs
Copy link
Member Author

nirs commented Feb 2, 2022

/ost

This function is needed to amend block info when replicating a
non-chunked drive to chunked drive, but it was abused to set
Drive.blockinfo, used to log changes in extension info.

Convert Drive.blockinfo to a property, and move the logging code to the
property setter. Set Drive.block_info after separately from the call to
amend the block info.

This change allows amending block info only for drives during
replication, which will make it easier handle scratch disks in the
monitoring flows.

Bug-Url: https://bugzilla.redhat.com/1913387
Signed-off-by: Nir Soffer <nsoffer@redhat.com>
In the past we could monitor only the active volume of a drive, so drive
monitor was the right term.

However, now we can monitor also:
- Scratch disk volume
- Internal volume during live merge (blockCommit)
- Target volume during disk replication (blockCopy)

Modifying the Drive object to keep monitoring info is hard and does not
look like the right way. The plan is to keep monitoring info in the
volume monitor.

When we want to monitor a volume, we will call:

   VolumeMonitor.start_monitoring(volume)

This will set a block threshold on the volume using libvirt, and keep
the volume state needed to handle block threshold events.

When we want to stop monitoring a volume, we will call:

   VolumeMonitor.stop_monitoring(volume)

This will disable block threshold events and remove the volume from
the monitor.

When we get a block threshold event, the monitor will update the volume
state.

In the periodic job, we will query the volumes that need to be extended
and extend them.

When a drive changes the path, for example at the end of live storage
migration or live merge, we will remove the old volume from the volume
monitor, and add the new volume.

Starting this work with renaming to make the intent of the code clear:

- drivemonitor module to thinp - this module will include all the code
  needed for handling thin provisioned block volumes.
- drivemonitor.DriveMonitor class to thinp.VolumeMonitor.
- VolumeMonitor.monitored_drives to VolumeMonitor.monitored_volumes.
- VM.drive_montior to VM.volume_monitor.
- VM.montior_drives to VM.montior_volumes
- periodic.DriveWatermarkMonitor to periodic.VolumeWatermarkMonitor.
- "drive monitor" to "volume montior" in doc/thin-provisioning.md.

Bug-Url: https://bugzilla.redhat.com/1913387
Signed-off-by: Nir Soffer <nsoffer@redhat.com>
This method became public for no reason when working on extending
volumes during live merge. Make it private again.

Bug-Url: https://bugzilla.redhat.com/1913387
Signed-off-by: Nir Soffer <nsoffer@redhat.com>
There is no reason to use __ prefix for the methods related to extending
drives. Using single _ prefix is enough to mark them as private. Using
__ is needed when you want to allow a subclass to override a super class
method but keep the option to call the super implementation.

Bug-Url: https://bugzilla.redhat.com/1913387
Signed-off-by: Nir Soffer <nsoffer@redhat.com>
I want to make _drive_volume_index public, since it is needed for moving
thin provisioning code out of VM class, so we need good way to name
drive related queries. Start by renaming existing queries.

Bug-Url: https://bugzilla.redhat.com/1913387
Signed-off-by: Nir Soffer <nsoffer@redhat.com>
This method works with any volume (e.g. scratch disk) and do not accept
a drive object, so there is no reason to include the drive in the name.

Bug-Url: https://bugzilla.redhat.com/1913387
Signed-off-by: Nir Soffer <nsoffer@redhat.com>
This is needed when we start to monitor drive active volume, and I want
to move that code to the thinp module. Rename to
query_drive_volume_index() to match query_drive_volume_chain() so we can
call it outside of the VM.

Bug-Url: https://bugzilla.redhat.com/1913387
Signed-off-by: Nir Soffer <nsoffer@redhat.com>
This method is a ugly hack for monitoring drives during disk
replication. This hack should be eliminated when finish volume
monitoring work, but for now we need it public so we can move the
monitoring code to thinp.

Bug-Url: https://bugzilla.redhat.com/1913387
Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Move VM.monitor_volumes() and VM.extend_drive_if_needed() to
thinp.VolumeMonitor. This decouples the VM class from the VolumeMonitor
class, which will make it easier to change the volume montior to work
with volumes instead of drives.

Bug-Url: https://bugzilla.redhat.com/1913387
Signed-off-by: Nir Soffer <nsoffer@redhat.com>
This should be used to extend any volume (e.g. scratch disk). Rename the
public and private methods to reflect the intended use.

Bug-Url: https://bugzilla.redhat.com/1913387
Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Order the extend methods according the flow:

- extend volume (entry point)
- extending replica (used during live storage migration)
- extending volume
- refreshing destination volume (used in migration)
- refreshing local volume
- verifying extension
- updating drive size
- resume vm

Most of these method should move to thinp module. Reordering them helps
to understand the dependencies and will make a cleaner patch when we
move them.

Bug-Url: https://bugzilla.redhat.com/1913387
Signed-off-by: Nir Soffer <nsoffer@redhat.com>
We have very confusing _resume_if_needed() and maybe_resume() - which
method should be called to resume a VM? Rename _resume_if_needed() to
extend_volume_completed(), live merge _extend_completed callback. Now
it is clear that this should be called only when extend flow completed,
and other code should call maybe_resume().

This new method is public so we can move the extend code to the thinp
module.

Bug-Url: https://bugzilla.redhat.com/1913387
Signed-off-by: Nir Soffer <nsoffer@redhat.com>
This will be called from volume monitor when extend volume completed
successfully, so the volume monitor does not have to care about VM
internals.

Bug-Url: https://bugzilla.redhat.com/1913387
Signed-off-by: Nir Soffer <nsoffer@redhat.com>
We had 2 confusing methods for refreshing the destination volume during
migration. Since both of them are small and simple, and the private
method should not be called in any other flow, merging them makes the
code easier to follow.

While merging the methods, streamline the way we raise on errors.

Bug-Url: https://bugzilla.redhat.com/1913387
Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Move the complex extending code to the thinp module. This will make it
easier to monitor volumes instead of drives. The VM is not affected now
by the way we monitor or extend volumes.

The VM part during disk extension is now:

- Call VolumeMonitor.monitor_volumes() periodically. This is done by
  the periodic module.

- Call VolumeMonitor.extend_volume() when volumes need to be
  pre-extended. This is called when starting disk replication, and
  before starting the commit phase in live merge. This will be
  eliminated when we start to monitor the relevant volume.

- Handle callbacks from VolumeMonitor._after_volume_extension:
  - Refresh migration volume
  - Refresh local volume
  - Resume if needed when extend volume completed

Because extending volumes is implemented now by the volume monitor, we
have to use the real object in live merge test.

Bug-Url: https://bugzilla.redhat.com/1913387
Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Unify the callbacks name with other callbacks. The callback is named
{method_name}_completed to make it easy to find the callback when you
look at the method, or the other way around.

The storage system refereed to the private callback for explaining why
we don't refresh extended volumes on the SPM. This is bad, requiring
unrelated changes when renaming private methods. Change the text to
explain what we do without referring to the actual method name.

Bug-Url: https://bugzilla.redhat.com/1913387
Signed-off-by: Nir Soffer <nsoffer@redhat.com>
@nirs
Copy link
Member Author

nirs commented Feb 2, 2022

@rebased on master for trying ost workflow again.

@nirs
Copy link
Member Author

nirs commented Feb 2, 2022

/ost

@sonarcloud
Copy link

sonarcloud bot commented Feb 2, 2022

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 26 Code Smells

No Coverage information No Coverage information
0.2% 0.2% Duplication

@nirs
Copy link
Member Author

nirs commented Feb 2, 2022

/ost

2 similar comments
@nirs
Copy link
Member Author

nirs commented Feb 2, 2022

/ost

@nirs
Copy link
Member Author

nirs commented Feb 2, 2022

/ost

@nirs
Copy link
Member Author

nirs commented Feb 2, 2022

OST check is broken, if fails with:

RuntimeError: GH pr/commit/run https://api.github.com/repos/oVirt/vdsm/pulls/58
had no artifacts with RPM files.

Looks like the integration with github does not work reliably yet, so we cannot use
use it yet.

@vjuranek vjuranek merged commit 6d53bed into oVirt:master Feb 2, 2022
@nirs nirs deleted the thinp branch March 20, 2022 12:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
verified Change was tested; please describe how it was tested in the PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants