Eliminate the wait before extending a volume #124
Conversation
doc/thin-provisioning.md should be updated.
This document is stale and needs a rewrite. I will post another PR for this.
Not yet complete review, I'll continue tomorrow.
/ost
This version with per-device locks looks much better, good job. It looks overall good to me, just one possible concern and several typos and other documentation issues.
b7e4140 to e6f62ab
/ost
@mz-pdm it would be nice to have this in this week's build, to get more time for testing.
(There is no build this week.)
When monitoring drives, we have two expected threshold states:

- UNSET: If the drive has enough free space, set a block threshold so we get an event when the drive should be extended. If the drive does not have enough free space, mark it as EXCEEDED and start an extend flow.
- EXCEEDED: The drive was marked as EXCEEDED after receiving a block threshold or enospc event, or after we found that the drive does not have enough free space. We start an extend flow.

Extract _handle_unset() and _handle_exceeded(), each handling one threshold state. _should_extend_volume() was split into two simpler helpers, _can_extend_drive() and _drive_needs_extend().

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
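The two-state handling described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the actual vdsm code; the `BlockThreshold`, `Drive`, and helper names mirror the description but are assumptions.

```python
# Sketch of the two-state threshold handling (illustrative names).

class BlockThreshold:
    UNSET = "unset"
    EXCEEDED = "exceeded"


class Drive:
    def __init__(self, free_space, min_free_space):
        self.threshold_state = BlockThreshold.UNSET
        self.free_space = free_space
        self.min_free_space = min_free_space
        self.extend_requested = False
        self.threshold_set = False


def monitor_drive(drive):
    # Dispatch on the threshold state, one helper per state.
    if drive.threshold_state == BlockThreshold.UNSET:
        _handle_unset(drive)
    elif drive.threshold_state == BlockThreshold.EXCEEDED:
        _handle_exceeded(drive)


def _handle_unset(drive):
    # If the drive needs an extend, mark it EXCEEDED and start the
    # extend flow; otherwise arm a block threshold to get an event.
    if _drive_needs_extend(drive):
        drive.threshold_state = BlockThreshold.EXCEEDED
        _handle_exceeded(drive)
    else:
        drive.threshold_set = True


def _drive_needs_extend(drive):
    return drive.free_space < drive.min_free_space


def _handle_exceeded(drive):
    # In vdsm this starts the extend flow; here we only record it.
    drive.extend_requested = True
```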
The tests were using the old configuration (1 GiB chunk size, 50% utilization). This is correct, but makes it hard to simulate some cases. Update to the current configuration (2.5 GiB chunk size, 20% utilization).

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
The recent change to log a warning when an exceeded drive cannot be extended because it was already extended to the maximum size revealed that we try to monitor and extend such a drive every 2 seconds.

Example failure flow with a 20g drive:

1. Vdsm extends the drive to 22g and sets the threshold to 20g
2. The guest writes to offset 20g
3. Vdsm gets a block threshold event and marks the drive as exceeded
4. Every 2 seconds, vdsm tries to extend the drive and logs a warning:

WARN (periodic/0) [virt.vm] (vmId='d6eb4739-ccd3-4652-b6fc-7f4fd4a972ad') Drive already extended to maximum size ...

Why do we extend the drive to 22g if the drive capacity is 20g? When using the qcow2 format, writing 20g of guest data requires additional space for qcow2 metadata. Vdsm allows up to 1.1 * capacity (22g).

This is not a new issue - it has existed since we started using block thresholds a few years ago. But since we did not log anything, the bug was hidden, consuming CPU cycles and increasing the chance of filling up the executor queue and delaying other tasks.

To fix this issue, introduce a new threshold state: `DISABLED`. In this state the drive is not picked up for monitoring. If the drive is resized, we change the threshold state back to `UNSET` so the drive will be monitored again in the next monitor cycle.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
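A rough sketch of the `DISABLED` state described above, under assumed names (this is not the vdsm implementation): a drive already extended to the maximum allowed size (1.1 * capacity, to leave room for qcow2 metadata) is taken out of monitoring until it is resized.

```python
# Sketch of DISABLED state handling (illustrative, assumed names).

class Drive:
    def __init__(self, capacity, apparent_size):
        self.capacity = capacity
        self.apparent_size = apparent_size
        self.threshold_state = "unset"
        self.extend_started = False


def max_size(capacity):
    # Vdsm allows up to 1.1 * capacity for qcow2 metadata overhead.
    return int(capacity * 1.1)


def handle_exceeded(drive):
    if drive.apparent_size >= max_size(drive.capacity):
        # Instead of retrying (and warning) every 2 seconds,
        # take the drive out of monitoring.
        drive.threshold_state = "disabled"
    else:
        drive.extend_started = True


def handle_resize(drive, new_capacity):
    drive.capacity = new_capacity
    # Extending is possible again; monitor on the next cycle.
    drive.threshold_state = "unset"
```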
We want to extend drives as soon as possible when receiving events from libvirt. However, we may receive multiple events at the same time, or the periodic monitor may wake up at the same time we want to handle an event. Running the monitor concurrently can lead to double extend messages and confusing logs.

Handle multiple calls by locking the drive during monitoring. If another thread tries to monitor a drive while the drive monitor lock is held, it waits up to 0.5 seconds for the lock and skips the check on timeout. A new `thinp:monitor_timeout` option was added to control the timeout.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
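The per-drive lock with a short timeout could look like the sketch below. The 0.5 second default matches the commit message; the function name and signature are illustrative assumptions.

```python
import threading

MONITOR_TIMEOUT = 0.5  # thinp:monitor_timeout


def monitor_drive(drive, lock, check):
    # If another thread is already monitoring this drive, wait
    # briefly for the lock; on timeout skip this check rather than
    # queuing up duplicate work.
    if not lock.acquire(timeout=MONITOR_TIMEOUT):
        return False  # skipped; the other thread handles the drive
    try:
        check(drive)
        return True
    finally:
        lock.release()
```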
When a volume was refreshed, it was still marked as exceeded. If the periodic monitor wakes up at this point, it can try to extend the drive again. This is an unlikely event, but if it happens there is no way to fix it without shutting down the VM.

Avoid the race by taking the drive monitor lock before the refresh, and releasing the lock after we set a new block threshold. If the monitor tries to access the drive at this point, it waits until the completion callback finishes, or skips the check on timeout.

If we cannot acquire the drive monitor lock within 60.0 seconds, we abort the refresh to avoid blocking the mailbox thread pool. This may happen if the drive monitor lock is held by another monitor thread blocked on inaccessible storage. A new `thinp:refresh_timeout` option was added to control the timeout.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
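Guarding the refresh with the drive monitor lock might be sketched like this. Names are assumptions; `refresh` and `set_threshold` stand in for the real operations.

```python
import threading

REFRESH_TIMEOUT = 60.0  # thinp:refresh_timeout


def refresh_drive(drive, lock, refresh, set_threshold):
    # Abort rather than block the mailbox thread pool if the lock
    # is held too long, e.g. by a monitor blocked on bad storage.
    if not lock.acquire(timeout=REFRESH_TIMEOUT):
        raise TimeoutError("Timed out waiting for drive monitor lock")
    try:
        refresh(drive)
        # Set the new block threshold before releasing the lock, so
        # the monitor never sees a refreshed drive still marked as
        # exceeded.
        set_threshold(drive)
    finally:
        lock.release()
```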
To allow immediate extend when receiving block threshold or enospc events, we need to handle the case when the monitor wakes up soon after we extended the drive, finds that the drive is exceeded, and tries to extend it again. This creates duplicate extend requests that should be harmless, but they make it hard to analyze extend stats.

To avoid this issue we keep the time of the last extend in the drive. When the monitor handles an exceeded drive, it skips the drive if a previous extend was started recently and the extend timeout has not expired.

We also need to handle another case, when a drive was extended recently but setting the block threshold failed, and it should be extended again. If the guest writes quickly, before the extend timeout has expired, we want to extend the drive immediately.

To allow immediate extend, add an "urgent" argument to _handle_exceeded(). It is set when we find that the drive needs to be extended in _handle_unset(). It will also be used when receiving block threshold or enospc events.

A new `thinp:extend_timeout` option was added to control the timeout.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
If the monitor is configured with an executor dispatch function, it dispatches a call to extend the drive when getting a block threshold or enospc event. The drive will be extended on the first available executor worker. Using a dispatch function keeps the thinp module decoupled from the periodic module, makes testing easy, and prepares for removing the periodic monitor and using the thinp executor.

The periodic monitor checks if the VM is ready for commands before monitoring. Testing extending of 4 drives at the same time shows that this check returns False randomly for no reason. Since this causes a delay before extending the drive, we don't do this check when receiving an event. We also don't check migration status or use it when handling libvirt call failures, since it should not be needed for volume monitoring, which is disabled on the migration destination VM.

When extending a drive we hold the drive monitor lock, to avoid conflicts with the periodic monitor and extend completion threads.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
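The decoupling via a dispatch callable might look roughly like this. The class and method names are illustrative, not the vdsm API; the point is that the monitor never imports the periodic module directly.

```python
# Sketch: the monitor takes an injected dispatch callable.

class VolumeMonitor:
    def __init__(self, vm, dispatch=None):
        self._vm = vm
        self._dispatch = dispatch

    def on_block_threshold(self, drive):
        if self._dispatch is not None:
            # Run on the first available executor worker; no
            # "ready for commands" check on the event path.
            self._dispatch(lambda: self.extend_drive(drive))

    def extend_drive(self, drive):
        drive["extended"] = True
```

In tests, a synchronous stub such as `lambda f: f()` can stand in for the executor, which is part of what makes this design easy to test.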
Expose a periodic.dispatch() function allowing immediate dispatching of calls on the periodic executor. This is useful when you want to handle libvirt events on the periodic executor.

The first user of this facility is the thinp volume monitor. Now when we receive a block threshold or enospc event, we use the periodic dispatch to extend the relevant drive immediately. This eliminates the 0-2 seconds wait after receiving an event.

Here are test results from 4 runs, each writing 50 GiB to a thin disk at ~1300 MiB/s. Each run extends the disk 20 times. The VM was not paused during the test.

| time    | min  | avg  | max  |
|---------|------|------|------|
| total   | 0.77 | 1.15 | 1.39 |
| extend  | 0.55 | 0.92 | 1.14 |
| refresh | 0.16 | 0.22 | 0.31 |
| wait    | 0.01 | 0.01 | 0.03 |

Fixes: oVirt#85
Signed-off-by: Nir Soffer <nsoffer@redhat.com>
When receiving a block threshold event or when pausing because of ENOSPC,
extend the drive as soon as possible.
This change decreases the wait before sending an extend request from 0.0-2.0
seconds to 10-30 milliseconds.
Here are test results from 4 runs, each writing 50 GiB to a thin disk at
~1300 MiB/s. Each run extends the disk 20 times. The VM was not paused
during the test.
Fixes: #85