config: Increase thin provisioning threshold #82

Merged
merged 4 commits into from Feb 23, 2022

Conversation

nirs
Member

@nirs nirs commented Feb 22, 2022

Collecting extend stats shows that extend takes between 2.2 and 6.2
seconds, with an average of 3.7 seconds. With the default thresholds:

[irs]
volume_utilization_chunk_mb = 1024
volume_utilization_percent = 50

This means that we extend the volume when free space is 512 MiB. Writing
more than 512 MiB in 3.7 seconds (138.4 MiB/s) will cause the VM to
pause with ENOSPC.

This configuration was too low 10 years ago, and we need to update it
for modern storage. Update the values to allow 4 times faster writes
before we pause with ENOSPC.

With the new configuration:

[irs]
volume_utilization_chunk_mb = 2560
volume_utilization_percent = 20

We extend the volume when free space is 2048 MiB.
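
As a quick cross-check, here is a small Python sketch (not vdsm code; the
formula is implied by the values quoted above) computing the free-space
threshold for each configuration and the highest write rate an average
3.7-second extend can absorb:

def extend_threshold_mib(chunk_mb, utilization_percent):
    # Free space (MiB) at which we extend: 1024/50% -> 512, 2560/20% -> 2048.
    return chunk_mb * (100 - utilization_percent) / 100

AVG_EXTEND_SECONDS = 3.7

for chunk_mb, percent in [(1024, 50), (2560, 20)]:
    free_mib = extend_threshold_mib(chunk_mb, percent)
    rate = free_mib / AVG_EXTEND_SECONDS
    print(f"chunk={chunk_mb} percent={percent}: extend at {free_mib:.0f} MiB "
          f"free, ~{rate:.0f} MiB/s sustainable")

The old values give ~138 MiB/s, matching the number above; the new values
give ~554 MiB/s, which is consistent with the pause-free rates in the
tables below.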

Testing with the old and new configuration shows that we can now cope with a
4x faster write rate before VMs pause during extend.

Before:

write rate  extends   pauses
----------------------------
 75 MiB/s        50        0
100 MiB/s        50        4
125 MiB/s        50        4
150 MiB/s        53       24

After:

write rate  extends   pauses
----------------------------
200 MiB/s        20        0
250 MiB/s        20        0
300 MiB/s        20        0
350 MiB/s        21        0
400 MiB/s        20        1
450 MiB/s        20        2
500 MiB/s        22        7
550 MiB/s        23        7

The downside of this change is allocating more space in the storage
domain. A new empty disk will consume 2.5 GiB instead of 1 GiB.

Bug-Url: https://bugzilla.redhat.com/2051997
Signed-off-by: Nir Soffer nsoffer@redhat.com

@nirs nirs added verified Change was tested; please describe how it was tested in the PR storage labels Feb 22, 2022
@nirs nirs requested a review from tinez as a code owner February 22, 2022 10:12
bennyz previously approved these changes Feb 22, 2022
@nirs
Member Author

nirs commented Feb 22, 2022

Some storage tests fail, I guess some tests wrongly assume the old chunk size instead of
using the mocked config.

@nirs nirs marked this pull request as draft February 22, 2022 10:57
@michalskrivanek
Member

can you elaborate a little bit on the factors affecting the time it takes to extend? I mean besides the actual storage speed. Are there any waits in the process, communication with the SPM, etc?

@oVirt oVirt deleted a comment from ovirt-infra Feb 22, 2022
@michalskrivanek
Member

also, is the write speed as perceived by the guest directly corresponding to the actual physical write speed? We're using O_DIRECT everywhere so I would assume yes? So we can roughly estimate the minimal values for a concrete underlying write speed? E.g. if we try to measure the write speed outside of oVirt with fio or even dd or something...

@nirs
Member Author

nirs commented Feb 22, 2022

can you elaborate a little bit on the factors affecting the time it takes to extend? I mean besides the actual storage speed. Are there any waits in the process, communication with the SPM, etc?

The time includes a lot of waiting since we use polling:

  • The libvirt event thread gets a block threshold event and marks the drive
    for extension.
  • The periodic watermark monitor checks VMs every 2 seconds. When it finds
    that a drive needs extension, it sends a request to the SPM by writing
    to the storage mailbox.
  • The SPM checks storage every 2 seconds. When it finds the request, it
    runs the extend using the SPM mailbox thread pool.
  • The host polls its mailbox for replies every 2 seconds. When it detects
    the reply, it completes the extend on the host side and resumes the VM if
    needed (the timing sketch below sums these intervals).
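
As a rough back-of-the-envelope sketch of these polling delays (illustrative
only; the per-extend work time below is an assumption, not a measured value):

POLL_INTERVALS = [2.0, 2.0, 2.0]  # watermark monitor, SPM mailbox check, host reply poll
EXTEND_WORK = 0.5                 # assumed time for the actual extend on the SPM

best = EXTEND_WORK                           # every poll happens to fire right away
worst = sum(POLL_INTERVALS) + EXTEND_WORK    # every poll just missed its tick
print(f"extend latency: ~{best:.1f}s best case, ~{worst:.1f}s worst case")

This is roughly in line with the 2.2-6.2 second range measured above.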

We cannot optimize the storage mailbox much, since checking the mailbox requires
reading from storage, and many hosts may check the mailbox at the same time.
Maybe we can check every 1 second instead of every 2.

We can optimize the block threshold event handling - it should really post an
event that wakes up the watermark monitor immediately and starts the extend
flow. This would save 0-2 seconds from the total time. It requires a rewrite
of the watermark monitor, separating it from the periodic executor, which
is a change I have wanted to do for a long time.
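
A minimal sketch of that idea (illustrative only, not vdsm code): the monitor
sleeps on an event with a timeout instead of a fixed period, so a block
threshold event can wake it immediately:

import threading

wakeup = threading.Event()
pending = []  # drives marked for extension

def on_block_threshold(drive):
    # Called from the libvirt event thread: mark the drive and wake the monitor.
    pending.append(drive)
    wakeup.set()

def watermark_monitor(stop, interval=2.0):
    while not stop.is_set():
        wakeup.wait(timeout=interval)  # returns early when an event arrives
        wakeup.clear()
        while pending:
            print(f"requesting extend for {pending.pop()}")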

Member

@vjuranek vjuranek left a comment


the linter says that from vdsm.config import config in blockvolume_test is unused now and fails

@nirs
Member Author

nirs commented Feb 22, 2022

also, is the write speed as perceived by the guest directly corresponding to the actual physical write speed? We're using O_DIRECT everywhere so I would assume yes? So we can roughly estimate the minimal values for a concrete underlying write speed? E.g. if we try to measure the write speed outside of oVirt with fio or even dd or something...

Yes, we use direct I/O, and the extend script is using direct I/O inside the
guest, so what we write inside the guest is exactly what is written to the
actual storage.

Measuring write throughput should be done in the guest; there is a big difference
between guest performance and host performance. Measuring is complicated, and it
is impossible to predict how fast the disk will grow on a given system.
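
For illustration, a minimal sketch (assuming a Linux guest and a scratch path
you can write to; not part of vdsm or the extend script) of measuring
direct-I/O write throughput inside the guest, which is the rate that matters
for how fast the thin disk grows:

import mmap
import os
import time

PATH = "/var/tmp/write-test"   # hypothetical scratch file
BLOCK = 1024 * 1024            # 1 MiB per write
COUNT = 1024                   # 1 GiB in total

buf = mmap.mmap(-1, BLOCK)     # page-aligned buffer, required for O_DIRECT
buf.write(b"\x55" * BLOCK)

fd = os.open(PATH, os.O_WRONLY | os.O_CREAT | os.O_DIRECT, 0o644)
try:
    start = time.monotonic()
    for _ in range(COUNT):
        os.write(fd, buf)
    os.fsync(fd)
    elapsed = time.monotonic() - start
finally:
    os.close(fd)
    os.unlink(PATH)

print(f"{COUNT * BLOCK / (1024 ** 2) / elapsed:.1f} MiB/s")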

@nirs
Member Author

nirs commented Feb 22, 2022

the linter says that from vdsm.config import config in blockvolume_test is unused now and fails

flake8 is correct, fixed in the current version.

@michalskrivanek
Member

Measuring write throughput should be done in the guest; there is a big difference
between guest performance and host performance. Measuring is complicated, and it
is impossible to predict how fast the disk will grow on a given system.

yes, but isn't it just that it's always somewhat slower in the guest, never faster? In that case it is still useful, since we can then measure easily on the host and estimate for the worst case of near-host performance in the guest

@nirs
Member Author

nirs commented Feb 22, 2022

Measuring write throughput should be done in the guest; there is a big difference
between guest performance and host performance. Measuring is complicated, and it
is impossible to predict how fast the disk will grow on a given system.

yes, but isn't it just that it's always somewhat slower in the guest, never faster? In that case it is still useful, since we can then measure easily on the host and estimate for the worst case of near-host performance in the guest

I don't see how it can be faster in the guest. But it is not possible to measure on the host
and assume that the measurement will still hold later, because the load on the storage,
the network and the host at the time of the measurement can be different from the load at
the time the guest uses the storage.

@nirs nirs marked this pull request as ready for review February 22, 2022 16:16
@nirs
Member Author

nirs commented Feb 22, 2022

@vjuranek should be ready now.

@nirs nirs requested a review from bennyz February 22, 2022 16:17
Tests depending on configuration options must use a mock config object to
avoid failing when the configuration is modified, or when running on a host
with a non-default config.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
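
A minimal sketch of that pattern (illustrative only; the helper and names are
hypothetical, not the actual vdsm testlib API): the test reads the threshold
from a mocked config object instead of hard-coding the default value:

from unittest import mock

def chunk_size_mb(cfg):
    # Hypothetical production helper: reads the chunk size from the config.
    return cfg.getint("irs", "volume_utilization_chunk_mb")

def test_extend_uses_configured_chunk():
    cfg = mock.Mock()
    cfg.getint.return_value = 2560   # simulate a host with a non-default config
    assert chunk_size_mb(cfg) == 2560
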
The set threshold test was using a hard-coded value assuming the old vdsm
configuration. Change the test to use the vdsm configuration so it does not
break when the configuration is changed.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
This horrible test was depending on the default configuration instead of
using the config, and is written in a way that makes it hard to use the
config.

The horrible make_env() context manager using the crappy storagetestlib
was mocking everything only after creating the volumes, but the volumes need
the mocking in place to consider the configuration. Change the test to create
everything inside the mock context.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Collecting extend stats shows that extend takes between 2.2 and 6.2
seconds, with an average of 3.7 seconds. With the default thresholds:

[irs]
volume_utilization_chunk_mb = 1024
volume_utilization_percent = 50

This means that we extend the volume when free space is 512 MiB. Writing
more than 512 MiB in 3.7 seconds (138.4 MiB/s) will cause the VM to
pause with ENOSPC.

This configuration was too low 10 years ago, and we need to update it
for modern storage. Update the values to allow 4 times faster writes
before we pause with ENOSPC.

With the new configuration:

[irs]
volume_utilization_chunk_mb = 2560
volume_utilization_percent = 20

We extend the volume when free space is 2048 MiB.

Testing with the old and new configuration shows that we can now cope with a
4x faster write rate before VMs pause during extend.

Before:

write rate  extends   pauses
----------------------------
 75 MiB/s        50        0
100 MiB/s        50        4
125 MiB/s        50        4
150 MiB/s        53       24

After:

write rate  extends   pauses
----------------------------
200 MiB/s        20        0
250 MiB/s        20        0
300 MiB/s        20        0
350 MiB/s        21        0
400 MiB/s        20        1
450 MiB/s        20        2
500 MiB/s        22        7
550 MiB/s        23        7

The downside of this change is allocating more space in the storage
domain. A new empty disk will consume 2.5 GiB instead of 1 GiB.

Bug-Url: https://bugzilla.redhat.com/2051997
Signed-off-by: Nir Soffer <nsoffer@redhat.com>
@sonarcloud

sonarcloud bot commented Feb 22, 2022

Kudos, SonarCloud Quality Gate passed!

Bugs: 0 (A)
Vulnerabilities: 0 (A)
Security Hotspots: 0 (A)
Code Smells: 0 (A)

No Coverage information
Duplication: 0.0%

@michalskrivanek
Member

4x is a big improvement.
The extra overhead is small enough; 1.5 GiB extra for today's SD sizes is worth it.

Still, with a bit over 6 seconds it means we will still eventually pause with >300 MiB/s writes even in quiet conditions. Is it high enough? Should we go for 500, increasing the overhead? OTOH it's really not a big deal to pause occasionally; we just want to avoid frequent pauses every time e.g. etcd is extended.

@mykaul FYI
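
A quick check of that figure, using the 2048 MiB threshold from the
description and the worst-case extend time:

print(f"{2048 / 6.2:.0f} MiB/s")  # ~330 MiB/s of headroom during a ~6.2s extend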

@mykaul

mykaul commented Feb 22, 2022

4x is a big improvement. The extra overhead is small enough; 1.5 GiB extra for today's SD sizes is worth it.

Still, with a bit over 6 seconds it means we will still eventually pause with >300 MiB/s writes even in quiet conditions. Is it high enough? Should we go for 500, increasing the overhead? OTOH it's really not a big deal to pause occasionally; we just want to avoid frequent pauses every time e.g. etcd is extended.

@mykaul FYI

Looks good - time to change the defaults indeed.

@vjuranek vjuranek merged commit a90335c into oVirt:master Feb 23, 2022
@nirs nirs deleted the thinp-defaults branch March 20, 2022 12:16