config: Increase thin provisioning threshold #82
Conversation
Some storage tests fail; I guess some tests wrongly assume the old chunk size instead of reading the configuration.
Can you elaborate a bit on the factors affecting the time it takes to extend? I mean besides the actual storage speed. Are there any waits in the process, communication with the SPM, etc.?
Also, does the write speed as perceived by the guest directly correspond to the actual physical write speed? We're using O_DIRECT everywhere, so I would assume yes? Then we could roughly estimate the minimal values for a concrete underlying write speed, e.g. if we measure the write speed outside of oVirt with fio, or even dd or something.
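As a rough illustration of measuring raw sequential write throughput on the host, outside of oVirt, here is a hedged Python sketch. The function name and sizes are arbitrary, not part of vdsm; O_DIRECT needs an aligned buffer, which an anonymous mmap provides. This only approximates what fio or dd would report.

```python
import mmap
import os
import time

def write_throughput(path, total_mb=256, block_mb=4, direct=True):
    """Sequentially write total_mb of zeros to path and return MiB/s.

    direct=True bypasses the page cache (O_DIRECT, Linux only); that
    requires an aligned buffer, and an anonymous mmap is page-aligned
    with a size that is a multiple of the logical block size.
    """
    block = block_mb * 1024 * 1024
    buf = mmap.mmap(-1, block)  # page-aligned, zero-filled buffer
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    if direct:
        flags |= os.O_DIRECT
    fd = os.open(path, flags, 0o644)
    try:
        start = time.monotonic()
        for _ in range(total_mb // block_mb):
            os.write(fd, buf)
        os.fsync(fd)
        elapsed = time.monotonic() - start
    finally:
        os.close(fd)
    return total_mb / elapsed
```

fio gives far more control (queue depth, random vs. sequential patterns, runtime-based measurement); this sketch only covers the streaming-write case comparable to dd.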
The time includes a lot of waiting, since we use polling.

We cannot optimize the storage mailbox much, since checking the mailbox requires … We can optimize the block threshold event handling - it should really post an …
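To illustrate why polling adds latency to the extend flow, here is a generic hedged sketch (not vdsm code): with a fixed poll interval, the caller observes completion up to one full interval after it actually happened, on top of the real SPM/mailbox work. An event-driven design removes that extra wait.

```python
import time

def wait_for(condition, interval=2.0, timeout=60.0):
    """Poll condition() every `interval` seconds until it returns True.

    Worst case, completion is observed almost a full `interval` after
    it actually happened -- the extra latency that polling adds
    compared to posting an event.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    return False
```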
The linter says that "from vdsm.config import config" in blockvolume_test is unused now and fails.
Yes, we use direct I/O, and the extend script is using direct I/O inside the … Measuring write throughput should be done in the guest; there is a big difference …
flake8 is correct, fixed in current version.
Yes, but isn't it just that it's always somewhat slower in the guest, never faster? In that case it is still useful, as we can then measure easily on the host and estimate for the worst case of near-host performance in the guest.
I don't see how it can be faster in the guest. But it is not possible to measure the host …
@vjuranek should be ready now.
Tests depending on configuration options must use a mock config object to avoid failing when the configuration is modified, or when running on a host with a non-default config. Signed-off-by: Nir Soffer <nsoffer@redhat.com>
The set threshold test was using a hard-coded value assuming the old vdsm configuration. Change the test to use the vdsm configuration so it does not break when the configuration is changed. Signed-off-by: Nir Soffer <nsoffer@redhat.com>
This horrible test was depending on the default configuration instead of using the config, and is written in a way that makes it hard to use the config. The horrible make_env() context manager using the crappy storagetestlib was mocking everything after creating the volumes, but the volumes need the mocking in place to consider the configuration. Change to create everything inside the mock context. Signed-off-by: Nir Soffer <nsoffer@redhat.com>
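The idea behind these test fixes can be sketched as follows. This is a hedged illustration, not vdsm's actual test helper: make_config here is a hypothetical stand-in built on plain configparser, showing how a test can read thresholds from an isolated config object instead of hard-coding the host's defaults.

```python
import configparser
from unittest import mock

def make_config(irs_options):
    """Build an isolated config with only the given [irs] options, so
    the test does not depend on the defaults of the host running it.
    (Hypothetical stand-in for vdsm's own test helper.)
    """
    cfg = configparser.ConfigParser()
    cfg.add_section("irs")
    for key, value in irs_options.items():
        cfg.set("irs", key, str(value))
    return cfg

cfg = make_config({"volume_utilization_chunk_mb": 2560,
                   "volume_utilization_percent": 20})
assert cfg.getint("irs", "volume_utilization_chunk_mb") == 2560
assert cfg.getint("irs", "volume_utilization_percent") == 20

# In a test, the module-level config the code under test reads could
# then be patched, e.g.:
#     with mock.patch("vdsm.config.config", cfg):
#         ...
```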
Collecting extend stats shows that extend takes between 2.2 and 6.2 seconds, with an average of 3.7 seconds. With the default thresholds:

[irs]
volume_utilization_chunk_mb = 1024
volume_utilization_percent = 50

we extend the volume when free space is 512 MiB. Writing more than 512 MiB in 3.7 seconds (138.4 MiB/s) will cause the VM to pause with ENOSPC.

This configuration was too low 10 years ago, and we need to update it for modern storage. Update the values to allow 4 times faster writes before we pause with ENOSPC. With the new configuration:

[irs]
volume_utilization_chunk_mb = 2560
volume_utilization_percent = 20

we extend the volume when free space is 2048 MiB.

Testing with the old and new configuration shows that we can now cope with a 4x faster write rate before VMs pause during extend.

Before:

write rate    extends    pauses
-------------------------------
 75 MiB/s          50         0
100 MiB/s          50         4
125 MiB/s          50         4
150 MiB/s          53        24

After:

write rate    extends    pauses
-------------------------------
200 MiB/s          20         0
250 MiB/s          20         0
300 MiB/s          20         0
350 MiB/s          21         0
400 MiB/s          20         1
450 MiB/s          20         2
500 MiB/s          22         7
550 MiB/s          23         7

The downside of this change is allocating more space in the storage domain. A new empty disk will consume 2.5 GiB instead of 1 GiB.

Bug-Url: https://bugzilla.redhat.com/2051997
Signed-off-by: Nir Soffer <nsoffer@redhat.com>
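The arithmetic behind these numbers can be checked with a small sketch. The formula (free space at extend = chunk_mb * (100 - percent) / 100) is inferred from the values quoted in the commit message; the function names are illustrative, not vdsm APIs.

```python
def free_at_extend_mb(chunk_mb, utilization_percent):
    """Free space (MiB) left in the volume when an extend is
    triggered, inferred from the values in the commit message."""
    return chunk_mb * (100 - utilization_percent) // 100

def max_write_rate(chunk_mb, utilization_percent, extend_seconds):
    """Highest sustained write rate (MiB/s) that does not exhaust the
    remaining free space while one extend completes."""
    return free_at_extend_mb(chunk_mb, utilization_percent) / extend_seconds

# Old defaults: 512 MiB headroom, so with the average 3.7 s extend a
# sustained rate above ~138 MiB/s pauses the VM.
assert free_at_extend_mb(1024, 50) == 512

# New defaults: 2048 MiB headroom -- ~553 MiB/s for the average 3.7 s
# extend, and still ~330 MiB/s for the slowest observed 6.2 s extend.
assert free_at_extend_mb(2560, 20) == 2048
```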
Kudos, SonarCloud Quality Gate passed! 0 Bugs. No coverage information.
4x is a big improvement. Still, with a bit over 6 seconds it means we will eventually pause with >300 MiB/s writes even in quiet conditions. Is it high enough? Should we go for 500, increasing the overhead? OTOH it's really not a big deal to pause occasionally; we just want to avoid frequent pauses every time e.g. etcd is extended. @mykaul FYI
Looks good - time to change the defaults indeed. |