Highly uneven IO size during write on Zvol #3871
Would there be a change if you go below 1/2 (half) of your RAM size with the ARC? What are the other settings of your pools and zvols (compression? noatime? xattrs?)?
@redtex I suspect your device mapper devices appear to be non-rotational, which is causing 057b87c to launch [EDIT] It looks like you can poke a 1 into those files (echo 1 > /sys/block/dm-X/queue/rotational) before importing the pool, which might fix the problem if so.
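For reference, a minimal sketch of that workaround, assuming the pool is named tank and the dm-* devices are the ones backing it:

```sh
# Mark every device-mapper device as rotational before importing the pool
# ("tank" and the dm-* glob are placeholders; adjust to your setup)
for f in /sys/block/dm-*/queue/rotational; do
    echo 1 > "$f"
done
zpool import tank
```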
Hi !!!
I have set the ARC to 5G - same behavior.
comment below...
Average I/O size seems to be around 5k, confirmed by the iostat data above. This usually implies
This has nothing to do with the ARC, so you can look elsewhere. It is not clear, from the data provided, what the sample interval is. If the sample interval is small,
The sample interval is 1 second.
So, what would the advice be? To downgrade to 0.6.4.2?
Upd: so, with the help of iotop, I discovered that this 5-second cycle is the txg_sync process, which flushes async writes to the disks. But when I try to strace it, I get an error:
Maybe I can somehow get debug info another way?
@redtex these are kernel threads so you can't follow them with strace. My suggestion would be to first roll back to 0.6.4.2 and characterize the behavior there. Then we'll have a much better idea how it's changed and why that might be.
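As a side note, one way to watch transaction-group activity without strace is the per-pool txgs kstat, assuming your ZoL build exposes it ("tank" is a placeholder pool name):

```sh
# Keep a history of recent transaction groups (the default depth may be 0)
echo 100 > /sys/module/zfs/parameters/zfs_txg_history
# Per-pool TXG statistics: birth time, state, and bytes written per txg
cat /proc/spl/kstat/zfs/tank/txgs
```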
Is it possible to use a pool upgraded to 0.6.5 - features large_blocks and filesystem_limits are enabled, but not used - with zfs 0.6.4?
If you're able to import the pool r/w then it's safe to write to the pool.
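If it helps, the feature state can be checked directly; "enabled" (as opposed to "active") means no on-disk structures use the feature yet ("tank" is a placeholder pool name):

```sh
zpool get feature@large_blocks,feature@filesystem_limits tank
```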
Hi !!!
Where
Regards, Wadim.
@behlendorf are there any ideas?
@redtex Wandering back into this issue due to the #4512 reference. I've reviewed your last iostat output and it certainly does show a difference between the 2 top-level mirror vdevs. By any chance, was this pool originally created with a single mirror and the second mirror added later? Although an earlier
@redtex One other thing to check is that both your top-level vdevs have the same ashift. You can run |
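(The exact command was cut off above; one common way to inspect the per-vdev ashift is via zdb, with "tank" as a placeholder pool name:)

```sh
# Dump the cached pool configuration and pick out each top-level vdev's ashift
zdb -C tank | grep ashift
```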
@dweeezil Yes, the pool was originally created with two mirrored vdevs. The vdevs consist of four identical SAS 512-bytes/sector disks, so the ashift for each vdev is 9.
Today, I've discovered how to reproduce this issue.
Configuration:
zfs non-default tunables:
First, create a fresh mirrored pool with 2 vdevs:
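(The original commands are not preserved in this copy; a minimal sketch of the setup described - two mirror vdevs plus a 10GB test zvol, with placeholder device and pool names - might look like this:)

```sh
# Fresh pool with two top-level mirror vdevs (sdb..sde are placeholders)
zpool create tank mirror sdb sdc mirror sdd sde

# 10GB test zvol; primarycache=metadata matches the first test runs below
zfs create -V 10G tank/testvol
zfs set primarycache=metadata tank/testvol
```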
Run the fio test. Results for ZoL 0.6.4.2:
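(The fio job itself is also missing here; a hypothetical invocation matching the details mentioned later in the thread - random writes to the 10GB zvol for 5 minutes - could be the following, where ioengine, iodepth and block size are guesses:)

```sh
fio --name=zvol-test --filename=/dev/zvol/tank/testvol \
    --ioengine=libaio --direct=1 --rw=randwrite --bs=8k \
    --iodepth=16 --runtime=300 --time_based
```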
As you see, nothing unusual - all operations are spread equally across the physical disks. Results for ZoL 0.7.0-rc1:
Again, nothing unusual - all operations are spread equally across the physical disks. The results are even better than 0.6.4.2, especially writes. Now, turning on primarycache=all. Results for ZoL 0.6.4.2:
It's OK - all operations are spread equally across the physical disks. The results are almost the same, because the ARC is <=2GB but the test volume is 10GB. Drumroll......
Yes, that's it - one pair of disks is overloaded, and we see a slightly different avgrq-sz. So the results are twice as bad as without data caching:
@redtex Thanks. I'll get this set up on my test rig today.
@redtex Does this issue happen if the same test is run on the VM host?
@redtex To clarify further, you are running zfs in the guest, correct? I don't have a CentOS guest handy at the moment so will be running this in an Ubuntu 14.04 guest with a 3.19 kernel initially.
Yes, I'm running these tests on a CentOS 7 VM. But absolutely the same behaviour is present on bare hardware.
@redtex I just ran my first 2 tests: one with primarycache=metadata and the other with primarycache=all and didn't see much difference.
and
The iostat didn't show anything terribly weird. When run with a 1-second interval, the numbers were pretty much all over the place. This is with current master, so I'll be trying next with an actual 0.7.0-rc1 since that doesn't have the highly re-worked ARC code due to the compressed ARC. I may just run these tests on the host now if that makes no difference. For your VM guest, however, I was wondering if you used
Well... my fio test runs on a 4GB RAM virtual machine, i.e. the ARC has 2GB. So with this setup the ARC fills up in about 3 minutes. Until the ARC is fully filled, iostat does not show anything unusual. Maybe my disks (yes, they have the cache=none option) are too fast; maybe you have to run the tests somewhat longer than 5 min, which is the duration of my fio job.
Here are my results from Fedora 24, kernel 4.7.5-200.fc24.x86_64
and
The uneven IO is present, but more rarely than with the CentOS 7 kernel 3.10. Of course, the disks are the same - actually, it's the same pool from the CentOS tests, connected to the Fedora 24 VM.
Hi, I'm sorry if the info below isn't relevant, but at first glance the situation observed on FreeBSD looks similar to the current issue.
@igsol looks like you're pointing to https://www.illumos.org/issues/7090, which has already been ported to the master branch in 3dfb57a. EDIT: my bad, I mixed them up - it's a different commit and not ported to ZoL.
@igsol thanks for pointing this out. We should adapt the fix from FreeBSD and see how it impacts performance. However, I don't see how it could be the root cause of this exact issue. The problematic function was only first enabled by default in 0.7.0-rc3, and this issue predates that.
Sure, you are right. In any case I am glad that the suspicious comparison fixed in FreeBSD will get the attention of the right people in ZoL.
Upgraded production system from 0.6.4.2 to 0.7.6
Hi !!!
On a host - CentOS 7.1 3.10.0-229.14.1.el7.x86_64, 32G RAM, ZoL 0.6.5.2 - which serves VM images in zvols via iSCSI (SCST), there is a very strange situation:
After upgrading from 0.6.4 to 0.6.5 I noticed a significant performance drop - the disks appear to be nearly 100% busy in iostat. It looks like:
and zpool iostat is
So it is clearly seen that one of the mirrors has a many times smaller IO size (avgrq-sz), which leads to a huge performance drop.
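(For reference, avgrq-sz comes from iostat's extended device statistics; at the 1-second interval used here, the numbers presumably came from an invocation such as:)

```sh
# Extended per-device statistics at a 1-second interval;
# avgrq-sz is the average request size in 512-byte sectors
iostat -x 1
```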
And I noticed that such behavior starts after a significant period of uptime (several hours). After a system reboot.
The ARC size is 20G.