Random reads by 9 z_wr_iss threads crippling sequential write performance #7594
Comments
Bump. Does anyone know what the `z_wr_iss` threads are doing?
Are you running zed (the ZFS event daemon)?
It should be running by default, so yes.
If your pool has autoexpand=on you might be getting bitten by #7366. The
symptom sounds very familiar. Disabling either zed or autoexpand would
prevent the problem for now.
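For anyone hitting this later: a minimal sketch of how one might check for and temporarily rule out those two factors. The pool name `tank` and the `zfs-zed` systemd unit name are assumptions, not details from this thread.

```sh
# Check whether autoexpand is enabled on the pool (pool name is an assumption).
zpool get autoexpand tank

# Rule out #7366 by disabling autoexpand for the duration of the test.
zpool set autoexpand=off tank

# Or stop the ZFS event daemon instead (assumes the zfs-zed unit name;
# remember to start it again afterwards).
systemctl stop zfs-zed
```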
I can't help you then, sorry! ^_^
Layman's guess - you have dedup enabled, and you only have 16 GB of RAM. ~2.8 GB is taken up by your DDT (assuming it's all in memory). Writes trigger reads on dedup'd pools if the entire DDT is not in RAM.
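As a rough sanity check of that guess, the in-core DDT footprint can be read off the dedup summary that `zpool status -D` prints; a sketch, assuming a pool named `tank`:

```sh
# The summary line looks roughly like:
#   dedup: DDT entries N, size X on disk, Y in core
# where the in-core figure is the average per-entry size, so
# total DDT RAM usage is approximately N * Y.
zpool status -D tank | grep -i 'dedup: DDT'
```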
That's what I thought as well, but the ARC is almost never full, and limiting it to metadata only doesn't help either (it's just even smaller). Btw, is it intended that the ARC empties itself automatically when it isn't even near the limit?
The z_wr_iss taskq handles (wr)ites (iss)ued. In the ZIO pipeline, writes can cause reads, for example when space maps have to be read back in during allocation. In general, you will get a better response from the mailing list, because GitHub is a poor tool for this kind of discussion.
Thanks a bunch for your response, that was very insightful! Do you know of a way to force the space maps to be cached all the time?
Yes, the zfs-module-parameters(8) tunable to control unloading of space maps is …
I set …
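Assuming the tunable referred to above is `metaslab_debug_unload` (my reading; the thread does not spell it out), which keeps metaslabs, and hence their space maps, loaded once they have been read, a sketch of setting it on Linux:

```sh
# Assumption: metaslab_debug_unload is the tunable meant above.
# 1 = do not unload metaslab space maps once loaded (costs extra RAM).
echo 1 > /sys/module/zfs/parameters/metaslab_debug_unload

# Make it persistent across reboots (file name may differ per distro).
echo "options zfs metaslab_debug_unload=1" >> /etc/modprobe.d/zfs.conf
```

The trade-off is memory: pinned space maps are never evicted, which may matter on a 16 GB machine like the reporter's.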
This is old, but I'd like to offer a few suggestions that may help.

Step 1 with these situations is almost always to use `zpool iostat -r` to your advantage. Find out what is being read - is it pool data, or is it metadata? What is your typical recordsize? Are the reads showing up as sync or async?

If it's pool data (reads are the same size as your recordsize), then you may be seeing RMW reads during TxG commit, as it brings in the missing pieces of each block that needs to be written. These can get nasty and dominate the workload in some cases, as we found out on Solaris with ZFS in the early days. Generally, RMW for sync writes will create sync reads, while RMW for async writes will create async reads. If they look like they may be RMW reads, you can try the following.

1. Decrease your recordsize. Decreasing the recordsize has a number of effects and may or may not be a practical decision. You need to look at your write distribution and try to figure out if it's worth doing. ZFS does so much better with large records that I consider going below about 32K a last resort, except for small special-purpose cases.

2. Delay TxG commit. If you delay TxG commit, you may be able to get all the pieces of a block before you need to write it. This is usually a matter of growing the txg timeout, dirty_data_sync, and possibly dirty_data_max. It is most useful with applications that drip out small writes and periodically do a "checkpoint" with a lot of big sequential writes; MongoDB is one example.

3. Limit the TxG commit itself. The TxG commit, and the RMW reads that issue from it, can be limited with zfs_sync_taskq_batch_pct. The default (75%) is greatly oversized for most modern systems. Try decreasing it sharply (to 5-10%) and see if the commit and the reads get more orderly and stress the system less. This would be my first step. You can usually go quite low before the speed of the TxG commit itself is impacted.

4. Deprioritize RMW reads at the vdev level. If they're sync reads (RMW for earlier sync writes), there are a lot of ways to do this, but they tend to rely on the fact that sync reads are feast or famine: they are either not a problem or they bury the box. Try decreasing the sync read minimum active to 1-2, and you will move other IO classes much higher in the queueing order.

I'm working on an article to clarify this further. Hope this helps.
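A minimal sketch of how those suggestions map onto Linux module parameters; the pool name `tank` and the specific values are illustrative starting points, not settings endorsed by the thread:

```sh
# 1. Classify the reads: per-vdev request-size histograms, split into
#    sync/async read and write columns, refreshed every 5 seconds.
zpool iostat -r tank 5

# 2. Shrink the sync taskq so the TxG commit (and the RMW reads it issues)
#    is less bursty. Default is 75; the comment above suggests trying 5-10.
echo 10 > /sys/module/zfs/parameters/zfs_sync_taskq_batch_pct

# 3. Deprioritize sync reads at the vdev level (default min active is 10).
echo 2 > /sys/module/zfs/parameters/zfs_vdev_sync_read_min_active

# 4. Optionally stretch the TxG interval so partial records can fill in
#    before being committed (seconds; default is 5).
echo 15 > /sys/module/zfs/parameters/zfs_txg_timeout
```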
This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.
System information
Describe the problem you're observing
I have high random read access by 9 `z_wr_iss` threads while writing data to the zpool. Each of those threads reads 100-300k/s, which results in 100% disk utilization in `iostat -x` and cripples the sequential write performance of the writing application. After ~30s they stop reading and I get very good write performance until they start reading again after a couple of seconds.
I am not running into single- or multithreaded CPU bottlenecks and RAM is not limiting anything either (as far as I can tell).
Describe how to reproduce the problem
1. Write data to the zpool (e.g. from `/dev/urandom`, or copy any non-empty file).
2. Run `iotop` and sort by read speed.
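A sketch of the reproduction as concrete commands, assuming a dataset mounted at `/tank` (the actual path and sizes are not given in the issue):

```sh
# Step 1: write a large amount of data to the pool.
dd if=/dev/urandom of=/tank/testfile bs=1M count=10240 status=progress

# Step 2: in a second terminal, show only threads currently doing I/O;
# use the arrow keys to sort by DISK READ and watch for z_wr_iss threads.
iotop -o
```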
Include any warning/errors/backtraces from the system logs
There are no errors, only unexpected/unwanted behavior; I'll put stats here instead:
- `zpool status -D`
- `cat /proc/spl/kstat/zfs/arcstats`
- /r/zfs thread about my issue
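For reference, a minimal way to pull the arcstats fields relevant to the "ARC is almost never full" observation above (these are standard arcstats field names):

```sh
# Current ARC size vs. target (c) and maximum (c_max), plus metadata usage.
grep -E '^(size|c|c_max|arc_meta_used|arc_meta_limit) ' \
    /proc/spl/kstat/zfs/arcstats
```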