disk_wait displayed by zpool iostat depends on sampling interval #7694
Comments
|
I have to add that the wait time histogram profile does not depend (much) on the sampling period. Histogram:
|
Similarly, the asyncq_read pending count depends on the sampling interval. It may be my misunderstanding, but how is it possible to have 40 pending async reads with a 10 sec sampling period, 200 to 400 with 1 sec sampling, and 2K to 3K within a 100 ms sampling window?
|
I don't see a bug here. But know that due to the transactional nature of ZFS, continuous measurements sampled at rates faster than the commit interval are of questionable use. I suggest moving the conversation to the mailing list, as the bug tracker is not the appropriate forum.
|
I will move the discussion to the mailing list, but let me put a cross-check computation here to illustrate the point. "Sampling faster than the commit interval" is not the issue here, but I'll use longer interval durations for the illustration.

The histogram of wait times above was taken with a 5 second interval. The shape and position of the distribution are about the same when it is taken over a longer (or shorter) interval; it certainly does not change by a factor of two when I change the sampling rate by a factor of two.

average_wait_time = sum( N[i] * t[i] ) / sum( N[i] )

For the histogram below:

sum( N[i] ) = 741 + 343 + 379 + 496 + 1150 + 1390 + 1350 + 1017 + 374 + 76 + 3 = 7319
wait_time = 329702.579 / 7319 = 45.0 ms

From the shape of the distribution below it is apparent that 45 ms is the right number. The third entry in the original posting, taken with a 1 sec interval, gives a numerically consistent value: 40 ms ~= 45 ms. The measured "average" wait time of 8 ms taken with a 5 sec interval is not consistent with the 45 ms disk wait time computed from the histogram. The average disk wait time taken over 100 sec >> commit interval shows an average of 385 us << 45 ms. This is not right.

Here is an illustration that the shape of the density distribution of disk wait time over time does not depend on the sampling interval:

Overall, it seems the wait time computation has a bug and/or does not do proper normalization.
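A minimal sketch of this cross-check (only the counts come from the histogram discussed above; the per-bucket latencies below are hypothetical placeholders, not the actual ZFS histogram buckets):

```python
# Count-weighted average latency implied by a histogram:
#   average_wait_time = sum(N[i] * t[i]) / sum(N[i])
def histogram_average(counts, bucket_latencies_ms):
    total_requests = sum(counts)
    total_latency = sum(n * t for n, t in zip(counts, bucket_latencies_ms))
    return total_latency / total_requests

# Counts from the histogram in this comment.
counts = [741, 343, 379, 496, 1150, 1390, 1350, 1017, 374, 76, 3]
# Hypothetical per-bucket latencies (ms); with the real bucket values the
# numbers above give 329702.579 / 7319 ~= 45 ms.
buckets_ms = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]

print(sum(counts))                            # 7319 requests in total
print(histogram_average(counts, buckets_ms))  # weighted-average wait time (ms)
```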
|
The bug (with iostat -l) is a confusion of the units used: when calculating the first two (bandwidth and iops) it makes sense to do x/interval_duration (x being the increase in total bytes or number of requests over the duration of the interval, interval_duration in seconds) to scale from amount/interval to amount/second. But applying the same math to the latter (the *_wait latencies) is wrong, as there is no interval_duration component in those values (they are time/requests, giving average_time/request). Currently the only correct continuous *_wait figures from zpool iostat -l are with duration=1, as then the wrong math is a no-op (x/1).
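To make the units confusion concrete, here is a small sketch with made-up numbers (not actual zpool iostat output or code) showing which figures should be divided by the interval duration and which should not:

```python
# Illustrative per-interval deltas (all numbers invented).
interval_s = 5.0           # sampling interval in seconds
bytes_delta = 500_000_000  # bytes read during the interval
ops_delta = 10_000         # read requests completed during the interval
latency_sum_ms = 450_000   # sum of per-request wait times in the interval (ms)

# Throughput-style counters must be scaled by the interval duration:
bandwidth = bytes_delta / interval_s   # bytes per second
iops = ops_delta / interval_s          # operations per second

# A per-request average must NOT be scaled by the interval:
avg_wait_ok = latency_sum_ms / ops_delta                   # 45 ms per request
avg_wait_bug = (latency_sum_ms / ops_delta) / interval_s   # 9 ms: shrinks as the interval grows

print(bandwidth, iops, avg_wait_ok, avg_wait_bug)
```

With interval_s = 1 the two latency figures coincide, which is why only duration=1 currently reports correct *_wait values.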
|
IIRC, the calculation takes the diff of the latency histograms from before and after the interval and takes the average of that. It's not super accurate at small polling intervals or if there's not a lot of IO. I think we poll at 10 or 20 seconds in our sampling scripts.
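For illustration, a rough sketch of that diff-and-average calculation, with made-up cumulative histogram snapshots and hypothetical bucket latencies:

```python
# Diff two cumulative latency-histogram snapshots taken at the interval
# boundaries, then take the count-weighted average of the delta.
def interval_average(before, after, bucket_latencies_ms):
    delta = [b - a for a, b in zip(before, after)]  # requests completed per bucket
    requests = sum(delta)
    if requests == 0:
        return 0.0  # no I/O completed in the interval: nothing meaningful to average
    return sum(n * t for n, t in zip(delta, bucket_latencies_ms)) / requests

before = [10, 20, 5, 0]          # cumulative counts at interval start (invented)
after  = [12, 30, 9, 1]          # cumulative counts at interval end (invented)
buckets_ms = [1, 10, 100, 1000]  # hypothetical bucket latencies
print(interval_average(before, after, buckets_ms))
```

With only a handful of requests per interval, the per-bucket deltas are tiny, which is where the accuracy problem at small polling intervals comes from.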
|
Aren't there kstat_io and kstat_timer structures (starting at zero on import, never reset) already being maintained per pool and vdev? With these, the easiest approach would be to make a local copy of them (or the interesting parts) at the beginning of each interval, while at the same time diffing against the previous state to get the increments since the last cycle (with the local copy initialized to zero, this would automatically give 'since boot' figures), output the result, rinse & repeat. That should be way cheaper than having to sift through diffs of the histograms, and it would be accurate even for small polling intervals...
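A rough sketch of that cumulative-counter approach, using a hypothetical structure rather than the real kstat_io layout:

```python
from dataclasses import dataclass

@dataclass
class IoTotals:
    # Hypothetical monotonically increasing totals per pool/vdev.
    nread: int = 0    # total read requests since import
    nbytes: int = 0   # total bytes read since import
    wait_ns: int = 0  # total time requests spent waiting, in ns

def interval_stats(prev: IoTotals, cur: IoTotals, interval_s: float):
    ops = cur.nread - prev.nread
    bw = (cur.nbytes - prev.nbytes) / interval_s  # bytes/s: scaled by time
    iops = ops / interval_s                       # ops/s: scaled by time
    # Latency is per request, never divided by the interval duration.
    avg_wait_ms = ((cur.wait_ns - prev.wait_ns) / ops / 1e6) if ops else 0.0
    return bw, iops, avg_wait_ms

# Initializing prev to all zeros yields 'since boot' figures on the first cycle.
prev = IoTotals()
cur = IoTotals(nread=10_000, nbytes=500_000_000, wait_ns=450_000_000_000)
print(interval_stats(prev, cur, 5.0))  # (1e8 B/s, 2000 ops/s, 45.0 ms)
```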
|
@GregorKopka : I agree with your analysis. I found an old machine with ZFS on Solaris; the displayed disk latency value does not depend on the sampling interval and stays at 17 ms for the default, 10 sec, and 100 sec intervals:

Back to the Linux machine. Here is lengthy output taken from /usr/bin/iostat and zpool iostat. Drive reads:

Watch the stats and compare 'disk_wait read' with 'r_await'.
1 second:
10 second:
100 second:
|
I have a very similar (if not the same) problem here...
System information
Describe the problem you're observing
Describe how to reproduce the problem
As you can see, all latency numbers are expressed in … Rather, what … Using a 1s sampling interval, as suggested above, seems to produce reasonable numbers. Moreover, multiplying …
|
@shodanshok As you're after latency: the -w switch should give you what you're looking for.
|
Yes, the latency histogram is working properly. Thanks. |
Bandwidth and iops are averages per second, while *_wait values are averages per request for latency or, for queue depths, an instantaneous measurement at the end of an interval (according to man zpool).

When calculating the first two it makes sense to do x/interval_duration (x being the increase in total bytes or number of requests over the duration of the interval, interval_duration in seconds) to scale from amount/interval to amount/second. But applying the same math to the latter (*_wait latencies/queue depths) is wrong, as there is no interval_duration component in those values (they are either time/requests, giving average_time/request, or already an absolute number).

Because of this bug, the only correct continuous *_wait figures for both latencies and queue depths from 'zpool iostat -l/q' are with duration=1, as then the wrong math cancels itself (x/1 is a nop).

This commit removes temporal scaling from latency and queue depth figures.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gregor Kopka <gregor@kopka.net>
Closes openzfs#7945
Closes openzfs#7694
System information
Describe the problem you're observing
The disk_wait latency displayed by zpool iostat depends on the sampling interval of iostat (it should not).
The load is static during the test: I'm reading 16 files of 10 GB each with dd, and all samples were taken during the same run.
There is a computation error in the disk wait time: it should not depend on measurement or display options.
syncq_wait/asyncq_wait may need checking too.
Displayed disk wait vs. sampling interval:
Describe how to reproduce the problem
Read 10 GB files in 16 parallel streams to create read contention (a sketch follows below):
The zpool is raidz2 on 12 HDDs.
Run zpool iostat with sampling intervals of 5 sec, 2 sec, 1 sec, 0.5 sec, and 0.1 sec:
Run zpool iostat with a sampling interval of 100 sec:
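A rough reproduction sketch, assuming hypothetical file paths and pool name (the dd and zpool iostat invocations mirror the steps above):

```python
import subprocess

FILES = [f"/tank/testdata/file{i}.bin" for i in range(16)]  # hypothetical 10 GB files
POOL = "tank"                                               # hypothetical pool name

# Start 16 parallel sequential readers to create read contention.
readers = [
    subprocess.Popen(["dd", f"if={path}", "of=/dev/null", "bs=1M"])
    for path in FILES
]

# While the readers run, sample latency at different intervals and compare
# the reported disk_wait values, e.g. (interval then sample count):
#   zpool iostat -l tank 5 6
#   zpool iostat -l tank 1 30
#   zpool iostat -l tank 0.1 100
#   zpool iostat -l tank 100 2
subprocess.run(["zpool", "iostat", "-l", POOL, "5", "6"])

for p in readers:
    p.wait()
```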
Include any warning/errors/backtraces from the system logs