Describe the bug
A single recorded time (sample) affects a Timer's max() for longer than it affects the percentiles produced by the same Timer. With the default DistributionStatisticConfig, the effective expiry is one minute for the percentiles and three minutes for max. The issue stems from how TimeWindowMax and AbstractTimeWindowHistogram interpret the value of DistributionStatisticConfig.expiry. Both use ring buffers of the same size to implement the decay, but the former advances by only one buffer position per expiry interval, while the latter completes a full rotation of the buffer in that same time.
0s: (1.92937984E8 at 50.0%) max 200
30s: (1.92937984E8 at 50.0%) max 200
60s: (0.0 at 50.0%) max 200
90s: (0.0 at 50.0%) max 200
120s: (0.0 at 50.0%) max 200
150s: (0.0 at 50.0%) max 200
180s: (0.0 at 50.0%) max 0
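The timeline above is consistent with a simple ring-buffer model of the two decay schemes. The sketch below is not Micrometer's code: the class and method names (ExpiryModel, DecayingSlots, maxStyle, histogramStyle) are hypothetical, and it assumes that a sample is written to every slot, that each rotation clears the current slot and advances by one, and that reads come from the current slot. The only difference between the two variants is the rotation interval: expiry for the max-style buffer, expiry / bufferLength for the histogram-style buffer.

```java
import java.time.Duration;
import java.util.Arrays;

final class ExpiryModel {

    /** Simplified ring buffer: tracks only whether each slot still holds the sample. */
    static final class DecayingSlots {
        private final boolean[] hasSample;
        private final long rotateIntervalMillis;
        private int current = 0;
        private long lastRotateMillis = 0;

        DecayingSlots(int bufferLength, long rotateIntervalMillis) {
            this.hasSample = new boolean[bufferLength];
            this.rotateIntervalMillis = rotateIntervalMillis;
        }

        /** Records a sample into every slot (assumed behaviour of both classes). */
        void record(long nowMillis) {
            rotateTo(nowMillis);
            Arrays.fill(hasSample, true);
        }

        /** Reads from the current slot only. Call with non-decreasing timestamps. */
        boolean sampleVisible(long nowMillis) {
            rotateTo(nowMillis);
            return hasSample[current];
        }

        /** Clears the current slot and advances once per elapsed rotation interval. */
        private void rotateTo(long nowMillis) {
            while (nowMillis - lastRotateMillis >= rotateIntervalMillis) {
                hasSample[current] = false;
                current = (current + 1) % hasSample.length;
                lastRotateMillis += rotateIntervalMillis;
            }
        }
    }

    /** Max-style decay: one slot per expiry interval. */
    static DecayingSlots maxStyle(Duration expiry, int bufferLength) {
        return new DecayingSlots(bufferLength, expiry.toMillis());
    }

    /** Histogram-style decay: a full rotation of the buffer per expiry interval. */
    static DecayingSlots histogramStyle(Duration expiry, int bufferLength) {
        return new DecayingSlots(bufferLength, expiry.toMillis() / bufferLength);
    }

    public static void main(String[] args) {
        Duration expiry = Duration.ofMinutes(1); // assumed default expiry
        int bufferLength = 3;                    // assumed default buffer length
        DecayingSlots max = maxStyle(expiry, bufferLength);
        DecayingSlots histogram = histogramStyle(expiry, bufferLength);
        max.record(0);
        histogram.record(0);
        for (long seconds = 0; seconds <= 180; seconds += 30) {
            long now = seconds * 1000;
            System.out.println(seconds + "s: percentile sees sample=" + histogram.sampleVisible(now)
                    + ", max sees sample=" + max.sampleVisible(now));
        }
    }
}
```

Under these assumptions the sample stops contributing to the histogram-style buffer at 60 s and to the max-style buffer at 180 s, which matches the printed timeline.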
Expected behaviour
A single sample ceases to affect percentiles and timer max at the same point in time.
Related issues
In #2751 there is a complaint about max not expiring in the expected time, but the response was that TimeWindowMax uses expiry (and bufferLength) from DistributionStatisticConfig correctly.
gdabski changed the title from "Timer max expiry inconsistent with histogram expiry" to "Timer max expiry inconsistent with percentiles expiry" on Jul 16, 2022
I just ran into this issue.
I think I found the root cause: it is the way durationBetweenRotatesMillis is initialized. For some reason, the expiry time from the configuration is divided there by bufferLength (the ageBuckets variable), unlike in TimeWindowMax, where the expiry time is used directly. This means that the effective expiry time (the point at which the metric resets to 0 after a single request) is expiry * bufferLength for the max metric but only expiry for percentiles.
AFAIK the documentation does not reflect this difference, and I find it confusing, so I'm assuming it is a bug.
With the default configuration, where expiry/step is set to 1 minute and bufferLength is set to 3, and assuming metrics are scraped every minute, in the unfortunate case where a scrape occurs immediately after the buffers are rotated, requests from a 20-second window out of each minute (33% of the data) are not taken into account in the percentile metrics at all.
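The 20-second figure can be checked with a little arithmetic. The sketch below is a hypothetical helper, not Micrometer code. It assumes the same model as the comment: the histogram rotates every expiry / bufferLength, each rotation clears one bucket, and the bucket being read was last cleared bufferLength - 1 rotations before the most recent one. The names PercentileWindowCoverage, coveredMillis and missedFraction are made up for illustration.

```java
import java.time.Duration;

final class PercentileWindowCoverage {

    /** History held by the bucket being read, given the time since the last rotation. */
    static long coveredMillis(Duration expiry, int bufferLength, long millisSinceLastRotation) {
        long rotateIntervalMillis = expiry.toMillis() / bufferLength;
        // The current bucket was cleared (bufferLength - 1) rotations before the latest one.
        return (bufferLength - 1) * rotateIntervalMillis + millisSinceLastRotation;
    }

    /** Fraction of the scrape interval that the percentile snapshot does not cover. */
    static double missedFraction(Duration expiry, int bufferLength,
                                 long millisSinceLastRotation, Duration scrapeInterval) {
        long scrapeMillis = scrapeInterval.toMillis();
        long covered = Math.min(
                coveredMillis(expiry, bufferLength, millisSinceLastRotation), scrapeMillis);
        return 1.0 - (double) covered / scrapeMillis;
    }

    public static void main(String[] args) {
        Duration oneMinute = Duration.ofMinutes(1);
        // Worst case: the scrape lands right after a rotation (0 ms since last rotation).
        long covered = coveredMillis(oneMinute, 3, 0);
        double missed = missedFraction(oneMinute, 3, 0, oneMinute);
        System.out.println("covered ms: " + covered + ", missed fraction: " + missed);
    }
}
```

With expiry of one minute and a buffer length of 3, the worst case covers 40 000 ms of the last minute, so one third of the interval is missing from the percentiles.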
I'm planning to create a PR changing durationBetweenRotatesMillis initialization.