Series quarantined on query #2478
Comments
Sadly, 1.5.0 and 1.5.1 were both corrupting time series data. Even after upgrading to 1.5.2, those corruptions will occasionally lead to series quarantining (once a corruption is hit when loading data from disk or persisting data to disk). The corruptions especially affected short-lived and/or sparse series, so you are right in the line of fire here.
Thank you for the quick response. Are there any patterns or methods that would make it possible to determine or predict which time series are affected?
The corruption happened under an unlucky coincidence of conditions, the main one being that the series was in memory despite having all its chunks persisted. This only happens to a series that no longer receives samples but is not yet archived. With a lot of series churn, you hit that case more often.
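The condition described above can be written down as a small predicate. This is only an illustrative sketch, not a diagnostic that Prometheus provides: the `SeriesState` record, its fields, and the `idle_after` threshold are all hypothetical, and the thread does not say how such per-series state could be obtained from a running server.

```python
from dataclasses import dataclass


@dataclass
class SeriesState:
    """Hypothetical snapshot of one series; not a Prometheus data structure."""
    series_id: str
    last_sample_ts: float       # Unix time of the most recent sample
    all_chunks_persisted: bool  # every chunk has already been written to disk
    archived: bool              # series has been moved out of memory


def matches_risky_condition(s: SeriesState, now: float, idle_after: float) -> bool:
    """True if the series is idle, fully persisted, and still held in memory."""
    idle = (now - s.last_sample_ts) >= idle_after
    return idle and s.all_chunks_persisted and not s.archived


def risky_series(states, now, idle_after):
    """Return the IDs of all series matching the condition, in sorted order."""
    return sorted(
        s.series_id for s in states
        if matches_risky_condition(s, now, idle_after)
    )
```

With heavy series churn, many series stop receiving samples at about the same time, so more of them would sit in this state at once.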
Thanks once again.
funkelnd closed this Mar 7, 2017
lock bot commented Mar 23, 2019
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
funkelnd commented Mar 7, 2017
What did you do?
Queried for a metric using the query_range endpoint. On the first try, got time series data.
What did you expect to see?
To see the same data returned on consecutive queries.
What did you see instead? Under which circumstances?
Got no data. Later, the only data available had timestamps newer than the second query.
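A minimal sketch of how this symptom could be checked from the client side, assuming the standard `/api/v1/query_range` HTTP endpoint with its `query`, `start`, `end`, and `step` parameters and its `matrix` response shape. The helper names and the base URL are hypothetical. The code only builds the request URL and compares two already-parsed JSON responses; fetching them is left to whatever HTTP client is in use.

```python
from urllib.parse import urlencode


def build_query_range_url(base_url, query, start, end, step):
    """Build a query_range URL for the given expression and time window."""
    params = urlencode({"query": query, "start": start, "end": end, "step": step})
    return f"{base_url.rstrip('/')}/api/v1/query_range?{params}"


def sample_timestamps(response):
    """Return the sorted, de-duplicated sample timestamps in a matrix response."""
    if response.get("status") != "success":
        return []
    stamps = set()
    for series in response.get("data", {}).get("result", []):
        for ts, _value in series.get("values", []):
            stamps.add(float(ts))
    return sorted(stamps)


def lost_samples(first, second):
    """Timestamps present in the first response but missing from the second."""
    seen_later = set(sample_timestamps(second))
    return [ts for ts in sample_timestamps(first) if ts not in seen_later]
```

Running the same range query twice over an identical window and passing both parsed responses to `lost_samples` should give an empty list on a healthy server. A non-empty list means samples that were returned earlier have since disappeared, which matches the behaviour reported here.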
Environment
There are about 300k metrics from various sources, queried at a 1-minute interval. Retention is set to 6 months and data has been gathered for 9 months, so retention kicked in long ago. A large number of labels is used, which are basically slightly modified versions of the metric names, so each individual metric has a unique label. I am aware that this is an anti-pattern, and this approach is being reworked. The same issue was present at least in 1.5.1. There have been ~250 occurrences in about 1.5 months, but none before.
Linux 2.6.32-573.22.1.el6.x86_64 x86_64
1.5.2