Directory : provide a last_seen hint #183
Comments
Which component would handle the LASTSEEN query? |
It would be the Directory (through Egress). LASTSEEN would simply be a FIND that exports the lastseen field as a value. It could also be FIND itself, since the change is not intrusive and fits the current output format. /find could handle it with a &lastseen=true query string parameter (defaulting to false), appending the lastseen timestamp at the end of the output : |
I stated that Stores would be a better place to implement this, but on Ingress there is already a metadataCache that could be used for it. Today the assigned value is just null, but we could set it to the current timestamp whenever an update query is processed. A regular scan would then be performed over the map, comparing values to 'now', and would trigger pushMetadataMessage if now - lastseen > Configuration.DIRECTORY_GTS_FRESHNESS_LIMIT. Alternatively, metadataCache could just be an LRU, and on eviction a pushEvictionMetadataMessage would be sent and consumed by Directory to update its index entry. In this case, IndexSpec would just mark the GTS as "evicted". |
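As a minimal sketch of the LRU alternative described above, the class below keeps a last-seen timestamp per GTS id in an access-ordered `LinkedHashMap` and invokes a callback on eviction. The class name `LastSeenLru`, the `touch` method and the callback are hypothetical; the callback only stands in for the proposed pushEvictionMetadataMessage, and none of this is taken from the actual Ingress code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

/** Hypothetical sketch: LRU keyed by GTS id, value = last-seen time in milliseconds. */
final class LastSeenLru extends LinkedHashMap<Long, Long> {
  private final int capacity;
  // Stand-in for the proposed pushEvictionMetadataMessage (assumption, not the real API).
  private final BiConsumer<Long, Long> onEviction;

  LastSeenLru(int capacity, BiConsumer<Long, Long> onEviction) {
    super(16, 0.75f, true); // accessOrder = true gives LRU ordering
    this.capacity = capacity;
    this.onEviction = onEviction;
  }

  /** Records an update on the given GTS at the given wall-clock time. */
  void touch(long gtsId, long nowMillis) {
    put(gtsId, nowMillis);
  }

  @Override
  protected boolean removeEldestEntry(Map.Entry<Long, Long> eldest) {
    if (size() > capacity) {
      // Notify before the least recently updated entry is dropped.
      onEviction.accept(eldest.getKey(), eldest.getValue());
      return true;
    }
    return false;
  }

  public static void main(String[] args) {
    List<Long> evicted = new ArrayList<>();
    LastSeenLru cache = new LastSeenLru(2, (id, ts) -> evicted.add(id));
    cache.touch(1L, 1000L);
    cache.touch(2L, 2000L);
    cache.touch(1L, 3000L); // GTS 1 becomes the most recently used entry
    cache.touch(3L, 4000L); // capacity exceeded, GTS 2 is evicted
    System.out.println(evicted); // prints [2]
  }
}
```

`LinkedHashMap` is not thread-safe, so a real Ingress would need external synchronization or a concurrent cache around this.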
This behavior cannot be implemented as you suggest, because this would basically be the equivalent of a massive periodic cache flush of all the series which are actively updated. We will need to introduce some jitter so the pushes to the metadata topic are spread in time. The notion of 'lastseen' also needs to be defined more precisely, you seem to imply that it only tracks datapoints being pushed on the GTS, but what about deletes and/or attribute updates (meta)? Let's think about it more thoroughly and update this issue as ideas become less blurry. |
I agree on adding jitter to manage the cache flush. 'lastseen' corresponds to the last time we saw a datapoint for the GTS. It does not inherit the datapoint's own timestamp. Deletes would remove the GTS from both Directory and MetadataCache, regardless of the lastseen value. Attributes only operate at the Directory level and do not affect 'lastseen', which is only updated by Update queries. 'lastseen' is just a hint (no atomicity guarantee) that shows the last Update activity for a given GTS. If we use the standard pushMetadataMessage() for the cache flush, then whenever deletes are waiting to be consumed the flush needs to be stopped, reset, and delayed for a while. This prevents the cache flush from sending back a GTS that was deleted from Directory. Apart from that, pushMetadataMessage() could be leveraged as is, which avoids overly intrusive modifications. Basically it would work as if Ingress had no MetadataCache. This feature could be controlled through : // // // Example : ingress.metadata.cache.flush.enable = true => 24h to update 8.64M GTS ingress.metadata.cache.flush.period = 1000 => 24h to update 86.4M GTS (implies not receiving Delete queries) ingress.metadata.cache.flush.period = 100 => 24h to update 864M GTS (36M/h) |
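The figures quoted above can be checked with a little arithmetic. The unit of the flush period is not stated in the comment; the sketch below assumes it is the delay in microseconds between two pushed cache entries, which is the reading under which the 86.4M and 864M figures work out. The class and method names are hypothetical.

```java
/** Hypothetical helper: how many cache entries a throttled flush covers in 24h. */
final class FlushThroughput {
  static final long MICROS_PER_DAY = 24L * 60 * 60 * 1_000_000;

  /**
   * Entries advertised in 24h when one entry is pushed every periodMicros microseconds.
   * The microsecond unit is an assumption, not something the thread confirms.
   */
  static long entriesPerDay(long periodMicros) {
    return MICROS_PER_DAY / periodMicros;
  }

  public static void main(String[] args) {
    System.out.println(entriesPerDay(1000));     // 86400000  (86.4M GTS)
    System.out.println(entriesPerDay(100));      // 864000000 (864M GTS)
    System.out.println(entriesPerDay(100) / 24); // 36000000  (36M per hour)
  }
}
```

Under the same assumption, the 8.64M figure would correspond to a period of 10000.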
We can add a third parameter to the cache flush policy : ingress.metadata.cache.flush.enable = true This means that while iterating over the map, we compare each 'lastseen' value with the current timestamp. If (now - lastseen < TTL), we push. If (now - lastseen > TTL), we do nothing. This way we avoid pushing older GTS that have not seen any activity, which can dramatically reduce the number of messages. |
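The TTL rule above could look roughly like the following sketch, which scans a map of GTS id to last-seen time and keeps only the entries updated within the TTL. `FlushFilter` and `selectForPush` are hypothetical names, and the returned list merely stands in for the entries that would be handed to pushMetadataMessage.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of the TTL check applied during a cache flush. */
final class FlushFilter {
  /** Returns the ids of the GTS whose last update is more recent than ttlMillis. */
  static List<Long> selectForPush(Map<Long, Long> lastSeenById, long nowMillis, long ttlMillis) {
    List<Long> toPush = new ArrayList<>();
    for (Map.Entry<Long, Long> entry : lastSeenById.entrySet()) {
      // now - lastseen < TTL: the series is still active, advertise it.
      if (nowMillis - entry.getValue() < ttlMillis) {
        toPush.add(entry.getKey());
      }
      // Otherwise do nothing: no recent activity, so no message is sent.
    }
    return toPush;
  }

  public static void main(String[] args) {
    Map<Long, Long> cache = new ConcurrentHashMap<>();
    cache.put(1L, 9_500L); // updated 500 ms ago
    cache.put(2L, 1_000L); // updated 9 s ago
    System.out.println(selectForPush(cache, 10_000L, 1_000L)); // prints [1]
  }
}
```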
Summary of the latest proposal. New configuration keys :
The idea is to leverage the Ingress metadata cache for advertising updates on series. If (now - current_lastseen) < freshness : do nothing. Lastseen should be enabled by default. Since it will be documented, it should be the normal behaviour, and disabling it should be considered a custom deployment. From a user perspective : FIND : it could be integrated with FIND, answering a 0 TS key with lastseen as value :
LASTSEEN : if we don't want to break the FIND output, we can introduce a LASTSEEN function that is a FIND with the lastseen hint. FETCH / FIND : filtering on lastseen by passing PARAM_LASTSEEN (TS) and PARAM_LASTSEEN_FRESH (boolean) when using a MAP parameter instead of the default LIST parameter. DELETE : $WTOKEN 'selector' NULL NULL -n DELETE API /api/v0/find : the URL query parameter &lastseen=on would add the last seen timestamp as the value, as in the GTS InputFormat :
|
Current proposition in PR #200 |
To know whether a series is still active, you can fetch the last available datapoint and check whether its date matches your criteria.
If automatic data eviction is enabled, you won't even be able to look up this last datapoint.
It would be very useful to have a last_seen hint per GTS that gives the timestamp (second precision is enough) of the last update on that GTS.
There are two possibilities to implement this :
The Store component is CPU bound and uses less memory than Ingress, which maintains the metadata cache, so in a distributed deployment it would be better to implement this on the Store side.
It would maintain a structure, such as a concurrent hash map, mapping each TS ID to the timestamp of its last update.
Data could be sampled, and produced on a best-effort basis to either the Directory (standalone) or a dedicated Kafka topic.
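A minimal sketch of such a Store-side structure, assuming a `ConcurrentHashMap` keyed by a numeric TS ID and timestamps truncated to seconds, is shown below. The class `StoreLastSeen` and its methods are hypothetical and leave out the sampling and the publication to Directory or Kafka.

```java
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch: per-series last-seen tracking with second precision. */
final class StoreLastSeen {
  private final ConcurrentHashMap<Long, Long> lastSeenSeconds = new ConcurrentHashMap<>();

  /** Records an update on the series; an older timestamp never overwrites a newer one. */
  void record(long tsId, long nowMillis) {
    lastSeenSeconds.merge(tsId, nowMillis / 1000L, Long::max);
  }

  /** Returns the last-seen time in seconds, or null if the series was never recorded. */
  Long lastSeen(long tsId) {
    return lastSeenSeconds.get(tsId);
  }

  public static void main(String[] args) {
    StoreLastSeen tracker = new StoreLastSeen();
    tracker.record(7L, 5_000L);
    tracker.record(7L, 3_000L); // late, out-of-order call is ignored
    System.out.println(tracker.lastSeen(7L)); // prints 5
  }
}
```

Using `merge` with `Long::max` keeps the update atomic per key, which fits the "hint, best effort" semantics without any extra locking.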
The IndexSpec struct could carry the last_seen field so that we can ultimately perform a LASTSEEN with an optional parameter :
[ 'RTOKEN' 'class_pattern' { labels } ] LASTSEEN
result would be :
This way, we can leverage all the existing frameworks to manipulate the result (FILTER, ...) and easily get the series older than n days.
This would be very helpful to manage the Directory.