Expose age of oldest entry in the translog #28189
I think you can set the translog size limit to a big number and just monitor the size of the translog. That said, I also think it makes sense to expose this information in the stats. I'm marking this as "adopt me" and "low hanging fruit".
Also, if you want to put up a PR, I can help with guidance where needed.
Thanks, I'll put something together for this.
I had a look at this as a starting point for a contribution. Looking at how to implement it, it seems it would require:
As this is labelled "low hanging fruit", I am wondering whether I have thought about this correctly.
I did an initial pass at an implementation for this, and it is a bit trickier than it appears. The translog itself does not appear to track any timestamps; instead, it relies on the file system metadata (…). While it might be nice to make operations in the translog track a timestamp, I think that would be overkill for the problem at hand. What I was thinking is in … Alternatively, we could continue to rely on the file system metadata and say that this feature only supports file systems that track creation time.
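To make the file-system-metadata option concrete, here is a minimal Java sketch that reads both timestamps of a file with the standard `java.nio.file` API. The class `TranslogFileTimes`, the record `GenerationTimes`, and the use of a temp file as a stand-in for a translog generation are illustrative assumptions, not Elasticsearch code. The file system caveat lives in `BasicFileAttributes.creationTime()`: per its Javadoc, when creation time is not supported, an implementation-specific default (typically the last-modified time or the epoch) is returned.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.nio.file.attribute.FileTime;

public class TranslogFileTimes {
    // Hypothetical holder for the two timestamps discussed in the thread.
    record GenerationTimes(FileTime created, FileTime lastModified) {}

    // Reads both timestamps of one (hypothetical) translog generation file in a single call.
    static GenerationTimes readTimes(Path generationFile) throws IOException {
        BasicFileAttributes attrs = Files.readAttributes(generationFile, BasicFileAttributes.class);
        // creationTime() depends on file system support; without it, the JDK returns an
        // implementation-specific default (typically the last-modified time or the epoch).
        return new GenerationTimes(attrs.creationTime(), attrs.lastModifiedTime());
    }

    public static void main(String[] args) throws IOException {
        // A temp file stands in for a translog generation file in this sketch.
        Path file = Files.createTempFile("translog-", ".tlog");
        try {
            GenerationTimes times = readTimes(file);
            System.out.println("created:       " + times.created());
            System.out.println("last modified: " + times.lastModified());
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Because of that fallback behavior, a value returned by `creationTime()` alone cannot reliably tell you whether the file system actually tracks creation time.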
Thanks for the insight! Implementation of the file creation time with fallback to …
@justinwyer thanks for your interest.
This seems reasonable.
Why do you think we need something more? Don't we already have something to get a stats object?
What were you thinking of doing here? I think we can use the current list of readers. Something like …
I'd prefer to use exactly the same information that the deletion policy uses. Why did you think it needs to change to creation time? I'm fine with caching things on the readers. We have to figure out what the age stats should say about the writer; I'm OK with just returning now for that one.
The idea was that the age of the oldest operation would be the current time minus the timestamp of the first operation in the oldest generation. Since there isn't a timestamp per operation, this can be approximated by the creation time of the oldest generation. This would tell you how long the translog has you covered. It sounds like I was overcomplicating it, though; the last modified time of the oldest generation should be good too.
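Following the suggestion to reuse the same information as the deletion policy, a minimal Java sketch of the approximation: the age of the oldest entry is the current time minus the earliest last-modified time across the generation files, and it is 0 when there is nothing older to inspect (the "return now for the writer" option). The names `TranslogAge`, `ageFromTimes`, and `ageFromFiles` are hypothetical; this is not the Elasticsearch implementation.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class TranslogAge {
    /**
     * Hypothetical helper: approximates the age of the oldest translog entry as
     * nowMillis minus the earliest last-modified time among the generations.
     * With no generations to inspect, the earliest time stays at "now", so the age is 0.
     */
    static long ageFromTimes(long nowMillis, List<Long> lastModifiedMillis) {
        // Starting from nowMillis means timestamps in the future never make the age negative.
        long earliest = nowMillis;
        for (long t : lastModifiedMillis) {
            earliest = Math.min(earliest, t);
        }
        return nowMillis - earliest;
    }

    /** Reads each (hypothetical) generation file's last-modified time and applies ageFromTimes. */
    static long ageFromFiles(long nowMillis, List<Path> generationFiles) throws IOException {
        List<Long> times = new ArrayList<>();
        for (Path file : generationFiles) {
            times.add(Files.getLastModifiedTime(file).toMillis());
        }
        return ageFromTimes(nowMillis, times);
    }

    public static void main(String[] args) {
        long now = 10_000_000L;
        // Three generations last modified 20, 10 and 1 minute(s) before "now" (illustrative values).
        List<Long> times = List.of(now - 20 * 60_000L, now - 10 * 60_000L, now - 60_000L);
        System.out.println(ageFromTimes(now, times) / 60_000L + " minutes"); // prints "20 minutes"
    }
}
```

Taking the minimum rather than trusting list order keeps the sketch correct even if the generations are not passed oldest-first.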
Do we have a preference for time handling, i.e. using Joda …? As for the testing, are we happy with checking that we get a number and that it is older than some arbitrary number of milliseconds (at least 1)?
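One way to keep such a test deterministic, sketched here with `java.time` instead of Joda (an assumption, not a project decision), is to back-date a file's last-modified time with `Files.setLastModifiedTime` and then assert a loose lower bound on the computed age. The class name and the `ageOf` helper are hypothetical stand-ins for the code under test.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.time.Duration;
import java.time.Instant;

public class TranslogAgeTestSketch {
    // Hypothetical function under test: age of one generation file from its last-modified time.
    static Duration ageOf(Path generationFile, Instant now) throws IOException {
        Instant lastModified = Files.getLastModifiedTime(generationFile).toInstant();
        Duration age = Duration.between(lastModified, now);
        // A timestamp after "now" is reported as zero age rather than a negative duration.
        return age.isNegative() ? Duration.ZERO : age;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("translog-", ".tlog");
        try {
            // Back-date by 5 minutes so the check does not depend on how fast the test runs.
            Files.setLastModifiedTime(file, FileTime.from(Instant.now().minus(Duration.ofMinutes(5))));
            Duration age = ageOf(file, Instant.now());
            // Assert a loose lower bound; file systems may round stored timestamps.
            if (age.compareTo(Duration.ofMinutes(1)) < 0) {
                throw new AssertionError("expected age >= 1 minute, got " + age);
            }
            System.out.println("age check passed: " + age.getSeconds() + "s");
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Back-dating avoids a flaky "at least 1 ms" assertion on fast machines or on file systems with coarse timestamp granularity.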
…translog stats item. This uses the last modified time of the earliest generation.
Excellent, thanks @justinwyer and @bleskes
Expose the age of translog files in the translog stats. This is useful to reason about your translog retention policy. Closes #28189
As per the faster recovery benefits discussed here, it is desirable to find the right translog size for your particular setup. However, unless I am not looking in the right places, I haven't found a good way to figure out the right size of the translog from the information that is currently exposed. I think it would be helpful if the translog stats were extended to also expose the age of the oldest entry, to aid with monitoring and sizing the translog (`index.translog.retention.size`).

For example, suppose I typically expect my nodes to go down for ~30 minutes (e.g. maintenance, upgrades, patches). I would want to make sure that my translog is sized to hold operations for at least that long (probably longer, as a buffer).

If this feature were present, I might see that my shards have their translogs sized at 512mb and that the oldest entry is 20m old. I might then conclude that doubling my translog size will retain enough entries for 40 minutes, covering my expected downtime.
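The sizing reasoning above is a linear extrapolation: if 512mb currently spans 20 minutes, covering a target window needs roughly 512mb × (target / 20 minutes). A small Java sketch of that arithmetic follows; the helper name is hypothetical, and it assumes indexing throughput stays roughly constant, which real workloads may not.

```java
public class TranslogSizing {
    /**
     * Hypothetical helper: linearly extrapolates the retention size needed to cover
     * targetMinutes, assuming the rate of incoming operations stays roughly constant.
     */
    static long requiredSizeMb(long currentSizeMb, double oldestEntryAgeMinutes, double targetMinutes) {
        if (oldestEntryAgeMinutes <= 0) {
            throw new IllegalArgumentException("oldest entry age must be positive");
        }
        // Round up so the estimate never falls short of the target window.
        return (long) Math.ceil(currentSizeMb * (targetMinutes / oldestEntryAgeMinutes));
    }

    public static void main(String[] args) {
        // Numbers from the example: 512mb currently covers ~20 minutes.
        System.out.println(requiredSizeMb(512, 20, 40) + "mb for 40 minutes"); // prints "1024mb for 40 minutes"
        System.out.println(requiredSizeMb(512, 20, 30) + "mb for 30 minutes"); // prints "768mb for 30 minutes"
    }
}
```

Bursty indexing breaks the constant-rate assumption, which is why monitoring the age over time is more reliable than a one-off calculation.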
I realize that for the scenario I have described, you could do something like setting `index.translog.retention.age=40m` and `index.translog.retention.size=$big_number`, but it would be nice to take the guesswork out of the equation (what is `$big_number`?). By adding this stat, you could monitor it over time and alarm on it.
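As an illustration of alarming on the stat, here is a hedged Java sketch of a threshold check: alarm when the oldest entry is younger than the expected downtime times a safety factor. The class, the method, and the 25% buffer are illustrative assumptions; fetching the age value from the stats is left out.

```java
import java.time.Duration;

public class TranslogRetentionAlarm {
    /**
     * Hypothetical check: alarm when the translog's oldest entry is younger than the
     * expected downtime multiplied by a safety factor, i.e. the retained operations
     * would likely not cover a node outage of that length.
     */
    static boolean shouldAlarm(Duration oldestEntryAge, Duration expectedDowntime, double safetyFactor) {
        long requiredMillis = (long) Math.ceil(expectedDowntime.toMillis() * safetyFactor);
        return oldestEntryAge.toMillis() < requiredMillis;
    }

    public static void main(String[] args) {
        Duration downtime = Duration.ofMinutes(30);
        // With a 25% buffer, the translog must span at least 37.5 minutes.
        System.out.println(shouldAlarm(Duration.ofMinutes(20), downtime, 1.25)); // prints "true"
        System.out.println(shouldAlarm(Duration.ofMinutes(40), downtime, 1.25)); // prints "false"
    }
}
```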