-
Notifications
You must be signed in to change notification settings - Fork 24.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce translog size and age based retention policies #25147
Introduce translog size and age based retention policies #25147
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
left a couple of comments
final IndexSettings indexSettings = engineConfig.getIndexSettings(); | ||
translogDeletionPolicy.setMaxRetentionAgeInMillis(indexSettings.getTranslogRetentionAge().getMillis()); | ||
translogDeletionPolicy.setRetentionSizeInBytes(indexSettings.getTranslogRetentionSize().getBytes()); | ||
translog.trimUnreferencedReaders(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should do this in the settings loop. if these settings changed and the user wants them to take effect they should call _flush
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed.
return Math.min(minByAgeAndSize, Math.min(minByView, minTranslogGenerationForRecovery)); | ||
} | ||
|
||
private long getMinTranslogGenBySize(List<TranslogReader> readers, TranslogWriter writer) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you wanna make them pkg private then you can test them in isolation. I'd love to see that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good idea. I took it a step further and made them static, makes things easier.
assert path.getFileName().toString().endsWith(Translog.TRANSLOG_FILE_SUFFIX) : "new file ends with old suffix: " + path; | ||
assert checkpoint.numOps >= 0 : "expected at least 0 operatin but got: " + checkpoint.numOps; | ||
assert checkpoint.numOps >= 0 : "expected at least 0 operation but got: " + checkpoint.numOps; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just an idea, should we use the last modification time on the file instead? I mean it should be ok to do that no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tja, maybe. I tried to avoid the entire what metadata is supported where discussion. I mean we walk to great length to avoid directory traversals. I think this is simple, solid and entirely self contained?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure I agree with this. it's redundant information and no traversal needed. you have the file you just need to read the time created. I am really careful with any kind of added metadata here you know why.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am really careful with any kind of added metadata here you know why
I totally hear you. It's clearly and edge case. One other aspect of they way I set it up now is that it's easy to test. With FS based metadata we'd probably have to mock something, which is also doable but messier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I lean towards using the filesystem mtime here. I do not think this would be messy to mock, only a method that you can override in tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. I personally feel it might end up messier but I trust your judgement. I'll roll back the extra inlined creation date and move to using FS mtime. We can always re-add the inline creation date if we run into trouble. Do-The-Minimal-Change-Needed™️
initialMinTranslogGen, minTranslogGenerationSupplier, System.currentTimeMillis()); | ||
} | ||
|
||
static TranslogWriter create( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ugly, can you put multiple on the same line
…on_size_age # Conflicts: # core/src/test/java/org/elasticsearch/index/translog/TranslogTests.java
# Conflicts: # core/src/test/java/org/elasticsearch/index/translog/TranslogTests.java
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a few comments, all minor, I skipped all the parts that had anything to do with ctime given our previous discussion.
Property.IndexScope); | ||
|
||
public static final Setting<TimeValue> INDEX_TRANSLOG_RETENTION_AGE_SETTING = | ||
Setting.timeSetting("index.translog.retention.age", TimeValue.timeValueHours(-1), TimeValue.timeValueMillis(-1), Property.Dynamic, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default value violates the minimum value? I'm surprised that that's permitted and something doesn't break elsewhere but can you clean this up here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch
Setting.byteSizeSetting("index.translog.retention.size", new ByteSizeValue(-1, ByteSizeUnit.MB), Property.Dynamic, | ||
Property.IndexScope); | ||
|
||
public static final Setting<TimeValue> INDEX_TRANSLOG_RETENTION_AGE_SETTING = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you please add Javadocs explaining this setting?
@@ -111,6 +112,14 @@ | |||
Setting.byteSizeSetting("index.translog.flush_threshold_size", new ByteSizeValue(512, ByteSizeUnit.MB), Property.Dynamic, | |||
Property.IndexScope); | |||
|
|||
public static final Setting<ByteSizeValue> INDEX_TRANSLOG_RETENTION_SIZE_SETTING = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you please add Javadocs explaining this setting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Between size and age you have an inconsistent ordering, sometimes size then age, sometimes age then size. Can you please pick a consistent ordering for the pairing?
@@ -265,6 +276,9 @@ public IndexSettings(final IndexMetaData indexMetaData, final Settings nodeSetti | |||
syncInterval = INDEX_TRANSLOG_SYNC_INTERVAL_SETTING.get(settings); | |||
refreshInterval = scopedSettings.get(INDEX_REFRESH_INTERVAL_SETTING); | |||
flushThresholdSize = scopedSettings.get(INDEX_TRANSLOG_FLUSH_THRESHOLD_SIZE_SETTING); | |||
translogRetentionAge = scopedSettings.get(INDEX_TRANSLOG_RETENTION_AGE_SETTING); | |||
translogRetentionSize = scopedSettings.get(INDEX_TRANSLOG_RETENTION_SIZE_SETTING); | |||
flushThresholdSize = scopedSettings.get(INDEX_TRANSLOG_FLUSH_THRESHOLD_SIZE_SETTING); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This duplicates line 278?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep :(
private long maxRetentionAgeInMillis; | ||
|
||
public TranslogDeletionPolicy(long retentionSizeInBytes, long maxRetentionAgeInMillis) { | ||
this.retentionSizeInBytes = retentionSizeInBytes; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why no max
on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good question. Will add.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will actually remove the max from the other one. Shorter.
long minByAge = getMinTranslogGenByAge(readers, writer, maxRetentionAgeInMillis, currentTime()); | ||
long minBySize = getMinTranslogGenBySize(readers, writer, retentionSizeInBytes); | ||
long minByAgeAndSize = Math.max(minByAge, minBySize); | ||
if (minByAgeAndSize == Long.MIN_VALUE) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a matter of taste but I think this is clearer:
diff --git a/core/src/main/java/org/elasticsearch/index/translog/TranslogDeletionPolicy.java b/core/src/main/java/org/elasticsearch/index/translog/TranslogDeletionPolicy.java
index 9dd72526ca..216c1f243c 100644
--- a/core/src/main/java/org/elasticsearch/index/translog/TranslogDeletionPolicy.java
+++ b/core/src/main/java/org/elasticsearch/index/translog/TranslogDeletionPolicy.java
@@ -102,10 +102,12 @@ public class TranslogDeletionPolicy {
long minByView = getMinTranslogGenRequiredByViews();
long minByAge = getMinTranslogGenByAge(readers, writer, maxRetentionAgeInMillis, currentTime());
long minBySize = getMinTranslogGenBySize(readers, writer, retentionSizeInBytes);
- long minByAgeAndSize = Math.max(minByAge, minBySize);
- if (minByAgeAndSize == Long.MIN_VALUE) {
+ final long minByAgeAndSize;
+ if (minBySize == Long.MIN_VALUE && minByAge == Long.MAX_VALUE) {
// both size and age are disabled;
minByAgeAndSize = Long.MAX_VALUE;
+ } else {
+ minByAgeAndSize = Math.max(minBySize, minByAge);
}
return Math.min(minByAgeAndSize, Math.min(minByView, minTranslogGenerationForRecovery));
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure
} | ||
translog.getDeletionPolicy().setRetentionAgeInMillis(randomBoolean() ? 100 : -1); | ||
assertBusy(() -> { | ||
try { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to clean this up but I need another PR
@s1monw @jasontedor can you take another look? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For me I think the only question is whether or not we should fsync the metadata when we close a writer into a reader. I do not seeing this being a question that should hold up this PR. I also do not see this being a challenging thing to do, we already sync (without syncing metadata) when we close into a reader, so I think we can merely push a boolean down to change this. Again, separate PR! LGTM.
Thx @s1monw & @jasontedor for the multiple iterations |
…y-context * 'master' of github.com:elastic/elasticsearch: (21 commits) [DOCS] Clarify expected availability of HDFS for the HDFS Repository (elastic#25220) Remove some redundant 140 character checkstyle suppressions [Docs] more fix for the parent-join docs [Docs] Fix cross reference for parent-join field More advices around search speed and disk usage. (elastic#25252) Add documentation for the new parent-join field (elastic#25227) [analysis-icu] Allow setting unicodeSetFilter (elastic#20814) Introduce translog size and age based retention policies (elastic#25147) Add needs methods for specific variables to Painless script context factories. (elastic#25267) Improves snapshot logging and snapshoth deletion error handling (elastic#25264) Add unit test for PathHierarchyTokenizerFactory (elastic#24984) Deprecate tribe service Moved more token filters to analysis-common module. [Test] Make sure that SearchAfterSortedDocQueryTests uses a single threaded searcher [DOCS] Defined es-test-dir and plugins-examples-dir in index.asciidoc. (elastic#25232) Test fix - removed superfluous assertion (elastic#25247) [Test] restore BWC for parent-join now that the new mapping format is in 5.x Add a section named "relations" in the ParentJoinFieldMapper (elastic#25248) test: Ported more OldIndexBackwardsCompatibilityIT tests to full cluster restart qa tests. (elastic#25173) fix: Sort Processor does not have proper behavior with targetField (elastic#25237) ...
#25147 added the translog deletion policy but didn't enable it by default. This PR enables a default retention of 512MB (same maximum size of the current translog) and an age of 12 hours (i.e., after 12 hours all translog files will be deleted). This increases to chance to have an ops based recovery, even if the primary flushed or the replica was offline for a few hours. In order to see which parts of the translog are committed into lucene the translog stats are extended to include information about uncommitted operations. Views now include all translog ops and guarantee, as before, that those will not go away. Snapshotting a view allows to filter out generations that are not relevant based on a specific sequence number. Relates to #10708
This PR extends the TranslogDeletionPolicy to allow keeping the translog files longer than what is needed for recovery from lucene. Specifically, we allow specifying the total size of the files and their maximum age (i.e., keep up to 512MB but no longer than 3 hours). This will allow making ops based recoveries more common.
To achieve the above the translog files are extended to contain a timestamp, indicating when they were created.
Note that the default size and age still set to 0, maintaining current behavior. This is needed as the other components in the system are not yet ready for a longer translog retention. I will adapt those in follow up PRs.
Relates to #10708