Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce translog size and age based retention policies #25147

Merged
merged 17 commits into from
Jun 16, 2017

Conversation

bleskes
Copy link
Contributor

@bleskes bleskes commented Jun 9, 2017

This PR extends the TranslogDeletionPolicy to allow keeping the translog files longer than what is needed for recovery from lucene. Specifically, we allow specifying the total size of the files and their maximum age (i.e., keep up to 512MB but no longer than 3 hours). This will allow making ops based recoveries more common.

To achieve the above the translog files are extended to contain a timestamp, indicating when they were created.

Note that the default size and age still set to 0, maintaining current behavior. This is needed as the other components in the system are not yet ready for a longer translog retention. I will adapt those in follow up PRs.

Relates to #10708

Copy link
Contributor

@s1monw s1monw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left a couple of comments

final IndexSettings indexSettings = engineConfig.getIndexSettings();
translogDeletionPolicy.setMaxRetentionAgeInMillis(indexSettings.getTranslogRetentionAge().getMillis());
translogDeletionPolicy.setRetentionSizeInBytes(indexSettings.getTranslogRetentionSize().getBytes());
translog.trimUnreferencedReaders();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should do this in the settings loop. if these settings changed and the user wants them to take effect they should call _flush

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed.

return Math.min(minByAgeAndSize, Math.min(minByView, minTranslogGenerationForRecovery));
}

private long getMinTranslogGenBySize(List<TranslogReader> readers, TranslogWriter writer) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you wanna make them pkg private then you can test them in isolation. I'd love to see that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good idea. I took it a step further and made them static, makes things easier.

assert path.getFileName().toString().endsWith(Translog.TRANSLOG_FILE_SUFFIX) : "new file ends with old suffix: " + path;
assert checkpoint.numOps >= 0 : "expected at least 0 operatin but got: " + checkpoint.numOps;
assert checkpoint.numOps >= 0 : "expected at least 0 operation but got: " + checkpoint.numOps;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just an idea, should we use the last modification time on the file instead? I mean it should be ok to do that no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tja, maybe. I tried to avoid the entire what metadata is supported where discussion. I mean we walk to great length to avoid directory traversals. I think this is simple, solid and entirely self contained?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I agree with this. it's redundant information and no traversal needed. you have the file you just need to read the time created. I am really careful with any kind of added metadata here you know why.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am really careful with any kind of added metadata here you know why

I totally hear you. It's clearly and edge case. One other aspect of they way I set it up now is that it's easy to test. With FS based metadata we'd probably have to mock something, which is also doable but messier.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I lean towards using the filesystem mtime here. I do not think this would be messy to mock, only a method that you can override in tests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I personally feel it might end up messier but I trust your judgement. I'll roll back the extra inlined creation date and move to using FS mtime. We can always re-add the inline creation date if we run into trouble. Do-The-Minimal-Change-Needed™️

initialMinTranslogGen, minTranslogGenerationSupplier, System.currentTimeMillis());
}

static TranslogWriter create(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ugly, can you put multiple on the same line

…on_size_age

# Conflicts:
#	core/src/test/java/org/elasticsearch/index/translog/TranslogTests.java
# Conflicts:
#	core/src/test/java/org/elasticsearch/index/translog/TranslogTests.java
Copy link
Contributor

@s1monw s1monw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@jasontedor jasontedor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a few comments, all minor, I skipped all the parts that had anything to do with ctime given our previous discussion.

Property.IndexScope);

public static final Setting<TimeValue> INDEX_TRANSLOG_RETENTION_AGE_SETTING =
Setting.timeSetting("index.translog.retention.age", TimeValue.timeValueHours(-1), TimeValue.timeValueMillis(-1), Property.Dynamic,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default value violates the minimum value? I'm surprised that that's permitted and something doesn't break elsewhere but can you clean this up here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch

Setting.byteSizeSetting("index.translog.retention.size", new ByteSizeValue(-1, ByteSizeUnit.MB), Property.Dynamic,
Property.IndexScope);

public static final Setting<TimeValue> INDEX_TRANSLOG_RETENTION_AGE_SETTING =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you please add Javadocs explaining this setting?

@@ -111,6 +112,14 @@
Setting.byteSizeSetting("index.translog.flush_threshold_size", new ByteSizeValue(512, ByteSizeUnit.MB), Property.Dynamic,
Property.IndexScope);

public static final Setting<ByteSizeValue> INDEX_TRANSLOG_RETENTION_SIZE_SETTING =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you please add Javadocs explaining this setting?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Between size and age you have an inconsistent ordering, sometimes size then age, sometimes age then size. Can you please pick a consistent ordering for the pairing?

@@ -265,6 +276,9 @@ public IndexSettings(final IndexMetaData indexMetaData, final Settings nodeSetti
syncInterval = INDEX_TRANSLOG_SYNC_INTERVAL_SETTING.get(settings);
refreshInterval = scopedSettings.get(INDEX_REFRESH_INTERVAL_SETTING);
flushThresholdSize = scopedSettings.get(INDEX_TRANSLOG_FLUSH_THRESHOLD_SIZE_SETTING);
translogRetentionAge = scopedSettings.get(INDEX_TRANSLOG_RETENTION_AGE_SETTING);
translogRetentionSize = scopedSettings.get(INDEX_TRANSLOG_RETENTION_SIZE_SETTING);
flushThresholdSize = scopedSettings.get(INDEX_TRANSLOG_FLUSH_THRESHOLD_SIZE_SETTING);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This duplicates line 278?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep :(

private long maxRetentionAgeInMillis;

public TranslogDeletionPolicy(long retentionSizeInBytes, long maxRetentionAgeInMillis) {
this.retentionSizeInBytes = retentionSizeInBytes;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why no max on this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good question. Will add.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will actually remove the max from the other one. Shorter.

long minByAge = getMinTranslogGenByAge(readers, writer, maxRetentionAgeInMillis, currentTime());
long minBySize = getMinTranslogGenBySize(readers, writer, retentionSizeInBytes);
long minByAgeAndSize = Math.max(minByAge, minBySize);
if (minByAgeAndSize == Long.MIN_VALUE) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a matter of taste but I think this is clearer:

diff --git a/core/src/main/java/org/elasticsearch/index/translog/TranslogDeletionPolicy.java b/core/src/main/java/org/elasticsearch/index/translog/TranslogDeletionPolicy.java
index 9dd72526ca..216c1f243c 100644
--- a/core/src/main/java/org/elasticsearch/index/translog/TranslogDeletionPolicy.java
+++ b/core/src/main/java/org/elasticsearch/index/translog/TranslogDeletionPolicy.java
@@ -102,10 +102,12 @@ public class TranslogDeletionPolicy {
         long minByView = getMinTranslogGenRequiredByViews();
         long minByAge = getMinTranslogGenByAge(readers, writer, maxRetentionAgeInMillis, currentTime());
         long minBySize = getMinTranslogGenBySize(readers, writer, retentionSizeInBytes);
-        long minByAgeAndSize = Math.max(minByAge, minBySize);
-        if (minByAgeAndSize == Long.MIN_VALUE) {
+        final long minByAgeAndSize;
+        if (minBySize == Long.MIN_VALUE && minByAge == Long.MAX_VALUE) {
             // both size and age are disabled;
             minByAgeAndSize = Long.MAX_VALUE;
+        } else {
+            minByAgeAndSize = Math.max(minBySize, minByAge);
         }
         return Math.min(minByAgeAndSize, Math.min(minByView, minTranslogGenerationForRecovery));
     }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

}
translog.getDeletionPolicy().setRetentionAgeInMillis(randomBoolean() ? 100 : -1);
assertBusy(() -> {
try {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to clean this up but I need another PR

@bleskes
Copy link
Contributor Author

bleskes commented Jun 15, 2017

@s1monw @jasontedor can you take another look?

Copy link
Contributor

@s1monw s1monw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM thanks

Copy link
Member

@jasontedor jasontedor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me I think the only question is whether or not we should fsync the metadata when we close a writer into a reader. I do not seeing this being a question that should hold up this PR. I also do not see this being a challenging thing to do, we already sync (without syncing metadata) when we close into a reader, so I think we can merely push a boolean down to change this. Again, separate PR! LGTM.

@bleskes bleskes merged commit 9ddea53 into elastic:master Jun 16, 2017
@bleskes bleskes deleted the translog_rentention_size_age branch June 16, 2017 07:09
@bleskes
Copy link
Contributor Author

bleskes commented Jun 16, 2017

Thx @s1monw & @jasontedor for the multiple iterations

jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Jun 16, 2017
…y-context

* 'master' of github.com:elastic/elasticsearch: (21 commits)
  [DOCS] Clarify expected availability of HDFS for the HDFS Repository (elastic#25220)
  Remove some redundant 140 character checkstyle suppressions
  [Docs] more fix for the parent-join docs
  [Docs] Fix cross reference for parent-join field
  More advices around search speed and disk usage. (elastic#25252)
  Add documentation for the new parent-join field (elastic#25227)
  [analysis-icu] Allow setting unicodeSetFilter (elastic#20814)
  Introduce translog size and age based retention policies (elastic#25147)
  Add needs methods for specific variables to Painless script context factories. (elastic#25267)
  Improves snapshot logging and snapshoth deletion error handling (elastic#25264)
  Add unit test for PathHierarchyTokenizerFactory (elastic#24984)
  Deprecate tribe service
  Moved more token filters to analysis-common module.
  [Test] Make sure that SearchAfterSortedDocQueryTests uses a single threaded searcher
  [DOCS] Defined es-test-dir and plugins-examples-dir in index.asciidoc.  (elastic#25232)
  Test fix - removed superfluous assertion (elastic#25247)
  [Test] restore BWC for parent-join now that the new mapping format is in 5.x
  Add a section named "relations" in the ParentJoinFieldMapper (elastic#25248)
  test: Ported more OldIndexBackwardsCompatibilityIT tests to full cluster restart qa tests. (elastic#25173)
  fix: Sort Processor does not have proper behavior with targetField (elastic#25237)
  ...
bleskes added a commit that referenced this pull request Jun 22, 2017
#25147  added the translog deletion policy but didn't enable it by default. This PR enables a default retention of 512MB (same maximum size of the current translog) and an age of 12 hours (i.e., after 12 hours all translog files will be deleted). This increases to chance to have an ops based recovery, even if the primary flushed or the replica was offline for a few hours.

In order to see which parts of the translog are committed into lucene the translog stats are extended to include information about uncommitted operations.

Views now include all translog ops and guarantee, as before, that those will not go away. Snapshotting a view allows to filter out generations that are not relevant based on a specific sequence number.

Relates to #10708
@clintongormley clintongormley added :Distributed/Distributed A catch all label for anything in the Distributed Area. If you aren't sure, use this one. :Distributed/Engine Anything around managing Lucene and the Translog in an open shard. and removed :Translog :Distributed/Distributed A catch all label for anything in the Distributed Area. If you aren't sure, use this one. labels Feb 13, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed/Engine Anything around managing Lucene and the Translog in an open shard. >feature v6.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants