Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce translog size and age based retention policies #25147

Merged
merged 17 commits into from Jun 16, 2017

Conversation

Projects
None yet
4 participants
@bleskes
Copy link
Member

commented Jun 9, 2017

This PR extends the TranslogDeletionPolicy to allow keeping the translog files longer than what is needed for recovery from lucene. Specifically, we allow specifying the total size of the files and their maximum age (i.e., keep up to 512MB but no longer than 3 hours). This will allow making ops based recoveries more common.

To achieve the above the translog files are extended to contain a timestamp, indicating when they were created.

Note that the default size and age still set to 0, maintaining current behavior. This is needed as the other components in the system are not yet ready for a longer translog retention. I will adapt those in follow up PRs.

Relates to #10708

bleskes added some commits Jun 9, 2017

@bleskes bleskes requested review from s1monw and jasontedor Jun 9, 2017

@s1monw
Copy link
Contributor

left a comment

left a couple of comments

final IndexSettings indexSettings = engineConfig.getIndexSettings();
translogDeletionPolicy.setMaxRetentionAgeInMillis(indexSettings.getTranslogRetentionAge().getMillis());
translogDeletionPolicy.setRetentionSizeInBytes(indexSettings.getTranslogRetentionSize().getBytes());
translog.trimUnreferencedReaders();

This comment has been minimized.

Copy link
@s1monw

s1monw Jun 9, 2017

Contributor

I don't think we should do this in the settings loop. if these settings changed and the user wants them to take effect they should call _flush

This comment has been minimized.

Copy link
@bleskes

bleskes Jun 9, 2017

Author Member

agreed.

return Math.min(minByAgeAndSize, Math.min(minByView, minTranslogGenerationForRecovery));
}

private long getMinTranslogGenBySize(List<TranslogReader> readers, TranslogWriter writer) {

This comment has been minimized.

Copy link
@s1monw

s1monw Jun 9, 2017

Contributor

do you wanna make them pkg private then you can test them in isolation. I'd love to see that

This comment has been minimized.

Copy link
@bleskes

bleskes Jun 9, 2017

Author Member

good idea. I took it a step further and made them static, makes things easier.

assert path.getFileName().toString().endsWith(Translog.TRANSLOG_FILE_SUFFIX) : "new file ends with old suffix: " + path;
assert checkpoint.numOps >= 0 : "expected at least 0 operatin but got: " + checkpoint.numOps;
assert checkpoint.numOps >= 0 : "expected at least 0 operation but got: " + checkpoint.numOps;

This comment has been minimized.

Copy link
@s1monw

s1monw Jun 9, 2017

Contributor

just an idea, should we use the last modification time on the file instead? I mean it should be ok to do that no?

This comment has been minimized.

Copy link
@bleskes

bleskes Jun 9, 2017

Author Member

tja, maybe. I tried to avoid the entire what metadata is supported where discussion. I mean we walk to great length to avoid directory traversals. I think this is simple, solid and entirely self contained?

This comment has been minimized.

Copy link
@s1monw

s1monw Jun 14, 2017

Contributor

I am not sure I agree with this. it's redundant information and no traversal needed. you have the file you just need to read the time created. I am really careful with any kind of added metadata here you know why.

This comment has been minimized.

Copy link
@bleskes

bleskes Jun 14, 2017

Author Member

I am really careful with any kind of added metadata here you know why

I totally hear you. It's clearly and edge case. One other aspect of they way I set it up now is that it's easy to test. With FS based metadata we'd probably have to mock something, which is also doable but messier.

This comment has been minimized.

Copy link
@jasontedor

jasontedor Jun 14, 2017

Member

I lean towards using the filesystem mtime here. I do not think this would be messy to mock, only a method that you can override in tests?

This comment has been minimized.

Copy link
@bleskes

bleskes Jun 14, 2017

Author Member

OK. I personally feel it might end up messier but I trust your judgement. I'll roll back the extra inlined creation date and move to using FS mtime. We can always re-add the inline creation date if we run into trouble. Do-The-Minimal-Change-Needed™️

initialMinTranslogGen, minTranslogGenerationSupplier, System.currentTimeMillis());
}

static TranslogWriter create(

This comment has been minimized.

Copy link
@s1monw

s1monw Jun 9, 2017

Contributor

ugly, can you put multiple on the same line

bleskes added some commits Jun 9, 2017

Merge remote-tracking branch 'upstream/master' into translog_rententi…
…on_size_age

# Conflicts:
#	core/src/test/java/org/elasticsearch/index/translog/TranslogTests.java
Merge branch 'master' into translog_rentention_size_age
# Conflicts:
#	core/src/test/java/org/elasticsearch/index/translog/TranslogTests.java

@bleskes bleskes requested a review from s1monw Jun 11, 2017

@s1monw

s1monw approved these changes Jun 14, 2017

Copy link
Contributor

left a comment

LGTM

@jasontedor
Copy link
Member

left a comment

I left a few comments, all minor, I skipped all the parts that had anything to do with ctime given our previous discussion.

Property.IndexScope);

public static final Setting<TimeValue> INDEX_TRANSLOG_RETENTION_AGE_SETTING =
Setting.timeSetting("index.translog.retention.age", TimeValue.timeValueHours(-1), TimeValue.timeValueMillis(-1), Property.Dynamic,

This comment has been minimized.

Copy link
@jasontedor

jasontedor Jun 15, 2017

Member

The default value violates the minimum value? I'm surprised that that's permitted and something doesn't break elsewhere but can you clean this up here?

This comment has been minimized.

Copy link
@bleskes

bleskes Jun 15, 2017

Author Member

good catch

Setting.byteSizeSetting("index.translog.retention.size", new ByteSizeValue(-1, ByteSizeUnit.MB), Property.Dynamic,
Property.IndexScope);

public static final Setting<TimeValue> INDEX_TRANSLOG_RETENTION_AGE_SETTING =

This comment has been minimized.

Copy link
@jasontedor

jasontedor Jun 15, 2017

Member

Would you please add Javadocs explaining this setting?

@@ -111,6 +112,14 @@
Setting.byteSizeSetting("index.translog.flush_threshold_size", new ByteSizeValue(512, ByteSizeUnit.MB), Property.Dynamic,
Property.IndexScope);

public static final Setting<ByteSizeValue> INDEX_TRANSLOG_RETENTION_SIZE_SETTING =

This comment has been minimized.

Copy link
@jasontedor

jasontedor Jun 15, 2017

Member

Would you please add Javadocs explaining this setting?

This comment has been minimized.

Copy link
@jasontedor

jasontedor Jun 15, 2017

Member

Between size and age you have an inconsistent ordering, sometimes size then age, sometimes age then size. Can you please pick a consistent ordering for the pairing?

@@ -265,6 +276,9 @@ public IndexSettings(final IndexMetaData indexMetaData, final Settings nodeSetti
syncInterval = INDEX_TRANSLOG_SYNC_INTERVAL_SETTING.get(settings);
refreshInterval = scopedSettings.get(INDEX_REFRESH_INTERVAL_SETTING);
flushThresholdSize = scopedSettings.get(INDEX_TRANSLOG_FLUSH_THRESHOLD_SIZE_SETTING);
translogRetentionAge = scopedSettings.get(INDEX_TRANSLOG_RETENTION_AGE_SETTING);
translogRetentionSize = scopedSettings.get(INDEX_TRANSLOG_RETENTION_SIZE_SETTING);
flushThresholdSize = scopedSettings.get(INDEX_TRANSLOG_FLUSH_THRESHOLD_SIZE_SETTING);

This comment has been minimized.

Copy link
@jasontedor

jasontedor Jun 15, 2017

Member

This duplicates line 278?

This comment has been minimized.

Copy link
@bleskes

bleskes Jun 15, 2017

Author Member

yep :(

private long maxRetentionAgeInMillis;

public TranslogDeletionPolicy(long retentionSizeInBytes, long maxRetentionAgeInMillis) {
this.retentionSizeInBytes = retentionSizeInBytes;

This comment has been minimized.

Copy link
@jasontedor

jasontedor Jun 15, 2017

Member

Why no max on this?

This comment has been minimized.

Copy link
@bleskes

bleskes Jun 15, 2017

Author Member

good question. Will add.

This comment has been minimized.

Copy link
@bleskes

bleskes Jun 15, 2017

Author Member

I will actually remove the max from the other one. Shorter.

long minByAge = getMinTranslogGenByAge(readers, writer, maxRetentionAgeInMillis, currentTime());
long minBySize = getMinTranslogGenBySize(readers, writer, retentionSizeInBytes);
long minByAgeAndSize = Math.max(minByAge, minBySize);
if (minByAgeAndSize == Long.MIN_VALUE) {

This comment has been minimized.

Copy link
@jasontedor

jasontedor Jun 15, 2017

Member

It's a matter of taste but I think this is clearer:

diff --git a/core/src/main/java/org/elasticsearch/index/translog/TranslogDeletionPolicy.java b/core/src/main/java/org/elasticsearch/index/translog/TranslogDeletionPolicy.java
index 9dd72526ca..216c1f243c 100644
--- a/core/src/main/java/org/elasticsearch/index/translog/TranslogDeletionPolicy.java
+++ b/core/src/main/java/org/elasticsearch/index/translog/TranslogDeletionPolicy.java
@@ -102,10 +102,12 @@ public class TranslogDeletionPolicy {
         long minByView = getMinTranslogGenRequiredByViews();
         long minByAge = getMinTranslogGenByAge(readers, writer, maxRetentionAgeInMillis, currentTime());
         long minBySize = getMinTranslogGenBySize(readers, writer, retentionSizeInBytes);
-        long minByAgeAndSize = Math.max(minByAge, minBySize);
-        if (minByAgeAndSize == Long.MIN_VALUE) {
+        final long minByAgeAndSize;
+        if (minBySize == Long.MIN_VALUE && minByAge == Long.MAX_VALUE) {
             // both size and age are disabled;
             minByAgeAndSize = Long.MAX_VALUE;
+        } else {
+            minByAgeAndSize = Math.max(minBySize, minByAge);
         }
         return Math.min(minByAgeAndSize, Math.min(minByView, minTranslogGenerationForRecovery));
     }

This comment has been minimized.

Copy link
@bleskes

bleskes Jun 15, 2017

Author Member

sure

bleskes added some commits Jun 15, 2017

}
translog.getDeletionPolicy().setRetentionAgeInMillis(randomBoolean() ? 100 : -1);
assertBusy(() -> {
try {

This comment has been minimized.

Copy link
@bleskes

bleskes Jun 15, 2017

Author Member

I'm going to clean this up but I need another PR

@bleskes bleskes requested review from s1monw and jasontedor Jun 15, 2017

@bleskes

This comment has been minimized.

Copy link
Member Author

commented Jun 15, 2017

@s1monw @jasontedor can you take another look?

@s1monw

s1monw approved these changes Jun 15, 2017

Copy link
Contributor

left a comment

LGTM thanks

@jasontedor
Copy link
Member

left a comment

For me I think the only question is whether or not we should fsync the metadata when we close a writer into a reader. I do not seeing this being a question that should hold up this PR. I also do not see this being a challenging thing to do, we already sync (without syncing metadata) when we close into a reader, so I think we can merely push a boolean down to change this. Again, separate PR! LGTM.

@bleskes bleskes merged commit 9ddea53 into elastic:master Jun 16, 2017

2 checks passed

CLA Commit author is a member of Elasticsearch
Details
elasticsearch-ci Build finished.
Details

@bleskes bleskes deleted the bleskes:translog_rentention_size_age branch Jun 16, 2017

@bleskes

This comment has been minimized.

Copy link
Member Author

commented Jun 16, 2017

Thx @s1monw & @jasontedor for the multiple iterations

jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Jun 16, 2017

Merge branch 'master' of github.com:elastic/elasticsearch into primar…
…y-context

* 'master' of github.com:elastic/elasticsearch: (21 commits)
  [DOCS] Clarify expected availability of HDFS for the HDFS Repository (elastic#25220)
  Remove some redundant 140 character checkstyle suppressions
  [Docs] more fix for the parent-join docs
  [Docs] Fix cross reference for parent-join field
  More advices around search speed and disk usage. (elastic#25252)
  Add documentation for the new parent-join field (elastic#25227)
  [analysis-icu] Allow setting unicodeSetFilter (elastic#20814)
  Introduce translog size and age based retention policies (elastic#25147)
  Add needs methods for specific variables to Painless script context factories. (elastic#25267)
  Improves snapshot logging and snapshoth deletion error handling (elastic#25264)
  Add unit test for PathHierarchyTokenizerFactory (elastic#24984)
  Deprecate tribe service
  Moved more token filters to analysis-common module.
  [Test] Make sure that SearchAfterSortedDocQueryTests uses a single threaded searcher
  [DOCS] Defined es-test-dir and plugins-examples-dir in index.asciidoc.  (elastic#25232)
  Test fix - removed superfluous assertion (elastic#25247)
  [Test] restore BWC for parent-join now that the new mapping format is in 5.x
  Add a section named "relations" in the ParentJoinFieldMapper (elastic#25248)
  test: Ported more OldIndexBackwardsCompatibilityIT tests to full cluster restart qa tests. (elastic#25173)
  fix: Sort Processor does not have proper behavior with targetField (elastic#25237)
  ...

bleskes added a commit that referenced this pull request Jun 22, 2017

Enable a long translog retention policy by default (#25294)
#25147  added the translog deletion policy but didn't enable it by default. This PR enables a default retention of 512MB (same maximum size of the current translog) and an age of 12 hours (i.e., after 12 hours all translog files will be deleted). This increases to chance to have an ops based recovery, even if the primary flushed or the replica was offline for a few hours.

In order to see which parts of the translog are committed into lucene the translog stats are extended to include information about uncommitted operations.

Views now include all translog ops and guarantee, as before, that those will not go away. Snapshotting a view allows to filter out generations that are not relevant based on a specific sequence number.

Relates to #10708

@clintongormley clintongormley added v6.0.0-beta1 and removed v6.0.0 labels Jul 25, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.