Introduce sequence-number-aware translog #22822

Merged

@jasontedor
Member

commented Jan 26, 2017

Today, the relationship between Lucene and the translog is rather simple: every document not in Lucene is guaranteed to be in the translog. We need a stronger guarantee from the translog though, namely that it can replay all operations after a certain sequence number. For this to be possible, the translog has to be made sequence-number aware. As a first step, we introduce the min and max sequence numbers into the translog so that each generation knows the possible range of operations contained in the generation. This will enable future work to keep around all generations containing operations after a certain sequence number (e.g., the global checkpoint).
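
To illustrate the future work mentioned in the description, here is a minimal sketch of how per-generation ranges could be used to decide which generations must be kept to replay everything above a given sequence number. The `GenerationRange` and `GenerationRetention` names are hypothetical and not part of this PR, and the sketch assumes every range describes a generation that holds at least one operation with an assigned sequence number.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical holder for the range of sequence numbers in one translog generation. */
final class GenerationRange {
    final long generation;
    final long minSeqNo;
    final long maxSeqNo;

    GenerationRange(long generation, long minSeqNo, long maxSeqNo) {
        if (minSeqNo > maxSeqNo) {
            throw new IllegalArgumentException("minSeqNo [" + minSeqNo + "] > maxSeqNo [" + maxSeqNo + "]");
        }
        this.generation = generation;
        this.minSeqNo = minSeqNo;
        this.maxSeqNo = maxSeqNo;
    }
}

/** Hypothetical helper; not an API from the PR. */
final class GenerationRetention {
    /**
     * Returns the generations that may contain an operation with a sequence number
     * strictly greater than the given one (for example, the global checkpoint).
     */
    static List<Long> generationsToRetain(List<GenerationRange> ranges, long seqNo) {
        final List<Long> retained = new ArrayList<>();
        for (GenerationRange range : ranges) {
            // a generation whose maximum is at or below seqNo cannot hold a later operation
            if (range.maxSeqNo > seqNo) {
                retained.add(range.generation);
            }
        }
        return retained;
    }
}
```

Only the maximum is needed for this particular decision; the minimum becomes useful when looking for the generation in which a given sequence number could first appear.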

Relates #10708

core/src/main/java/org/elasticsearch/index/translog/BaseTranslogReader.java Outdated
public abstract int totalOperations();
public abstract int totalOperations();

abstract long getMinSeqNo();

@nik9000

nik9000 Jan 26, 2017

Contributor

I think it'd be nice to have javadocs for these.

@s1monw

s1monw Jan 27, 2017

Contributor

++

@jasontedor

jasontedor Feb 1, 2017

Author Member

I pushed abd491cbf3a62f74d2c391876aee7b3e57166b0c.

this.offset = offset;
this.numOps = numOps;
this.generation = generation;
this.minSeqNo = minSeqNo;
this.maxSeqNo = maxSeqNo;

@nik9000

nik9000 Jan 26, 2017

Contributor

Is it worth checking that minSeqNo <= maxSeqNo?

@nik9000

nik9000 Jan 26, 2017

Contributor

Or maybe something like minSeqNo <= maxSeqNo || minSeqNo == Long.MAX_VALUE && maxSeqNo == Long.MIN_VALUE if we want to keep the merging easier.
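
The check proposed in this comment could look roughly like the sketch below. `CheckpointSketch` is a hypothetical, trimmed-down stand-in for the real `Checkpoint` class, and it throws an exception instead of using an `assert` so that the check runs even when assertions are disabled; that choice is an assumption of the sketch, not what the thread settled on.

```java
/** Simplified stand-in for the checkpoint discussed here; only the two new fields are kept. */
final class CheckpointSketch {
    final long minSeqNo;
    final long maxSeqNo;

    CheckpointSketch(long minSeqNo, long maxSeqNo) {
        if (isValidRange(minSeqNo, maxSeqNo) == false) {
            throw new IllegalArgumentException(
                "invalid sequence number range: min [" + minSeqNo + "], max [" + maxSeqNo + "]");
        }
        this.minSeqNo = minSeqNo;
        this.maxSeqNo = maxSeqNo;
    }

    /**
     * A range is valid if it is ordered, or if it is the "empty" sentinel pair
     * (Long.MAX_VALUE, Long.MIN_VALUE) used by the diff for a fresh translog.
     */
    static boolean isValidRange(long minSeqNo, long maxSeqNo) {
        final boolean empty = minSeqNo == Long.MAX_VALUE && maxSeqNo == Long.MIN_VALUE;
        return empty || minSeqNo <= maxSeqNo;
    }
}
```

The sentinel pair keeps merging simple because `Math.min(Long.MAX_VALUE, x)` and `Math.max(Long.MIN_VALUE, x)` both return `x`, so the first real sequence number replaces the sentinels without a special case.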

@bleskes

bleskes Jan 27, 2017

Member

+1 for assertions.

@jasontedor

jasontedor Feb 1, 2017

Author Member

I pushed e9370c84925e00d11d74e6785004c9de7f9feefc.

core/src/main/java/org/elasticsearch/index/translog/Translog.java Outdated
@@ -198,7 +198,7 @@ public Translog(
logger.debug("wipe translog location - creating new translog");
Files.createDirectories(location);
final long generation = 1;
Checkpoint checkpoint = new Checkpoint(0, 0, generation, globalCheckpointSupplier.getAsLong());
Checkpoint checkpoint = new Checkpoint(0, 0, generation, Long.MAX_VALUE, Long.MIN_VALUE, globalCheckpointSupplier.getAsLong());

@nik9000

nik9000 Jan 26, 2017

Contributor

Negative values for translog seem to have a special meaning. Maybe this should be 0, 0?

@nik9000

nik9000 Jan 26, 2017

Contributor

Or one of the constants in SequenceNumberService?

@bleskes

bleskes Jan 27, 2017

Member

+1 to constants. I'll comment about it at the end.

core/src/main/java/org/elasticsearch/index/translog/Translog.java Outdated
@@ -419,22 +419,21 @@ public Location add(Operation operation) throws IOException {
out.writeInt(operationSize);
out.seek(end);
final ReleasablePagedBytesReference bytes = out.bytes();
try (ReleasableLock lock = readLock.acquire()) {
try (final ReleasableLock ignored = readLock.acquire()) {

@nik9000

nik9000 Jan 26, 2017

Contributor

These are implicitly final already. I do like the name change though.

@jasontedor

jasontedor Feb 3, 2017

Author Member

I opened #22960 so this travesty does not happen again.

@nik9000

nik9000 Feb 3, 2017

Contributor

I wouldn't go so far as to say it is a travesty, but, thanks!

@jasontedor

jasontedor Feb 6, 2017

Author Member

I pushed 4311c1fc4bfeb9830e688d47ac3729b9d7acf8e3.

core/src/main/java/org/elasticsearch/index/translog/TranslogReader.java Outdated
protected final AtomicBoolean closed = new AtomicBoolean(false);

/**
* Create a reader of translog file channel. The length parameter should be consistent with totalOperations and point
* at the end of the last operation in this snapshot.
*/
public TranslogReader(long generation, FileChannel channel, Path path, long firstOperationOffset, long length, int totalOperations) {
public TranslogReader(
final long generation,

@nik9000

nik9000 Jan 26, 2017

Contributor

Can you indent these so they don't look like method body?

@s1monw

s1monw Jan 27, 2017

Contributor

++ I hate this - sorry

@jasontedor

jasontedor Feb 6, 2017

Author Member

I pushed 8786360f5f4f40fa31ab273464128abb34c440a4 and 48403f4baa62d1bd6223ea26a547ccb6b63ef6eb.

@bleskes

Member

commented Jan 26, 2017

cool.. I quickly read through it and the basics look good. I will give it a careful look tomorrow morning (I want to look critically at the initial min/max values and also w.r.t the BWC situation where incoming seqnos can be -1)

@s1monw
Contributor

left a comment

left some comments

@@ -35,11 +36,13 @@
import java.nio.file.OpenOption;
import java.nio.file.Path;

class Checkpoint {
final class Checkpoint {

@s1monw

s1monw Jan 27, 2017

Contributor

++

core/src/main/java/org/elasticsearch/index/translog/Checkpoint.java Outdated
}

// reads a checksummed checkpoint introduced in ES 5.0.0
static Checkpoint readChecksummedV1(DataInput in) throws IOException {
return new Checkpoint(in.readLong(), in.readInt(), in.readLong(), SequenceNumbersService.UNASSIGNED_SEQ_NO);
return new Checkpoint(

@s1monw

s1monw Jan 27, 2017

Contributor

can we wrap this in 2 lines instead of N this makes my eyes bleed

@jasontedor

jasontedor Feb 7, 2017

Author Member

That sounds horrible, you should really get that checked out. 😛

I pushed a commit that should address your concerns.

core/src/main/java/org/elasticsearch/index/translog/Translog.java Outdated
@@ -778,6 +777,8 @@ public static Type fromId(byte id) {

Source getSource();

long seqNo();

@s1monw

s1monw Jan 27, 2017

Contributor

can this be getSeqNo?

core/src/main/java/org/elasticsearch/index/translog/Translog.java Outdated
@@ -1147,6 +1150,7 @@ public String toString() {
private final long primaryTerm;
private final String reason;

@Override
public long seqNo() {

@s1monw

s1monw Jan 27, 2017

Contributor

maybe getSeqNo?

core/src/main/java/org/elasticsearch/index/translog/TranslogReader.java Outdated
@@ -116,7 +143,15 @@ public static TranslogReader open(FileChannel channel, Path path, Checkpoint che
throw new TranslogCorruptedException("expected shard UUID " + uuidBytes + " but got: " + ref +
" this translog file belongs to a different translog. path:" + path);
}
return new TranslogReader(checkpoint.generation, channel, path, ref.length + CodecUtil.headerLength(TranslogWriter.TRANSLOG_CODEC) + Integer.BYTES, checkpoint.offset, checkpoint.numOps);
return new TranslogReader(
checkpoint.generation,

@s1monw

s1monw Jan 27, 2017

Contributor

please stop doing this.

@bleskes

bleskes Jan 27, 2017

Member

maybe make the constructor accept a checkpoint and that will make many of these parameters go away..
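
A minimal sketch of this suggestion, under simplifying assumptions: the reader takes the checkpoint itself and derives generation, length and operation count from it rather than receiving each as a separate argument. `ReaderCheckpoint` and `ReaderSketch` are hypothetical stand-ins for `Checkpoint` and `TranslogReader`, and the channel and path parameters are left out to keep the example small.

```java
import java.util.Objects;

/** Hypothetical, trimmed-down checkpoint carrying only what the reader needs in this sketch. */
final class ReaderCheckpoint {
    final long offset;
    final int numOps;
    final long generation;

    ReaderCheckpoint(long offset, int numOps, long generation) {
        this.offset = offset;
        this.numOps = numOps;
        this.generation = generation;
    }
}

/** Hypothetical reader whose constructor accepts the checkpoint instead of its individual fields. */
final class ReaderSketch {
    private final ReaderCheckpoint checkpoint;
    private final long firstOperationOffset;

    ReaderSketch(ReaderCheckpoint checkpoint, long firstOperationOffset) {
        this.checkpoint = Objects.requireNonNull(checkpoint, "checkpoint");
        this.firstOperationOffset = firstOperationOffset;
    }

    long getGeneration() {
        return checkpoint.generation;
    }

    long getFirstOperationOffset() {
        return firstOperationOffset;
    }

    /** The checkpoint offset marks the end of the last operation, so it doubles as the length. */
    long sizeInBytes() {
        return checkpoint.offset;
    }

    int totalOperations() {
        return checkpoint.numOps;
    }
}
```

With this shape, adding further checkpoint fields such as the min and max sequence numbers does not change the constructor signature.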

core/src/test/java/org/elasticsearch/index/translog/TranslogVersionTests.java Outdated
channel,
path,
new Checkpoint(
Files.size(path),

@s1monw

s1monw Jan 27, 2017

Contributor

really?

@jasontedor

jasontedor Feb 7, 2017

Author Member

This is better now.

core/src/main/java/org/elasticsearch/index/translog/TranslogWriter.java Outdated
ex.addSuppressed(inner);
}
throw ex;
}
totalOffset += data.length();
operationCounter++;
minSeqNo = Math.min(minSeqNo, seqNo);

@s1monw

s1monw Jan 27, 2017

Contributor

can we add a comment that the order we assign these values matters to ensure we always have a >= relationship?

@bleskes

bleskes Jan 27, 2017

Member

we need to take the BWC aspect into account here. The incoming minSeqNo can be unassigned (if the ops come from an old primary) but as soon as we start receiving valid seqNo we should never go back. At least, I hope that's where we end up and I want to start building the assertions based on that.

core/src/main/java/org/elasticsearch/index/translog/TranslogWriter.java Outdated
@@ -191,6 +205,14 @@ public int totalOperations() {
return operationCounter;
}

public long getMinSeqNo() {

@s1monw

s1monw Jan 27, 2017

Contributor

I wonder if it's possible to read these out of order and therefore get inconsistent values? like if you read max first and then min? Also I wonder if the default value can be confusing?

@s1monw

s1monw Jan 27, 2017

Contributor

to be really consistent I think we need to return a tuple and read it under the lock?

@bleskes

bleskes Jan 27, 2017

Member

I think we can wait and see the usage (but document the caveat - other stats are not consistent either). At the moment it's only used for snapshots, which are created under lock. I think that the future for recovery/translog trimming will not care about the writer's values. An alternative is to not expose these but rather the last checkpoint, which is consistent.
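
For reference, the tuple-under-lock option raised in this thread might be sketched as follows. `SeqNoRangeTracker` and its nested `Range` are hypothetical names, and the sentinel initial values mirror the diff at this point of the review rather than any final decision.

```java
/** Hypothetical tracker; the names here are illustrative and not taken from the PR. */
final class SeqNoRangeTracker {

    /** Immutable pair so that callers always observe a min and max captured together. */
    static final class Range {
        final long minSeqNo;
        final long maxSeqNo;

        Range(long minSeqNo, long maxSeqNo) {
            this.minSeqNo = minSeqNo;
            this.maxSeqNo = maxSeqNo;
        }
    }

    private final Object lock = new Object();
    // sentinel initial values, as in the diff at this stage of the review
    private long minSeqNo = Long.MAX_VALUE;
    private long maxSeqNo = Long.MIN_VALUE;

    /** Records a sequence number; both bounds are updated while holding the lock. */
    void add(long seqNo) {
        synchronized (lock) {
            minSeqNo = Math.min(minSeqNo, seqNo);
            maxSeqNo = Math.max(maxSeqNo, seqNo);
        }
    }

    /** Reads both bounds under the same lock so a concurrent add cannot interleave. */
    Range range() {
        synchronized (lock) {
            return new Range(minSeqNo, maxSeqNo);
        }
    }
}
```

Two separate getters could return a min from before a concurrent `add` and a max from after it; returning the pair from one locked read avoids that. Exposing only the last synced checkpoint achieves the same consistency without a new type.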

@nik9000

nik9000 Feb 8, 2017

Contributor

Yeah, I'd prefer not to expose these unless we're sure we need them because of the concurrency issues. Or just expose them for testing if we need them there.

@bleskes

bleskes Feb 12, 2017

Member

@jasontedor it seems the consensus leans towards exposing/using the last checkpoint. I presume you had a reason to leave it as is. Can you elaborate?

@jasontedor

jasontedor Feb 16, 2017

Author Member

I pushed bdbbe83399a22d2df91aa5ed684d86d3579dc023.

@bleskes
Member

left a comment

I did a more detailed pass and left a bunch of comments.

On top of it, I think we should extend the translog stats with the min max (which might need the lock/tuple that Simon discussed)

About those constants. Without BWC I would argue we should initialize the min/max with the no ops performed constants. The tricky part is that with BWC the translog can end up full of ops and still the min/max would be no_ops_performed, which is strange and will lead to bugs imo. As such, I would recommend using unassigned until the first operation with a seq# enters the system, at which point we switch to normal min/max.

core/src/main/java/org/elasticsearch/index/translog/Checkpoint.java Outdated
in.readLong(),
in.readInt(),
in.readLong(),
SequenceNumbersService.NO_OPS_PERFORMED,

@bleskes

bleskes Jan 27, 2017

Member

I think we should use unassigned here too.

@jasontedor

jasontedor Feb 7, 2017

Author Member

This is addressed.

core/src/main/java/org/elasticsearch/index/translog/Checkpoint.java Outdated
}

// reads checkpoint from ES < 5.0.0
static Checkpoint readNonChecksummed(DataInput in) throws IOException {
return new Checkpoint(in.readLong(), in.readInt(), in.readLong(), SequenceNumbersService.UNASSIGNED_SEQ_NO);
return new Checkpoint(

@bleskes

bleskes Jan 27, 2017

Member

we can trash this in master, right?

@jasontedor

jasontedor Feb 7, 2017

Author Member

I pushed 8e9091bf41d2d28cd46485e760c235b895e8a65f.

core/src/main/java/org/elasticsearch/index/translog/TranslogWriter.java Outdated
@@ -295,7 +331,7 @@ public boolean syncUpTo(long offset) throws IOException {
try {
channel.force(false);
checkpoint =
writeCheckpoint(channelFactory, offsetToSync, opsCounter, globalCheckpoint, path.getParent(), generation);
writeCheckpoint(channelFactory, offsetToSync, opsCounter, minSeqNo, maxSeqNo, globalCheckpoint, path.getParent(), generation);

@bleskes

bleskes Jan 27, 2017

Member

we should capture these under lock.

@jasontedor

jasontedor Feb 7, 2017

Author Member

I pushed 7802d6ec2d2f3fdb2dbb1c79c60b3fe6c605291d.

@jasontedor

jasontedor Feb 7, 2017

Author Member

This is done in 7802d6ec2d2f3fdb2dbb1c79c60b3fe6c605291d.

core/src/main/java/org/elasticsearch/index/translog/TruncateTranslogCommand.java Outdated
@@ -168,7 +168,8 @@ protected void execute(Terminal terminal, OptionSet options, Environment env) th

/** Write a checkpoint file to the given location with the given generation */
public static void writeEmptyCheckpoint(Path filename, int translogLength, long translogGeneration) throws IOException {
Checkpoint emptyCheckpoint = new Checkpoint(translogLength, 0, translogGeneration, SequenceNumbersService.UNASSIGNED_SEQ_NO);
Checkpoint emptyCheckpoint =
new Checkpoint(translogLength, 0, translogGeneration, Long.MAX_VALUE, Long.MIN_VALUE, SequenceNumbersService.UNASSIGNED_SEQ_NO);

@bleskes

bleskes Jan 27, 2017

Member

I think we should use a static method on the translog class to make sure this is consistent with what the translog does when it creates an empty translog.

@jasontedor

jasontedor Feb 6, 2017

Author Member

I pushed e6b356d885d22524bc899987525d6e655fffd31a although I left unsettled the issue of the default values for the min and max sequence number (I will address these later).

core/src/test/java/org/elasticsearch/index/translog/TranslogTests.java Outdated
@@ -1208,12 +1220,12 @@ public void testRecoveryUncommittedCorruptedCheckpoint() throws IOException {
TranslogConfig config = translog.getConfig();
Path ckp = config.getTranslogPath().resolve(Translog.CHECKPOINT_FILE_NAME);
Checkpoint read = Checkpoint.read(ckp);
Checkpoint corrupted = new Checkpoint(0, 0, 0, SequenceNumbersService.UNASSIGNED_SEQ_NO);
Checkpoint corrupted = new Checkpoint(0, 0, 0, SequenceNumbersService.NO_OPS_PERFORMED, SequenceNumbersService.NO_OPS_PERFORMED, SequenceNumbersService.UNASSIGNED_SEQ_NO);

@bleskes

bleskes Jan 27, 2017

Member

this is another example of why we need the "empty" checkpoint utility - this is different than what the translog does.

@jasontedor

jasontedor Feb 6, 2017

Author Member

I pushed e6b356d885d22524bc899987525d6e655fffd31a although I left unsettled the issue of the default values for the min and max sequence number (I will address these later).

for (int i = 0; i < numOps; i++) {
out.reset(bytes);
out.writeInt(i);
writer.add(new BytesArray(bytes));
long seqNo;
do {

@bleskes

bleskes Jan 27, 2017

Member

once we add it, can we test that we do the right things with unassigned seq nos due to BWC?

@jasontedor

jasontedor Feb 7, 2017

Author Member

I pushed 390ee829db21bae606446a5393c35af055a47c02.

@bleskes bleskes referenced this pull request Feb 1, 2017

Closed

Add Sequence Numbers to write operations #10708

57 of 64 tasks complete

@jasontedor jasontedor force-pushed the jasontedor:introduce-sequence-number-aware-translog branch 2 times, most recently Feb 6, 2017

@jasontedor

Member Author

commented Feb 7, 2017

@nik9000 @bleskes @s1monw I think that this is ready for another round of reviews.

core/src/main/java/org/elasticsearch/index/translog/Checkpoint.java Outdated
// reads checkpoint from ES < 5.0.0
static Checkpoint readNonChecksummed(DataInput in) throws IOException {
return new Checkpoint(in.readLong(), in.readInt(), in.readLong(), SequenceNumbersService.UNASSIGNED_SEQ_NO);
final long minSeqNo = Translog.INITIAL_MIN_SEQ_NO;

@nik9000

nik9000 Feb 8, 2017

Contributor

I don't think this should have the same number as an empty checkpoint. That way we can tell the difference between a checkpoint that didn't have any sequence numbers and one that was empty. I guess we could just look at the size, but still, I think it'd be more clear to reserve some int for this.

core/src/main/java/org/elasticsearch/index/translog/Checkpoint.java Outdated
out.writeLong(globalCheckpoint);
}

static Checkpoint emptyTranslogCheckpoint(final long offset, final long generation, final long globalCheckpoint) {
return new Checkpoint(offset, 0, generation, Translog.INITIAL_MIN_SEQ_NO, Translog.INITIAL_MAX_SEQ_NO, globalCheckpoint);

@nik9000

nik9000 Feb 8, 2017

Contributor

Hmmm - now that you have it this way and those are the same numbers I think it'd be more clear to use UNASSIGNED_SEQ_NO. I think it is confusing to have three names for -2 because they'll all be unexpectedly equal, you know?

core/src/main/java/org/elasticsearch/index/translog/TranslogWriter.java Outdated
ex.addSuppressed(inner);
}
throw ex;
}
totalOffset += data.length();
operationCounter++;

if (minSeqNo == Translog.INITIAL_MIN_SEQ_NO) {

@nik9000

nik9000 Feb 8, 2017

Contributor

I'd be happy with Long.MAX_VALUE here so long as we're careful not to let them escape from this class.

@bleskes

bleskes Feb 12, 2017

Member

can we assert operation counter is 0? (and move the counter increment?)

@jasontedor

jasontedor Feb 16, 2017

Author Member

I pushed e8f11578cade5be8fdbb9c96d3478e07c0532ebf.

core/src/main/java/org/elasticsearch/index/translog/TranslogWriter.java Outdated
minSeqNo = Math.min(minSeqNo, seqNo);
}

if (maxSeqNo == Translog.INITIAL_MAX_SEQ_NO) {

@nik9000

nik9000 Feb 8, 2017

Contributor

Same here.

@jasontedor

jasontedor Feb 16, 2017

Author Member

I pushed e8f11578cade5be8fdbb9c96d3478e07c0532ebf.

@bleskes
Member

left a comment

Thx @jasontedor. I went through it again and I think we're close. Like Nik, I have concerns around having the new constants INITIAL_MIN_SEQ_NO. If we go the extra mile (which I think is good), I would go with setting the initial values to the intuitive NO_OPS_PERFORMED. Any incoming unassigned seq# should set it to UNASSIGNED_SEQ_NO and I think that from that point on all the logic works.

Also - did you see my ask for extending the translog stats?

core/src/main/java/org/elasticsearch/index/translog/TranslogWriter.java Outdated
final TranslogWriter writer =
new TranslogWriter(channelFactory, shardId, checkpoint, channel, file, bufferSize, globalCheckpointSupplier);
return writer;
writeCheckpoint(channelFactory, headerLength, 0, Translog.INITIAL_MIN_SEQ_NO, Translog.INITIAL_MAX_SEQ_NO, globalCheckpointSupplier.getAsLong(), file.getParent(), fileGeneration);

@bleskes

bleskes Feb 12, 2017

Member

shouldn't we use the createEmptyCheckpoint method?

@jasontedor

jasontedor Feb 16, 2017

Author Member

I pushed fc40b8f8e77b8cab1dd0544334e9d01f57b5f837.

core/src/main/java/org/elasticsearch/index/translog/TranslogWriter.java Outdated
minSeqNo = seqNo;
}
} else {
assert seqNo != SequenceNumbersService.UNASSIGNED_SEQ_NO;

@bleskes

bleskes Feb 12, 2017

Member

I think this is confusing (I'm leaving my original comment below, to show how I got confused). IMO we should always set the minSeqNo / maxSeqNo upon the first value we see. If it is UNASSIGNED_SEQ_NO, then so be it. It tells us that the translog contains operations without seq#, which is currently impossible to know.

hmm... is this correct? what we want to check here is that if the incoming value is UNASSIGNED_SEQ_NO then the current value of minSeqNo must be UNASSIGNED_SEQ_NO. This is to say - once seq# are added to the mix, we never go back (we may need to change some logic somewhere else to do so, but I think it's the right move).
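
One possible reading of the rule described in this comment, as a sketch only: the first operation sets both bounds to whatever arrives (including unassigned), a translog that has only seen unassigned operations switches over on the first assigned sequence number, and an unassigned value arriving after that is rejected. `SeqNoBounds` is a hypothetical class, the two constants are local stand-ins for the `SequenceNumbersService` ones with assumed values, and an exception is used where the real code would more likely assert.

```java
/** Hypothetical bounds tracker illustrating the "never go back" rule; not the PR's implementation. */
final class SeqNoBounds {
    // local stand-ins for the SequenceNumbersService constants; values assumed for this sketch
    static final long UNASSIGNED_SEQ_NO = -2L;
    static final long NO_OPS_PERFORMED = -1L;

    private long minSeqNo = NO_OPS_PERFORMED;
    private long maxSeqNo = NO_OPS_PERFORMED;
    private int operationCounter = 0;

    void add(long seqNo) {
        if (operationCounter == 0 || minSeqNo == UNASSIGNED_SEQ_NO) {
            // first operation, or only unassigned operations so far: take the incoming value as is
            minSeqNo = seqNo;
            maxSeqNo = seqNo;
        } else {
            // assigned sequence numbers have been seen, so an unassigned one must not follow
            if (seqNo == UNASSIGNED_SEQ_NO) {
                throw new IllegalStateException("unassigned sequence number after assigned ones");
            }
            minSeqNo = Math.min(minSeqNo, seqNo);
            maxSeqNo = Math.max(maxSeqNo, seqNo);
        }
        operationCounter++;
    }

    long getMinSeqNo() {
        return minSeqNo;
    }

    long getMaxSeqNo() {
        return maxSeqNo;
    }
}
```

Under this reading a reader of the bounds can tell three states apart: no operations yet (`NO_OPS_PERFORMED`), only operations without sequence numbers (`UNASSIGNED_SEQ_NO`), and a real range.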

@jasontedor

jasontedor Feb 16, 2017

Author Member

I pushed e8f11578cade5be8fdbb9c96d3478e07c0532ebf.

core/src/main/java/org/elasticsearch/index/translog/TranslogWriter.java Outdated
maxSeqNo = seqNo;
}
} else {
assert seqNo != SequenceNumbersService.UNASSIGNED_SEQ_NO;

@bleskes

bleskes Feb 12, 2017

Member

same comments apply here.

@jasontedor

jasontedor Feb 16, 2017

Author Member

I pushed e8f11578cade5be8fdbb9c96d3478e07c0532ebf.

@@ -1008,42 +1032,6 @@ public void testTranslogWriter() throws IOException {
IOUtils.close(writer);
}

public void testFailWriterWhileClosing() throws IOException {

@bleskes

bleskes Feb 12, 2017

Member

good that we have one thing less to worry about, but I wonder if it's a good idea to remove the test. I mean these transitions are always tricky and it's good to have it well tested?

@jasontedor

jasontedor Feb 16, 2017

Author Member

This test was testing a specific scenario: constructing the new translog reader could fail because we did a file operation (reading the channel position) in that construction. That file operation was removed, so that failure scenario can not occur anymore. Are you saying that you want the test to remain, just without testing for the failure scenario?

@bleskes

bleskes Feb 16, 2017

Member

yeah, I get why you removed it but I think it's indeed good to keep around - it seems we don't have any other test for the closeIntoReader method.

@jasontedor

jasontedor Feb 16, 2017

Author Member

I pushed introduce-sequence-number-aware-translog.

core/src/test/java/org/elasticsearch/index/translog/TranslogTests.java Outdated
long minSeqNo = Translog.INITIAL_MIN_SEQ_NO;
long maxSeqNo = Translog.INITIAL_MAX_SEQ_NO;
final Set<Long> seenSeqNos = new HashSet<>();
boolean opsHaveValidSequenceNumbers = false;

@bleskes

bleskes Feb 12, 2017

Member

should we randomize the start to simulate "no BWC mode"?

@jasontedor

jasontedor Feb 16, 2017

Author Member

I pushed e8f11578cade5be8fdbb9c96d3478e07c0532ebf.

core/src/test/java/org/elasticsearch/index/translog/TranslogTests.java Outdated
} while (seenSeqNos.contains(seqNo));
if (seqNo != SequenceNumbersService.UNASSIGNED_SEQ_NO) {
seenSeqNos.add(seqNo);
if (minSeqNo == Translog.INITIAL_MIN_SEQ_NO) {

@bleskes

bleskes Feb 12, 2017

Member

wouldn't it be simpler to calculate those in the end based on seenSeqNos? we can also wrap this whole thing in an opsHaveValidSequenceNumbers check and then the inner loop won't need to change for UNASSIGNED_SEQ_NO

@jasontedor

jasontedor Feb 16, 2017

Author Member

I pushed e8f11578cade5be8fdbb9c96d3478e07c0532ebf.

@jasontedor jasontedor force-pushed the jasontedor:introduce-sequence-number-aware-translog branch Feb 16, 2017

@jasontedor

Member Author

commented Feb 16, 2017

Also - did you see my ask for extending the translog stats?

Certainly this makes sense at the shard level, but it does not make sense when we report node level or index level stats. Thus, I think they should be left out?

@jasontedor

Member Author

commented Feb 16, 2017

Thanks @nik9000 and @bleskes; I think I've addressed your feedback.

@bleskes

Member

commented Feb 17, 2017

test this please

@bleskes
Member

left a comment

Thx @jasontedor for the hard work. This LGTM. I left some minor comments but there is no need for an extra review on my end. @s1monw do you want to do another pass?

Re the stats - I see what you're saying. I'm not happy with it as I think this will give us a way to debug this and it will be good to expose, but I agree there is no clear cut easy solution that's worth pushing for in this PR. Potentially we should just remove translog stats from common stats, I'm not sure how much value the aggregation per index/node/cluster adds here.

core/src/main/java/org/elasticsearch/index/translog/Checkpoint.java Outdated
final long maxSeqNo = Translog.INITIAL_MAX_SEQ_NO;
return new Checkpoint(in.readLong(), in.readInt(), in.readLong(), minSeqNo, maxSeqNo, SequenceNumbersService.UNASSIGNED_SEQ_NO);
static Checkpoint readChecksummedV1(final DataInput in) throws IOException {
final long minSeqNo = SequenceNumbersService.NO_OPS_PERFORMED;

@bleskes

bleskes Feb 17, 2017

Member

these should be UNASSIGNED_SEQ_NO no? (unless maybe if total count is 0, but I'm not sure it's worth the code song and dance it will require)

core/src/test/java/org/elasticsearch/index/translog/TranslogTests.java Outdated
writer.add(new BytesArray(bytes), randomNonNegativeLong());
}
writer.sync();
try (TranslogReader reader = writer.closeIntoReader()) {

@bleskes

bleskes Feb 17, 2017

Member

I think we need to check the transfer of the checkpoint as well, no?

@jasontedor

jasontedor Feb 20, 2017

Author Member

I pushed 48f02c2bedbc08e10099605030b92b8ca1bd34cd.

protected final long length;
private final int totalOperations;

@bleskes

bleskes Feb 17, 2017

Member

I'm not sure it's worth having these extra fields now that we have a checkpoint? code that needs it can either call the getter methods or access the checkpoint directly?

@jasontedor

jasontedor Feb 17, 2017

Author Member

Yeah, I thought about removing them and decided the code read easier with them than with checkpoint.numOps and checkpoint.offset everywhere so reverted it. I can remove them if you feel strongly.

@nik9000

nik9000 Feb 17, 2017

Contributor

I'd like to remove the fields but I'm fine with doing it in a dead simple followup. I like not doing it in this PR to keep it smaller.

@jasontedor

jasontedor Feb 17, 2017

Author Member

Okay, I'm fine with it being a follow-up.

core/src/main/java/org/elasticsearch/index/translog/Checkpoint.java Outdated
return new Checkpoint(in.readLong(), in.readInt(), in.readLong(), SequenceNumbersService.UNASSIGNED_SEQ_NO);
// reads a checksummed checkpoint introduced in ES 5.0.0
static Checkpoint readChecksummedV1(final DataInput in) throws IOException {
final long minSeqNo = SequenceNumbersService.NO_OPS_PERFORMED;

@bleskes

bleskes Feb 17, 2017

Member

copying the commit comment as the code moved around:

these should be UNASSIGNED_SEQ_NO no? (unless maybe if total count is 0, but I'm not sure it's worth the code song and dance it will require)

@bleskes bleskes requested a review from s1monw Feb 17, 2017

core/src/main/java/org/elasticsearch/index/translog/TranslogWriter.java Outdated
return getLastSyncedCheckpoint();
}

@Override
public long sizeInBytes() {
return totalOffset;
}

/**
* closes this writer and transfers it's underlying file channel to a new immutable reader

@nik9000

nik9000 Feb 17, 2017

Contributor

Leftover.

@jasontedor

jasontedor Feb 17, 2017

Author Member

I pushed ca0a0adf1820c68f096d9167b9f6f1efdac2c51d.

core/src/main/java/org/elasticsearch/index/translog/TranslogWriter.java Outdated
@@ -80,6 +85,8 @@ public TranslogWriter(
this.outputStream = new BufferedChannelOutputStream(java.nio.channels.Channels.newOutputStream(channel), bufferSize.bytesAsInt());
this.lastSyncedCheckpoint = initialCheckpoint;
this.totalOffset = initialCheckpoint.offset;
this.minSeqNo = initialCheckpoint.minSeqNo;

@nik9000

nik9000 Feb 17, 2017

Contributor

I think this is right but it is confusing me some. Maybe you can explain it or maybe add a comment?

So my problem is that on first reading it looks like you are keeping the minimum sequence number as made by some previous writer. So if you make more than one of these writers then this minimum value will stay the minimum value from the first writer. Would it make more sense to use the NO_OPS_PERFORMED constant here? Or maybe add a comment about when and why we want this "dragging backwards" behavior sometimes.

@jasontedor

jasontedor Feb 17, 2017

Author Member

This is only ever called with a new empty checkpoint. We can assert, would that make it better?

@nik9000

nik9000 Feb 17, 2017

Contributor

Yes please. That way I don't start thinking about crazy stuff that we're not doing.

@jasontedor

jasontedor Feb 17, 2017

Author Member

I pushed 52f68e1979b80f9ae4a3c7e462546bd779110963.
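
For readers following this thread, here is a minimal sketch of the kind of assertion being agreed on: the writer refuses (under `-ea`) to start from a checkpoint that already carries sequence numbers. The class `TranslogWriterSketch`, its nested `Checkpoint`, and the value `-1` for `NO_OPS_PERFORMED` are simplified stand-ins assumed for illustration; this is not the code from the pushed commit.

```java
// Hypothetical, simplified stand-in for the real TranslogWriter; names and
// values here are assumptions made for illustration only.
final class TranslogWriterSketch {
    // Assumed sentinel meaning "no operations have been written yet".
    static final long NO_OPS_PERFORMED = -1L;

    // Minimal checkpoint holding only the fields this sketch needs.
    static final class Checkpoint {
        final long offset;
        final int numOps;
        final long minSeqNo;
        final long maxSeqNo;

        Checkpoint(final long offset, final int numOps, final long minSeqNo, final long maxSeqNo) {
            this.offset = offset;
            this.numOps = numOps;
            this.minSeqNo = minSeqNo;
            this.maxSeqNo = maxSeqNo;
        }
    }

    private final long totalOffset;
    private final long minSeqNo;
    private final long maxSeqNo;

    TranslogWriterSketch(final Checkpoint initialCheckpoint) {
        this.totalOffset = initialCheckpoint.offset;
        // A new writer is expected to start from an empty checkpoint, so the
        // range cannot be "dragged backwards" by an earlier writer.
        assert initialCheckpoint.minSeqNo == NO_OPS_PERFORMED
            : "expected an empty checkpoint but min seq no was [" + initialCheckpoint.minSeqNo + "]";
        this.minSeqNo = initialCheckpoint.minSeqNo;
        assert initialCheckpoint.maxSeqNo == NO_OPS_PERFORMED
            : "expected an empty checkpoint but max seq no was [" + initialCheckpoint.maxSeqNo + "]";
        this.maxSeqNo = initialCheckpoint.maxSeqNo;
    }

    long getTotalOffset() {
        return totalOffset;
    }

    long getMinSeqNo() {
        return minSeqNo;
    }

    long getMaxSeqNo() {
        return maxSeqNo;
    }
}
```

With assertions enabled, constructing the sketch from a checkpoint whose `minSeqNo` is anything other than the sentinel fails immediately, which documents the invariant without needing a comment.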

@jasontedor

Member Author

commented Feb 17, 2017

retest this please

@bleskes

Member

commented Feb 20, 2017

test this please

@s1monw
Contributor

left a comment

I left some comments nothing major

core/src/main/java/org/elasticsearch/index/translog/Checkpoint.java Outdated
// reads a checksummed checkpoint introduced in ES 5.0.0
static Checkpoint readChecksummedV1(DataInput in) throws IOException {
return new Checkpoint(in.readLong(), in.readInt(), in.readLong(), SequenceNumbersService.UNASSIGNED_SEQ_NO);
static Checkpoint readChecksummedV2(final DataInput in) throws IOException {

@s1monw

s1monw Feb 20, 2017

Contributor

can we add a comment when this version was introduced. Maybe instead of V1 and V2 we use the version in the method name like Pre6_0_0?

@jasontedor

jasontedor Feb 20, 2017

Author Member

I pushed a commit that does this.

core/src/main/java/org/elasticsearch/index/translog/Checkpoint.java Outdated
CodecUtil.checksumEntireFile(indexInput);
final int fileVersion = CodecUtil.checkHeader(indexInput, CHECKPOINT_CODEC, INITIAL_VERSION, CURRENT_VERSION);
if (fileVersion == INITIAL_VERSION) {
assert indexInput.length() == V1_FILE_SIZE;

@s1monw

s1monw Feb 20, 2017

Contributor

can we have the actual length in the message? this would be awful if we didn't have it. same below

@jasontedor

jasontedor Feb 20, 2017

Author Member

I pushed a commit that does this.

core/src/main/java/org/elasticsearch/index/translog/Checkpoint.java Outdated
assert indexInput.length() == FILE_SIZE;
return Checkpoint.readChecksummedV2(indexInput);
}
assert fileVersion == CURRENT_VERSION;

@s1monw

s1monw Feb 20, 2017

Contributor

can we get the version in the message
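
The requests in these two threads (put the actual length and the actual version into the assertion message) can be sketched roughly as follows. The constant names mirror those in the quoted diff, but their numeric values, the class `CheckpointAssertSketch`, and the method `describe` are made up for illustration and are not taken from the PR.

```java
// Hypothetical sketch; the numeric values below are placeholders, not the
// real on-disk sizes or versions.
final class CheckpointAssertSketch {
    static final int INITIAL_VERSION = 1;
    static final int CURRENT_VERSION = 2;
    static final long V1_FILE_SIZE = 40L;
    static final long FILE_SIZE = 64L;

    // Returns a label for the format that would be read, asserting that the
    // observed version and length match what that format expects.
    static String describe(final int fileVersion, final long actualLength) {
        if (fileVersion == INITIAL_VERSION) {
            // Including the actual value makes a failure diagnosable from the message alone.
            assert actualLength == V1_FILE_SIZE
                : "expected length [" + V1_FILE_SIZE + "] but was [" + actualLength + "]";
            return "v1";
        }
        assert fileVersion == CURRENT_VERSION
            : "expected version [" + CURRENT_VERSION + "] but was [" + fileVersion + "]";
        assert actualLength == FILE_SIZE
            : "expected length [" + FILE_SIZE + "] but was [" + actualLength + "]";
        return "current";
    }
}
```

A bare `assert a == b;` only reports that the check failed; the message form above also reports what was observed, which is the point of the review comment.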

core/src/main/java/org/elasticsearch/index/translog/TranslogReader.java Outdated
*/
public static TranslogReader open(FileChannel channel, Path path, Checkpoint checkpoint, String translogUUID) throws IOException {
public static TranslogReader open(
final FileChannel channel,

@s1monw

s1monw Feb 20, 2017

Contributor

ugh can we have one line?

@jasontedor

jasontedor Feb 20, 2017

Author Member

I pushed a commit just for you and only for you that does this.

core/src/main/java/org/elasticsearch/index/translog/TranslogReader.java Outdated
@@ -138,6 +158,11 @@ public int totalOperations() {
return totalOperations;
}

@Override
Checkpoint getCheckpoint() {

@s1monw

s1monw Feb 20, 2017

Contributor

can this method be final?

@jasontedor

jasontedor Feb 20, 2017

Author Member

Yes, I pushed a commit that does this.

@@ -56,6 +55,11 @@ public int totalOperations() {
}

@Override
Checkpoint getCheckpoint() {

@s1monw

s1monw Feb 20, 2017

Contributor

can this method be final?

@jasontedor

jasontedor Feb 20, 2017

Author Member

A final modifier here would be redundant because the class is final.
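
As a small illustration of that point (with hypothetical class names, not the real translog classes): a method declared in a `final` class can never be overridden, because no subclass can exist, so marking the method `final` as well adds nothing.

```java
// Hypothetical names, used only to illustrate the language rule.
final class FinalSketch {
    abstract static class BaseReader {
        abstract long getOffset();
    }

    // The class itself is final, so nothing can extend it ...
    static final class Snapshot extends BaseReader {
        private final long offset;

        Snapshot(final long offset) {
            this.offset = offset;
        }

        // ... which means this method is already impossible to override;
        // adding `final` here would change nothing.
        @Override
        long getOffset() {
            return offset;
        }
    }
}
```

In a non-final class such as the reader discussed in the previous thread, the modifier is meaningful, which is why the two identical questions got different answers.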

core/src/main/java/org/elasticsearch/index/translog/TranslogWriter.java Outdated
@@ -80,6 +85,10 @@ public TranslogWriter(
this.outputStream = new BufferedChannelOutputStream(java.nio.channels.Channels.newOutputStream(channel), bufferSize.bytesAsInt());
this.lastSyncedCheckpoint = initialCheckpoint;
this.totalOffset = initialCheckpoint.offset;
assert initialCheckpoint.minSeqNo == SequenceNumbersService.NO_OPS_PERFORMED;

@s1monw

s1monw Feb 20, 2017

Contributor

can we get the actual value in the message?

@jasontedor

jasontedor Feb 20, 2017

Author Member

I agree. I pushed a commit that does this.

ex.addSuppressed(inner);
}
throw ex;
}
totalOffset += data.length();

if (minSeqNo == SequenceNumbersService.NO_OPS_PERFORMED) {

@s1monw

s1monw Feb 20, 2017

Contributor

any chance we can get this logic as SequenceNumbersService#max and SequenceNumbersService#min where we can document and test it separately?

@jasontedor

jasontedor Feb 20, 2017

Author Member

I agree, I pushed a commit that does this (and added tests).
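
A rough sketch of what such sentinel-aware helpers might look like is below. The thread only shows the `minSeqNo == NO_OPS_PERFORMED` check; the class name `SeqNoMathSketch`, the sentinel values `-1` and `-2`, and the handling of `UNASSIGNED_SEQ_NO` are assumptions made for this example and may differ from the helpers actually added in the commit.

```java
// Hypothetical sketch of sentinel-aware min/max; values and the treatment
// of UNASSIGNED_SEQ_NO are assumptions, not confirmed by the thread.
final class SeqNoMathSketch {
    static final long NO_OPS_PERFORMED = -1L;  // assumed: nothing tracked yet
    static final long UNASSIGNED_SEQ_NO = -2L; // assumed: operation has no seq no

    private SeqNoMathSketch() {}

    // Folds seqNo into a running minimum.
    static long min(final long minSeqNo, final long seqNo) {
        if (minSeqNo == NO_OPS_PERFORMED) {
            // First operation seen: it defines the minimum.
            return seqNo;
        } else if (minSeqNo == UNASSIGNED_SEQ_NO) {
            // Once the range is unknown it stays unknown.
            return minSeqNo;
        } else {
            if (seqNo == UNASSIGNED_SEQ_NO) {
                throw new IllegalArgumentException("sequence number must be assigned");
            }
            return Math.min(minSeqNo, seqNo);
        }
    }

    // Folds seqNo into a running maximum, with the same sentinel rules.
    static long max(final long maxSeqNo, final long seqNo) {
        if (maxSeqNo == NO_OPS_PERFORMED) {
            return seqNo;
        } else if (maxSeqNo == UNASSIGNED_SEQ_NO) {
            return maxSeqNo;
        } else {
            if (seqNo == UNASSIGNED_SEQ_NO) {
                throw new IllegalArgumentException("sequence number must be assigned");
            }
            return Math.max(maxSeqNo, seqNo);
        }
    }
}
```

Pulling the logic out this way lets a writer update its range with `minSeqNo = SeqNoMathSketch.min(minSeqNo, seqNo)` on every add, and lets the sentinel rules be unit-tested without constructing a translog.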

Introduce sequence-number-aware translog
Today, the relationship between Lucene and the translog is rather
simple: every document not in Lucene is guaranteed to be in the
translog. We need a stronger guarantee from the translog though, namely
that it can replay all operations after a certain sequence number. For
this to be possible, the translog has to be made sequence-number aware. As
a first step, we introduce the min and max sequence numbers into the
translog so that each generation knows the possible range of operations
contained in the generation. This will enable future work to keep around
all generations containing operations after a certain sequence number
(e.g., the global checkpoint).

@jasontedor jasontedor force-pushed the jasontedor:introduce-sequence-number-aware-translog branch to 772a513 Feb 20, 2017

@jasontedor

Member Author

commented Feb 20, 2017

I left some comments nothing major

Thanks @s1monw. I pushed 772a513.

@s1monw

s1monw approved these changes Feb 20, 2017

Contributor

left a comment

LGTM

@s1monw

Contributor

commented Feb 20, 2017

@jasontedor looks awesome

jasontedor added some commits Feb 20, 2017

Merge branch 'master' into introduce-sequence-number-aware-translog
* master:
  Mark IP range aggregator test as awaits fix
  Add note and link to 'tune for disk usage' (#23252)

@jasontedor jasontedor merged commit 4c2bd5f into elastic:master Feb 20, 2017

1 of 2 checks passed

elasticsearch-ci: Build started sha1 is merged.
CLA: Commit author is a member of Elasticsearch

@jasontedor jasontedor deleted the jasontedor:introduce-sequence-number-aware-translog branch Feb 20, 2017

@jasontedor

Member Author

commented Feb 20, 2017

Thanks @nik9000, @bleskes, and @s1monw.
