New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add `elasticsearch-translog` CLI tool with `truncate` command #19342

Merged
merged 1 commit into from Jul 26, 2016

Conversation

Projects
None yet
5 participants
@dakrone
Member

dakrone commented Jul 8, 2016

This adds the bin/elasticsearch-translog bin file that will be used
for CLI tasks pertaining to Elasticsearch. Currently it implements only
a single sub-command, truncate-translog, that creates a truncated
translog for a given folder.

Here's what running the tool looks like:

λ bin/elasticsearch-translog truncate -d data/nodes/0/indices/P45vf_YQRhqjfwLMUvSqDw/0/translog/
Checking existing translog files
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!   WARNING: Elasticsearch MUST be stopped before running this tool   !
!                                                                     !
!   WARNING:    Documents inside of translog files will be lost       !
!                                                                     !
!   WARNING:          The following files will be DELETED!            !
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
--> data/nodes/0/indices/P45vf_YQRhqjfwLMUvSqDw/0/translog/translog-10.tlog
--> data/nodes/0/indices/P45vf_YQRhqjfwLMUvSqDw/0/translog/translog-18.tlog
--> data/nodes/0/indices/P45vf_YQRhqjfwLMUvSqDw/0/translog/translog-21.tlog
--> data/nodes/0/indices/P45vf_YQRhqjfwLMUvSqDw/0/translog/translog-12.ckp
--> data/nodes/0/indices/P45vf_YQRhqjfwLMUvSqDw/0/translog/translog-25.ckp
--> data/nodes/0/indices/P45vf_YQRhqjfwLMUvSqDw/0/translog/translog-29.tlog
--> data/nodes/0/indices/P45vf_YQRhqjfwLMUvSqDw/0/translog/translog-2.tlog
--> data/nodes/0/indices/P45vf_YQRhqjfwLMUvSqDw/0/translog/translog-5.tlog
--> data/nodes/0/indices/P45vf_YQRhqjfwLMUvSqDw/0/translog/translog-41.ckp
--> data/nodes/0/indices/P45vf_YQRhqjfwLMUvSqDw/0/translog/translog-6.ckp
--> data/nodes/0/indices/P45vf_YQRhqjfwLMUvSqDw/0/translog/translog-37.ckp
--> data/nodes/0/indices/P45vf_YQRhqjfwLMUvSqDw/0/translog/translog-24.ckp
--> data/nodes/0/indices/P45vf_YQRhqjfwLMUvSqDw/0/translog/translog-11.ckp

Continue and DELETE files? [y/N] y
Reading translog UUID information from Lucene commit from shard at [data/nodes/0/indices/P45vf_YQRhqjfwLMUvSqDw/0/index]
Translog Generation: 3
Translog UUID      : AxqC4rocTC6e0fwsljAh-Q
Removing existing translog files
Creating new empty checkpoint at [data/nodes/0/indices/P45vf_YQRhqjfwLMUvSqDw/0/translog/translog.ckp]
Creating new empty translog at [data/nodes/0/indices/P45vf_YQRhqjfwLMUvSqDw/0/translog/translog-3.tlog]
Done.

It also includes a -b batch operation that can be used to skip the
confirmation diaglog.

Resolves #19123

@mikemccand

View changes

core/src/main/java/org/elasticsearch/index/translog/TruncateTranslogCommand.java Outdated
final IndexWriterConfig iwc = new IndexWriterConfig(null);
iwc.setOpenMode(IndexWriterConfig.OpenMode.APPEND);
try (Directory dir = FSDirectory.open(location, NativeFSLockFactory.INSTANCE)) {
try (IndexWriter iw = new IndexWriter(dir, iwc)) {

This comment has been minimized.

@mikemccand

mikemccand Jul 14, 2016

Contributor

Instead of using IndexWriter here, you could use DirectoryReader.listCommits (there should be at most IndexCommit returned) and then call IndexCommit.getUserData() instead. Seems safer since DirectoryReader cannot do any writing on the index, doesn't acquire the write lock, etc.

This comment has been minimized.

@dakrone

dakrone Jul 14, 2016

Member

Okay, that sounds much better!

At the same time though, acquiring the write lock would be good, because even though there is a warning that this should not be run when ES is running, trying the lock seems like it would be a good idea

This comment has been minimized.

@jasontedor

jasontedor Jul 14, 2016

Member

It won't always be the case that there will be one index commit, sequence numbers will change this assumption.

This comment has been minimized.

@mikemccand

mikemccand Jul 14, 2016

Contributor

Hmm, although, it's nice to acquire the write.lock since this will "confirm" that ES is in fact not running (or at least that this index is not opened).

Maybe, higher up, you could obtain the write.lock, and hold it for the entire duration of listing files, deleting, etc., and if the write lock obtain fails, notify the user to triple-check that ES is not running?

Something like:

try (dir.obtainLock(IndexWriter.WRITE_LOCK_NAME) {
  .. do scary stuff ..
} catch (LockObtainFailedException lofe) {
  .. notify user to triple-check ..
}

should work I think?

This comment has been minimized.

@dakrone

dakrone Jul 14, 2016

Member

It won't always be the case that there will be one index commit, sequence numbers will change this assumption.

If there are multiple commits, what does IndexWriter.getCommitData() return? I am guessing it reads the "latest" commit's data?

This comment has been minimized.

@mikemccand

mikemccand Jul 14, 2016

Contributor

If there are multiple commits, what does IndexWriter.getCommitData() return? I am guessing it reads the "latest" commit's data?

Yes, the latest.

This comment has been minimized.

@mikemccand

mikemccand Jul 14, 2016

Contributor

It won't always be the case that there will be one index commit, sequence numbers will change this assumption.

OK. DirectoryReader.listCommits returns the index commits in order, so we could use the last one here?

This comment has been minimized.

@mikemccand

mikemccand Jul 14, 2016

Contributor

At the same time though, acquiring the write lock would be good, because even though there is a warning that this should not be run when ES is running, trying the lock seems like it would be a good idea

Definitely, +1

@jdconrad

This comment has been minimized.

Contributor

jdconrad commented Jul 14, 2016

As someone not familiar with much about the trans-log, I'm curious as to when you would want to use this command, maybe you give me an example? Thanks in advance.

@dakrone

This comment has been minimized.

Member

dakrone commented Jul 14, 2016

As someone not familiar with much about the trans-log, I'm curious as to when you would want to use this command, maybe you give me an example? Thanks in advance.

Sure! So starting in 5.x, if your translog gets corrupted in any way (human error, bitflip on the hard drive, etc), without this you should have to trash the entire shard. This is because you can no longer just delete the translog if you don't care about losing the data, you also can't replace the corrupt translog with a clean one from another shard (in contains the shard UUID so it would know and flip out).

This tool allows you to accept the loss of all the documents in the translog by creating a new, empty translog for the given shard, so you can at least recover the data in the shard, even if you lose the translog documents.

@mikemccand

View changes

core/src/main/java/org/elasticsearch/index/translog/TruncateTranslogCommand.java Outdated
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x2b,
// 0 ops (int) since this will be an empty translog
0x00, 0x00, 0x00, 0x00 };

This comment has been minimized.

@mikemccand

mikemccand Jul 14, 2016

Contributor

Can we somehow factor out these bytes w/ e.g. Checkpoint.java so that if we change the header of the checkpoint / translog files, this tool is fixed too? Otherwise if we fail to do this, a user could run this tool and it makes a corrupt translog instead!

@mikemccand

View changes

core/src/main/java/org/elasticsearch/index/translog/TruncateTranslogCommand.java Outdated
terminal.println("! WARNING: Elasticsearch MUST be stopped before running this tool !");
terminal.println("! !");
terminal.println("! WARNING: The following files will be DELETED! !");
terminal.println("!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!");

This comment has been minimized.

@mikemccand

mikemccand Jul 14, 2016

Contributor

Maybe also explain that previously indexed documents will be lost?

@dakrone

This comment has been minimized.

Member

dakrone commented Jul 14, 2016

@mikemccand thanks for taking a look! I pushed three more commits that changes this to:

  • Use DirectoryReader.listCommits() and IndexCommit.getUserData() to read the UUID and generation without opening an IndexWriter
  • Hold the lock open for the duration of the tool running to prevent the shard from being changed while this is running
  • Enhance the warning message to mention that documents in the translogs will be lost
  • Use the Checkpoint class to write the empty checkpoint instead of a raw byte array
throw new ElasticsearchException("unable to find a shard at [" + idxLocation + "], which must exist and be a directory");
}
// Hold the lock open for the duration of the tool running

This comment has been minimized.

@mikemccand

mikemccand Jul 15, 2016

Contributor

Hmm, this isn't actually acquiring the write lock ... it's just opening the directory.

To get the write lock you should also do Lock writeLock = dir.obtainLock(IndexWriter.WRITE_LOCK_NAME) ... I think you can do it as another resource acquired inside this try (the Lock.close method releases the lock).

Maybe add a test confirming that trying to run this tool on a still-open index fails?

This comment has been minimized.

@dakrone

dakrone Jul 15, 2016

Member

Oh okay! I thought the directory automatically acquired the lock. I'll add this and add a test that checks it as you suggested.

@mikemccand

View changes

core/src/main/java/org/elasticsearch/index/translog/TruncateTranslogCommand.java Outdated
/** Write a checkpoint file to the given location with the given generation */
public static void writeEmptyCheckpoint(Path filename, long translogGeneration) throws IOException {
try (OutputStreamDataOutput out = new OutputStreamDataOutput(new FileOutputStream(filename.toFile()))) {
Checkpoint emptyCheckpoint = new Checkpoint(43, 0, translogGeneration);

This comment has been minimized.

@mikemccand

mikemccand Jul 15, 2016

Contributor

Maybe add a comment explaining 43? Or, better, maybe derive it programatically, e.g. have writeEmptyTranslog return the number of bytes it wrote?

@mikemccand

View changes

core/src/main/java/org/elasticsearch/index/translog/TruncateTranslogCommand.java Outdated
public class TruncateTranslogCommand extends SettingCommand {
/** A prefix of the bytes for an empty translog, the translog UUID is appended afterward */
private static byte[] EMPTY_TRANSLOG_PREFIX = new byte[] {

This comment has been minimized.

@mikemccand

mikemccand Jul 15, 2016

Contributor

Can we also use single source for these bytes? E.g., maybe factor out a static TranslogWriter.writeHeader method or something?

@mikemccand

View changes

core/src/main/java/org/elasticsearch/index/translog/TruncateTranslogCommand.java Outdated
public static void writeEmptyTranslog(Path filename, String translogUUID) throws IOException {
try (FileOutputStream out = new FileOutputStream(filename.toFile())) {
out.write(EMPTY_TRANSLOG_PREFIX);
out.write(translogUUID.getBytes());

This comment has been minimized.

@mikemccand

mikemccand Jul 15, 2016

Contributor

I think you should fsync after writing these bytes, e.g. out.getChannel().force(true), like the translog code does? Would be horribly unlucky if the user ran this tool then their box crashed and it left the translog corrupt.

@mikemccand

View changes

core/src/main/java/org/elasticsearch/index/translog/TruncateTranslogCommand.java Outdated
public static void writeEmptyCheckpoint(Path filename, long translogGeneration) throws IOException {
try (OutputStreamDataOutput out = new OutputStreamDataOutput(new FileOutputStream(filename.toFile()))) {
Checkpoint emptyCheckpoint = new Checkpoint(43, 0, translogGeneration);
emptyCheckpoint.write(out);

This comment has been minimized.

@mikemccand

mikemccand Jul 15, 2016

Contributor

And also Channel.force(true) here?

@mikemccand

This comment has been minimized.

Contributor

mikemccand commented Jul 15, 2016

Thanks @dakrone, I left a few more comments, I think it's close!

@dakrone

This comment has been minimized.

Member

dakrone commented Jul 15, 2016

Thanks @mikemccand! I pushed more commits:

  • Actually acquire the write lock this time (and add a test for it that fails it if didn't acquire the lock)
  • Fsync all the things that need fsyncing
  • Don't use any raw byte arrays, goes through TranslogWriter and returns the length instead of hardcoding "43" in the checkpoint
@mikemccand

This comment has been minimized.

Contributor

mikemccand commented Jul 15, 2016

LGTM, thanks @dakrone!

@jasontedor

View changes

core/src/main/java/org/elasticsearch/cli/ESToolCli.java Outdated
import org.elasticsearch.node.internal.InternalSettingsPreparer;
public class ESToolCli extends MultiCommand {

This comment has been minimized.

@jasontedor

jasontedor Jul 15, 2016

Member

Sorry for stepping into this review right at the end, but I'm curious what the reasoning is behind having a single tool command is instead of a command per tool?

This comment has been minimized.

@dakrone

dakrone Jul 15, 2016

Member

We don't have to maintain as many .bat and shell script files. We also don't pollute the $PATH with multiple tools that all have separate names (naming is hard!). That's my 2 cents for it

This comment has been minimized.

@clintongormley

clintongormley Jul 15, 2016

Member

It's also likely that we will extend this tool later on to add more commands

This comment has been minimized.

@jasontedor

jasontedor Jul 15, 2016

Member

We don't have to maintain as many .bat and shell script files.

This is correct, but I'm not convinced that maintenance will be reduced. The concern I have with a single tool command is that it means any time that we do need to do maintenance, we now need to consider 1, 2, ..., n downstream commands.

We also don't pollute the $PATH with multiple tools that all have separate names (naming is hard!).

Prefixing with elasticsearch- pretty much solves that.

This comment has been minimized.

@dakrone

dakrone Jul 15, 2016

Member

The concern I have with a single tool command is that it means any time that we do need to do maintenance, we now need to consider 1, 2, ..., n downstream commands.

I'm not sure I follow, how would this be different than having it in separate tools?

This comment has been minimized.

@dakrone

dakrone Jul 15, 2016

Member

They're all separate classes with separate CLI option parsing and can be in different packages, etc. They're totally separate commands?

This comment has been minimized.

@jasontedor

jasontedor Jul 15, 2016

Member

We are talking about them using the same script, and anyway as constructed they will share settings and log configuration.

This comment has been minimized.

@dakrone

dakrone Jul 15, 2016

Member

I can take out the logging configuration and System.getProperty if you think that would make it better. I don't think it matters too much since they will use Terminal for "printing" and any tool that does need settings should take a path.conf directory option the way that the migrate tool does in the security plugin to construct them appropriately.

This comment has been minimized.

@dakrone

dakrone Jul 20, 2016

Member

Okay, @clintongormley and I talked about this and decided that for the time being, we'll move this to a separate tool (I'm going to go with elasticsearch-translog as the name, with a subcommand of truncate)

This comment has been minimized.

@jasontedor

jasontedor Jul 21, 2016

Member

Okay, @clintongormley and I talked about this and decided that for the time being, we'll move this to a separate tool

+1

Add 'elasticsearch-translog' CLI tool with 'translog' command
This adds the `bin/elasticsearch-translate` bin file that will be used
for CLI tasks pertaining to Elasticsearch. Currently it implements only
a single sub-command, `truncate-translog`, that creates a truncated
translog for a given folder.

Here's what running the tool looks like:

```
λ bin/elasticsearch-translog truncate -d data/nodes/0/indices/P45vf_YQRhqjfwLMUvSqDw/0/translog/
Checking existing translog files
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!   WARNING: Elasticsearch MUST be stopped before running this tool   !
!                                                                     !
!   WARNING:    Documents inside of translog files will be lost       !
!                                                                     !
!   WARNING:          The following files will be DELETED!            !
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
--> data/nodes/0/indices/P45vf_YQRhqjfwLMUvSqDw/0/translog/translog-10.tlog
--> data/nodes/0/indices/P45vf_YQRhqjfwLMUvSqDw/0/translog/translog-18.tlog
--> data/nodes/0/indices/P45vf_YQRhqjfwLMUvSqDw/0/translog/translog-21.tlog
--> data/nodes/0/indices/P45vf_YQRhqjfwLMUvSqDw/0/translog/translog-12.ckp
--> data/nodes/0/indices/P45vf_YQRhqjfwLMUvSqDw/0/translog/translog-25.ckp
--> data/nodes/0/indices/P45vf_YQRhqjfwLMUvSqDw/0/translog/translog-29.tlog
--> data/nodes/0/indices/P45vf_YQRhqjfwLMUvSqDw/0/translog/translog-2.tlog
--> data/nodes/0/indices/P45vf_YQRhqjfwLMUvSqDw/0/translog/translog-5.tlog
--> data/nodes/0/indices/P45vf_YQRhqjfwLMUvSqDw/0/translog/translog-41.ckp
--> data/nodes/0/indices/P45vf_YQRhqjfwLMUvSqDw/0/translog/translog-6.ckp
--> data/nodes/0/indices/P45vf_YQRhqjfwLMUvSqDw/0/translog/translog-37.ckp
--> data/nodes/0/indices/P45vf_YQRhqjfwLMUvSqDw/0/translog/translog-24.ckp
--> data/nodes/0/indices/P45vf_YQRhqjfwLMUvSqDw/0/translog/translog-11.ckp

Continue and DELETE files? [y/N] y
Reading translog UUID information from Lucene commit from shard at [data/nodes/0/indices/P45vf_YQRhqjfwLMUvSqDw/0/index]
Translog Generation: 3
Translog UUID      : AxqC4rocTC6e0fwsljAh-Q
Removing existing translog files
Creating new empty checkpoint at [data/nodes/0/indices/P45vf_YQRhqjfwLMUvSqDw/0/translog/translog.ckp]
Creating new empty translog at [data/nodes/0/indices/P45vf_YQRhqjfwLMUvSqDw/0/translog/translog-3.tlog]
Done.
```

It also includes a `-b` batch operation that can be used to skip the
confirmation diaglog.

Resolves #19123

@dakrone dakrone changed the title from Add 'elasticsearch-tool' CLI tool with 'truncate-translog' command to Add 'elasticsearch-translog' CLI tool with 'truncate' command Jul 26, 2016

@dakrone dakrone merged commit ac53c90 into elastic:master Jul 26, 2016

1 of 2 checks passed

elasticsearch-ci Build finished.
Details
CLA Commit author has signed the CLA
Details

@clintongormley clintongormley changed the title from Add 'elasticsearch-translog' CLI tool with 'truncate' command to Add `elasticsearch-translog` CLI tool with `truncate` command Jul 29, 2016

@dakrone dakrone deleted the dakrone:translog-cli branch Aug 16, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment