Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Node should start up despite of a lingering `.es_temp_file` #21210

Merged
merged 5 commits into from Aug 1, 2017

Conversation

Projects
None yet
9 participants
@bleskes
Copy link
Member

commented Oct 31, 2016

When ES starts up we verify we can write to all data files by creating and deleting a temp file called .es_temp_file. If for some reason the file was successfully created but not successfully deleted, we still shut down correctly but subsequent start attempts will fail with a file already exists exception.

This PR makes sure to first clean any existing .es_temp_file

Superseeds #21007

Node should start up despite of a lingering `.es_temp_file`
When ES starts up we verify we can write to all data files by creating and deleting a temp file called `.es_temp_file`. If for some reason the file was successfully created but not successfully deleted, we still shut down correctly but subsequent start attempts will fail with a file already exists exception.

 This commit makes sure to first clean any existing `.es_temp_file`
@bleskes

This comment has been minimized.

Copy link
Member Author

commented Oct 31, 2016

@s1monw can you take a look please?

@jasontedor

This comment has been minimized.

Copy link
Member

commented Oct 31, 2016

If we are going to make this change, I think we should make the same change for the atomic move temp file.

@bleskes

This comment has been minimized.

Copy link
Member Author

commented Oct 31, 2016

@jasontedor agreed. Pushed another commit

@@ -891,11 +894,12 @@ public void ensureAtomicMoveSupported() throws IOException {
final NodePath[] nodePaths = nodePaths();
for (NodePath nodePath : nodePaths) {
assert Files.isDirectory(nodePath.path) : nodePath.path + " is not a directory";
final Path src = nodePath.path.resolve("__es__.tmp");
final Path target = nodePath.path.resolve("__es__.final");
final Path src = nodePath.path.resolve(TEMP_FILE_NAME + ".src");

This comment has been minimized.

Copy link
@s1monw

s1monw Nov 1, 2016

Contributor

maybe don't change the name if you want to delete it first

try {
Files.deleteIfExists(src);

This comment has been minimized.

Copy link
@s1monw

s1monw Nov 1, 2016

Contributor

if we do this, then we can also just do if (Files.exists(src)) here?

@s1monw

This comment has been minimized.

Copy link
Contributor

commented Nov 1, 2016

left some comments... I really wonder if we should do this by default or if we should do this only if somebody specifies a cmd option / system property?

@bleskes

This comment has been minimized.

Copy link
Member Author

commented Nov 1, 2016

I really wonder if we should do this by default or if we should do this only if somebody specifies a cmd option / system property?

I think it's good for these tests not to have side effects on failures? what's your concern?

@s1monw

This comment has been minimized.

Copy link
Contributor

commented Nov 1, 2016

I think it's good for these tests not to have side effects on failures? what's your concern?

the failure is what a complete node going down due to powerloss? Pretty rare... I am concerned we hide bugs if we just go and add such a leniency

@bleskes

This comment has been minimized.

Copy link
Member Author

commented Nov 1, 2016

I am concerned we hide bugs if we just go and add such a leniency

Not sure I follow - we want to make sure that we can write to those folders? I think potentially deleting files in those folders is inline with the check and just strengthen it?

@s1monw

This comment has been minimized.

Copy link
Contributor

commented Nov 1, 2016

Not sure I follow - we want to make sure that we can write to those folders? I think potentially deleting files in those folders is inline with the check and just strengthen it?

we have a precondition that is not met, if it's there it's an exceptional situation and potentially a bug.

@perlei

This comment has been minimized.

Copy link

commented Jan 16, 2017

Im experiencing this with elasticsearch 5.1.1 sometimes when i restart the VM that elasticsearch is running on (Well this is a dev-vm so it might be resets, i dont know, but it happens from time to time), it can not start, and I get the following :

 Caused by: java.nio.file.FileAlreadyExistsException: /mnt/data/elasticsearch/data/nodes/0/indices/dZ4wmoCmQtCM-PiD7pn8QQ/2/_state/.es_temp_file
 	at sun.nio.fs.UnixException.translateToIOException(UnixException.java:88) ~[?:?]
 	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) ~[?:?]
 	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) ~[?:?]
 	at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214) ~[?:?]
 	at java.nio.file.Files.newByteChannel(Files.java:361) ~[?:1.8.0_112]
 	at java.nio.file.Files.createFile(Files.java:632) ~[?:1.8.0_112]
 	at org.elasticsearch.env.NodeEnvironment.tryWriteTempFile(NodeEnvironment.java:1033) ~[elasticsearch-5.1.1.jar:5.1.1]
 	at org.elasticsearch.env.NodeEnvironment.assertCanWrite(NodeEnvironment.java:1019) ~[elasticsearch-5.1.1.jar:5.1.1]
 	at org.elasticsearch.env.NodeEnvironment.<init>(NodeEnvironment.java:276) ~[elasticsearch-5.1.1.jar:5.1.1]
 	at org.elasticsearch.node.Node.<init>(Node.java:249) ~[elasticsearch-5.1.1.jar:5.1.1]
 	at org.elasticsearch.node.Node.<init>(Node.java:229) ~[elasticsearch-5.1.1.jar:5.1.1]
 	at org.elasticsearch.bootstrap.Bootstrap$6.<init>(Bootstrap.java:214) ~[elasticsearch-5.1.1.jar:5.1.1]
 	at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:214) ~[elasticsearch-5.1.1.jar:5.1.1]
 	at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:306) ~[elasticsearch-5.1.1.jar:5.1.1]
 	at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:121) ~[elasticsearch-5.1.1.jar:5.1.1]

My workaround is to add the following line:

find /mnt/data/elasticsearch/data/ -name .es_temp_file -print | xargs -n 1 rm -rf

To the elasticsearch startup script.

@clintongormley clintongormley added v5.2.1 and removed v5.2.0 labels Jan 24, 2017

@clintongormley clintongormley removed the v5.4.1 label May 15, 2017

@clintongormley clintongormley added v5.4.3 and removed v5.4.2 labels Jun 14, 2017

@clintongormley clintongormley added v5.4.4 and removed v5.4.3 labels Jun 27, 2017

@s1monw

This comment has been minimized.

Copy link
Contributor

commented Jun 29, 2017

@bleskes I think I changed my mind, lets do this. The reason why I think we should do it is that if you press ctrl+c while stuff is starting up we might leak the file. It's ok to delete it first since we retry to write it immediately and that is it's only purpose, to ensure we can write.

@s1monw

s1monw approved these changes Jun 29, 2017

@bleskes bleskes merged commit 9f1d116 into elastic:master Aug 1, 2017

2 checks passed

CLA Commit author is a member of Elasticsearch
Details
elasticsearch-ci Build finished.
Details

@bleskes bleskes deleted the bleskes:node_cleanup_temp branch Aug 1, 2017

bleskes added a commit that referenced this pull request Aug 2, 2017

Node should start up despite of a lingering `.es_temp_file` (#21210)
When ES starts up we verify we can write to all data folders and that they support atomic moves. We do so by creating and deleting temp files. If for some reason the files was successfully created but not successfully deleted, we still shut down correctly but subsequent start attempts will fail with a file already exists exception.

This commit makes sure to first clean any existing temporary files.

Superseeds #21007

bleskes added a commit that referenced this pull request Aug 2, 2017

Node should start up despite of a lingering `.es_temp_file` (#21210)
When ES starts up we verify we can write to all data folders and that they support atomic moves. We do so by creating and deleting temp files. If for some reason the files was successfully created but not successfully deleted, we still shut down correctly but subsequent start attempts will fail with a file already exists exception.

This commit makes sure to first clean any existing temporary files.

Superseeds #21007

bleskes added a commit that referenced this pull request Aug 2, 2017

Node should start up despite of a lingering `.es_temp_file` (#21210)
When ES starts up we verify we can write to all data folders and that they support atomic moves. We do so by creating and deleting temp files. If for some reason the files was successfully created but not successfully deleted, we still shut down correctly but subsequent start attempts will fail with a file already exists exception.

This commit makes sure to first clean any existing temporary files.

Superseeds #21007

@lcawl lcawl removed the v6.1.0 label Dec 12, 2017

@jimczi jimczi added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.