Add index.data_path setting #8819

dakrone · 2014-12-08T11:40:49Z

This allows specifying the path an index will be at.

index.data_path is specified in the settings when creating an index,
and can not be dynamically changed.

An example request would look like:

POST /myindex
{
  "settings": {
    "number_of_shards": 2,
    "data_path": "/tmp/myindex"
  }
}

And would put data in /tmp/myindex/0/index/0 and /tmp/myindex/0/index/1

Since this can be used to write data to arbitrary locations on disk, it
requires enabling the node.enable_custom_paths setting in
elasticsearch.yml on all nodes.

I found that the NodeEnvironment abstraction works well for index-specific
data paths, and passing the index settings in to the various methods gives
us more flexibility in the future with regard to adding any other environment-
specific settings.

dakrone · 2014-12-08T11:41:32Z

I've left out documenting this feature, as I suspect the settings/configuration may look different after feedback on this PR, before merging this I will write documentation.

rjernst · 2014-12-08T16:08:40Z

Why is the templating needed? This seems like something a user should never mess with? For example, restoring snapshotted index would then not work on a cluster that didn't have the same template setup..

dakrone · 2014-12-08T16:18:31Z

Why is the templating needed? This seems like something a user should never mess with?

The templating is another feature because we originally discussed shard-level folder settings, it's not required for anything. I agree it does add complexity and I'd be totally fine with removing it. @clintongormley do you know if there's a reason we need to support custom templating for shard-specific folders, or can we stick with the default template all the time?

s1monw · 2014-12-09T08:44:01Z

src/main/java/org/elasticsearch/action/admin/indices/create/CreateIndexRequest.java

@@ -115,6 +115,10 @@ public ActionRequestValidationException validate() {
        if (number_of_replicas != null && number_of_replicas < 0) {
            validationException = addValidationError("index must have 0 or more replica shards", validationException);
        }
+        String customTemplate = settings.get("data_template", settings.get(IndexMetaData.SETTING_DATA_PATH_TEMPLATE, null));


can data_template be a constant?

s1monw · 2014-12-09T08:55:05Z

@dakrone I left a bunch of comments but everything minor. I also wonder if we should remove the templating for now and maybe make it a different PR just to push out the discussion? What do you think?

dakrone · 2014-12-09T08:56:28Z

@s1monw sounds reasonable to me, I will add a commit to remove it and address the feedback

dakrone · 2014-12-09T12:23:39Z

@s1monw I've removed the templating and added a custom data_path usage 30% of the time in ElasticsearchIntegrationTests, as well as addressing your other comments.

s1monw · 2014-12-09T13:38:04Z

left one comment other than that LGTM

javanna · 2014-12-11T10:11:32Z

src/test/java/org/elasticsearch/indices/IndicesCustomDataPathTests.java

+/**
+ * Tests for custom data path locations and templates
+ */
+@ElasticsearchIntegrationTest.ClusterScope(scope = ElasticsearchIntegrationTest.Scope.SUITE, maxNumDataNodes=1, minNumDataNodes=1)


why a single data node? If you need that, you can use numDataNodes=1 instead

dakrone · 2014-12-12T15:30:39Z

I've squashed, rebased (since master was heavily changed), and added a node-level setting node.enable_custom_paths (defaults to false) for allowing or disallowing custom data paths.

@s1monw can you take one last look? None of the original logic has been changed.

rjernst · 2014-12-12T21:26:40Z

@dakrone Why do we need one setting controlling another? They are both in yml, so anyone who can change that at the node level can change both?

rjernst · 2014-12-12T21:27:35Z

Oh I see...one is at the index level and one at yml...still, is it really necessary?

dakrone · 2014-12-12T21:42:21Z

Oh I see...one is at the index level and one at yml...still, is it really necessary?

index.data_path is set per-index, not in elasticsearch.yml at all. node.enable_custom_paths in elasticsearch.yml (as you mentioned, just clarifying).

As to whether it's necessary, yes, because this is intended for mixed clusters where some indices use the path.data directory, and others use a (perhaps slower or faster) disk.

This allows specifying the path an index will be at. `index.data_path` is specified in the settings when creating an index, and can not be dynamically changed. An example request would look like: POST /myindex { "settings": { "number_of_shards": 2, "data_path": "/tmp/myindex" } } And would put data in /tmp/myindex/0/index/0 and /tmp/myindex/0/index/1 Since this can be used to write data to arbitrary locations on disk, it requires enabling the `node.enable_custom_paths` setting in elasticsearch.yml on all nodes.

dakrone · 2014-12-17T08:40:27Z

This is having trouble on Windows, I have reverted it for now while I look into it, I'll open a new PR when fixed.

dakrone added review v2.0.0-beta1 >feature labels Dec 8, 2014

s1monw self-assigned this Dec 8, 2014

s1monw reviewed Dec 9, 2014
View reviewed changes

dakrone force-pushed the custom-index-data-path3 branch 2 times, most recently from da29e79 to 4c715a7 Compare December 9, 2014 12:21

dakrone changed the title ~~Add index.data_path and index.data_template settings~~ Add index.data_path setting Dec 9, 2014

javanna reviewed Dec 11, 2014
View reviewed changes

dakrone force-pushed the custom-index-data-path3 branch 3 times, most recently from ab221dc to 517736a Compare December 12, 2014 15:08

dakrone force-pushed the custom-index-data-path3 branch from 517736a to 0d31a3b Compare December 16, 2014 15:17

dakrone mentioned this pull request Dec 16, 2014

Shadow replicas (segment-based replication) #8976

Closed

dakrone force-pushed the custom-index-data-path3 branch from 0d31a3b to b2ec19a Compare December 16, 2014 17:25

dakrone merged commit b2ec19a into elastic:master Dec 16, 2014

dakrone added the v1.5.0 label Dec 16, 2014

dakrone mentioned this pull request Dec 22, 2014

Add index.data_path setting #9033

Merged

clintongormley added :Distributed/Store Issues around managing unopened Lucene indices. If it touches Store.java, this is a likely label. and removed review v1.5.0 v2.0.0-beta1 labels Mar 19, 2015

dakrone deleted the custom-index-data-path3 branch April 6, 2015 17:50

jimmyjones2 mentioned this pull request Jun 4, 2015

Shadow replica documentation #11497

Closed

clintongormley added :Distributed/Engine Anything around managing Lucene and the Translog in an open shard. and removed :Distributed/Store Issues around managing unopened Lucene indices. If it touches Store.java, this is a likely label. labels Feb 13, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add index.data_path setting #8819

Add index.data_path setting #8819

dakrone commented Dec 8, 2014

dakrone commented Dec 8, 2014

rjernst commented Dec 8, 2014

dakrone commented Dec 8, 2014

s1monw Dec 9, 2014

s1monw commented Dec 9, 2014

dakrone commented Dec 9, 2014

dakrone commented Dec 9, 2014

s1monw commented Dec 9, 2014

javanna Dec 11, 2014

dakrone commented Dec 12, 2014

rjernst commented Dec 12, 2014

rjernst commented Dec 12, 2014

dakrone commented Dec 12, 2014

dakrone commented Dec 17, 2014

Add index.data_path setting #8819

Add index.data_path setting #8819

Conversation

dakrone commented Dec 8, 2014

dakrone commented Dec 8, 2014

rjernst commented Dec 8, 2014

dakrone commented Dec 8, 2014

s1monw Dec 9, 2014

Choose a reason for hiding this comment

s1monw commented Dec 9, 2014

dakrone commented Dec 9, 2014

dakrone commented Dec 9, 2014

s1monw commented Dec 9, 2014

javanna Dec 11, 2014

Choose a reason for hiding this comment

dakrone commented Dec 12, 2014

rjernst commented Dec 12, 2014

rjernst commented Dec 12, 2014

dakrone commented Dec 12, 2014

dakrone commented Dec 17, 2014