Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add index.data_path setting #8819

Merged
merged 1 commit into from Dec 16, 2014
Merged

Conversation

dakrone
Copy link
Member

@dakrone dakrone commented Dec 8, 2014

This allows specifying the path an index will be at.

index.data_path is specified in the settings when creating an index,
and can not be dynamically changed.

An example request would look like:

POST /myindex
{
  "settings": {
    "number_of_shards": 2,
    "data_path": "/tmp/myindex"
  }
}

And would put data in /tmp/myindex/0/index/0 and /tmp/myindex/0/index/1

Since this can be used to write data to arbitrary locations on disk, it
requires enabling the node.enable_custom_paths setting in
elasticsearch.yml on all nodes.

I found that the NodeEnvironment abstraction works well for index-specific
data paths, and passing the index settings in to the various methods gives
us more flexibility in the future with regard to adding any other environment-
specific settings.

@dakrone
Copy link
Member Author

dakrone commented Dec 8, 2014

I've left out documenting this feature, as I suspect the settings/configuration may look different after feedback on this PR, before merging this I will write documentation.

@rjernst
Copy link
Member

rjernst commented Dec 8, 2014

Why is the templating needed? This seems like something a user should never mess with? For example, restoring snapshotted index would then not work on a cluster that didn't have the same template setup..

@dakrone
Copy link
Member Author

dakrone commented Dec 8, 2014

Why is the templating needed? This seems like something a user should never mess with?

The templating is another feature because we originally discussed shard-level folder settings, it's not required for anything. I agree it does add complexity and I'd be totally fine with removing it. @clintongormley do you know if there's a reason we need to support custom templating for shard-specific folders, or can we stick with the default template all the time?

@s1monw s1monw self-assigned this Dec 8, 2014
@@ -115,6 +115,10 @@ public ActionRequestValidationException validate() {
if (number_of_replicas != null && number_of_replicas < 0) {
validationException = addValidationError("index must have 0 or more replica shards", validationException);
}
String customTemplate = settings.get("data_template", settings.get(IndexMetaData.SETTING_DATA_PATH_TEMPLATE, null));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can data_template be a constant?

@s1monw
Copy link
Contributor

s1monw commented Dec 9, 2014

@dakrone I left a bunch of comments but everything minor. I also wonder if we should remove the templating for now and maybe make it a different PR just to push out the discussion? What do you think?

@dakrone
Copy link
Member Author

dakrone commented Dec 9, 2014

@s1monw sounds reasonable to me, I will add a commit to remove it and address the feedback

@dakrone dakrone force-pushed the custom-index-data-path3 branch 2 times, most recently from da29e79 to 4c715a7 Compare December 9, 2014 12:21
@dakrone dakrone changed the title Add index.data_path and index.data_template settings Add index.data_path setting Dec 9, 2014
@dakrone
Copy link
Member Author

dakrone commented Dec 9, 2014

@s1monw I've removed the templating and added a custom data_path usage 30% of the time in ElasticsearchIntegrationTests, as well as addressing your other comments.

@s1monw
Copy link
Contributor

s1monw commented Dec 9, 2014

left one comment other than that LGTM

/**
* Tests for custom data path locations and templates
*/
@ElasticsearchIntegrationTest.ClusterScope(scope = ElasticsearchIntegrationTest.Scope.SUITE, maxNumDataNodes=1, minNumDataNodes=1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why a single data node? If you need that, you can use numDataNodes=1 instead

@dakrone dakrone force-pushed the custom-index-data-path3 branch 3 times, most recently from ab221dc to 517736a Compare December 12, 2014 15:08
@dakrone
Copy link
Member Author

dakrone commented Dec 12, 2014

I've squashed, rebased (since master was heavily changed), and added a node-level setting node.enable_custom_paths (defaults to false) for allowing or disallowing custom data paths.

@s1monw can you take one last look? None of the original logic has been changed.

@rjernst
Copy link
Member

rjernst commented Dec 12, 2014

@dakrone Why do we need one setting controlling another? They are both in yml, so anyone who can change that at the node level can change both?

@rjernst
Copy link
Member

rjernst commented Dec 12, 2014

Oh I see...one is at the index level and one at yml...still, is it really necessary?

@dakrone
Copy link
Member Author

dakrone commented Dec 12, 2014

Oh I see...one is at the index level and one at yml...still, is it really necessary?

index.data_path is set per-index, not in elasticsearch.yml at all. node.enable_custom_paths in elasticsearch.yml (as you mentioned, just clarifying).

As to whether it's necessary, yes, because this is intended for mixed clusters where some indices use the path.data directory, and others use a (perhaps slower or faster) disk.

This allows specifying the path an index will be at.

`index.data_path` is specified in the settings when creating an index,
and can not be dynamically changed.

An example request would look like:

POST /myindex
{
  "settings": {
    "number_of_shards": 2,
    "data_path": "/tmp/myindex"
  }
}

And would put data in /tmp/myindex/0/index/0 and /tmp/myindex/0/index/1

Since this can be used to write data to arbitrary locations on disk, it
requires enabling the `node.enable_custom_paths` setting in
elasticsearch.yml on all nodes.
@dakrone dakrone merged commit b2ec19a into elastic:master Dec 16, 2014
@dakrone dakrone added the v1.5.0 label Dec 16, 2014
@dakrone
Copy link
Member Author

dakrone commented Dec 17, 2014

This is having trouble on Windows, I have reverted it for now while I look into it, I'll open a new PR when fixed.

@clintongormley clintongormley added :Distributed/Store Issues around managing unopened Lucene indices. If it touches Store.java, this is a likely label. and removed review v1.5.0 v2.0.0-beta1 labels Mar 19, 2015
@dakrone dakrone deleted the custom-index-data-path3 branch April 6, 2015 17:50
@clintongormley clintongormley added :Distributed/Engine Anything around managing Lucene and the Translog in an open shard. and removed :Distributed/Store Issues around managing unopened Lucene indices. If it touches Store.java, this is a likely label. labels Feb 13, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed/Engine Anything around managing Lucene and the Translog in an open shard. >feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants