Implement shard splitting #8912

Closed
nik9000 opened this issue Dec 11, 2014 · 3 comments

Comments

@nik9000
Member

nik9000 commented Dec 11, 2014

You should think about implementing shard splitting so people stop asking for it. I've honestly never been in a situation where I needed it. When I needed to reshard, I've always used the strategy from the old blog post about changing mappings with no downtime. This has the advantage of letting me change the mapping and analysis configuration, which is a much more common need for me.

I totally won't be offended if you close this with "yup, still not something we want to do." I'll just reply with a link here rather than repeating myself the next time someone asks. Not that it's all that frequent. It's just been twice in two days now.

@clintongormley

Heya @nik9000

This is why I wrote this in the book:

Users often ask why Elasticsearch doesn’t support shard-splitting — the ability to split each shard into two or more pieces. The reason is that shard-splitting is a bad idea:

  • Splitting a shard is almost equivalent to reindexing your data. It’s a much heavier process than just copying a shard from one node to another.
  • Splitting is exponential. You start with one shard, then split into 2, then 4, 8, 16, etc. Splitting doesn’t allow you to increase capacity by just 50%.
  • Shard splitting requires you to have enough capacity to hold a second copy of your index. Usually, by the time you realise that you need to scale out, you don’t have enough free space left to perform the split.
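
To make the second point concrete, here is a minimal sketch of the doubling arithmetic. It assumes every split divides each shard in two, as described above; the function names are hypothetical and are not part of any Elasticsearch API.

```python
def reachable_shard_counts(start, max_shards):
    """List the shard counts reachable by repeatedly splitting every shard in two."""
    if start < 1:
        raise ValueError("start must be at least 1")
    counts = []
    n = start
    while n <= max_shards:
        counts.append(n)
        n *= 2  # each round of splitting doubles the shard count
    return counts


def can_reach_by_splitting(start, target):
    """Return True if `target` shards can be reached from `start` by doubling alone."""
    return target in reachable_shard_counts(start, target)


print(reachable_shard_counts(1, 16))   # [1, 2, 4, 8, 16]
print(can_reach_by_splitting(2, 3))    # False: a 50% increase is not reachable
```

Under this assumption, going from 2 shards to 3 is impossible by splitting, while reindexing into a new index can target any shard count.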

In a way, Elasticsearch does support shard splitting. You can always reindex your data to a new index with the appropriate number of shards (see Reindexing Your Data). It is still a more intensive process than moving shards around, and still requires enough free space to complete, but at least you can control the number of shards in the new index.
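
As a rough illustration of why reindexing gives full control over the shard count, the sketch below simulates it in memory. It assumes simple hash-modulo routing (`hash(id) % number_of_shards`) and uses `zlib.crc32` as a stand-in hash; it does not talk to a cluster, and every name in it is hypothetical.

```python
import zlib


def shard_for(doc_id, num_shards):
    """Pick a shard by hash-modulo routing (crc32 is a stand-in hash, an assumption)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def build_index(docs, num_shards):
    """Distribute documents across `num_shards` in-memory shards."""
    shards = [dict() for _ in range(num_shards)]
    for doc_id, source in docs.items():
        shards[shard_for(doc_id, num_shards)][doc_id] = source
    return shards


def reindex(old_shards, new_num_shards):
    """Read every document from the old index and route it into a new one."""
    docs = {}
    for shard in old_shards:
        docs.update(shard)
    return build_index(docs, new_num_shards)


docs = {f"doc-{i}": {"n": i} for i in range(1000)}
old_index = build_index(docs, 2)
new_index = reindex(old_index, 3)  # a 50% increase, which doubling cannot give
print([len(shard) for shard in new_index])
```

Every document is read and written again, which is the cost mentioned above, but the new index can have any number of shards.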

We still don't want to implement shard splitting for the above reasons, but we do want to make it easier to reindex your data, by implementing a changes API (#1242) and a reindex API (#492).

@s1monw
Contributor

s1monw commented Dec 15, 2014

One thing that might be efficient and reasonable to implement is shard merging from N to 1, e.g. in the daily indices case, where you want multiple shards for write performance but one shard is enough once you are done writing. This is something we might consider in the future.

@javanna
Member

javanna commented Nov 27, 2017

Shard splitting has been implemented in #26931.
