Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow an index to be partitioned with custom routing #22274

Merged
merged 7 commits into from Jan 18, 2017

Conversation

@scottsom
Copy link
Contributor

commented Dec 20, 2016

This change makes it possible for custom routing values to go to a subset of shards rather than
just a single shard. This enables the ability to utilize the spatial locality that custom routing can
provide while mitigating the likelihood of ending up with an imbalanced cluster or suffering
from a hot shard.

This is ideal for large multi-tenant indices with custom routing that suffer from one or both of
the following:

  • The big tenants cannot fit into a single shard or there is so many of them that they will likely
    end up on the same shard
  • Tenants often have a surge in write traffic and a single shard cannot process it fast enough

Beyond that, this should also be useful for use cases where most queries are done under the context
of a specific field (e.g. a category) since it gives a hint at how the data can be stored to minimize
the number of shards to check per query. While a similar solution can be achieved with multiple
concrete indices or aliases per value today, those approaches breakdown for high cardinally fields.

A partitioned index enforces that mappings have routing required, that the partition size does not
change when shrinking an index (the partitions will shrink proportionally), and rejects mappings
that have parent/child relationships.

Closes #21585

scottsom added some commits Dec 15, 2016

Allow an index to be partitioned with custom routing
This change makes it possible for custom routing values to go to a subset of shards rather than
just a single shard. This enables the ability to utilize the spatial locality that custom routing can
provide while mitigating the likelihood of ending up with an imbalanced cluster or suffering
from a hot shard.

This is ideal for large multi-tenant indices with custom routing that suffer from one or both of
the following:
- The big tenants cannot fit into a single shard or there is so many of them that they will likely
end up on the same shard
- Tenants often have a surge in write traffic and a single shard cannot process it fast enough

Beyond that, this should also be useful for use cases where most queries are done under the context
of a specific field (e.g. a category) since it gives a hint at how the data can be stored to minimize
the number of shards to check per query. While a similar solution can be achieved with multiple
concrete indices or aliases per value today, those approaches breakdown for high cardinality fields.

A partitioned index enforces that mappings have routing required, that the partition size does not
change when shrinking an index (the partitions will shrink proportionally), and rejects mappings
that have parent/child relationships.

Closes #21585
@jpountz
Copy link
Contributor

left a comment

It looks good to me in general, I like how the change is contained, strict about validation and well tested. The choice that you made to require routing when this feature is used makes sense to me.

However routing can be tricky at times so I'd like @s1monw to have a look at it too.

@@ -178,6 +179,10 @@ public static State fromString(String state) {
public static final Setting<Boolean> INDEX_SHADOW_REPLICAS_SETTING =
Setting.boolSetting(SETTING_SHADOW_REPLICAS, false, Property.IndexScope);

public static final String SETTING_PARTITION_SIZE = "index.partition_size";

This comment has been minimized.

Copy link
@jpountz

jpountz Dec 20, 2016

Contributor

I don't have a better suggestion at the moment, but I wish this name included something like shard or routing to make it clearer what it is about.

This comment has been minimized.

Copy link
@scottsom

scottsom Dec 20, 2016

Author Contributor

Makes sense, perhaps routing_group_size would be more descriptive?

This comment has been minimized.

Copy link
@jpountz

jpountz Dec 21, 2016

Contributor

sounds good to me

This comment has been minimized.

Copy link
@scottsom

scottsom Dec 21, 2016

Author Contributor

Actually, I'll just change it to routing_partition_size since partitioning is a well established concept and "partition" should help draw the connection.


//create index service for parsing and validating "mappings"
Settings dummySettings = Settings.builder()
.put(IndexMetaData.SETTING_VERSION_CREATED, Version.CURRENT)
.put(request.settings)
.put(IndexMetaData.SETTING_NUMBER_OF_SHARDS, 1)
.put(IndexMetaData.SETTING_NUMBER_OF_SHARDS, dummyShards)

This comment has been minimized.

Copy link
@jpountz

jpountz Dec 20, 2016

Contributor

maybe we should just not set IndexMetaData.SETTING_NUMBER_OF_SHARDS explicitly and just inherit from request.settings. I don't think setting the number of shards to 1 is really useful here.

This comment has been minimized.

Copy link
@scottsom

scottsom Dec 20, 2016

Author Contributor

Right, it didn't seem like it being 1 was particularly important but we do need to set a number of shards in situations where only the partition_size is provided.

For example, someone could create a template that omits number_of_shards but includes a partition_size of 10 and this dummy index would default to having a number_of_shards of 5, which fails validation.

I can remove the special handling that is defaulting it to 1 by just changing the dummyShards value to request.settings.getAsInt(IndexMetaData.SETTING_NUMBER_OF_SHARDS, dummyPartitionSize + 1)

This comment has been minimized.

Copy link
@jpountz

jpountz Dec 21, 2016

Contributor

OK I see... Let's keep it this way then.

@jpountz

This comment has been minimized.

Copy link
Contributor

commented Dec 20, 2016

To be clear, it still needs docs to be added, but I am good with waiting for @s1monw to have a look at the impl before looking into adding docs.

@scottsom

This comment has been minimized.

Copy link
Contributor Author

commented Dec 20, 2016

Yes, I was withholding documentation in case there was any major changes but I can start drafting something.

Any preference on where it will go? I was thinking a sub-section on the Create Index page to explain what it does and the limitations. Can also add a page in the Designing for Scale section to explain why you would use it.

@jpountz

This comment has been minimized.

Copy link
Contributor

commented Dec 21, 2016

The pages you linked are in the guide, which would be great to update but we usually do not require it. However it is important to update the reference docs. For that particular change, I think we should add most of the documentation in docs/reference/mapping/fields/routing-field.asciidoc (https://www.elastic.co/guide/en/elasticsearch/reference/5.x/mapping-routing-field.html) as well as documenting the index setting in docs/reference/index-modules.asciidoc (https://www.elastic.co/guide/en/elasticsearch/reference/5.x/index-modules.html#_static_index_settings) just below number_of_shards.

@jpountz

This comment has been minimized.

Copy link
Contributor

commented Dec 22, 2016

@scottsom FYI I'd like other people to have a look at this PR who are away for Christmas/new Eve, so we'll probably come back to this PR in about two weeks.

@jpountz jpountz self-assigned this Dec 22, 2016

@scottsom

This comment has been minimized.

Copy link
Contributor Author

commented Dec 22, 2016

No problem, it's likely I won't be replying until then as well. Thanks for the heads up.

@s1monw
Copy link
Contributor

left a comment

I left a bunch of comments. Looks great thanks for going this.

}

public int routingPartitionSize() {
return settings.getAsInt(SETTING_ROUTING_PARTITION_SIZE, -1);

This comment has been minimized.

Copy link
@s1monw

s1monw Jan 3, 2017

Contributor

can't we use INDEX_ROUTING_PARTITION_SIZE_SETTING.get(settings) here instead?

This comment has been minimized.

Copy link
@scottsom

scottsom Jan 3, 2017

Author Contributor

We could but I was going for consistency with the lines above. For example, the numberOfReplicas() and numberOfShards() in the builder each return a value of -1 when they are not set as opposed to the default for the setting.

I got the impression that we would want to return -1 here to let callers differentiate between the default value and an unset value.

This comment has been minimized.

Copy link
@s1monw

s1monw Jan 11, 2017

Contributor

fair enough, can you document this behavior? if you want that can be a second PR where we document this across the class?

@@ -956,6 +980,16 @@ public IndexMetaData build() {
throw new IllegalArgumentException("must specify non-negative number of shards for index [" + index + "]");
}

Integer maybeRoutingPartitionSize = settings.getAsInt(SETTING_ROUTING_PARTITION_SIZE, null);

This comment has been minimized.

Copy link
@s1monw

s1monw Jan 3, 2017

Contributor

just use INDEX_ROUTING_PARTITION_SIZE_SETTING.get(settings) here and you will be fine. you also don't need the routingPartitionSize <= 0 anymore since the setting has a min value of 1 afaik?

@@ -194,12 +195,16 @@ private static void validateAndAddTemplate(final PutRequest request, IndexTempla
Index createdIndex = null;
final String temporaryIndexName = UUIDs.randomBase64UUID();
try {
// use the provided values, otherwise just pick valid dummy values
int dummyPartitionSize = request.settings.getAsInt(IndexMetaData.SETTING_ROUTING_PARTITION_SIZE, 1);

This comment has been minimized.

Copy link
@s1monw

s1monw Jan 3, 2017

Contributor

please use INDEX_ROUTING_PARTITION_SIZE_SETTING.get(request.settings)

throw new IllegalArgumentException("expected a routing value on partitioned index [" + indexMetaData.getIndex().getName() + "]");
}

return generatePartitionedShardId(indexMetaData, routing, calculatePartitionOffset(indexMetaData, id));

This comment has been minimized.

Copy link
@s1monw

s1monw Jan 3, 2017

Contributor

unless I miss something which is possible :) we could share a common codepath here since if #getRoutingPartitionSize() will be 1 in the unpartitioned case. This would simplify this method again and we can turn the check into an assertion that a routing value is provided? we would also get rid of most of the static onliners here (inline them into generateShardId). I think we should strive for a common code-path it's much easier to reason about and JIT might like it too

@@ -75,6 +75,56 @@ public void testGenerateShardId() {
}
}

public void testPartitionedIndexBWC() {

This comment has been minimized.

Copy link
@s1monw

s1monw Jan 3, 2017

Contributor

can we get some more basic tests here not just BWC. I'd love to see some test here that ensures we are getting the right shards back

@scottsom

This comment has been minimized.

Copy link
Contributor Author

commented Jan 6, 2017

@s1monw I've made some changes to reduce the number of code paths in the shard calculation

@s1monw
Copy link
Contributor

left a comment

I left some minor suggestions - looks great


return generatePartitionedShardId(indexMetaData, routing, calculatePartitionOffset(indexMetaData, id));
if (routing == null) {
assert(!indexMetaData.isRoutingPartitionedIndex());

This comment has been minimized.

Copy link
@s1monw

s1monw Jan 9, 2017

Contributor

we usually use indexMetaData.isRoutingPartitionedIndex() == false for easier readability. Also can you add some message to the assert ?

if (indexMetaData.isRoutingPartitionedIndex()) {
partitionOffset = Math.floorMod(Murmur3HashFunction.hash(id), indexMetaData.getRoutingPartitionSize());
} else {
// we would have still got 0 above but this check just saves us an unnecessary hash calculation

This comment has been minimized.

Copy link
@s1monw

s1monw Jan 9, 2017

Contributor

if we'd pass the hash to calculateScaledShardId instead of the string we could remove this conditional since we only calculate the hash once? WDYT?

This comment has been minimized.

Copy link
@scottsom

scottsom Jan 9, 2017

Author Contributor

I think either way we write it, we're always going to have to check this condition somewhere or just always perform the extra hash computation.

My preference is to pass the string to calculateScaledShardId since this method is called from two places (computeTargetedShards and generateShardId). The way it is now allows each of them to just pass a string without each caller duplicating the hash logic before calling it.

This comment has been minimized.

Copy link
@s1monw

s1monw Jan 11, 2017

Contributor

fair enough

@scottsom

This comment has been minimized.

Copy link
Contributor Author

commented Jan 11, 2017

@s1monw I've updated the assert and commented on your other suggestion. Let me know if there is anything else you would like to see changed.

@s1monw

s1monw approved these changes Jan 11, 2017

Copy link
Contributor

left a comment

I left one suggestion. LGTM otherwise. I will wait for @jpountz to pull this in. If @clintongormley has some time to look over the docs that would be fantastic. It's important that we get those straight for the users but we can improve after this has been merged.! thanks so much for doing this

}

public int routingPartitionSize() {
return settings.getAsInt(SETTING_ROUTING_PARTITION_SIZE, -1);

This comment has been minimized.

Copy link
@s1monw

s1monw Jan 11, 2017

Contributor

fair enough, can you document this behavior? if you want that can be a second PR where we document this across the class?

@s1monw

This comment has been minimized.

Copy link
Contributor

commented Jan 11, 2017

awesome @scottsom thanks so much

@s1monw

This comment has been minimized.

Copy link
Contributor

commented Jan 11, 2017

@jpountz this one is ready IMO feel free to pull it in

@@ -79,6 +79,12 @@ Checking shards may take a lot of time on large indices.
which uses https://en.wikipedia.org/wiki/DEFLATE[DEFLATE] for a higher
compression ratio, at the expense of slower stored fields performance.

`index.routing_partition_size`::

This comment has been minimized.

Copy link
@clintongormley

clintongormley Jan 12, 2017

Member

Add an ID for linking

[[routing-partition-size]]
`index.routing_partition_size`::  
than a single shard. This helps mitigate the risk of ending up with an imbalanced cluster while still
reducing the impact of searches.

This is done by providing the index level setting `index.routing_partition_size` at index creation.

This comment has been minimized.

Copy link
@clintongormley

clintongormley Jan 12, 2017

Member

Link to the setting details:

<<routing-partition-size,`index.routing_partition_size`>> 
@@ -109,3 +109,28 @@ documents with the same `_id` might end up on different shards if indexed with
different `_routing` values.

It is up to the user to ensure that IDs are unique across the index.

==== Routing to an index partition

This comment has been minimized.

Copy link
@clintongormley

clintongormley Jan 12, 2017

Member

Add an ID for linking

[[routing-index-partition]]
==== Routing to an index partition
The number of shards a custom <<mapping-routing-field,routing>> value can go to.
Defaults to 1 and can only be set at index creation time. This value must be less
than the `index.number_of_shards` unless the `index.number_of_shards` value is also 1.

This comment has been minimized.

Copy link
@clintongormley

clintongormley Jan 12, 2017

Member

Add:

See <<routing-index-partition>> for more details about how this setting is used.
@clintongormley

This comment has been minimized.

Copy link
Member

commented Jan 12, 2017

Docs LGTM, other than adding some links between the two sections

@jpountz

This comment has been minimized.

Copy link
Contributor

commented Jan 17, 2017

Tests just passed locally for me, I'll merge tomorrow.

@jpountz jpountz merged commit 372812d into elastic:master Jan 18, 2017

1 check passed

CLA Commit author has signed the CLA
Details

jpountz added a commit that referenced this pull request Jan 18, 2017

Allow an index to be partitioned with custom routing (#22274)
This change makes it possible for custom routing values to go to a subset of shards rather than
just a single shard. This enables the ability to utilize the spatial locality that custom routing can
provide while mitigating the likelihood of ending up with an imbalanced cluster or suffering
from a hot shard.

This is ideal for large multi-tenant indices with custom routing that suffer from one or both of
the following:
- The big tenants cannot fit into a single shard or there is so many of them that they will likely
end up on the same shard
- Tenants often have a surge in write traffic and a single shard cannot process it fast enough

Beyond that, this should also be useful for use cases where most queries are done under the context
of a specific field (e.g. a category) since it gives a hint at how the data can be stored to minimize
the number of shards to check per query. While a similar solution can be achieved with multiple
concrete indices or aliases per value today, those approaches breakdown for high cardinality fields.

A partitioned index enforces that mappings have routing required, that the partition size does not
change when shrinking an index (the partitions will shrink proportionally), and rejects mappings
that have parent/child relationships.

Closes #21585

@scottsom scottsom deleted the scottsom:custom_routing_partitioning branch Jan 19, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.