Add support for a default alias #22472

Closed
scottsom opened this issue Jan 6, 2017 · 3 comments
Labels: :Data Management/Indices APIs (APIs to create and manage indices and templates), discuss

Comments

scottsom commented Jan 6, 2017

This feature seeks to add the concept of a default alias. When configured, it would allow users to perform an action on an alias or index that does not exist and have it fall back to some default configuration instead of throwing an error.

Problem

In the Designing for Scale guide, it is suggested to use an alias per user, which allows you to easily move the big users to their own dedicated index.

There are a few problems with this approach:

  • When indexing documents, it requires special client code to check if the alias exists and if not, create it.
  • It can consume a significant amount of cluster state as the number of users in our index grows. As called out in the guide, cluster state does not scale.
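To make the first point concrete, here is a minimal sketch of the kind of client-side check that is needed today. The `send` callable is a hypothetical transport (it is not part of any client library) that takes a method, a path and a body and returns the HTTP status code; `ensure_alias` and `alias_action` are names made up for illustration.

```python
from typing import Callable, Optional

# Hypothetical transport: (method, path, body) -> HTTP status code.
Send = Callable[[str, str, Optional[dict]], int]


def alias_action(user_id: str, index: str = "shared_v1") -> dict:
    """Build the "add" action for one per-user alias."""
    return {
        "add": {
            "index": index,
            "alias": user_id,
            "routing": user_id,
            "filter": {"term": {"user_id": user_id}},
        }
    }


def ensure_alias(user_id: str, send: Send) -> bool:
    """Create the user's alias if it is missing. Returns True if it was created."""
    # HEAD /_alias/<name> answers 200 when the alias exists and 404 otherwise.
    if send("HEAD", f"/_alias/{user_id}", None) == 200:
        return False
    send("POST", "/_aliases", {"actions": [alias_action(user_id)]})
    return True
```

Every indexing path has to call something like `ensure_alias` first, and every alias it creates ends up in cluster state.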

To some extent, this is the problem we were looking to solve in #22274 by allowing users to be dispersed to a subset of shards instead of just one when using custom routing. By picking a larger partition size, you don't need to pull out the big users into their own dedicated index. If all users fit in the shared index then you don't need aliases.

However, that has some drawbacks too. First, it would be an operational burden to just have a single giant index (e.g. to do a reindex you need to double your cluster size). Second, you would pick your partition size to be as large as your expected biggest user (plus some buffer). This can be problematic for imbalanced user distributions since it leads to a sub-optimal partitioning scheme. For example, imagine you had one huge system account or just a few large users and a long-tail for the rest. This means most of the users are hitting more shards per request than they need to.

Proposal

When creating an alias per user, they generally all look the same with one value changing:

{
  "index": "shared_v1",
  "routing": "<alias>",
  "filter": {
    "term": {
      "user_id": "<alias>"
    }
  }
}

We can use this to accept what is essentially an alias template such that we can compute the alias at request time rather than maintaining the state for all of them.

For example, you might define a default alias like this:

# POST /_aliases
{
    "default_alias" : {
        "index": "shared_v1",
        "routing": true,
        "field": "user_id"
    }
}

When a request is made against an index or alias then Elasticsearch will attempt to resolve it as it does now. If the resolution fails then it checks for the default alias definition and uses that instead of throwing an exception.

If routing is true then the alias is passed as the custom routing value.
If field is set then that field is used in a term filter with the alias as the value.

For example, searching on the non-existent alias, 42, with the above configuration is equivalent to having created the alias:

{
  "index": "shared_v1",
  "routing": "42",
  "filter": {
    "term": {
      "user_id": "42"
    }
  }
}
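The resolution rule described above can be sketched in a few lines. This is only an illustration of the proposed behaviour, not Elasticsearch code: `resolve` is a made-up name, `known` stands in for the indices and aliases that already exist, and `default_alias` uses the shape from the example configuration.

```python
from typing import Optional


def resolve(name: str, known: dict, default_alias: Optional[dict]) -> dict:
    """Resolve `name` to a target, falling back to the default alias."""
    # Normal resolution first: an existing index or alias always wins.
    if name in known:
        return known[name]
    # Without a default alias the current behaviour (an error) is kept.
    if default_alias is None:
        raise KeyError(f"no such index or alias: {name}")
    target = {"index": default_alias["index"]}
    # "routing": true means the requested name becomes the routing value.
    if default_alias.get("routing"):
        target["routing"] = name
    # "field" means a term filter on that field with the name as the value.
    field = default_alias.get("field")
    if field:
        target["filter"] = {"term": {field: name}}
    return target
```

Calling `resolve("42", {}, {"index": "shared_v1", "routing": True, "field": "user_id"})` yields the same definition as the alias shown above.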

The procedure for moving a big user (42) to a dedicated index would look like this:

  1. Create a new index, 42_v1.
  2. Copy all data from shared_v1 to 42_v1 where user_id is 42.
  3. Create the alias, 42, pointing to 42_v1 with no routing or term filter.
  4. Delete documents from shared_v1 where user_id is 42.
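The four steps above could map onto requests roughly as follows. This is a sketch that only builds the request descriptions as (method, path, body) tuples; `move_user_requests` is a hypothetical helper, and the index settings and mappings for the new index are left out.

```python
def move_user_requests(user_id: str, shared: str = "shared_v1") -> list:
    """List the requests for moving one user to a dedicated index, in order."""
    dedicated = f"{user_id}_v1"
    by_user = {"term": {"user_id": user_id}}
    return [
        # 1. Create the new index (settings and mappings omitted here).
        ("PUT", f"/{dedicated}", None),
        # 2. Copy only this user's documents out of the shared index.
        ("POST", "/_reindex", {
            "source": {"index": shared, "query": by_user},
            "dest": {"index": dedicated},
        }),
        # 3. Point the alias at the new index, with no routing or filter.
        ("POST", "/_aliases", {
            "actions": [{"add": {"index": dedicated, "alias": user_id}}],
        }),
        # 4. Remove the user's documents from the shared index.
        ("POST", f"/{shared}/_delete_by_query", {"query": by_user}),
    ]
```

Reindex keeps each document's routing value by default, so a real migration would probably want to decide whether to keep or discard it in the dedicated index; that detail is not handled here.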

Finally, if you want to recreate the shared index then you can do:

  1. Create a new index, shared_v2.
  2. Copy all data from shared_v1 to shared_v2.
  3. Update the default alias to point to shared_v2.
  4. Drop shared_v1.

Now you can grow to millions of users, and the aliases are essentially a whitelist of the big users who need a dedicated index, while all other users remain stateless.

scottsom commented Jan 9, 2017

Any thoughts/concerns with this? @jpountz or @s1monw

Alternatively, the API could just accept an actual template to make it more flexible:

# POST /_aliases
{
  "default_alias" : {
    "index": "shared_v1",
    "routing": "{{alias}}",
    "filter": {
      "term": {
        "user_id": "{{alias}}"
      }
    }
  }
}
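For illustration, rendering such a template at request time could be as simple as the sketch below. It assumes plain `{{alias}}` string substitution with no other template syntax; `render` is a made-up helper, not an existing API.

```python
def render(template, alias: str):
    """Recursively replace the {{alias}} placeholder in a template."""
    if isinstance(template, str):
        return template.replace("{{alias}}", alias)
    if isinstance(template, dict):
        return {key: render(value, alias) for key, value in template.items()}
    if isinstance(template, list):
        return [render(item, alias) for item in template]
    # Numbers, booleans and None pass through unchanged.
    return template


default_alias = {
    "index": "shared_v1",
    "routing": "{{alias}}",
    "filter": {"term": {"user_id": "{{alias}}"}},
}
rendered = render(default_alias, "42")
```

Here `rendered` has `"42"` in both the routing value and the term filter, while `default_alias` itself is left unmodified.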

s1monw commented Jan 16, 2017

@clintongormley didn't we discuss this last Friday? Do you wanna comment?

clintongormley commented

Hi @scottsom

We had a long discussion about this in our FixItFriday session. First thing to note is that any application can add filtering and custom routing on top of Elasticsearch today, so there isn't a requirement for this to be implemented in Elasticsearch.

Second, we weren't crazy about the idea of a default alias, as most people use an Elasticsearch cluster for more than one purpose. We thought about adding pattern matching for index names (eg user_*), but that suddenly gets complicated (eg multiple matching patterns).

We also thought about the possibility of doing (eg):

PUT my_user|my_dynamic_alias/my_type/1

If the my_user index exists, then it'd be used, otherwise it'd fall back to plugging my_user into the my_dynamic_alias (along the lines of what you've suggested).

This of course complicates what gets logged, etc.

Another possibility was to have something like an ingest processor which could look up the destination for each request from an index.

All in all, we decided that this was a complicated feature, with a limited audience, with many knobs that users would want to twiddle in different ways, and which can be implemented application side. As such, we're not going to go down this road.

Thanks anyway for the idea.

clintongormley added the :Data Management/Indices APIs label and removed the :Aliases label on Feb 13, 2018
scottsom mentioned this issue Apr 5, 2018