
Optimize indexing in create once and never update scenarios #19813

Closed
3 of 10 tasks
bleskes opened this issue Aug 4, 2016 · 10 comments
Labels
:Distributed/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search. Meta

Comments


bleskes commented Aug 4, 2016

The Elasticsearch API supports all CRUD operations on every single document, topped by features like optimistic versioning controls and a real-time get API. To achieve this we need several data structures that support looking up existing documents in real time (before a refresh); if a document is not found there, we need to do a lookup by ID in Lucene to make sure we find and replace any existing copy. For certain use cases, where users can guarantee that documents are inserted once and never changed again (for example logging, metrics, or a one-off re-indexing of existing data), this turns out to be a significant overhead in terms of indexing speed, without adding any real value. We sometimes refer to these scenarios as the "append-only use case".

In the past, we have tried to introduce optimizations in this area, but had to revert them as they proved to be unsafe under certain race/error conditions in our distributed system.

Since then things have evolved and we now have more tools at our disposal. This issue is to track the list of things it will take to optimize the append-only use case while keeping it safe. Most concretely, we should explore not maintaining the LiveVersionMap and always using addDocument (rather than updateDocument) when indexing to Lucene. Some of the things listed here are already done; some require more work.

The list is mostly focused on dealing with the main issue in these cases: when an indexing operation is sent to the primary, it may be that the sending node is disconnected from the node with the primary and thus has no idea whether the operation succeeded or failed. In that case, the node will resend its request once the connection is restored. This may result in a duplicate delivery on the primary and thus requires the use of updateDocument. It is easy to add an `isRetry` flag to those retry requests, but we still run the danger of the retry request being processed before the original, in which case we would end up with duplicates. This scenario must be avoided at all costs.

  • Disable the optimization when a shard is recovering and do things as we do today. During recovery a document may be written twice to a shard.
  • Rely on primary terms to solve collisions between requests originating from an old primary and a retry.
  • Rely on the task manager to avoid having an original request and a retry request executing concurrently on a node. For this we need to make sure they use the same task id and either reject one or have it wait on the other.
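The task-manager idea in the last bullet could be sketched roughly like this (a hypothetical toy model; `InFlightTasks` and its methods are illustrative names, not the actual Elasticsearch task manager API):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: an original request and its retry share one task id;
// while one of them is executing, the other is rejected (or could be parked
// to wait, which this toy model does not implement).
class InFlightTasks {
    private final Map<Long, Boolean> inFlight = new ConcurrentHashMap<>();

    // Returns true if the caller may execute the request now; false if a
    // request with the same task id is already in flight on this node.
    boolean tryRegister(long taskId) {
        return inFlight.putIfAbsent(taskId, Boolean.TRUE) == null;
    }

    void unregister(long taskId) {
        inFlight.remove(taskId);
    }
}
```

This only prevents *concurrent* execution; as the text below notes, it does nothing for a retry that has already completed and been removed from the task manager.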

Having dealt with concurrency, we need to deal with the case where the retry request has already executed and been removed from the task manager. This can be done as follows:

  • Introduce a new “connection generation” id (a long) that is incremented every time a connection is (re)established between the nodes. This generation id will be added to every request and will also be exchanged as part of establishing a connection, making sure the target node is aware of the new generation id before exposing the connection outside of the transport service.
  • Networking threads on the target node will check that the generation id of a request is still the current one after adding the request to the task manager. If it is not, the request will be removed from the task manager and failed (though there is no way to respond, because the channel is closed). This guarantees that an old request will never be executed after a new connection is established, which in turn means it cannot be processed after a retry request has been processed.
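A minimal sketch of that connection-generation check (hypothetical names, not the actual transport-service code):

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: a generation counter bumped on every (re)connect.
// Requests carry the generation they were sent under; a request tagged with
// an old generation is rejected once a newer connection exists.
class ConnectionGeneration {
    private final AtomicLong current = new AtomicLong(0);

    // Called when a connection is (re)established between two nodes.
    long reconnect() {
        return current.incrementAndGet();
    }

    // Checked by the target node after registering the request.
    boolean isStillCurrent(long requestGeneration) {
        return requestGeneration == current.get();
    }
}
```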

Instead of the above we went with a simpler approach of dealing with concurrency:

  • Add a timestamp to each request with an auto-generated id and create a "safety" barrier guaranteeing that all problematic requests (i.e., an original request that is processed after its retry request) have a lower timestamp. This allows us to identify them and do the right thing. See details below.
  • Use an index level setting to indicate to the engine that it can disable the LiveVersionMap and use addDocument when the shard is started. This can be updated on a closed index.
  • Disable real time get API on these indices.
  • Disable indexing with an id.
  • ~~Disable delete & update API~~

Instead of disabling single-doc update/delete/index-with-id, we have managed to come up with a scheme that skips adding append-only operations to the LiveVersionMap and marks it as unsafe. Whenever a non-append-only indexing operation comes along, we detect that and force a refresh. See #27752
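The scheme from #27752 can be illustrated with a toy model (hypothetical class and method names; the real engine logic is considerably more involved):

```java
// Hypothetical sketch: append-only ops skip the LiveVersionMap, leaving it
// "unsafe". The first non-append-only operation forces a refresh so that the
// skipped documents become visible in Lucene and the ID lookup can go
// through Lucene instead of the (incomplete) map.
class AppendOnlyVersionMap {
    private boolean unsafe = false;     // true once entries have been skipped
    private int forcedRefreshes = 0;

    // Returns true if this operation forced a refresh.
    boolean onOperation(boolean isAppendOnly) {
        if (isAppendOnly) {
            unsafe = true;              // skip the map; mark it unsafe
            return false;
        }
        if (unsafe) {
            forcedRefreshes++;          // make skipped docs searchable
            unsafe = false;             // resume maintaining the map
            return true;
        }
        return false;
    }

    int forcedRefreshes() { return forcedRefreshes; }
}
```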


abeyad commented Aug 4, 2016

> Disable real time get API and optimistic concurrency control on these indices.

I was wondering what the reason is that real-time gets wouldn't work with the solutions outlined above?


s1monw commented Aug 4, 2016

> I was wondering what the reason is that real-time gets wouldn't work with the solutions outlined above?

We do RAM-heavy accounting in the engine to allow GET operations to work in real time vs. near-real time. If we optimize for append-only, I think the need for real-time GET is a corner case and we can use that RAM for indexing instead. Hope that helps?


abeyad commented Aug 4, 2016

@s1monw makes sense, thank you!

clintongormley added the :Distributed/CRUD label Aug 11, 2016

s1monw commented Aug 29, 2016

@bleskes I was looking into the problem of more-than-once delivery due to retries on the primary. The task manager / network generation ideas I had might work, but I'd like to discuss a potentially simpler and more elegant way of doing this. IMO we just need a "happens before" relationship on the target shard. For instance, if we send a document that has the canHaveDuplicates property set (we can set it on the primary when we retry or when we recover from translog), we remember something that is the same on all duplicates, for instance the sequence_id or maybe the timestamp. Inside the engine we then de-optimize (use updateDocument) whenever we see a document with deoptimizeProperty <= minSeenDeoptimizeProperty. This might hit other, potentially unrelated documents, which is OK, but it would guarantee that we always de-optimize on the one that loses the race. This sounds simpler to me than adding stuff to the network layer, etc.


bleskes commented Aug 29, 2016

@s1monw that sounds very promising. I would love it if we can come up with something that is limited to the engine working together with the replication code, and leaves the networking code/task management out of it.

Let me try and reword what you say, to make sure we are on the same page.

On the engine level, we already lock based on doc id, so we don't have to worry about concurrency at that level (as this doesn't seem to be an indexing bottleneck, we can keep this lock around, at least for now). With this in place, we need to make sure that an "optimized" request is never executed (either at all, or in an unsafe manner) after a "retry" request has completed - i.e., we need some barrier which the "retry" request leaves behind that blocks the "optimized" request. A naive implementation would be a map of doc ids for which "optimized" requests should be handled carefully. This is obviously problematic, as there is no clear way to indicate when the map should be cleaned. Instead, the idea is to do it as follows:

  1. Mark every request with a timestamp. This is done once, on the first node that receives the request, and is fixed for that request. This can even be the machine's local time (see why later). The important part is that retry requests will have the same value as the original one.
  2. In the engine we keep the highest seen timestamp of "retry" requests. This is updated while the retry request holds its doc id lock. Call this highestRetriedTimestamp.
  3. When the engine runs an "optimized" request, it compares the request's timestamp with the current highestRetriedTimestamp (but doesn't update it). If the request's timestamp is higher, it is safe to execute as optimized (no retry request with the same timestamp has run before). If not, we fall back to "non-optimized" mode and run the request as a retry.

The first and last steps guarantee that we will never run an "optimized" request after a "retry" request on the same doc (because they will have the same timestamp).

The roughly monotonic nature of the timestamp (which can be different on different nodes, and can go back in time) limits the potential secondary damage where a retry request causes other unrelated "optimized" requests to run as retries.
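The three steps above can be sketched as follows (a toy model with illustrative names, not the actual engine code; the per-doc-id lock is assumed to be held by the caller):

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of the timestamp barrier: retries record the highest
// timestamp they have seen; an original request may only take the optimized
// addDocument path if its timestamp is strictly higher than that barrier.
class AppendOnlyBarrier {
    // Highest auto-generated-id timestamp seen on any "retry" request.
    private final AtomicLong highestRetriedTimestamp = new AtomicLong(-1);

    // Returns true if the request may use the optimized addDocument path,
    // false if it must fall back to updateDocument (the de-optimized path).
    boolean mayUseAddDocument(long autoIdTimestamp, boolean isRetry) {
        if (isRetry) {
            // Remember the highest retry timestamp; any original request
            // with a timestamp <= this value might race with a retry.
            highestRetriedTimestamp.accumulateAndGet(autoIdTimestamp, Math::max);
            return false;
        }
        // Safe to optimize only if no retry with the same (or a higher)
        // timestamp has been processed before us.
        return autoIdTimestamp > highestRetriedTimestamp.get();
    }
}
```

A request whose timestamp ties the barrier is de-optimized, which is exactly the "original processed after its retry" case; unrelated requests with lower timestamps pay the same penalty, which is harmless.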

@s1monw are we on the same page? If so 🏆 🎉 :)


s1monw commented Aug 29, 2016

🎉 ++ that's it :)

s1monw added a commit to s1monw/elasticsearch that referenced this issue Aug 29, 2016
If elasticsearch controls the ID values as well as the document
versions, we can optimize the code that adds / appends documents
to the index. Essentially we can skip the version lookup for all
documents unless the same document is delivered more than once.

On the lucene level we can simply call IndexWriter#addDocument instead
of #updateDocument, but on the Engine level we need to ensure that we de-optimize
the case once we see the same document more than once.

This is done as follows:

1. Mark every request with a timestamp. This is done once on the first node that
receives a request and is fixed for this request. This can be even the
machine local time (see why later). The important part is that retry
requests will have the same value as the original one.

2. In the engine we keep the highest seen timestamp of "retry" requests.
This is updated while the retry request holds its doc id lock. Call this `highestDeOptimzeAddDocumentTimestamp`.

3. When the engine runs an "optimized" request, it compares the request's timestamp with the
current `highestDeOptimzeAddDocumentTimestamp` (but doesn't update it). If the request's
timestamp is higher, it is safe to execute as optimized (no retry request with the same
timestamp has run before). If not, we fall back to "non-optimized" mode, run the request as a retry,
and update the `highestDeOptimzeAddDocumentTimestamp` unless it has already been updated to a higher value.

Closes elastic#19813
s1monw added a commit that referenced this issue Sep 1, 2016
If elasticsearch controls the ID values as well as the document
versions, we can optimize the code that adds / appends documents
to the index. Essentially we can skip the version lookup for all
documents unless the same document is delivered more than once.

On the lucene level we can simply call IndexWriter#addDocument instead
of #updateDocument, but on the Engine level we need to ensure that we de-optimize
the case once we see the same document more than once.

This is done as follows:

1. Mark every request with a timestamp. This is done once on the first node that
receives a request and is fixed for this request. This can be even the
machine local time (see why later). The important part is that retry
requests will have the same value as the original one.

2. In the engine we keep the highest seen timestamp of "retry" requests.
This is updated while the retry request holds its doc id lock. Call this `maxUnsafeAutoIdTimestamp`.

3. When the engine runs an "optimized" request, it compares the request's timestamp with the
current `maxUnsafeAutoIdTimestamp` (but doesn't update it). If the request's
timestamp is higher, it is safe to execute as optimized (no retry request with the same
timestamp has run before). If not, we fall back to "non-optimized" mode, run the request as a retry,
and update the `maxUnsafeAutoIdTimestamp` unless it has already been updated to a higher value.

Relates to #19813
s1monw added a commit to s1monw/elasticsearch that referenced this issue Sep 1, 2016
Today we force a refresh on get calls if the document has changes that are
not yet refreshed. This might have implications, and users might want to disable
it to ensure cluster stability. If we don't need to serve realtime
requests at all, we can also optimize indexing to not populate the version map,
and use more memory for the index writer buffers to speed up indexing.

Relates to elastic#19813
@clintongormley

@bleskes can this issue be closed now or is there more to do?


s1monw commented Sep 17, 2016

I think we should update the description; we haven't done the live version map stuff yet.


bleskes commented Sep 18, 2016

@s1monw ++. I have updated the issue description.


s1monw commented Sep 19, 2016

thx @bleskes

s1monw closed this as completed in d941c64 Dec 15, 2017
s1monw added a commit that referenced this issue Dec 15, 2017
Today we still maintain a version map even if we only index append-only
documents, in other words documents with auto-generated IDs. We can instead maintain
an unsafe version map that is swapped to a safe version map only if necessary,
once we see the first document that requires access to the version map. For instance:
 * an auto-generated-id retry
 * any kind of delete
 * a document with a foreign ID (non-auto-generated)

In these cases we forcefully refresh the internal reader and start maintaining
a version map until such a safe map hasn't been necessary for two refresh cycles.
Indices / shards that never see an auto-generated-ID document will always maintain a version
map, and in the case of a delete / retry in a pure append-only index the version map will be
de-optimized for a short amount of time until we know it's safe again to swap back. This
will also minimize the required refreshes.

Closes #19813
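The swap-back logic described in the commit message above can be illustrated with a toy model (hypothetical names; the real implementation ties this state to the engine's refresh cycle and version map internals):

```java
// Hypothetical sketch: once a safe version map is required (delete, retry,
// or foreign-id doc), stay in safe mode until the safe map has not been
// needed for two consecutive refresh cycles, then swap back to the cheap
// unsafe map.
class VersionMapSwapper {
    private boolean safeMode = false;
    private int cleanCycles = 0;

    // Called when a delete, auto-id retry, or foreign-id doc is seen.
    void requireSafe() {
        safeMode = true;
        cleanCycles = 0;
    }

    // Called on each refresh cycle.
    void onRefresh(boolean safeAccessNeededSinceLastRefresh) {
        if (!safeMode) {
            return;
        }
        if (safeAccessNeededSinceLastRefresh) {
            cleanCycles = 0;            // still needed; stay safe
        } else if (++cleanCycles >= 2) {
            safeMode = false;           // idle for two cycles: swap back
        }
    }

    boolean isSafeMode() { return safeMode; }
}
```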