Allow to seal an index #10032
In a lot of use cases indices are used for indexing only for a limited amount of time. E.g., in the daily index use case, indices are created with a relatively high number of shards to scale out indexing, and after a couple of days these indices are idle in terms of writing. Yet we still keep all the resources open since we accept writes at any time. This is not necessary in a lot of cases and would allow for a large number of optimizations:
@kimchy I think we can. To me it's a two-stage process: first we switch the index to read-only using the flag we have, and once the cluster state has been published we seal the index, which causes the actual optimizations to happen. So I think we can just reuse it?
Adding this is also going to make full-cluster restarts as well as rolling restarts likely instant. Even for the non-timeseries data / logging case, for full restarts we can seal all the indices, shut down, restart & unseal. This also works for rolling restarts if folks can afford having read-only indices, which I think is reasonable in most cases since the restart will be pretty fast. Very promising!
This is great! Our use case wouldn't allow us to seal an index outside of a rolling restart window or some other temporary maintenance action but we can absolutely get away with sealing them all for an hour or so.
So this is a great solution for us!
Yes, it's absolutely possible to unseal, and the operation should be very fast, i.e. essentially a cluster state update.
We had some internal discussions about how to implement this and I wanted to make sure they are recorded here on the issue. Sealing an index basically happens on two levels: the index level and the shard level.
Index Level sealing
On the index level we use a write block together with a seal ID recorded in the index metadata.
This also requires the entire cluster to be on a version that supports and understands index sealing, otherwise this feature will not be available (we have the ability to check this).
The seal process is essentially a cluster state update (setting the block and the seal ID) that waits for all shards to respond. This is very similar to how deleting an index works today: we issue the cluster state update, which subsequently gets propagated to all the nodes in the cluster.
Once the seal operation is issued we set the block and record the seal ID.
This is also very similar to the delete logic as it is currently implemented.
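The index-level flow above can be sketched roughly like this. This is a toy model; `ClusterState`, `IndexMetadata`, and `seal_index` are made-up names for illustration, not Elasticsearch's actual classes:

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class IndexMetadata:
    name: str
    write_blocked: bool = False
    seal_id: Optional[str] = None


@dataclass
class ClusterState:
    version: int = 0
    indices: Dict[str, IndexMetadata] = field(default_factory=dict)


def seal_index(state: ClusterState, index: str) -> str:
    """One cluster state update: set the write block and a fresh seal ID,
    then publish the new state version to all nodes."""
    meta = state.indices[index]
    meta.write_blocked = True          # no new writes once the state is published
    meta.seal_id = uuid.uuid4().hex    # identifies this sealed generation
    state.version += 1                 # publication to all nodes
    return meta.seal_id


def seal_complete(shard_acks: list) -> bool:
    """Like index deletion, the seal only completes once every shard responded."""
    return len(shard_acks) > 0 and all(shard_acks)
```

The key point mirrored from the description: sealing is a single acknowledged cluster state change, not a per-shard RPC fan-out that the caller manages by hand.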
Shard Level sealing
For shard level sealing we are currently planning to use ref-counting of in-flight operations.
The good news is that due to the cluster block (read-only setting) no new indexing operations can be issued, such that we will reach 0 eventually. Certainly this requires reference counting.
At that point the index is sealed and no write operation can be submitted to the index anymore.
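A minimal sketch of that ref-counting idea, with assumed names (the real implementation lives in Elasticsearch's engine code): each write takes a reference, the seal blocks new writes, and then waits for the count to drain to zero.

```python
import threading


class ShardOps:
    """Toy model of in-flight operation tracking on a single shard."""

    def __init__(self):
        self._count = 0
        self._blocked = False
        self._lock = threading.Lock()
        self._drained = threading.Condition(self._lock)

    def try_acquire(self) -> bool:
        """Called at the start of each indexing operation."""
        with self._lock:
            if self._blocked:          # the cluster block rejects new writes
                return False
            self._count += 1
            return True

    def release(self):
        """Called when an in-flight operation completes."""
        with self._lock:
            self._count -= 1
            if self._count == 0:
                self._drained.notify_all()

    def seal(self):
        """Block new operations, then wait for in-flight ones to finish."""
        with self._lock:
            self._blocked = True
            while self._count != 0:
                self._drained.wait()
```

Because the block is set before waiting, the count can only go down, so `seal()` terminates once the last in-flight write releases its reference.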
The unseal operation pretty much reverses the sealing. We process a cluster state update that removes the block and marks the index as writable again.
On a shard level we basically reverse the process and accept writes again.
Today recovery is very resource-heavy and often super slow since we don't know if two shards are identical on a document level, i.e. did all operations reach the replica or not. We can tell on a Lucene segment level, but the segments are different on all replicas unless we copy them over, which takes a huge amount of time. With index sealing we basically mark the replicas and the primary with the same seal ID, so we can tell that they are identical on the document level.
Luckily, implementing fast recovery on top of the sealing is very straightforward. Basically what we need is an extension of the recovery logic that compares the seal IDs of source and target and skips copying files over when they match.
For safety reasons, if any operations exist in the transaction log we can't utilize the seal ID for fast recovery. Any operation in the translog indicates an illegal state in the context of the seal ID, or in other words, it breaks the seal. For instance, if an old replica is started on a node that was sealed before but the primary is already accepting writes again, we could in theory recover from the transaction log alone, but for the initial iteration we should skip this optimization. In the future we might even be able to extend this process to issue seal commits on a per-shard level while accepting writes.
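Putting the two safety checks above together, the recovery-time decision might look like this (a sketch with an assumed function name, not the actual implementation):

```python
from typing import Optional


def can_skip_file_copy(primary_seal_id: Optional[str],
                       replica_seal_id: Optional[str],
                       translog_ops: int) -> bool:
    """Phase 1 of recovery (copying Lucene files) may only be skipped when
    both copies carry the same seal ID and the transaction log is empty;
    any translog operation breaks the seal."""
    if translog_ops != 0:          # pending operations invalidate the seal
        return False
    if primary_seal_id is None or replica_seal_id is None:
        return False               # at least one side was never sealed
    return primary_seal_id == replica_seal_id
```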
Optimizing / Force Merge on a Sealed index
For the time-based indices use case it's important to run a force merge (optimize) to reduce the number of segments once an index is sealed.
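One consequence worth noting (my reading of the heading above, not stated explicitly in the comment): a force merge rewrites segments, so the commit it produces no longer matches the old seal ID, and the index has to be re-sealed afterwards. A toy sketch under that assumption:

```python
import uuid


def force_merge_and_reseal(commit: dict) -> dict:
    """Hypothetical: merging rewrites the segment files, so the old seal ID
    no longer describes what is on disk; stamp a fresh one after the merge.
    The commit is modeled as a plain dict for illustration only."""
    merged = dict(commit)                  # don't mutate the original commit
    merged["segments"] = 1                 # merged down to a single segment
    merged["seal_id"] = uuid.uuid4().hex   # re-seal with a new marker
    return merged
```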
Proposed work items
I hope I covered all the moving parts, at least on a high level. If there are any questions feel free to ask. Once we basically agree I will move this to the issue itself.
Discussing this, we came up with a new and simpler plan which works independently of the cluster state update. The gist of it is to have a best-effort operation to sync the commit points of both primaries and replicas. This "synced flush" is guaranteed to succeed if there are no concurrent indexing operations but will fail gracefully if there are. The result is a marker (sync ID) on the Lucene commit points which allows us to shortcut phase 1 of recoveries, which will give us the desired speed-up. Since this is a best-effort approach we can trigger it whenever a shard becomes inactive, at regular longish intervals (say 30m), or at any other time (TBD).
Solution sketch (this is a shard operation):
[x] -> in feature branch https://github.com/elastic/elasticsearch/tree/feature/synced_flush
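As a rough illustration of the synced-flush idea (all names here are assumed; see the feature branch above for the real code): the operation succeeds only when nothing is in flight, and it stamps the same sync ID on every copy's commit point so a later recovery can skip phase 1.

```python
import uuid
from typing import List, Optional


def synced_flush(shard_copies: List[dict], in_flight_ops: int) -> Optional[str]:
    """Best effort: fail gracefully under concurrent indexing, otherwise
    write a shared sync ID marker into every copy's commit point
    (each commit point is modeled as a plain dict here)."""
    if in_flight_ops != 0:
        return None                    # concurrent writes: give up, retry later
    sync_id = uuid.uuid4().hex
    for commit in shard_copies:
        commit["sync_id"] = sync_id    # mark every copy's Lucene commit point
    return sync_id


def can_skip_phase1(source: dict, target: dict) -> bool:
    """Recovery shortcut: identical sync IDs mean identical documents."""
    sid = source.get("sync_id")
    return sid is not None and sid == target.get("sync_id")
```

Because a failed attempt changes nothing, the operation is safe to re-trigger opportunistically, e.g. when a shard goes inactive.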