
Flush IndexWriter to disk on close and shutdown #7563

Merged
merged 1 commit into from Feb 18, 2015

Conversation
5 participants
s1monw (Contributor) commented Sep 3, 2014

Today we trash everything that has been indexed but not flushed to disk
if the engine is closed. This might not be desired if we are shutting down a
node for restart / upgrade, or if we close / archive an index. In such a
case we would like to flush the transaction log and commit everything to
disk. This commit adds a flag to the close method that is set on close
and shutdown, but not when we remove the shard due to relocations.
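The flag-controlled close described above can be sketched as follows. All names here (`EngineSketch`, `close(boolean)`, `isFlushed()`) are invented for this illustration and are not the actual Elasticsearch Engine API:

```java
// Minimal sketch, assuming a flag-controlled close: flush before closing on
// shutdown / index close, skip the flush when the shard is merely relocated.
// All names are hypothetical, not the real Engine API.
public class EngineSketch {
    private boolean closed = false;
    private boolean flushed = false;

    // flushBeforeClose is true on node shutdown or index close,
    // false when the shard is removed due to relocation.
    public synchronized void close(boolean flushBeforeClose) {
        if (closed) {
            return; // closing twice is a no-op
        }
        if (flushBeforeClose) {
            flush();
        }
        closed = true;
    }

    private void flush() {
        // stand-in for committing the transaction log and segments to disk
        flushed = true;
    }

    public boolean isFlushed() {
        return flushed;
    }

    public boolean isClosed() {
        return closed;
    }
}
```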

@s1monw s1monw added v2.0.0 labels Sep 3, 2014

kimchy (Member) commented Sep 3, 2014

LGTM

As an aside, we might want to experiment with moving the DiscoveryService to stop before InternalIndicesService in the stop ordering we have (and probably the cluster state service as well). If closing the indices service does not take too long, it might make sense to leave the cluster first.
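The stop-ordering idea above can be illustrated with a toy sequence; `Service` and `stopAll` are invented for this sketch and do not correspond to the actual node lifecycle code:

```java
import java.util.List;

// Toy illustration of service stop ordering: services stop in list order, so
// putting discovery first means the node leaves the cluster before its
// indices service is closed. Names are hypothetical.
public class StopOrderSketch {
    public interface Service {
        void stop();
    }

    public static void stopAll(List<Service> stopOrder) {
        for (Service service : stopOrder) {
            service.stop(); // each service stops before the ones listed after it
        }
    }
}
```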

s1monw (Contributor, Author) commented Sep 8, 2014

I am pushing this out to 1.5 - I need to do some more testing

mikemccand (Contributor) commented Oct 1, 2014

I think it's important to get this in soon, otherwise the "always check index on close" is quite weakened since for many tests this will mean there is no Lucene index actually created...

s1monw (Contributor, Author) commented Oct 1, 2014

This patch still has a lot of problems - lots of tests fail, etc. I need to get back to it though.

@s1monw s1monw force-pushed the s1monw:flush_on_close branch Oct 22, 2014

@clintongormley clintongormley added :Engine and removed review labels Nov 11, 2014

@s1monw s1monw force-pushed the s1monw:flush_on_close branch Feb 17, 2015

s1monw (Contributor, Author) commented Feb 17, 2015

@kimchy @mikemccand I rebased this on current master and ran the tests. All tests pass now and nothing gets stuck anymore. This is likely due to some changes we made to the lock ordering. I think it's pretty much ready; I wonder if we should add a backdoor on a per-index basis to control whether flush should be called or not?

mikemccand (Contributor) commented Feb 17, 2015

LGTM

kimchy (Member) commented Feb 17, 2015

LGTM, +1 for a setting to disable this just in case

@s1monw s1monw force-pushed the s1monw:flush_on_close branch Feb 17, 2015

s1monw (Contributor, Author) commented Feb 17, 2015

@kimchy pushed a new commit

kimchy reviewed changes in src/main/java/org/elasticsearch/index/engine/Engine.java (Outdated)
logger.trace("flushAndClose now acquired writeLock");
try {
    logger.debug("flushing shard on close - this might take some time to sync files to disk");
    flush();

kimchy (Member) commented Feb 17, 2015
Should we think about the fact that flush might not be allowed since a recovery is happening (one that will eventually get cancelled since we are closing the engine)?

bleskes reviewed changes in src/main/java/org/elasticsearch/index/IndexService.java (Outdated)
@@ -388,7 +388,7 @@ private void closeShardInjector(String reason, ShardId sId, Injector shardInject
     // and close the shard so no operations are allowed to it
     if (indexShard != null) {
         try {
-            indexShard.close(reason);
+            indexShard.close(reason, deleted.get());

bleskes (Member) commented Feb 17, 2015

The deleted flag is only set when an index is deleted. The PR description says we don't want to flush upon relocation. I'm a bit on the fence w.r.t. flushing on relocation. On one hand it would be good to leave things consistent. On the other hand it may delay things needlessly (this runs on the cluster state thread). In any case, we should update the description of the PR if we keep it this way.

s1monw (Contributor, Author) commented Feb 17, 2015

@kimchy I catch the exception now...

s1monw (Contributor, Author) commented Feb 17, 2015

@bleskes can you take another look? I fixed the condition for when we actually flush.

@@ -388,7 +388,8 @@ private void closeShardInjector(String reason, ShardId sId, Injector shardInject
     // and close the shard so no operations are allowed to it
     if (indexShard != null) {
         try {
-            indexShard.close(reason);
+            final boolean flushEngine = deleted.get() == false && closed.get(); // only flush if we are closed (closed index or shutdown) and not deleted

bleskes (Member) commented Feb 17, 2015

+1 on the closed indices use case. Good catch. I'm not sure it has an effect now, though. I did some sniffing, and we close the shards in too many places upon index close: first in IndicesClusterStateService.applyDeletedShards, then in another iteration at the beginning of IndicesClusterStateService.applyCleanedIndices:

for (IndexService indexService : indicesService) {
    String index = indexService.index().getName();
    IndexMetaData indexMetaData = event.state().metaData().index(index);
    if (indexMetaData != null && indexMetaData.state() == IndexMetaData.State.CLOSE) {
        for (Integer shardId : indexService.shardIds()) {
            logger.debug("[{}][{}] removing shard (index is closed)", index, shardId);
            try {
                indexService.removeShard(shardId, "removing shard (index is closed)");
            } catch (Throwable e) {
                logger.warn("[{}] failed to remove shard (index is closed)", e, index);
            }
        }
    }
}

and only then do we call indicesService.removeIndex(index, reason); which closes the index (but it has no shards anymore...)

bleskes (Member) commented Feb 17, 2015

@s1monw good catch on the closed indices. Left a comment about it.

s1monw (Contributor, Author) commented Feb 17, 2015

@bleskes the biggest effect is when we shut down the nodes, so I guess that's just fine then?

logger.debug("flushing shard on close - this might take some time to sync files to disk");
try {
    flush(); // TODO we might force a flush in the future since we have the write lock already even though recoveries are running.
} catch (FlushNotAllowedEngineException ex) {

bleskes (Member) commented Feb 17, 2015

Do we also need to catch EngineClosedException, in case someone calls closeNoLock in the meantime? In that case this method should be a no-op, right?

s1monw (Author, Contributor) commented Feb 17, 2015

I think it's important to bubble this up here; we don't guarantee that this won't throw.

s1monw (Author, Contributor) commented Feb 17, 2015

I added a catch clause...
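The exception handling discussed in this review thread might look roughly like the sketch below. The exception names mirror the snippets above, but the class itself and its fields are invented for illustration; this is not the actual implementation:

```java
// Rough sketch of flushAndClose() with the catch clauses discussed above:
// a flush refused during recovery, or an engine closed concurrently, must
// not prevent the close itself. Hypothetical stand-in, not the real code.
public class FlushAndCloseSketch {
    static class FlushNotAllowedEngineException extends RuntimeException {}
    static class EngineClosedException extends RuntimeException {}

    private boolean engineClosed = false;
    private boolean recovering = true; // pretend a recovery is in flight

    public void flushAndClose() {
        try {
            flush();
        } catch (FlushNotAllowedEngineException ex) {
            // a recovery is running; it will be cancelled when we close
        } catch (EngineClosedException ex) {
            // someone closed the engine concurrently; closing again is a no-op
        } finally {
            close();
        }
    }

    private void flush() {
        if (engineClosed) {
            throw new EngineClosedException();
        }
        if (recovering) {
            throw new FlushNotAllowedEngineException();
        }
    }

    private void close() {
        engineClosed = true;
    }

    public boolean isClosed() {
        return engineClosed;
    }
}
```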

bleskes (Member) commented Feb 17, 2015

@s1monw I think closing and opening an index is just as good a use case - you want to avoid translog replays. We can clean up the delete logic for that case in another PR - just make sure it's not lost...

s1monw (Contributor, Author) commented Feb 17, 2015

@bleskes I added a catch clause for EngineClosed... I think it's ready

bleskes (Member) commented Feb 17, 2015

@s1monw +1. Thx. I'll pick up the close indices change I mentioned. I think it's important.

s1monw (Contributor, Author) commented Feb 18, 2015

I added some more randomization in the mock engine that sometimes calls flushAndClose() in close() and vice versa... all tests pass, I will push it soon.
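The randomization described above might look roughly like this; `MockEngineSketch` and its fields are a made-up stand-in for the test mock engine, not the real one:

```java
import java.util.Random;

// Sketch of test-time randomization that sometimes substitutes flushAndClose()
// for close() and vice versa; either path must leave the engine closed.
// Entirely hypothetical names, not the actual mock engine.
public class MockEngineSketch {
    private final Random random;
    boolean closed = false;
    boolean flushedOnClose = false;

    public MockEngineSketch(long seed) {
        this.random = new Random(seed); // seeded so failures are reproducible
    }

    public void close() {
        if (random.nextBoolean()) {
            flushAndClose(); // randomly upgrade a plain close
        } else {
            doClose();
        }
    }

    public void flushAndClose() {
        if (random.nextBoolean()) {
            flushedOnClose = true; // flush before closing
        }
        doClose(); // randomly skipping the flush must still close the engine
    }

    private void doClose() {
        closed = true;
    }
}
```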

s1monw (Contributor, Author) commented Feb 18, 2015

I ran into one test issue where the test relied on replaying the translog. I think this is a general problem, since with dynamic mappings it's not guaranteed that the mapping is acked by the master. We might run into several other problems along these lines. @bleskes WDYT, should we fix the mapping ack first?

[ENGINE] Flush IndexWriter to disk on close and shutdown
Today we trash everything that has been indexed but not flushed to disk
if the engine is closed. This might not be desired if we are shutting down a
node for restart / upgrade, or if we close / archive an index. In such a
case we would like to flush the transaction log and commit everything to
disk. This commit adds a flag to the close method that is set on close
and shutdown, but not when we remove the shard due to relocations.

@s1monw s1monw force-pushed the s1monw:flush_on_close branch to ddd16de Feb 18, 2015

@s1monw s1monw merged commit ddd16de into elastic:master Feb 18, 2015

1 check passed: CLA (Commit author is a member of Elasticsearch)
bleskes (Member) commented Feb 23, 2015

@s1monw sorry for the late response. Yeah, it's a general problem, and one that will be amplified by this. Since we plan to do #8688 this will not be an issue anymore. We could add wait-for-outgoing-mapping-update logic, like what we do during recovery. I doubt the urgency of it, though, given the plans for #8688 and the fact that if there is another active replica it will force the local closed copy to resync and be consistent (note that this is not the case for closed indices, where we shut down all the shard copies).

s1monw (Contributor, Author) commented Feb 23, 2015

@bleskes let's just fix #8688 asap

@clintongormley clintongormley changed the title [ENGINE] Flush IndexWriter to disk on close and shutdown Flush IndexWriter to disk on close and shutdown Jun 7, 2015
