Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove TranslogService and fold it into synchronous IndexShard API #13707

Merged
merged 1 commit into from Sep 23, 2015

Conversation

s1monw
Copy link
Contributor

@s1monw s1monw commented Sep 22, 2015

This commit moves the size and ops based flush into a synchronous API into
IndexShard and removes the time-based flush alltogether since it' basically
covered by the inactive async flush API we have today. The functionality doesn't
need to be covered by scheduled task and async APIs while we can actually make all
the decisions in a sync manner which is way easier to control and to test.

indexShard.sync(location);
}
if (indexShard.flushPending()) {
threadPool.executor(ThreadPool.Names.FLUSH).execute(() -> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can create a storm of tasks pending to be executed on the thread pool. For example, if a flush is needed, and we are doing 5k indexing requests per second, until the thread get scheduled, and the flush actually executes to "cleanup" the flushPending (can take time), there will be thousands of tasks pending on the thread pool, eventually executing on a thread and potentially not doing anything. If active indexing is happening all the time and flushing is continuously needed, we will have an exponential number of thread switches to execute a flush.

Might it make sense to have an AtomicBoolean like behavior encapsulated in IndexShard? an "executing flush" like semantics?

@bleskes
Copy link
Contributor

bleskes commented Sep 22, 2015

I'm +1 on removing the time component for translog flushing which leaves translog service without any real value. Folding what's left into index shard also makes sense. I'm a bit concerned that flushing is now left to the TransportReplicationAction. It doesn't feel like that's where this logic should be as the replication action doesn't really need to know what the index shard is doing with the write operations. Instead I proposed adding this logic to index shard it self, where all write operations go through as well.

I share Shay's concern about the flush storm that may happen. In the spirit of the new java 8 lambda functionality it feels like we need a debounce utility :)

@s1monw
Copy link
Contributor Author

s1monw commented Sep 22, 2015

Instead I proposed adding this logic to index shard it self, where all write operations go through as well.

I should have added this to the issue this can't go into index shard since it would completely ruin the cleanup and add more crazy state to it sorry we have to find a different solution. I had the same concerns about the thread pool while I was AFK ....I will work on a better solution

@s1monw
Copy link
Contributor Author

s1monw commented Sep 22, 2015

I pushed a new commit that prevent the storms

asyncFlushRunning.compareAndSet(true, false);
}
};
if (shouldFlush()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this double check seems redundant? we check it right before in the first line of the method? will simplify a bit the code, since there is no need for an else?

@kimchy
Copy link
Member

kimchy commented Sep 22, 2015

left a minor comment, LGTM otherwise

@bleskes
Copy link
Contributor

bleskes commented Sep 22, 2015

LGTM2. Agreed with Shay's comment.

This commit moves the size and ops based flush into a synchronous API into
IndexShard and removes the time-based flush alltogether since it' basically
covered by the inactive async flush API we have today. The functionality doesn't
need to be covered by scheduled task and async APIs while we can actually make all
the decisions in a sync manner which is way easier to control and to test.

Closes elastic#13707
@s1monw s1monw merged commit 75e8164 into elastic:master Sep 23, 2015
s1monw added a commit that referenced this pull request Sep 23, 2015
The check prevents a race condition since we can't use real locks here.
Relates to #13707
bleskes added a commit to bleskes/elasticsearch that referenced this pull request Sep 24, 2015
… marked as inactive

The IndexingMemoryController checks periodically if there is any indexing activity on the shard. If no activity is sean for 5m (default) the shard is marked as inactive allowing it's indexing buffer quota to given to other active shards.

Sadly the current check is bad as it checks for 0 translog operation. This makes the inactive wait for a flush to happen - which used to take 30m and since elastic#13707 doesn't happen at all (as we rely on the synced flush triggered by inactivity). This commit fixes the check so it will work with any translog size.
bleskes added a commit that referenced this pull request Sep 24, 2015
… marked as inactive

The IndexingMemoryController checks periodically if there is any indexing activity on the shard. If no activity is sean for 5m (default) the shard is marked as inactive allowing it's indexing buffer quota to given to other active shards.

Sadly the current check is bad as it checks for 0 translog operation. This makes the inactive wait for a flush to happen - which used to take 30m and since #13707 doesn't happen at all (as we rely on the synced flush triggered by inactivity). This commit fixes the check so it will work with any translog size.

Closes #13759
@clintongormley clintongormley added :Distributed/Distributed A catch all label for anything in the Distributed Area. If you aren't sure, use this one. :Distributed/Engine Anything around managing Lucene and the Translog in an open shard. and removed :Translog :Distributed/Distributed A catch all label for anything in the Distributed Area. If you aren't sure, use this one. labels Feb 13, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed/Engine Anything around managing Lucene and the Translog in an open shard. >enhancement v5.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants