Remove TranslogService and fold it into synchronous IndexShard API #13707

s1monw · 2015-09-22T09:17:30Z

This commit moves the size- and ops-based flush into a synchronous API in
IndexShard and removes the time-based flush altogether, since it's basically
covered by the inactive async flush API we have today. The functionality doesn't
need to be covered by a scheduled task and async APIs when we can make all
the decisions synchronously, which is much easier to control and to test.

kimchy · 2015-09-22T09:41:47Z

core/src/main/java/org/elasticsearch/action/support/replication/TransportReplicationAction.java

+            indexShard.sync(location);
+        }
+        if (indexShard.flushPending()) {
+            threadPool.executor(ThreadPool.Names.FLUSH).execute(() -> {


This can create a storm of tasks pending execution on the thread pool. For example, if a flush is needed and we are doing 5k indexing requests per second, then until the thread gets scheduled and the flush actually executes to "clean up" the flushPending state (which can take time), there will be thousands of tasks pending on the thread pool, each eventually executing on a thread and potentially not doing anything. If active indexing is happening all the time and flushing is continuously needed, we will have an exponential number of thread switches to execute a flush.

Might it make sense to have AtomicBoolean-like behavior encapsulated in IndexShard? Something with "executing flush" semantics?
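The suggestion above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the class name `SingleFlightFlush`, its constructor parameters, and the `flushRunning` field are hypothetical, and a plain `Executor` stands in for the FLUSH thread pool. The idea is that only the caller that wins the compare-and-set submits a task, so thousands of concurrent writes result in at most one pending flush.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

/** Hypothetical helper illustrating "executing flush" semantics; not the actual IndexShard code. */
final class SingleFlightFlush {
    private final AtomicBoolean flushRunning = new AtomicBoolean(false);
    private final BooleanSupplier shouldFlush; // assumed: true when size/ops thresholds are exceeded
    private final Runnable flush;              // assumed: performs the actual flush
    private final Executor executor;           // stands in for the FLUSH thread pool

    SingleFlightFlush(BooleanSupplier shouldFlush, Runnable flush, Executor executor) {
        this.shouldFlush = shouldFlush;
        this.flush = flush;
        this.executor = executor;
    }

    /** Called after every write. Returns true only if this call submitted a flush task. */
    boolean maybeFlush() {
        if (!shouldFlush.getAsBoolean()) {
            return false;
        }
        // Only one caller can flip the flag; all others return without queueing anything.
        if (!flushRunning.compareAndSet(false, true)) {
            return false;
        }
        try {
            executor.execute(() -> {
                try {
                    flush.run();
                } finally {
                    // Released on the flush thread, which is why a regular lock does not fit here.
                    flushRunning.set(false);
                }
            });
        } catch (RuntimeException e) {
            // For example a rejected execution: release the flag so later writes can retry.
            flushRunning.set(false);
            throw e;
        }
        return true;
    }
}
```

With this shape, a burst of writes arriving while a flush is queued or running costs one atomic read-and-compare each, rather than one thread pool task each.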

bleskes · 2015-09-22T11:38:25Z

I'm +1 on removing the time component for translog flushing, which leaves the translog service without any real value. Folding what's left into index shard also makes sense. I'm a bit concerned that flushing is now left to the TransportReplicationAction. It doesn't feel like that's where this logic should be, as the replication action doesn't really need to know what the index shard is doing with the write operations. Instead I propose adding this logic to index shard itself, where all write operations go through as well.

I share Shay's concern about the flush storm that may happen. In the spirit of the new Java 8 lambda functionality, it feels like we need a debounce utility :)

s1monw · 2015-09-22T12:16:15Z

Instead I propose adding this logic to index shard itself, where all write operations go through as well.

I should have added this to the issue: this can't go into index shard since it would completely ruin the cleanup and add more crazy state to it. Sorry, we have to find a different solution. I had the same concerns about the thread pool while I was AFK... I will work on a better solution.

s1monw · 2015-09-22T13:06:11Z

I pushed a new commit that prevents the storms.

kimchy · 2015-09-22T13:38:20Z

core/src/main/java/org/elasticsearch/index/shard/IndexShard.java

+                        asyncFlushRunning.compareAndSet(true, false);
+                    }
+                };
+                if (shouldFlush()) {


This double check seems redundant? We check it right before, in the first line of the method. Removing it would simplify the code a bit, since there would be no need for an else.

kimchy · 2015-09-22T13:38:40Z

left a minor comment, LGTM otherwise

bleskes · 2015-09-22T14:03:28Z

LGTM2. Agreed with Shay's comment.

… marked as inactive

The IndexingMemoryController periodically checks whether there is any indexing activity on the shard. If no activity is seen for 5m (default), the shard is marked as inactive, allowing its indexing buffer quota to be given to other active shards. Sadly, the current check is flawed because it checks for 0 translog operations. This makes the inactive marking wait for a flush to happen, which used to take 30m and, since #13707, doesn't happen at all (as we rely on the synced flush triggered by inactivity). This commit fixes the check so it will work with any translog size. Closes #13759
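To illustrate the problem described in that commit message, here is one way an inactivity check could be made independent of translog size. This is a sketch under assumptions, not the fix that was actually committed: `ShardActivityTracker`, `onWrite`, and `isInactive` are hypothetical names, and the clock is injected only to keep the example testable. The check compares the time since the last write against the inactivity threshold and never consults the number of pending translog operations.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

/** Hypothetical tracker; inactivity is derived from write timestamps, not translog contents. */
final class ShardActivityTracker {
    private final long inactiveAfterNanos;
    private final LongSupplier nanoClock;   // assumed injectable clock, e.g. System::nanoTime
    private volatile long lastWriteNanos;

    ShardActivityTracker(long inactiveAfter, TimeUnit unit, LongSupplier nanoClock) {
        this.inactiveAfterNanos = unit.toNanos(inactiveAfter);
        this.nanoClock = nanoClock;
        this.lastWriteNanos = nanoClock.getAsLong();
    }

    /** Record that an indexing operation just happened on the shard. */
    void onWrite() {
        lastWriteNanos = nanoClock.getAsLong();
    }

    /** True once no write has been seen for the configured period, whatever the translog holds. */
    boolean isInactive() {
        return nanoClock.getAsLong() - lastWriteNanos >= inactiveAfterNanos;
    }
}
```

A periodic task could call `isInactive()` for each shard and hand the buffer quota of inactive shards to active ones, without waiting for a flush to empty the translog first.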

s1monw added >enhancement review v5.0.0-alpha1 labels Sep 22, 2015

bleskes self-assigned this Sep 22, 2015

kimchy reviewed Sep 22, 2015

s1monw force-pushed the shard_level_services branch from 8dcdf15 to 20579c6 Compare September 23, 2015 10:30

s1monw force-pushed the shard_level_services branch from 20579c6 to 75e8164 Compare September 23, 2015 10:39

s1monw merged commit 75e8164 into elastic:master Sep 23, 2015

s1monw added a commit that referenced this pull request Sep 23, 2015

Add back presumably redundant shouldFlush() check.

c32b9c3

The check prevents a race condition since we can't use real locks here. Relates to #13707
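One reading of the race mentioned in that commit message: a thread passes the first `shouldFlush()` check, another thread then takes the flag, flushes, and releases it, and the first thread only afterwards wins the compare-and-set and triggers an unneeded flush. The sketch below shows how a second check after acquiring the flag avoids this. It is an illustration only; `RecheckingFlushGuard` and its members are hypothetical names, and the actual method in IndexShard may differ in detail.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

/** Hypothetical guard showing the re-check after the compare-and-set. */
final class RecheckingFlushGuard {
    private final AtomicBoolean asyncFlushRunning = new AtomicBoolean(false);
    private final BooleanSupplier shouldFlush; // assumed: size/ops threshold check
    private final Runnable flush;              // assumed: performs the actual flush
    private final Executor executor;           // stands in for the FLUSH thread pool

    RecheckingFlushGuard(BooleanSupplier shouldFlush, Runnable flush, Executor executor) {
        this.shouldFlush = shouldFlush;
        this.flush = flush;
        this.executor = executor;
    }

    boolean maybeFlush() {
        if (!shouldFlush.getAsBoolean()) {
            return false; // cheap first check, the flag is not touched
        }
        if (!asyncFlushRunning.compareAndSet(false, true)) {
            return false; // another flush is already queued or running
        }
        // Second check: a concurrent flush may have completed between the first
        // check and the compare-and-set, in which case there is nothing left to do.
        if (!shouldFlush.getAsBoolean()) {
            asyncFlushRunning.set(false);
            return false;
        }
        try {
            executor.execute(() -> {
                try {
                    flush.run();
                } finally {
                    asyncFlushRunning.set(false);
                }
            });
        } catch (RuntimeException e) {
            asyncFlushRunning.set(false); // release the flag if the task could not be submitted
            throw e;
        }
        return true;
    }

    boolean isFlushRunning() {
        return asyncFlushRunning.get();
    }
}
```

Without the second check the guard still prevents task storms, but it can occasionally run one extra flush that finds nothing to do.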

clintongormley removed the :Engine label Sep 23, 2015

bleskes mentioned this pull request Sep 24, 2015

Pending operations in the translog prevent shard from being marked as inactive #13759

Closed
