
Make sure IndexShard is active during recovery so it gets its fair share of the indexing buffer #16209

Closed · wants to merge 8 commits
mikemccand commented Jan 25, 2016

This PR is based on 2.2. It's a start for #16206, but I still need to make a test case.

I fixed IndexShard.recovering to forcefully activate the shard (and have IMC re-budget indexing buffers), and also fixed IndexShard.checkIdle to always return false (still active) if the shard is still recovering.
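
For illustration, here is a minimal standalone sketch of the behavior described above (class, field, and method names are invented approximations, not the actual IndexShard code):

import java.util.concurrent.atomic.AtomicBoolean;

// Sketch only: a shard that is forced active when recovery starts and that refuses to go idle while recovering.
class ShardActivitySketch {
    private final AtomicBoolean active = new AtomicBoolean(false);
    private volatile boolean recovering = false;
    private volatile long lastWriteNS = System.nanoTime();

    // Recovery starts: force the shard active so the memory controller budgets a real indexing buffer.
    void startRecovery() {
        recovering = true;
        markLastWrite();
    }

    // A write (or recovery op) happened: record the time and, if we were inactive, trigger a re-budget.
    void markLastWrite() {
        lastWriteNS = System.nanoTime();
        if (active.getAndSet(true) == false) {
            forceCheckIndexingBuffers();   // stand-in for asking IMC to re-budget indexing buffers
        }
    }

    // Idle check: a recovering shard must never be demoted to the tiny inactive buffer.
    boolean checkIdle(long inactiveTimeNS) {
        if (recovering) {
            return false;
        }
        if (System.nanoTime() - lastWriteNS > inactiveTimeNS) {
            return active.getAndSet(false);   // true only if we actually flipped active -> inactive
        }
        return false;
    }

    private void forceCheckIndexingBuffers() { /* no-op in this sketch */ }
}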

mikemccand commented Jan 25, 2016

@bleskes I pushed changes based on your feedback (pushing activation "lower", to when recovery actually begins ... and I changed markLastWrite to use System.nanoTime instead of the operation's start time). Can you look? Thanks!

if (active.getAndSet(true) == false) {
// We are currently inactive, but a new write operation just showed up, so we now notify IMC
// to wake up and fix our indexing buffer. We could do this async instead, but cost should
// be low, and it's rare this happens.
indexingMemoryController.forceCheck();
assert engineConfig.getIndexingBufferSize() != IndexingMemoryController.INACTIVE_SHARD_INDEXING_BUFFER;

bleskes commented Jan 25, 2016

LGTM. I wonder if we can add an easy unit test to make sure this doesn't regress. Maybe check that a shard is active post recovery from store in testRecoverFromStore?

Also - although not needed in master, I wonder what parts of this should go there as well? (i.e., mark the shard as active upon recovery from a primary)
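
(For illustration, the shape of the suggested regression check, written as a self-contained toy against a stub rather than the real IndexShard / testRecoverFromStore APIs:)

import java.util.concurrent.atomic.AtomicBoolean;

// Toy version of the suggested check: after recovery from store, the shard must still be marked active.
public class ShardActiveAfterRecoveryCheck {
    static class ShardStub {
        final AtomicBoolean active = new AtomicBoolean(false);
        void recoverFromStore() { active.set(true); }   // recovery is expected to mark the shard active
        boolean isActive() { return active.get(); }
    }

    public static void main(String[] args) {
        ShardStub shard = new ShardStub();
        shard.recoverFromStore();
        if (!shard.isActive()) {
            throw new AssertionError("shard should be active after recovery from store");
        }
        System.out.println("ok: shard is active after recovery from store");
    }
}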


mikemccand commented Jan 25, 2016

@bleskes I added a couple unit tests ^^

They fail before the change and pass with the change.

I'll think about master ...


bleskes commented Jan 25, 2016

Awesome. Thanks mike


mikemccand commented Jan 25, 2016

Thanks @bleskes, I'll run all tests and push to 2.2 and 2.1.x.

@dadoonet can you maybe test this with your script and see if recovery is as fast as the original indexing again? Thanks!


mikemccand commented Jan 25, 2016

Well, some tests are angry ... digging.


mikemccand commented Jan 25, 2016

So the REST test (Rest6IT) is angry because, immediately after the local recovery, which just does activate() but does not set lastWriteNS, the shard goes inactive and a sync flush is done, giving the shard a sync_id when the test doesn't want one (since it didn't flush).

If I change the activate() to markLastWrite() in internalPerformTranslogRecovery the test is happy again (no sync flush happens, since the shard never goes to inactive again during the test). Or I can fix the test (10_basic.yaml) to accept a sync_id....

Other tests are tripping on the new assert I added, which is spooky, but I fear it could be a concurrency issue where one thread is activating, and then another deactivates, before the first thread gets to its assert ... these failures don't repro ... still digging.


mikemccand commented Jan 25, 2016

If I change the activate() to markLastWrite() in internalPerformTranslogRecovery

I'm going to push this ... it simplifies things again because activate() can be folded into markLastWrite() again ... I think the "purity" of separating "I'm writing via my local translog" vs "I'm writing via a peer's xlog" is not worth the strange "active -> inactive -> active" that would happen as a result.

But I'm still struggling with:

Other tests are tripping on the new assert I added,

Many tests seem to trip on this, where a shard becomes active, asks IMC to set its buffer, but after that it still has an inactive buffer ... but only intermittently ... I'm still not sure why. I'll dig some more, but this is possibly a pre-existing issue (I'll confirm) and we can perhaps decouple from this blocker.


bleskes commented Jan 25, 2016

+1 to just use markLastWrite. Makes sense to me.

dadoonet commented Jan 26, 2016

@mikemccand here is what I did:

  • checked out 2.1
  • cherry-picked your commits onto 2.1
  • built the distribution
  • ran my scenario

I can confirm that everything works perfectly now! Thanks for fixing this Mike

always use markLastWrite when applying translog internally or from peer; only assert indexing buffer size if we are the thread that activated (inside markLastWrite)

mikemccand commented Jan 26, 2016

Thank you for confirming @dadoonet!

I pushed a fix to restrict the new assertion to only the one thread that actually did the activation (i.e., invoked IMC's forceCheck), and mvn verify is passing (well, at least once), but it is hard to explain the test failures I was seeing, because local recovery should be single threaded...

The assert does NOT appear to trip if I add it to a clean 2.2 clone, meaning it was only ever tripping when activating during translog replay.

If mvn verify passes again I'm going to push!
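
(The pattern described here, sketched in isolation with invented names rather than the actual IndexShard code:)

import java.util.concurrent.atomic.AtomicBoolean;

// Sketch: only the thread that wins the activation CAS triggers the re-budget and asserts on its effect.
class ActivationAssertSketch {
    private static final long INACTIVE_BUFFER_BYTES = 512 * 1024;
    private final AtomicBoolean active = new AtomicBoolean(false);
    private volatile long indexingBufferBytes = INACTIVE_BUFFER_BYTES;

    void markLastWrite() {
        if (active.getAndSet(true) == false) {
            // We flipped false -> true, so we are the thread that performed the activation:
            // only this thread forces the re-budget and checks its effect.
            forceCheck();
            assert indexingBufferBytes != INACTIVE_BUFFER_BYTES : "expected a re-budgeted indexing buffer after activation";
        }
    }

    private void forceCheck() {
        indexingBufferBytes = 64L * 1024 * 1024;   // stand-in for the memory controller re-budgeting this shard
    }
}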


mikemccand commented Jan 26, 2016

OK I pushed this fix to 2.2 with a disastrously accidental commit message: 9c612d4

Working on 2.1.x backport now ...

mikemccand added a commit that referenced this pull request Jan 26, 2016

Make sure IndexShard is active during recovery so that it gets its fair share of the indexing buffer.

Closes #16209

Closes #16206

mikemccand commented Jan 26, 2016

I pushed the fix to 2.1 and 2.2, but the new assert still intermittently trips, e.g.: https://build-us-01.elastic.co/job/elastic+elasticsearch+2.2+multijob-intake/128/console

I've commented it out for now, but I still can't explain it.

mikemccand commented Jan 26, 2016

The failures look like this:

Caused by: java.lang.AssertionError: active=true state=RECOVERING shard=[test][0]
    at __randomizedtesting.SeedInfo.seed([582017D9CD0B9CD1]:0)
    at org.elasticsearch.index.shard.IndexShard.markLastWrite(IndexShard.java:1005)
    at org.elasticsearch.index.shard.IndexShard.internalPerformTranslogRecovery(IndexShard.java:923)
    at org.elasticsearch.index.shard.IndexShard.performTranslogRecovery(IndexShard.java:897)
    at org.elasticsearch.index.shard.StoreRecoveryService.recoverFromStore(StoreRecoveryService.java:245)
    at org.elasticsearch.index.shard.StoreRecoveryService.access$100(StoreRecoveryService.java:56)
    at org.elasticsearch.index.shard.StoreRecoveryService$1.run(StoreRecoveryService.java:129)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

My only guess at this point is that when IMC asks IndicesService for all indices + shards, this index is somehow not yet visible? I.e., that the StoreRecoveryService is recovering this shard before IndicesService knows about the index ... is that possible?


mikemccand commented Jan 26, 2016

Hmm I think for the IndexWithShadowReplicasIT.testIndexWithFewDocuments failure, the node is being closed while (concurrently) an index shard is trying to recover, and so when IMC iterates all active shards, there are none.


bleskes commented Jan 26, 2016

I think your theory about closing while recovering is correct. An index is recovered on a concurrent thread (so as not to block the cluster service thread). If the shard is removed before it's recovered (which I guess is likely on test cleanup), the shard is first removed from the lookup tables and then shut down. In between those steps the state of the shard is not closed, but the IMC ignores it. Not sure what a simple fix would be - we don't have access to the IndexService from the shard (good!). We could add a "pending delete" flag to the IndexShard marking it as being deleted (while still in the pool), or we could change the assert to wait on close on failure. All of this feels like overkill compared to just removing the assert and relying on the unit tests.
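
(A self-contained sketch of the race described above, with invented names rather than Elasticsearch internals: the shard leaves the lookup table before it is closed, so a controller that iterates the table no longer sees it while the recovery thread may still be activating it:)

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch only: remove-then-close leaves a window where the shard is neither visible nor closed.
class CloseWhileRecoveringSketch {
    enum State { RECOVERING, STARTED, CLOSED }

    static class Shard {
        volatile State state = State.RECOVERING;
    }

    final Map<String, Shard> lookup = new ConcurrentHashMap<>();

    void removeShard(String id) {
        Shard shard = lookup.remove(id);       // step 1: the shard disappears from the controller's view
        if (shard != null) {
            // window: state is still RECOVERING here, not CLOSED, yet a controller pass started now won't see it
            shard.state = State.CLOSED;        // step 2: the actual shutdown happens afterwards
        }
    }

    // Controller pass: only shards still present in the lookup table are considered.
    long activeShardCount() {
        return lookup.values().stream().filter(s -> s.state != State.CLOSED).count();
    }
}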


mikemccand commented Jan 26, 2016

Thanks @bleskes ... my inclination would be to just leave the assert off, as long as we can convince ourselves these cases when it trips are not important in practice ...

I'm chasing this failure now, which is different...: https://build-us-01.elastic.co/job/elastic+elasticsearch+2.2+multijob-os-compatibility/os=elasticsearch&&debian/107/console


mikemccand commented Jan 26, 2016

I'm chasing this failure now, which is different.

Alright, I've explained this failure, and I'm running tests for a simple fix now.

The issue is that when a shard creates its Engine it pulls the current indexing buffer (and other settings) from its EngineConfig, but then possibly a lot of time can elapse (e.g. replaying the local translog) before it finally sets the Engine instance into the shard's AtomicReference currentEngineReference.

During that window, if IMC goes and updates the indexing buffer for this shard, since the current engine reference is still null, that shard will update the EngineConfig but not the engine, and the change is "lost" in the sense that unless IMC further changes the indexing buffer, IndexShard.updateBufferSize's change detection logic will think nothing had actually changed and won't push the change down to the (now initialized) engine.

This is really a pre-existing issue, but my change here exacerbated it by invoking IMC more frequently, especially when N shards are being initialized which happens nearly concurrently.
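
Schematically, the lost update looks like this (a self-contained sketch with invented names, not the real IMC / IndexShard code):

import java.util.concurrent.atomic.AtomicReference;

// Sketch only: an update that lands in the config while the engine reference is still null is never pushed again.
class LostBufferUpdateSketch {
    static class Config { volatile long indexingBufferBytes = 4L << 20; }

    static class Engine {
        volatile long liveBufferBytes;
        Engine(Config config) { liveBufferBytes = config.indexingBufferBytes; }   // pulls settings at creation
        void onSettingsChanged(Config config) { liveBufferBytes = config.indexingBufferBytes; }
    }

    final Config config = new Config();
    final AtomicReference<Engine> currentEngine = new AtomicReference<>();

    // Controller-style update: record the new size, push it only if it actually changed and the engine is visible.
    void updateBufferSize(long newBytes) {
        if (config.indexingBufferBytes == newBytes) {
            return;                                    // change detection: "nothing changed", so nothing is pushed
        }
        config.indexingBufferBytes = newBytes;
        Engine engine = currentEngine.get();
        if (engine != null) {
            engine.onSettingsChanged(config);
        }                                              // engine == null: the update lands in the config only
    }

    void createEngineThenSetReference() {
        Engine engine = new Engine(config);            // reads the buffer size *now*
        // ... a long translog replay can happen here; updateBufferSize() may run in the meantime ...
        currentEngine.set(engine);                     // the engine may now hold a stale size, and a repeated
                                                       // updateBufferSize() with the same value won't push it
    }
}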

My proposed fix for this is to call onSettingsChanged() after we finally set the engine instance:

diff --git a/core/src/main/java/org/elasticsearch/index/shard/IndexShard.java b/core/src/main/java/org/elasticsearch/index/shard/IndexShard.java
index 69f575b..26bc3ed 100644
--- a/core/src/main/java/org/elasticsearch/index/shard/IndexShard.java
+++ b/core/src/main/java/org/elasticsearch/index/shard/IndexShard.java
@@ -1433,6 +1433,12 @@ public class IndexShard extends AbstractIndexShardComponent {
             assert this.currentEngineReference.get() == null;
             this.currentEngineReference.set(newEngine(skipTranslogRecovery, config));
         }
+
+        // time elapses after the engine is created above (pulling the config settings) until we set the engine reference, during which
+        // IMC may have updated our indexing buffer, e.g. if N other shards are being created and applying their translogs, and we would
+        // otherwise not realize the index buffer had changed while we were concurrently "starting up", so here we forcefully push any config
+        // changes to the new engine:
+        engine().onSettingsChanged();
     }

     protected Engine newEngine(boolean skipTranslogRecovery, EngineConfig config) {

This fixes the above (reproducible, yay!) test failure, and mvn verify seems happy at least once as well...


bleskes commented Jan 26, 2016

+1 on this move. Strictly speaking we are still missing the update to the translog buffer, which is a pain in the ... The current infra doesn't allow fixing it directly without doing something ugly, and the IMC will set it right in its next run. I'm good with letting this go and proceeding with your suggestion.

mikemccand added the v2.3.0 label Jan 26, 2016

mikemccand added a commit that referenced this pull request Jan 26, 2016

Make sure IndexShard is active during recovery so that it gets its fair share of the indexing buffer.

Closes #16209

Closes #16206

Conflicts:
	core/src/main/java/org/elasticsearch/index/shard/IndexShard.java

mikemccand commented Jan 26, 2016

Thanks @bleskes, I pushed that change (I added a null check just in case ;) ).

I think this change is done for 2.1, 2.2, 2.x.

I'll open a separate PR for master changes, and resolve this ...
