Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

When shard becomes active again, immediately increase its indexing buffer #13918

Merged
merged 9 commits into from
Oct 8, 2015

Conversation

mikemccand
Copy link
Contributor

Spinoff from #13802.

Today, when an index goes from inactive to active, because indexing ops suddenly arrive to an inactive index, we (IndexingMemoryController) take up to 30 seconds to notice this and then increase the indexing buffer from the tiny idle 512KB to "its fair share". This is somewhat dangerous because it could write many, many segments in those 30 seconds...

This PR changes that so the inactive -> active transition instead causes us to immediately re-visit the indexing buffer for all shards.

It also shifts responsibility of tracking active/inactive into IndexShard, and no longer uses the translog ID/ops count to check for changes (I think this is more error prone? E.g. #13802), simplifying it instead to timestamp (System.nanoTime) of the last indexing op.

@mikemccand mikemccand self-assigned this Oct 3, 2015
@@ -975,6 +999,7 @@ public void addFailedEngineListener(Engine.FailedEngineListener failedEngineList
this.failedEngineListener.delegates.add(failedEngineListener);
}

/** Returns true if the indexing buffer size did change */
public void updateBufferSize(ByteSizeValue shardIndexingBufferSize, ByteSizeValue shardTranslogBufferSize) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this doc is correct - the return type is void

Conflicts:
	core/src/main/java/org/elasticsearch/index/shard/IndexShard.java
	core/src/main/java/org/elasticsearch/index/shard/ShadowIndexShard.java
	core/src/main/java/org/elasticsearch/indices/memory/IndexingMemoryController.java
@mikemccand
Copy link
Contributor Author

Thanks @nik9000 I folded in the feedback.

@Override
public long indexWriterRAMBytesUsed() {
// No IndexWriter
return 0L;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if UnsupportedOperationException is a better choice here- we don't have an indexwriter...

@bleskes
Copy link
Contributor

bleskes commented Oct 6, 2015

I love how this turned out, especially around moving away from the translog checks. I left some suggestions here and there. I also think we should move the controller to work on IndexShard objects rather than shard ids (sorry for the scope creep - can be done in a different change). At some point I got worried that a new shard will not be seen as "added" if the original shard was destroyed and a new one created. I think it's OK now (because we only use current state when allocating memory) but it's more correct to use physical instances.

@mikemccand
Copy link
Contributor Author

I also think we should move the controller to work on IndexShard objects rather than shard ids

++

But I thought we used ShardId all over to make this more easily unit-tested? But anyway let's do this on a separate change ... this one is big already :)

@mikemccand
Copy link
Contributor Author

OK I folded feedback @bleskes, thank you!

@@ -288,6 +288,7 @@ public synchronized IndexShard createShard(int sShardId, ShardRouting routing) {
indicesLifecycle.afterIndexShardCreated(indexShard);
settingsService.addListener(indexShard);
shards = newMapBuilder(shards).put(shardId.id(), indexShard).immutableMap();
indexServicesProvider.getIndexingMemoryController().forceCheck();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about it and I think we should start the shard as inactive (it is :)) and only mark it as active + the claiming of memory in IndexShard#skipTranslogRecovery / IndexShard#performTranslogRecovery when we actually open the engine for indexing. We also make sure call IndexingMemoryController.forceCheck before we create the engine, so we always have an up to date buffer size ...

Re concerns about n^2 problem, it is there but since we limit the amount of concurrent recoveries a node can do, it means it will be done during a longer process. Also because of the limit it is hard to do anything around batching. All in all, I think we're OK.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, why not simply start up as inactive, init the buffers to the INACTIVE values, and then the first indexing op that arrives will make it active like normal?

OK that's a good point, that concurrent recoveries are limited anyway, so the O(N^2) cost is paid over a long time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, why not simply start up as inactive, init the buffers to the INACTIVE values, and then the first indexing op that arrives will make it active like normal?

That was my original thought as well, but then I remembered you were very worried about indexing into a small buffer. When one creates an index by indexing into it that index will likely get many indexing requests at once, the first will trigger a forced check and the others will fly through into the engine. Figured we better not have to worry about it but If you're good with that, I'm good :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're good with that, I'm good :)

I'm gonna try :)

@bleskes
Copy link
Contributor

bleskes commented Oct 7, 2015

This is getting great. I left some final comments.

if (iwBytesUsed > shardIndexingBufferSize.bytes()) {
// our allowed buffer was changed to less than we are currently using; we ask IW to refresh
// so it clears its buffers (otherwise it won't clear until the next indexing/delete op)
logger.debug(message + "; now refresh to clear IndexWriter memory");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm used to wrapping debug logging stuff in if (logger.isDebugEnabled()) tests to prevent the message construction in the (very common) case that debug is disabled. Here it probably doesn't matter because message construction is dwarfed by the refresh call.

@mikemccand
Copy link
Contributor Author

OK thanks @nik9000 and @bleskes, I folded in the last round of feedback.

@bleskes
Copy link
Contributor

bleskes commented Oct 7, 2015

LGTM. Awesome mike.

Conflicts:
	core/src/main/java/org/elasticsearch/index/engine/Engine.java
	core/src/main/java/org/elasticsearch/index/shard/IndexShard.java
mikemccand pushed a commit that referenced this pull request Oct 8, 2015
When shard becomes active again, immediately increase its indexing buffer instead of waiting for up to 30 seconds while indexing with a tiny (500 KB) indexing buffer.
@mikemccand mikemccand merged commit 9688e86 into elastic:master Oct 8, 2015
mikemccand pushed a commit that referenced this pull request Oct 8, 2015
When shard becomes active again, immediately increase its indexing buffer instead of waiting for up to 30 seconds while indexing with a tiny (500 KB) indexing buffer.
Conflicts:
	core/src/main/java/org/elasticsearch/index/IndexServicesProvider.java
	core/src/main/java/org/elasticsearch/index/shard/IndexShard.java
	core/src/main/java/org/elasticsearch/index/shard/ShadowIndexShard.java
	core/src/main/java/org/elasticsearch/indices/memory/IndexingMemoryController.java
	core/src/test/java/org/elasticsearch/index/shard/IndexShardTests.java
	core/src/test/java/org/elasticsearch/indices/memory/IndexingMemoryControllerIT.java
mikemccand pushed a commit that referenced this pull request Oct 8, 2015
When shard becomes active again, immediately increase its indexing buffer instead of waiting for up to 30 seconds while indexing with a tiny (500 KB) indexing buffer.
Conflicts:
	core/src/main/java/org/elasticsearch/index/IndexServicesProvider.java
	core/src/main/java/org/elasticsearch/index/shard/IndexShard.java
	core/src/main/java/org/elasticsearch/index/shard/ShadowIndexShard.java
	core/src/main/java/org/elasticsearch/indices/memory/IndexingMemoryController.java
	core/src/test/java/org/elasticsearch/index/shard/IndexShardTests.java
	core/src/test/java/org/elasticsearch/indices/memory/IndexingMemoryControllerIT.java
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants