Make peer recovery send file chunks async #44040

Merged: 31 commits, Jul 16, 2019

Conversation

dnhatn
Member

@dnhatn dnhatn commented Jul 6, 2019

Relates #36981
Relates #36195
Supersedes #39769

@dnhatn added the >enhancement, :Distributed/Recovery, v8.0.0, and v7.4.0 labels on Jul 6, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

@original-brownbear
Member

Jenkins run elasticsearch-ci/2
Jenkins run elasticsearch-ci/docbldesx

(both unrelated failures)

Member

@original-brownbear original-brownbear left a comment

Some question/suggestion type comments; in general this looks very nice, I think :)

@dnhatn
Member Author

dnhatn commented Jul 8, 2019

@original-brownbear Thank you for reviewing. I have addressed your comments. Would you mind taking another look?

@dnhatn
Member Author

dnhatn commented Jul 8, 2019

Please hold off on the review. There's a deadlock issue after I pushed 5c1977a.

@dnhatn
Member Author

dnhatn commented Jul 8, 2019

This is ready again.

Contributor

@henningandersen henningandersen left a comment

Thanks for looking into this, @dnhatn. I have left a few initial comments to consider.

@dnhatn
Member Author

dnhatn commented Jul 9, 2019

@henningandersen I've pushed e8dfd75 to make MultiFileSender an AsyncIOProcessor. Can you please take another look? Thank you!

Member

@original-brownbear original-brownbear left a comment

Looks good in general :)
One suggestion (that might not be so trivial, so feel free to ignore it for now) and +1 to Yannick's comment on ignoring the exception. We should def. not do that imo.

}
listener.onResponse(null);
} catch (Exception ignored) {
// we can safely ignore this exception as it happens after we have released the resource and notified the caller.
Member

++ I don't think we should have this. If we run into an exception in listener.onResponse, shouldn't we fix the listener to handle that exception? Otherwise we're adding another situation where a listener behaves unexpectedly and we won't even see a log line for it.
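
To illustrate the concern about swallowed listener exceptions, here is a minimal sketch. It does not use the Elasticsearch `ActionListener` API: `SimpleListener`, `notifySwallowing`, `notifyReporting`, and the in-memory `LOG` list are all hypothetical names introduced for this example. It only contrasts a catch block that hides a failure with one that records it and lets it propagate.

```java
import java.util.ArrayList;
import java.util.List;

final class ListenerNotification {
    // Hypothetical stand-in for a response listener; not the Elasticsearch ActionListener.
    interface SimpleListener<T> {
        void onResponse(T response);
    }

    // Stand-in for a logger so the example stays self-contained.
    static final List<String> LOG = new ArrayList<>();

    // Pattern questioned in the review: a failure inside onResponse vanishes without a trace.
    static <T> void notifySwallowing(SimpleListener<T> listener, T response) {
        try {
            listener.onResponse(response);
        } catch (RuntimeException ignored) {
            // Nothing is recorded, so a misbehaving listener goes unnoticed.
        }
    }

    // Alternative sketch: record the failure and rethrow so the broken listener is visible.
    static <T> void notifyReporting(SimpleListener<T> listener, T response) {
        try {
            listener.onResponse(response);
        } catch (RuntimeException e) {
            LOG.add("listener failed in onResponse: " + e.getMessage());
            throw e;
        }
    }
}
```

With a listener whose `onResponse` throws, `notifySwallowing` returns normally and leaves `LOG` empty, while `notifyReporting` adds one entry and rethrows. Whether to rethrow, log, or assert is a design choice this thread leaves to the listener's owner.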

}

private void addItem(long requestSeqId, StoreFileMetaData md, Exception failure) {
processor.put(new FileChunkResponseItem(requestSeqId, md, failure), e -> { assert e == null : e; });
Member

I wonder if we should somehow assert that this never blocks? (since AsyncIOProcessor might be using blocking ArrayBlockingQueue#put on the item). It seems to me that, logically, blocking can't happen with the current code, but maybe it's something we should make sure of to avoid (admittedly super unlikely) deadlocks in the future?

Member Author

The current implementation is perfectly fine as we control the number of possible items in the queue. We can assert the remaining capacity before putting an item, but this option is subject to a race condition. Another option is to use "offer" instead of "put" in AsyncIOProcessor if the assertion is enabled and blocking is not allowed. I will think about it more.
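
The "offer" versus "put" distinction can be sketched with a plain `java.util.concurrent.ArrayBlockingQueue`. This is an illustration only, not the AsyncIOProcessor code: the class `NonBlockingEnqueue` and its methods are hypothetical. `offer` returns `false` immediately when the queue is full, whereas `put` would park the calling thread. A single `offer` call also avoids the check-then-act race of reading `remainingCapacity()` first and calling `put` afterwards.

```java
import java.util.concurrent.ArrayBlockingQueue;

final class NonBlockingEnqueue {
    // Adds an item without ever parking the calling thread.
    // Returns false when the queue is full; put(...) would block in that case.
    static <T> boolean tryEnqueue(ArrayBlockingQueue<T> queue, T item) {
        return queue.offer(item);
    }

    // Variant for callers that believe the queue can never be full:
    // a full queue is surfaced as a bug instead of a silent block.
    static <T> void enqueueOrFail(ArrayBlockingQueue<T> queue, T item) {
        if (queue.offer(item) == false) {
            throw new IllegalStateException("queue is full; enqueue would have blocked");
        }
    }
}
```

On a queue of capacity 1, the first `tryEnqueue` succeeds and the second returns `false` instead of waiting. `enqueueOrFail` then throws, which is roughly what a "must never block" assertion would look like under these assumptions.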

Member

Yea, I didn't have a good idea on how to do this now either. Just figured I'd raise it :) Either way, once we start using this in tests that use DeterministicTaskQueue we'll probably be guarded against deadlocks/blocking anyway :)

@dnhatn
Member Author

dnhatn commented Jul 15, 2019

@ywelsch @original-brownbear Can you take another look? Thank you

Member

@original-brownbear original-brownbear left a comment

LGTM, thanks @dnhatn!

Contributor

@ywelsch ywelsch left a comment

LGTM

@dnhatn
Member Author

dnhatn commented Jul 16, 2019

Thanks everyone :)

@dnhatn dnhatn merged commit 30b7545 into elastic:master Jul 16, 2019
@dnhatn dnhatn deleted the send-chunks branch July 16, 2019 13:52
dnhatn added a commit that referenced this pull request Jul 16, 2019
dnhatn added a commit that referenced this pull request Jul 16, 2019
@dnhatn
Member Author

dnhatn commented Jul 16, 2019

I have reverted this change on master and 7.x as the new assertion was tripped.

dnhatn added a commit to dnhatn/elasticsearch that referenced this pull request Jul 17, 2019
dnhatn added a commit that referenced this pull request Jul 17, 2019
dnhatn added a commit that referenced this pull request Jul 18, 2019
@pcsanwald pcsanwald added v8.0.0 and removed v8.0.0 labels Jul 19, 2019
Labels
:Distributed/Recovery Anything around constructing a new shard, either from a local or a remote source. >enhancement
6 participants