Simplify bulk request execution #20109

Merged
areek merged 24 commits into elastic:master from areek:cleanup/transport_bulk on Oct 12, 2016

4 participants
@areek
Contributor

commented Aug 22, 2016

Currently, bulk item requests can be any ActionRequest; this PR
restricts them to DocumentRequest, which simplifies failure
handling during bulk requests. Additionally, a new enum is added
to DocumentRequest to represent the operation a document request
intends to perform (create, index, update or delete), which was
previously represented with a mix of strings and the index request
operation type.
The index request operation type now reuses the new enum to specify
whether the request should create or index a document.
Restricting bulk requests to DocumentRequest also lets shard-level
bulk operations use the same failure handling for index, delete
and update operations.
This PR also fixes a bug that executed delete operations twice on
replica copies while executing bulk requests.

relates #19105
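As a rough illustration of the enum described above, here is a minimal sketch. It assumes a byte id per operation and a cached lowercase name, in the spirit of the `OpType` constructor quoted later in the review. The concrete id values and the `fromId` and `getLowercase` helpers are hypothetical and are not taken from the Elasticsearch source.

```java
import java.util.Locale;

// Hypothetical stand-in for the operation-type enum on DocumentRequest.
// The id values below are illustrative assumptions.
enum OpType {
    INDEX(0),
    CREATE(1),
    UPDATE(2),
    DELETE(3);

    private final byte op;
    private final String lowercase;

    OpType(int op) {
        this.op = (byte) op;
        // A fixed locale keeps the name independent of the JVM default locale.
        this.lowercase = this.toString().toLowerCase(Locale.ROOT);
    }

    byte getId() {
        return op;
    }

    String getLowercase() {
        return lowercase;
    }

    // Resolves an id read from the wire; unknown ids fail loudly.
    static OpType fromId(byte id) {
        for (OpType type : values()) {
            if (type.op == id) {
                return type;
            }
        }
        throw new IllegalArgumentException("unknown opType id [" + id + "]");
    }
}

class OpTypeDemo {
    public static void main(String[] args) {
        for (OpType type : OpType.values()) {
            System.out.println(type.getLowercase() + " -> " + type.getId());
        }
    }
}
```

A single enum like this replaces string comparisons when dispatching on the operation, and an unknown value surfaces as an exception instead of falling through silently.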

Make bulk item-level requests implement DocumentRequest interface
Currently, bulk item requests can be any ActionRequest; this commit
restricts them to DocumentRequest, which simplifies failure
handling during bulk requests. Additionally, a new enum is added
to DocumentRequest to represent the operation a document request
intends to perform. The index operation type now also uses the
new enum to specify whether the request should create or index
a document.

@areek areek force-pushed the areek:cleanup/transport_bulk branch Aug 23, 2016

areek added some commits Aug 22, 2016

Simplify shard-level bulk operation execution
This commit refactors execution of shard-level
bulk operations to use the same failure handling
for index, delete and update operations.

@areek areek force-pushed the areek:cleanup/transport_bulk branch to 14908f8 Sep 1, 2016

@clintongormley clintongormley added v5.0.0 and removed v5.0.0-beta1 labels Sep 14, 2016

@areek areek force-pushed the areek:cleanup/transport_bulk branch Oct 3, 2016

@areek areek force-pushed the areek:cleanup/transport_bulk branch to 248ac24 Oct 3, 2016

@areek areek force-pushed the areek:cleanup/transport_bulk branch to 40b4f39 Oct 4, 2016

@bleskes
Member

left a comment

I went through this very carefully and it looks great. It's an amazing restructuring and I'm happy you took it upon yourself. I left some minor comments. I also think we can potentially simplify further and make the DocumentRequest inherit from WriteReplicationRequest (and call it DocWriteRequest). I'm not sure, but it would be great if you could give it a go.

-request = new UpdateRequest();
+UpdateRequest updateRequest = new UpdateRequest();
+updateRequest.readFrom(in);
+request = updateRequest;
}

@bleskes

bleskes Oct 5, 2016

Member

add an else and throw an exception?

@areek

areek Oct 6, 2016

Author Contributor

Thanks for the suggestion


OpType(int op) {
this.op = (byte) op;
this.lowercase = this.toString().toLowerCase(Locale.ENGLISH);

@bleskes

bleskes Oct 5, 2016

Member

don't we typically use ROOT for these things?

@areek

areek Oct 6, 2016

Author Contributor

Changed it to ROOT, but there are a few places (e.g. DocWriteResponse) where we use ENGLISH too. Maybe we should change those to ROOT as well?
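For context on why the locale passed to `toLowerCase` matters at all, here is a small sketch. It compares a fixed locale with the Turkish locale, where uppercase `I` lowercases to the dotless `ı`. Both `Locale.ROOT` and `Locale.ENGLISH` avoid that surprise; `ROOT` is simply the locale-neutral choice. The `LocaleLowercaseDemo` class and its `lower` helper are made up for illustration.

```java
import java.util.Locale;

class LocaleLowercaseDemo {
    // Lowercases a string under an explicit locale rather than the JVM default.
    static String lower(String value, Locale locale) {
        return value.toLowerCase(locale);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");
        // Locale-neutral lowercasing of an enum-style name.
        System.out.println(lower("INDEX", Locale.ROOT));
        // Under Turkish rules the leading 'I' becomes a dotless 'ı' (U+0131).
        System.out.println(lower("INDEX", turkish));
    }
}
```

If such a name ends up in a wire format or a REST response, deriving it with a fixed locale keeps it stable regardless of where the node runs.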

@@ -577,15 +574,17 @@ public void writeTo(StreamOutput out) throws IOException {
super.writeTo(out);
waitForActiveShards.writeTo(out);
out.writeVInt(requests.size());
-for (ActionRequest<?> request : requests) {
+for (DocumentRequest<?> request : requests) {
if (request instanceof IndexRequest) {

@bleskes

bleskes Oct 5, 2016

Member

shall we fold these reads and writes into static methods of DocumentRequest?

@areek

areek Oct 6, 2016

Author Contributor

done
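A minimal sketch of what folding the per-type reads and writes into static helpers could look like, including the "else and throw" branch suggested earlier in this review. It uses plain `java.io` streams instead of Elasticsearch's `StreamInput`/`StreamOutput`. The request classes, the `DocRequestIO` helper and the type-tag values are all hypothetical stand-ins.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-ins for the request types; only an id is carried.
abstract class DocRequest {
    final String id;
    DocRequest(String id) { this.id = id; }
}
final class IndexReq extends DocRequest { IndexReq(String id) { super(id); } }
final class DeleteReq extends DocRequest { DeleteReq(String id) { super(id); } }
final class UpdateReq extends DocRequest { UpdateReq(String id) { super(id); } }

final class DocRequestIO {
    private DocRequestIO() {}

    // Writes a one-byte type tag (illustrative values) followed by the body.
    static void write(DataOutputStream out, DocRequest request) throws IOException {
        if (request instanceof IndexReq) {
            out.writeByte(0);
        } else if (request instanceof DeleteReq) {
            out.writeByte(1);
        } else if (request instanceof UpdateReq) {
            out.writeByte(2);
        } else {
            throw new IllegalStateException("invalid request [" + request.getClass().getSimpleName() + "]");
        }
        out.writeUTF(request.id);
    }

    // Reads the type tag and rebuilds the matching request; unknown tags throw.
    static DocRequest read(DataInputStream in) throws IOException {
        byte type = in.readByte();
        switch (type) {
            case 0: return new IndexReq(in.readUTF());
            case 1: return new DeleteReq(in.readUTF());
            case 2: return new UpdateReq(in.readUTF());
            default: throw new IllegalStateException("invalid request type [" + type + "]");
        }
    }

    // Convenience for demonstration: serialize and immediately deserialize.
    static DocRequest roundTrip(DocRequest request) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            write(new DataOutputStream(bytes), request);
            return read(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Keeping both directions in one place means the bulk request and the bulk item request cannot drift apart in how they tag request types.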

}
String concreteIndex = concreteIndices.getConcreteIndex(request.index()).getName();

@bleskes
setResponse(item, item.getPrimaryResponse());
BulkItemRequest item = request.items()[requestIndex];
DocumentRequest<?> documentRequest = item.request();
if (ExceptionsHelper.status(e) == RestStatus.CONFLICT) {

@bleskes

bleskes Oct 5, 2016

Member

can we stay consistent and use isConflictException? (I know it was like this before)

@areek

areek Oct 6, 2016

Author Contributor

fixed

location = locationToSync(location, result.getLocation());
WriteResult<? extends DocWriteResponse> writeResult = innerExecuteBulkItemRequest(metaData, indexShard,
request, requestIndex);
if (writeResult.getResponse().getResult() != DocWriteResponse.Result.NOOP) {

@bleskes

bleskes Oct 5, 2016

Member

why not check if writeResult.getLocation() != null?

@areek

areek Oct 6, 2016

Author Contributor

I wanted to be stricter, as only a noop result can be valid with a null location. Changed it to be an assert instead.

@@ -322,7 +328,7 @@ public void readFrom(StreamInput in) throws IOException {
@Override
public void writeTo(StreamOutput out) throws IOException {
out.writeVInt(id);
-out.writeString(opType);
+out.writeByte(opType.getId());

@bleskes

bleskes Oct 5, 2016

Member

we need bwc here too

@areek

areek Oct 6, 2016

Author Contributor

bwc added
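To make the backwards-compatibility concern concrete, here is a hedged sketch of version-gated serialization: peers on the newer protocol get the op type as a single byte, and older peers still get the string form. It uses plain `java.io` streams. The `OpTypeWire` class, the version threshold and the name table are assumptions for illustration and do not reflect the actual change in this PR.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class OpTypeWire {
    // Hypothetical protocol version in which opType switched from string to byte.
    static final int BYTE_OP_TYPE_SINCE = 2;
    // Illustrative names indexed by id; not taken from the Elasticsearch source.
    static final String[] NAMES = {"index", "create", "update", "delete"};

    private OpTypeWire() {}

    static void writeOpType(DataOutputStream out, int peerVersion, int opId) throws IOException {
        if (peerVersion >= BYTE_OP_TYPE_SINCE) {
            out.writeByte(opId);          // new format: one byte
        } else {
            out.writeUTF(NAMES[opId]);    // old format: the operation name
        }
    }

    static int readOpType(DataInputStream in, int peerVersion) throws IOException {
        if (peerVersion >= BYTE_OP_TYPE_SINCE) {
            return in.readByte();
        }
        String name = in.readUTF();
        for (int i = 0; i < NAMES.length; i++) {
            if (NAMES[i].equals(name)) {
                return i;
            }
        }
        throw new IllegalArgumentException("unknown opType [" + name + "]");
    }

    // Demonstration helpers that hide the checked IOException.
    static byte[] encode(int peerVersion, int opId) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            writeOpType(new DataOutputStream(bytes), peerVersion, opId);
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static int decode(byte[] bytes, int peerVersion) {
        try {
            return readOpType(new DataInputStream(new ByteArrayInputStream(bytes)), peerVersion);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The reader and writer have to branch on the same version check. Otherwise a mixed-version cluster would misread the stream during a rolling upgrade.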

areek added some commits Oct 5, 2016

Make update a replication action
Currently, update action delegates to index and delete actions
for replication using a dedicated transport action. This change
makes update a replication operation, removing the dedicated
transport action. This simplifies bulk execution and removes
duplicate logic for update retries and translation. This
consolidates the interface for single document write requests.

Now on the primary, the update request is translated to
an index or delete request before execution and the translated
request is sent to copies for replication.

@areek areek force-pushed the areek:cleanup/transport_bulk branch to eee0d18 Oct 6, 2016

@areek

Contributor Author

commented Oct 6, 2016

Thanks @bleskes for the review :). I addressed all the minor comments.

> I also think we can potentially simplify further and make the DocumentRequest inherit from WriteReplicationRequest (and call it DocWriteRequest)

I really like the idea of making DocumentRequest inherit from WriteReplicationRequest. While giving this a go, I refactored the update operation to be a replication operation (updates are a DocumentWriteRequest but don't use the replication operation; instead they delegate to index and delete operations for replication). This removes the dedicated transport update action (TransportInstanceSingleOperationAction) in favour of reusing the replication action, and removes duplicate code for update retry logic and translation. The one big downside is that it would be difficult to implement wire bwc with 5.0. WDYT?

@areek areek force-pushed the areek:cleanup/transport_bulk branch 2 times, most recently Oct 6, 2016

@areek areek force-pushed the areek:cleanup/transport_bulk branch to 42bc2d1 Oct 6, 2016

areek added some commits Oct 7, 2016

@bleskes
Member

left a comment

Thanks @areek. Let's leave the merger/inheritance of DocumentRequest and ReplicationRequest alone. None of the solutions seem ideal (update request is not a replication request and the fact it's running on the primary is not a good thing). This PR is great enough and it would be a shame to delay it. We can revisit / evaluate again when we have a better idea.

I do have two small asks (next to a question I left in the comments).

return new UpdateResult(translate, indexRequest, retry, cause, null);
} else {
assert translate.getResponseResult() == DocWriteResponse.Result.UPDATED;
update.setGetResult(updateHelper.extractGetResult(updateRequest, updateRequest.concreteIndex(), indexResponse.getVersion(), translate.updatedSourceAsMap(), translate.updateSourceContentType(), indexSourceAsBytes));

@bleskes

bleskes Oct 9, 2016

Member

Why do we now always return the get result on updates? It seems to have been different before.

@areek

areek Oct 12, 2016

Author Contributor

I changed it to match the logic in TransportUpdateAction.shardOperation, but that caused CI failures, so I reverted the change to be the same as the update operation in bulk.

@bleskes

Member

commented Oct 9, 2016

It also seems CI isn't happy.

@bleskes bleskes added v6.0.0-alpha1 and removed v5.0.0 labels Oct 9, 2016

@areek

Contributor Author

commented Oct 12, 2016

Thanks @bleskes for the feedback. I updated the PR, addressing your comments, and the CI is happy. Could you take a look?

@bleskes

Member

commented Oct 12, 2016

LGTM. Awesome @areek

@areek areek merged commit 133be66 into elastic:master Oct 12, 2016

1 of 2 checks passed

elasticsearch-ci Build started sha1 is merged.
CLA Commit author is a member of Elasticsearch

dakrone added a commit to dakrone/elasticsearch that referenced this pull request Jan 23, 2017

Simplify bulk request execution
This is a bespoke backport of elastic#20109 for 5.x:

Currently, bulk item requests can be any ActionRequest, this PR restricts bulk
item requests to DocumentRequest. This simplifies handling failures during bulk
requests. Additionally, a new enum is added to DocumentRequest to represent the
intended operation to be performed by a document request (create, index, update
and delete), which was previously represented with a mix of strings and index
request operation type.

Now, index request operation type reuses the new enum to specify whether the
request should create or index a document. Restricting bulk requests to
DocumentRequest further simplifies execution of shard-level bulk operations to
use the same failure handling for index, delete and update operations. This PR
also fixes a bug which executed delete operations twice for replica copies while
executing bulk requests.

Relates to elastic#19105 and elastic#20109

@lcawl lcawl added :Distributed/CRUD and removed :Bulk labels Feb 13, 2018
