
Replace hard-deletes by soft-deletes to maintain document history #29549

Merged
merged 10 commits into from Apr 20, 2018

@dnhatn (Contributor) commented Apr 17, 2018

Today we can use the soft-deletes feature from Lucene to maintain a
history of a document. This change simply replaces hard-deletes with
soft-deletes in the Engine.

Besides marking a document as deleted, we also index a tombstone
associated with that delete operation. Storing delete tombstones allows
us to have a history of sequence-based operations which can serve in
recovery or rollback.

Relates #29530
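The soft-delete-plus-tombstone model described above can be illustrated with a toy in-memory sketch. This is not the Lucene/Elasticsearch code — all names here (`ToyEngine`, `Doc`, and so on) are invented for illustration — but it shows the two key ideas: deletes only flag documents instead of removing them, and each delete also indexes a tombstone so the operation itself survives as part of the sequence-numbered history.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Doc:
    doc_id: str
    source: Optional[dict]      # None for tombstones (they carry an empty source)
    seq_no: int
    soft_deleted: bool = False
    tombstone: bool = False

class ToyEngine:
    """Toy in-memory model of soft-deletes; not the real Lucene/Elasticsearch engine."""

    def __init__(self):
        self.docs = []          # append-only, loosely mimicking Lucene segments
        self.next_seq_no = 0

    def _next_seq(self):
        seq = self.next_seq_no
        self.next_seq_no += 1
        return seq

    def index(self, doc_id, source):
        self.docs.append(Doc(doc_id, source, self._next_seq()))

    def delete(self, doc_id):
        # Soft-delete: flag the live copy instead of physically removing it ...
        for doc in self.docs:
            if doc.doc_id == doc_id and not doc.soft_deleted:
                doc.soft_deleted = True
        # ... and index a tombstone so the delete itself is a sequence-numbered
        # operation that recovery or rollback can replay later.
        self.docs.append(Doc(doc_id, None, self._next_seq(),
                             soft_deleted=True, tombstone=True))

    def live_docs(self):
        # What a normal search would see: soft-deleted docs are filtered out.
        return [d for d in self.docs if not d.soft_deleted]

    def history(self):
        # Every operation is still present, ordered by sequence number.
        return [(d.seq_no, d.doc_id, "delete" if d.tombstone else "index")
                for d in self.docs]
```

After indexing a doc and deleting it, `live_docs()` is empty as with a hard delete, but `history()` still yields both the index and the delete operation in seq-no order — which is exactly what recovery needs.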


@bleskes (Member) left a comment

Thanks Nhat. I left some comments.

@@ -80,6 +81,7 @@
private final CircuitBreakerService circuitBreakerService;
private final LongSupplier globalCheckpointSupplier;
private final LongSupplier primaryTermSupplier;
private final MetaDocSupplier metaDocSupplier;

bleskes (Member) commented Apr 17, 2018

Is TombstoneDoc a better name?

dnhatn (Author, Contributor) commented Apr 17, 2018

Yes, it's better - more explicit.


private ParseContext.Document newMetaDoc(String type, String id, long seqno, long primaryTerm, long version) {
final SourceToParse source = SourceToParse.source(shardId.getIndexName(), type, id, new BytesArray("{}"), XContentType.JSON);
final ParsedDocument parsedDocument = docMapper(type).getDocumentMapper().parse(source);

bleskes (Member) commented Apr 17, 2018

Instead of creating everything and then removing what we don't need, can we maybe fold this into the document mapper and add a createTombstoneDoc method to it that creates only the fields it needs (probably just calling the preParse / postParse methods of the right field mappers, similar to here)?

private ParseContext.Document newMetaDoc(String type, String id, long seqno, long primaryTerm, long version) {
final SourceToParse source = SourceToParse.source(shardId.getIndexName(), type, id, new BytesArray("{}"), XContentType.JSON);
final ParsedDocument parsedDocument = docMapper(type).getDocumentMapper().parse(source);
parsedDocument.updateSeqID(seqno, primaryTerm);

bleskes (Member) commented Apr 17, 2018

I prefer to be consistent with how the engine sets these during indexing, i.e. during the addition to Lucene.
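The consistency point here — assigning sequence metadata at the moment the document is added to Lucene, rather than pre-filling it at parse time — can be sketched as follows. The names (`add_to_index`, `_seq_no`) are hypothetical stand-ins, not the Engine's actual API.

```python
import itertools

# Stand-in for the engine's sequence-number service.
seq_no_generator = itertools.count()

def add_to_index(index, doc):
    # Assign the sequence number at the moment of addition, the same way
    # regular indexing does, instead of pre-filling it when the doc is parsed.
    doc["_seq_no"] = next(seq_no_generator)
    index.append(doc)

index = []
add_to_index(index, {"_id": "1"})                      # regular doc gets seq_no 0
add_to_index(index, {"_id": "1", "deleted": True})     # tombstone gets seq_no 1
```

Doing the assignment in one place keeps regular documents and tombstones on a single, gap-free sequence.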

@@ -2962,13 +2979,13 @@ public void testSegmentMemoryTrackedWithRandomSearchers() throws Exception {
for (Thread t : threads) {
t.join();
}
// Close remaining searchers
IOUtils.close(searchers);

bleskes (Member) commented Apr 17, 2018

Why was this moved earlier? I think I'm missing something - can you explain?

dnhatn (Author, Contributor) commented Apr 17, 2018

It's a left-over.

dnhatn added some commits Apr 17, 2018

@dnhatn (Author, Contributor) commented Apr 18, 2018

@bleskes and @simonw This is ready for review. Can you please take a look? Thank you!

@dnhatn (Author, Contributor) commented Apr 18, 2018

please run all tests.

@dnhatn (Author, Contributor) commented Apr 18, 2018

@elasticmachine please test this

@dnhatn (Author, Contributor) commented Apr 18, 2018

run sample packaging tests.

@s1monw approved these changes Apr 19, 2018

s1monw (Contributor) left a comment

LGTM. I left some questions; none of them are blockers.

@@ -363,4 +367,13 @@ public CircuitBreakerService getCircuitBreakerService() {
public LongSupplier getPrimaryTermSupplier() {
return primaryTermSupplier;
}

@FunctionalInterface
public interface TombstoneDocSupplier {

s1monw (Contributor) commented Apr 19, 2018

Maybe add some javadocs here?

}

public ParsedDocument createTombstoneDoc(String index, String type, String id) throws MapperParsingException {
final SourceToParse emptySource = SourceToParse.source(index, type, id, new BytesArray("{}"), XContentType.JSON);

s1monw (Contributor) commented Apr 19, 2018

Do we need some identifier that this doc is a tombstone? I'm not sure we do at this point, but down the road we would, no?

bleskes (Member) commented Apr 19, 2018

+1. We don't plan to keep many of these around, so the overhead is minimal, and we'd have the ability to search for them and debug. The question is how to do this. I guess the easiest would be to add a boolean metadata field to the mapping, but that feels like overkill. In any case, I think this can be a follow-up.

dnhatn (Author, Contributor) commented Apr 19, 2018

Yes, I will think about it and address it in a follow-up.
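The boolean-marker idea discussed above (deferred to a follow-up) could look roughly like this. The `_tombstone` field name is an assumption for illustration, not a field this PR actually added.

```python
def is_tombstone(doc):
    """A tombstone doc would carry a boolean marker field so it can be searched."""
    return doc.get("_tombstone", False)

def find_tombstones(docs):
    # Useful for debugging recovery: list the delete tombstones kept in the index.
    return [d for d in docs if is_tombstone(d)]

docs = [
    {"_id": "1", "user": "alice"},                      # regular document
    {"_id": "1", "_tombstone": True, "_seq_no": 7},     # delete tombstone
]
```

Since few tombstones are retained, the storage overhead of such a marker field would be minimal while making the tombstones directly queryable.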

@bleskes (Member) left a comment

This LGTM, but I think we want a unit test on the engine level for this? What am I missing?

@dnhatn (Author, Contributor) commented Apr 19, 2018

"This LGTM, but I think we want a unit test on the engine level for this? What am I missing?"

We planned to add real tests in the next PR, when indexing stale deletes and no-ops. This PR is just a cut-over. I am fine with adding a simple test here.

@bleskes (Member) commented Apr 19, 2018

"We planned to add real tests in the next PR when indexing stale deletes and no-ops. This PR is just a cut-over."

Cool. Thanks for explaining.

@dnhatn (Author, Contributor) commented Apr 20, 2018

Thanks @bleskes and @simonw for reviewing.

@dnhatn dnhatn merged commit ac84879 into elastic:ccr Apr 20, 2018

3 checks passed: CLA (commit author is a member of Elasticsearch), elasticsearch-ci (build finished), elasticsearch-ci/packaging-sample (build finished).

@dnhatn dnhatn deleted the dnhatn:soft-deletes branch Apr 20, 2018

@dnhatn dnhatn referenced this pull request Apr 23, 2018

Closed

Use soft-deletes to maintain document history #29530

14 of 14 tasks complete

dnhatn added a commit that referenced this pull request May 10, 2018

Use soft deletes to maintain doc history (#29549)