Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support implicit pre-read in Consensus Commit #1222

Merged
merged 11 commits into from
Nov 15, 2023

Conversation

brfrn169
Copy link
Collaborator

@brfrn169 brfrn169 commented Oct 23, 2023

Description

This PR adds support for implicit pre-read in Consensus Commit. With implicit pre-read, if a record is in the write set but not in the read set, ScalarDB will automatically read the record and add it to the read set before preparing records. This allows users to perform mutations (Put and Delete operations) without reading the records beforehand.

One scenario where the pre-read isn't necessary is when writing a record that doesn't already exist. In such a case, users originally don't need to read the record before writing it. To address this, we can disable implicit pre-read for Put operations by calling the Put.disableImplicitPreRead() method. When disabling implicit pre-read, ScalarDB will not read the record automatically before preparing it.

Related issues and/or PRs

N/A

Changes made

  • Execute implicit pre-read right before preparing records
  • Also execute implicit pre-read when condition is specified for the Put and Delete operations
  • Added the the disableImplicitPreRead() method to the Put operation to disable implicit pre-read
  • Added a new configuration scalar.db.consensus_commit.parallel_implicit_pre_read.enabled that indicates whether executing implicit pre-read in parallel

Checklist

  • I have commented my code, particularly in hard-to-understand areas.
  • I have updated the documentation to reflect the changes.
  • Any remaining open issues linked to this PR are documented and up-to-date (Jira, GitHub, etc.).
  • Tests (unit, integration, etc.) have been added for the changes.
  • My changes generate no new warnings.
  • Any dependent changes in other PRs have been merged and published.

Additional notes (optional)

N/A

Release notes

Supported implicit pre-read in Consensus Commit. With implicit pre-read, users can perform mutations (Put and Delete operations) without reading the records beforehand.

Comment on lines 134 to 145
// Fill the read set with records from the write and delete sets if they are unread
try {
crud.fillReadSetForRecordsFromWriteAndDeleteSetsIfUnread();
} catch (CrudConflictException e) {
throw new CommitConflictException(
"Conflict occurred while reading unread records in the write and delete sets",
e,
getId());
} catch (CrudException e) {
throw new CommitException(
"Failed to read unread records in the write and delete sets", e, getId());
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before committing (preparing) a transaction, we will fill the read set with records from the write and delete sets if they are unread.

Snapshot.Key key = new Snapshot.Key(put);

if (put.getCondition().isPresent()) {
fillReadSetIfUnread(key);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will also fill the read set if the mutation has a condition.

Snapshot.Key key = new Snapshot.Key(delete);

if (delete.getCondition().isPresent()) {
fillReadSetIfUnread(key);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto. We will also fill the read set if the mutation has a condition.

}
}

parallelExecutor.fillReadSetForRecordsFromWriteAndDeleteSetsIfUnread(tasks, snapshot.getId());
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use ParallelExecutor to fill read set in parallel for better performance.

| `scalar.db.consensus_commit.parallel_rollback.enabled` | Whether or not the rollback phase is executed in parallel. | The value of `scalar.db.consensus_commit.parallel_commit.enabled` |
| `scalar.db.consensus_commit.async_commit.enabled` | Whether or not the commit phase is executed asynchronously. | `false` |
| `scalar.db.consensus_commit.async_rollback.enabled` | Whether or not the rollback phase is executed asynchronously. | The value of `scalar.db.consensus_commit.async_commit.enabled` |
| `scalar.db.consensus_commit.parallel_fill_read_set.enabled` | Whether or not to fill the read set with records from the write and delete sets if they are unread. | `true` |
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a new configuration scalar.db.consensus_commit.parallel_fill_read_set.enabled.

Comment on lines 136 to 147
// Fill the read set with records from the write and delete sets if they are unread
try {
crud.fillReadSetForRecordsFromWriteAndDeleteSetsIfUnread();
} catch (CrudConflictException e) {
throw new PreparationConflictException(
"Conflict occurred while reading unread records in the write and delete sets",
e,
getId());
} catch (CrudException e) {
throw new PreparationException(
"Failed to read unread records in the write and delete sets", e, getId());
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before preparing a transaction, we will fill the read set with records from the write and delete sets if they are unread.

@brfrn169 brfrn169 force-pushed the support-blind-writes-for-consensus-commit branch from e51bc13 to 077335e Compare October 23, 2023 15:46
@komamitsu
Copy link
Contributor

@brfrn169 It seems you removed the review requests. But you may want us to review the PR?

@brfrn169
Copy link
Collaborator Author

brfrn169 commented Oct 24, 2023

@brfrn169 It seems you removed the review requests. But you may want us to review the PR?

@komamitsu Thank you for the reminder! 🙇 Yeah, I found some mistakes in the PR and removed the review requests. I fixed the mistakes and requested review again. Pleas take a look when you have time!

}
}

parallelExecutor.fillReadSetForRecordsFromWriteAndDeleteSetsIfUnread(tasks, snapshot.getId());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Snapshot is annotated with @NotThreadSafe, so calling Snapshot.put() in concurrent might cause an issue?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, it won't cause an issue because the read set is a concurrent map.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see. The inconsistency of class-level non-thread-safety and method-level thread-safety is a little bit confusing, but I don't come up with a good straightforward solution...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The inconsistency of class-level non-thread-safety and method-level thread-safety is a little bit confusing

That's true. For now, I left some comments for the methods. Let's refactoring it if we come up with a good solution.


`Delete` is an operation to delete a record specified by a primary key.
If you're certain that a record you're trying to write doesn't exist, you can use a blind `Put` operation for better performance. A blind `Put` is faster than a regular `Put` as it doesn't read the record beforehand. However, if the record does exist, the operation will fail due to a conflict.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, the "blind write" feature in this PR is basically implicitly filling read-set if needed so that users don't need to manually read the target record in advance. But com.scalar.db.api.PutBuilder.Buildable#blind() is to skip the implicit read for insert operations. So it sounds a bit confusing to me.

com.scalar.db.api.PutBuilder.Buildable#create(), insert(), skipImplicitRead() or something might be better instead of com.scalar.db.api.PutBuilder.Buildable#blind()?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. In this context, a blind write means preparing a record without having an existing record (in the read set). So when you set the blind option for the Put operation, we skip the implicit read. What are your thoughts on this?

Copy link
Contributor

@komamitsu komamitsu Oct 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good if considering only com.scalar.db.api.PutBuilder.Buildable#blind()

When I read the description of this PR, I thought "blind writes" meant implicit filling read-set. So, I still feel some inconsistency between the description and com.scalar.db.api.PutBuilder.Buildable#blind().

This PR adds support for blind writes in Consensus Commit. Specifically, if a record is in the write set but not in the read set, ScalarDB will read the record and add it to the read set before preparing the records internally.

Copy link
Collaborator Author

@brfrn169 brfrn169 Oct 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. The term blind has two meanings in this context.

With this change, users can execute Put and Delete operations without pre-reading the target records, which is called blind writes. Also, we can set the blind option for a Put operation to skip the automatic read set filling. I think we can call the former user-perspective blind writes and the latter transaction-manager-perspective blind writes.

Actually, the user-perspective blind writes aren't really blind writes; they're more like virtual blind writes, since the transaction manager actually reads the target records on behalf of the user. Conversely, the transaction-manager-perspective blind writes are the actual blind writes, since the transaction manager writes the records without reading the target records.

Maybe we don't need to modify the option name blind. And should I update the PR description instead? What are your thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the detailed explanation. I think we are on the same page.

I also agree to the ideas of user-perspective blind writes and transaction-manager-perspective blind writes.

Maybe we don't need to modify the option name blind. And should I update the PR description instead?

Maybe it's related to how we want to appeal this feature to users. If we publicly call user-perspective blind writes "blind write", the blind() option might confuse users. Otherwise, the description and release note (With this enhancement, users can execute blind writes without the need to read records prior to writing them.) section should be updated to clearly describe "blind write" means transaction-manager-perspective blind writes (plus "We don't need in-advance reads anymore when you write a fixed value!"?).

I want to hear opinion from @feeblefakie as a product owner as well.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the description (and the release notes) slightly. Maybe we should also update the PR title but I didn't for now. Can you please check the description and let me know if anything still seems confusing?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good to me!

Snapshot.Key key = new Snapshot.Key(delete);

if (delete.getCondition().isPresent()) {
fillReadSetIfUnread(key);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

QQ: Why only if a condition is set?
This might be a silly idea or I might be missing something, but how about always calling fillReadSetIfUnread() in delete() and put() even if no condition is set so that a commit doesn't need to call fillReadSetForRecordsFromWriteAndDeleteSetsIfUnread()?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, we can fill the read set anytime before preparing records. So, as you suggested, we could always call fillReadSetIfUnread() in delete() and put(). The benefit of calling fillReadSetForRecordsFromWriteAndDeleteSetsIfUnread() before preparing records is that we can fill the read set in parallel for better performance. So we basically fill the read set right before preparing records, but for conditional updates, we need to fill the read set at that time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood. Sounds reasonable 👍

true,
"fillReadSetForRecordsFromWriteAndDeleteSetsIfUnread",
transactionId);
} catch (ExecutionException | ValidationConflictException e) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[minor] The order of exception classes is opposite to the above existing methods

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me fix this. Thanks.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, the order is consistently ExecutionException, ValidationConflictException, and CrudException. So I don't think we need to fix it. What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops. I confused ExecutionException with CrudException. You're right. Please ignore my comment above.

@brfrn169
Copy link
Collaborator Author

brfrn169 commented Nov 2, 2023

@komamitsu I've discussed with @feeblefakie offline, and we've decided to name this feature implicit pre-read. Accordingly, I've updated the method names and the configuration name, and also revised the PR title and description, and the documentation. Can you please take a look at this PR once again? Sorry for the inconvenience.

BTW, we've also decided not to use the term blind writes because it's confusing for users.

Copy link
Contributor

@kota2and3kan kota2and3kan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thank you!


`Delete` is an operation to delete a record specified by a primary key.
If you're certain that a record you're trying to write doesn't exist, you can disable implicit pre-read for the `Put` operation for better performance. A `Put` without implicit pre-read is faster than a regular `Put` as it skips the implicit pre-read. However, if the record does exist, the operation will fail due to a conflict.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're certain that a record you're trying to write doesn't exist, you can disable implicit pre-read for the Put operation for better performance.

This describes the implicit pre-read feature can be disabled. But there is no explanation in advance about what "implicit pre-read" is. How about adding a short description about "implicit pre-read" mechanism? It may be helpful for those who are interest in ScalarDB performance, but don't know implicit pre-read happens by default.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for pointing that out. That's right. I've added a short description about "implicit pre-read" in a7c0071. Please take a look! Thanks.

@komamitsu komamitsu self-requested a review November 2, 2023 06:59
Copy link
Contributor

@feeblefakie feeblefakie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the naming change!
Left additional comments. PTAL!

@@ -281,6 +281,23 @@ interface ClearValues<T> {
T clearValue(String columnName);
}

interface ImplicitRreReadEnabled<T> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
interface ImplicitRreReadEnabled<T> {
interface ImplicitPreReadEnabled<T> {

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops... Let me fix it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in a5bdbe7. Thanks.

* @param implicitRreReadEnabled whether implicit pre-read is enabled or not
* @return the operation builder
*/
T implicitRreReadEnabled(boolean implicitRreReadEnabled);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
T implicitRreReadEnabled(boolean implicitRreReadEnabled);
T implicitPreReadEnabled(boolean implicitPreReadEnabled);

I'm also wondering if this should be enableImplicitPreRead to make it consistent with the other method.
(In that case, the interface should probably be ImplicitPreRead.)
What do you think?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added the implicitPreReadEnabled(boolean implicitPreReadEnabled) method because it can be useful in some cases.

Let's see the following example:

boolean implicitPreRead = ...;

Buildable buildable =
    Put.newBuilder()
        .namespace("ns")
        .table("tbl")
        .partitionKey(partitionKey)
        .clusteringKey(clusteringKey)
        .floatValue("c4", 1.23F)
        .doubleValue("c5", 4.56);

if (!implicitPreRead) {
  buildable.disableImplicitPreRead();
}

Put put = buildable.build();

With the implicitPreReadEnabled(boolean implicitPreReadEnabled) method, the above code can be rewritten as follows:

boolean implicitPreRead = ...;

Put put =
    Put.newBuilder()
        .namespace("ns")
        .table("tbl")
        .partitionKey(partitionKey)
        .clusteringKey(clusteringKey)
        .floatValue("c4", 1.23F)
        .doubleValue("c5", 4.56)
        .implicitPreReadEnabled(implicitPreRead)
        .build();

So I think we need a method that takes a boolean parameter that toggles the implicit pre-read feature. However, we can discuss the method name (implicitPreReadEnabled) further if needed. Thank you.

}

@Override
public Buildable implicitRreReadEnabled(boolean implicitRreReadEnabled) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public Buildable implicitRreReadEnabled(boolean implicitRreReadEnabled) {
public Buildable implicitPreReadEnabled(boolean implicitPreReadEnabled) {

snapshot.put(key, delete);
}

public void executeImplicitPreReadIfEnabled() throws CrudException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public void executeImplicitPreReadIfEnabled() throws CrudException {
public void readIfImplicitPreReadEnabled() throws CrudException {

The above might be more consistent with other methods like readUnread.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 460a579. Thanks.

docs/api-guide.md Outdated Show resolved Hide resolved
docs/api-guide.md Outdated Show resolved Hide resolved
docs/storage-abstraction.md Outdated Show resolved Hide resolved
brfrn169 and others added 3 commits November 3, 2023 23:11
Co-authored-by: Hiroyuki Yamada <mogwaing@gmail.com>
Copy link
Contributor

@komamitsu komamitsu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! 👍

Copy link
Contributor

@feeblefakie feeblefakie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thank you!

@brfrn169
Copy link
Collaborator Author

brfrn169 commented Nov 6, 2023

@josh-wong Can you please take a look at the documentation part of this PR?

Copy link
Member

@josh-wong josh-wong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for the delay! I've added a few comments and suggestions. PTAL!🙇🏻‍♂️

docs/api-guide.md Outdated Show resolved Hide resolved
docs/api-guide.md Outdated Show resolved Hide resolved
docs/api-guide.md Show resolved Hide resolved
brfrn169 and others added 2 commits November 9, 2023 14:24
Co-authored-by: Josh Wong <joshua.wong@scalar-labs.com>
Copy link
Member

@josh-wong josh-wong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thank you!🙇‍♂️

@brfrn169 brfrn169 merged commit 0881450 into master Nov 15, 2023
23 checks passed
@brfrn169 brfrn169 deleted the support-blind-writes-for-consensus-commit branch November 15, 2023 07:30
@brfrn169
Copy link
Collaborator Author

@Torch3333 It's been merged, but please take a look when you come back!

@brfrn169
Copy link
Collaborator Author

@Torch3333 It's merged but please take a look when you are back!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants