Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add relational scan for JDBC storage and consensus commit transaction #900

Merged
merged 10 commits into from
Jun 20, 2023

Conversation

jnmt
Copy link
Contributor

@jnmt jnmt commented Jun 7, 2023

This PR adds the relational scan feature for JDBC storage. For transactions, it only supports the consensus-commit transaction. JDBC and gRPC transactions will be introduced by other PRs.

@jnmt jnmt added the enhancement New feature or request label Jun 7, 2023
@jnmt jnmt self-assigned this Jun 7, 2023
Comment on lines +95 to +96
if (ScalarDbUtils.isRelational(scan)) {
selectQuery = buildSelectQueryForRelationalScan(scanAll, tableMetadata);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the major changes is starting from here. Since we decide to use ScanAll for relational scan, we do not have to change the interface of DistributedStorage, but just have to change query building for conditions and orderings.

Comment on lines 78 to 84
if (ScalarDbUtils.isRelational(scan)) {
// We verify if this scan does not overlap previous writes using the results obtained from
// the actual scan since the condition (i.e., where clause) is arbitrary in the relational
// scan, and thus, the write command may not have columns used in the condition, which are
// necessary to determine overlaps.
snapshot.verify(scan);
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other major change is starting from here. This prevents scan-after-write with a bit different approach (i.e., use actual scan results) in existing scans since we need to handle the case that the user's Put does not have columns required to check if they match the Scan conditions.

Copy link
Collaborator

@brfrn169 brfrn169 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left several comments. Please take a look when you have time!

@@ -63,6 +64,21 @@ public Cassandra(DatabaseConfig config) {
operationChecker = new OperationChecker(metadataManager);
}

Cassandra(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like it's only for testing. So we can add @VisibleForTesting:

Suggested change
Cassandra(
@VisibleForTesting
Cassandra(

We can do the same thing for other storage implementations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the suggestion. Fixed in 31d2d2b.

Comment on lines 184 to 188
if (!ScalarDbUtils.isRelational(scan) && isWriteSetOverlappedWith(scan)) {
// Verify overlaps only for non-relational scan since we do it later for relational scan
// (right before returning the results).
throw new IllegalArgumentException("reading already written data is not allowed");
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can move this overlap check logic into the newly added verify() method for consistency and readability. This would execute this check logic after the actual read, but I don't think this is a problem. What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me confirm a bit. Do you mean that we can do the overlap check after the actual scan for both cases, regardless of whether it is relational or not? If yes, the answer is yes; we can do it without losing correctness. My only concern was unnecessary additional latency, but I think the readability would have more benefits here if we assume skilled users usually know it is prohibited and can avoid it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean that we can do the overlap check after the actual scan for both cases, regardless of whether it is relational or not?

Yes, correct.

My only concern was unnecessary additional latency, but I think the readability would have more benefits here if we assume skilled users usually know it is prohibited and can avoid it.

The IllegalArgumentException issue is basically a bug, so maybe I don't think we need to worry about the additional latency. So in this case, I think we can prioritize the readability.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I will revise it. Thanks for the clarification and advice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 31d2d2b.

scan instanceof ScanAll
? buildSelectQueryForScanAll((ScanAll) scan, tableMetadata)
: buildSelectQueryForScan(scan, tableMetadata);
SelectQuery selectQuery;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need the same change for the scan() method to support the relational scan for JDBC transactions. Do you plan to do this in another PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I will add it in another PR, including tests for it.

sql += " WHERE " + conditionSqlString();
if (isRelationalQuery) {
// for relational abstraction
sql += relationalConditionSqlString();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about using StringBuilder instead to avoid instantiating temporary objects? Using StringBuilder is basically better in terms of performance and memory footprint https://dev.to/this-is-learning/performance-benchmarking-string-and-string-builder-3bid

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 1490ed6.

return "";
}

List<String> conjunctionList = new ArrayList<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[minor] This can be replaced with the following code

    List<String> conjunctionList =
        conjunctions.stream().map(conjunction ->
            conjunction.getConditions().stream().map(condition ->
                rdbEngine.enclose(condition.getColumn().getName())
                    + convert(condition.getOperator())
            ).collect(Collectors.joining(" AND "))
        ).collect(Collectors.toList());

I'm not sure which is better from the view points of readability and performance, though...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, thank you. I don't think your idea loose readability very much. So, let's do it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 1490ed6.

case IS_NOT_NULL:
return " IS NOT NULL";
default:
throw new IllegalArgumentException("unknown operator");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about showing the unexpected operator in the error message? I don't think it would be useful for now but for the future when we add another operator.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 1490ed6.

@@ -127,6 +186,17 @@ private String orderBySqlString() {
.collect(Collectors.joining(","));
}

private String relationalOrderBySqlString() {
if (orderings.size() == 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[minor] I think if (orderings.isEmpty()) would be slightly clearer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 1490ed6.

continue;
}

if (scan.getConjunctions().size() == 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[minor] Same as above (isEmpty()) would be slightly clearer)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 1490ed6.

case LTE:
return column.compareTo((Column<T>) condition.getColumn()) <= 0;
default:
throw new IllegalArgumentException("unknown operator");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great if the unexpected operator is included in this error message (for future developments)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 1490ed6.

return true;
}

// the following check prevents scan-after-write even if the update record potentially matches
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you elaborate this? It would be great if you give me an example problematic case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to consider three cases here to prevent scan-after-write.

  1. A put operation overlaps the scan range regardless of the update (put) results.
  2. A put operation does not overlap the scan range as a result of the update.
  3. A put operation overlaps the scan range as a result of the update.

We can check cases 1 and 2 with L. 282-284. (Note that case 2 does not have the overlap, but we intentionally prohibit it due to the consistency and simplicity of snapshot management.) For case 3, we need to evaluate the operator to check for overlap since the put operation is only in the read-write set and not committed to the database. Let's see the following examples.

-- Assume the following table.
-- tbl (key int, val int)
-- 1 | 2 
-- 2 | 3

-- Case 2
update tbl set val = 4 where key = 1;
select * from tbl where val < 3;
-- We can find the overlap using the scan results.

-- Case 3
update tbl set val = 2 where key = 2;
select * from tbl where val < 3;
-- We cannot find the overlap since the database is not updated yet.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the explanation!!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jnmt This is just a suggestion and doesn't prevent this PR from being merged. The above explanation was really helpful for me to understand why we need to consider updated values. How about putting it as a comment so that new developers and 2y later me who has a bad memory can maintain the code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I guess I will also forget it. It's a bit too much, but I added the detailed comments in 4b2fbe5.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯

Copy link
Contributor

@Torch3333 Torch3333 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good thanks, I just have a minor comment.

Copy link
Contributor

@komamitsu komamitsu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! 👍

@jnmt
Copy link
Contributor Author

jnmt commented Jun 14, 2023

@brfrn169 @komamitsu @Torch3333 Thank you for your review. I fixed some issues based on your comment. PTAL when you have time.

@jnmt jnmt requested review from brfrn169 and Torch3333 June 14, 2023 05:21
@jnmt
Copy link
Contributor Author

jnmt commented Jun 15, 2023

SQLite integration tests sometimes failed because I didn't add ordering, and there was no guarantee to preserve the expected order. I fixed it in 7163241 and b9ea15e. Also, I noticed there is a bug in the builder part. I will fix it in another PR.

@jnmt jnmt requested a review from Torch3333 June 15, 2023 03:43
Copy link
Contributor

@Torch3333 Torch3333 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you!

Copy link
Collaborator

@brfrn169 brfrn169 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left one comment. Other than that, LGTM! Thank you!

Comment on lines 184 to 188
if (!ScalarDbUtils.isRelational(scan) && isWriteSetOverlappedWith(scan)) {
// Verify overlaps only for non-relational scan since we do it later for relational scan
// (right before returning the results).
throw new IllegalArgumentException("reading already written data is not allowed");
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it's moved to verify(), we can delete this, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for that. I removed it (and fix some related tests) in 2bdbae4.

Comment on lines 191 to 192
if ((ScalarDbUtils.isRelational(scan) && isWriteSetOverlappedWithRelational(scan))
|| (!ScalarDbUtils.isRelational(scan) && isWriteSetOverlappedWith(scan))) {
Copy link
Collaborator

@brfrn169 brfrn169 Jun 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, one more point. To avoid duplicate calculations, I think we should do the following:

Suggested change
if ((ScalarDbUtils.isRelational(scan) && isWriteSetOverlappedWithRelational(scan))
|| (!ScalarDbUtils.isRelational(scan) && isWriteSetOverlappedWith(scan))) {
boolean isRelational = ScalarDbUtils.isRelational(scan);
if ((isRelational && isWriteSetOverlappedWithRelational(scan))
|| (!isRelational && isWriteSetOverlappedWith(scan))) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I will fix it. Thank you!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@brfrn169 Fixed in 0c90275. PTAL if you have time.

Copy link
Contributor

@feeblefakie feeblefakie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very sorry for the late review.
I left several questions for clarification. PTAL!

.collect(Collectors.joining(" AND ")))
.collect(Collectors.toList());

return " WHERE " + String.join(" OR ", conjunctionList);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we create the where clause with CNF, but it seems to be creating it with DNF.
Is it OK?
I might miss something, so let me confirm.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are using DNF for the internal representation and building queries. So, it should be fine here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For clarification, I added some comments for the Scan class in 19f4ada.

List<Result> results = scanInternal(scan);

// We verify if this scan does not overlap previous writes after the actual scan. For a
// relational scan, this must be done here, using the obtained results. This is because the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It says the verify uses the obtained results, but does it?
I don't see the code that uses it.
Or you meant scanSet is updated by scanInternal and is used in the check logic (isWriteSetOverlappedWithRelational)?
I might miss something, but let me clarify this just in case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or you meant scanSet is updated by scanInternal and is used in the check logic (isWriteSetOverlappedWithRelational)?

Right, we use the obtained results put in the scanSet by scanInternal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"using the obtained results" was a bit confusing since the check only uses keys in the scan set (not returned results above). I changed the comment a little in 19f4ada.

@jnmt jnmt requested a review from feeblefakie June 19, 2023 03:37
Copy link
Contributor

@feeblefakie feeblefakie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thank you!

@feeblefakie feeblefakie merged commit eca9078 into master Jun 20, 2023
12 checks passed
@feeblefakie feeblefakie deleted the add-relational-scan-for-consensus-commit-tx branch June 20, 2023 05:24
@jnmt jnmt mentioned this pull request Nov 16, 2023
6 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants