Integrate util/groupcommit package with the coordinator table commit #1728
Conversation
@@ -8,6 +8,6 @@ public class ConsensusCommitCrossPartitionScanIntegrationTestWithJdbcDatabase

   @Override
   protected Properties getProps(String testName) {
-    return JdbcEnv.getProperties(testName);
+    return ConsensusCommitJdbcEnv.getProperties(testName);
ConsensusCommitJdbcEnv is introduced to inject GroupCommit-related properties.
Sorry, we might have already discussed this, but we add integration tests for GroupCommit only for JDBC, right?
IIRC, this is a trade-off between cost and completeness, based on my memory of the discussion we had before. The reason I picked JDBC is that the current main target of the group commit feature is YugabyteDB.
But, on second thought, it might be better to add the group commit integration tests for the other storages as well. Maybe a scheduled weekly integration test with all the storages is a good trade-off? If it's okay, I'll take care of it in another PR later. What do you think?
Yeah, I also think it is better to add the group commit integration tests with other storages. But I'm wondering if we should add them to the regular integration tests. You assume they will impact the integration test time, right?
Yes. In the case of https://github.com/scalar-labs/scalardb/actions/runs/9202687096, the group commit integration test using PostgreSQL, which executed only ConsensusCommit cases, took 12 minutes, while the full integration test using PostgreSQL without the group commit feature took only 2 minutes. (That was basically because of the frequent open/close of CoordinatorGroupCommitter and garbage ongoing transactions that were not explicitly rolled back, though.)
So, as a very rough estimate, it may consume about 6 times more GitHub Actions compute in total. I think a good trade-off is the combination of 1) running the group commit integration test only with PostgreSQL in the regular workflow, and 2) testing all the storages with the group commit feature enabled in a weekly or daily workflow. 1 is already implemented in this PR, and 2 will be added in another PR later. What do you think?
Okay, that sounds good. I thought we should run CI for the group commit feature with the other storages as much as possible before pushing changes, because we should detect bugs early. However, it seems this would significantly impact the regular CI time.
Let's add item 2 in another PR. Also, maybe we should add the workflow_dispatch trigger to the workflow so that we can manually run the CI for the group commit feature with the other storages when we make changes related to the group commit feature.
  }

    commitState(snapshot);
    commitRecords(snapshot);
  }

  protected void handleCommitConflict(Snapshot snapshot, Exception cause)
Extracted this from commitState() so that CommitHandlerWithGroupCommit can reuse the implementation.
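The extraction described here can be sketched as a small template-method pattern. Note this is a rough sketch: the class shapes, the String txId parameter, and the exception type are simplifications for illustration, not ScalarDB's actual signatures (which take a Snapshot).

```java
// Hypothetical sketch of extracting conflict handling from commitState()
// so a group-commit subclass can reuse it. Names are illustrative only.
class CommitConflictException extends Exception {
    CommitConflictException(String message, Throwable cause) {
        super(message, cause);
    }
}

abstract class BaseCommitHandlerSketch {
    void commitState(String txId) throws CommitConflictException {
        try {
            writeCommittedState(txId);
        } catch (Exception e) {
            // Extracted step: translate the low-level failure into a
            // commit-conflict exception in one shared place.
            handleCommitConflict(txId, e);
        }
    }

    protected void handleCommitConflict(String txId, Exception cause)
            throws CommitConflictException {
        throw new CommitConflictException("conflict committing " + txId, cause);
    }

    abstract void writeCommittedState(String txId) throws Exception;
}

// The group-commit variant only overrides how the state is written;
// handleCommitConflict() is reused as-is.
class GroupCommitHandlerSketch extends BaseCommitHandlerSketch {
    @Override
    void writeCommittedState(String txId) throws Exception {
        throw new Exception("simulated coordinator write conflict");
    }
}
```

The point of the refactoring is that the subclass inherits the conflict-handling path without duplicating it.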
@@ -17,7 +17,7 @@ class DelayedGroup<PARENT_KEY, CHILD_KEY, FULL_KEY, EMIT_KEY, V>
       FULL_KEY fullKey,
       Emittable<EMIT_KEY, V> emitter,
       KeyManipulator<PARENT_KEY, CHILD_KEY, FULL_KEY, EMIT_KEY> keyManipulator) {
-    super(emitter, keyManipulator, 1, config.oldGroupAbortTimeoutSeconds());
+    super(emitter, keyManipulator, 1, config.oldGroupAbortTimeoutMillis());
Strictly speaking, this refactoring isn't directly related to this PR.
  @Mock private Coordinator coordinator;
  @Mock private TransactionTableMetadataManager tableMetadataManager;
  @Mock private ConsensusCommitConfig config;
  @Mock protected DistributedStorage storage;
This class is inherited by CommitHandlerWithGroupCommitTest, and these variables are used by it.
Looking good! I've added some comments and suggestions. PTAL!
Co-authored-by: Josh Wong <joshua.wong@scalar-labs.com>
I've left some minor suggestions. Other than that, LGTM. Thank you!🙇🏻♂️
Overall LGTM, but I'll review this again later. Left several comments. Please take a look when you have time!
@@ -137,19 +157,34 @@ TwoPhaseCommitTransaction join(String txId, Isolation isolation, SerializableStr
       return resume(txId);
   }

     return createNewTransaction(txId, isolation, strategy);
     // Participant services don't use the group commit feature even if it's enabled. They simply use
Could you please elaborate why participants don't use the group commit feature even if it's enabled?
Sure thing!
How GroupCommitter places transactions over multiple groups depends on various timings, including some types of timeouts (e.g., moving delayed transactions to a DelayedGroup) and events (e.g., GroupCommitter.ready(Snapshot) is called after the prepare and validate phases are done). So, it's unlikely that the Coordinator service and all the Participant services will have the same in-memory groups with the same transaction distributions. Therefore, if the Participant services used the group commit feature, they might insert transaction groups different from those of other nodes, including the Coordinator service.
Understood. Thank you for your explanation!
So the way to use the two-phase interface varies depending on whether group commit is enabled or not, which might be confusing for users. One option is to always disable group commit when using the two-phase interface. What do you think?
> So the way to use the two-phase interface varies depending on whether group commit is enabled or not, which might be confusing for users.

Yes. Users need to change the commit order in their applications as part of the limitation, and it may be confusing.

> One option is to always disable group commit when using the two-phase interface.

You mean to throw an exception or something if TwoPhaseCommitTransactionManager is used with the group commit feature enabled, right? Indeed, that idea sounds simpler and less confusing. On the other hand, I'm wondering about the case where a user wants to use the two-phase commit interface on a high-latency underlying storage where the coordinator table is placed 🤔
> You mean to throw an exception or something if TwoPhaseCommitTransactionManager is used with the group commit feature enabled, right?

Yes, I meant that.

> On the other hand, I'm wondering what if a user wants to use the two-phase commit interface on a high-latency underlying storage where the coordinator table is placed 🤔

Correct. But, actually, there are also some other optimizations and features we have to give up for the two-phase commit interface.
I see. That sounds like a reasonable concern.
I guess it's related to the product specification. @feeblefakie Do you think we can/should prohibit the group commit feature in the two-phase commit interface?
@brfrn169 I had a chance to talk with @feeblefakie about that, and he agreed with you. I'll revise this PR (or create another PR to disable that?).
This PR already contains many changes and comments, and it has got some approvals. So, I think it would be great if I could disable the combination of the group commit feature and the two-phase commit interface in a separate PR.

> Users need to change the commit order in their applications as part of the limitation

BTW, although this is a kind of memo to myself about future development for when we enable the group commit with the two-phase commit interface: we should add a check of whether the transaction is committed in the coordinator table before committing records on the Participant services.
Overall, looking good!
Left some comments and suggestions. PTAL!
  }

  public Optional<Coordinator.State> getState(String id) throws CoordinatorException {
    if (keyManipulator.isFullKey(id)) {
      // The following read operation order is important. Reading a coordinator state is likely to
These sentences are a bit hard to understand, especially the "if a transaction scanned by another transaction executing lazy recovery is delayed" part.
Let's chat about it and rephrase them.
Thanks for the feedback! I agree with you. I moved this logic for the group commit to getStateForGroupCommit() and revised the comment. What do you think?
Of course, chatting about the comment is welcome!
      // transaction ID. Therefore, looking up with the full transaction ID should be tried first
      // to minimize read operations as much as possible.

      // Scan with the full ID for a delayed group that contains only a single transaction.
I feel this method is also used in normal cases without group commit, so the wording is probably too specific to group commit. So, we need to describe two cases: one for the normal case without group commit and the other for the group commit case?
Also, regarding "Scan with the full ID", should it be something like "Regard the ID as a full ID and scan with the ID", since we don't know if the given ID is a full ID or not?
I moved this logic for the group commit to getStateForGroupCommit() to make things clearer.

> regarding "Scan with the full ID", should it be something like "Regard the ID as a full ID and scan with the ID", since we don't know if the given ID is a full ID or not

Ah, whether the ID is a full ID is already checked. But I changed the argument name to a clearer one in the commit.
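The read order being discussed can be sketched with an in-memory stand-in for the coordinator table. The method name getStateForGroupCommit() follows the discussion above, but everything else (the map-backed table, String states, the delimiter handling) is a hypothetical sketch, not ScalarDB's actual API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Sketch of the lookup order: full ID first, then parent ID.
// The caller is assumed to have already verified that the given ID is a full ID.
class CoordinatorLookupSketch {
    static final char DELIMITER = ':';
    final Map<String, String> stateTable = new HashMap<>(); // tx_id -> state

    Optional<String> getStateForGroupCommit(String fullId) {
        // A delayed group stores its single transaction under the full ID
        // ($parent_id:$child_id), so try that first to minimize reads.
        Optional<String> state = read(fullId);
        if (state.isPresent()) {
            return state;
        }
        // Fall back to the parent ID, under which a normal group stores the
        // state shared by all the transactions it contains.
        String parentId = fullId.substring(0, fullId.indexOf(DELIMITER));
        return read(parentId);
    }

    private Optional<String> read(String id) {
        return Optional.ofNullable(stateTable.get(id));
    }
}
```

Trying the full ID first matters because, as the code comment above says, the common case is expected to hit the full transaction ID.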
        return state;
      }

      // Scan with the parent ID for a normal group that contains multiple transactions with
Ditto.
A parent ID is explicitly extracted from a full ID here. So, can we go with the current comment as is?
        .withValue(Attribute.toStateValue(state.getState()))
    Put put = new Put(new Key(Attribute.toIdValue(state.getId())));
    if (!state.getChildIds().isEmpty()) {
      put.withValue(Attribute.toChildIdsValue(Joiner.on(',').join(state.getChildIds())));
"," is hard-coded here, so should CHILD_IDS_DELIMITER be defined at the top level and used here, or should a helper method be defined in the State inner class?
Good point! Yeah, I overlooked it. The State class already knows how to handle the column, including what the delimiter is. So I took care of it in the State class.
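A minimal sketch of what moving the delimiter knowledge into the State class could look like. The class shape and method names here are assumptions for illustration, not the actual ScalarDB State class.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: the delimiter is a private constant of the State-like
// class, and call sites use helpers instead of hard-coding ",".
class StateSketch {
    private static final String CHILD_IDS_DELIMITER = ",";
    private final List<String> childIds;

    StateSketch(List<String> childIds) {
        this.childIds = childIds;
    }

    // Serialization used when writing the tx_child_ids column.
    String childIdsAsString() {
        return String.join(CHILD_IDS_DELIMITER, childIds);
    }

    // Deserialization used when reading the column back.
    static List<String> parseChildIds(String columnValue) {
        if (columnValue.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(columnValue.split(CHILD_IDS_DELIMITER));
    }
}
```

This keeps the column format in one place, so a future change of the delimiter touches only the State class.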
  private final Random random = new Random();

  @Override
  public String generateParentKey() {
Random string generation might affect the performance, so it would be great if you could do a simple benchmark with multiple threads.
I previously encountered crazily slow random string generation when using an Apache Commons library, so I used a method taken from the benchbase one instead:
https://github.com/scalar-labs/scalardb-benchmarks/blob/master/src/main/java/com/scalar/db/benchmarks/ycsb/YcsbCommon.java#L109
So, it would be great if you could compare this with the one above.
Thanks for sharing the concern. That makes sense.
I benchmarked the following combinations with this commit:

a. Simple impl (the original generateParentKey() impl) + Random (the same as the current impl)
b. Faster impl (ported from scalardb-benchmark) + Random
c. Simple impl (the original generateParentKey() impl) + ThreadLocalRandom
d. Faster impl (ported from scalardb-benchmark) + ThreadLocalRandom
[komamitsu@pop-os ~]$ lscpu | grep '^CPU(s)'
CPU(s): 16
[komamitsu@pop-os ~]$ uname -a
Linux pop-os 6.6.10-76060610-generic #202401051437~1704728131~22.04~24d69e2 SMP PREEMPT_DYNAMIC Mon J x86_64 x86_64 x86_64 GNU/Linux
What I noticed from the results:

- The implementation ported from scalardb-benchmark is about 10-15% faster than the original simple generateParentKey() impl when using ThreadLocalRandom.
- Whether Random or ThreadLocalRandom is used significantly affects the performance, along with the number of threads. The average duration was 0.015 ms in case a with 16 threads, which is the same as the number of the Linux machine's CPUs. The duration would increase on a machine with more CPUs.

I think we should use ThreadLocalRandom just in case. When I benchmarked the group commit feature with ThreadLocalRandom before, I observed a lot of parent key conflicts at the beginning of benchmarking. I'll come up with a way to mitigate the issue and update this PR.
@feeblefakie BTW, I checked the distribution of the random character values when using the implementation ported from scalardb-benchmark. It seems skewed compared to the original simple impl. I also observed a similar distribution when directly testing https://github.com/scalar-labs/scalardb-benchmarks/blob/master/src/main/java/com/scalar/db/benchmarks/ycsb/YcsbCommon.java#L109.
I think it's okay for generating just a random value, but the simple one would be better when generating unique identifiers.

- The implementation ported from scalardb-benchmark, where the number in the square brackets is the index into the valid char sequence 0,1,2,...,A,B,...,Z,a,b,...,z and the next number indicates how many times the target character was used:
[000] 12170
[001] 21654
[002] 13412
[003] 23733
[004] 15577
[005] 26786
[006] 15973
[007] 56956
[008] 14013
[009] 26212
[010] 14705
[011] 107740
[012] 13730
[013] 28530
[014] 19633
[015] 106851
[016] 10670
[017] 29521
[018] 19581
[019] 61491
[020] 17591
[021] 46623
[022] 22265
[023] 48223
[024] 16174
[025] 58574
[026] 17782
[027] 44353
[028] 16499
[029] 69603
[030] 18314
[031] 40725
[032] 8898
[033] 73426
[034] 17933
[035] 47698
[036] 12817
[037] 81226
[038] 18393
[039] 30052
[040] 12386
[041] 64607
[042] 14837
[043] 53717
[044] 15901
[045] 73947
[046] 17029
[047] 30590
[048] 11894
[049] 34586
[050] 22686
[051] 52105
[052] 16454
[053] 43558
[054] 22372
[055] 33789
[056] 15510
[057] 24684
[058] 17335
[059] 28752
[060] 18036
[061] 29118
- The original simple implementation:
[000] 39054
[001] 38329
[002] 38803
[003] 39094
[004] 38654
[005] 38724
[006] 38692
[007] 38661
[008] 38667
[009] 38810
[010] 38764
[011] 38945
[012] 38641
[013] 38578
[014] 38603
[015] 38749
[016] 38753
[017] 38709
[018] 38565
[019] 38706
[020] 38555
[021] 38993
[022] 38613
[023] 38477
[024] 38939
[025] 38521
[026] 38813
[027] 38752
[028] 38829
[029] 38728
[030] 38606
[031] 38984
[032] 38803
[033] 38611
[034] 39012
[035] 38577
[036] 38516
[037] 38485
[038] 38580
[039] 38982
[040] 38378
[041] 38650
[042] 38782
[043] 38494
[044] 38518
[045] 38574
[046] 38892
[047] 38815
[048] 38335
[049] 38681
[050] 38510
[051] 38572
[052] 38581
[053] 38473
[054] 38734
[055] 38822
[056] 39052
[057] 39031
[058] 38996
[059] 38902
[060] 38711
[061] 38620
> When I benchmarked the group commit feature with ThreadLocalRandom before, I observed a lot of parent key conflicts at the beginning of benchmarking

This was my silly mistake. I stored the return value of ThreadLocalRandom.current(), but I shouldn't have done that...

> Usages of this class should typically be of the form: ThreadLocalRandom.current().nextX(...) (where X is Int, Long, etc). When all usages are of this form, it is never possible to accidently share a ThreadLocalRandom across multiple threads.

https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ThreadLocalRandom.html
Fixed it in 755b911
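Putting the two findings together, uniform per-character sampling to avoid skew and calling ThreadLocalRandom.current() at use time instead of caching the instance, a generator could look roughly like this. The key length of 24 and the class/method names are assumptions for illustration, not the actual implementation.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sketch of a parent key generator.
class ParentKeyGeneratorSketch {
    private static final char[] CHARS =
        "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz".toCharArray();
    private static final int PARENT_KEY_LENGTH = 24;

    String generateParentKey() {
        StringBuilder sb = new StringBuilder(PARENT_KEY_LENGTH);
        for (int i = 0; i < PARENT_KEY_LENGTH; i++) {
            // nextInt(bound) picks each index uniformly, avoiding the skew
            // observed with the ported implementation. Note that
            // ThreadLocalRandom.current() is called at use time and never
            // stored in a field, since a cached instance may end up shared
            // across threads.
            sb.append(CHARS[ThreadLocalRandom.current().nextInt(CHARS.length)]);
        }
        return sb.toString();
    }
}
```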
@@ -84,7 +84,10 @@ public void commitState(Snapshot snapshot)
      commitStateViaGroupCommit(snapshot);
  }

  private void groupCommitState(String parentId, List<Snapshot> snapshots)
  // TODO: Emitter interface should have `emitNormalGroup()` and `emitDelayedGroup()` separately and
I found there was room to improve this method and related ones to make it clearer which type of ID is passed, so I added this TODO.
LGTM, thank you!
LGTM! Thank you!
Description

We've already added the util/groupcommit package, but it is not used yet. This PR integrates the group commit with the coordinator table writes.

Related issues and/or PRs

- util/groupcommit package in core subproject (#1607)

Changes made

- Added CoordinatorGroupCommitter, which inherits GroupCommitter, and CommitHandlerWithGroupCommit, which inherits CommitHandler
- Write to the coordinator.state table as Emitter.emit()
- coordinator.state table record:
  - tx_id: $parent_id, tx_child_ids: $child_id_1,$child_id_2,...,$child_id_n (a key and the values of a normal group that contains multiple transactions)
  - tx_id: $parent_id:$child_id, tx_child_ids: empty (a key and a value of a delayed group that contains a single transaction)
- Changed ConsensusCommitConfig, ConsensusCommit, ConsensusCommitManager, TwoPhaseConsensusCommit, and TwoPhaseConsensusCommitManager to use CommitHandlerWithGroupCommit
- Updated the util/groupcommit package to fix some issues that occurred in this integration
- Renamed oldGroupAbortTimeoutSeconds to oldGroupAbortTimeoutMillis (this could have been handled in another PR...)
- ConsensusCommitJdbcEnv is introduced to inject the group commit related properties

Checklist
Additional notes (optional)

Transaction ID conventions

- Parent ID: a random string of [0-9A-Za-z] characters. coordinator.state.tx_id will be set to this ID when storing transactions in a normal group.
- Child ID: a transaction ID, e.g., one passed via begin(String txId). coordinator.state.tx_child_ids will contain this type of ID when storing transactions in a normal group.
- Full ID: $ParentID:$ChildID, whose total length is 61 (a 24-character parent ID, a 36-character child ID, and a 1-character delimiter: 24 + 36 + 1). coordinator.state.tx_id will be set to this ID when storing transactions in a delayed group that contains only a single transaction (coordinator.state.tx_child_ids will be empty in this case). So we don't need to extend the maximum length of coordinator.state.tx_id.
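The full-ID convention above can be illustrated with a tiny sketch (the class and method names are illustrative only; a standard 36-character UUID string stands in for the child ID):

```java
import java.util.UUID;

// Hypothetical sketch of composing a full ID from a parent ID and a child ID,
// following the convention $ParentID:$ChildID with total length 24 + 36 + 1 = 61.
class FullIdSketch {
    static final String DELIMITER = ":";

    static String fullId(String parentId, String childId) {
        return parentId + DELIMITER + childId;
    }

    static String parentIdOf(String fullId) {
        return fullId.substring(0, fullId.indexOf(DELIMITER));
    }
}
```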
Limitations

- ConsensusCommitManager.begin(String txId)

Release notes

Added Group Commit feature for Coordinator Table