
Conversation

@peterbroadhurst (Contributor) commented Feb 9, 2022

This leads on from the investigation in #421 and then #499

At the moment this is how the batching logic works:

OLD

(diagram: batch_logic)

  • The assembly loop of each batch dispatcher keeps accepting work as long as the last batch has been successfully dispatched.
  • The assembly loop passes work to the persistence loop, which has as many slots in its channel input as there are slots in the batch.
  • The persistence loop continually records data to the database.
  • The offset on the batch manager that's reading from the messages queue is updated as soon as the assembly loop picks up the message.

This design evolved from a similar design in the previous generation of Asset Trail technology.
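
For illustration, here is a rough Go sketch of that old shape - simplified, with hypothetical types, not the actual FireFly code - showing the separate persistence loop fed through a channel sized to the batch:

```go
package batch

import "log"

type message struct{ sequence int64 }

// oldPipeline is a simplified illustration of the OLD design above: the
// assembly loop hands each message to a separate persistence loop over a
// channel with one slot per batch slot, and persistence happens per message.
func oldPipeline(incoming <-chan *message, batchSize int) {
	toPersist := make(chan *message, batchSize) // as many slots as the batch

	done := make(chan struct{})
	go func() { // persistence loop: continually records data to the database
		defer close(done)
		for msg := range toPersist {
			log.Printf("persisting message %d", msg.sequence) // stand-in for a DB write
		}
	}()

	for msg := range incoming { // assembly loop: keeps accepting work
		toPersist <- msg
	}
	close(toPersist)
	<-done
}
```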

Given this most recent issue, and some inefficiency and crash-recovery challenges that exist in the design after its port over to FireFly and subsequent evolution, I'm planning an updated design.

NEW

This PR simplifies the batching logic further (even beyond the previous proposal in #421):

(diagram: batching_v2)

  • We remove two of the loops
  • Persistence only happens once for each batch in the final stage when the batch is sealed
  • Dispatching is synchronous to the assembler
  • The assembler only allows one batch to be in assembly before it blocks waiting for the dispatch completion
  • The batch manager just has in-memory offset state, and on restart it looks from time-zero again
  • We introduce a new sent state for messages that go into a batch
  • Anything that doesn't reach that state is eligible for re-send after restart (at least once delivery)
  • The manager can rewind at any time, because everything will either:
    • Be persisted with a batch ID, and so skipped
    • Be already in-flight
  • The processor keeps track of the sequences it's flushing, and does duplicate detection
  • The processor orders the assembly batch in sequence order

Note this means that if messages are arriving together within a batch assembly window of 0.5s, it's very likely they will all be sent in DB sequence order (even though the DB doesn't guarantee that's the order in which they become visible).
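
As a rough illustration of the new shape (a minimal sketch with hypothetical types - not the actual batch processor code, and with the 0.5s batch-timeout handling omitted), the assembler holds a single batch, seals it in sequence order, and blocks on a synchronous dispatch that persists the sealed batch once:

```go
package batch

import (
	"log"
	"sort"
)

type batchWork struct{ sequence int64 }

type sealedBatch struct{ work []*batchWork }

// dispatchBatch stands in for the single persist + dispatch step that happens
// once a batch is sealed.
func dispatchBatch(b *sealedBatch) error {
	log.Printf("dispatching batch of %d messages", len(b.work))
	return nil
}

// assembleAndDispatch illustrates the simplified flow: only one batch is in
// assembly at a time, it is sealed in sequence order, and the assembler
// blocks on the synchronous dispatch before starting the next batch.
func assembleAndDispatch(incoming <-chan *batchWork, batchSize int) error {
	for {
		current := &sealedBatch{}
		for w := range incoming {
			current.work = append(current.work, w)
			if len(current.work) >= batchSize {
				break
			}
		}
		if len(current.work) == 0 {
			return nil // channel closed and nothing left to flush
		}
		sort.Slice(current.work, func(i, j int) bool {
			return current.work[i].sequence < current.work[j].sequence
		})
		if err := dispatchBatch(current); err != nil {
			return err // anything not dispatched is eligible for re-send after restart
		}
	}
}
```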

Signed-off-by: Peter Broadhurst <peter.broadhurst@kaleido.io>
@awrichar (Contributor) commented Feb 9, 2022

Explanation seems sound. I'll wait for more of the code to take shape before reviewing though.

Signed-off-by: Peter Broadhurst <peter.broadhurst@kaleido.io>
@codecov-commenter commented Feb 10, 2022

Codecov Report

Merging #501 (42335d8) into main (fce32d6) will not change coverage.
The diff coverage is 100.00%.


@@            Coverage Diff            @@
##              main      #501   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files          275       275           
  Lines        15694     15731   +37     
=========================================
+ Hits         15694     15731   +37     
Impacted Files Coverage Δ
internal/database/sqlcommon/provider.go 100.00% <ø> (ø)
...nternal/apiserver/route_get_status_batchmanager.go 100.00% <100.00%> (ø)
internal/batch/batch_manager.go 100.00% <100.00%> (ø)
internal/batch/batch_processor.go 100.00% <100.00%> (ø)
internal/broadcast/manager.go 100.00% <100.00%> (ø)
internal/broadcast/message.go 100.00% <100.00%> (ø)
internal/database/postgres/postgres.go 100.00% <100.00%> (ø)
internal/database/sqlcommon/batch_sql.go 100.00% <100.00%> (ø)
internal/database/sqlcommon/event_sql.go 100.00% <100.00%> (ø)
internal/database/sqlcommon/sqlcommon.go 100.00% <100.00%> (ø)
... and 12 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update fce32d6...42335d8.

Signed-off-by: Peter Broadhurst <peter.broadhurst@kaleido.io>

var (
// TransactionTypeNone indicates no transaction should be used for this message/batch
// TransactionTypeNone deprecreated - replaced by TransactionTypeUnpinned
Contributor:

typo: deprecated

Contributor Author:

Looking forward to the day we dep-recreate something

// TransactionTypeNone indicates no transaction should be used for this message/batch
// TransactionTypeNone deprecreated - replaced by TransactionTypeUnpinned
TransactionTypeNone TransactionType = ffEnum("txtype", "none")
// TransactionTypeUnpinned indicates no transaction should be used for this message/batch
Contributor:

I think the wording should indicate that the message will be sent without pinning any evidence to the blockchain.

There is a FireFly transaction (although there's no blockchain transaction) - so we want to avoid confusion there.

@peterbroadhurst force-pushed the batch-v2 branch 2 times, most recently from 23e7fc8 to dfdb167 on February 11, 2022 13:02
… with increasing sequence

Signed-off-by: Peter Broadhurst <peter.broadhurst@kaleido.io>
dispatcher.mux.Lock()
key := fmt.Sprintf("%s:%s:%s[group=%v]", namespace, identity.Author, identity.Key, group)
processor, ok := dispatcher.processors[key]
name := fmt.Sprintf("%s|%s|%v", namespace, identity.Author, group)
Contributor:

Small thing, but maybe there should be a getProcessorKey next to getDispatcherKey.
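
Something like this would do it - a minimal sketch, with the parameters flattened to plain strings rather than the real identity/group types, purely to put the key format behind one named helper:

```go
package batch

import "fmt"

// getProcessorKey mirrors the existing getDispatcherKey pattern, so the
// processor map key format lives behind one named helper (parameters are
// simplified to plain strings in this sketch).
func getProcessorKey(namespace, author, key, group string) string {
	return fmt.Sprintf("%s:%s:%s[group=%v]", namespace, author, key, group)
}
```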

Signed-off-by: Peter Broadhurst <peter.broadhurst@kaleido.io>

func (bm *batchManager) WaitStop() {
<-bm.sequencerClosed
func (bm *batchManager) reapQuiescing() {
Contributor:

Award for the best function name here 🏅

Signed-off-by: Peter Broadhurst <peter.broadhurst@kaleido.io>
// Do not block sending to the shoulderTap - as it can only contain one
func (bm *batchManager) newMessageNotification(seq int64) {
log.L(bm.ctx).Debugf("Notification of message %d", seq)
if (seq - 1) < bm.readOffset {
Contributor:

Might bear a comment here on why there are - 1 calculations. Alternatively, should we just add 1 to readOffset everywhere (initialize it to 0, make the database query GtOrEq, etc)?

Contributor Author:

> just add 1 to readOffset everywhere

Worried this is a big, risky change. Going for the comment.

}

func (bm *batchManager) waitForShoulderTapOrPollTimeout() {
func (bm *batchManager) waitForShoulderTapOrPollTimeout() (done bool) {
Contributor:

Just a spelling thing - but "shoulder tap" has been replaced by "new message", right?

newQueue := make([]*batchWork, 0, len(bp.assemblyQueue)+1)
added := false
skip := false
// Check it's not in the recently flushed lish
Contributor:

lish

Contributor Author:

lush 👍 🏴󠁧󠁢󠁷󠁬󠁳󠁿

Signed-off-by: Peter Broadhurst <peter.broadhurst@kaleido.io>
added = true
}
newQueue = append(newQueue, work)
}
Contributor:

It also feels like much of this loop could be replaced by a single append of the new work and then a call to sort.Slice(). Would still need a check for duplicates ahead of that though, so unsure of the ultimate performance impact. Just a thought.

@peterbroadhurst (Contributor Author), Feb 11, 2022:

So I think this ends up as more code at the end of the day. We have to have all the branches that we have today, but also create a custom function to pass into the sort.Slice() to do the comparison.

As for whether it's more performant in practice to do a single traversal and allocate memory for a new pointer array each time, vs only re-allocate the array when we add and do a sort-in-place, I'm not sure.

My gut was that re-allocating the array each time and doing a single pass optimizes for the case where we're adding work - which is the most common case.

Contributor:

Yea, I think with the other optimizations added to abort early in the case of a duplicate, this now reads pretty clean and doesn't need any further tweaking.

newQueue = append(newQueue, newWork)
added = true
}
if added {
Contributor:

added will always be true here unless skip was set. Again just feels like a lot of the logic should be skipped early if skip gets flipped to true.

Contributor Author:

That's not actually true if (as is the most common case) we're adding to the end of the list.

Contributor Author:

I will do a simplification experiment, to see if I can base the add logic on whether the item is > the current one... which I think would always add

Contributor:

Yea, in the common case added = true will be hit up above on line 180, so it will be true by this point.

Contributor Author:

... that went badly - just leaving the skip part

Contributor Author:

ok - early return was the solution 👍 (overflow and full can be assumed to be false if we didn't add anything)
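
For reference, a condensed sketch of the pattern this thread converged on - duplicate detection with an early return, then an ordered assembly queue. The types are hypothetical, and it uses the sort.Slice variant from the earlier suggestion for brevity rather than the single-pass insert in the actual PR:

```go
package batch

import "sort"

type batchWork struct{ sequence int64 }

type batchProcessor struct {
	assemblyQueue    []*batchWork
	flushedSequences []int64
	maxBatchSize     int
}

// addWork skips anything recently flushed or already queued (returning early,
// so overflow/full can be assumed false), then inserts the new work and keeps
// the assembly queue in sequence order.
func (bp *batchProcessor) addWork(newWork *batchWork) (added, full bool) {
	for _, seq := range bp.flushedSequences {
		if seq == newWork.sequence {
			return false, false // duplicate of recently flushed work
		}
	}
	for _, work := range bp.assemblyQueue {
		if work.sequence == newWork.sequence {
			return false, false // already in the current assembly
		}
	}
	bp.assemblyQueue = append(bp.assemblyQueue, newWork)
	sort.Slice(bp.assemblyQueue, func(i, j int) bool {
		return bp.assemblyQueue[i].sequence < bp.assemblyQueue[j].sequence
	})
	return true, len(bp.assemblyQueue) >= bp.maxBatchSize
}
```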

Signed-off-by: Peter Broadhurst <peter.broadhurst@kaleido.io>
…imming the oldest

Signed-off-by: Peter Broadhurst <peter.broadhurst@kaleido.io>
s.msg.Header.TxType = fftypes.TransactionTypeUnpinned
default:
// the only other valid option is "batch_pin"
s.msg.Header.TxType = fftypes.TransactionTypeBatchPin
Contributor:

Should we yell at them for setting an invalid txtype? Or just quietly set to the default like this?

Also might consider adding similar pre-validation to broadcast/message.go - can't remember if/when that case actually fails due to an invalid txtype.

Contributor Author:

Going to go for the default approach, but do the same for Broadcast
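
A minimal sketch of that defaulting approach (the type and constant names here are stand-ins for the fftypes values, not the real declarations), so private and broadcast sends can normalize the type the same way:

```go
package sketch

// TransactionType stands in for fftypes.TransactionType in this sketch.
type TransactionType string

const (
	TransactionTypeBatchPin TransactionType = "batch_pin"
	TransactionTypeUnpinned TransactionType = "unpinned"
)

// resolveTxType quietly defaults anything other than the supported values:
// "unpinned" is kept, and everything else falls back to "batch_pin".
func resolveTxType(requested TransactionType) TransactionType {
	switch requested {
	case TransactionTypeUnpinned:
		return TransactionTypeUnpinned
	default:
		// the only other valid option is "batch_pin"
		return TransactionTypeBatchPin
	}
}
```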

Signed-off-by: Peter Broadhurst <peter.broadhurst@kaleido.io>
newFlushedSequences := make([]int64, newFlushedSeqLen)
for i := 0; i < len(flushAssembly); i++ {
// Add in reverse order - so we can trim the end of it off later (in the next block) and keep the newest
newFlushedSequences[len(flushAssembly)-i-1] = flushAssembly[i].msg.Sequence
Contributor:

So... we are adding to the front of a list, in reverse order, and then ultimately trimming the end of the list.

It feels like we could add to the end of the list in order, and then trim the front, and it might be more understandable. However, I do think the current logic works, so it's not mandatory to re-spin this again if we want to defer for now.

@awrichar (Contributor) left a comment:

Marking approved with one more optional thought for a possible simplification.

Thanks for this; feels like a huge step forward for the message batching.

Signed-off-by: Peter Broadhurst <peter.broadhurst@kaleido.io>
newLength = maxFlushedSeqLen
}
dropLength := combinedLen - newLength
retainLength := len(bp.flushedSequences) - dropLength
Contributor:

Thanks, personally I feel like the extra vars and comments have made this MUCH easier to follow.
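
For completeness, a tiny sketch of the bounded list in the orientation suggested earlier (append the newest at the end, trim the oldest from the front). It keeps the same most-recent entries as the merged reverse-order version; this is hypothetical code to show the shape, not the merged implementation:

```go
package batch

// recordFlushed appends the newly flushed sequences at the end, then trims
// the oldest entries from the front when the list grows past maxFlushedSeqLen.
func recordFlushed(flushed, newlyFlushed []int64, maxFlushedSeqLen int) []int64 {
	combined := append(append([]int64{}, flushed...), newlyFlushed...)
	if drop := len(combined) - maxFlushedSeqLen; drop > 0 {
		combined = combined[drop:] // drop the oldest, keep the newest
	}
	return combined
}
```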

@peterbroadhurst merged commit b7e8078 into hyperledger:main Feb 11, 2022
@peterbroadhurst deleted the batch-v2 branch February 11, 2022 20:08
