
Conversation

@peterbroadhurst (Contributor) commented Feb 9, 2022

This leads on from the investigation in #421 and then #499

At the moment this is how the batching logic works:

OLD

(diagram: batch_logic)

  • The assembly loop of each batch dispatcher keeps accepting work as long as the last batch has been successfully dispatched.
  • The assembly loop passes work to the persistence loop, which has as many slots in its channel input as there are slots in the batch.
  • The persistence loop continually records data to the database.
  • The offset on the batch manager that's reading from the messages queue is updated as soon as the assembly loop picks up the message.

This design evolved from a similar design in the previous generation of Asset Trail technology.
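
For illustration, here is a rough Go sketch of that old shape - simplified, with hypothetical types, not the actual FireFly code - showing the separate persistence loop fed through a channel sized to the batch:

```go
package batch

import "log"

type message struct{ sequence int64 }

// oldPipeline is a simplified illustration of the OLD design above: the
// assembly loop hands each message to a separate persistence loop over a
// channel with one slot per batch slot, and persistence happens per message.
func oldPipeline(incoming <-chan *message, batchSize int) {
	toPersist := make(chan *message, batchSize) // as many slots as the batch

	done := make(chan struct{})
	go func() { // persistence loop: continually records data to the database
		defer close(done)
		for msg := range toPersist {
			log.Printf("persisting message %d", msg.sequence) // stand-in for a DB write
		}
	}()

	for msg := range incoming { // assembly loop: keeps accepting work
		toPersist <- msg
	}
	close(toPersist)
	<-done
}
```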

Given this most recent issue, and some inefficiency and crash-recovery challenges that exist in the design after its port over to FireFly and subsequent evolution, I'm planning an updated design.

NEW

This PR simplifies the batching logic further (even beyond the previous proposal in #421):

(diagram: batching_v2)

  • We remove two of the loops
  • Persistence only happens once for each batch in the final stage when the batch is sealed
  • Dispatching is synchronous to the assembler
  • The assembler only allows one batch to be in assembly before it blocks waiting for the dispatch completion
  • The batch manager just has in-memory offset state, and on restart it looks from time-zero again
  • We introduce a new sent state for messages that go into a batch
  • Anything that doesn't reach that state is eligible for re-send after restart (at least once delivery)
  • The manager can rewind at any time, because everything will either:
    • Be persisted with a batch ID, and so skipped
    • Be already in-flight
  • The processor keeps track of the sequences it's flushing, and does duplicate detection
  • The processor orders the assembly batch in sequence order

Note this means that if messages are arriving together within a batch assembly window of 0.5s, it's very likely they will all be sent in DB sequence order (even though the DB doesn't guarantee that's the order in which they become visible).
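
As a rough illustration of the new shape (a minimal sketch with hypothetical types - not the actual batch processor code, and with the 0.5s batch-timeout handling omitted), the assembler holds a single batch, seals it in sequence order, and blocks on a synchronous dispatch that persists the sealed batch once:

```go
package batch

import (
	"log"
	"sort"
)

type batchWork struct{ sequence int64 }

type sealedBatch struct{ work []*batchWork }

// dispatchBatch stands in for the single persist + dispatch step that happens
// once a batch is sealed.
func dispatchBatch(b *sealedBatch) error {
	log.Printf("dispatching batch of %d messages", len(b.work))
	return nil
}

// assembleAndDispatch illustrates the simplified flow: only one batch is in
// assembly at a time, it is sealed in sequence order, and the assembler
// blocks on the synchronous dispatch before starting the next batch.
func assembleAndDispatch(incoming <-chan *batchWork, batchSize int) error {
	for {
		current := &sealedBatch{}
		for w := range incoming {
			current.work = append(current.work, w)
			if len(current.work) >= batchSize {
				break
			}
		}
		if len(current.work) == 0 {
			return nil // channel closed and nothing left to flush
		}
		sort.Slice(current.work, func(i, j int) bool {
			return current.work[i].sequence < current.work[j].sequence
		})
		if err := dispatchBatch(current); err != nil {
			return err // anything not dispatched is eligible for re-send after restart
		}
	}
}
```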

Signed-off-by: Peter Broadhurst <peter.broadhurst@kaleido.io>
@awrichar (Contributor) commented Feb 9, 2022

Explanation seems sound. I'll wait for more of the code to take shape before reviewing though.

Signed-off-by: Peter Broadhurst <peter.broadhurst@kaleido.io>
@codecov-commenter commented Feb 10, 2022

Codecov Report

Merging #501 (42335d8) into main (fce32d6) will not change coverage.
The diff coverage is 100.00%.


@@            Coverage Diff            @@
##              main      #501   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files          275       275           
  Lines        15694     15731   +37     
=========================================
+ Hits         15694     15731   +37     
Impacted Files Coverage Δ
internal/database/sqlcommon/provider.go 100.00% <ø> (ø)
...nternal/apiserver/route_get_status_batchmanager.go 100.00% <100.00%> (ø)
internal/batch/batch_manager.go 100.00% <100.00%> (ø)
internal/batch/batch_processor.go 100.00% <100.00%> (ø)
internal/broadcast/manager.go 100.00% <100.00%> (ø)
internal/broadcast/message.go 100.00% <100.00%> (ø)
internal/database/postgres/postgres.go 100.00% <100.00%> (ø)
internal/database/sqlcommon/batch_sql.go 100.00% <100.00%> (ø)
internal/database/sqlcommon/event_sql.go 100.00% <100.00%> (ø)
internal/database/sqlcommon/sqlcommon.go 100.00% <100.00%> (ø)
... and 12 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update fce32d6...42335d8.

Signed-off-by: Peter Broadhurst <peter.broadhurst@kaleido.io>

var (
// TransactionTypeNone indicates no transaction should be used for this message/batch
// TransactionTypeNone deprecreated - replaced by TransactionTypeUnpinned
Contributor:

typo: deprecated

Contributor Author:

Looking forward to the day we dep-recreate something

// TransactionTypeNone indicates no transaction should be used for this message/batch
// TransactionTypeNone deprecreated - replaced by TransactionTypeUnpinned
TransactionTypeNone TransactionType = ffEnum("txtype", "none")
// TransactionTypeUnpinned indicates no transaction should be used for this message/batch
Contributor:

I think the wording should indicate that the message will be sent without pinning any evidence to the blockchain.

There is a FireFly transaction (although there's no blockchain transaction) - so we want to avoid confusion there.

@peterbroadhurst force-pushed the batch-v2 branch 2 times, most recently from 23e7fc8 to dfdb167 on February 11, 2022 13:02
… with increasing sequence

Signed-off-by: Peter Broadhurst <peter.broadhurst@kaleido.io>
dispatcher.mux.Lock()
key := fmt.Sprintf("%s:%s:%s[group=%v]", namespace, identity.Author, identity.Key, group)
processor, ok := dispatcher.processors[key]
name := fmt.Sprintf("%s|%s|%v", namespace, identity.Author, group)
Contributor:

Small thing, but maybe there should be a getProcessorKey next to getDispatcherKey.
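
Something like this would do it - a minimal sketch, with the parameters flattened to plain strings rather than the real identity/group types, purely to put the key format behind one named helper:

```go
package batch

import "fmt"

// getProcessorKey mirrors the existing getDispatcherKey pattern, so the
// processor map key format lives behind one named helper (parameters are
// simplified to plain strings in this sketch).
func getProcessorKey(namespace, author, key, group string) string {
	return fmt.Sprintf("%s:%s:%s[group=%v]", namespace, author, key, group)
}
```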

Signed-off-by: Peter Broadhurst <peter.broadhurst@kaleido.io>

func (bm *batchManager) WaitStop() {
<-bm.sequencerClosed
func (bm *batchManager) reapQuiescing() {
Contributor:

Award for the best function name here 🏅

Signed-off-by: Peter Broadhurst <peter.broadhurst@kaleido.io>
// Do not block sending to the shoulderTap - as it can only contain one
func (bm *batchManager) newMessageNotification(seq int64) {
log.L(bm.ctx).Debugf("Notification of message %d", seq)
if (seq - 1) < bm.readOffset {
Contributor:

Might bear a comment here on why there are - 1 calculations. Alternatively, should we just add 1 to readOffset everywhere (initialize it to 0, make the database query GtOrEq, etc)?

Contributor Author:

> just add 1 to readOffset everywhere

Worried this is a big, risky change. Going for the comment.

}

func (bm *batchManager) waitForShoulderTapOrPollTimeout() {
func (bm *batchManager) waitForShoulderTapOrPollTimeout() (done bool) {
Contributor:

Just a spelling thing - but "shoulder tap" has been replaced by "new message", right?

newQueue := make([]*batchWork, 0, len(bp.assemblyQueue)+1)
added := false
skip := false
// Check it's not in the recently flushed lish
Contributor:

lish

Contributor Author:

lush 👍 🏴󠁧󠁢󠁷󠁬󠁳󠁿

Signed-off-by: Peter Broadhurst <peter.broadhurst@kaleido.io>
added = true
}
newQueue = append(newQueue, work)
}
Contributor:

It also feels like much of this loop could be replaced by a single append of the new work and then a call to sort.Slice(). Would still need a check for duplicates ahead of that though, so unsure of the ultimate performance impact. Just a thought.

@peterbroadhurst (Contributor Author), Feb 11, 2022:

So I think this ends up as more code at the end of the day. We have to have all the branches that we have today, but also create a custom function to pass into the sort.Slice() to do the comparison.

As for whether it's more performant in practice to do a single traversal and allocate memory for a new pointer array each time, vs only re-allocate the array when we add and do a sort-in-place, I'm not sure.

My gut was that re-allocating the array each time and doing a single pass optimizes for the case where we're adding work - which is the most common case.

Contributor:

Yea, I think with the other optimizations added to abort early in the case of a duplicate, this now reads pretty clean and doesn't need any further tweaking.

newQueue = append(newQueue, newWork)
added = true
}
if added {
Contributor:

added will always be true here unless skip was set. Again just feels like a lot of the logic should be skipped early if skip gets flipped to true.

Contributor Author:

That's not actually true if (as is the most common case) we're adding to the end of the list.

Contributor Author:

I will do a simplification experiment, to see if I can base the add logic on whether the item is > the current one... which I think would always add

Contributor:

Yea, in the common case added = true will be hit up above on line 180, so it will be true by this point.

Contributor Author:

... that went badly - just leaving the skip part

Contributor Author:

ok - early return was the solution 👍 (overflow and full can be assumed to be false if we didn't add anything)
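
For reference, a condensed sketch of the pattern this thread converged on - duplicate detection with an early return, then an ordered assembly queue. The types are hypothetical, and it uses the sort.Slice variant from the earlier suggestion for brevity rather than the single-pass insert in the actual PR:

```go
package batch

import "sort"

type batchWork struct{ sequence int64 }

type batchProcessor struct {
	assemblyQueue    []*batchWork
	flushedSequences []int64
	maxBatchSize     int
}

// addWork skips anything recently flushed or already queued (returning early,
// so overflow/full can be assumed false), then inserts the new work and keeps
// the assembly queue in sequence order.
func (bp *batchProcessor) addWork(newWork *batchWork) (added, full bool) {
	for _, seq := range bp.flushedSequences {
		if seq == newWork.sequence {
			return false, false // duplicate of recently flushed work
		}
	}
	for _, work := range bp.assemblyQueue {
		if work.sequence == newWork.sequence {
			return false, false // already in the current assembly
		}
	}
	bp.assemblyQueue = append(bp.assemblyQueue, newWork)
	sort.Slice(bp.assemblyQueue, func(i, j int) bool {
		return bp.assemblyQueue[i].sequence < bp.assemblyQueue[j].sequence
	})
	return true, len(bp.assemblyQueue) >= bp.maxBatchSize
}
```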

Signed-off-by: Peter Broadhurst <peter.broadhurst@kaleido.io>
…imming the oldest

Signed-off-by: Peter Broadhurst <peter.broadhurst@kaleido.io>
s.msg.Header.TxType = fftypes.TransactionTypeUnpinned
default:
// the only other valid option is "batch_pin"
s.msg.Header.TxType = fftypes.TransactionTypeBatchPin
Contributor:

Should we yell at them for setting an invalid txtype? Or just quietly set to the default like this?

Also might consider adding similar pre-validation to broadcast/message.go - can't remember if/when that case actually fails due to an invalid txtype.

Contributor Author:

Going to go for the default approach, but do the same for Broadcast
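
A minimal sketch of that defaulting approach (the type and constant names here are stand-ins for the fftypes values, not the real declarations), so private and broadcast sends can normalize the type the same way:

```go
package sketch

// TransactionType stands in for fftypes.TransactionType in this sketch.
type TransactionType string

const (
	TransactionTypeBatchPin TransactionType = "batch_pin"
	TransactionTypeUnpinned TransactionType = "unpinned"
)

// resolveTxType quietly defaults anything other than the supported values:
// "unpinned" is kept, and everything else falls back to "batch_pin".
func resolveTxType(requested TransactionType) TransactionType {
	switch requested {
	case TransactionTypeUnpinned:
		return TransactionTypeUnpinned
	default:
		// the only other valid option is "batch_pin"
		return TransactionTypeBatchPin
	}
}
```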

Signed-off-by: Peter Broadhurst <peter.broadhurst@kaleido.io>
newFlushedSequences := make([]int64, newFlushedSeqLen)
for i := 0; i < len(flushAssembly); i++ {
// Add in reverse order - so we can trim the end of it off later (in the next block) and keep the newest
newFlushedSequences[len(flushAssembly)-i-1] = flushAssembly[i].msg.Sequence
Contributor:

So... we are adding to the front of a list, in reverse order, and then ultimately trimming the end of the list.

It feels like we could add to the end of the list in order, and then trim the front, and it might be more understandable. However, I do think the current logic works, so it's not mandatory to re-spin this again if we want to defer for now.

@awrichar (Contributor) left a comment:

Marking approved with one more optional thought for a possible simplification.

Thanks for this; feels like a huge step forward for the message batching.

Signed-off-by: Peter Broadhurst <peter.broadhurst@kaleido.io>
newLength = maxFlushedSeqLen
}
dropLength := combinedLen - newLength
retainLength := len(bp.flushedSequences) - dropLength
Contributor:

Thanks, personally I feel like the extra vars and comments have made this MUCH easier to follow.
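
For completeness, a tiny sketch of the bounded list in the orientation suggested earlier (append the newest at the end, trim the oldest from the front). It keeps the same most-recent entries as the merged reverse-order version; this is hypothetical code to show the shape, not the merged implementation:

```go
package batch

// recordFlushed appends the newly flushed sequences at the end, then trims
// the oldest entries from the front when the list grows past maxFlushedSeqLen.
func recordFlushed(flushed, newlyFlushed []int64, maxFlushedSeqLen int) []int64 {
	combined := append(append([]int64{}, flushed...), newlyFlushed...)
	if drop := len(combined) - maxFlushedSeqLen; drop > 0 {
		combined = combined[drop:] // drop the oldest, keep the newest
	}
	return combined
}
```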

@peterbroadhurst merged commit b7e8078 into hyperledger:main Feb 11, 2022
@peterbroadhurst deleted the batch-v2 branch February 11, 2022 20:08
