Update aggregator batch processing to maintain in-memory pin state until OnFinalize #483

peterbroadhurst · 2022-02-06T00:45:11Z

Fix for #481

We were failing to process multiple private messages on the same topic within a single batch of pins read by the aggregator. This was a side effect of the #462 changes: the logic that evaluates the NextPin state for each message no longer had up-to-date information to read from the database.

The core fix was to move to in-memory processing for all state that can change while the aggregator processes a page of pins. The first time we access the state for a context (a context is a particular topic, scoped to a group if it's a private context), we read the data we need into memory. From that point on we update it in memory until the Finalize phase is called at the end, at which point everything is flushed.
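As a rough illustration of the read-once, update-in-memory, flush-on-finalize pattern described above, here is a minimal sketch. It is not the code in this PR: `contextKey`, `pinStore`, `batchState` and its methods are hypothetical names, and the next-pin state is simplified to a plain counter per context.

```go
package main

import "fmt"

// contextKey identifies a context: a topic, scoped to a group for private
// contexts. Hypothetical type for illustration only.
type contextKey struct {
	Topic string
	Group string // empty for a broadcast context
}

// pinStore is a hypothetical stand-in for the database layer.
type pinStore interface {
	LoadNextPin(key contextKey) (int64, error)
	SaveNextPins(next map[contextKey]int64) error
}

// batchState caches per-context state while one page of pins is processed.
type batchState struct {
	store   pinStore
	nextPin map[contextKey]int64
	dirty   map[contextKey]bool
}

func newBatchState(store pinStore) *batchState {
	return &batchState{
		store:   store,
		nextPin: map[contextKey]int64{},
		dirty:   map[contextKey]bool{},
	}
}

// nextPinFor hits the store only on the first access to a context in this page.
func (bs *batchState) nextPinFor(key contextKey) (int64, error) {
	if n, ok := bs.nextPin[key]; ok {
		return n, nil
	}
	n, err := bs.store.LoadNextPin(key)
	if err != nil {
		return 0, err
	}
	bs.nextPin[key] = n
	return n, nil
}

// advance records, in memory only, that a message on this context was processed.
func (bs *batchState) advance(key contextKey) error {
	n, err := bs.nextPinFor(key)
	if err != nil {
		return err
	}
	bs.nextPin[key] = n + 1
	bs.dirty[key] = true
	return nil
}

// finalize flushes every changed context to the store in a single write.
func (bs *batchState) finalize() error {
	if len(bs.dirty) == 0 {
		return nil
	}
	changed := make(map[contextKey]int64, len(bs.dirty))
	for k := range bs.dirty {
		changed[k] = bs.nextPin[k]
	}
	return bs.store.SaveNextPins(changed)
}

// memStore is an in-memory pinStore that counts reads and writes.
type memStore struct {
	data          map[contextKey]int64
	reads, writes int
}

func (m *memStore) LoadNextPin(key contextKey) (int64, error) {
	m.reads++
	return m.data[key], nil
}

func (m *memStore) SaveNextPins(next map[contextKey]int64) error {
	m.writes++
	for k, v := range next {
		m.data[k] = v
	}
	return nil
}

func main() {
	store := &memStore{data: map[contextKey]int64{}}
	bs := newBatchState(store)
	key := contextKey{Topic: "topic1", Group: "group1"}

	// Two private messages on the same topic within one page of pins.
	for i := 0; i < 2; i++ {
		if err := bs.advance(key); err != nil {
			panic(err)
		}
	}
	if err := bs.finalize(); err != nil {
		panic(err)
	}
	fmt.Println(store.data[key], store.reads, store.writes)
}
```

The second message sees the state left by the first without a database round trip, which is the property that was lost after #462 in the scenario described above.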

Fixing this turned into quite a big change to the internals of the aggregator, and in doing so I found two other, less serious issues:

When a message had multiple pins (multiple topics), we would leave some of those pins with dispatched=false when the message was confirmed. This meant potential re-processing on rewind.
If the pins of a message with multiple topics spanned a page of reads from the aggregator, then when we came to the next page we could immediately re-process the message.

The fix for both of these was to always mark all pins associated with a message as dispatched. That can include pins outside the page that was just read, so we calculate the start and end index of that message's pins within the batch, and do an update in the DB scoped to that index range (and the batch ID).
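The range calculation can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: it assumes pins are indexed sequentially within a batch, one per topic per message in message order, and the `message` type, `pinRange`, `dispatchedUpdate`, and the table and column names are all hypothetical.

```go
package main

import "fmt"

// message is a hypothetical, reduced view of a message within a batch.
type message struct {
	ID     string
	Topics []string
}

// pinRange returns the inclusive start and end index of the pins that belong
// to msgID, assuming one pin per topic, numbered in message order.
// ok is false if the message is not in the batch or has no topics.
func pinRange(batch []message, msgID string) (start, end int64, ok bool) {
	var index int64
	for _, m := range batch {
		n := int64(len(m.Topics))
		if m.ID == msgID {
			if n == 0 {
				return 0, 0, false
			}
			return index, index + n - 1, true
		}
		index += n
	}
	return 0, 0, false
}

// dispatchedUpdate builds a parameterized statement scoped to the batch ID and
// index range. Table and column names here are assumptions.
func dispatchedUpdate(batchID string, start, end int64) (string, []interface{}) {
	query := "UPDATE pins SET dispatched = ? WHERE batch_id = ? AND idx >= ? AND idx <= ?"
	return query, []interface{}{true, batchID, start, end}
}

func main() {
	batch := []message{
		{ID: "msg1", Topics: []string{"a"}},
		{ID: "msg2", Topics: []string{"a", "b", "c"}},
		{ID: "msg3", Topics: []string{"b"}},
	}
	start, end, ok := pinRange(batch, "msg2")
	fmt.Println(start, end, ok)

	query, args := dispatchedUpdate("batch1", start, end)
	fmt.Println(query, args)
}
```

Scoping the update by range rather than by the pins in the current page is what lets it cover pins that fall on either side of a page boundary.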

While enhancing the E2E tests I found another bug: in the automatic reply generation for webhooks, we were not setting the topics in the reply to match the request.
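The intended behaviour for replies can be sketched like this. The `message` type, its fields, and `buildReply` are hypothetical and much simpler than the real message structures; the point is only that the reply carries a copy of the request's topics.

```go
package main

import "fmt"

// message is a hypothetical, reduced message shape for illustration.
type message struct {
	ID        string
	Topics    []string
	InReplyTo string // hypothetical link back to the request
}

// buildReply creates a reply whose topics match the request's topics.
// The slice is copied so later changes to the request do not leak into the reply.
func buildReply(request message, replyID string) message {
	topics := make([]string, len(request.Topics))
	copy(topics, request.Topics)
	return message{ID: replyID, Topics: topics, InReplyTo: request.ID}
}

func main() {
	request := message{ID: "req1", Topics: []string{"topic1", "topic2"}}
	reply := buildReply(request, "reply1")
	fmt.Println(reply.Topics, reply.InReplyTo)
}
```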

codecov-commenter · 2022-02-06T00:56:01Z

Codecov Report

Merging #483 (8b73337) into main (aeff8a8) will not change coverage.
The diff coverage is 100.00%.

@@            Coverage Diff            @@
##              main      #483   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files          266       267    +1     
  Lines        15147     15239   +92     
=========================================
+ Hits         15147     15239   +92

| Impacted Files | Coverage Δ |
|---|---|
| internal/database/sqlcommon/pin_sql.go | `100.00% <100.00%> (ø)` |
| internal/events/aggregator.go | `100.00% <100.00%> (ø)` |
| internal/events/aggregator_batch_state.go | `100.00% <100.00%> (ø)` |
| internal/events/webhooks/webhooks.go | `100.00% <100.00%> (ø)` |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

awrichar · 2022-02-07T16:20:31Z

internal/database/sqlcommon/pin_sql.go

+		return err
+	}
+
+	_, err = s.updateTx(ctx, tx, query, nil /* no change events filter based update */)


So this type does reliably emit ChangeEventTypeCreated. It theoretically emits ChangeEventTypeDeleted, although I don't know of anywhere we call DeletePin. It now never emits ChangeEventTypeUpdated though.

I wonder if having change events at all is useful, or if it's confusing to emit them inconsistently.

This is common across the codebase for these update-multiple style actions.

We don't have a need for the events where we use them currently, and it's hard in SQL without detailed DB-specific coding to find out which rows have been updated. There's also a performance overhead.

awrichar · 2022-02-07T16:23:55Z

internal/events/aggregator_batch_state.go

+}
+
+// batchState are synchronous actions to be performed while processing system messages, but which must happen after reading the whole batch
+type batchState struct {


The description here doesn't fully cover the expanded purpose of this type

👍 - I've added a bunch of extra text here - might be good for you to validate my understanding @awrichar

Looks very good, thanks 🤩

awrichar · 2022-02-07T16:57:05Z

test/e2e/onchain_offchain_test.go

-	val2 := validateReceivedMessages(suite.testState, suite.testState.client2, fftypes.MessageTypeBroadcast, fftypes.TransactionTypeBatchPin, 1, 0)
-	assert.Equal(suite.T(), data.Value, val2.Value)
+	for i := 0; i < totalMessages; i++ {
+		// Wait for all the message-confirmed events, from both participants


I feel like there's a slight step backward in going from waitForMessageConfirmed to simply counting websocket events - but it can be a topic for future enhancement if needed.

Not an intentional change - I'll fix that

awrichar · 2022-02-07T16:58:55Z

test/e2e/tokens_test.go

 	assert.Equal(suite.T(), fftypes.TokenTransferTypeTransfer, transfers[0].Type)
 	assert.Equal(suite.T(), int64(1), transfers[0].Amount.Int().Int64())
-	data := GetDataForMessage(suite.T(), suite.testState.client1, suite.testState.startTime, transfers[0].MessageHash)
+	data := GetDataByMessageHash(suite.T(), suite.testState.client1, suite.testState.startTime, transfers[0].MessageHash)


NB: We could actually do GetDataForMessage here, as we do have the ID in transfers[0].Message. When the test was originally written, the hash was the only thing recorded, so we had to do the lookup by hash - but now looking up by ID is probably better.

awrichar

I like where this has landed. The pin logic is still pretty complex, but as far as I can tell it looks right - and I think it's slightly more efficient now? Thanks for adding E2E coverage as well.

No major notes, so marking approved and we can always circle back on anything that doesn't need to be addressed immediately.

Update aggregator batch processing to maintain in-memory pin state until OnFinalize

72e2ede

Signed-off-by: Peter Broadhurst <peter.broadhurst@kaleido.io>

peterbroadhurst added 4 commits February 6, 2022 12:28

Round out UT coverage

2be9661

Signed-off-by: Peter Broadhurst <peter.broadhurst@kaleido.io>

Add batch index and correct filters on pins

9e67065

Signed-off-by: Peter Broadhurst <peter.broadhurst@kaleido.io>

Update E2E test to send multiple messages over different topics for broadcast and private

65bfbfe

Signed-off-by: Peter Broadhurst <peter.broadhurst@kaleido.io>

Merge with main

651ed24

Signed-off-by: Peter Broadhurst <peter.broadhurst@kaleido.io>

peterbroadhurst marked this pull request as ready for review February 6, 2022 21:26

peterbroadhurst requested review from awrichar and nguyer as code owners February 6, 2022 21:26

peterbroadhurst added 2 commits February 6, 2022 16:30

Reduce overhead of batch indexing on write

069547d

Signed-off-by: Peter Broadhurst <peter.broadhurst@kaleido.io>

Merge branch 'main' of github.com:hyperledger/firefly into fix-481

89428af

awrichar reviewed Feb 7, 2022

awrichar reviewed Feb 7, 2022

awrichar approved these changes Feb 7, 2022

peterbroadhurst added 2 commits February 7, 2022 14:18

Add commentary and change scope of NextPinState

0223291

Signed-off-by: Peter Broadhurst <peter.broadhurst@kaleido.io>

Tweaks to tests

8b73337

Signed-off-by: Peter Broadhurst <peter.broadhurst@kaleido.io>

awrichar approved these changes Feb 7, 2022

peterbroadhurst merged commit 0c7b08b into hyperledger:main Feb 7, 2022

peterbroadhurst deleted the fix-481 branch February 7, 2022 19:39

peterbroadhurst mentioned this pull request Feb 8, 2022

Fix batch pin index calculation logic and improve logging #499

Closed
