Assign nonces more efficiently, with minimal DB ops #650

peterbroadhurst · 2022-03-30T17:23:39Z

This PR addresses two problems:

1. Currently, when we flush a batch, we perform a separate DB read+update for each nonce assignment. This is very inefficient when a batch contains a large number of messages for the same topic+group combination. Instead, we can keep track of the nonce increments in memory and flush them once at the end of the dispatch.
2. While investigating (1), I found that we were not including the author of the message when calculating which nonce to assign from the DB. This is a bug. For example, if you sent a message on topicA as author bob in a group containing both bob and sally, and then sent another message on the same topicA from the same node as sally, sally would get the wrong nonce.
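To illustrate problem (2), here is a minimal Go sketch of a lookup key that does or does not include the author. The function name `contextKey`, the `withAuthor` flag, and the use of plain strings for the group are assumptions made for illustration; they are not the code in this PR.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// contextKey hashes the fields that select a nonce sequence.
// Hypothetical helper: withAuthor=false mimics the buggy behaviour
// described above, where the author was left out of the key.
func contextKey(topic, group, author string, withAuthor bool) string {
	h := sha256.New()
	h.Write([]byte(topic))
	h.Write([]byte(group))
	if withAuthor {
		h.Write([]byte(author))
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	// Without the author, bob and sally share one key, so one counter.
	fmt.Println(contextKey("topicA", "group1", "bob", false) ==
		contextKey("topicA", "group1", "sally", false))

	// With the author, each gets an independent key and counter.
	fmt.Println(contextKey("topicA", "group1", "bob", true) ==
		contextKey("topicA", "group1", "sally", true))
}
```

With the author left out, both senders resolve to the same key and therefore draw from the same nonce counter, which is the collision described in (2).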

As per the comments, some care has been taken to ensure we treat the allocation of nonces seriously. Two key scenarios:

1. Double-assigning a nonce to a message: if we crashed after allocating a nonce while sending a batch, and after restart included the same messages in a new batch, we need to re-use the same nonce rather than allocate a new one.
2. Failing to assign a nonce due to a DB retry: if we spun round the sealBatch logic once, then got a DB error and retried, the nonces we notionally assigned to the messages would not have been flushed to disk (we flush the nonces to the DB right at the end of this logic, and all DB plugins today have atomic TXs). This means that when we retry, we need to do the allocation again.

These two scenarios are why we check that the msg.Pins array hasn't already been assigned, and only assign it after we've exited the retry loop.
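The approach can be sketched roughly as follows. This is a simplified model, not the code in `batch_processor.go`: the `message` and `fakeDB` types, the `allocate` helper, the string pin format, and the choice to store the next nonce per context are all assumptions made for illustration. Only the name `sealBatch` and the `Pins` field come from the description above.

```go
package main

import (
	"errors"
	"fmt"
)

// message is a simplified, hypothetical stand-in for a FireFly message.
type message struct {
	Context string   // stands in for hash(topic + group + author)
	Pins    []string // empty until a nonce has been assigned
}

// fakeDB is a hypothetical in-memory nonce table.
// Assumption: it stores the NEXT nonce to hand out for each context.
type fakeDB struct {
	nonces map[string]int64
	reads  int
	writes int
}

// allocate runs one attempt. It reads each context at most once, counts
// increments in memory, and writes each context once at the end.
// It never mutates the messages, so a failed attempt leaves no trace.
func allocate(db *fakeDB, msgs []*message, fail bool) (map[*message]string, error) {
	next := map[string]int64{}
	pending := map[*message]string{}
	for _, m := range msgs {
		if len(m.Pins) > 0 {
			continue // nonce already assigned in an earlier dispatch: re-use it
		}
		n, cached := next[m.Context]
		if !cached {
			db.reads++
			n = db.nonces[m.Context] // zero if this context is new
		}
		pending[m] = fmt.Sprintf("%s:%d", m.Context, n)
		next[m.Context] = n + 1
	}
	if fail {
		// Models an atomic TX that rolls back before the flush.
		return nil, errors.New("simulated DB error before commit")
	}
	for ctx, n := range next {
		db.writes++
		db.nonces[ctx] = n
	}
	return pending, nil
}

// sealBatch retries allocate until it succeeds, and only then sets Pins.
func sealBatch(db *fakeDB, msgs []*message, failFirst bool) {
	var pending map[*message]string
	for attempt := 0; ; attempt++ {
		p, err := allocate(db, msgs, failFirst && attempt == 0)
		if err == nil {
			pending = p
			break
		}
	}
	for m, pin := range pending {
		m.Pins = []string{pin}
	}
}

func main() {
	db := &fakeDB{nonces: map[string]int64{}}
	msgs := []*message{{Context: "A"}, {Context: "A"}, {Context: "B"}}
	sealBatch(db, msgs, true)  // first attempt fails, second succeeds
	sealBatch(db, msgs, false) // re-dispatch: nothing is reallocated
	for _, m := range msgs {
		fmt.Println(m.Pins[0])
	}
	fmt.Println("writes:", db.writes)
}
```

Because the messages are only mutated after the retry loop exits, a failed attempt repeats the allocation from the same DB state, and a message that already carries pins is skipped on any later dispatch.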

codecov-commenter · 2022-03-30T17:35:09Z

Codecov Report

Merging #650 (4f7ca1a) into main (a70dfbd) will not change coverage.
The diff coverage is 100.00%.

@@            Coverage Diff            @@
##              main      #650   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files          313       313           
  Lines        18975     18991   +16     
=========================================
+ Hits         18975     18991   +16

Impacted Files	Coverage Δ
internal/batch/batch_processor.go	`100.00% <100.00%> (ø)`
internal/database/sqlcommon/nonce_sql.go	`100.00% <100.00%> (ø)`

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update a70dfbd...4f7ca1a.

Signed-off-by: Peter Broadhurst <peter.broadhurst@kaleido.io>


awrichar · 2022-03-30T19:10:14Z

internal/batch/batch_processor.go

 	nonceBytes := make([]byte, 8)
-	binary.BigEndian.PutUint64(nonceBytes, uint64(gc.Nonce))
+	binary.BigEndian.PutUint64(nonceBytes, uint64(nonce))
 	hashBuilder.Write(nonceBytes)

 	pin := fftypes.HashResult(hashBuilder)


Isn't this recomputing the same value from line 455?

Line 455 is the state of the hash after we've written:

- The topic
- The group
- The author

We use this un-nonce'd hash as the lookup key into the database to find which nonce we last used for this combination.

Then, at line 463 here, we encode the nonce (which we've just determined from the lookup above) as an 8-byte big-endian value and append it to the data being hashed.

Example here:

firefly/internal/batch/batch_manager_test.go

Line 189 in 25a2b21

"746f70696331" + "44dc0861e69d9bab17dd5e90a8898c2ea156ad04e5fabf83119cc010486e6c1b" + "6469643a66697265666c793a6f72672f61626364" + "0000000000003039",


peterbroadhurst requested review from nguyer and awrichar as code owners March 30, 2022 17:23

Assign nonces more efficiently, with minimal DB ops

dec565f

Signed-off-by: Peter Broadhurst <peter.broadhurst@kaleido.io>

peterbroadhurst force-pushed the nonce++ branch from 4f7ca1a to dec565f Compare March 30, 2022 18:48

awrichar reviewed Mar 30, 2022

db/migrations/postgres/000078_add_nonce_author.down.sql (outdated, resolved)

awrichar reviewed Mar 30, 2022

internal/batch/batch_processor.go (resolved)

peterbroadhurst added 2 commits March 30, 2022 15:03

Add comment

d320006

Signed-off-by: Peter Broadhurst <peter.broadhurst@kaleido.io>

Remove cruft from migration

25a2b21

Signed-off-by: Peter Broadhurst <peter.broadhurst@kaleido.io>

awrichar reviewed Mar 30, 2022

internal/batch/batch_processor.go (resolved)

awrichar approved these changes Mar 30, 2022

Split flushNonceState to a separate call in sealBatch

2cf8d16

Signed-off-by: Peter Broadhurst <peter.broadhurst@kaleido.io>

peterbroadhurst merged commit b7937f7 into hyperledger:main Mar 30, 2022

peterbroadhurst deleted the nonce++ branch March 30, 2022 20:13

peterbroadhurst mentioned this pull request Mar 30, 2022

Cannot assume zero as reason to insert vs. update #652

Merged
