Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Assign nonces more efficiently, with minimal DB ops #650

Merged
merged 4 commits into from
Mar 30, 2022

Conversation

peterbroadhurst
Copy link
Contributor

This PR addresses two problems:

  1. Currently when we are flushing a batch, we perform a DB read+update individually. This is very inefficient when you have a large number of messages for the same topic+group combination in the same batch. Instead we can keep a track in memory of the nonce increments, and flush them one time at the end of the dispatch.

  2. I found while investigating (1) that we were not including the author of the message in the calculation of which Nonce we assigned from the DB. This is a bug. e.g. if you sent a message on topicA in a group with both bob and sally using author bob, then sent another message from the same node as sally on the same topicA - then sally would get the wrong nonce.

As per the comments, some consideration has been made to ensure we take allocation of nonces seriously. Two key scenarios:

  • Double assigning a nonce to a message: If we crashed after allocating a nonce in sending a batch, then after restart included the same messages back into a new batch then we need to re-use the same nonce rather than allocating a new one.
  • Failing to assign a nonce due to DB retry: If we span round the sealBatch logic once, then got a DB error and retried, the nonces we notionally assigned to the messages would not have been flushed to disk (we flush the nonces to the DB right at the end of this logic, and all DB plugins today have atomic TXs). This means when we retry we need to do the allocation again.

The above scenarios is why we check that the msg.Pins array hasn't been assigned, and only assign it after we've exited the retry loop.

@codecov-commenter
Copy link

Codecov Report

Merging #650 (4f7ca1a) into main (a70dfbd) will not change coverage.
The diff coverage is 100.00%.

@@            Coverage Diff            @@
##              main      #650   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files          313       313           
  Lines        18975     18991   +16     
=========================================
+ Hits         18975     18991   +16     
Impacted Files Coverage Δ
internal/batch/batch_processor.go 100.00% <100.00%> (ø)
internal/database/sqlcommon/nonce_sql.go 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a70dfbd...4f7ca1a. Read the comment docs.

Signed-off-by: Peter Broadhurst <peter.broadhurst@kaleido.io>
Signed-off-by: Peter Broadhurst <peter.broadhurst@kaleido.io>
Signed-off-by: Peter Broadhurst <peter.broadhurst@kaleido.io>
nonceBytes := make([]byte, 8)
binary.BigEndian.PutUint64(nonceBytes, uint64(gc.Nonce))
binary.BigEndian.PutUint64(nonceBytes, uint64(nonce))
hashBuilder.Write(nonceBytes)

pin := fftypes.HashResult(hashBuilder)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this recomputing the same value from line 455?

Copy link
Contributor Author

@peterbroadhurst peterbroadhurst Mar 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line 455 is the state of the hash after we've written:

  • The topic
  • The group
  • The author

We use this un-nonce'd hash as the lookup key into the database to find what nonce we last used for this combination.

Then line 463 here we are generating a big endian 8byte hex value for the nonce (which we've just determined from the above) and adding it to the end of the hash.

Example here:

"746f70696331" + "44dc0861e69d9bab17dd5e90a8898c2ea156ad04e5fabf83119cc010486e6c1b" + "6469643a66697265666c793a6f72672f61626364" + "0000000000003039",

Signed-off-by: Peter Broadhurst <peter.broadhurst@kaleido.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants