Conversation

Contributor
@peterbroadhurst commented Mar 4, 2022

Resolves #506

A few notes on the implementation.

Improvements related to performance:

  • The hash on a batch is now just a hash of the manifest, rather than the full payload (see the sketch after this list)
    • tx was added to the manifest to include in the hash
  • The database object for a batch is now the manifest
  • The manifest has been updated to include everything the batch aggregator needs to find pins
    • A count of the topics was needed for this
  • We now have a cache for messages + all data associated with a message
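
As an illustration of the manifest-hashing point in the first bullet, here is a minimal sketch; the struct shapes and field names are assumptions for illustration, not the actual FireFly types:

```go
package main

import (
	"crypto/sha256"
	"encoding/json"
	"fmt"
)

// Illustrative manifest shape only - the real manifest carries more detail.
type ManifestEntry struct {
	ID   string `json:"id"`
	Hash string `json:"hash"`
}

type BatchManifest struct {
	Version  int             `json:"version"` // distinguishes old/new persisted formats
	TX       string          `json:"tx"`      // tx is in the manifest so it is covered by the hash
	Messages []ManifestEntry `json:"messages"`
	Data     []ManifestEntry `json:"data"`
}

// The batch hash is computed over the serialized manifest, not the full payload.
func manifestHash(m *BatchManifest) (string, error) {
	b, err := json.Marshal(m)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("%x", sha256.Sum256(b)), nil
}

func main() {
	m := &BatchManifest{Version: 1, TX: "tx-1", Messages: []ManifestEntry{{ID: "msg-1", Hash: "..."}}}
	h, _ := manifestHash(m)
	fmt.Println(h)
}
```

The hash then covers only the compact manifest (with tx included), rather than the full batch payload.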

Improvements related to debug:

  • Added to-string helpers to definition batch actions, and log the results
  • Added a GET /status/pins collection to peek inside the pins status (see the example after this list)
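
As referenced above, a quick way to peek at the new collection from a Go client; the local address and /api/v1 prefix are assumptions about a default single-node setup, not something this change mandates:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
)

func main() {
	// Assumed local FireFly API listener and prefix - adjust for your deployment.
	resp, err := http.Get("http://localhost:5000/api/v1/status/pins")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	fmt.Println(string(body)) // JSON listing of pins, intended for problem diagnosis
}
```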

Migration:

  • The code copes with a persisted batch of the old type stored in the DB
    • A version in the manifest is used to distinguish this, and to provide future extensibility
  • The code copes with processing a batch that has a payload hash, rather than a manifest hash (see the sketch after this list)
    • This handles late-join/re-sync to a network that is processing old broadcasts
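
A minimal sketch of the hash fallback referenced in the list above; the function and variable names are illustrative, not the actual FireFly aggregator code:

```go
package main

import "fmt"

// Batches pinned before this change carry a hash of the full payload; new batches
// carry a hash of the manifest. Accepting either lets a node that late-joins or
// re-syncs a network still process old broadcasts.
func batchHashMatches(onChainHash, manifestHash, legacyPayloadHash string) bool {
	if onChainHash == manifestHash {
		return true // new-style batch: hash of the manifest
	}
	return onChainHash == legacyPayloadHash // old-style batch: hash of the full payload
}

func main() {
	fmt.Println(batchHashMatches("abc", "abc", "def")) // true - manifest hash matches
	fmt.Println(batchHashMatches("def", "abc", "def")) // true - legacy payload hash matches
}
```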

Potential follow-on work:

Message/data cache implementation notes

Messages have fields that are mutable, in two categories:

  1. Fields that can change multiple times, like state - you cannot rely on the cache for these
  2. Fields that go from being un-set to being set, and once set are immutable

For (2) the cache provides a set of CacheReadOption modifiers that make it safe to query the cache, even if the cache were slow to update asynchronously (an active/active cluster being the ultimate example here, but from code inspection this is possible in the current cache).

If you use CRORequestBatchID, then the cache will return a miss if there is no BatchID set.

If you use CRORequirePins, then the cache will return a miss if the number of pins does not match the number of topics in the message.
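
To make those miss conditions concrete, here is a minimal sketch of how such read options could behave; the Message shape and the satisfies helper are assumptions for illustration, not the actual FireFly cache API:

```go
package main

import "fmt"

// Illustrative message shape - only the fields the read options care about.
type Message struct {
	BatchID string
	Topics  []string
	Pins    []string
}

type CacheReadOption int

const (
	CRORequestBatchID CacheReadOption = iota // miss unless the BatchID has been set
	CRORequirePins                           // miss unless every topic has a pin
)

// satisfies returns false (a cache miss) if the cached copy might be behind the
// database for a field the caller needs; the caller then falls back to a DB read.
func satisfies(msg *Message, opts ...CacheReadOption) bool {
	for _, opt := range opts {
		switch opt {
		case CRORequestBatchID:
			if msg.BatchID == "" {
				return false
			}
		case CRORequirePins:
			if len(msg.Pins) != len(msg.Topics) {
				return false
			}
		}
	}
	return true
}

func main() {
	msg := &Message{Topics: []string{"t1"}}
	fmt.Println(satisfies(msg, CRORequirePins)) // false - pins not yet assigned, read from the DB
}
```

Because these fields, once set, are immutable, a cached copy that already satisfies the requested options can be trusted even if the cache is updated asynchronously; anything else simply falls back to the database.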

return nil
}
// For broadcast data the blob reference contains the "public" (shared storage) reference, which
// must have been allocated to this data item before sealing the batch.
Contributor
@awrichar Mar 4, 2022


This is the operative decision for #568 - I'm still wondering if it's possible for the public reference to be removed from the batch hash altogether. Perhaps it could be stored on the Blob instead of on the BlobRef?

Contributor


It's mainly odd that private blob transfer happens after batch sealing, but public blob transfer must happen before batch sealing in order to include the IPFS ref. Seems like they should happen in the same order regardless.

When a node receives an IPFS ref, it does do some checking of the fetched contents to verify they match the blob hash before recording the blob as received. The question is whether this is "good enough" to say we can send IPFS refs without including them in the batch's hash proof.
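
As a rough illustration of the check described here (the names and hashing details are assumptions, not FireFly's actual download path): the fetched shared-storage content is hashed and compared to the expected blob hash before the blob is recorded as received.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
	"io"
)

// verifyFetchedBlob hashes content fetched from shared storage (e.g. via an IPFS ref)
// and checks it against the expected blob hash before recording the blob as received.
func verifyFetchedBlob(fetched io.Reader, expectedHash string) (bool, error) {
	h := sha256.New()
	if _, err := io.Copy(h, fetched); err != nil {
		return false, err
	}
	return fmt.Sprintf("%x", h.Sum(nil)) == expectedHash, nil
}

func main() {
	content := []byte("example payload")
	expected := fmt.Sprintf("%x", sha256.Sum256(content))
	ok, _ := verifyFetchedBlob(bytes.NewReader(content), expected)
	fmt.Println(ok) // true - contents match the expected blob hash
}
```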

Contributor Author
@peterbroadhurst Mar 8, 2022


> Seems like they should happen in the same order regardless.

While I agree the discrepancy is annoying - I do not think this is possible, as IPFS does not have a "messaging" capability. It's just a storage system. So the blockchain has to be the messaging system in this case.

@codecov-commenter commented Mar 8, 2022

Codecov Report

Merging #582 (8952a4f) into main (e7c080f) will not change coverage.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##              main      #582    +/-   ##
==========================================
  Coverage   100.00%   100.00%            
==========================================
  Files          304       304            
  Lines        17544     17801   +257     
==========================================
+ Hits         17544     17801   +257     
Impacted Files Coverage Δ
internal/orchestrator/orchestrator.go 100.00% <ø> (ø)
internal/apiserver/route_get_batch_by_id.go 100.00% <100.00%> (ø)
internal/apiserver/route_get_batches.go 100.00% <100.00%> (ø)
internal/apiserver/route_get_data.go 100.00% <100.00%> (ø)
internal/apiserver/route_get_msg_data.go 100.00% <100.00%> (ø)
internal/apiserver/route_get_status_pins.go 100.00% <100.00%> (ø)
internal/batch/batch_manager.go 100.00% <100.00%> (ø)
internal/batch/batch_processor.go 100.00% <100.00%> (ø)
internal/batchpin/batchpin.go 100.00% <100.00%> (ø)
internal/batchpin/operations.go 100.00% <100.00%> (ø)
... and 37 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e7c080f...8952a4f.


var getStatusPins = &oapispec.Route{
Name: "getStatusPins",
Path: "status/pins",
Contributor


Why are these under status? Doesn't feel entirely like a "status" object to me since it's just a collection listing... I guess the alternative would be a root endpoint though. Open to anything really, just wanted to call it out.

Contributor Author


We could put them at the root, if we wanted to explain them in more detail as a first-class object.
My thinking here was that they are an internal read-only view of the state, for problem diagnosis. I understand it's not perfect, and I'm happy to discuss more.

Contributor Author


I'll be merging this through the chain as-is here, but with an openness to moving it in a future commit.

msgIDs[i] = msg.Header.ID
// We don't want to have to read the DB again if we want to query for the batch ID, or pins,
// so ensure the copy in our cache gets updated.
bp.data.UpdateMessageIfCached(ctx, msg)
Contributor
@awrichar Mar 10, 2022


Want to make sure it's OK that we update the batchID in the cache before we write the batchID to the database itself (i.e. in case we somehow fail to update the database)... I think it's OK, and we would come back around and try to write the same batchID on the second try. Just wanted to put a note here because I was staring at this for a bit.

Contributor Author


Yep, the only way to avoid this would be to push all of the cache updating (in all cases) to post-commit actions.
I convinced myself that wasn't required, but happy to discuss more.

for di, dataRef := range msg.Data {
msgData[di] = dataByID[*dataRef.ID]
if msgData[di] == nil || !msgData[di].Hash.Equals(dataRef.Hash) {
log.L(ctx).Debugf("Message '%s' in batch '%s' - data not in-line in batch id='%s' hash='%s'", msg.Header.ID, batch.ID, dataRef.ID, dataRef.Hash)
Contributor


Should this be higher than debug? Is this an expected situation?

Contributor Author


The architecture prior to this code change allows it.
For example, you could send some data to a party, then send a message referring to that data, without sending that data again.

Contributor Author


A specific example would be a broadcast, followed by a private message.

fftypes.OpTypeDataExchangeBatchSend)
op.Input = fftypes.JSONObject{
"batch": tw.Batch.ID,
}
Contributor


This is going to be overwritten by addBatchSendInputs below

Contributor Author


Sorry, I think this was a merge error.

Contributor
@awrichar left a comment


Looks good - marking approved, although I did leave a few comments inline that are worth a look before you decide to merge.

@peterbroadhurst merged commit be62be6 into hyperledger:main Mar 11, 2022
@peterbroadhurst deleted the batch-upgrade branch March 11, 2022 04:05

