Add Operations Manager, wrap all operations, and add retry functionality #517

awrichar · 2022-02-14T17:57:04Z

First step toward hyperledger/firefly-fir#10

internal/operations/manager.go

codecov-commenter · 2022-02-14T18:02:06Z

Codecov Report

Merging #517 (3369cea) into main (82b8d54) will not change coverage.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##              main      #517    +/-   ##
==========================================
  Coverage   100.00%   100.00%            
==========================================
  Files          295       303     +8     
  Lines        17000     17386   +386     
==========================================
+ Hits         17000     17386   +386

Impacted Files	Coverage Δ
internal/config/config.go	`100.00% <ø> (ø)`
internal/txcommon/txcommon.go	`100.00% <ø> (ø)`
pkg/fftypes/operation.go	`100.00% <ø> (ø)`
internal/apiserver/route_post_op_retry.go	`100.00% <100.00%> (ø)`
internal/assets/manager.go	`100.00% <100.00%> (ø)`
internal/assets/operations.go	`100.00% <100.00%> (ø)`
internal/assets/token_approval.go	`100.00% <100.00%> (ø)`
internal/assets/token_pool.go	`100.00% <100.00%> (ø)`
internal/assets/token_transfer.go	`100.00% <100.00%> (ø)`
internal/batch/batch_processor.go	`100.00% <100.00%> (ø)`
... and 21 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 82b8d54...3369cea.

Signed-off-by: Andrew Richardson <andrew.richardson@kaleido.io>

Other managers can register to handle particular operation types. This keeps the logic for each type concentrated in the owning manager instead of giving too much specialized knowledge to the Operations Manager. Also introduce a serializable PreparedOperation type for wrapping operations before they are sent off to plugins. This makes for a neater split between parsing and running operations, and may also be useful for tests/debugging later.

Signed-off-by: Andrew Richardson <andrew.richardson@kaleido.io>
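To make the registration pattern above concrete, here is a minimal sketch. It is not the code from this PR: `OperationHandler`, `RegisterHandler`, `RunOperation`, the `Data` field and the `token_transfer` type string are simplified, hypothetical stand-ins, and the `context.Context` parameter the real code passes around is left out for brevity.

```go
package main

import "fmt"

// OpType and PreparedOperation are simplified stand-ins for the types
// described in the commit message; the fields here are assumptions.
type OpType string

type PreparedOperation struct {
	ID   string
	Type OpType
	Data map[string]string // hypothetical, already-parsed payload
}

// OperationHandler is a hypothetical interface that an owning manager
// (assets, private messaging, ...) would implement.
type OperationHandler interface {
	Name() string
	RunOperation(op *PreparedOperation) error
}

// operationsManager only knows which handler owns which type.
type operationsManager struct {
	handlers map[OpType]OperationHandler
}

func newOperationsManager() *operationsManager {
	return &operationsManager{handlers: make(map[OpType]OperationHandler)}
}

// RegisterHandler records the handler for each of the given types.
func (om *operationsManager) RegisterHandler(h OperationHandler, types []OpType) {
	for _, t := range types {
		om.handlers[t] = h
	}
}

// RunOperation dispatches to the registered handler, or fails if none exists.
func (om *operationsManager) RunOperation(op *PreparedOperation) error {
	h, ok := om.handlers[op.Type]
	if !ok {
		return fmt.Errorf("no handler registered for operation type %q", op.Type)
	}
	return h.RunOperation(op)
}

// assetHandler is a toy handler that records the IDs it was asked to run.
type assetHandler struct{ ran []string }

func (a *assetHandler) Name() string { return "AssetManager" }

func (a *assetHandler) RunOperation(op *PreparedOperation) error {
	a.ran = append(a.ran, op.ID)
	return nil
}

func main() {
	om := newOperationsManager()
	assets := &assetHandler{}
	om.RegisterHandler(assets, []OpType{"token_transfer"})
	err := om.RunOperation(&PreparedOperation{ID: "op1", Type: "token_transfer"})
	fmt.Println(err, assets.ran)
}
```

The point of the split is that the Operations Manager never inspects `Data`; only the owning handler does.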

Signed-off-by: Andrew Richardson <andrew.richardson@kaleido.io>

Retrying an operation that has already been retried will cause it to look up the newest copy of the operation, and retry that one. In this way, retries will always form a single chain, and attempting to re-run any of them will always add a new one to the end of the chain.

Signed-off-by: Andrew Richardson <andrew.richardson@kaleido.io>
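A rough sketch of the single-chain behaviour described above, assuming each operation stores the ID of the retry that superseded it (in the spirit of the `retry_id` column). The `store` type, the `Retry` field and both function names are illustrative, not the PR's actual API.

```go
package main

import "fmt"

// Operation is a simplified stand-in. Retry holds the ID of the newer
// operation that superseded this one, or "" if it is the newest copy.
type Operation struct {
	ID    string
	Retry string
}

// store is a hypothetical in-memory replacement for the database.
type store map[string]*Operation

// findLatestRetry follows Retry links until it reaches the end of the chain.
func (s store) findLatestRetry(id string) (*Operation, error) {
	op, ok := s[id]
	if !ok {
		return nil, fmt.Errorf("operation %s not found", id)
	}
	for op.Retry != "" {
		next, ok := s[op.Retry]
		if !ok {
			return nil, fmt.Errorf("operation %s not found", op.Retry)
		}
		op = next
	}
	return op, nil
}

// retry always appends the new operation after the newest copy, no matter
// which member of the chain the caller asked to retry.
func (s store) retry(id, newID string) (*Operation, error) {
	if _, exists := s[newID]; exists {
		return nil, fmt.Errorf("operation %s already exists", newID)
	}
	latest, err := s.findLatestRetry(id)
	if err != nil {
		return nil, err
	}
	newOp := &Operation{ID: newID}
	s[newID] = newOp
	latest.Retry = newID
	return newOp, nil
}

func main() {
	s := store{"a": {ID: "a"}}
	if _, err := s.retry("a", "b"); err != nil {
		fmt.Println(err)
	}
	// Retrying "a" again does not fork the chain: "c" is linked after "b".
	if _, err := s.retry("a", "c"); err != nil {
		fmt.Println(err)
	}
	latest, _ := s.findLatestRetry("a")
	fmt.Println(latest.ID)
}
```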

awrichar · 2022-02-17T21:20:43Z

I've continued working this so it now encompasses all operations and actually adds the /retry route for operations. Unit tests and basic manual verification look good. This now fully implements hyperledger/firefly-fir#10, except for the unresolved questions listed there and the E2E tests.

peterbroadhurst

Really think this worked out well. A very clear template for what an operation means in the codebase, that's easy to extend 👍

Some minor comments to consider @awrichar

peterbroadhurst · 2022-02-28T15:08:53Z

db/migrations/postgres/000061_add_operation_retry.down.sql

@@ -0,0 +1,3 @@
+BEGIN;
+ALTER TABLE operations DROP COLUMN retry_id;


Just a note that you'll need to get a different slot (think currently 70 is next after PRs in the pipe)

➡️ ➡️ ➡️ ➡️ ➡️

internal/contracts/operations.go

internal/operations/manager.go

peterbroadhurst · 2022-02-28T15:32:29Z

internal/operations/manager.go

+
+func (om *operationsManager) writeOperationSuccess(ctx context.Context, opID *fftypes.UUID) {
+	if err := om.database.ResolveOperation(ctx, opID, fftypes.OpStatusSucceeded, "", nil); err != nil {
+		log.L(ctx).Errorf("Failed to update operation %s: %s", opID, err)


Think it would be good to write the full data of the operation here - particularly the outputs - as they would have been lost.

We don't have any path that writes operation outputs here. I'm not sure there's any more data to capture beyond what is in the log. Need to think further to be sure there's no time we should be capturing outputs here...

Ok - I understand now the ResolveOperation call is only modifying one field on the operation, it's not the call that does the full update with the outputs etc.

peterbroadhurst · 2022-02-28T15:32:35Z

internal/operations/manager.go

+
+func (om *operationsManager) writeOperationFailure(ctx context.Context, opID *fftypes.UUID, err error) {
+	if err := om.database.ResolveOperation(ctx, opID, fftypes.OpStatusFailed, err.Error(), nil); err != nil {
+		log.L(ctx).Errorf("Failed to update operation %s: %s", opID, err)


As above: Think it would be good to write the full data of the operation here - particularly the outputs - as they would have been lost.

Same as above - the only "outputs" from a synchronous failure are in the form of a Go error which is logged here. I'm not sure there's anything else useful to log... unless it's a sign that we're not returning enough info from some other layer.

internal/privatemessaging/operations.go

peterbroadhurst · 2022-02-28T15:42:44Z

pkg/database/plugin.go

 	InsertOperation(ctx context.Context, operation *fftypes.Operation) (err error)

 	// ResolveOperation - Resolve operation upon completion
 	ResolveOperation(ctx context.Context, id *fftypes.UUID, status fftypes.OpStatus, errorMsg string, output fftypes.JSONObject) (err error)


Now we have a generic UpdateOperation - I wonder if it would be more consistent to remove the ResolveOperation from the DB layer as that's just a thin wrapper:

firefly/internal/database/sqlcommon/operation_sql.go

Lines 178 to 186 in 60e00ff

func (s *SQLCommon) ResolveOperation(ctx context.Context, id *fftypes.UUID, status fftypes.OpStatus, errorMsg string, output fftypes.JSONObject) (err error) {
	update := database.OperationQueryFactory.NewUpdate(ctx).
		Set("status", status).
		Set("error", errorMsg)
	if output != nil {
		update.Set("output", output)
	}
	return s.updateOperation(ctx, id, update)
}

Yea, was thinking the same thing. I can do this.

After looking at this again, there are a lot of places where we invoke ResolveOperation, and this change would take all of those from 1 line to 5 lines. So it would be more consistent, but more verbose as well. I slightly prefer keeping the concise helper.

Could also move the helper somewhere other than the database layer. For instance, it could be on the Operations Manager (but would then need to add a dependency from Event Manager on Operations Manager).
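For illustration only, the "keep the concise helper, but outside the database layer" option could look roughly like this. `OperationUpdater`, its `UpdateOperation` signature and the in-memory `memDB` are assumptions made for the sketch; the real plugin interface uses `fftypes` types and an update builder.

```go
package main

import "fmt"

type OpStatus string

// OperationUpdater is a hypothetical, minimal view of a database plugin
// that exposes only a generic update call.
type OperationUpdater interface {
	UpdateOperation(id string, fields map[string]interface{}) error
}

// resolveOperation keeps call sites to one line while the database layer
// only needs the generic update. Output is written only when provided.
func resolveOperation(db OperationUpdater, id string, status OpStatus, errorMsg string, output map[string]interface{}) error {
	fields := map[string]interface{}{
		"status": status,
		"error":  errorMsg,
	}
	if output != nil {
		fields["output"] = output
	}
	return db.UpdateOperation(id, fields)
}

// memDB is a toy in-memory implementation used only to exercise the helper.
type memDB map[string]map[string]interface{}

func (m memDB) UpdateOperation(id string, fields map[string]interface{}) error {
	row, ok := m[id]
	if !ok {
		return fmt.Errorf("operation %s not found", id)
	}
	for k, v := range fields {
		row[k] = v
	}
	return nil
}

func main() {
	db := memDB{"op1": {}}
	err := resolveOperation(db, "op1", "Failed", "plugin timeout", nil)
	fmt.Println(err, db["op1"])
}
```

Where the helper lives (Operations Manager or elsewhere) only changes the receiver, at the cost of the extra dependency mentioned above.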

Includes support for token approvals in Operations Manager.

Signed-off-by: Andrew Richardson <andrew.richardson@kaleido.io>

peterbroadhurst

I've marked approval @awrichar as I think you were proposing this goes in at this point, and the comments are taken as input for future work in other changes.

peterbroadhurst · 2022-03-02T20:38:24Z

internal/operations/cache.go

+		Plugin:      op.Plugin,
+		Input:       op.Input,
+	}
+	key, err := json.Marshal(opCopy)


This feels like extra evidence to the point that we should treat the inputs as references, rather than full data wherever possible. As this is going to be a large string cache key that we are marshaling.

... as an aside I still agree serialized JSON is a more efficient cache key than a SHA etc. - so the choice looks right here to me

Cool, that was my inkling as well but wanted to have someone else back me up. But agreed it's yet another reason to minimize the size of "Input" as much as is practical.
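As a sketch of the idea discussed in this thread (serialized JSON as the cache key, with the ID left out so that equivalent operations collide), something like the following would work. The struct fields, `cacheKey`, `opCache` and the sample values are hypothetical; the actual `cache.go` builds its copy from `fftypes.Operation`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Operation is a simplified stand-in for the real operation type.
type Operation struct {
	ID     string
	Type   string
	Plugin string
	Input  map[string]interface{}
}

// cacheKey serializes only the fields that describe the work to be done.
// The ID is deliberately excluded. encoding/json writes map keys in sorted
// order, so equal inputs produce equal keys.
func cacheKey(op *Operation) (string, error) {
	opCopy := struct {
		Type   string                 `json:"type"`
		Plugin string                 `json:"plugin"`
		Input  map[string]interface{} `json:"input"`
	}{op.Type, op.Plugin, op.Input}
	key, err := json.Marshal(opCopy)
	if err != nil {
		return "", err
	}
	return string(key), nil
}

// opCache is a toy cache scoped to one retry loop.
type opCache map[string]*Operation

// getOrAdd returns the previously cached equivalent operation, if any;
// otherwise it caches and returns the one passed in.
func (c opCache) getOrAdd(op *Operation) (*Operation, error) {
	key, err := cacheKey(op)
	if err != nil {
		return nil, err
	}
	if cached, ok := c[key]; ok {
		return cached, nil
	}
	c[key] = op
	return op, nil
}

func main() {
	cache := opCache{}
	first := &Operation{ID: "op1", Type: "batch_send", Plugin: "dx", Input: map[string]interface{}{"batch": "b1"}}
	again := &Operation{ID: "op2", Type: "batch_send", Plugin: "dx", Input: map[string]interface{}{"batch": "b1"}}
	a, _ := cache.getOrAdd(first)
	b, _ := cache.getOrAdd(again)
	fmt.Println(a.ID == b.ID)
}
```

The key grows with `Input`, which is the concern raised above: the smaller the input (references rather than full data), the cheaper the key.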

peterbroadhurst

Thanks for adding the caching. Like that implementation, and think it's great we'll have de-dup on retry within the batch processor.


awrichar requested review from nguyer and peterbroadhurst as code owners February 14, 2022 17:57

awrichar commented Feb 14, 2022


awrichar force-pushed the opmanager branch 3 times, most recently from 7b66d2b to 4ff5b01 Compare February 14, 2022 20:30

awrichar marked this pull request as draft February 16, 2022 15:39

awrichar force-pushed the opmanager branch from bae6926 to dcb75f3 Compare February 16, 2022 18:28

awrichar added 4 commits February 16, 2022 14:09

Add Operations Manager and use for token operations

a270297

Signed-off-by: Andrew Richardson <andrew.richardson@kaleido.io>

Add handling for all remaining operations to Operations Manager

05605e5

Signed-off-by: Andrew Richardson <andrew.richardson@kaleido.io>

Move some operation helpers from txcommon to operations

307002f

Signed-off-by: Andrew Richardson <andrew.richardson@kaleido.io>

awrichar force-pushed the opmanager branch from dcb75f3 to 307002f Compare February 16, 2022 19:10

awrichar added 2 commits February 16, 2022 15:35

Fix broken unit tests

40ebfff

Signed-off-by: Andrew Richardson <andrew.richardson@kaleido.io>

Add remaining test coverage for new Operations logic

915bf6e

Signed-off-by: Andrew Richardson <andrew.richardson@kaleido.io>

awrichar changed the title ~~Add Operations Manager and use for token operations~~ Add Operations Manager and use for all operations Feb 17, 2022

awrichar marked this pull request as ready for review February 17, 2022 01:43

awrichar added 3 commits February 16, 2022 20:48

Merge remote-tracking branch 'origin/main' into opmanager

ab1af2b

Add route for operation retry

52ced7c

Signed-off-by: Andrew Richardson <andrew.richardson@kaleido.io>

Merge remote-tracking branch 'origin/main' into opmanager

fbd717f

awrichar requested a review from nickgaski as a code owner February 17, 2022 20:47

awrichar changed the title ~~Add Operations Manager and use for all operations~~ Add Operations Manager, wrap all operations, and add retry functionality Feb 17, 2022

awrichar mentioned this pull request Feb 22, 2022

Rename Public Storage to Shared storage #538

Merged

peterbroadhurst mentioned this pull request Feb 28, 2022

Only store the batch manifest #506

Closed

peterbroadhurst requested changes Feb 28, 2022


awrichar added 2 commits March 1, 2022 17:23

Merge branch 'main' of github.com:hyperledger/firefly into opmanager

5323028

Includes support for token approvals in Operations Manager.

Cache duplicate operations when in a retry loop (such as batch dispatch)

6496d0b

Signed-off-by: Andrew Richardson <andrew.richardson@kaleido.io>

Add names to all operation handlers

b836bcd

Signed-off-by: Andrew Richardson <andrew.richardson@kaleido.io>

peterbroadhurst approved these changes Mar 2, 2022


peterbroadhurst reviewed Mar 2, 2022


peterbroadhurst approved these changes Mar 2, 2022


awrichar added 2 commits March 3, 2022 11:26

Merge branch 'shared-storage' of github.com:kaleido-io/firefly into o…

e991fa2

…pmanager

Merge branch 'main' of github.com:hyperledger/firefly into opmanager

3369cea

awrichar merged commit ea83d3a into hyperledger:main Mar 3, 2022

awrichar deleted the opmanager branch March 3, 2022 18:21



3 participants
