
[Consensus] Queue for synchronization.Engine #910

Merged
merged 20 commits into master from yurii/5615-synchronization-engine-queue on Jul 5, 2021

Conversation

@durkmurder (Member) commented Jun 29, 2021

dapperlabs/flow-go/issues/5615

Context

This PR implements support for an inbound queue for messages that are submitted by the network layer and/or other engines.
As described in the issue, this PR mainly implements message queueing logic as it is implemented in the other consensus engines (sealing, matching, compliance). Access to the final protocol state is currently quite inefficient; this PR contains changes for that as well.

In the proposed implementation we have two separate goroutines which handle requests and responses in parallel; maybe that's not needed, but I thought it's a good idea to distribute the load more evenly.
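For illustration, a minimal self-contained sketch of that pattern (the type, field, and function names below are hypothetical, not the engine's actual API): callers only enqueue and signal, while one worker goroutine drains requests and another drains responses, so the two kinds of traffic never block each other.

package synchronization

import "sync"

// engineSketch illustrates the inbound-queue pattern only; names are hypothetical.
type engineSketch struct {
	quit chan struct{}

	reqMu           sync.Mutex
	pendingRequests []interface{} // guarded by reqMu
	requestNotifier chan struct{} // buffered with capacity 1

	respMu           sync.Mutex
	pendingResponses []interface{} // guarded by respMu
	responseNotifier chan struct{} // buffered with capacity 1
}

// Process is called by the network layer or other engines; it only enqueues
// and signals, so callers never block on message processing.
func (e *engineSketch) Process(msg interface{}, isRequest bool) {
	if isRequest {
		e.reqMu.Lock()
		e.pendingRequests = append(e.pendingRequests, msg)
		e.reqMu.Unlock()
		notify(e.requestNotifier)
		return
	}
	e.respMu.Lock()
	e.pendingResponses = append(e.pendingResponses, msg)
	e.respMu.Unlock()
	notify(e.responseNotifier)
}

// notify sends a non-blocking signal; one pending signal suffices because the
// worker drains its entire queue whenever it wakes up.
func notify(ch chan struct{}) {
	select {
	case ch <- struct{}{}:
	default:
	}
}

// requestLoop runs as its own goroutine; an analogous responseLoop drains
// pendingResponses, so requests and responses are handled in parallel.
func (e *engineSketch) requestLoop(handle func(interface{})) {
	for {
		select {
		case <-e.quit:
			return
		case <-e.requestNotifier:
			e.reqMu.Lock()
			batch := e.pendingRequests
			e.pendingRequests = nil
			e.reqMu.Unlock()
			for _, msg := range batch {
				handle(msg)
			}
		}
	}
}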

Contributions of this PR

  • Implemented queueing logic for the synchronization engine
  • Changed how the synchronization engine accesses the latest finalized block
  • Optimized concurrent access to shared data
  • Extended tests
  • IMPORTANT: implemented a new way to deliver finalization events on the node level; the initialization of every node has been changed.

@zhangchiqing (Member) left a comment:

Looks good


// setupRequestMessageHandler initializes the inbound queues and the MessageHandler for UNTRUSTED requests.
func (e *Engine) setupRequestMessageHandler() {
	e.pendingSyncRequests = NewRequestQueue(defaultSyncRequestQueueCapacity)

Suggested change
-	e.pendingSyncRequests = NewRequestQueue(defaultSyncRequestQueueCapacity)
+	// RequestQueue deduplicates requests by keeping only one sync request for each requester.
+	e.pendingSyncRequests = NewRequestQueue(defaultSyncRequestQueueCapacity)
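As a side note, an illustrative sketch (not the PR's actual RequestQueue code, and assuming the engine.Message type with its OriginID field): the deduplication described in that comment falls out of keying the backing map by the requester's ID, so a newer sync request from the same node simply replaces the older one.

import (
	"github.com/onflow/flow-go/engine"
	"github.com/onflow/flow-go/model/flow"
)

// putDeduplicated is an illustrative helper: storing by OriginID means at most
// one pending sync request is kept per requester.
func putDeduplicated(requests map[flow.Identifier]*engine.Message, msg *engine.Message) {
	requests[msg.OriginID] = msg
}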

Alexander Hentschel added 2 commits July 4, 2021 21:36
• reduced locking in `synchronization.Core`
@AlexHentschel (Member) left a comment:

Comments & Suggestions:

  • It would be great to fix this performance bottleneck (see the first sketch after this list):
    block, err := e.blocks.ByHeight(height)
    • In contrast to badger.Headers, which has a cache for heights, badger.Blocks does not have such a cache and instead always hits the database.
  • I think we could inline the following method, as it is only used once:
    // processIncoming processes an incoming block, so we can take into account the
    // overlap between block IDs and heights.
    func (e *Engine) processIncomingBlock(originID flow.Identifier, block *flow.Block) {
    	shouldProcess := e.core.HandleBlock(block.Header)
    	if !shouldProcess {
    		return
    	}
    	synced := &events.SyncedBlock{
    		OriginID: originID,
    		Block:    block,
    	}
    	e.comp.SubmitLocal(synced)
    }
  • We might be able to reduce lock congestion in Core.HandleHeight (see the second sketch below):
    • We always lock right away, but on the happy path, when the node is up to date, this code would return right away:
      // don't bother queueing anything if we're within tolerance
      if c.WithinTolerance(final, height) {
      	return
      }
      WithinTolerance is fully concurrency-safe without any locks (we also call it externally). Hence, we could move the lock further down into the if statement, where we actually update Core's state.
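Regarding the first bullet above, one possible shape of the fix (a sketch only; it assumes the engine also holds a storage.Headers instance as e.headers and relies on the height cache mentioned above): resolve the height through the header storage first, then fetch the block by ID.

// blockByHeight avoids the uncached blocks.ByHeight lookup by resolving the
// height through the height-cached header storage first.
func (e *Engine) blockByHeight(height uint64) (*flow.Block, error) {
	header, err := e.headers.ByHeight(height)
	if err != nil {
		return nil, fmt.Errorf("could not get header at height %d: %w", height, err)
	}
	return e.blocks.ByID(header.ID())
}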

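And for the last bullet, a sketch of HandleHeight with the lock pushed down (Core's internal field and helper names here, such as c.mu and requeueHeight, are assumptions for illustration, not the actual implementation):

func (c *Core) HandleHeight(final *flow.Header, height uint64) {
	// don't bother queueing anything if we're within tolerance;
	// WithinTolerance is concurrency-safe, so no lock is needed on this happy path
	if c.WithinTolerance(final, height) {
		return
	}
	// only acquire the lock inside the branch that actually updates Core's state
	if height > final.Height {
		c.mu.Lock()
		defer c.mu.Unlock()
		for h := final.Height + 1; h <= height; h++ {
			c.requeueHeight(h) // hypothetical helper recording a height to request
		}
	}
}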
Implemented these suggestions ☝️ in my PR #926.

The suggestions below (code comments) are not implemented in PR 926.

Comment on lines 65 to 77

// we keep reducing the cache size until we are at limit again
for len(q.requests) >= int(q.limit) {

	// eject first element using go map properties
	var key flow.Identifier
	for originID := range q.requests {
		key = originID
		break
	}

	delete(q.requests, key)
}

I feel we could write this much more concisely:

Suggested change
-	// we keep reducing the cache size until we are at limit again
-	for len(q.requests) >= int(q.limit) {
-		// eject first element using go map properties
-		var key flow.Identifier
-		for originID := range q.requests {
-			key = originID
-			break
-		}
-		delete(q.requests, key)
-	}
+	// we keep reducing the cache size until we are at limit again
+	for overCapacity := len(q.requests) - int(q.limit); overCapacity >= 0; overCapacity-- {
+		for originID := range q.requests {
+			delete(q.requests, originID)
+			break // remove one arbitrary entry per iteration
+		}
+	}

Did I miss something?

Comment on lines 32 to 34
// first try to eject if we are at max capacity, we need to do this way
// to prevent a situation where just inserted item gets ejected
q.reduce()

Consider the following scenario:

  • the queue is at max capacity
  • it already contains an element from Origin A
  • we are attempting to put a new element from A into the queue

The result is that:

  • the old message from A is overwritten by the new one (desired),
  • but we also eject some other message (which seems not very intuitive).

Suggestion:

Suggested change
-	// first try to eject if we are at max capacity, we need to do this way
-	// to prevent a situation where just inserted item gets ejected
-	q.reduce()
+	if _, found := q.requests[message.OriginID]; !found {
+		// if no message from the origin is stored, make sure we have room to store the new message:
+		q.reduce()
+	}

filter.Not(filter.HasNodeID(e.me.NodeID())),
))
if err != nil {
	return fmt.Errorf("could not send get consensus participants: %w", err)

Suggested change
-	return fmt.Errorf("could not send get consensus participants: %w", err)
+	return fmt.Errorf("could not get consensus participants at latest finalized block: %w", err)

…queue_-_suggestions

suggestions for PR 910
@codecov-commenter commented Jul 5, 2021

Codecov Report

Merging #910 (a00f07e) into master (2b9772a) will increase coverage by 0.13%.
The diff coverage is 67.04%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #910      +/-   ##
==========================================
+ Coverage   56.56%   56.69%   +0.13%     
==========================================
  Files         424      425       +1     
  Lines       25008    25146     +138     
==========================================
+ Hits        14145    14257     +112     
- Misses       8941     8955      +14     
- Partials     1922     1934      +12     
Flag Coverage Δ
unittests 56.69% <67.04%> (+0.13%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
model/flow/aggregated_signature.go 0.00% <0.00%> (ø)
storage/badger/blocks.go 53.44% <0.00%> (-0.94%) ⬇️
engine/consensus/approvals/assignment_collector.go 64.15% <33.33%> (-0.82%) ⬇️
engine/consensus/approvals/approval_collector.go 76.05% <38.46%> (-5.49%) ⬇️
engine/common/synchronization/engine.go 65.59% <62.16%> (+9.05%) ⬆️
module/validation/seal_validator.go 76.85% <91.66%> (+6.37%) ⬆️
engine/common/synchronization/request_heap.go 100.00% <100.00%> (ø)
...ngine/consensus/approvals/aggregated_signatures.go 100.00% <100.00%> (ø)
module/synchronization/core.go 75.46% <100.00%> (ø)
storage/badger/headers.go 53.40% <100.00%> (-0.81%) ⬇️
... and 5 more

Continue to review full report at Codecov.

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d648e6a...a00f07e. Read the comment docs.

@durkmurder durkmurder merged commit 2641774 into master Jul 5, 2021
@durkmurder durkmurder deleted the yurii/5615-synchronization-engine-queue branch July 5, 2021 13:59