
[Consensus] Queue for synchronization.Engine #910

Merged
merged 20 commits into master from yurii/5615-synchronization-engine-queue on Jul 5, 2021

Conversation

@durkmurder (Member) commented Jun 29, 2021

dapperlabs/flow-go/issues/5615

Context

This PR implements support for an inbound queue for messages that are submitted by the network layer and/or other engines.
As described in the issue, this PR mainly implements message queueing logic as it is implemented in the other consensus engines (sealing, matching, compliance). Access to the final protocol state is currently quite inefficient; this PR contains changes for that as well.

In the proposed implementation we have two separate goroutines which handle requests and responses in parallel; maybe that's not needed, but I thought it's a good idea to distribute the load more evenly.
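For illustration, a minimal self-contained sketch of that pattern (the type, field, and function names below are hypothetical, not the engine's actual API): callers only enqueue and signal, while one worker goroutine drains requests and another drains responses, so the two kinds of traffic never block each other.

package synchronization

import "sync"

// engineSketch illustrates the inbound-queue pattern only; names are hypothetical.
type engineSketch struct {
	quit chan struct{}

	reqMu           sync.Mutex
	pendingRequests []interface{} // guarded by reqMu
	requestNotifier chan struct{} // buffered with capacity 1

	respMu           sync.Mutex
	pendingResponses []interface{} // guarded by respMu
	responseNotifier chan struct{} // buffered with capacity 1
}

// Process is called by the network layer or other engines; it only enqueues
// and signals, so callers never block on message processing.
func (e *engineSketch) Process(msg interface{}, isRequest bool) {
	if isRequest {
		e.reqMu.Lock()
		e.pendingRequests = append(e.pendingRequests, msg)
		e.reqMu.Unlock()
		notify(e.requestNotifier)
		return
	}
	e.respMu.Lock()
	e.pendingResponses = append(e.pendingResponses, msg)
	e.respMu.Unlock()
	notify(e.responseNotifier)
}

// notify sends a non-blocking signal; one pending signal suffices because the
// worker drains its entire queue whenever it wakes up.
func notify(ch chan struct{}) {
	select {
	case ch <- struct{}{}:
	default:
	}
}

// requestLoop runs as its own goroutine; an analogous responseLoop drains
// pendingResponses, so requests and responses are handled in parallel.
func (e *engineSketch) requestLoop(handle func(interface{})) {
	for {
		select {
		case <-e.quit:
			return
		case <-e.requestNotifier:
			e.reqMu.Lock()
			batch := e.pendingRequests
			e.pendingRequests = nil
			e.reqMu.Unlock()
			for _, msg := range batch {
				handle(msg)
			}
		}
	}
}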

Contributions of this PR

  • Implemented queueing logic for the synchronization engine
  • Changed how the synchronization engine accesses the latest finalized block
  • Optimized concurrent access to shared data
  • Extended tests
  • IMPORTANT: implemented a new way to deliver finalization events on the node level; the initialization of every node has been changed.

@zhangchiqing (Member) left a comment:

Looks good


// setupRequestMessageHandler initializes the inbound queues and the MessageHandler for UNTRUSTED requests.
func (e *Engine) setupRequestMessageHandler() {
	e.pendingSyncRequests = NewRequestQueue(defaultSyncRequestQueueCapacity)

Suggested change
-	e.pendingSyncRequests = NewRequestQueue(defaultSyncRequestQueueCapacity)
+	// RequestQueue deduplicates requests by keeping only one sync request for each requester.
+	e.pendingSyncRequests = NewRequestQueue(defaultSyncRequestQueueCapacity)
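As a side note, an illustrative sketch (not the PR's actual RequestQueue code, and assuming the engine.Message type with its OriginID field): the deduplication described in that comment falls out of keying the backing map by the requester's ID, so a newer sync request from the same node simply replaces the older one.

import (
	"github.com/onflow/flow-go/engine"
	"github.com/onflow/flow-go/model/flow"
)

// putDeduplicated is an illustrative helper: storing by OriginID means at most
// one pending sync request is kept per requester.
func putDeduplicated(requests map[flow.Identifier]*engine.Message, msg *engine.Message) {
	requests[msg.OriginID] = msg
}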

Alexander Hentschel added 2 commits July 4, 2021 21:36
• reduced locking in `synchronization.Core`
@AlexHentschel (Member) left a comment:

Comments & Suggestions:

  • It would be great to fix this performance bottleneck (see the first sketch after this list):
    block, err := e.blocks.ByHeight(height)
    • In contrast to badger.Headers, which has a cache for heights, badger.Blocks does not have such a cache and instead always hits the database.
  • I think we could inline the following method, as it is only used once:
    // processIncoming processes an incoming block, so we can take into account the
    // overlap between block IDs and heights.
    func (e *Engine) processIncomingBlock(originID flow.Identifier, block *flow.Block) {
    	shouldProcess := e.core.HandleBlock(block.Header)
    	if !shouldProcess {
    		return
    	}
    	synced := &events.SyncedBlock{
    		OriginID: originID,
    		Block:    block,
    	}
    	e.comp.SubmitLocal(synced)
    }
  • We might be able to reduce lock congestion in Core.HandleHeight (see the second sketch below):
    • We always lock right away, but on the happy path, when the node is up to date, this code would return right away:
      // don't bother queueing anything if we're within tolerance
      if c.WithinTolerance(final, height) {
      	return
      }
      WithinTolerance is fully concurrency-safe without any locks (we also call it externally). Hence, we could move the lock further down into the if statement, where we actually update Core's state.
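Regarding the first bullet above, one possible shape of the fix (a sketch only; it assumes the engine also holds a storage.Headers instance as e.headers and relies on the height cache mentioned above): resolve the height through the header storage first, then fetch the block by ID.

// blockByHeight avoids the uncached blocks.ByHeight lookup by resolving the
// height through the height-cached header storage first.
func (e *Engine) blockByHeight(height uint64) (*flow.Block, error) {
	header, err := e.headers.ByHeight(height)
	if err != nil {
		return nil, fmt.Errorf("could not get header at height %d: %w", height, err)
	}
	return e.blocks.ByID(header.ID())
}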

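And for the last bullet, a sketch of HandleHeight with the lock pushed down (Core's internal field and helper names here, such as c.mu and requeueHeight, are assumptions for illustration, not the actual implementation):

func (c *Core) HandleHeight(final *flow.Header, height uint64) {
	// don't bother queueing anything if we're within tolerance;
	// WithinTolerance is concurrency-safe, so no lock is needed on this happy path
	if c.WithinTolerance(final, height) {
		return
	}
	// only acquire the lock inside the branch that actually updates Core's state
	if height > final.Height {
		c.mu.Lock()
		defer c.mu.Unlock()
		for h := final.Height + 1; h <= height; h++ {
			c.requeueHeight(h) // hypothetical helper recording a height to request
		}
	}
}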
Implemented these suggestions ☝️ in my PR #926.

The suggestions below (code comments) are not implemented in PR 926.

Comment on lines 65 to 77

// we keep reducing the cache size until we are at limit again
for len(q.requests) >= int(q.limit) {

	// eject first element using go map properties
	var key flow.Identifier
	for originID := range q.requests {
		key = originID
		break
	}

	delete(q.requests, key)
}

I feel we could write this much more concisely:

Suggested change
-	// we keep reducing the cache size until we are at limit again
-	for len(q.requests) >= int(q.limit) {
-		// eject first element using go map properties
-		var key flow.Identifier
-		for originID := range q.requests {
-			key = originID
-			break
-		}
-		delete(q.requests, key)
-	}
+	// we keep reducing the cache size until we are at limit again
+	for overCapacity := len(q.requests) - int(q.limit); overCapacity >= 0; overCapacity-- {
+		for originID := range q.requests {
+			delete(q.requests, originID)
+			break // remove one arbitrary entry per iteration
+		}
+	}

Did I miss something?

Comment on lines 32 to 34
// first try to eject if we are at max capacity, we need to do this way
// to prevent a situation where just inserted item gets ejected
q.reduce()

Consider the following scenario:

  • the queue is at max capacity
  • it already contains an element from Origin A
  • we are attempting to put a new element from A into the queue

The result is that:

  • the old message from A is overwritten by the new one (desired),
  • but we also eject some other message (which seems not very intuitive).

Suggestion:

Suggested change
-	// first try to eject if we are at max capacity, we need to do this way
-	// to prevent a situation where just inserted item gets ejected
-	q.reduce()
+	if _, found := q.requests[message.OriginID]; !found {
+		// if no message from the origin is stored, make sure we have room to store the new message:
+		q.reduce()
+	}

filter.Not(filter.HasNodeID(e.me.NodeID())),
))
if err != nil {
	return fmt.Errorf("could not send get consensus participants: %w", err)

Suggested change
-	return fmt.Errorf("could not send get consensus participants: %w", err)
+	return fmt.Errorf("could not get consensus participants at latest finalized block: %w", err)

…queue_-_suggestions

suggestions for PR 910
@codecov-commenter commented Jul 5, 2021

Codecov Report

Merging #910 (a00f07e) into master (2b9772a) will increase coverage by 0.13%.
The diff coverage is 67.04%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #910      +/-   ##
==========================================
+ Coverage   56.56%   56.69%   +0.13%     
==========================================
  Files         424      425       +1     
  Lines       25008    25146     +138     
==========================================
+ Hits        14145    14257     +112     
- Misses       8941     8955      +14     
- Partials     1922     1934      +12     
Flag Coverage Δ
unittests 56.69% <67.04%> (+0.13%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
model/flow/aggregated_signature.go 0.00% <0.00%> (ø)
storage/badger/blocks.go 53.44% <0.00%> (-0.94%) ⬇️
engine/consensus/approvals/assignment_collector.go 64.15% <33.33%> (-0.82%) ⬇️
engine/consensus/approvals/approval_collector.go 76.05% <38.46%> (-5.49%) ⬇️
engine/common/synchronization/engine.go 65.59% <62.16%> (+9.05%) ⬆️
module/validation/seal_validator.go 76.85% <91.66%> (+6.37%) ⬆️
engine/common/synchronization/request_heap.go 100.00% <100.00%> (ø)
...ngine/consensus/approvals/aggregated_signatures.go 100.00% <100.00%> (ø)
module/synchronization/core.go 75.46% <100.00%> (ø)
storage/badger/headers.go 53.40% <100.00%> (-0.81%) ⬇️
... and 5 more

Continue to review full report at Codecov.

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d648e6a...a00f07e. Read the comment docs.

@durkmurder durkmurder merged commit 2641774 into master Jul 5, 2021
@durkmurder durkmurder deleted the yurii/5615-synchronization-engine-queue branch July 5, 2021 13:59