Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix matching engine #1294

Merged
merged 2 commits into from Sep 20, 2021
Merged

Fix matching engine #1294

merged 2 commits into from Sep 20, 2021

Conversation

zhangchiqing
Copy link
Member

@zhangchiqing zhangchiqing commented Sep 15, 2021

This PR fixes a bug in matching engine that we didn't request receipt where we should and caused sealing to halt

@codecov-commenter
Copy link

codecov-commenter commented Sep 15, 2021

Codecov Report

Merging #1294 (07b3944) into master (6b18761) will decrease coverage by 0.02%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1294      +/-   ##
==========================================
- Coverage   54.83%   54.80%   -0.03%     
==========================================
  Files         502      502              
  Lines       31852    31848       -4     
==========================================
- Hits        17466    17455      -11     
- Misses      12029    12037       +8     
+ Partials     2357     2356       -1     
Flag Coverage Δ
unittests 54.80% <ø> (-0.03%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
engine/consensus/matching/core.go 50.29% <ø> (+0.59%) ⬆️
...sus/approvals/assignment_collector_statemachine.go 40.38% <0.00%> (-6.74%) ⬇️
module/mempool/epochs/transactions.go 94.73% <0.00%> (-5.27%) ⬇️
...ngine/common/synchronization/finalized_snapshot.go 68.75% <0.00%> (-4.17%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6b18761...07b3944. Read the comment docs.

Comment on lines 321 to 322
// The follow optimization doesn't work, because sealing requires 2 matching receipts.
// Sealing would still halt even if there is a candidate seal but no matching receipts.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should completely remove

  • // set of blocks for which we have a candidate seal:
    blocksWithCandidateSeal := make(map[flow.Identifier]struct{})
    for _, s := range c.seals.All() {
    blocksWithCandidateSeal[s.Seal.BlockID] = struct{}{}
    }
  • // if we have already a candidate seal, we skip any further processing
    // CAUTION: this is not BFT, as the existence of a candidate seal
    // does _not_ imply that all parent results are sealable.
    // TODO: update for full BFT
    // The follow optimization doesn't work, because sealing requires 2 matching receipts.
    // Sealing would still halt even if there is a candidate seal but no matching receipts.
    // if _, hasCandidateSeal := blocksWithCandidateSeal[blockID]; hasCandidateSeal {
    // continue
    // }

I don't think it is worth keeping the commented-out code

We have an analogous performance optimization that is compatible with our current safety-logic of requiring two consistent execution receipts:

// We require at least 2 consistent receipts from different ENs to seal a block. If don't need to fetching receipts.
// CAUTION: This is a temporary shortcut incompatible with the mature BFT protocol!
// There might be multiple consistent receipts that commit to a wrong result. To guarantee
// sealing liveness, we need to fetch receipts from those ENs, whose receipts we don't have yet,
for _, receiptsForResult := range receipts.GroupByResultID() {
if receiptsForResult.GroupByExecutorID().NumberGroups() >= 2 {
continue HEIGHT_LOOP
}
}

It also includes the respective comment about BFT safety.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point

@zhangchiqing zhangchiqing merged commit 74c648c into master Sep 20, 2021
@zhangchiqing zhangchiqing deleted the leo/fix-matching-engine branch September 20, 2021 17:42
durkmurder added a commit that referenced this pull request Sep 21, 2021
* wip

* update mutator tests

* update epoch lockup tests

* update committee leader selection tests

* update comments and consolidated code a bit

* update sampling of entities

* minor polishing

* adding the peer manager back to the staked AN which supports the unstaked AN; adding option to disable connection pruning

* consolidated EECC logic in Protocol State mutator

* changing DisableConnectionPruning to  WithConnectionPruning

* Update engine/collection/synchronization/engine.go

Co-authored-by: Jordan Schalm <jordan@dapperlabs.com>

* Replaced return with continue

* added access-tests to local integration tests

* Update state/protocol/badger/mutator.go

* copy epoch status when in EECC

* Replaced log for fatal

* adjust numbers

* Remove temp local cadence version

* define new staked topic validator

* Rename ExtractPeerID

* Added test

* address comments

* Add new methods to identity provider

* modify api

* update fixed_provider

* address some comments

* move messagesigningid

* Fix Middleware test

* fix topic validator test

* Move key translation to its own package, break circular dependency

* Update libp2pUtils.go

* Update contLoadGenerator.go

* Update middleware_test.go

* add error logging for topic validator unregister

* rename GetIdentity -> Identity

* Update middleware_test.go

* fix lint

* adding comments to the topic validator test (#1253)

* adding a comment to middelware.Subscribe describing the new change to add topic validators for stake nodes

* goimports

* making middleware.topologyPeers private again

* Update state/protocol/badger/mutator.go

Co-authored-by: Leo Zhang <zhangchiqing@gmail.com>

* fix formatting

* Update engine.go

* update Cadence to commit 283fa4b180bf44abe9b46455ef0028ab5cb04226

* makes time to live exportable

* activates dns resolver on libp2p node

* adds dns ttl to scaffold

* adds dns cache ttl to base config

* adds flag for dns node cache ttl

* refactors scaffold with base config for dns cache

* update Cadence to commit eeb3fa23d8754367f93e5f725acd7e7f50a4579b

* adds dns cache for staked access node

* Sync with latest cadence changes

* adds dns cache for unstaked access node

* refactors tests with resolver

* fixes an error

* Dump removed contract codes to a file

* Fix nested type infering

* Invert the storage cleanup flag

* Refactor code

* add separate fallback-explicit EpochForView method

* refactors builder

* refactors resolver invocations

* Fix storage migration V5 tests

* fix mocked function

* Update to Cadence v0.19.0

* stripping out addresses from the root protocol snapshot for the unstaked AN

* Fix fvm payloads getting skipped

* Use cadence v0.19.0

* Use cadence v0.19.0 for intergration

* Update storage migration V4 to latest cadence changes

* Fix tests

* Remove previous storage migration V4 tests

* Fix linter errors

* skipping all validations which involve flow.Identity for the unstaked AN

* Removing unecessary fmt.Println

* Fix --no-migration flag for state extract

* Extracted core from proposal engine

* Renamed proposal to compliance engine

* Updated metrics. Updated compliaence engine for collection nodes to follow new design

* fix bug with tracer sensitivity

* add isSampled flag

* minor fix

* update mock files

* mock update

* clean up

* clean up

* hot fix

* clean up

* fix tests

* fix tests

* fix test

* test fix

* handle empty collection case

* test fix

* Fixed tests for core

* Updated core implementation

* Updated majority of engine tests

* Fixed compilation, linted

* Misc updated to godoc. Linted. Updated labels

* update Cadence to commit 2759300ebf616214cf8a04287f4379bb666b7c98

* Introduced LifecycleManager into hotstuff event loop, compliance engines

* Add block final state commitment

* adding an extra skipNwAddressRelatedValidations parameter to skip validations which use network addresses for the unstaked access node

* Update request_handler.go

* add random delay to periodic peer update thread

We are observing synchronized goroutine spikes causing consensus
finalization issues on large committees on benchet, with the same period
as this peer update process. This adds a random delay before spawning
the thread which requests updates, so that updates are not synchronised.

* update core-contracts version 0.7.7->0.7.9

This new version includes a critical bug fix for the FlowClusterQC
contract

* exit goroutine launched periodically while in initial delay

* add test case for stopping during delay

* add new contract contructor argument

* update state commitments

* tests

* incorporating review comments

* adding comments

* fixing the integration test framework

* minor update to goDoc

* refactoring and removing the sentinel NetworkAddressError

* Improve error handling in access node

* Remove unnecessary error wrapping

* uncommenting code to remove IDs from custers

* reverting previous change to result ID verification

* handle MarkFlagRequired errors

* add testing.T to machine account fixture

* add log hook test utility

* extend machine account tests to check log warn cases

* Updated error handling for consensus engine

* adding unit test

* Update engine_test.go

* update Cadence to commit 05fdc6cb695dca1ccc17b26eccb1d123a2049556

* update Cadence to commit 3d1926c0c05a7a661e9fb192d0bba661afc0d4c1

* Jaeger version update

* make tidy

* don't process empty spans

* go mod clean up

* update Cadence to commit 16d477f6d7d63ae9048f955e5ec22f52337f915c

* small optimization

* [Verification] Adds telemetry for outstanding chunk data pack requests (#1290)

* adds max chunk data pack attempts metric

* adds set max chunk data pack attempts at requester to interface

* generates mocks

* wires in metrics

* fixes TestCompleteRequestingUnsealedChunkLifeCycle

* refactors a log

* fixes TestRequestPendingChunkSealedBlock

* fixes TestRequestPendingChunkSealedBlock_Hybrid

* fixes TestDispatchingRequests_Hybrid(

* adds new metric to demo examples

* fixes tests

* refactors metric to track next unsealed height chunk attempts

* Update module/metrics/verification.go

Co-authored-by: Leo Zhang <zhangchiqing@gmail.com>

* refactors metrics name

* refactors metrics name

Co-authored-by: Leo Zhang <zhangchiqing@gmail.com>

* fix timetracking

* clean up exec node traces

* cleanup

* reduce nesting

* performance: cache the ID for Header entities (#1279)

* performance: cache the ID for Header entities

* performance: identify unit tests that have race conditions.

This has fixes for unit tests that were failing due to race conditions
that became apparent when the cache sped up the MakeID() call.

* performance:  adjustments for race condition tests and thread-safety

* performance:  fix lint problem

* Update compactor_test.go

* Update compactor_test.go

* Update cluster_switchover_test.go

* Update compactor_test.go

* Update cluster_switchover_test.go

* Update engine_test.go

* Update header.go

* Update header.go

* Update compactor_test.go

* performance: adjusted this file for linter

* trigger GitHub actions

* trigger GitHub actions

* trigger GitHub actions

* trigger GitHub actions

Co-authored-by: Jeff Winkler <jeff.winkler@dapperlabs.com>

* lower emergency sealing threshold (#1301)

* make tidy

* Update model/flow/identifier.go

Co-authored-by: Leo Zhang <zhangchiqing@gmail.com>

* applying PR comments

* refactor sampling metod

* hot fix

* hot fix

* Revert "refactor sampling metod"

This reverts commit edcd95b.

* rename save State

* cleanup unnecesary data

* add chainID as a tag on root span

* Update model/flow/identifier.go

Co-authored-by: Leo Zhang <zhangchiqing@gmail.com>

* fix matching engine (#1294)

* hot fix

* make tidy

Co-authored-by: vishal <1117327+vishalchangrani@users.noreply.github.com>
Co-authored-by: Ramtin M. Seraj <ramtinms@users.noreply.github.com>
Co-authored-by: Jordan Schalm <jordan@dapperlabs.com>
Co-authored-by: Alexander Hentschel <alex.hentschel@axiomzen.co>
Co-authored-by: ramtinms <ramtin@axiomzen.co>
Co-authored-by: Yurii Oleksyshyn <yuraolex@gmail.com>
Co-authored-by: Misha Rybalov <gomisha>
Co-authored-by: Supun Setunga <supun.setunga@gmail.com>
Co-authored-by: Simon Zhu <simon.zsiyan@gmail.com>
Co-authored-by: François Garillot <francois.garillot@dapperlabs.com>
Co-authored-by: bors[bot] <26634292+bors[bot]@users.noreply.github.com>
Co-authored-by: Kan Zhang <kan@axiomzen.co>
Co-authored-by: Kay-Zee <Kay-Zee@users.noreply.github.com>
Co-authored-by: Yahya <yahya@dapperlabs.com>
Co-authored-by: SupunS <SupunS@users.noreply.github.com>
Co-authored-by: Yahya Hassanzadeh <yhassanzadeh13@ku.edu.tr>
Co-authored-by: Bastian Müller <bastian@axiomzen.co>
Co-authored-by: Maks Pawlak <120831+m4ksio@users.noreply.github.com>
Co-authored-by: Peter Argue <89119817+peterargue@users.noreply.github.com>
Co-authored-by: Janez Podhostnik <janez.podhostnik@gmail.com>
Co-authored-by: janezpodhostnik <janezpodhostnik@users.noreply.github.com>
Co-authored-by: turbolent <turbolent@users.noreply.github.com>
Co-authored-by: Jeff Winkler <jwinkler2083233@users.noreply.github.com>
Co-authored-by: Jeff Winkler <jeff.winkler@dapperlabs.com>
Co-authored-by: Janez Podhostnik <67895329+janezpodhostnik@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants