Skip to content

hare certification: sync flow is not working yet tests passed #3475

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
countvonzero opened this issue Aug 22, 2022 · 1 comment
Closed

hare certification: sync flow is not working yet tests passed #3475

countvonzero opened this issue Aug 22, 2022 · 1 comment
Assignees
Labels

Comments

@countvonzero
Copy link
Contributor

Description

two issues in the implementation:

  • early gossip messages are not handled properly, causing the following two logs
{"L":"ERROR","T":"2022-08-22T10:51:10.345-0400","N":"3dd22.blockCert    ","M":"failed to process certify message gossip","node_id":"3dd225dd843c54f9e711d71db4d013f283aca653f55632a1293f393b7cd6d8dc","module":"blockCert","requestId":"d47b5112-7630-49c8-b772-776accb6d6e8","errmsg":"unexpected lid/bid","name":"blockCert"}

{"L":"WARN","T":"2022-08-22T10:51:10.274-0400","N":"ec05f.blockCert    ","M":"layer not registered for cert","node_id":"ec05f85c6f081b7dcf0842a855aa37add882bfcaf8b94ee7463d2f17c2f6f6e1","module":"blockCert","requestId":"2ee0f936-bd18-4751-923f-1f148b52c2fc","layer_id":11,"block_id":"65a664148b"}
  • sync'ed certificate is not actually verified, due to the layer/block check
{"L":"WARN","T":"2022-08-22T09:30:01.659-0400","N":"3d88b.layerFetcher ","M":"failed to handle certificate","node_id":"3d88bc875aabe5c1b59803b5d814d22886177bf5434693f99e670da07ca053e6","module":"layerFetcher","sessionId":"76ec9ad8-472e-4a7c-baa4-2c3ea7585b2a","layer_id":15,"peer":"12D3KooWSaj7fTVv33Be6BbXUdQqxxbuBXy9qA8VToQfq4EmE1qz","errmsg":"cert msg too old","name":"layerFetcher"}

another important question: why do all system tests passed?

@countvonzero countvonzero self-assigned this Aug 22, 2022
@countvonzero
Copy link
Contributor Author

another important question: why do all system tests passed?

mystery solved:

the expected behavior is that if a certificate is not synced from all peers, sync for that layer will fail. so it's expected that in TestAddNode, the last 2 nodes added will fail to catch up.

but here

c.clients = nil
c.clients = append(c.clients, bootnodes...)
c.clients = append(c.clients, clients...)
c.smeshers = len(clients)

the cluster smeshers have been reduced to only 2+bootnodes in TestAddNodes. the 2 nodes "added" also turned out to be existing nodes that are already synced. hence the passing tests.

@pigmej pigmej moved this to 🏗 In progress in Dev iterations Aug 24, 2022
@pigmej pigmej moved this to 🏗 In progress in Dev team kanban Aug 26, 2022
bors bot pushed a commit that referenced this issue Aug 29, 2022
## Motivation
<!-- Please mention the issue fixed by this PR or detailed motivation -->
Closes #3475

<!-- `Closes #XXXX, closes #XXXX, ...` links mentioned issues to this PR and automatically closes them when this it's merged -->

## Changes
<!-- Please describe in detail the changes made -->
- block certifier accept early CertifyMessage
- syncer: 
  - separate fetching layer data and layer opinions (only certificate for now)
  - only fetch certificates from peers within hdist layers of processed layer. processed layer is tortoise's last layer
- fix sql bug where certificate is retrieved correctly and not overwritten once set
- systest
  - fixed systests TestAddNode so that the test fails with the certificate sync failures
  - make systests TestFailedNodes to reassess layer hashes from nodes. because blocks are applied optimistically, the applied block can be changed after tortoise verify a layer
bors bot pushed a commit that referenced this issue Aug 29, 2022
## Motivation
<!-- Please mention the issue fixed by this PR or detailed motivation -->
Closes #3475

<!-- `Closes #XXXX, closes #XXXX, ...` links mentioned issues to this PR and automatically closes them when this it's merged -->

## Changes
<!-- Please describe in detail the changes made -->
- block certifier accept early CertifyMessage
- syncer: 
  - separate fetching layer data and layer opinions (only certificate for now)
  - only fetch certificates from peers within hdist layers of processed layer. processed layer is tortoise's last layer
- fix sql bug where certificate is retrieved correctly and not overwritten once set
- systest
  - fixed systests TestAddNode so that the test fails with the certificate sync failures
  - make systests TestFailedNodes to reassess layer hashes from nodes. because blocks are applied optimistically, the applied block can be changed after tortoise verify a layer
bors bot pushed a commit that referenced this issue Aug 29, 2022
## Motivation
<!-- Please mention the issue fixed by this PR or detailed motivation -->
Closes #3475

<!-- `Closes #XXXX, closes #XXXX, ...` links mentioned issues to this PR and automatically closes them when this it's merged -->

## Changes
<!-- Please describe in detail the changes made -->
- block certifier accept early CertifyMessage
- syncer: 
  - separate fetching layer data and layer opinions (only certificate for now)
  - only fetch certificates from peers within hdist layers of processed layer. processed layer is tortoise's last layer
- fix sql bug where certificate is retrieved correctly and not overwritten once set
- systest
  - fixed systests TestAddNode so that the test fails with the certificate sync failures
  - make systests TestFailedNodes to reassess layer hashes from nodes. because blocks are applied optimistically, the applied block can be changed after tortoise verify a layer
bors bot pushed a commit that referenced this issue Aug 29, 2022
## Motivation
<!-- Please mention the issue fixed by this PR or detailed motivation -->
Closes #3475

<!-- `Closes #XXXX, closes #XXXX, ...` links mentioned issues to this PR and automatically closes them when this it's merged -->

## Changes
<!-- Please describe in detail the changes made -->
- block certifier accept early CertifyMessage
- syncer: 
  - separate fetching layer data and layer opinions (only certificate for now)
  - only fetch certificates from peers within hdist layers of processed layer. processed layer is tortoise's last layer
- fix sql bug where certificate is retrieved correctly and not overwritten once set
- systest
  - fixed systests TestAddNode so that the test fails with the certificate sync failures
  - make systests TestFailedNodes to reassess layer hashes from nodes. because blocks are applied optimistically, the applied block can be changed after tortoise verify a layer
bors bot pushed a commit that referenced this issue Aug 29, 2022
## Motivation
<!-- Please mention the issue fixed by this PR or detailed motivation -->
Closes #3475

<!-- `Closes #XXXX, closes #XXXX, ...` links mentioned issues to this PR and automatically closes them when this it's merged -->

## Changes
<!-- Please describe in detail the changes made -->
- block certifier accept early CertifyMessage
- syncer: 
  - separate fetching layer data and layer opinions (only certificate for now)
  - only fetch certificates from peers within hdist layers of processed layer. processed layer is tortoise's last layer
- fix sql bug where certificate is retrieved correctly and not overwritten once set
- systest
  - fixed systests TestAddNode so that the test fails with the certificate sync failures
  - make systests TestFailedNodes to reassess layer hashes from nodes. because blocks are applied optimistically, the applied block can be changed after tortoise verify a layer
@bors bors bot closed this as completed in af21bf3 Aug 29, 2022
Repository owner moved this from 🏗 In progress to ✅ Done in Dev iterations Aug 29, 2022
@pigmej pigmej moved this from 🏗 Doing to ✅ Done in Dev team kanban Aug 31, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Archived in project
Status: Done
Development

Successfully merging a pull request may close this issue.

1 participant