Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pending blocks might not be submitted to DA #1548

Closed
tzdybal opened this issue Feb 16, 2024 · 3 comments · Fixed by #1568
Closed

Pending blocks might not be submitted to DA #1548

tzdybal opened this issue Feb 16, 2024 · 3 comments · Fixed by #1568
Assignees
Labels
P:high Priority: High T:bug Something isn't working va

Comments

@tzdybal
Copy link
Member

tzdybal commented Feb 16, 2024

Currently, blocks are submitted to DA via pendingBlocks queue. This queue is stored in memory, so in case of a restart of the aggregator node, the information might be lost. After the restart aggregator will continue to produce blocks, and submit those new blocks to DA, but all that were pending before the restart were lost so they will not be submitted to DA.
This causes issues since if any blocks are not in DA, full nodes are not able to sync.

Additionally:

  • blocks are safely stored in database before submission to DA
  • blocks are always pushed to DA in order (by height)
  • DA submission of multiple blocks is atomic - it's impossible to submit only part of a batch
  • rollkit can handle duplicate blocks as well as blocks out of order during sync process

Example:

  1. Last produced block height: 1010.
  2. pendingBlocks contains blocks between 1000 and 1010.
  3. Node is restarted.
  4. Block 1011 is produced and pushed to DA.
  5. Blocks between 1000 and 1010 will never be submitted to DA.
@tzdybal tzdybal added T:bug Something isn't working P:high Priority: High va labels Feb 16, 2024
@tzdybal tzdybal self-assigned this Feb 16, 2024
@tzdybal
Copy link
Member Author

tzdybal commented Feb 16, 2024

Simple and effective solution is to modify the code to ensure that height of the latest block successfully submitted to DA is persisted in store. After restart we need to continue DA submission.

On the implementation level, there are several possibilities:

  1. push blocks from store to pendingBlocks when node is (re)starting
  2. get rid of pendingBlocks and always read blocks directly from store (this seems to be more generic approach).

@arhamj
Copy link
Contributor

arhamj commented Feb 20, 2024

@tzdybal Can I pick this? Option 2 sounds better. We can have a cron-like go-routine which runs every X number of seconds and queries Y number of blocks from DB using a prefix filter. X and Y can vary based on the lag. What do you think?

@tzdybal
Copy link
Member Author

tzdybal commented Feb 20, 2024

Hi @arhamj, I'm currently tackling this issue. Thanks for understanding!

@Manav-Aggarwal Manav-Aggarwal removed the P:high Priority: High label Feb 23, 2024
tzdybal added a commit that referenced this issue Feb 27, 2024
TestPendingBlocks should fail to proof the existence of the bug.

This commit introduces a new test to check pending blocks as described in issue #1548. It mimics the behavior of a node producing blocks, stopping and restarting, then producing more blocks.
tzdybal added a commit that referenced this issue Feb 28, 2024
TestPendingBlocks should fail to proof the existence of the bug.

This commit introduces a new test to check pending blocks as described in issue #1548. It mimics the behavior of a node producing blocks, stopping and restarting, then producing more blocks.
tzdybal added a commit that referenced this issue Feb 28, 2024
TestPendingBlocks should fail to proof the existence of the bug.

This commit introduces a new test to check pending blocks as described in issue #1548. It mimics the behavior of a node producing blocks, stopping and restarting, then producing more blocks.
@tzdybal tzdybal added the P:high Priority: High label Feb 29, 2024
tzdybal added a commit that referenced this issue Feb 29, 2024
TestPendingBlocks should fail to proof the existence of the bug.

This commit introduces a new test to check pending blocks as described in issue #1548. It mimics the behavior of a node producing blocks, stopping and restarting, then producing more blocks.
tzdybal added a commit that referenced this issue Mar 6, 2024
TestPendingBlocks should fail to proof the existence of the bug.

This commit introduces a new test to check pending blocks as described in issue #1548. It mimics the behavior of a node producing blocks, stopping and restarting, then producing more blocks.
github-merge-queue bot pushed a commit that referenced this issue Mar 6, 2024
<!--
Please read and fill out this form before submitting your PR.

Please make sure you have reviewed our contributors guide before
submitting your
first PR.
-->

## Overview

This PR creates a new test to check pending blocks as described in issue
#1548. It mimics the behavior of a node producing blocks, stopping and
restarting, then producing more blocks.

Looking at commit history you can check that test was initially failing
and then was fixed.

New implementation of `pendingBlocks` is safer because its data is
persisted in store. If node is restarted, block submission to DA will
restart when it ended. The worst case scenario (very unlikely), where
`pendingBlocks` can't save information to store, results in resubmission
of blocks already submitted to DA, which is only the extra cost. The
solution ensures that all the blocks are successfully submitted to DA.

This PR is bigger than I expected, and because of that I will work on
#1524 and limiting the number
of blocks returned by `getPendingBlocks` in follow-up PR.

Resolves #1548
Resolves #457 

<!-- 
Please provide an explanation of the PR, including the appropriate
context,
background, goal, and rationale. If there is an issue with this
information,
please provide a tl;dr and link the issue. 
-->

## Checklist

<!-- 
Please complete the checklist to ensure that the PR is ready to be
reviewed.

IMPORTANT:
PRs should be left in Draft until the below checklist is completed.
-->

- [x] New and updated code has appropriate documentation
- [x] New and updated code has new and/or updated testing
- [x] Required CI checks are passing
- [x] Visual proof for any user facing features like CLI or
documentation updates
- [x] Linked issues closed with keywords


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Introduced a new mock generation command for enhanced testing
capabilities.
- Added functionality for improved block management and submission to
Data Availability layers.
- Enhanced metadata management in the store for better data handling and
retrieval.
- Implemented new testing functions and improved existing ones for
better coverage and reliability.

- **Bug Fixes**
- Fixed handling of empty block submissions to prevent errors during
block publishing.

- **Refactor**
- Refactored block management to improve code efficiency and
readability.
- Consolidated mock application creation logic in integration tests for
better maintainability.

- **Tests**
- Expanded testing suite with new tests for block submission, metadata
operations, and mock DA interactions.

- **Chores**
- Updated dependencies and imports across various files to enhance
functionality and testing.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P:high Priority: High T:bug Something isn't working va
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

3 participants