
🔷 [ProjectTracking] Expand GCS-based State Sync to allow Decentralized transfer of parts #9575

Open
2 of 8 tasks
Sarah-NEAR opened this issue Sep 25, 2023 · 12 comments

Comments


Sarah-NEAR commented Sep 25, 2023

Goals

Background

State Sync equips validator nodes with the State data they need in order to produce blocks. Without it, nodes need to get the state data from outside the chain (e.g. from an S3 snapshot) and constantly spend effort keeping it up to date with the chain. Decentralised State Sync is the second part of the effort: building a data sharing overlay between the network nodes. It provides a scalable and decentralised way of transferring state parts between nodes.

Why should NEAR One work on this

State Sync unblocks two features:

  • It allows nodes to track a subset of the shards (e.g. chunk-only producers can track a single shard). Since nodes are able to get the state data they need from the network, they are no longer required to track all of it locally. Tracking a single shard is one of the main pillars of chain scalability.
  • It allows validators and RPC operators to spin up nodes without relying on the S3 snapshots.

What needs to be accomplished

  1. Largely, for Decentralised State Sync we need to transfer the generated state parts using an architecture similar to BitTorrent's.
  2. We need to optimise the way State Snapshots are generated. Today they take large, unpredictable amounts of space. The plan for mitigating this is as follows: as soon as a snapshot is opened, we copy its FlatState to a new DB, after which we can drop the snapshot.
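A minimal sketch of the BitTorrent-like idea in (1), assuming a hypothetical round-robin assignment of state parts to peers (the types and function names here are illustrative, not nearcore APIs):

```rust
// Illustrative sketch only: spread state-part downloads across peers,
// BitTorrent-style, instead of fetching everything from a single
// GCS bucket. PeerId/PartId and assign_parts are hypothetical names.

type PeerId = String;
type PartId = u64;

/// Assign each state part to a peer in round-robin order, so that no
/// single node has to serve the whole state.
fn assign_parts(parts: u64, peers: &[PeerId]) -> Vec<(PartId, PeerId)> {
    (0..parts)
        .map(|p| (p, peers[(p as usize) % peers.len()].clone()))
        .collect()
}

fn main() {
    let peers = vec!["peer-a".to_string(), "peer-b".to_string()];
    for (part, peer) in assign_parts(4, &peers) {
        println!("part {part} <- {peer}");
    }
}
```

A real implementation would also retry failed parts against different peers and validate each downloaded part against the state root.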

Main use case

  • Eliminate a centralized resource (GCS) from the chain scalability architecture and reduce the operational cost of State Sync.
  • Unblock easier spin-up of nodes, without relying on S3 snapshots. Additional work (EpochSync) is required to fully achieve this.

Links to external documentations and discussions

Additional resources will be added here when they become available.

Estimated effort

Engineers assigned: @VanBarbascu, @marcelo-gonzalez and @saketh-are.

The initial effort estimate is about 6-8 PM (person months). The currently remaining effort is presented in the latest comment on this issue.

Assumptions

There are no specific assumptions that this project is making.

Pre-requisites

N/A

Out of scope

N/A

Task list:

For next release

  1. Near Core Node
    marcelo-gonzalez
  2. Near Core Node
    VanBarbascu marcelo-gonzalez
    saketh-are
  3. 1 of 3
    Near Core Node
    saketh-are
  4. Node
    saketh-are
  5. Node
  6. Node
    VanBarbascu
  7. Near Core Node
    marcelo-gonzalez

Bugs

@gmilescu gmilescu added the Node Node team label Oct 19, 2023
@gmilescu gmilescu changed the title 🔷 Expand GCS-based State Sync to allow Decentralized transfer of parts 🔷 [DSS] Expand GCS-based State Sync to allow Decentralized transfer of parts Nov 2, 2023
@gmilescu

2023-11-27

  • Work still in progress
  • Two large items in development: refactoring the actors and implementing peer selection
  • Work is split between Node and Core
  • Broadcasting shards included in a snapshot is complete.

@VanBarbascu

2023-12-19

  • Work in progress
  • Items in development:
    • Refactoring code to allow one actor per shard
    • Peer selection
    • Dedicated connection pool for state parts messages
  • We have decided to enable DSS after resharding and we aim to have it working in 1.38
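The "one actor per shard" refactor listed above can be sketched as one independent worker per shard; this thread-based sketch (hypothetical names, with plain std threads standing in for nearcore's actual actors) shows the shape:

```rust
// Hypothetical sketch of "one actor per shard": each shard gets its
// own worker that syncs state independently of the others.
use std::thread;

type ShardId = u64;

fn sync_shard(shard: ShardId) -> String {
    // A real actor would request headers/parts from peers here; we
    // just report which shard this worker handled.
    format!("shard {shard} synced")
}

fn sync_all(shards: &[ShardId]) -> Vec<String> {
    let handles: Vec<_> = shards
        .iter()
        .map(|&s| thread::spawn(move || sync_shard(s)))
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    for line in sync_all(&[0, 1, 2]) {
        println!("{line}");
    }
}
```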

@gmilescu gmilescu changed the title 🔷 [DSS] Expand GCS-based State Sync to allow Decentralized transfer of parts 🔷 [ProjectTracking] Expand GCS-based State Sync to allow Decentralized transfer of parts Jan 18, 2024

gmilescu commented Jan 26, 2024

2024-01-26

  • Status: Coding work is in progress, just completed peer selection.
  • Remaining tasks: Manage opening connections to the selected peers and do integration testing.
  • Remaining effort: About 3-4 engineering weeks to finish testing and be ready to include decentralized state sync in 1.38.
  • Blockers: None
  • Release date: We aim to include Decentralised State Sync in the 1.38 release.
  • Note: work on this project was deprioritised until early February to make room for fixing single-shard tracking via GCS-based State Sync for Statelessnet and Stake Wars IV.

@VanBarbascu

The first release will work for the current state of mainnet, where nodes track all shards.
As mentioned in the previous update, we had no active progress on this project this week.

  • Remaining tasks until first release: Start new connections to download the parts & header from peers. Refactor the state snapshot storage.


VanBarbascu commented Feb 8, 2024

Over the past week we focused on enabling single-shard tracking via GCS state sync, and DSS work was postponed.

We are now focusing on setting up a new connection to a peer for each part we need to download.
We are also working on optimizing the state snapshot to reduce the memory strain caused by compaction.

  • This optimization was not initially planned as part of the DSS work, but the issues we had with state-dumping nodes require a mitigation that naturally lands with DSS. The end goal is to save the hundreds of GB of storage that state snapshots currently require.

The rest of the plan until release includes:

  • set rate limiting for incoming and outgoing requests
  • add monitoring
  • fine grain controls through configs
  • integration testing

The estimated effort is 1 month, so we still plan to release it in 1.38.
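As a rough illustration of the planned rate limiting for incoming and outgoing requests, here is a minimal token-bucket sketch (the capacity and refill values are made up, not nearcore's actual limits):

```rust
// Minimal token-bucket sketch for rate limiting state-part requests.
// The numbers used below are illustrative, not nearcore configuration.
struct TokenBucket {
    tokens: f64,
    capacity: f64,
    refill_per_sec: f64,
}

impl TokenBucket {
    fn new(capacity: f64, refill_per_sec: f64) -> Self {
        Self { tokens: capacity, capacity, refill_per_sec }
    }

    /// Credit tokens for `elapsed_secs` of wall time, capped at capacity.
    fn refill(&mut self, elapsed_secs: f64) {
        self.tokens =
            (self.tokens + elapsed_secs * self.refill_per_sec).min(self.capacity);
    }

    /// Try to admit one request; returns false when rate-limited.
    fn try_acquire(&mut self) -> bool {
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}

fn main() {
    let mut bucket = TokenBucket::new(2.0, 1.0);
    // Two requests pass, the third is rate-limited until a refill.
    println!(
        "{} {} {}",
        bucket.try_acquire(),
        bucket.try_acquire(),
        bucket.try_acquire()
    );
}
```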


VanBarbascu commented Feb 16, 2024

Over the past week we decided that the node will manage the state sync connections in a new lightweight pool. These will be short-lived connections, closed after every part exchanged. On the requesting side, we will use raw connections without a handshake; the first message will be a state sync header/part request.
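The "first message is a request" idea can be sketched as a small message enum with a toy wire encoding (the enum and byte layout are illustrative, not nearcore's actual protocol):

```rust
// Illustrative only: the first message on a raw, handshake-less
// state-sync connection is either a header request or a part request.
#[derive(Debug, PartialEq)]
enum StateSyncRequest {
    Header { shard_id: u64 },
    Part { shard_id: u64, part_id: u64 },
}

impl StateSyncRequest {
    /// Toy wire encoding: one tag byte, then little-endian ids.
    fn encode(&self) -> Vec<u8> {
        match self {
            StateSyncRequest::Header { shard_id } => {
                let mut buf = vec![0u8];
                buf.extend_from_slice(&shard_id.to_le_bytes());
                buf
            }
            StateSyncRequest::Part { shard_id, part_id } => {
                let mut buf = vec![1u8];
                buf.extend_from_slice(&shard_id.to_le_bytes());
                buf.extend_from_slice(&part_id.to_le_bytes());
                buf
            }
        }
    }
}

fn main() {
    let req = StateSyncRequest::Part { shard_id: 3, part_id: 7 };
    println!("{} bytes on the wire", req.encode().len());
}
```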

We will increase the priority of fixing the state snapshots, to make sure the fix lands in time to be shipped with DSS.

Implementing the state sync peer connection manager turned out to require additional effort, which will add two weeks to the timeline, translating into an estimated production date of end of March. The extended timeline may push this beyond the estimated 1.38 date.

The rest of the plan remains the same:

  • set rate limiting for incoming and outgoing requests
  • add monitoring
  • fine grain controls through configs
  • integration testing


VanBarbascu commented Feb 24, 2024

During the previous week, our focus was on refining the connection between peers for requesting and serving parts. However, this work is still in progress, as I was on call for the week.

While we haven't made any advancements in resolving the state snapshot compaction issue, we did identify another problem with the existing implementation: if compaction is enabled on the node, it will crash and fail to clear the generated state parts for the last epoch. Consequently, this leads to a gradual reduction in available storage space on the data partition over time.

As mentioned last week, implementing the state sync peer connection manager requires additional effort (we initially believed that the raw connections are managed in a different pool, but this is not true; we need to handle the lifetime of the connection on the serving side as well by implementing the TIER3 pool for short-lived connections). Based on alignment with Saketh, this will add two weeks to the timeline, translating into an estimated ready-for-production date of end of March. We are looking into the 1.38 release timeline to see what options we have, and we plan to allocate more engineering bandwidth in March to reduce the additional time.
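A TIER3-style pool for short-lived serving-side connections could look roughly like this, with each connection dropped after a single header/part exchange (the struct and methods are hypothetical, not nearcore code):

```rust
// Hypothetical sketch of a pool for short-lived state-sync connections:
// each connection serves exactly one exchange and is then closed.
use std::collections::HashMap;

type PeerId = String;

#[derive(Default)]
struct Tier3Pool {
    // Maps each peer to an opaque connection handle.
    open: HashMap<PeerId, u32>,
    next_handle: u32,
}

impl Tier3Pool {
    /// Open a connection to serve a single header/part exchange.
    fn connect(&mut self, peer: PeerId) -> u32 {
        self.next_handle += 1;
        self.open.insert(peer, self.next_handle);
        self.next_handle
    }

    /// Close the connection as soon as the one exchange completes.
    fn finish_exchange(&mut self, peer: &str) {
        self.open.remove(peer);
    }

    fn open_count(&self) -> usize {
        self.open.len()
    }
}

fn main() {
    let mut pool = Tier3Pool::default();
    pool.connect("peer-a".to_string());
    pool.finish_exchange("peer-a");
    println!("open connections: {}", pool.open_count());
}
```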

The remaining work items are:

  • set rate limiting for incoming and outgoing requests
  • add monitoring
  • fine grain controls through configs
  • integration testing


bowenwang1996 commented Feb 27, 2024

@VanBarbascu will this be included in the 1.38 release or not?


VanBarbascu commented Mar 4, 2024

2024-03-03

Last week we completed implementation of the connection handling on the server side.

This week we plan to implement rate limiting for incoming connections and establish connections to request parts from peers. Additionally, we will begin work on the fix for the memory leak that occurs during state snapshots.

DSS project timeline remains unchanged compared to last week: code complete and testing done by the end of March. Since the nearcore release schedule is moving back to strict timelines, we will miss 1.38 and we plan to release DSS on mainnet with version 1.39 (branch cut 2024-04-15).

Remaining work includes:

  • fine grain controls in configs
  • monitoring
  • integration testing


VanBarbascu commented Mar 11, 2024

2024-03-11

Refactoring of the state sync components is currently in progress. We are integrating the new peer selection mechanism, which relies on state snapshot host gossip. The state snapshot fix is being actively worked on.
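Peer selection based on state snapshot host gossip can be sketched as filtering advertised hosts by the shard being synced (the data shapes below are illustrative, not nearcore's):

```rust
// Illustrative sketch: pick peers whose gossiped snapshot
// advertisements cover the shard we need to sync.
use std::collections::HashMap;

type PeerId = String;
type ShardId = u64;

/// `ads` maps each peer to the shards its snapshot covers.
fn select_hosts(ads: &HashMap<PeerId, Vec<ShardId>>, shard: ShardId) -> Vec<PeerId> {
    let mut hosts: Vec<PeerId> = ads
        .iter()
        .filter(|(_, shards)| shards.contains(&shard))
        .map(|(peer, _)| peer.clone())
        .collect();
    hosts.sort(); // deterministic order for the example
    hosts
}

fn main() {
    let mut ads = HashMap::new();
    ads.insert("peer-a".to_string(), vec![0, 1]);
    ads.insert("peer-b".to_string(), vec![1]);
    println!("hosts for shard 1: {:?}", select_hosts(&ads, 1));
}
```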

The focus on DSS will be reduced this week until we roll out resharding on mainnet. As a result, the timeline is expected to be extended by approximately 1.5 weeks, pushing the estimated completion date to the first week of April.

Remaining work includes:

  • fine grain controls in configs
  • monitoring
  • integration testing

@VanBarbascu

2024-04-01

We fixed the state snapshot size bug and now the snapshot is no longer prohibitively expensive to keep.
Due to the release schedule changes, we reduced the focus on DSS and shifted toward addressing mainnet congestion.

We will resume work on DSS next week, with an estimated completion date of end of April.

@walnut-the-cat

We will resume work on DSS next week, with an estimated completion date of end of April.

Is this still accurate?

Projects
Status: In Progress

6 participants