Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Scalability Design Document] Design document and plan to test SMT scalability #23

Open
1 of 5 tasks
Olshansk opened this issue Sep 9, 2023 · 2 comments
Open
1 of 5 tasks
Labels
documentation Improvements or additions to documentation

Comments

@Olshansk
Copy link
Member

Olshansk commented Sep 9, 2023

Objective

Put together a design document and a plan to determine the scalability, efficiency and required improvements we need to make to the SMT to make v1 as scalable as possible.

Goals

  • Establish the limits of our current SMT (read, write, storage)
  • Determine the limits for leaf sizes to accommodate things like websockets
  • Explore alternative key-value stores to badgerdb and compare insert/retrieval speeds
  • Investigate the potential benefits of extending the prefix trie from a binary tree to a k-ary tree
  • Understand the implications of proof size on data availability costs and estimate our costs

Deliverable

NOTE: The deliverables below are just a starting point

A design document / plan that'll address the following in different steps:

  • Benchmarking data on the current SMT's insert/retrieval speeds and proof sizes. For example:
    - Can we insert 0.5/1/10 million leaves in 10/60/200 seconds?
    - What are the retrieval speeds depending on the tree size?
  • A report on the feasible leaf sizes for different protocols.
    - If we plan to support larger unhashed values (e.g. for websockets), is there a limit on the leaf size?
    - Do we need to split it into multiple smaller leaves as data is streamed?
  • Comparative analysis of different key-value stores alongside badgerdb, focusing on insert/retrieval speeds.
    - Could/should we add adapters for leveldb, gorocksdb, boltdb or others and rerun the benchmarks above?
  • Research and consideration if we should look into implementing k-ary support instead of just binary support in this tree?
  • Detailed breakdown of proof sizes and strategies to map this onto data availability costs.
    - How big are proofs for a tree of 0.5/1/10 million leaves?
    - How long should our sessions be based on expected volume per servicer per session?
    - What will the data availability cost on Celestia or other DA layers?
    d-js.github.io/mermaid/) diagrams

Creator: @Olshansk
Co-Owners: @h5law

@Olshansk Olshansk added the documentation Improvements or additions to documentation label Sep 9, 2023
@h5law
Copy link
Collaborator

h5law commented Sep 9, 2023

Issues that are relavent to this:

Potentially also (regarding optimising for bulk inserts?)

@Olshansk
Copy link
Member Author

Capturing offline discussion...

Focus on benchmarking and understand the cost (time, insert, read, proof size, DA cost, etc...) of the current state of the tree and use that as a starting point to prioritize future features.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation
Projects
Status: 📋 Backlog
Development

No branches or pull requests

2 participants