wip: DataApi trait for querying/inserting data #29
Conversation
I think having functions like this for every type and/or table will make the data model very inflexible/hard to change.
Imo we should look at just having some interface for reading, writing and iterating on types that implement e.g. `Encode`/`Decode`.
Obviously they would need some sort of identifier like a table ID.
The abstraction layer here would mostly just decide where and how to read and write the data. That way the surface area for the databases is small, and the surface area for types being written to/read from the database is also small (just impl `Encode`/`Decode`).
We also still need an abstraction for the notion of transactions though, even if the underlying database does not use transactions
This layer is above that though. If I want an
That's what I am going for here. Why do you think that's not what's happening?
I think it should just be a trait that is generic over anything that is

If it was more generic, it would be easier to implement for new databases/stores (the extreme case would be 2 functions, although that is unlikely). If we want to add a new table/type, we need to add new functions to this trait + implement other traits for the type etc. + specify the table in the tables file and so on. If it was generic, this overhead (I think) would be lowered, since it's 1 less thing to implement.

Although I guess there is something to be said for per-store codecs... I would imagine memdb and staticfile would represent their data differently; maybe one is compressed and the other is not. Not entirely sure how that would be solved - maybe some wrapper types that specify how the data should ideally be encoded? See e.g. the

Edit: Another example, this one is a fully typed "high level" (not that much) wrapper around LMDB (which MDBX is based on): https://docs.rs/heed/latest/heed/. This one also sort of uses wrapper types (e.g.
Yes, as suggested on discord, this trait could be split into smaller ones.
I mean, it really depends on what kind of type for what purpose. If it's an index, you probably wouldn't need to, since it would be abstracted away anyway.

I think what I'm having trouble grasping at the moment is: much of the data doesn't have a straight encode <-> decode mapping. It has to be constructed from different places/indexes/tables/files depending on the store specifications. Getting a

However, the type that RPC/Executor/Stage expects is supposed to be the same. If we don't have a common API (even if split among smaller traits), won't it just get kinda messy...?

About how each piece of data should ultimately be encoded before writing to disk... this is what I was going for: https://github.com/foundry-rs/reth/blob/joshie/kv_mdbx/crates/db/src/kv/table.rs

I'll take a look into those, thanks! I feel like I'm misunderstanding your point. I'll think more about it.

Block example from erigon:
```rust
fn write_total_difficulty(&self, difficulty: U256) -> Result<(), ApiError>;
fn truncate_total_difficulty(&self, from: BlockNumber) -> Result<(), ApiError>;
```

```rust
fn read_receipt(&self, number: Option<TxNumber>) -> Result<Receipt, ApiError>;
```
Let's call this `TxIndex` instead of `number`:
```diff
- fn read_receipt(&self, number: Option<TxNumber>) -> Result<Receipt, ApiError>;
+ fn read_receipt(&self, number: Option<TxIndex>) -> Result<Receipt, ApiError>;
```
```rust
fn move_to_canonical_txes(&self) -> Result<(), ApiError>;
fn move_to_non_canonical_txes(&self) -> Result<(), ApiError>;
```
What do you envision this doing?
```rust
fn read_canonical_hash(&self, number: u64) -> Result<H256, ApiError>;
fn write_total_difficulty(&self, difficulty: U256) -> Result<(), ApiError>;
fn truncate_total_difficulty(&self, from: BlockNumber) -> Result<(), ApiError>;
```
What does this do?
```rust
fn read_account_code(&self, address: Address, tx_number: Option<TxNumber>) -> Result<Bytes, ApiError>;
fn read_account_code_size(&self, address: Address, tx_number: Option<TxNumber>) -> Result<u64, ApiError>;
fn read_account_storage(&self, address: Address, tx_number: Option<TxNumber>, key: H256) -> Result<Bytes, ApiError>;
fn read_account_incarnation(&self, address: Address, tx_number: Option<TxNumber>) -> Result<u64, ApiError>;
```
Should incarnations be part of the struct Account or should they be in a separate table? I can imagine in Ethereum we'd want to make Incarnations a separate table and eventually move it out of MDBX, because SELFDESTRUCT is going away. Might save some space, but probably trivial?
(same msg sent on discord)
yeah, leaning towards having it in a different table tbh
```rust
/// Common API for querying and writing data.
#[rustfmt::skip]
pub trait DataApi {
```
Agree on splitting this in smaller logically coupled traits which get implemented by each in-mem type.
One idea is that we could define `trait HeaderReader`/`trait HeaderWriter`, and do the same for Blocks, Transactions, Accounts.
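A minimal sketch of that split, assuming it's acceptable for each store to implement only the traits it supports. The `Header`, `ApiError`, and `MemHeaders` types below are placeholders, not reth's real types:

```rust
use std::collections::HashMap;

type BlockNumber = u64;

/// Placeholder header type for illustration.
#[derive(Debug, Clone, PartialEq)]
pub struct Header {
    pub number: BlockNumber,
    pub parent_hash: [u8; 32],
}

#[derive(Debug)]
pub enum ApiError {
    NotFound,
}

/// Read side: implemented by anything that can serve headers.
pub trait HeaderReader {
    fn read_header(&self, number: BlockNumber) -> Result<Header, ApiError>;
}

/// Write side: implemented only by mutable stores.
pub trait HeaderWriter {
    fn write_header(&mut self, header: Header) -> Result<(), ApiError>;
}

/// A toy in-memory store implementing both; a read-only static-file
/// store would implement only HeaderReader.
#[derive(Default)]
pub struct MemHeaders(HashMap<BlockNumber, Header>);

impl HeaderReader for MemHeaders {
    fn read_header(&self, number: BlockNumber) -> Result<Header, ApiError> {
        self.0.get(&number).cloned().ok_or(ApiError::NotFound)
    }
}

impl HeaderWriter for MemHeaders {
    fn write_header(&mut self, header: Header) -> Result<(), ApiError> {
        self.0.insert(header.number, header);
        Ok(())
    }
}
```

Consumers like RPC can then bound on exactly what they need (e.g. `fn serve<R: HeaderReader>(r: &R)`), keeping each trait's surface area small.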
```rust
    fn encode(self) -> Self::Encoded {
        todo!();
    }
}
```

```rust
impl Decode for Vec<U256> {
    fn decode(_value: &[u8]) -> Result<Self, KVError> {
        todo!();
    }
```
I'd do

```rust
impl<T: Encode> Encode for Vec<T> {
    // fn encode:
    // serialize the len,
    // then for each item in the vec call `encode`
}
```

and the same for decode: read the len from the first number, then decode each element in `0..len`.
Not sure if you could serialize length as u8/u16/u32 etc.
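The length-prefix scheme described above can be sketched as follows. The `Encode`/`Decode` traits here are simplified stand-ins for the ones in this PR (which use `Self::Encoded` and `KVError`); the length is serialized as a `u32`, one possible answer to the u8/u16/u32 question:

```rust
// Simplified codec traits (hypothetical, not the PR's exact signatures).
pub trait Encode {
    fn encode(&self) -> Vec<u8>;
}

pub trait Decode: Sized {
    /// Decodes one value from the front of `bytes`, returning the remainder.
    fn decode(bytes: &[u8]) -> Option<(Self, &[u8])>;
}

impl Encode for u64 {
    fn encode(&self) -> Vec<u8> {
        self.to_be_bytes().to_vec()
    }
}

impl Decode for u64 {
    fn decode(bytes: &[u8]) -> Option<(Self, &[u8])> {
        if bytes.len() < 8 {
            return None;
        }
        let (head, rest) = bytes.split_at(8);
        Some((u64::from_be_bytes(head.try_into().ok()?), rest))
    }
}

impl<T: Encode> Encode for Vec<T> {
    fn encode(&self) -> Vec<u8> {
        // Serialize the len as a u32 prefix...
        let mut out = (self.len() as u32).to_be_bytes().to_vec();
        // ...then call `encode` on each item.
        for item in self {
            out.extend(item.encode());
        }
        out
    }
}

impl<T: Decode> Decode for Vec<T> {
    fn decode(bytes: &[u8]) -> Option<(Self, &[u8])> {
        // Read the len from the prefix, then decode each element in 0..len.
        if bytes.len() < 4 {
            return None;
        }
        let (head, mut rest) = bytes.split_at(4);
        let len = u32::from_be_bytes(head.try_into().ok()?);
        let mut out = Vec::with_capacity(len as usize);
        for _ in 0..len {
            let (item, r) = T::decode(rest)?;
            out.push(item);
            rest = r;
        }
        Some((out, rest))
    }
}
```

Because `decode` returns the unconsumed remainder, elements of variable encoded size also compose, which is why the blanket impl works without per-type length knowledge.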
ideally, the codec we choose, should do that?
```rust
#[allow(non_camel_case_types)]
type BlockNumber_BlockHash_TxId = Vec<u8>;
type BlockNumber_BlockHash_TxNumber = Vec<u8>;
```
a bit weird typedef here, is this 3 things?
I wanted some representation of a concatenation of (BlockNumber, BlockHash, TxNumber), open to change it 🤷♂️
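One possible alternative to the `Vec<u8>` typedef is a named struct whose encoding is the big-endian concatenation of its parts, so the raw key still sorts by (number, hash, tx) in the database. Everything below (`BlockTxKey` and the aliases) is an illustrative suggestion, not existing code:

```rust
type BlockNumber = u64;
type TxNumber = u64;
type BlockHash = [u8; 32];

/// Composite key: (BlockNumber, BlockHash, TxNumber), kept as named
/// fields instead of an opaque Vec<u8>.
pub struct BlockTxKey {
    pub block_number: BlockNumber,
    pub block_hash: BlockHash,
    pub tx_number: TxNumber,
}

impl BlockTxKey {
    /// Concatenate the parts, big-endian, into the raw database key.
    /// Big-endian matters so that lexicographic byte order matches
    /// numeric order for range scans.
    pub fn encode(&self) -> Vec<u8> {
        let mut out = Vec::with_capacity(8 + 32 + 8);
        out.extend_from_slice(&self.block_number.to_be_bytes());
        out.extend_from_slice(&self.block_hash);
        out.extend_from_slice(&self.tx_number.to_be_bytes());
        out
    }
}
```

This keeps the concatenation semantics but gives call sites field names and fixed offsets instead of an untyped byte vector.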
This is a great point @onbjerg and I think we'd want to maintain the optionality.
I think it's likely our abstraction is leaky and requires that in-mem test dbs will need to implement some transaction-related logic. Do we consider that acceptable?
```rust
/// Returns the lowest and highest block number the data structure contains.
fn read_block_number_range(&self) -> Result<(BlockNumber, BlockNumber), ApiError>;
fn read_block_latest(&self) -> Result<Block, ApiError>;
fn read_block(&self, number: Option<BlockNumber>, hash: Option<BlockHash>) -> Result<Block, ApiError>;
```
nit: maybe have an enum `BlockId` here, with variants `Number` and `Hash`?
Yes, for sure let's use the enum type instead of option.
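The suggested enum could look like the sketch below (names are illustrative). The advantage over the `Option` pair is that it rules out the invalid "both `None`" and ambiguous "both `Some`" states at the type level:

```rust
type BlockNumber = u64;
type BlockHash = [u8; 32];

/// Identifies a block either by number or by hash, replacing the
/// (Option<BlockNumber>, Option<BlockHash>) parameter pair.
pub enum BlockId {
    Number(BlockNumber),
    Hash(BlockHash),
}

impl BlockId {
    /// Returns the block number if this id is a number.
    pub fn as_number(&self) -> Option<BlockNumber> {
        match self {
            BlockId::Number(n) => Some(*n),
            BlockId::Hash(_) => None,
        }
    }
}

// The read function then takes one always-valid identifier:
// fn read_block(&self, id: BlockId) -> Result<Block, ApiError>;
```

A "latest block" case could either stay a separate `read_block_latest` function, as in the trait above, or become a third `Latest` variant.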
I don't see functions that give us

Edit: additionally, transactions are created and cleared (committed); see this comment: https://github.com/foundry-rs/reth/blob/c41c6b99a66bdb662602d0941f353984cd37a052/crates/db/src/kv/mod.rs#L92-L93
I don't think we have any other option. If we want to have control over how things are committed to the database, which is essential to staged sync, then we need transactions. Otherwise, how would e.g. an MDBX impl know when to commit?
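A hedged sketch of what such a transaction abstraction could look like (all names here — `Db`, `DbTx`, `MemDb` — are hypothetical): staged sync gets an explicit commit point, an MDBX backend would map `begin`/`commit` onto real transactions, and an in-memory test db can implement them with buffered writes, which is the "leaky but acceptable" case discussed above.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;

/// A write transaction: put buffers writes, commit makes them durable.
pub trait DbTx {
    fn put(&mut self, key: Vec<u8>, value: Vec<u8>);
    /// Consume the transaction and make all its writes visible.
    fn commit(self) -> Result<(), String>;
}

/// A database that can open transactions.
pub trait Db {
    type Tx: DbTx;
    fn begin(&mut self) -> Self::Tx;
}

// --- Toy in-memory implementation ---

type Store = Rc<RefCell<HashMap<Vec<u8>, Vec<u8>>>>;

pub struct MemTx {
    pending: Vec<(Vec<u8>, Vec<u8>)>,
    store: Store,
}

impl DbTx for MemTx {
    fn put(&mut self, key: Vec<u8>, value: Vec<u8>) {
        // Writes are invisible until commit, mimicking real tx semantics.
        self.pending.push((key, value));
    }
    fn commit(self) -> Result<(), String> {
        let mut store = self.store.borrow_mut();
        for (k, v) in self.pending {
            store.insert(k, v);
        }
        Ok(())
    }
}

#[derive(Default)]
pub struct MemDb {
    store: Store,
}

impl MemDb {
    pub fn get(&self, key: &[u8]) -> Option<Vec<u8>> {
        self.store.borrow().get(key).cloned()
    }
}

impl Db for MemDb {
    type Tx = MemTx;
    fn begin(&mut self) -> MemTx {
        MemTx { pending: Vec::new(), store: Rc::clone(&self.store) }
    }
}
```

Because `commit` takes `self` by value, a committed transaction can't be reused, which encodes the "created and cleared" lifecycle from the linked comment in the type system.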
Re: common types/API etc., I really still don't think the way to go is to have a

The main difference in how data is treated between the different databases is how the data is indexed and colocated. Indexes are a function of table keys, and indexes largely also determine how data is colocated, so I'd assume a more generic abstraction is possible.
Yep. OK sounds good to me. I'm good with that.
Let's have a call tomorrow with @joshieDo and iron this out?
I think we can close this one. Added it to the tracking issue.
WIP
Ideally, `mdbx`, `memory_db` and `static` have a common, well-defined API to query for data. Is it too naive of an approach?

All 3 elements would implement the same `DataApi` trait. Other crates (RPC, pool, executor etc.) request the data through an `ApiManager` (which implements the same trait) without knowing where the data is. `ApiManager` will, according to the parameters, choose the best course of action (e.g. if no block is passed, then it's the tip and will query mdbx).

If we all agree this is the way to go, then I'd ask that you also add whatever API you need here. This also helps out with the schema design, to know what's really needed in the DB/index... or how we can merge/structure data.
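The dispatch idea above can be sketched roughly as follows. Everything here (`DataApi` with a single method, `ApiManager`, `RangeStore`, the `frozen_below` cutoff) is an illustrative assumption, not the PR's actual design — the point is only that the manager implements the same trait as the backends and routes on the parameters:

```rust
type BlockNumber = u64;

#[derive(Debug, Clone, PartialEq)]
pub struct Block {
    pub number: BlockNumber,
}

#[derive(Debug, PartialEq)]
pub enum ApiError {
    NotFound,
}

/// The shared API (reduced to one method for the sketch).
pub trait DataApi {
    fn read_block(&self, number: Option<BlockNumber>) -> Result<Block, ApiError>;
}

/// Routes requests so callers never know where the data lives.
pub struct ApiManager<Hot, Cold> {
    /// e.g. the mdbx-backed store holding recent blocks and the tip.
    pub hot: Hot,
    /// e.g. static files holding frozen history.
    pub cold: Cold,
    /// Blocks below this number live in cold storage (illustrative cutoff).
    pub frozen_below: BlockNumber,
}

impl<Hot: DataApi, Cold: DataApi> DataApi for ApiManager<Hot, Cold> {
    fn read_block(&self, number: Option<BlockNumber>) -> Result<Block, ApiError> {
        match number {
            // No block passed: it's the tip, served by the hot (mdbx) store.
            None => self.hot.read_block(None),
            Some(n) if n < self.frozen_below => self.cold.read_block(Some(n)),
            Some(n) => self.hot.read_block(Some(n)),
        }
    }
}

/// Toy backend serving blocks from a contiguous range.
pub struct RangeStore {
    pub from: BlockNumber,
    pub to: BlockNumber,
}

impl DataApi for RangeStore {
    fn read_block(&self, number: Option<BlockNumber>) -> Result<Block, ApiError> {
        let n = number.unwrap_or(self.to); // default to this store's tip
        if n >= self.from && n <= self.to {
            Ok(Block { number: n })
        } else {
            Err(ApiError::NotFound)
        }
    }
}
```

Since `ApiManager` itself implements `DataApi`, RPC/executor/stages can be written against the trait and tested against an in-memory backend directly.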