Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(db): codec encoding/decoding #51

Merged
merged 43 commits into from
Oct 17, 2022
Merged

feat(db): codec encoding/decoding #51

merged 43 commits into from
Oct 17, 2022

Conversation

joshieDo
Copy link
Collaborator

@joshieDo joshieDo commented Oct 12, 2022

Summary:

When data is to be stored or read from the database, the type should implement table::Encoding and table::Decoding so it knows how to transform from a structured type to a list of bytes.

However, at the same time we have to care for how we encode this data. Either in a way that helps with sorting (Table::Key) or compactness (Table::Value). That's why we add external codecs such as scale for example.

Having them to be configurable through annotations allows us to contain their implementations/leakage to isolated portions of the project, and allows us to switch between them relatively easily (eg. benchmarking).

old

Added scale encoding, which allowed to remove most of the impl of some basic types.

Added Headers example.

Although, having to put #[codec(compact)] in every integer is kinda of annoying. I was kinda hoping it would do it by default. Another limitation of the lib is that inner types are not getting compacted at all (Vec, Option, etc...)


Added another one called postcard, which seemed to be good on the benchmarks. However, when I tested it with Account... it didn't perform as I expected it. I'd expect the maxsize to be 72 (u64 + 32bytes + 32bytes)... but it's encoding it to 144 bytes.. which is strange. Maybe I'm missing something.

@joshieDo joshieDo added the A-db Related to the database label Oct 12, 2022
/// BlockNumber concatenated with BlockHash. Used as a key for multiple tables. Having the first
/// element as BlockNumber, helps out with querying/sorting.
#[derive(Debug)]
#[allow(non_camel_case_types)]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we want non camel case here?

Copy link
Collaborator Author

@joshieDo joshieDo Oct 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#29 (comment)

I wanted some representation of a concatenation of (BlockNumber, BlockHash, TxNumber), open to change it man_shrugging

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

imo just using camel case is sufficient

crates/db/src/kv/table.rs Show resolved Hide resolved
#[allow(non_camel_case_types)]
type BNum_BHash_TxId = Vec<u8>;
type RlpHeader = Vec<u8>;
type BlockNumber_BlockHash_TxNumber = Vec<u8>;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same q

where
T: ScaleOnly + parity_scale_codec::Decode + Sync + Send + std::fmt::Debug,
{
fn decode(value: bytes::Bytes) -> Result<T, KVError> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make this take Into<Bytes>?

@gakonst
Copy link
Member

gakonst commented Oct 12, 2022

if !path.path.segments.is_empty() {
let _type = format!("{}", path.path.segments[0].ident);
if compactable_types.contains(&_type.as_str()) {
field.attrs.push(syn::parse_quote! {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if there's a better way to check the type

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mattsse may know, but seems fine to me, per L22 above

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems fine for now

crates/db/src/kv/table.rs Show resolved Hide resolved
impl Decode for BlockNumHash {
fn decode<B: Into<Bytes>>(value: B) -> Result<Self, KVError> {
let value = value.into();
Ok(BlockNumHash((u64::decode(value.clone())?, H256::decode(value.slice(8..))?)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can avoid the clone here by doing the u64::decode(&value) in a separate line from value.slice? else let's impl decode for refs

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the decode object is often provided as mu, &mut BufMut for example.

this way you can take what you want from a buffer and advance it once done.


/// Account saved in database
#[use_postcard]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be #[main_codec]`?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added it just to showcase that both can be used, but yea

if !path.path.segments.is_empty() {
let _type = format!("{}", path.path.segments[0].ident);
if compactable_types.contains(&_type.as_str()) {
field.attrs.push(syn::parse_quote! {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mattsse may know, but seems fine to me, per L22 above

crates/db/src/kv/table.rs Show resolved Hide resolved
@@ -0,0 +1,58 @@
//! Block related models and types.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's also that similar XForStorage types that Akula has, which we maybe could adopt as a pattern

@gakonst
Copy link
Member

gakonst commented Oct 13, 2022

This all looks good as it sets up the infra for encoding/decoding with configurable codec.

Next steps:

  1. Ensure we have the derive everywhere needed
  2. Add missing types
  3. Test ser/de (I'd strongly encourage we use proptest and fuzz-test every single ser/de function)
  4. Benchmarks for ser, de & disk-size for every type, for each codec

Can be done in follow-up PR or any additional small changes you wanna do.

crates/db/src/kv/table.rs Show resolved Hide resolved
crates/db/src/kv/table.rs Show resolved Hide resolved
@@ -0,0 +1,63 @@
use proc_macro::{self, TokenStream};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's move this to derive folder which is it's own proc-macro crate

if !path.path.segments.is_empty() {
let _type = format!("{}", path.path.segments[0].ident);
if compactable_types.contains(&_type.as_str()) {
field.attrs.push(syn::parse_quote! {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems fine for now

@joshieDo joshieDo mentioned this pull request Oct 14, 2022
6 tasks
Copy link
Collaborator

@mattsse mattsse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I understood it, this serves as a way to easily switch between codecs which we'll make use of for benchmarks?

once we settled on a codec this will be become obsolete, right?

can we add this to the readme?

crates/codecs/derive/Cargo.toml Show resolved Hide resolved
@onbjerg onbjerg added the C-enhancement New feature or request label Oct 14, 2022
@joshieDo joshieDo requested a review from mattsse October 17, 2022 06:05
@@ -11,6 +11,12 @@ description = "Staged syncing primitives used in reth."
# reth
reth-primitives = { path = "../primitives" }

# codecs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't these be optional?

@@ -0,0 +1,2 @@
mod postcard;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these be feature gated?

reth-rlp = { path = "../common/rlp", features = ["derive"]}
parity-scale-codec = { version = "3.2.1", features = ["derive", "bytes"] }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this necessary?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, we're using scale, so when the macro #[main_codec] gets expanded, it expects parity-scale-codec to be reachable

Copy link
Collaborator

@mattsse mattsse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@joshieDo
Copy link
Collaborator Author

joshieDo commented Oct 17, 2022

couldn't find a straightforward way to include/exclude code & dependencies depending on a downstream dependency feature (primitives -> codecs!) .

the impl code should get optimized out anyway, but yeah it slows down compilation i guess.

if it's okay, i'll look into it another time

@gakonst gakonst merged commit 063b444 into main Oct 17, 2022
@gakonst gakonst deleted the joshie/scale branch October 17, 2022 08:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-db Related to the database C-enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants