Add size hint to serialization path #582

Open: colin-grapl wants to merge 3 commits into main

colin-grapl (Contributor) commented Oct 21, 2022

This is a mostly straightforward change to Value serialization. Previously, buffers used in serialization preallocated capacity based on the in-memory size of a value rather than the number of bytes it occupies once serialized. This is described in two issues:

  1. Capacity for serialization is suboptimal #579
  2. Fix uses of SerializedValues::with_capacity in the driver code #385

I've fixed this by adding a new method to Value, size_hint. It's documented in the code, which I'll inline here:

pub trait Value {
    fn serialize(&self, buf: &mut Vec<u8>) -> Result<(), ValueTooBig>;
    /// A *hint* to callers indicating how much memory the serialized
    /// form of this `Value` will take. This hint is not guaranteed to be
    /// a lower bound, an upper bound, or an exact size. Every implementation
    /// is free to return the "best guess" available.
    /// The default impl returns `std::mem::size_of::<i32>()`, as every
    /// serialized `Value` has at minimum an i32-sized tag.
    fn size_hint() -> usize {
        std::mem::size_of::<i32>()
    }
}

I've then implemented this method on a number of Value impls. The hints tend to be either exact or optimistic. For example, the hint for i8 can be exact:

use std::mem::size_of;

impl Value for i8 {
    fn serialize(&self, buf: &mut Vec<u8>) -> Result<(), ValueTooBig> { /*...*/ }

    fn size_hint() -> usize {
        size_of::<i32>() + size_of::<Self>()
    }
}

In other cases we know a lower bound but may optimize for slightly above that lower bound:

impl Value for &str {
    fn serialize(&self, buf: &mut Vec<u8>) -> Result<(), ValueTooBig> { /*...*/ }
    fn size_hint() -> usize {
        // One i32 for the tag plus three i32-sized words (12 bytes) for
        // characters. This optimizes for the likely case that strings are
        // rarely empty and usually at least a few characters long.
        4 * size_of::<i32>()
    }
}

In the case of &str the true lower bound is size_of::<i32>() (the tag alone, for an empty string), but strings typically aren't empty, and once you are allocating 4 bytes it's reasonable to allocate 16 and cover what is probably the majority case.
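
To make the arithmetic concrete, here is a minimal sketch, assuming the wire layout is a 4-byte big-endian length (the "tag" above) followed by the UTF-8 bytes. `serialize_str`, `str_size_hint`, and `fits_in_hint` are hypothetical helpers written for illustration, not functions from the driver, and the real `serialize` returns `Result<(), ValueTooBig>` instead of casting the length.

```rust
use std::mem::size_of;

/// Assumed layout: 4-byte big-endian length, then the UTF-8 bytes.
/// (Illustrative only: the length cast is unchecked here.)
fn serialize_str(s: &str, buf: &mut Vec<u8>) {
    buf.extend_from_slice(&(s.len() as i32).to_be_bytes());
    buf.extend_from_slice(s.as_bytes());
}

/// The hint from the PR description: 4 * 4 = 16 bytes.
fn str_size_hint() -> usize {
    4 * size_of::<i32>()
}

/// Returns true if serializing `s` into a buffer preallocated with the
/// hint did not change the buffer's capacity (no reallocation happened).
fn fits_in_hint(s: &str) -> bool {
    let mut buf = Vec::with_capacity(str_size_hint());
    let before = buf.capacity();
    serialize_str(s, &mut buf);
    buf.capacity() == before
}
```

Under these assumptions, any string of up to 12 bytes serializes without growing the buffer; longer strings still serialize correctly but trigger a reallocation.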

So far I have only used size_hint in ValueList. I'm not sure whether other places should also use it to preallocate.
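
A rough sketch of how a caller might combine the hints before allocating, assuming a simplified `Value` trait with an infallible `serialize`. The `serialize_pair` helper is hypothetical and only illustrates summing `size_hint()` values; it is not the driver's ValueList code.

```rust
use std::mem::size_of;

/// Simplified stand-in for the driver's `Value` trait (error type omitted).
trait Value {
    fn serialize(&self, buf: &mut Vec<u8>);
    /// Best-guess size of the serialized form, including the i32 length prefix.
    fn size_hint() -> usize {
        size_of::<i32>()
    }
}

impl Value for i8 {
    fn serialize(&self, buf: &mut Vec<u8>) {
        // 4-byte big-endian length prefix, then the single payload byte.
        buf.extend_from_slice(&1i32.to_be_bytes());
        buf.extend_from_slice(&self.to_be_bytes());
    }
    fn size_hint() -> usize {
        size_of::<i32>() + size_of::<Self>()
    }
}

impl Value for i32 {
    fn serialize(&self, buf: &mut Vec<u8>) {
        // 4-byte big-endian length prefix, then the 4 payload bytes.
        buf.extend_from_slice(&4i32.to_be_bytes());
        buf.extend_from_slice(&self.to_be_bytes());
    }
    fn size_hint() -> usize {
        size_of::<i32>() + size_of::<Self>()
    }
}

/// Hypothetical helper: serialize two values into one buffer whose
/// initial capacity is the sum of their hints.
fn serialize_pair<A: Value, B: Value>(a: &A, b: &B) -> Vec<u8> {
    let mut buf = Vec::with_capacity(A::size_hint() + B::size_hint());
    a.serialize(&mut buf);
    b.serialize(&mut buf);
    buf
}
```

For fixed-size types the hint is exact, so a call such as `serialize_pair(&7i8, &42i32)` fills the buffer without reallocating.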

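The results are modest: about 6% fewer instructions and 4% fewer estimated cycles in the iai benchmark, with RAM accesses essentially unchanged.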

main
serialize_lz4_for_iai
  Instructions:               10459
  L1 Accesses:                13456
  L2 Accesses:                   32
  RAM Accesses:                 156
  Estimated Cycles:           19076

fix
serialize_lz4_for_iai
  Instructions:                9838 (-5.937470%)
  L1 Accesses:                12605 (-6.324316%)
  L2 Accesses:                   32 (No change)
  RAM Accesses:                 157 (+0.641026%)
  Estimated Cycles:           18260 (-4.277626%)

Ultimately I think that all of these individual allocations can be removed and, instead, a single buffer could be used. But this is a relatively small, non-breaking change, and it improves memory usage.

Pre-review checklist

(idk what Fixes annotations are, but)

Fixes: #579

  • I have split my patch into logically separate commits.
  • All commit messages clearly explain what they change and why.
  • I added relevant tests for new features and bug fixes.
  • All commits compile, pass static checks and pass tests.
  • PR description sums up the changes and reasons why they should be introduced.
  • [?] I added appropriate Fixes: annotations to PR description.

piodul (Collaborator) left a comment

You have shown results from the serialize_lz4_for_iai benchmark, but I can't find it. Could you point me to it or tell what it does?

Comment on lines +460 to +462
// 1i32 for the tag, 3i32 for additional bytes. This optimizes
// for the likely case that bytes will rarely be empty
2 * size_of::<i32>()
Collaborator

The comment mentions (1 + 3) * i32, but the implementation returns a size for (1 + 1) * i32.

@@ -372,6 +423,10 @@ impl Value for BigInt {

Ok(())
}
fn size_hint() -> usize {
// Internally the smallest BigInt is [u64; 2]
Collaborator

Could you elaborate on the choice of the size? The Rust BigInt type represents varint in the CQL spec, and the size_of (which I guess you meant by "internal size") has nothing to do with its serialized size.

Reply:

BigInt as implemented by num_bigint is internally just a [u64;2]

Comment on lines +618 to +619
// Size, number of keys, assume not empty
4 * size_of::<i32>()
Collaborator

Could you elaborate on this? One i32 for the serialized size, one i32 for the element count, but what about the remaining 2 * i32?

Reply:

I would imagine that that's optimizing for the assumption of non-emptiness.

@@ -620,6 +732,12 @@ impl Value for CqlValue {
}
}

// utility macro
macro_rules! _count {
Collaborator

Underscore at the beginning of the name is usually used only for names that you are going to ignore later. Please align the name with Rust conventions.

$(
result.add_value(&self.$FieldI) ?;
)*

Collaborator

Nit: unnecessary whitespace change

@@ -639,6 +757,8 @@ macro_rules! impl_value_for_tuple {

Ok(())
}

fn size_hint() -> usize { size_of::<i32>() + _count!($($FieldI)*) * size_of::<i32>() }
Collaborator

How about invoking size_hint for each element type of the tuple and returning the sum?
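
One way this suggestion could look, as a sketch only: the trait below is a simplified stand-in with `serialize` omitted, and `impl_size_hint_for_tuple` is a hypothetical macro name, not the driver's `impl_value_for_tuple`. It assumes the tuple contributes one i32 of its own and that each element's hint already includes that element's length prefix.

```rust
use std::mem::size_of;

/// Simplified stand-in for the driver's `Value` trait (`serialize` omitted).
trait Value {
    fn size_hint() -> usize {
        size_of::<i32>()
    }
}

impl Value for i8 {
    fn size_hint() -> usize {
        size_of::<i32>() + size_of::<Self>()
    }
}

impl Value for i64 {
    fn size_hint() -> usize {
        size_of::<i32>() + size_of::<Self>()
    }
}

/// Hypothetical macro: a tuple's hint is one i32 for the tuple itself
/// plus the sum of the hints of its element types.
macro_rules! impl_size_hint_for_tuple {
    ($($T:ident),+) => {
        impl<$($T: Value),+> Value for ($($T,)+) {
            fn size_hint() -> usize {
                size_of::<i32>() $(+ <$T as Value>::size_hint())+
            }
        }
    };
}

impl_size_hint_for_tuple!(A);
impl_size_hint_for_tuple!(A, B);
impl_size_hint_for_tuple!(A, B, C);
```

Summing the element hints removes the need for a separate counting macro, and the tuple's hint automatically improves whenever an element type's hint does.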

havaker (Contributor) commented Nov 2, 2022

What is the motivation behind making size_hint an associated function instead of a method?

As far as I understand, making it a method would allow hints to be more precise, e.g. it would be possible to take the &str length into account when computing its size hint.
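
A minimal sketch of the method-based alternative, assuming a simplified `Value` trait with an infallible `serialize` and a wire layout of a 4-byte big-endian length followed by the bytes. `serialize_exact` is a hypothetical helper, and the unchecked length cast stands in for the real `ValueTooBig` handling.

```rust
use std::mem::size_of;

/// Simplified stand-in for the driver's `Value` trait (error type omitted).
trait Value {
    fn serialize(&self, buf: &mut Vec<u8>);
    /// Method form: the hint may inspect the value itself.
    fn size_hint(&self) -> usize {
        size_of::<i32>()
    }
}

impl Value for &str {
    fn serialize(&self, buf: &mut Vec<u8>) {
        // 4-byte big-endian length prefix, then the UTF-8 bytes.
        buf.extend_from_slice(&(self.len() as i32).to_be_bytes());
        buf.extend_from_slice(self.as_bytes());
    }
    fn size_hint(&self) -> usize {
        // Exact for strings: length prefix plus payload.
        size_of::<i32>() + self.len()
    }
}

/// Hypothetical helper: allocate exactly what the value reports.
fn serialize_exact<V: Value>(v: &V) -> Vec<u8> {
    let mut buf = Vec::with_capacity(v.size_hint());
    v.serialize(&mut buf);
    buf
}
```

Apart from precision, a `&self` receiver keeps the function callable through `dyn Value`; in Rust, an associated function without a receiver makes a trait unusable as a trait object unless that function is bounded by `where Self: Sized`.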

mykaul (Contributor) commented Jan 31, 2023

@colin-grapl - can you respond to the review comments?

insanitybit commented Jun 26, 2023

Hey, sorry, not long after this PR the company I worked for was dissolved. I'm no longer working on anything Scylla related. I apologize for any reviewer time that may have been wasted on this, but I'd also be happy to hand off anything related to this work. I'll do some responses here based on what I recall, at least, so that if someone does want to pick this up they have the option to do so.

@insanitybit

Quoting @piodul: "You have shown results from the serialize_lz4_for_iai benchmark, but I can't find it. Could you point me to it or tell what it does?"

https://github.com/bheisler/iai

It was an iai benchmark that I no longer have access to.

@wprzytula added this to the 1.0.0 milestone Jun 20, 2024
@wprzytula added the performance label (Improves performance of existing features) Jun 20, 2024