-
Notifications
You must be signed in to change notification settings - Fork 59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce alloc in copy_to_overlay #178
Reduce alloc in copy_to_overlay #178
Conversation
*bytes += key.len(); | ||
*bytes += value.len(); | ||
*bytes += key.value().len(); | ||
*bytes += value.value().len(); | ||
overlay.insert(key.clone(), (record_id, Some(value.clone()))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
key
and value
are Arc<Vec<u8>>
now
@@ -1313,7 +1364,7 @@ impl IndexedChangeSet { | |||
match &change { | |||
Operation::Set(k, v) => { | |||
*bytes += k.len(); | |||
*bytes += v.len(); | |||
*bytes += v.value().len(); | |||
overlay.indexed.insert(*k, (record_id, Some(v.clone()))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
value
is Arc<Vec<u8>>
now
c8c585c
to
095fc7b
Compare
095fc7b
to
6e7ffb6
Compare
Thank you for the PR. |
src/db.rs
Outdated
@@ -56,6 +57,47 @@ const KEEP_LOGS: usize = 16; | |||
/// Value is just a vector of bytes. Value sizes up to 4Gb are allowed. | |||
pub type Value = Vec<u8>; | |||
|
|||
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)] | |||
pub struct ValuePtr(Arc<Value>); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is used for both the key and the value, it should be named in a more generic way. Something like SharedVec
or ArcVec
or ArcBuf
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this nice change! I just share the same naming concern as @arkpar
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, it sounds like a good idea to switch to Rc in the change overlay.
src/db.rs
Outdated
@@ -56,6 +57,47 @@ const KEEP_LOGS: usize = 16; | |||
/// Value is just a vector of bytes. Value sizes up to 4Gb are allowed. | |||
pub type Value = Vec<u8>; | |||
|
|||
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)] | |||
pub struct ValuePtr(Arc<Value>); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would also maybe just consider a type alias like type RcKey = Arc<Key>; type RcValue = Arc<Value>
to reduce the boilerplate implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renaming done
src/db.rs
Outdated
} | ||
} | ||
|
||
impl<const N: usize> TryFrom<ValuePtr> for [u8; N] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this function used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's used in tests, let me mark it as #[cfg(test)]
src/btree/iter.rs
Outdated
At(Vec<u8>), | ||
Seeked(Vec<u8>), | ||
At(ValuePtr), | ||
Seeked(ValuePtr), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am a bit hesitant to switch this one to rc as it changes the iterator api. Seeked is not really useful, At will be to save one key memory per iterator and a mem copy of the key on all alloc (could use an allocated buffer but not sure if relevant) but the iterator is nowhere near performant or trying to be.
Personally I would rather keep the rc to the change overlay only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you elaborate on which API to keep unchanged? I just tried reverting it here but it does not change any API signature. commit: 1af4855
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just iterator next
and prev
returns IterResult that used to contain plain Vec (as used in the fuzzer, I mean the change to the fuzzer could also be reverted then).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cheme Done. Please take another look. (Some changes come from cargo clippy --fix
and I didn't revert them)
Where can I find the stress test code? In my scenario, I have ~70M pairs to inject upfront so it's a write-only scenario, and I have to commit batches at 1GB or 2GB to work around the hard-coded 16MB write queue size (This reduces time cost by ~50%). This change does help my scenario a bit |
cargo run -p parity-db-admin -- stress --help |
@cheme thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Works for me.
I notice that the
copy_to_overlay
method clones thevalue
vector in Hash index mode and clones both thekey
andvalue
vectors in Btree index mode, which seems to be an unnecessary overhead, this PR tries to, instead, clone the pointer typeArc<Vec<u8>>
which is much cheaper, please let me know your feedback, thanks!