
Performance Improvements #1

Open
2 of 3 tasks
mccolljr opened this issue Jun 30, 2021 · 9 comments

Comments

@mccolljr
Owner

mccolljr commented Jun 30, 2021

Right now, the implementation is significantly slower than Vec for most things. This can be fixed.

TODO:

  • insert, remove, and drain can copy memory in bulk rather than shifting elements one at a time
  • the sort implementation can be optimized, and should probably be more than a naive quicksort
  • use SegVec::with_capacity in the FromIterator implementation
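The third item can be sketched roughly as follows. This is a hedged illustration using a hypothetical stand-in type (`MiniVec`), not the real `SegVec` internals: pre-size the collection from the iterator's `size_hint` lower bound so that collecting a sized iterator performs a single up-front allocation.

```rust
// Hypothetical stand-in for SegVec; the real type is segmented, not Vec-backed.
struct MiniVec<T> {
    data: Vec<T>,
}

impl<T> MiniVec<T> {
    fn with_capacity(cap: usize) -> Self {
        MiniVec { data: Vec::with_capacity(cap) }
    }
    fn push(&mut self, v: T) {
        self.data.push(v);
    }
    fn len(&self) -> usize {
        self.data.len()
    }
}

impl<T> FromIterator<T> for MiniVec<T> {
    fn from_iter<I: IntoIterator<Item = T>>(iter: I) -> Self {
        let it = iter.into_iter();
        // Reserve based on size_hint's lower bound: for exact-sized
        // iterators this makes the collect a single allocation.
        let (lower, _) = it.size_hint();
        let mut out = MiniVec::with_capacity(lower);
        for v in it {
            out.push(v);
        }
        out
    }
}
```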
@mccolljr changed the title from "Performance Improvement" to "Performance Improvements" on Jun 30, 2021
@connorskees

@mccolljr I wonder if it's possible to store `Vec<[Option<T>; 4]>` rather than `Vec<Vec<T>>`. This should avoid the double indirection, at the potential expense of memory. Though I think it might not be much worse in terms of memory usage than storing vectors, since small vectors grow from 0 to 4 elements anyway.

It also might make sense to write a wrapper struct that enforces the invariant of contiguous elements (i.e. never having an array that is `Some(..), None, Some(..)`):

struct Segment<T>([Option<T>; 4]);

struct SegVec<T> {
    cap: usize,
    size: usize,
    segments: Vec<Segment<T>>,
}

@mccolljr
Owner Author

mccolljr commented Jul 3, 2021

@connorskees That's a good idea to reduce the overhead of each segment, but I think it wouldn't work here since one of the properties I want to maintain is that each doubling of vector size only requires a single allocation, which means that each new segment is necessarily progressively larger than the last. However, I've been playing with this over the last day and I've realized that:

  1. it is possible to allow the user to specify the base segment size (which can reduce the overhead of the storage by requiring fewer chunks), and
  2. because the len and capacity of each vector are deterministic for any fixed number of elements and a fixed base segment size, I can get away with storing just the head pointer for each segment, which reduces the size of each segment reference from `3 * size_of::<usize>()` to just `size_of::<usize>()`, and
  3. I can use the smallvec optimization instead of a plain Vec<Segment<T>> to optimize the size of the representation of segments

Doing this could allow an arbitrarily large number of items to be stored in the SegVec with only one level of indirection required for data access, provided a sufficient N (number of segment headers kept on the stack) and base segment size are chosen.
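Point 2 above can be made concrete with a small arithmetic sketch. Assuming a power-of-two base segment size B and doubling segments (sizes B, B, 2B, 4B, ...), the segment holding a given element index, and the offset within that segment, are pure arithmetic, so only a head pointer per segment needs to be stored. The function name and scheme here are illustrative, not the crate's actual API:

```rust
/// Map a global element index to (segment number, offset within segment),
/// assuming segment sizes B, B, 2B, 4B, ... so that total capacity after
/// segment s is B * 2^s. `base` must be a power of two.
fn locate(index: usize, base: usize) -> (usize, usize) {
    debug_assert!(base.is_power_of_two());
    let q = index / base;
    if q == 0 {
        // First `base` elements live in segment 0.
        (0, index)
    } else {
        // Segment s (s >= 1) starts at global index base * 2^(s-1),
        // where s - 1 = floor(log2(q)).
        let s = (usize::BITS - 1 - q.leading_zeros()) as usize + 1;
        let start = base << (s - 1);
        (s, index - start)
    }
}
```

Because this mapping is deterministic, the length and capacity of every segment never need to be stored, which is what lets each segment reference shrink to a single pointer.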

@mccolljr
Owner Author

Status on this:

I modified the remove and insert implementations, and parts of the drain implementation, to use `std::ptr::copy` where applicable, which should hopefully offer some improvement to the timing there. I need to revisit that code and do a cleanup pass.
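The bulk-shift idea can be sketched on a plain `Vec` for illustration: remove an element by moving the tail down with one overlapping copy (`std::ptr::copy`, a memmove) instead of shifting one element at a time. This mirrors what `Vec::remove` does internally; the segvec code applies the same idea per segment.

```rust
/// Remove and return the element at `index`, shifting the tail down
/// with a single overlapping copy instead of an element-by-element loop.
fn remove_at<T>(v: &mut Vec<T>, index: usize) -> T {
    let len = v.len();
    assert!(index < len);
    unsafe {
        let p = v.as_mut_ptr().add(index);
        // Move the element out; its slot is logically vacant afterwards.
        let removed = std::ptr::read(p);
        // Shift everything after `index` left by one in a single memmove.
        std::ptr::copy(p.add(1), p, len - index - 1);
        v.set_len(len - 1);
        removed
    }
}
```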

@cehteh
Contributor

cehteh commented Jun 12, 2023

Notice from the performance department:

I expressed before that I am not very fond of using 'thin-vec', now while playing around with the benchmark and different features (trying to see if cacheline alignment would speed things up) I noticed that thin-vec makes things much slower (30-60% decrease in performance). I'd expected it to be at least as fast as without :/

@mccolljr
Owner Author

> Notice from the performance department:
>
> I expressed before that I am not very fond of using 'thin-vec', now while playing around with the benchmark and different features (trying to see if cacheline alignment would speed things up) I noticed that thin-vec makes things much slower (30-60% decrease in performance). I'd expected it to be at least as fast as without :/

This is good to know. This was an early attempt to keep the segment header size as small as possible, but the extra indirection needed to access segment length and capacity probably causes cache misses, for a very marginal saving in segment header size.

When I have some time this week I intend to abstract the storage into a trait so that a consumer of the data structure can choose their own storage implementation, with a provided implementation for Vec as the default. This should allow me to eliminate both cargo features; if someone wants thin-vec support, they can implement the trait for a wrapper type.
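A minimal sketch of what such a storage trait could look like. The trait name and method set are assumptions, not the crate's final design; the point is that `Vec<T>` satisfies it out of the box and a `ThinVec` user could implement it for their own wrapper type, removing the need for cargo features:

```rust
/// Hypothetical per-segment storage abstraction.
trait SegmentStorage<T> {
    fn with_capacity(cap: usize) -> Self;
    fn len(&self) -> usize;
    fn push(&mut self, value: T);
    fn get(&self, i: usize) -> Option<&T>;
}

// Provided default implementation, delegating to Vec's inherent methods.
impl<T> SegmentStorage<T> for Vec<T> {
    fn with_capacity(cap: usize) -> Self {
        Vec::with_capacity(cap)
    }
    fn len(&self) -> usize {
        Vec::len(self)
    }
    fn push(&mut self, value: T) {
        Vec::push(self, value)
    }
    fn get(&self, i: usize) -> Option<&T> {
        self.as_slice().get(i)
    }
}
```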

@cehteh
Contributor

cehteh commented Jun 12, 2023

I am playing with what I noted before:

pub mod detail {
    use super::MemConfig;
    #[cfg(feature = "raw-alloc")]
    pub struct Segment<T, C: MemConfig>(*mut T);
    #[cfg(feature = "thin-segments")]
    pub struct Segment<T, C: MemConfig>(thin_vec::ThinVec<T>);
    #[cfg(not(feature = "thin-segments"))]
    pub struct Segment<T, C: MemConfig>(Vec<T>);

    // ...
}

This way `Segments` becomes a collection of thin pointers (later, ideally, a SmallVec), and each `Segment` is a raw, unsized array without header/metadata. The size comes from `MemConfig`.
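A sketch of what the `MemConfig` arithmetic might look like: given a segment number, return its size, so a raw `*mut T` segment needs no stored length. The `Linear`/`Exponential` names follow the crate's configs, but the trait shape and const-generic base here are illustrative assumptions:

```rust
trait MemConfig {
    /// Size of segment number `n` (0-based), computed arithmetically.
    fn segment_size(n: usize) -> usize;
}

/// Every segment holds B elements.
struct Linear<const B: usize>;
impl<const B: usize> MemConfig for Linear<B> {
    fn segment_size(_n: usize) -> usize {
        B
    }
}

/// Segment sizes B, B, 2B, 4B, ...: each doubling of total
/// capacity needs only one new allocation.
struct Exponential<const B: usize>;
impl<const B: usize> MemConfig for Exponential<B> {
    fn segment_size(n: usize) -> usize {
        if n == 0 { B } else { B << (n - 1) }
    }
}
```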

@cehteh
Contributor

cehteh commented Jun 12, 2023


When you make this a trait, please pass `C: MemConfig` as a parameter to the trait! I think I'll wait for you to do this and then continue on the raw-alloc work.

@cehteh
Contributor

cehteh commented Jun 13, 2023

`Segments<T, C: MemConfig>` should become a newtype with a trait too.

Rationale: `MemConfig` determines `segment_size()` arithmetically, but Vec/slice-based storage backends could query that without calculation; only my planned thin `*mut T` backend would need the arithmetic here.

@cehteh
Contributor

cehteh commented Jun 20, 2023

Another note, mostly for myself, but opinions would be welcome:

While the 'SegmentCache' experiment worked, it turned out to be way too intrusive to be useful, and 'Linear' outperformed it in many cases.

I am thinking about reviving the caching idea only for Slices:

trait SegmentCache {
   //...
}

struct NoCache; // ZST, the default does nothing

struct Cache {
    segment_start: usize,
    segment_end: usize,
    segment: usize,
}

impl SegmentCache for NoCache { /*NOP proxies*/}
impl SegmentCache for Cache { /*.. do the caching here..*/}

pub struct Slice<'a, T: 'a, C: SegmentCache = NoCache> {
    inner: &'a (dyn SegmentIndex<T>),
    start: usize,
    len: usize,
    cache: C,
}

So far that's only at the brainstorming stage.

So caching will be opt-in with some overhead (3 usize). The default will stay as is.

It would be implemented only for Slice, not SliceMut, since caching only works well for accesses within the same segment. One should clone Slices when working on different parts of the Slice, and SliceMut can't be cloned (eventually some SliceMut::split_at_mut() may become handy, then caching may be OK there too).

I have no immediate need for this; I'm just writing it down here. It would be low-hanging fruit to implement later, and would improve indexing operations for Slices into Proportional- and Exponential-configured SegVecs.
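The fast path of the proposed cache can be sketched as follows. The struct fields follow the `Cache` outline above; the `slow_path` closure stands in for whatever full segment lookup the SegVec performs, so this is an illustration of the idea rather than a real implementation:

```rust
/// Remembers which segment served the last access.
struct Cache {
    segment_start: usize,
    segment_end: usize, // exclusive
    segment: usize,
}

impl Cache {
    /// Return the segment holding `index`, hitting the cache when the
    /// index falls inside the last segment looked up. `slow_path` maps
    /// an index to (segment, segment_start, segment_end).
    fn lookup(
        &mut self,
        index: usize,
        slow_path: impl Fn(usize) -> (usize, usize, usize),
    ) -> usize {
        if index >= self.segment_start && index < self.segment_end {
            // Fast path: same segment as the previous access.
            return self.segment;
        }
        let (seg, start, end) = slow_path(index);
        *self = Cache { segment_start: start, segment_end: end, segment: seg };
        seg
    }
}
```

A `NoCache` ZST would implement the same interface with the fast path compiled out, so the default pays no overhead.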


3 participants