
Performance Improvements #1

Open
2 of 3 tasks
mccolljr opened this issue Jun 30, 2021 · 9 comments

Comments

@mccolljr
Owner

mccolljr commented Jun 30, 2021

Right now, the implementation is significantly slower than Vec for most things. This can be fixed.

TODO:

  • insert, remove, and drain can copy memory in bulk rather than shifting elements one at a time
  • the sort implementation can be optimized, and should probably be more than a naive quicksort
  • use SegVec::with_capacity in the FromIterator implementation
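The third item can be sketched roughly as follows. This is a hedged illustration using a hypothetical stand-in type (`MiniVec`), not the real `SegVec` internals: pre-size the collection from the iterator's `size_hint` lower bound so that collecting a sized iterator performs a single up-front allocation.

```rust
// Hypothetical stand-in for SegVec; the real type is segmented, not Vec-backed.
struct MiniVec<T> {
    data: Vec<T>,
}

impl<T> MiniVec<T> {
    fn with_capacity(cap: usize) -> Self {
        MiniVec { data: Vec::with_capacity(cap) }
    }
    fn push(&mut self, v: T) {
        self.data.push(v);
    }
    fn len(&self) -> usize {
        self.data.len()
    }
}

impl<T> FromIterator<T> for MiniVec<T> {
    fn from_iter<I: IntoIterator<Item = T>>(iter: I) -> Self {
        let it = iter.into_iter();
        // Reserve based on size_hint's lower bound: for exact-sized
        // iterators this makes the collect a single allocation.
        let (lower, _) = it.size_hint();
        let mut out = MiniVec::with_capacity(lower);
        for v in it {
            out.push(v);
        }
        out
    }
}
```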
@mccolljr changed the title from "Performance Improvement" to "Performance Improvements" on Jun 30, 2021
@connorskees

@mccolljr I wonder if it's possible to store `Vec<[Option<T>; 4]>` rather than `Vec<Vec<T>>`. This should avoid the double indirection, at the potential expense of memory. Though I think it might not be much worse in terms of memory usage than storing vectors, since small vectors grow from 0 to 4 elements anyway.

It also might make sense to write a wrapper struct that enforces the invariant of contiguous elements (i.e. never having an array that is `Some(..), None, Some(..)`):

struct Segment<T>([Option<T>; 4]);

struct SegVec<T> {
    cap: usize,
    size: usize,
    segments: Vec<Segment<T>>,
}

@mccolljr
Owner Author

mccolljr commented Jul 3, 2021

@connorskees That's a good idea to reduce the overhead of each segment, but I think it wouldn't work here since one of the properties I want to maintain is that each doubling of vector size only requires a single allocation, which means that each new segment is necessarily progressively larger than the last. However, I've been playing with this over the last day and I've realized that:

  1. it is possible to allow the user to specify the base segment size (which can reduce the overhead of the storage by requiring fewer chunks), and
  2. because the len and capacity of each vector are deterministic for any fixed number of elements and a fixed base segment size, I can get away with storing just the head pointer for each segment, which reduces the size of each segment reference from `3 * size_of::<usize>()` to just `size_of::<usize>()`, and
  3. I can use the smallvec optimization instead of a plain Vec<Segment<T>> to optimize the size of the representation of segments

Doing this could allow an arbitrarily large number of items to be stored in the SegVec with only one level of indirection required for data access, provided a sufficient N (number of segment headers kept on the stack) and base segment size are chosen.
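Point 2 above can be made concrete with a small arithmetic sketch. Assuming a power-of-two base segment size B and doubling segments (sizes B, B, 2B, 4B, ...), the segment holding a given element index, and the offset within that segment, are pure arithmetic, so only a head pointer per segment needs to be stored. The function name and scheme here are illustrative, not the crate's actual API:

```rust
/// Map a global element index to (segment number, offset within segment),
/// assuming segment sizes B, B, 2B, 4B, ... so that total capacity after
/// segment s is B * 2^s. `base` must be a power of two.
fn locate(index: usize, base: usize) -> (usize, usize) {
    debug_assert!(base.is_power_of_two());
    let q = index / base;
    if q == 0 {
        // First `base` elements live in segment 0.
        (0, index)
    } else {
        // Segment s (s >= 1) starts at global index base * 2^(s-1),
        // where s - 1 = floor(log2(q)).
        let s = (usize::BITS - 1 - q.leading_zeros()) as usize + 1;
        let start = base << (s - 1);
        (s, index - start)
    }
}
```

Because this mapping is deterministic, the length and capacity of every segment never need to be stored, which is what lets each segment reference shrink to a single pointer.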

@mccolljr
Owner Author

Status on this:

I modified the remove and insert implementations, and parts of the drain implementation, to use `std::ptr::copy` where applicable, which should hopefully offer some improvement to the timing there. I need to revisit that code and do a cleanup pass.
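The bulk-shift idea can be sketched on a plain `Vec` for illustration: remove an element by moving the tail down with one overlapping copy (`std::ptr::copy`, a memmove) instead of shifting one element at a time. This mirrors what `Vec::remove` does internally; the segvec code applies the same idea per segment.

```rust
/// Remove and return the element at `index`, shifting the tail down
/// with a single overlapping copy instead of an element-by-element loop.
fn remove_at<T>(v: &mut Vec<T>, index: usize) -> T {
    let len = v.len();
    assert!(index < len);
    unsafe {
        let p = v.as_mut_ptr().add(index);
        // Move the element out; its slot is logically vacant afterwards.
        let removed = std::ptr::read(p);
        // Shift everything after `index` left by one in a single memmove.
        std::ptr::copy(p.add(1), p, len - index - 1);
        v.set_len(len - 1);
        removed
    }
}
```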

@cehteh
Contributor

cehteh commented Jun 12, 2023

Notice from the performance department:

I expressed before that I am not very fond of using 'thin-vec', now while playing around with the benchmark and different features (trying to see if cacheline alignment would speed things up) I noticed that thin-vec makes things much slower (30-60% decrease in performance). I'd expected it to be at least as fast as without :/

@mccolljr
Owner Author

> Notice from the performance department:
>
> I expressed before that I am not very fond of using 'thin-vec', now while playing around with the benchmark and different features (trying to see if cacheline alignment would speed things up) I noticed that thin-vec makes things much slower (30-60% decrease in performance). I'd expected it to be at least as fast as without :/

This is good to know. This was an early attempt to keep the segment header size as small as possible, but the extra indirection needed to access segment length and capacity probably causes cache misses, for a very marginal saving in segment header size.

When I have some time this week I intend to abstract the storage into a trait so that a consumer of the data structure can choose their own storage implementation, with a provided implementation for Vec as the default. This should allow me to eliminate both cargo features; if someone wants thin-vec support, they can implement the trait for a wrapper type.
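A minimal sketch of what such a storage trait could look like. The trait name and method set are assumptions, not the crate's final design; the point is that `Vec<T>` satisfies it out of the box and a `ThinVec` user could implement it for their own wrapper type, removing the need for cargo features:

```rust
/// Hypothetical per-segment storage abstraction.
trait SegmentStorage<T> {
    fn with_capacity(cap: usize) -> Self;
    fn len(&self) -> usize;
    fn push(&mut self, value: T);
    fn get(&self, i: usize) -> Option<&T>;
}

// Provided default implementation, delegating to Vec's inherent methods.
impl<T> SegmentStorage<T> for Vec<T> {
    fn with_capacity(cap: usize) -> Self {
        Vec::with_capacity(cap)
    }
    fn len(&self) -> usize {
        Vec::len(self)
    }
    fn push(&mut self, value: T) {
        Vec::push(self, value)
    }
    fn get(&self, i: usize) -> Option<&T> {
        self.as_slice().get(i)
    }
}
```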

@cehteh
Contributor

cehteh commented Jun 12, 2023

I am playing with what I noted before:

pub mod detail {
    use super::MemConfig;
    #[cfg(feature = "raw-alloc")]
    pub struct Segment<T, C: MemConfig>(*mut T);
    #[cfg(feature = "thin-segments")]
    pub struct Segment<T, C: MemConfig>(thin_vec::ThinVec<T>);
    #[cfg(not(feature = "thin-segments"))]
    pub struct Segment<T, C: MemConfig>(Vec<T>);

    // ...
}

This way `Segments` becomes a collection of thin pointers (later, ideally, a SmallVec), and each `Segment` is a raw, unsized array without header/metadata. The size comes from `MemConfig`.
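A sketch of what the `MemConfig` arithmetic might look like: given a segment number, return its size, so a raw `*mut T` segment needs no stored length. The `Linear`/`Exponential` names follow the crate's configs, but the trait shape and const-generic base here are illustrative assumptions:

```rust
trait MemConfig {
    /// Size of segment number `n` (0-based), computed arithmetically.
    fn segment_size(n: usize) -> usize;
}

/// Every segment holds B elements.
struct Linear<const B: usize>;
impl<const B: usize> MemConfig for Linear<B> {
    fn segment_size(_n: usize) -> usize {
        B
    }
}

/// Segment sizes B, B, 2B, 4B, ...: each doubling of total
/// capacity needs only one new allocation.
struct Exponential<const B: usize>;
impl<const B: usize> MemConfig for Exponential<B> {
    fn segment_size(n: usize) -> usize {
        if n == 0 { B } else { B << (n - 1) }
    }
}
```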

@cehteh
Contributor

cehteh commented Jun 12, 2023


When you make this a trait, please pass `C: MemConfig` as a parameter to the trait! I think I'll wait for you to do this and then continue on the raw-alloc work.

@cehteh
Contributor

cehteh commented Jun 13, 2023

`Segments<T, C: MemConfig>` should become a newtype with a trait too.

Rationale: `MemConfig` determines `segment_size()` arithmetically, but Vec/slice-based storage backends could query that without calculation; only my planned thin `*mut T` backend would need the arithmetic here.

@cehteh
Contributor

cehteh commented Jun 20, 2023

Another note, mostly for myself, but opinions would be welcome:

While the 'SegmentCache' experiment worked, it turned out to be way too intrusive to be useful, and 'Linear' outperformed it in many cases.

I am thinking about reviving the caching idea only for Slices:

trait SegmentCache {
   //...
}

struct NoCache; // ZST, the default does nothing

struct Cache {
    segment_start: usize,
    segment_end: usize,
    segment: usize,
}

impl SegmentCache for NoCache { /*NOP proxies*/}
impl SegmentCache for Cache { /*.. do the caching here..*/}

pub struct Slice<'a, T: 'a, C: SegmentCache = NoCache> {
    inner: &'a (dyn SegmentIndex<T>),
    start: usize,
    len: usize,
    cache: C,
}

So far that's only at the brainstorming stage.

So caching will be opt-in with some overhead (3 usize). The default will stay as is.

It would be implemented only for Slice, not SliceMut, since caching only works well for accesses within the same segment. One should clone Slices when working on different parts of the Slice, and SliceMut can't be cloned (eventually some SliceMut::split_at_mut() may become handy, then caching may be OK there too).

I have no immediate need for this; I'm just writing it down here. It would be low-hanging fruit to implement later, and would improve indexing operations for Slices into Proportional- and Exponential-configured SegVecs.
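The fast path of the proposed cache can be sketched as follows. The struct fields follow the `Cache` outline above; the `slow_path` closure stands in for whatever full segment lookup the SegVec performs, so this is an illustration of the idea rather than a real implementation:

```rust
/// Remembers which segment served the last access.
struct Cache {
    segment_start: usize,
    segment_end: usize, // exclusive
    segment: usize,
}

impl Cache {
    /// Return the segment holding `index`, hitting the cache when the
    /// index falls inside the last segment looked up. `slow_path` maps
    /// an index to (segment, segment_start, segment_end).
    fn lookup(
        &mut self,
        index: usize,
        slow_path: impl Fn(usize) -> (usize, usize, usize),
    ) -> usize {
        if index >= self.segment_start && index < self.segment_end {
            // Fast path: same segment as the previous access.
            return self.segment;
        }
        let (seg, start, end) = slow_path(index);
        *self = Cache { segment_start: start, segment_end: end, segment: seg };
        seg
    }
}
```

A `NoCache` ZST would implement the same interface with the fast path compiled out, so the default pays no overhead.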


3 participants