Releases: orxfun/orx-concurrent-vec
IntoIterator is required for PinnedVec
PinnedConcurrentCol migration
- `ConcurrentVec` is now only a wrapper around a `ConcurrentBag` with safe `get` and `iter` implementations.
- `ConcurrentBag` implements `Deref` and `DerefMut` to `PinnedConcurrentCol`; and `ConcurrentVec` to `ConcurrentBag`.
- Documentation is revised.
- Benchmarks are repeated, performance has not changed (as expected).
fix/issue-with-heap-corruption-is-fixed
The fix is applied in orx-concurrent-bag. This PR involves updating the dependency version. Further, relevant test coverage is increased.
Performance improvements by faster mem zeroing
The main difference of `ConcurrentVec` from the more performant `ConcurrentBag` is that `ConcurrentVec` zeroes out new positions on each allocation. This is required to provide safe versions of the `get` and `iter` methods; in other words, to allow safe concurrent read and write operations.
In the prior version, memory zeroing was handled by the concurrent bag's fill strategy, which requires iteration over the newly allocated positions. In the new version, a memory-zeroing concurrent bag is wrapped which directly uses `PinnedVec`'s `grow_to` method to zero out the memory on allocation. The implementations often reduce to a memset call and hence provide performance improvements.
Benchmarks are repeated, expected improvements are observed and reported.
`ConcurrentVec` wraps and extends `ConcurrentBag`
- To completely avoid code repetition, `ConcurrentVec<T>` now wraps a `ConcurrentBag<Option<T>>`.
- The separation is made clear: `ConcurrentBag` is performance-optimal for concurrent collection; `ConcurrentVec` is an efficient alternative which provides safe reading during growth.
- The read-and-write feature is demonstrated with a simple example.
- On the other hand, the concurrent vec and concurrent bag APIs are made almost identical to allow changing the concurrent collection type easily.
- Benchmarks are added, run and reported.
- Documentation is revised.
- Tests are extended.
Changing underlying storage to any `PinnedVec`
- Instead of the hard-coded `SplitVec`, the concurrent bag now wraps any `PinnedVec`.
- The `capacity` method is added; `Debug` is implemented.
- Benchmarks are added and reported.
- `AtomicUsize` is used for the capacity rather than `usize`. This update does not lead to diminished performance.
- Memory orderings of atomic operations are strengthened.
- `UnsafeCell` is used for the pinned vector, rather than a direct cast.
- Tests are extended.
Inside `push`
In the prior version, growth of the bag required two operations:
- setting length of the pinned vector to its capacity, and
- growing the pinned vector to its next size.
The reason the first operation is required is that the second operation can only succeed when the vector is full; i.e., when `len == capacity`.

Having to make two mutations is not good under heavy load. The `grow_to` method, on the other hand, does not require the pinned vector's length to equal its capacity, and hence, we do not need to make the `set_len` call. Furthermore, it will be required for a possible `extend` implementation.
Also
- Errors are organized.
- Memory orderings are strengthened.
Benchmarks and unsafe primitives
- Benchmarks are documented and reported.
- `push_none` is implemented to allow reserving positions, or pushing holes to the vector.
- The `unsafe` primitives `take` and `put` are implemented.
`ConcurrentVec` is defined with `push`, `iter`, `get`
orx-concurrent-vec
An efficient and convenient thread-safe grow-only read-and-write collection, ideal for collecting results concurrently.
- convenient: the vector can be shared among threads simply as a shared reference, not even requiring `Arc`,
- efficient for collecting results concurrently:
  - rayon is significantly faster than `ConcurrentVec` when the elements are small and there is an extreme load (no work at all between push calls),
  - `ConcurrentVec` is significantly faster than rayon when elements are large or there is some computation happening to evaluate the elements before the push calls,
  - you may see the details of the benchmarks at benches/grow.rs.
The vec preserves the order of elements with respect to the order in which the `push` method is called.
Examples
The safety guarantee that allows pushing to the vec with an immutable reference makes it easy to share the vec among threads.
Using `std::sync::Arc`

Following the common approach of using an `Arc`, we can share the vector among threads and collect results concurrently.
```rust
use orx_concurrent_vec::*;
use std::{sync::Arc, thread};

let (num_threads, num_items_per_thread) = (4, 8);

let convec = Arc::new(ConcurrentVec::new());
let mut thread_vec: Vec<thread::JoinHandle<()>> = Vec::new();

for i in 0..num_threads {
    let convec = convec.clone();
    thread_vec.push(thread::spawn(move || {
        for j in 0..num_items_per_thread {
            convec.push(i * 1000 + j); // concurrently collect results simply by calling `push`
        }
    }));
}

for handle in thread_vec {
    handle.join().unwrap();
}

let mut vec_from_convec: Vec<_> = convec.iter().copied().collect();
vec_from_convec.sort();
let mut expected: Vec<_> = (0..num_threads)
    .flat_map(|i| (0..num_items_per_thread).map(move |j| i * 1000 + j))
    .collect();
expected.sort();
assert_eq!(vec_from_convec, expected);
```
Using `std::thread::scope`

An even more convenient approach is to use thread scopes. This allows sharing a reference to the vec across threads, instead of an `Arc`.
```rust
use orx_concurrent_vec::*;
use std::thread;

let (num_threads, num_items_per_thread) = (4, 8);

let convec = ConcurrentVec::new();
let convec_ref = &convec; // just take a reference

std::thread::scope(|s| {
    for i in 0..num_threads {
        s.spawn(move || {
            for j in 0..num_items_per_thread {
                convec_ref.push(i * 1000 + j); // concurrently collect results simply by calling `push`
            }
        });
    }
});

let mut vec_from_convec: Vec<_> = convec.iter().copied().collect();
vec_from_convec.sort();
let mut expected: Vec<_> = (0..num_threads)
    .flat_map(|i| (0..num_items_per_thread).map(move |j| i * 1000 + j))
    .collect();
expected.sort();
assert_eq!(vec_from_convec, expected);
```
Safety
`ConcurrentVec` uses a `SplitVec` as the underlying storage.
`SplitVec` implements `PinnedVec`, which guarantees that elements which are already pushed to the vector stay pinned to their memory locations.
This feature makes it safe to grow with a shared reference on a single thread, as implemented by `ImpVec`.

In order to achieve this feature in a concurrent program, `ConcurrentVec` pairs the `SplitVec` with an `AtomicUsize`.
- The `AtomicUsize` fixes the target memory location of each element to be pushed at the time the `push` method is called. Regardless of whether or not writing to memory completes before another element is pushed, every pushed element receives a unique position reserved for it.
- The `SplitVec` guarantees that already pushed elements are not moved around in memory and new elements are written to the reserved positions.
The approach guarantees that
- only one thread can write to the memory location of an element being pushed to the vec,
- at any point in time, only one thread is responsible for the allocation of memory if the vec requires new memory,
- no thread reads an element which is being written, reading is allowed only after the element is completely written,
- hence, there exists no race condition.
This pair allows a lightweight and convenient concurrent vec which is ideal for collecting results concurrently.
Write-Only vs Read-Write
The concurrent vec is a read-and-write & grow-only vec which is convenient and efficient for collecting elements.
While allowing growth by pushing elements from multiple threads, each thread can safely read already pushed elements.
See `ConcurrentBag` for a write-only variant which allows only writing during growth.
The advantage of the bag, on the other hand, is that it stores elements as `T` rather than `Option<T>`.
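The cost of the `Option<T>` representation can be made concrete with standard `size_of` checks: for types without a niche, `Option<T>` needs an extra discriminant, while niche-bearing types like `Box<T>` pay nothing.

```rust
use std::mem::size_of;

fn main() {
    // u64 has no niche, so Option<u64> pays for a discriminant plus padding
    assert_eq!(size_of::<u64>(), 8);
    assert_eq!(size_of::<Option<u64>>(), 16);

    // Box is non-null, so None can be encoded as the null pointer for free
    assert_eq!(size_of::<Box<u64>>(), 8);
    assert_eq!(size_of::<Option<Box<u64>>>(), 8);
}
```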