Use stm-containers instead of IORef (Map k v)
#1
## Concurrency in the `Vector` type

The `Vector` type in `Prometheus.Metric.Vector` uses an `IORef (Map l (Metric m))`. This is a known antipattern with poor performance under concurrency.
The proper fix is to use `stm-containers`, which allows significantly better concurrent updates to the `Map`.

The problem is that the `Vector` requires a lock on the entire `Map` in order to update a single value. Suppose you have 200 threads, each wanting to update the `Map`. The first will do `atomicModifyIORef`, which grabs a lock on the map. Every other thread is now blocked. The first thread will update the label `k0` and put the map back in. The next thread will then lock the whole map to update label `k1`. Rinse and repeat.
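To make the contention concrete, here is a minimal sketch of the whole-map update (illustrative names, not the library's actual code; only `base` and `containers` are assumed):

```haskell
-- Sketch of the IORef-based update path (illustrative). Every writer
-- swaps the *entire* Map, so threads updating different labels still
-- contend on the single IORef.
import Data.IORef
import qualified Data.Map.Strict as Map

observe :: Ord l => IORef (Map.Map l Double) -> l -> Double -> IO ()
observe ref label value =
  atomicModifyIORef' ref $ \m ->
    (Map.insertWith (+) label value m, ())
```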
With `StmContainers.Map`, the locking happens at the key level, so there is only contention on specific keys. If 200 threads want to update the map, and they all have distinct keys, then this can all happen concurrently without any locking or STM retrying.
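A per-key version built on `stm-containers` might look like the following sketch (`observeStm` is a hypothetical name; the `StmContainers.Map` API from the `stm-containers` package is assumed):

```haskell
-- Per-key update with stm-containers (sketch). Only transactions
-- touching the same key conflict, so distinct keys update concurrently.
import Control.Concurrent.STM (atomically)
import Data.Hashable (Hashable)
import qualified StmContainers.Map as StmMap

observeStm :: (Eq l, Hashable l) => StmMap.Map l Double -> l -> Double -> IO ()
observeStm m label value = atomically $ do
  existing <- StmMap.lookup label m
  StmMap.insert (maybe value (+ value) existing) label m
```

`stm-containers` also provides a `focus` combinator for doing the read-modify-write in a single pass, which may be preferable to the lookup-then-insert shown here.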
## Benchmarks

To understand this behavior, I created a benchmark that concurrently makes many writes to a `Vector`. To measure the impact of concurrency, I iterate the benchmark over `-with-rtsopts=-Nx` values. I am using an Intel i9 with 32 cores.
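The driver boils down to spawning writer threads and waiting for them all to finish (a simplified sketch, not the actual benchmark code; `runWriters` and `update` are illustrative names):

```haskell
-- Simplified benchmark driver (sketch): spawn nThreads workers, each
-- performing nWrites updates, and block until every worker signals
-- completion. `update` stands in for whichever vector implementation
-- is under test.
import Control.Concurrent
import Control.Monad

runWriters :: Int -> Int -> (Int -> IO ()) -> IO ()
runWriters nThreads nWrites update = do
  dones <- replicateM nThreads newEmptyMVar
  forM_ (zip [0 :: Int ..] dones) $ \(i, done) ->
    forkIO (replicateM_ nWrites (update i) >> putMVar done ())
  mapM_ takeMVar dones
```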
> NOTE: Oops. This code is not faithful to the Hackage/released code. The released code uses `Data.Atomics.atomicModifyIORefCAS`, which avoids the locking and contention. However, I'm going to leave the benchmarks here because they're very interesting as a way of seeing how `Data.Atomics.atomicModifyIORefCAS` compares to `Data.IORef.atomicModifyIORef'`.

### -N1

In a single-threaded context, `stm-containers` is still over twice as fast.
### -N2

With two threads, the normal vector encounters contention and becomes significantly slower (1.6x more time). Meanwhile, the STM vector takes 56% of the time.
### -N4

With four threads, contention on the traditional `Vector` is high, and it takes even more time - however, the increase is modest. The STM vector again improves significantly, taking 57% of the time it needed with 2 threads.
### -N8

The pattern repeats: contention makes the `IORef` vector slower (120% of the prior time), while more threads make STM faster (64% of the time).

### -N16

With 16 threads, contention has almost doubled the time on the `IORef`. Meanwhile, the `STM` implementation has stabilized around 45ms.

## Oops
At this point, I realized an error that I had made. The Hackage version of the code uses `Data.Atomics.atomicModifyIORefCAS`, but I had switched it to `atomicModifyIORef'`. If I switch it back and rerun, I get significantly better results:
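For reference, the fix is a one-function swap (sketch; assumes the `atomic-primops` package, which provides `Data.Atomics`, and `observeCAS` is an illustrative name):

```haskell
-- Sketch of the CAS-based variant. The shape is identical to the
-- atomicModifyIORef' version; only the modify function changes,
-- avoiding the contention described above.
import Data.Atomics (atomicModifyIORefCAS)
import Data.IORef
import qualified Data.Map.Strict as Map

observeCAS :: Ord l => IORef (Map.Map l Double) -> l -> Double -> IO ()
observeCAS ref label value =
  atomicModifyIORefCAS ref $ \m ->
    (Map.insertWith (+) label value m, ())
```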
### -N1

With a single thread, the stock implementation is faster; STM is about 33% slower.
### -N2

The stock implementation is slightly faster with two threads than with one. The STM implementation is almost twice as fast as it was, and is now slightly faster than stock.
### -N4

With 4 threads, both see significant performance improvements, with STM gaining more of a lead.
### -N8

With 8 cores, the performance of the `IORef` has stabilized. The `STM` implementation continues to improve.

### -N16

At 16 cores, contention on the `IORef` increases overhead. STM appears stable.
### -N32

With 32 cores, `STM` holds constant, and even the atomic `IORef` runs into significant contention.

## Defaults

Based on these observations, this commit changes the default exported from `Prometheus` to the STM vector.