Poor performance when weaving many branches together #64

Closed
Moelf opened this issue Jul 24, 2021 · 30 comments · Fixed by #65

@Moelf
Member

Moelf commented Jul 24, 2021

Sometimes in HEP the memory access pattern is inherently bad. For example, we have jagged branches for electron- and muon-related quantities: Lorentz vectors, isolation flags, etc.

Let's say e_mask is a quality cut, and people want to extract the isolation flags for the electrons that passed e_mask:

function get_Isos(e_mask, m_mask, evt)
    v_l_passIso = Vector{Bool}[]

    v_e_passIso_HighPtCaloOnly = evt.v_e_passIso_HighPtCaloOnly
    v_e_passIso_TightTrackOnly_VarRad = evt.v_e_passIso_TightTrackOnly_VarRad
    v_e_passIso_TightTrackOnly_FixedRad = evt.v_e_passIso_TightTrackOnly_FixedRad
    v_e_passIso_Tight_VarRad = evt.v_e_passIso_Tight_VarRad
    # .... 12 more

    @inbounds for (i, f) in enumerate(e_mask)
        f || continue
        push!(
            v_l_passIso,
            Bool[
                v_e_passIso_HighPtCaloOnly[i],
                v_e_passIso_TightTrackOnly_VarRad[i],
                v_e_passIso_TightTrackOnly_FixedRad[i],
                v_e_passIso_Tight_VarRad[i],
                v_e_passIso_Loose_VarRad[i],
            ],
        )
    end
    # one more for loop for muons

    return v_l_passIso
end

which results in
[profiler screenshot]

Each of those piles is actually just the v_e_passIso_HighPtCaloOnly = evt.v_e_passIso_HighPtCaloOnly line.

This is probably horrible no matter what language you use, but I'm wondering if we should explore https://github.com/JuliaArrays/ArraysOfArrays.jl/ to see if we can cut down the allocations from making concrete vectors (or vectors of vectors), since in many cases they just get torn apart manually in the looper.
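For illustration, a minimal sketch of what that would look like (the flag values and lepton counts are made up; only the storage layout matters):

using ArraysOfArrays

# Store all the per-lepton flag vectors in one flat buffer plus an offset table,
# instead of one small heap-allocated Vector{Bool} per selected lepton.
v_l_passIso = VectorOfVectors{Bool}()

push!(v_l_passIso, Bool[true, false, true, true])   # flags for one selected lepton
push!(v_l_passIso, Bool[false, true, true, true])   # flags for the next one

v_l_passIso[1]          # a view into the shared flat buffer
flatview(v_l_passIso)   # the underlying flat Vector{Bool}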

@oschulz
Member

oschulz commented Jul 25, 2021

Using ArraysOfArrays.jl should definitely help reduce the number of memory allocations a lot.

@oschulz
Member

oschulz commented Jul 25, 2021

Why is v_l_passIso a vector of boolean vectors instead of a vector of NamedTuples?
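For context, that alternative would look roughly like this (a sketch only; the field names mirror the branches in the issue):

# One NamedTuple of Bools per selected lepton. NamedTuples of Bools are isbits,
# so the outer Vector stores them inline, without a per-lepton heap allocation.
v_l_passIso = NamedTuple{(:HighPtCaloOnly, :TightTrackOnly_VarRad,
                          :TightTrackOnly_FixedRad, :Tight_VarRad), NTuple{4,Bool}}[]

push!(v_l_passIso, (HighPtCaloOnly = true, TightTrackOnly_VarRad = false,
                    TightTrackOnly_FixedRad = true, Tight_VarRad = true))

v_l_passIso[1].Tight_VarRad   # access by name instead of by position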

@Moelf
Member Author

Moelf commented Jul 25, 2021

because it's used like this later:

    #isolation cut, require all 4 of them to be true
    if all((
        v_l_passIso[pr1[1]][1],
        v_l_passIso[pr1[2]][1],
        v_l_passIso[W_id[1]][1],
        v_l_passIso[W_id[2]][1],
    ))
    else
        return false, wgt
    end

and

    if (abs(v_l_pid[fifth_l]) == 11)
        !v_l_passIso[fifth_l][4] && return false, wgt
    end
    if (abs(v_l_pid[fifth_l]) == 13)
        !v_l_passIso[fifth_l][3] && return false, wgt
    end

In short, it needs to have the same order as all the other "lepton-related" branches, so you can index all of them with the same number that identifies a unique lepton in the event.

@oschulz
Member

oschulz commented Jul 25, 2021

I guess one option would be to use a static array or tuple, since the length of the elements of v_l_passIso is fixed. An ArraysOfArrays.VectorOfVectors would work too, of course.
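A rough sketch of that option, inside the loop from the original snippet (StaticArrays assumed as a dependency; 4 flags used just for illustration):

using StaticArrays

# Fixed-length, isbits element type: the outer Vector stores the flags inline,
# so there is no per-lepton heap-allocated Vector{Bool}.
v_l_passIso = SVector{4,Bool}[]

push!(v_l_passIso, SVector(
    v_e_passIso_HighPtCaloOnly[i],
    v_e_passIso_TightTrackOnly_VarRad[i],
    v_e_passIso_TightTrackOnly_FixedRad[i],
    v_e_passIso_Tight_VarRad[i],
))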

@Moelf
Member Author

Moelf commented Jul 25, 2021

length of the elements of v_l_passIso is

It is not: the outer vector only contains entries for leptons that passed e_mask and m_mask, and the innermost vectors are of length 16, so probably not worth a StaticArray.

@Moelf
Member Author

Moelf commented Jul 25, 2021

index 86d8909..e26ff74 100644
--- a/src/root.jl
+++ b/src/root.jl
@@ -215,11 +215,16 @@ function interped_data(rawdata, rawoffsets, ::Type{T}, ::Type{J}) where {T, J<:J
-        @views [
-                ntoh.(reinterpret(
-                                  T, rawdata[ (rawoffsets[i]+jagg_offset+1):rawoffsets[i+1] ]
-                                 )) for i in 1:(length(rawoffsets) - 1)
-               ]
+        _size = sizeof(eltype(T))
+        data = UInt8[]
+        offset = Int64[0] # god damn 0-based index
+        @views @inbounds for i in 1:(length(rawoffsets) - 1)
+            rg = (rawoffsets[i]+jagg_offset+1) : rawoffsets[i+1]
+            append!(data, rawdata[rg])
+            push!(offset, last(offset) + length(rg))
+        end
+        real_data = ntoh.(reinterpret(T, data))
+        VectorOfVectors(real_data, offset .÷ _size .+ 1)
     end

here are the numbers AFTER applying this diff:

# 1st run
 17.403734 seconds (72.69 M allocations: 6.877 GiB, 9.20% gc time, 3.31% compilation time)
# 2nd run
3.135352 seconds (34.34 M allocations: 2.249 GiB, 12.11% gc time)
# 3rd run
2.990273 seconds (34.34 M allocations: 2.249 GiB, 9.52% gc time)

BEFORE this diff:

# 1st run
32.069656 seconds (213.86 M allocations: 11.560 GiB, 21.16% gc time, 1.26% compilation time)

# 2nd run
3.976044 seconds (27.73 M allocations: 1.779 GiB, 21.24% gc time)

# 3rd run
  3.725007 seconds (27.73 M allocations: 1.779 GiB, 13.36% gc time)

@oschulz definitely helpful in the asymptotic limit! I'm a little confused about why the subsequent runs allocate more; maybe there's some weird hidden conversion somewhere...

@Moelf
Member Author

Moelf commented Jul 25, 2021

[screenshot]

Hmmm, I tried a Ref{}() hack and it seems to be working; I will make a PR now and we can discuss the technical details there.

@oschulz
Member

oschulz commented Jul 25, 2021

Using view(rawdata, rg) instead of rawdata[rg] could save allocations, since views are stack-allocated now.

@oschulz
Member

oschulz commented Jul 25, 2021

Ah, sorry, I overlooked the @views.

@oschulz
Member

oschulz commented Jul 25, 2021

Maybe try this version:

function interped_data(rawdata, rawoffsets, ::Type{T}, ::Type{J}) where {T, J<:JaggType}
    # ...
        _size = Base.SignedMultiplicativeInverse{Int}(sizeof(eltype(T)))
        data = UInt8[]
        elem_ptr = sizehint!(Vector{Int}(), length(eachindex(rawoffsets)))
        push!(elem_ptr,  0)
        # Can we sizehint! data too?
        @inbounds for i in eachindex(rawoffsets)[1:end-1] 
            rg = (rawoffsets[i]+jagg_offset+1) : rawoffsets[i+1]
            append!(data, view(rawdata, rg))
            push!(elem_ptr, last(elem_ptr) + div(length(rg), _size))
        end
        real_data = ntoh.(reinterpret(T, data))
        elem_ptr .+= firstindex(real_data)
        VectorOfVectors(real_data, elem_ptr)
    #...
end

@Moelf
Member Author

Moelf commented Jul 25, 2021

Can we sizehint! data too?

maybe something smart like last(offset) ÷ sizeof(eltype(T))
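One rough way to pre-size data before the loop (a sketch only; rawoffsets, jagg_offset, and data are the variables from the diff above, and the exact header accounting is an assumption):

# Upper bound on the payload bytes that will be appended to `data`:
# the whole raw span minus one jagg_offset header per element.
n_elems = length(rawoffsets) - 1
sizehint!(data, max(0, rawoffsets[end] - rawoffsets[1] - n_elems * jagg_offset))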

@Moelf
Member Author

Moelf commented Jul 25, 2021

before:
[screenshot]

your version:
[screenshot]

no visible difference

@oschulz
Member

oschulz commented Jul 26, 2021

Hm, I'm not sure where these allocs are coming from, but it shouldn't be from within interped_data, since it only does a small, fixed number of allocations. What does main_looper do? And can you benchmark just the data reading phase?

@Moelf
Member Author

Moelf commented Jul 26, 2021

See #65 (comment) for a minimal reproduction of the mysterious allocations.

@oschulz
Member

oschulz commented Jul 26, 2021

I'll try to have a go at it ...

@Moelf
Member Author

Moelf commented Jul 26, 2021

I have reduced it to a minimal reproduction:

julia> using ArraysOfArrays

julia> const V = [rand(Float32, rand(0:4)) for _ = 1:10^5];

julia> const Vov = VectorOfVectors{Float32}()
0-element VectorOfVectors{Float32, Vector{Float32}, Vector{Int64}, Vector{Tuple{}}}

julia> foreach(V) do i
           push!(Vov, i)
       end

julia> @benchmark begin
           for i in eachindex($V)
               length($V[i])
           end
       end
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min  max):  50.451 μs  512.027 μs  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     56.100 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   59.019 μs ±  14.257 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▁▂▂ ▇█▂▆▃▂▂▂▁     ▁▁                                         ▂
  ███▁██████████████████▇▇▇▆▆▇█▇▆▆▇▇▆▆▆▆▇▆▅▅▇▅▆▅▅▆▅▆▇▆▆▆▆▆▆▇▇▇ █
  50.5 μs       Histogram: log(frequency) by time       110 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark begin
           for i in eachindex($Vov)
               length($Vov[i])
           end
       end
BenchmarkTools.Trial: 9314 samples with 1 evaluation.
 Range (min  max):  472.014 μs   1.195 ms  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     514.337 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   533.550 μs ± 70.951 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

   ▁▂▆▇▇█▆▅▅▄▃▃▂▁▂                                             ▂
  ███████████████████▇▇▆▆▆▆▅▅▄▆▄▄▄▅▄▅▃▄▄▄▅▆▆▆▆▅▅▆▅▆▇███▇█▇▇▆▄▄ █
  472 μs        Histogram: log(frequency) by time       870 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> Vov == V
true

maybe this can be an ArraysOfArrays issue, not sure.

I tested the newly added Int32-offset VoV; it's 30% faster than the Int64 VoV. I really hope this is not some false-sharing shenanigans.

@oschulz
Member

oschulz commented Jul 26, 2021

I don't think this is the issue: VectorOfVectors has to do a little more work to check bounds and create the views than getindex does for a vector of separate vectors. But it's still very little time, and without bounds checking the difference is actually just about a factor of two:

julia> using ArraysOfArrays, BenchmarkTools

julia> V = [rand(Float32, rand(0:4)) for _ = 1:10^5];

julia> Vov = VectorOfVectors(V)

julia> @benchmark begin
           @inbounds for i in eachindex($V)
               length($V[i])
           end
       end
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min  max):  34.919 μs  132.067 μs  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     38.391 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   40.404 μs ±   7.572 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▄▁▅▆█▄▃▄▃ ▄  ▂ ▂▂▃                                           ▁
  █████████▄█▄▄█▄███▅▅▆▇▇█▇▆▆▄▅▅▆▅▄▅▄▅▅▄▄▅▅▅▆▄▅▆▄▄▆▅▄▄▆▅▂▄▄▄▃▄ █
  34.9 μs       Histogram: log(frequency) by time      79.2 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark begin
           @inbounds for i in eachindex($Vov)
               length($Vov[i])
           end
       end
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min  max):  77.848 μs  226.026 μs  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     87.407 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   92.920 μs ±  16.003 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▁ ▄▅ ▇█ ▅▄ ▂  ▃▂▄▁ ▁   ▂  ▁   ▁                              ▂
  █▅██▆█████▇█▆▇████████▇██▆█▇▇████▆▇▆▆▆▆▇▇▆▅▆▆▆▆▅▅▅▅▆▅▅▅▆▆▃▅▆ █
  77.8 μs       Histogram: log(frequency) by time       165 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

So that's about 1 ns for each element access (view creation plus length()) for VectorOfVectors (with @inbounds), and there are no memory allocations.

@Moelf
Member Author

Moelf commented Jul 26, 2021

Some new observations:

julia> const t1 = LazyTree(ROOTFile("/home/akako/Downloads/doublemu.root"), "t", [r"^Muon_(pt|eta|phi|mass)$","MET_pt"]);

julia> length(t1.Muon_pt) ÷ 10^4
247

julia> @time begin
           for idx in 1:10^4:length(t1.Muon_pt)
               t1.Muon_pt[idx]
           end
       end
  0.000530 seconds (3.56 k allocations: 406.859 KiB)

julia> @time begin
           for idx in 1:10^3:length(t1.Muon_pt)
               t1.Muon_pt[idx]
           end
       end
  0.001023 seconds (8.01 k allocations: 546.172 KiB)

julia> @time begin
           for idx in 1:10^2:length(t1.Muon_pt)
               t1.Muon_pt[idx]
           end
       end
  0.005169 seconds (51.95 k allocations: 1.884 MiB)


julia> @time begin
           for idx in 1:10:length(t1.Muon_pt)
               t1.Muon_pt[idx]
           end
       end
  0.014718 seconds (494.29 k allocations: 15.714 MiB)

julia> @time begin
           for idx in 1:1:length(t1.Muon_pt)
               t1.Muon_pt[idx]
           end
       end
  0.194961 seconds (4.89 M allocations: 150.909 MiB, 44.43% gc time)

Notice how the allocation count is not linear: initially the cache seems to be working properly, because accessing 10x more elements didn't cause 10x the allocations, but later you can see it scales linearly.

@Moelf
Member Author

Moelf commented Jul 26, 2021

@oschulz I think I found a "real" minimal reproducing example

julia> const Vov = VectorOfVectors{Float32}()
0-element VectorOfVectors{Float32, Vector{Float32}, Vector{Int64}, Vector{Tuple{}}}

julia> const V = [rand(Float32, rand(0:4)) for _ = 1:10^5];

julia> foreach(V) do i
           push!(Vov, i)
       end

julia> struct A
           v::VectorOfVectors{Float32}
       end

julia> struct B
           v::Vector{Vector{Float32}}
       end

julia> const _a = A(Vov);

julia> const _b = B(V);

julia> function f(stru, idx)
           @inbounds stru.v[idx]
       end
f (generic function with 1 method)

# warm run
julia> @time begin
           for i = 1:10^5
               f(_a, i)
           end
       end
  0.005362 seconds (199.49 k allocations: 6.096 MiB)

# warm run
julia> @time begin
           for i = 1:10^5
               f(_b, i)
           end
       end
  0.000216 seconds

In fact, for some reason, just getting an index allocates in @benchmark even with @inbounds:

julia> @benchmark f($_a, 123)
BenchmarkTools.Trial: 10000 samples with 995 evaluations.
 Range (min  max):  22.168 ns   1.808 μs  ┊ GC (min  max): 0.00%  96.08%
 Time  (median):     23.265 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   26.942 ns ± 48.507 ns  ┊ GC (mean ± σ):  5.52% ±  3.06%

  ▇██▆▅▄▃▁        ▁                ▁▁▁▁                       ▂
  ████████▇▆▆▄▆▆███▇▆▅▅▄▃▃▂▂▂▄▄▇█████████████▇███▇█▇▅▅▆▅▄▆▅▆▆ █
  22.2 ns      Histogram: log(frequency) by time      50.5 ns <

 Memory estimate: 48 bytes, allocs estimate: 1.

julia> @benchmark f($_b, 123)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min  max):  1.344 ns  15.178 ns  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     1.514 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.521 ns ±  0.242 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

        █▁                                                    
  ▂▃▂▂▁▂███▄▂▂▄▃▂▂▁▁▄▃▂▂▁▁▂▄▂▂▁▁▁▅▇▅▄▂▁▁▂█▄▄▂▂▁▁▃▇▅▄▃▂▁▁▁▂▂▂ ▂
  1.34 ns        Histogram: frequency by time        1.73 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

@oschulz
Member

oschulz commented Jul 26, 2021

That one's easy to explain: your struct A doesn't allow for type stability, since VectorOfVectors{Float32} isa UnionAll (because VectorOfVectors has several type parameters). Try this version:

using ArraysOfArrays, BenchmarkTools

const V = [rand(Float32, rand(0:4)) for _ = 1:10^5];

const Vov = VectorOfVectors(V)

struct Foo{VV<:AbstractVector{<:AbstractVector{<:Real}}}
    v::VV
end

const _a = Foo(Vov);

const _b = Foo(V);

function f(stru, idx)
    @inbounds stru.v[idx]
end
julia> @benchmark f($_a, 123)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min  max):  1.977 ns  42.563 ns  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     2.307 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.337 ns ±  0.692 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                      ▇    ▅▁                 █▃    █▆     ▁  
  ▆▅▂▂▁▂▁▁▁▃▄▂▂▁▄▇▂▂▁▆█▄▅█▃██▃▃▁▃▇▄▅▃▂▂▂▆▇▂▂▁▃██▃▃▁▃██▃▃▁▁▆█ ▄
  1.98 ns        Histogram: frequency by time        2.54 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark f($_b, 123)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min  max):  1.108 ns  29.334 ns  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     1.219 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.215 ns ±  0.492 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                                         ▅█▅▁▇▄               
  ▂▅▇▆▆▄▅▃▂▂▁▁▁▁▁▁▁▁▁▃▄▃▄▄▄▃▂▂▃▆▅▃▅▅▄▂▂▂▅███████▄▃▂▂▃▂▂▂▂▂▁▁ ▃
  1.11 ns        Histogram: frequency by time        1.27 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

@tamasgal
Member

Well spotted @oschulz. I think JET.jl should reveal such things quite easily.
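For example, something along these lines (assuming a recent JET.jl; the struct mirrors the snippet above):

using JET, ArraysOfArrays

struct A
    v::VectorOfVectors{Float32}   # non-concrete (UnionAll) field type
end

f(stru, idx) = @inbounds stru.v[idx]

_a = A(VectorOfVectors([rand(Float32, 3) for _ in 1:10]))

# reports the runtime dispatch caused by the non-concrete field
JET.@report_opt f(_a, 1)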

@Moelf
Member Author

Moelf commented Jul 26, 2021

But it doesn't solve the whole problem:

julia> struct C
           v::VectorOfVectors{Float32, Vector{Float32}, Vector{Int64}, Vector{Tuple{}}}
       end

julia> const _c = C(Vov);

julia> @benchmark begin
           for i = 1:10^5
               f(_c, i)
           end
       end
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min  max):  89.424 μs  547.878 μs  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     89.486 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   94.538 μs ±  14.213 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █   ▂ ▅▃       ▁  ▁  ▁                                       ▁
  █▆█▅█▃██▅█▇▆█▇██▇▇█▆▆█▆▆█▆▆▅▅▅▅▅▇▇▆▆▇▇▅▅▆▅▂▃▅▄▄▄▅▃▄▅▅▅▄▃▃▃▃▂ █
  89.4 μs       Histogram: log(frequency) by time       153 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark begin
           for i = 1:10^5
               f(_b, i)
           end
       end
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min  max):  39.804 μs  117.688 μs  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     39.957 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   41.774 μs ±   6.077 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █  ▂ ▄▂          ▁  ▁                                        ▁
  ██▇█▇███▇▅▇▅█▄█▇▅█▅▅█▃▄▅▇▇▇▇▇▆▅▃▆▆▆▆▆▆▅▅▆▆▅▅▅▄▅▅▃▄▄▁▄▃▁▃▅▅▅▇ █
  39.8 μs       Histogram: log(frequency) by time      74.4 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

Tested to have the same speed as Foo(), but when I tested this for LazyBranch, it still allocates a lot (although 2x faster than when the type is not stable).

@oschulz
Member

oschulz commented Jul 26, 2021

But this is just 1 ns vs. 0.5 ns for the access to the element vectors. Compared to an actual operation that does something real-life on those element vectors, that time difference should be negligible.

I don't think ArraysOfArrays causes the allocations, as long as things are type-stable around it.

@tamasgal
Member

I am wondering if this is somehow related to runtime dispatch deeper down the rabbit hole, but the difference is still quite "small".

@Moelf
Member Author

Moelf commented Jul 26, 2021

I tried the ultimate solution, which is just to add another type parameter to track the buffer type, and it seems to be working (a rough sketch of the idea is shown after the timings below):

now:

julia> @time begin
           for evt in t1
               length(evt.Muon_pt)
           end
       end
  0.154825 seconds (26.07 k allocations: 177.184 MiB, 15.19% gc time, 3.46% compilation time)


julia> @time begin
           for evt in t1
               length(evt.Muon_pt)
           end
       end
  0.026082 seconds (12.45 k allocations: 1.470 MiB)

master:

julia> @time begin
           for evt in t1
               length(evt.Muon_pt)
           end
       end
  3.676705 seconds (26.81 M allocations: 1.196 GiB, 15.58% gc time, 25.18% compilation time)

julia> @time begin
           for evt in t1
               length(evt.Muon_pt)
           end
       end
  0.021609 seconds (5.90 k allocations: 767.688 KiB)

I still can't believe how well current master works with cache....
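Roughly, the idea is the following (a sketch only, with made-up names; UnROOT's actual LazyBranch has more fields):

using ArraysOfArrays

# Carry the concrete buffer type B as a type parameter, so field access and
# getindex stay type-stable, instead of storing an abstract VectorOfVectors{T}.
struct MyLazyBranch{B<:AbstractVector}
    buffer::B
end

Base.getindex(b::MyLazyBranch, i::Int) = @inbounds b.buffer[i]
Base.length(b::MyLazyBranch) = length(b.buffer)

vov = VectorOfVectors([rand(Float32, rand(0:4)) for _ in 1:10^5])
mb = MyLazyBranch(vov)   # fully concrete type, so indexing is type-stable
mb[1]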

@tamasgal
Member

Ha! Indeed 😆 That looks promising.

@tamasgal
Member

@all-contributors please add @oschulz ideas

@allcontributors
Contributor

@tamasgal

I've put up a pull request to add @oschulz! 🎉

Moelf closed this as completed in #65 on Jul 27, 2021
@oschulz
Member

oschulz commented Jul 27, 2021

Very nice @Moelf !

I hope I can find some time in the future to contribute more directly. I'm happy with UpROOT.jl as a stopgap, but being able to deal with ROOT files without installing a whole Python stack is very attractive. :-) Also, we should be able to outperform uproot a bit, esp. on ROOT files with small buffer sizes.

@Moelf
Member Author

Moelf commented Jul 27, 2021

We already do ;) I believe our lazy iteration (when cached) is within 2x of a concrete Numba loop over allocated arrays, which means we're probably a lot faster than chunked uproot code.
