
Performance with sparse arrays #6

Closed
ivirshup opened this issue Oct 4, 2019 · 14 comments

@ivirshup (Contributor) commented Oct 4, 2019

I'm seeing performance drops of about three orders of magnitude when using sparse data in a `DimensionalArray` vs. performing operations on the sparse array directly. This doesn't happen with dense arrays. I've included an example below for `mean`, but I see the same thing for other reductions like `sum` or `maximum`, and even for methods like `copy`.

```julia
# Setup
using SparseArrays
using DimensionalData
using DimensionalData: @dim, Forward
using Statistics

using BenchmarkTools

@dim Var "Variable"
@dim Obs "Observation"

sparsear = sprand(10000, 10000, .1)
sparsed = DimensionalArray(
    sparsear,
    (Var <| ["var$i" for i in 1:10000], Obs <| ["obs$i" for i in 1:10000])
)

# JIT warmups
mean(sparsear, dims=1)
mean(sparsear, dims=2)
mean(sparsed, dims=Var)
mean(sparsed, dims=Obs)

# Benchmarks
@benchmark mean($sparsear, dims=1)
# BenchmarkTools.Trial:
#   memory estimate:  78.20 KiB
#   allocs estimate:  2
#   --------------
#   minimum time:     7.317 ms (0.00% GC)
#   median time:      9.561 ms (0.00% GC)
#   mean time:        9.307 ms (0.11% GC)
#   maximum time:     15.023 ms (36.95% GC)
#   --------------
#   samples:          536
#   evals/sample:     1
@benchmark mean($sparsed, dims=$Var)
# BenchmarkTools.Trial:
#   memory estimate:  305.24 MiB
#   allocs estimate:  18
#   --------------
#   minimum time:     4.467 s (2.40% GC)
#   median time:      4.873 s (2.13% GC)
#   mean time:        4.873 s (2.13% GC)
#   maximum time:     5.280 s (1.91% GC)
#   --------------
#   samples:          2
#   evals/sample:     1
```

This was run with DimensionalData v0.1.1 and Julia v1.2.0.

@rafaqz (Owner) commented Oct 4, 2019

Thanks. It's returning a sparse array, whereas the simple version without `dims` just converts to a dense array.

Turns out SparseArrays defines its own version of `reducedim_initarray` that calls `Array` instead of `similar`, for the speed improvement you've pointed out:
https://github.com/JuliaLang/julia/blob/34d2b87b65b1643b1055b10aa5ea7d2bdbcf6cd2/stdlib/SparseArrays/src/sparsematrix.jl#L1698

But we are calling `mean` on a DimensionalArray wrapper, which runs `similar`, in turn calling `similar` on the parent array. I guess these methods should actually call `mean` on the sparse array directly; the problem is that `similar` handles dimension rebuilding for us.

I'm not sure what the cleanest solution is; we might need to add a method for `reducedim_initarray(A::DimensionalArray{T,N,SparseArray})` to override the base case. Nothing else in Base has a method for that, so it's a one-off, but I hope nothing in the ecosystem does either...
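
The shape of that override might be something like this (a rough sketch, untested against the package; the four-argument signature is Base's `reducedim_initarray`):

```julia
# Sketch only: forward init-array construction to the parent data so the
# SparseArrays-specialised method (which allocates a dense Array) is hit
# instead of the generic `similar`-based one. Note this sidesteps the
# dimension rebuilding that `similar` currently handles for us.
Base.reducedim_initarray(A::DimensionalArray, region, v0, ::Type{R}) where {R} =
    Base.reducedim_initarray(parent(A), region, v0, R)
```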

I'm not sure the approach I've taken to `mean` etc. and dim rebuilding makes sense in the long term; it was an experiment that worked surprisingly well for simple things, but it might break due to my lack of knowledge of Base methods.

@rafaqz (Owner) commented Oct 4, 2019

By the way, `["var$i" for i in 1:10000]` isn't a very efficient lookup (i.e. string comparisons), so why not just use a range?
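
For instance, the same construction as above with ranges swapped in for the string labels:

```julia
# Ranges are looked up with cheap numeric comparisons rather than
# per-element string comparisons
sparsed = DimensionalArray(
    sparsear,
    (Var <| (1:10000), Obs <| (1:10000))
)
```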

@ivirshup (Contributor, Author) commented Oct 5, 2019

Thanks for the quick response, and the package! It's really nice to see progress being made on labelled arrays in Julia.

I don't think I've been using Julia heavily enough recently to provide much insight for the implementation. That said, I was expecting the implementation to look much more like `DimensionalArray(mean(a.data, dims=dimnum(a, dims)), #= calculated output dimensions =#)`. I think this would make sense for a number of cases, especially since there would be some space to deal with removing labels for the reduced dimension (e.g. if you reduce over latitude, I wouldn't expect any latitude label in the output).
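
For illustration, the expected forwarding pattern might look like this (a sketch, not a working implementation; `reduceddims` is a hypothetical helper that would recompute the output dimensions, e.g. dropping the reduced dimension's labels):

```julia
# Sketch only: forward the reduction to the parent array, then rebuild
# the wrapper with recomputed dims.
function Statistics.mean(a::DimensionalArray; dims)
    dnum = dimnum(a, dims)
    DimensionalArray(mean(a.data; dims=dnum), reduceddims(a, dnum))  # hypothetical helper
end
```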

Are you sure the reduction is causing the performance issue? I agree it explains the difference in output type, but there are similar performance issues with other operations like `copy` or broadcasting scalar functions (`abs.`):

`copy` benchmark:

```julia
julia> @benchmark copy($sparsear)
BenchmarkTools.Trial:
  memory estimate:  152.64 MiB
  allocs estimate:  7
  --------------
  minimum time:     29.745 ms (2.43% GC)
  median time:      40.765 ms (30.34% GC)
  mean time:        41.141 ms (31.30% GC)
  maximum time:     91.379 ms (68.68% GC)
  --------------
  samples:          122
  evals/sample:     1

julia> @benchmark copy($sparsed)
BenchmarkTools.Trial:
  memory estimate:  152.64 MiB
  allocs estimate:  8
  --------------
  minimum time:     3.843 s (0.02% GC)
  median time:      3.849 s (0.20% GC)
  mean time:        3.849 s (0.20% GC)
  maximum time:     3.855 s (0.38% GC)
  --------------
  samples:          2
  evals/sample:     1
```
`abs.` benchmark:

```julia
julia> @benchmark abs.(sparsear)
BenchmarkTools.Trial:
  memory estimate:  152.64 MiB
  allocs estimate:  9
  --------------
  minimum time:     41.412 ms (1.47% GC)
  median time:      52.996 ms (25.29% GC)
  mean time:        53.401 ms (25.88% GC)
  maximum time:     110.823 ms (65.40% GC)
  --------------
  samples:          94
  evals/sample:     1

julia> @benchmark abs.(sparsed)
BenchmarkTools.Trial:
  memory estimate:  763.09 MiB
  allocs estimate:  13
  --------------
  minimum time:     1.969 s (0.49% GC)
  median time:      2.025 s (3.01% GC)
  mean time:        2.028 s (2.98% GC)
  maximum time:     2.090 s (5.29% GC)
  --------------
  samples:          3
  evals/sample:     1
```

To me, the memory usage suggests an intermediate dense array is being created.

> By the way, `["var$i" for i in 1:10000]` isn't a very efficient lookup (i.e. string comparisons), so why not just use a range?

For my use case, the observations are barcodes for cells and the variables are gene identifiers. Access via string names is important for the user interface, though any heavy computation should access the values by position.

I think I was expecting xarray/pandas-like behavior, where an index (probably a hash table) would be used for string-labelled dimensions.
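
Conceptually something like this, in plain Julia, just to illustrate the lookup-cost difference:

```julia
labels = ["var$i" for i in 1:10000]

# Linear scan: O(n) string comparisons per lookup
findfirst(isequal("var9999"), labels)

# Hash index built once; each subsequent lookup is O(1) expected
index = Dict(label => i for (i, label) in enumerate(labels))
index["var9999"]
```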

@rafaqz (Owner) commented Oct 5, 2019

The sparse array created in `similar()` is very likely causing the performance issue for `mean`; that's why Base avoids it. But there may well be more issues with `copy` etc. I hadn't even thought about sparse arrays before this conversation because I don't use them. There might need to be some architectural changes.

The dims do actually change with `mean`, but Julia arrays don't drop the dimension, just reduce it to size 1. So the lat dim has to stay, with a length of 1. What will eventually happen is there will be a cell size field on the dim that shows that the single index now covers the whole original latitude span in one cell, which is good information to keep (and will show on the plot).

The lookup is very rudimentary at the moment; it will be fast enough with numbers. If you want fast hashes, AcceleratedArrays.jl should work inside dims, and trying it would be interesting, but I haven't worked on that yet. Xarray and Pandas have developer teams and this was just me hacking away for a week lol!! I just need GeoData.jl for the modelling packages that I'm actually paid to write and others I do research with; DimensionalData.jl is a bonus.

It won't be practically comparable for a few years, and only then if enough people jump in and help (or the same happens with an alternative package).

@ivirshup (Contributor, Author) commented Oct 7, 2019

Thanks for recommending AcceleratedArrays.jl! I've been wondering if a package like that existed for a while now.

> Xarray and Pandas have developer teams and this was just me hacking away for a week lol!!

For sure! I'm definitely not expecting a single-developer v0.1.1 package to be totally done.

> The dims do actually change with `mean`, but Julia arrays don't drop the dimension, just reduce it to size 1. So the lat dim has to stay, with a length of 1.

Definitely agree that the dimension should stay. The main thing was that a single label stayed on that dimension, which you've got a plan for.

> The sparse array created in `similar()` is very likely causing the performance issue for `mean`.

I did a little more digging into what's happening in `mean`, and I think there is a small effect from using `similar`, but there's a more general issue of dispatch. Swapping out the call to `reducedim_initarray` for a precomputed value gives very similar times:

Benchmarks:

```julia
julia> ddinit = Base.reducedim_init(t -> t/2, +, sparsed, 1)
       sainit = Base.reducedim_init(t -> t/2, +, sparsear, 1);

julia> @benchmark Statistics.mean!($ddinit, $sparsed)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.666 s (0.00% GC)
  median time:      1.673 s (0.00% GC)
  mean time:        1.674 s (0.00% GC)
  maximum time:     1.685 s (0.00% GC)
  --------------
  samples:          3
  evals/sample:     1

julia> @benchmark Statistics.mean!($sainit, $sparsed)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.677 s (0.00% GC)
  median time:      1.677 s (0.00% GC)
  mean time:        1.696 s (0.00% GC)
  maximum time:     1.734 s (0.00% GC)
  --------------
  samples:          3
  evals/sample:     1
```

It looks like the big change in time is actually coming from the `sum!` call inside `mean!`, and that time is independent of the preallocated output array.

Benchmarks:

```julia
julia> @benchmark sum!($sainit, $sparsed)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.672 s (0.00% GC)
  median time:      1.675 s (0.00% GC)
  mean time:        1.680 s (0.00% GC)
  maximum time:     1.692 s (0.00% GC)
  --------------
  samples:          3
  evals/sample:     1

julia> @benchmark sum!($ddinit, $sparsear)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     4.222 ms (0.00% GC)
  median time:      4.337 ms (0.00% GC)
  mean time:        4.365 ms (0.00% GC)
  maximum time:     7.741 ms (0.00% GC)
  --------------
  samples:          1144
  evals/sample:     1
```

The main time discrepancy ends up coming from different methods being called by `mapreduce`. The one-argument `sum` (which sees a performance difference and was easier to trace) calls `Base._mapreduce(f, op, ::Base.IndexCartesian, A::SparseMatrixCSC{T})` for the sparse array, while `mapfoldl(f, op, A)` ends up being called on the DimensionalArray. When a dimension is provided, `sum` calls `_mapreducedim!(f, op, R::AbstractArray, A::AbstractArray)` on the DimensionalArray, while `_mapreducedim!(f, op, R::AbstractArray, A::SparseMatrixCSC{T})` is called on the sparse array.

The reason I say this is a more general issue of dispatch is that methods specialized on the wrapped array aren't being called. I imagine similar issues will crop up with other array types, since there isn't really a step that allows dispatch based on the array holding the data. This does seem tough to solve in an elegant way.
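
A minimal standalone demonstration of the effect, independent of DimensionalData:

```julia
using SparseArrays

# Any AbstractArray wrapper hides the concrete SparseMatrixCSC type from
# dispatch, so the sparse-specialised reduction methods no longer apply.
struct Wrapper{T,N,A<:AbstractArray{T,N}} <: AbstractArray{T,N}
    data::A
end
Base.size(w::Wrapper) = size(w.data)
Base.getindex(w::Wrapper, I...) = w.data[I...]

s = sprand(10000, 10000, 0.1)
w = Wrapper(s)

sum(s)  # SparseMatrixCSC method: iterates only the stored entries
sum(w)  # generic AbstractArray fallback: visits all 10^8 elements
```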

Btw, AxisArrays seems to have the same performance issue 😉.

@rafaqz (Owner) commented Oct 7, 2019

Thanks for digging into that. That makes sense. It's specialised methods being dispatched on the SparseArray vs. generic methods on the AbstractDimensionalArray. Fixing this will be a design goal when I rewrite the math/stats methods. But I feel better knowing AxisArrays does it too lol

Let me know how you go using AcceleratedArrays, and do put in an issue if any changes need to happen here for it to work.

@rafaqz (Owner) commented Oct 13, 2019

This should be fixed by fda792b

rafaqz closed this as completed Oct 13, 2019
@ivirshup (Contributor, Author) commented

Thanks for that! It fixes some cases, but others, like `mean(sparsed)` and `copy(sparsed)`, still dispatch to an inefficient AbstractArray method.

Without using something like Cassette, can you think of any way this could be done less manually? Maybe a public interface for "registering" a function with DimensionalData could make this easier?
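
One possible shape for such an interface (purely hypothetical, not an existing DimensionalData API):

```julia
# Hypothetical "registration" helper: stamp out a forwarding method so a
# whole-array reduction runs on the parent data instead of the wrapper.
macro forward_reduction(f)
    quote
        $(esc(f))(A::DimensionalArray) = $(esc(f))(parent(A))
    end
end

@forward_reduction Statistics.mean
@forward_reduction Base.sum
```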

BTW, I gave AcceleratedArrays a shot, but didn't see a speedup. I think this is due to how `DimensionalData.At` is written. It looks simple enough to fix, but it might be worth waiting for the next release of AcceleratedArrays, since the API is meant to change.

rafaqz reopened this Oct 14, 2019
@rafaqz (Owner) commented Oct 14, 2019

Really?
https://travis-ci.org/rafaqz/DimensionalData.jl/jobs/597286088#L312
https://github.com/rafaqz/DimensionalData.jl/blob/master/test/benchmarks.jl#L110

It should be even closer using `Var()` instead of `Var` (which isn't type-stable).

Also thanks for checking out AcceleratedArrays. Yeah, `At` needs work; `Near` is at least using binary search already. Once the API changes I'll have a look, although I'm concerned about depending on AcceleratedArrays; maybe a glue package will be required if it can't be generic.

@rafaqz (Owner) commented Oct 14, 2019

OK, `copy` still needs a method, but `mean` should be fine now.

Edit: `copy` should also be fixed in #13.

Let me know if any other methods give you trouble. I guess this process has to be piecemeal, as it's hard to know where generic methods aren't good enough; I don't want bloat for no reason, and I don't have time to test everything.

@ivirshup (Contributor, Author) commented

> OK, `copy` still needs a method, but `mean` should be fine now.

Oh, I mean the single-argument version of `mean`. For the same variables defined above, using f9eeb37:

Benchmark:

```julia
julia> @benchmark mean($sparsed)
BenchmarkTools.Trial:
  memory estimate:  128 bytes
  allocs estimate:  6
  --------------
  minimum time:     1.614 s (0.00% GC)
  median time:      1.623 s (0.00% GC)
  mean time:        1.623 s (0.00% GC)
  maximum time:     1.633 s (0.00% GC)
  --------------
  samples:          4
  evals/sample:     1

julia> @benchmark mean($sparsear)
BenchmarkTools.Trial:
  memory estimate:  96 bytes
  allocs estimate:  4
  --------------
  minimum time:     4.018 ms (0.00% GC)
  median time:      4.112 ms (0.00% GC)
  mean time:        4.185 ms (0.00% GC)
  maximum time:     9.738 ms (0.00% GC)
  --------------
  samples:          1193
  evals/sample:     1
```

I think it just needs something like `Statistics.mean(A::DimensionalArray) = mean(A.data)`. Other reductions could use this as well; at least `sum` has similar issues.
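
Spelled out for the cases mentioned (a sketch; assuming `sum` takes the same one-line shape):

```julia
# Forward whole-array reductions to the parent data so the
# sparse-specialised methods get dispatched.
Statistics.mean(A::DimensionalArray) = mean(A.data)
Base.sum(A::DimensionalArray) = sum(A.data)
```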

> It should be even closer using `Var()` instead of `Var` (which isn't type-stable).

Huh. Why isn't `Var` type-stable? I'm not sure I see how an instance would be type-stable but the type wouldn't be.

@rafaqz (Owner) commented Oct 14, 2019

Oh right, without `dims`. Those are just missing methods. There are probably 50 of them; let's make a list... Obviously everything I already have `dims` methods for.

`Var` can have issues if it's put in a tuple; UnionAll dispatch or something. I should be clearer on why... It might not apply here, but it often does in DD.

@rafaqz (Owner) commented Oct 14, 2019

OK, I've added single-argument methods to that pull request for everything that already had `dims` methods.

Thanks for the input, keep them coming :)

rafaqz closed this as completed Oct 17, 2019