
[WIP] issue DiskArrays.jl #131 #132

Closed
wants to merge 1 commit into from

Conversation

Contributor

@Alexander-Barth Alexander-Barth commented Oct 24, 2023

This PR fixes the following issue:

a1 = _DiskArray(zeros(5,5,3));
size(a1[[1,2],[2,3],:])
# previous output: (2, 3, 2)
# now: (2, 2, 3)
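For reference, the corrected shape matches plain `Array` semantics: each `Vector{Int}` index contributes its length, and `:` contributes the full axis extent. A minimal standalone sketch (no DiskArrays involved, array sizes are illustrative):

```julia
# Plain-Array reference for the expected shape: each Vector{Int}
# index contributes its length, a Colon the full axis extent.
a = zeros(5, 5, 3)
s = size(a[[1, 2], [2, 3], :])
println(s)  # (2, 2, 3)
```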

It is also now type stable:

julia> using NCDatasets; v = NCDataset("/tmp/sample2.nc")["data"].var;
[ Info: Precompiling NCDatasets [85f8d34a-cbdd-5861-8df4-14fed0d494ab]

julia> foo(v) = v[:,:,[1]];

julia> @code_warntype foo(v)
MethodInstance for foo(::NCDatasets.Variable{Float32, 3, NCDataset{Nothing}})
  from foo(v) @ Main REPL[2]:1
Arguments
  #self#::Core.Const(foo)
  v::NCDatasets.Variable{Float32, 3, NCDataset{Nothing}}
Body::Array{Float32, 3}
1 ─ %1 = Main.:(:)::Core.Const(Colon())
│   %2 = Main.:(:)::Core.Const(Colon())
│   %3 = Base.vect(1)::Vector{Int64}
│   %4 = Base.getindex(v, %1, %2, %3)::Array{Float32, 3}
└──      return %4

julia> @which v[:,:,[1]]
getindex(a::NCDatasets.Variable, i...)
     @ NCDatasets ~/.julia/dev/DiskArrays/src/diskarray.jl:211
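Type stability can also be checked more compactly with `Test.@inferred`, which throws if the concrete return type cannot be inferred. A hedged sketch using a plain `Array` as a stand-in for the NCDatasets variable (the file from the session above is not reproduced here):

```julia
using Test

# Same access pattern as in the @code_warntype session above.
foo(v) = v[:, :, [1]]

v = rand(Float32, 4, 4, 3)
@inferred foo(v)         # throws if the return type is not inferred concretely
println(typeof(foo(v)))  # Array{Float32, 3}
```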

It also avoids reading too much data.

I had to set need_batch = false to avoid infinite recursion; any advice would be appreciated.

The implementation from NCDatasets is added to the current implementation of the special case:

batchgetindex(a::TA,indices::Vararg{Union{Int,Colon,AbstractRange{<:Integer},Vector{Int}},N}) where TA <: AbstractArray{T,N} where {T,N}

so that we do not remove any functionality.
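The signature above constrains every index to one of the supported forms so that dispatch only selects the batch path for those cases. A hypothetical standalone sketch of that dispatch constraint (names are illustrative, not the actual DiskArrays implementation):

```julia
# Hypothetical sketch: a batch-indexing method that only applies when
# every index is an Int, Colon, integer range, or Vector{Int}.
const BatchIndex = Union{Int,Colon,AbstractRange{<:Integer},Vector{Int}}

# Requiring exactly N indices of these types mirrors the PR's signature;
# the body just falls back to plain indexing for this sketch.
mybatchgetindex(a::AbstractArray{T,N}, indices::Vararg{BatchIndex,N}) where {T,N} =
    getindex(a, indices...)

a = reshape(1:24, 2, 3, 4)
println(mybatchgetindex(a, 1, :, [1, 2]))  # matches: Int, Colon, Vector{Int}
```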

@@ -493,7 +492,8 @@ end
#Index with range stride much larger than chunk size
a = _DiskArray(reshape(1:100, 20, 5, 1); chunksize=(1, 5, 1))
@test a[1:9:20, :, 1] == trueparent(a)[1:9:20, :, 1]
@test getindex_count(a) == 3
# now getindex_count(a) == 1
Collaborator

@rafaqz rafaqz Jan 21, 2024


Doesn't this mean sparse ranges will now read all the data in some cases?

I thought the idea of batchgetindex was to explicitly read only the required chunks.

@@ -171,6 +256,8 @@ function _readblock!(A::AbstractArray, A_ret, r::AbstractVector...)
mi, ma = extrema(ids)
return largest_jump > cs && length(ids) / (ma - mi) < 0.5
end
# What TODO?: necessary to avoid infinite recursion
need_batch = false
Collaborator

@rafaqz rafaqz Jan 22, 2024


This is essentially deleting half the code in this file. There may be a case for reorganising things, but I think a lot of optimisations are being thrown out by doing this.

We need to fix the dispatch instead.

@meggart meggart closed this Apr 16, 2024
@meggart
Owner

meggart commented Apr 16, 2024

I think the issues mentioned here have been fixed in other PRs.

@Alexander-Barth
Contributor Author

Alexander-Barth commented Apr 25, 2024

Thanks Fabian!
