
[WIP] issue DiskArrays.jl #131 #132

Closed
wants to merge 1 commit into from

Conversation

Contributor

@Alexander-Barth Alexander-Barth commented Oct 24, 2023

This PR fixes the following issue:

a1 = _DiskArray(zeros(5,5,3));
size(a1[[1,2],[2,3],:])
# previous output: (2, 3, 2)
# now: (2, 2, 3)
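For reference, the corrected shape matches plain `Array` semantics: each `Vector{Int}` index contributes its length, and `:` contributes the full axis extent. A minimal standalone sketch (no DiskArrays involved, array sizes are illustrative):

```julia
# Plain-Array reference for the expected shape: each Vector{Int}
# index contributes its length, a Colon the full axis extent.
a = zeros(5, 5, 3)
s = size(a[[1, 2], [2, 3], :])
println(s)  # (2, 2, 3)
```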

It is also now type stable:

julia> using NCDatasets; v = NCDataset("/tmp/sample2.nc")["data"].var;
[ Info: Precompiling NCDatasets [85f8d34a-cbdd-5861-8df4-14fed0d494ab]

julia> foo(v) = v[:,:,[1]];

julia> @code_warntype foo(v)
MethodInstance for foo(::NCDatasets.Variable{Float32, 3, NCDataset{Nothing}})
  from foo(v) @ Main REPL[2]:1
Arguments
  #self#::Core.Const(foo)
  v::NCDatasets.Variable{Float32, 3, NCDataset{Nothing}}
Body::Array{Float32, 3}
1 ─ %1 = Main.:(:)::Core.Const(Colon())
│   %2 = Main.:(:)::Core.Const(Colon())
│   %3 = Base.vect(1)::Vector{Int64}
│   %4 = Base.getindex(v, %1, %2, %3)::Array{Float32, 3}
└──      return %4

julia> @which v[:,:,[1]]
getindex(a::NCDatasets.Variable, i...)
     @ NCDatasets ~/.julia/dev/DiskArrays/src/diskarray.jl:211
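Type stability can also be checked more compactly with `Test.@inferred`, which throws if the concrete return type cannot be inferred. A hedged sketch using a plain `Array` as a stand-in for the NCDatasets variable (the file from the session above is not reproduced here):

```julia
using Test

# Same access pattern as in the @code_warntype session above.
foo(v) = v[:, :, [1]]

v = rand(Float32, 4, 4, 3)
@inferred foo(v)         # throws if the return type is not inferred concretely
println(typeof(foo(v)))  # Array{Float32, 3}
```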

It also avoids reading too much data.

I had to set need_batch = false to avoid infinite recursion; any advice would be appreciated.

The implementation from NCDatasets is added to the current implementation of the special case:

batchgetindex(a::TA,indices::Vararg{Union{Int,Colon,AbstractRange{<:Integer},Vector{Int}},N}) where TA <: AbstractArray{T,N} where {T,N}

so that we do not remove any functionality.
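The signature above constrains every index to one of the supported forms so that dispatch only selects the batch path for those cases. A hypothetical standalone sketch of that dispatch constraint (names are illustrative, not the actual DiskArrays implementation):

```julia
# Hypothetical sketch: a batch-indexing method that only applies when
# every index is an Int, Colon, integer range, or Vector{Int}.
const BatchIndex = Union{Int,Colon,AbstractRange{<:Integer},Vector{Int}}

# Requiring exactly N indices of these types mirrors the PR's signature;
# the body just falls back to plain indexing for this sketch.
mybatchgetindex(a::AbstractArray{T,N}, indices::Vararg{BatchIndex,N}) where {T,N} =
    getindex(a, indices...)

a = reshape(1:24, 2, 3, 4)
println(mybatchgetindex(a, 1, :, [1, 2]))  # matches: Int, Colon, Vector{Int}
```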

@@ -493,7 +492,8 @@ end
#Index with range stride much larger than chunk size
a = _DiskArray(reshape(1:100, 20, 5, 1); chunksize=(1, 5, 1))
@test a[1:9:20, :, 1] == trueparent(a)[1:9:20, :, 1]
@test getindex_count(a) == 3
# now getindex_count(a) == 1
Collaborator

@rafaqz rafaqz Jan 21, 2024


Doesn't this mean sparse ranges will now read all the data in some cases?

I thought the idea of batchgetindex was to explicitly read only the required chunks.

@@ -171,6 +256,8 @@ function _readblock!(A::AbstractArray, A_ret, r::AbstractVector...)
mi, ma = extrema(ids)
return largest_jump > cs && length(ids) / (ma - mi) < 0.5
end
# What TODO?: necessary to avoid infinite recursion
need_batch = false
Collaborator

@rafaqz rafaqz Jan 22, 2024


This is essentially deleting half the code in this file. There may be a case for reorganising things, but I think a lot of optimisations are being thrown out by doing this.

We need to fix the dispatch instead.

@meggart meggart closed this Apr 16, 2024
@meggart
Owner

meggart commented Apr 16, 2024

I think the issues mentioned here have been fixed in other PRs.

@Alexander-Barth
Contributor Author

Alexander-Barth commented Apr 25, 2024

Thanks Fabian!
