14 changes: 8 additions & 6 deletions README.md
@@ -13,27 +13,29 @@ Tools for defining and running large computations on DiskArrays.
 ## Introduction

 The package [`DiskArrays.jl`](https://github.com/meggart/DiskArrays.jl) implements Julia's AbstractArray interface for chunked (and possibly compressed) n-dimensional arrays that are stored on disk and operated on lazily.
-Although DiskArrays.jl provides basic implementations for e.g. broadcasting or reductions over dimensions it has clear limitations when it comes to parallel computations or when broadcasting over arrays from different sources with non-aligning chunks. With `DiskArrayEngine` intend to provide a general-purpose computing backend that scales to very large n-dimensional arrays (GBs, TBs or larger) typically stored in a DiskArrays.jl-supported format like NetCDF, Zarr, ArchGDAL, HDF5Utils etc with parallelism supported by Dagger.jl.
+Although DiskArrays.jl provides basic implementations for e.g. broadcasting or reductions over dimensions, it has clear limitations when it comes to parallel computations or when broadcasting over arrays from different sources with non-aligning chunks.
+
+With `DiskArrayEngine`, we intend to provide a general-purpose computing backend that scales to very large n-dimensional arrays (GBs, TBs or larger), typically stored in a DiskArrays.jl-supported format like NetCDF, Zarr, ArchGDAL, HDF5, et cetera, with parallelism supported by `Dagger.jl`.

 ## Scope of the package

-Before starting to jump into this package it is worth checking if it is actually the right tool for your problem. Here is a quick check-list of things to consider and possible alternatives:
+Before starting to jump into this package, it is worth checking if it is actually the right tool for your problem. Here is a quick check-list of things to consider and possible alternatives:

 1. Your data is too large to fit into one machine's memory (otherwise just use normal Julia Arrays)
 2. [`Mmap`](https://docs.julialang.org/en/v1/stdlib/Mmap/#Mmap.mmap) is not an option (e.g. because your data comes in a compressed format, or data is stored in the cloud or your queueing system sees unrealistic memory usage by mmap)
 3. Your data is too large to fit into the memory of all your workers when distributed among them (otherwise try [DistributedArrays.jl](https://github.com/JuliaParallel/DistributedArrays.jl))
 4. You want to process *all* or almost all of your data and not just a small subset. Otherwise just read the subset of interest into memory and do your processing based on this one

-If you are still here you should also note that this package is not intended to be used by end-users directly, but the plan is to wrap functionality from this package
-in other packages, in particular YAXArrays.jl, DimensionalData.jl or PyramidSchemes.jl that provide more user-friendly interfaces for the end users.
+If you are still here, you should also note that this package is not intended to be used by end-users directly, but the plan is to wrap functionality from this package
+in other packages. In particular, these are YAXArrays.jl, DimensionalData.jl, or PyramidSchemes.jl, that provide more user-friendly interfaces.

 ## Status of the package

-This package is still under active development and should be considered experimental. Expect things to break and to already be broken. In particular, extensive documentation and tests are still missing. However, some core functionality of the package is already used by e.g. [PyramidScheme.jl](https://github.com/JuliaDataCubes/PyramidScheme.jl) which is why we decided to already register this package while still under active development.
+This package is still under active development and should be considered experimental. Expect things to break and to already be broken. In particular, extensive documentation and tests are still missing. However, some core functionality of the package is already used by e.g. [PyramidScheme.jl](https://github.com/JuliaDataCubes/PyramidScheme.jl) which is why we decided to register this package now, while it is still under active development.

 ## Basic package usage

-To be done, describe the generalized moving window concept, how to define user functions, lazy interface and which runner options exist
+To be done, describe the generalized moving window concept, how to define user functions, lazy interface and which runner options exist.

 ## EngineArrays
