From 86f83668b94d4978f2d6edaa5aca9da88f66705b Mon Sep 17 00:00:00 2001
From: Anshul Singhvi
Date: Tue, 18 Feb 2025 14:20:02 -0500
Subject: [PATCH 1/2] Update grammar in README

---
 README.md | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/README.md b/README.md
index 6e3b3e1..0f9d566 100644
--- a/README.md
+++ b/README.md
@@ -13,19 +13,21 @@ Tools for defining and running large computations on DiskArrays.
 ## Introduction
 
 The package [`DiskArrays.jl`](https://github.com/meggart/DiskArrays.jl) implements Julia's AbstractArray interface for chunked (and possibly compressed) n-dimensional arrays that are stored on disk and operated on lazily.
-Although DiskArrays.jl provides basic implementations for e.g. broadcasting or reductions over dimensions it has clear limitations when it comes to parallel computations or when broadcasting over arrays from different sources with non-aligning chunks. With `DiskArrayEngine` intend to provide a general-purpose computing backend that scales to very large n-dimensional arrays (GBs, TBs or larger) typically stored in a DiskArrays.jl-supported format like NetCDF, Zarr, ArchGDAL, HDF5Utils etc with parallelism supported by Dagger.jl.
+Although DiskArrays.jl provides basic implementations for e.g. broadcasting or reductions over dimensions, it has clear limitations when it comes to parallel computations or when broadcasting over arrays from different sources with non-aligning chunks.
+
+With `DiskArrayEngine`, we intend to provide a general-purpose computing backend that scales to very large n-dimensional arrays (GBs, TBs or larger), typically stored in a DiskArrays.jl-supported format like NetCDF, Zarr, ArchGDAL, HDF5, et cetera, with parallelism supported by `Dagger.jl`.
 
 ## Scope of the package
 
-Before starting to jump into this package it is worth checking if it is actually the right tool for your problem. Here is a quick check-list of things to consider and possible alternatives:
+Before starting to jump into this package, it is worth checking if it is actually the right tool for your problem. Here is a quick check-list of things to consider and possible alternatives:
 
 1. Your data is too large to fit into one machine's memory (otherwise just use normal Julia Arrays)
 2. [`Mmap`](https://docs.julialang.org/en/v1/stdlib/Mmap/#Mmap.mmap) is not an option (e.g. because your data comes in a compressed format, or data is stored in the cloud or your queueing system sees unrealistic memory usage by mmap)
 3. Your data is too large to fit into the memory of all your workers when distributed among them (otherwise try [DistributedArrays.jl](https://github.com/JuliaParallel/DistributedArrays.jl))
 4. You want to process *all* or almost all of your data and not just a small subset. Otherwise just read the subset of interest into memory and do your processing based on this one
 
-If you are still here you should also note that this package is not intended to be used by end-users directly, but the plan is to wrap functionality from this package
-in other packages, in particular YAXArrays.jl, DimensionalData.jl or PyramidSchemes.jl that provide more user-friendly interfaces for the end users.
+If you are still here, you should also note that this package is not intended to be used by end-users directly, but the plan is to wrap functionality from this package
+in other packages. In particular, these are YAXArrays.jl, DimensionalData.jl,or PyramidSchemes.jl, that provide more user-friendly interfaces.
 
 ## Status of the package
 
@@ -33,7 +35,7 @@ This package is still under active development and should be considered experime
 
 ## Basic package usage
 
-To be done, describe the generalized moving window concept, how to define user functions, lazy interface and which runner options exist
+To be done, describe the generalized moving window concept, how to define user functions, lazy interface and which runner options exist.
 
 ## EngineArrays
 
From a8141cd43ef995f867164282e27ca83fa0c1796a Mon Sep 17 00:00:00 2001
From: Anshul Singhvi
Date: Tue, 18 Feb 2025 14:20:53 -0500
Subject: [PATCH 2/2] Update README.md

---
 README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 0f9d566..d3f8286 100644
--- a/README.md
+++ b/README.md
@@ -27,11 +27,11 @@ Before starting to jump into this package, it is worth checking if it is actuall
 4. You want to process *all* or almost all of your data and not just a small subset. Otherwise just read the subset of interest into memory and do your processing based on this one
 
 If you are still here, you should also note that this package is not intended to be used by end-users directly, but the plan is to wrap functionality from this package
-in other packages. In particular, these are YAXArrays.jl, DimensionalData.jl,or PyramidSchemes.jl, that provide more user-friendly interfaces.
+in other packages. In particular, these are YAXArrays.jl, DimensionalData.jl, or PyramidSchemes.jl, that provide more user-friendly interfaces.
 
 ## Status of the package
 
-This package is still under active development and should be considered experimental. Expect things to break and to already be broken. In particular, extensive documentation and tests are still missing. However, some core functionality of the package is already used by e.g. [PyramidScheme.jl](https://github.com/JuliaDataCubes/PyramidScheme.jl) which is why we decided to already register this package while still under active development. 
+This package is still under active development and should be considered experimental. Expect things to break and to already be broken. In particular, extensive documentation and tests are still missing. However, some core functionality of the package is already used by e.g. [PyramidScheme.jl](https://github.com/JuliaDataCubes/PyramidScheme.jl) which is why we decided to register this package now, while it is still under active development.
 
 ## Basic package usage
 