improve explanation

gbeckers · Sep 20, 2023 · 95af4fe
1 parent 33a7422

Showing 3 changed files with 78 additions and 59 deletions.
diff --git a/README.rst b/README.rst
@@ -4,30 +4,31 @@ Darr
 |Github CI Status| |Appveyor Status| |PyPi version| |Conda Forge|
 |Codecov Badge| |Docs Status| |Zenodo Badge|
 
-Darr is a Python science library that allows you to work with potentially
-very large, disk-based Numpy arrays that are self-documented, and that can
-be read in many other popular languages for data analysis with minimal
-effort.
+Darr is a Python science library that allows you to work efficiently with
+potentially very large, disk-based Numpy arrays that are self-documented.
+Documentation includes copy-paste ready code to read arrays in many popular
+data science languages such as R, Julia, Scilab, IDL, Matlab, Maple, and
+Mathematica, or in Python/Numpy without Darr, all without exporting
+them and with minimal effort.
 
 Universal readability of data is a pillar of good scientific practice. It is
 also generally a good idea for anyone who wants to flexibly move between
 analysis environments, who wants to save data for the longer term, or who
 wants to share data with others without spending much time on figuring out
-and/or explaining how the receiver can read it. As you work with you darr
-array, its documentation is automatically kept up to date, including a
-complete and human-readable description, as well as code to read the array
-in popular languages such as R, Julia, Scilab, IDL, Matlab, Maple, and
-Mathematica, or in Python/Numpy without Darr
-(see `example
+and/or explaining how the receiver can read it. No idea how to read
+your 7-dimensional uint32 numpy array in Matlab to quickly try out an
+algorithm your colleague wrote? No worries, a quick copy-paste of code from
+the array documentation is all that is needed to read your data in, e.g. R or
+Matlab (see `example
 <https://github.com/gbeckers/Darr/tree/master/examplearrays/arrays/array_int32_2D.darr>`__).
-A quick copy-paste of code from the array documentation is in most cases all
-that is needed to read your data in, e.g. R or Matlab. No need to export
-anything, make notes, or to provide elaborate explanation. No dependence on
-complicated formats or specialized libraries. No looking up things.
+As you work with your array, its documentation is automatically kept
+up to date. No need to export anything, make notes, or to provide elaborate
+explanation. No looking up things. No dependence on complicated formats or
+specialized libraries for reading your data elsewhere later.
 
 In essence, Darr makes it trivially easy to share your numerical arrays with
 others or with yourself when working in different computing environments,
-and makes them future-proof by providing documentation.
+and stores them in a future-proof way.
 
 More rationale for a tool-independent approach to numeric array storage is
 provided `here <https://darr.readthedocs.io/en/latest/rationale.html>`__.
@@ -37,7 +38,7 @@ established and trusted way of working with disk-based numerical data, and
 which makes Darr fully NumPy compatible. This enables efficient out-of-core
 read/write access to potentially very large arrays. In addition to
 automatic documentation, Darr adds other functionality to NumPy's memmap,
-such as easy appending and truncating data, support for ragged arrays,
+such as the easy appending and truncating of data, support for ragged arrays,
 the ability to create arrays from iterators, and easy use of metadata. Flat
 binary files and (JSON) text files are accompanied by a README text file
 that explains how the array and metadata are stored (`see example arrays
@@ -77,7 +78,9 @@ Features
 
 Drawbacks
 ---------
--  No compression, although compression for archiving purposes is supported.
+- No compression, although compression for archiving purposes is supported.
+- Array storage uses multiple files, as binary data is separated from text
+  documentation and metadata.
 
 Installation
 ------------

diff --git a/docs/index.rst b/docs/index.rst
@@ -9,30 +9,31 @@ Darr
 |Github CI Status| |Appveyor Status| |PyPi version| |Conda Forge|
 |Codecov Badge| |Docs Status| |Zenodo Badge|
 
-Darr is a Python science library that allows you to work with potentially
-very large, disk-based Numpy arrays that are self-documented, and that can
-be read in many other popular languages for data analysis with minimal
-effort.
+Darr is a Python science library that allows you to work efficiently with
+potentially very large, disk-based Numpy arrays that are self-documented.
+Documentation includes copy-paste ready code to read arrays in many popular
+data science languages such as R, Julia, Scilab, IDL, Matlab, Maple, and
+Mathematica, or in Python/Numpy without Darr, all without exporting
+them and with minimal effort.
 
 Universal readability of data is a pillar of good scientific practice. It is
 also generally a good idea for anyone who wants to flexibly move between
 analysis environments, who wants to save data for the longer term, or who
 wants to share data with others without spending much time on figuring out
-and/or explaining how the receiver can read it. As you work with you darr
-array, its documentation is automatically kept up to date, including a
-complete and human-readable description, as well as code to read the array
-in popular languages such as R, Julia, Scilab, IDL, Matlab, Maple, and
-Mathematica, or in Python/Numpy without Darr
-(see `example
+and/or explaining how the receiver can read it. No idea how to read
+your 7-dimensional uint32 numpy array in Matlab to quickly try out an
+algorithm your colleague wrote? No worries, a quick copy-paste of code from
+the array documentation is all that is needed to read your data in, e.g. R or
+Matlab (see `example
 <https://github.com/gbeckers/Darr/tree/master/examplearrays/arrays/array_int32_2D.darr>`__).
-A quick copy-paste of code from the array documentation is in most cases all
-that is needed to read your data in, e.g. R or Matlab. No need to export
-anything, make notes, or to provide elaborate explanation. No dependence on
-complicated formats or specialized libraries. No looking up things.
+As you work with your array, its documentation is automatically kept
+up to date. No need to export anything, make notes, or to provide elaborate
+explanation. No looking up things. No dependence on complicated formats or
+specialized libraries for reading your data elsewhere later.
 
 In essence, Darr makes it trivially easy to share your numerical arrays with
 others or with yourself when working in different computing environments,
-and makes them future-proof by providing documentation.
+and stores them in a future-proof way.
 
 More rationale for a tool-independent approach to numeric array storage is
 provided `here <https://darr.readthedocs.io/en/latest/rationale.html>`__.
@@ -42,7 +43,7 @@ established and trusted way of working with disk-based numerical data, and
 which makes Darr fully NumPy compatible. This enables efficient out-of-core
 read/write access to potentially very large arrays. In addition to
 automatic documentation, Darr adds other functionality to NumPy's memmap,
-such as easy appending and truncating data, support for ragged arrays,
+such as the easy appending and truncating of data, support for ragged arrays,
 the ability to create arrays from iterators, and easy use of metadata. Flat
 binary files and (JSON) text files are accompanied by a README text file
 that explains how the array and metadata are stored (`see example arrays
@@ -62,15 +63,18 @@ Features
    universal readability.
 -  Automatic self-documentation, including copy-paste ready code snippets for
    reading the array in a number of popular data analysis environments, such as
-   Python (without Darr), R, Julia, Scilab, Octave/Matlab, GDL/IDL, and  
-   Mathematica (see `example array
+   Python (without Darr), R, Julia, Scilab, Octave/Matlab, GDL/IDL, and
+   Mathematica
+   (see `example array
    <https://github.com/gbeckers/Darr/tree/master/examplearrays/arrays/array_int32_2D.darr>`__).
 -  Disk-persistent array data is directly accessible through `NumPy
    indexing <https://numpy.org/doc/stable/reference/arrays.indexing.html>`__
-   and may be larger than RAM and that is easily appendable.
+   and may be larger than RAM.
+-  Easy and efficient appending of data (`see example <https://darr.readthedocs.io/en/latest/tutorialarray.html#appending-data>`__).
 -  Supports ragged arrays.
 -  Easy use of metadata, stored in a widely readable separate
-   `JSON <https://en.wikipedia.org/wiki/JSON>`__ text file.
+   JSON text file (`see example
+   <https://darr.readthedocs.io/en/latest/tutorialarray.html#metadata>`__).
 -  Many numeric types are supported: (u)int8-(u)int64, float16-float64,
    complex64, complex128.
 -  Integrates easily with the `Dask <https://dask.pydata.org/en/latest/>`__
@@ -79,7 +83,9 @@ Features
 
 Drawbacks
 ---------
--  No compression, although compression for archiving purposes is supported.
+- No compression, although compression for archiving purposes is supported.
+- Array storage uses multiple files, as binary data is separated from text
+  documentation and metadata.
 
 Darr officially depends on Python 3.9 or higher. Older versions may work
 (probably >= 3.6) but are not tested anymore.

diff --git a/setup.py b/setup.py
@@ -10,30 +10,31 @@
 """|Github CI Status| |Appveyor Status| |PyPi version| |Conda Forge|
 |Codecov Badge| |Docs Status| |Zenodo Badge|
 
-Darr is a Python science library that allows you to work with potentially
-very large, disk-based Numpy arrays that are self-documented, and that can
-be read in many other popular languages for data analysis with minimal
-effort.
+Darr is a Python science library that allows you to work efficiently with 
+potentially very large, disk-based Numpy arrays that are self-documented. 
+Documentation includes copy-paste ready code to read arrays in many popular 
+data science languages such as R, Julia, Scilab, IDL, Matlab, Maple, and 
+Mathematica, or in Python/Numpy without Darr, all without exporting
+them and with minimal effort.
 
 Universal readability of data is a pillar of good scientific practice. It is
 also generally a good idea for anyone who wants to flexibly move between
 analysis environments, who wants to save data for the longer term, or who
 wants to share data with others without spending much time on figuring out
-and/or explaining how the receiver can read it. As you work with you darr
-array, its documentation is automatically kept up to date, including a
-complete and human-readable description, as well as code to read the array
-in popular languages such as R, Julia, Scilab, IDL, Matlab, Maple, and
-Mathematica, or in Python/Numpy without Darr
-(see `example
+and/or explaining how the receiver can read it. No idea how to read 
+your 7-dimensional uint32 numpy array in Matlab to quickly try out an 
+algorithm your colleague wrote? No worries, a quick copy-paste of code from 
+the array documentation is all that is needed to read your data in, e.g. R or 
+Matlab (see `example
 <https://github.com/gbeckers/Darr/tree/master/examplearrays/arrays/array_int32_2D.darr>`__).
-A quick copy-paste of code from the array documentation is in most cases all
-that is needed to read your data in, e.g. R or Matlab. No need to export
-anything, make notes, or to provide elaborate explanation. No dependence on
-complicated formats or specialized libraries. No looking up things.
+As you work with your array, its documentation is automatically kept 
+up to date. No need to export anything, make notes, or to provide elaborate 
+explanation. No looking up things. No dependence on complicated formats or 
+specialized libraries for reading your data elsewhere later.
 
 In essence, Darr makes it trivially easy to share your numerical arrays with
-others or with yourself when working in different computing environments, 
-and makes them future-proof by providing documentation.
+others or with yourself when working in different computing environments,
+and stores them in a future-proof way.
 
 More rationale for a tool-independent approach to numeric array storage is
 provided `here <https://darr.readthedocs.io/en/latest/rationale.html>`__.
@@ -43,7 +44,7 @@
 which makes Darr fully NumPy compatible. This enables efficient out-of-core
 read/write access to potentially very large arrays. In addition to
 automatic documentation, Darr adds other functionality to NumPy's memmap,
-such as easy appending and truncating data, support for ragged arrays,
+such as the easy appending and truncating of data, support for ragged arrays,
 the ability to create arrays from iterators, and easy use of metadata. Flat
 binary files and (JSON) text files are accompanied by a README text file
 that explains how the array and metadata are stored (`see example arrays
@@ -63,21 +64,30 @@
    universal readability.
 -  Automatic self-documentation, including copy-paste ready code snippets for
    reading the array in a number of popular data analysis environments, such as
-   Python (without Darr), R, Julia, Scilab, Octave/Matlab, GDL/IDL, 
-   and Mathematica (see `example array
+   Python (without Darr), R, Julia, Scilab, Octave/Matlab, GDL/IDL, and
+   Mathematica
+   (see `example array
    <https://github.com/gbeckers/Darr/tree/master/examplearrays/arrays/array_int32_2D.darr>`__).
 -  Disk-persistent array data is directly accessible through `NumPy
    indexing <https://numpy.org/doc/stable/reference/arrays.indexing.html>`__
-   and may be larger than RAM and that is easily appendable.
+   and may be larger than RAM.
+-  Easy and efficient appending of data (`see example <https://darr.readthedocs.io/en/latest/tutorialarray.html#appending-data>`__).
 -  Supports ragged arrays.
 -  Easy use of metadata, stored in a widely readable separate
-   `JSON <https://en.wikipedia.org/wiki/JSON>`__ text file.
+   JSON text file (`see example
+   <https://darr.readthedocs.io/en/latest/tutorialarray.html#metadata>`__).
 -  Many numeric types are supported: (u)int8-(u)int64, float16-float64,
    complex64, complex128.
 -  Integrates easily with the `Dask <https://dask.pydata.org/en/latest/>`__
    library for out-of-core computation on very large arrays.
 -  Minimal dependencies, only `NumPy <http://www.numpy.org/>`__.
 
+Drawbacks
+---------
+- No compression, although compression for archiving purposes is supported.
+- Array storage uses multiple files, as binary data is separated from text
+  documentation and metadata.
+
 See the `documentation <http://darr.readthedocs.io/>`__ for more information.
 
 .. |Github CI Status| image:: https://github.com/gbeckers/Darr/actions/workflows/python_package.yml/badge.svg