Skip to content

Commit

Permalink
extend array design document
Browse files Browse the repository at this point in the history
  • Loading branch information
mrocklin committed Jul 17, 2015
1 parent 6a316e2 commit 2efc02f
Show file tree
Hide file tree
Showing 2 changed files with 30 additions and 2 deletions.
26 changes: 25 additions & 1 deletion docs/source/array-design.rst
Expand Up @@ -12,10 +12,13 @@ dask array with from the following components

* A dask with a special set of keys designating blocks
e.g. ``('x', 0, 0), ('x', 0, 1), ...``
* A sequence of block sizes along each dimension
* A sequence of chunk sizes along each dimension called ``chunks``
e.g. ``((5, 5, 5, 5), (8, 8, 8))``
* A name to identify which keys in the dask refer to this array, e.g. ``'x'``

Keys of the dask graph
----------------------

By special convention we refer to each block of the array with a tuple of the
form ``(name, i, j, k)`` for ``i, j, k`` being the indices of the block,
ranging from ``0`` to the number of blocks in that dimension. The dask must
Expand All @@ -33,6 +36,14 @@ key-value pairs required to eventually compute the desired values, e.g.
...
}
The name of an ``Array`` object can be found in the ``name`` attribute. One
can get a nested list of keys with the ``._keys()`` method. One can flatten
down this list with ``dask.array.core.flatten()``; this is sometimes useful
when building new dictionaries.

Chunks
------

We also store the size of each block along each axis. This is a tuple of
tuples such that the length of the outer tuple is equal to the dimension and
the lengths of the inner tuples are equal to the number of blocks along each
Expand All @@ -46,6 +57,19 @@ that some operations do expect certain symmetries in the block-shapes. For
example matrix multiplication requires that blocks on each side have
anti-symmetric shapes.

Some ways in which ``chunks`` reflects properties of our array

1. ``len(x.chunks) == x.ndims``: The length of chunks is the number of dimensions
2. ``map(sum, chunks) == shape``: The sum of each internal chunk, is the
length of that dimension.
3. The length of each internal chunk is the number of keys in that dimension,
e.g. for ``chunks == ((a, b), (d, e, f))`` and name == ``'x'``
our array has tasks with the following keys::

('x', 0, 0), ('x', 0, 1), ('x', 0, 2)
('x', 1, 0), ('x', 1, 1), ('x', 1, 2)


Create an Array Object
----------------------

Expand Down
6 changes: 5 additions & 1 deletion docs/source/array-extend.rst
Expand Up @@ -11,7 +11,9 @@ you need the following:

Often ``dask.array`` functions take other ``Array`` objects as inputs along
with parameters, add tasks to a new dask dictionary, create a new ``chunks``
tuple, and then construct and return a new ``Array`` object.
tuple, and then construct and return a new ``Array`` object. The hard parts
are invariably creating the right tasks and creating a new ``chunks`` tuple.
Careful review of the `array design document`_ is suggested.


Example `eye`
Expand All @@ -21,6 +23,8 @@ Consider this simple example with the ``eye`` function.

.. code-block:: python
from dask.array.core import tokens
def eye(n, blocksize):
chunks = ((blocksize,) * n // blocksize,
(blocksize,) * n // blocksize)
Expand Down

0 comments on commit 2efc02f

Please sign in to comment.