Document deterministic names
shoyer committed Sep 5, 2015
1 parent cbeb70d commit 2ff6cce
Showing 5 changed files with 27 additions and 12 deletions.
20 changes: 10 additions & 10 deletions doc/dask.rst
@@ -11,9 +11,9 @@ benefits of using dask are sufficiently strong that we expect that dask may
 become a requirement for a future version of xray.
 
 For a full example of how to use xray's dask integration, read the
-`blog post introducing xray + dask`_.
+`blog post introducing xray and dask`_.
 
-.. _blog post introducing xray + dask: http://continuum.io/blog/xray-dask
+.. _blog post introducing xray and dask: http://continuum.io/blog/xray-dask
 
 What is a dask array?
 ---------------------
@@ -143,10 +143,10 @@ Explicit conversion by wrapping a DataArray with ``np.asarray`` also works:
 [ 1.337e+00, -1.531e+00, ..., 8.726e-01, -1.538e+00],
 ...
 
-With the current versions of xray and dask, there is no automatic conversion
-of eager numpy arrays to dask arrays, nor automatic alignment of chunks when
-performing operations between dask arrays with different chunk sizes. You will
-need to explicitly chunk each array to ensure compatibility. With xray, both
+With the current version of dask, there is no automatic alignment of chunks when
+performing operations between dask arrays with different chunk sizes. If your
+computation involves multiple dask arrays with different chunks, you may need to
+explicitly rechunk each array to ensure compatibility. With xray, both
 converting data to a dask arrays and converting the chunk sizes of dask arrays
 is done with the :py:meth:`~xray.Dataset.chunk` method:
 
@@ -166,16 +166,16 @@ You can view the size of existing chunks on an array by viewing the
 
     rechunked.chunks
 
-If there are not consistent chunksizes between all the ararys in a dataset
+If there are not consistent chunksizes between all the arrays in a dataset
 along a particular dimension, an exception is raised when you try to access
 ``.chunks``.
 
 .. note::
 
     In the future, we would like to enable automatic alignment of dask
-    chunksizes and automatic conversion of numpy arrays to dask (but not the
-    other way around). We might also require that all arrays in a dataset
-    share the same chunking alignment. None of these are currently done.
+    chunksizes (but not the other way around). We might also require that all
+    arrays in a dataset share the same chunking alignment. Neither of these
+    are currently done.
 
 NumPy ufuncs like ``np.sin`` currently only work on eagerly evaluated arrays
 (this will change with the next major NumPy release). We have provided
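As an illustration of the chunk-alignment paragraph in the diff above: two arrays can be combined block by block only when their chunk boundaries along a shared dimension agree. The following is a minimal pure-Python sketch, not dask's or xray's implementation. `chunk_edges` and `common_chunks` are hypothetical helper names introduced here, and the chunk sizes are made-up values. The sketch computes the coarsest chunking whose boundaries include those of both inputs, which is one possible target to rechunk both arrays to.

```python
from itertools import accumulate


def chunk_edges(chunks):
    """Return the cumulative end offset of each chunk along one dimension."""
    return list(accumulate(chunks))


def common_chunks(chunks_a, chunks_b):
    """Coarsest chunk sizes whose boundaries include those of both inputs.

    Hypothetical helper for illustration only; both inputs must describe
    the same total length along the dimension.
    """
    if sum(chunks_a) != sum(chunks_b):
        raise ValueError('chunkings describe different dimension lengths')
    # Union of all chunk boundaries, in increasing order.
    edges = sorted(set(chunk_edges(chunks_a)) | set(chunk_edges(chunks_b)))
    # Convert the boundaries back into chunk sizes.
    return tuple(end - start for start, end in zip([0] + edges[:-1], edges))


a = (4, 4, 2)  # a length-10 dimension split as 4 + 4 + 2
b = (5, 5)     # the same dimension split as 5 + 5
aligned = common_chunks(a, b)
print(aligned)
```

Splitting at the union of boundaries never merges existing chunks, so each input chunk only needs to be cut, not concatenated. Whether that is the best target in practice depends on the resulting chunk sizes, which this sketch does not consider.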
5 changes: 5 additions & 0 deletions doc/whats-new.rst
@@ -12,6 +12,8 @@ What's New
 v0.6.1
 ------
 
+The minimum required version of dask for use with xray is now version 0.6.
+
 API Changes
 ~~~~~~~~~~~
 
@@ -28,6 +30,9 @@ Enhancements
   attributes to Dataset and DataArray (:issue:`553`).
 - More informative error message with :py:meth:`~xray.Dataset.from_dataframe`
   if the frame has duplicate columns.
+- xray now uses deterministic names for dask arrays it creates or opens from
+  disk. This allows xray users to take advantage of dask's nascent support for
+  caching intermediate computation results. See :issue:`555` for an example.
 
 Bug fixes
 ~~~~~~~~~
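The changelog entry above depends on names being reproducible: if the same file and the same arguments always produce the same name, a cache keyed by name can recognize repeated work. The sketch below imitates that idea with the standard library only. `deterministic_name` is a hypothetical function, and hashing a `repr` with SHA-256 is an assumption made for illustration. According to the diff, xray itself calls `dask.base.tokenize`, whose internals are not reproduced here. The use of the file's modification time follows the `os.path.getmtime` call visible in `xray/backends/api.py`.

```python
import hashlib
import os
import tempfile


def deterministic_name(path, chunks, prefix='xray-'):
    """Build a name that depends only on the file, its mtime and the chunks."""
    mtime = os.path.getmtime(path)
    # Sort the chunk items so that dict ordering cannot change the name.
    key = repr((os.path.abspath(path), mtime, sorted(chunks.items())))
    return prefix + hashlib.sha256(key.encode('utf-8')).hexdigest()


# Create a throwaway file so the example runs anywhere.
with tempfile.NamedTemporaryFile(suffix='.nc', delete=False) as f:
    f.write(b'placeholder bytes, not a real netCDF file')
    path = f.name

try:
    first = deterministic_name(path, {'time': 10})
    second = deterministic_name(path, {'time': 10})
    other = deterministic_name(path, {'time': 20})
finally:
    os.remove(path)

print(first == second)  # same inputs give the same name
print(first == other)   # different chunks give a different name
```

Including the modification time means that rewriting the file changes the name, so a cache keyed by name would not serve stale results for the old contents.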
7 changes: 6 additions & 1 deletion xray/backends/api.py
@@ -142,7 +142,12 @@ def maybe_decode_store(store, lock=False):
             drop_variables=drop_variables)
 
         if chunks is not None:
-            from dask.base import tokenize
+            try:
+                from dask.base import tokenize
+            except ImportError:
+                import dask  # raise the usual error if dask is entirely missing
+                raise ImportError('xray requires dask version 0.6 or newer')
+
             if isinstance(filename_or_obj, basestring):
                 file_arg = os.path.getmtime(filename_or_obj)
             else:
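The guard in the hunk above distinguishes two failure modes: dask missing entirely, and dask installed but too old to provide `tokenize`. A minimal sketch of the same pattern using only the standard library follows. `require_attr` is a hypothetical helper, and `json` stands in for dask so that the example runs without extra dependencies.

```python
import importlib


def require_attr(module_name, attr, hint):
    """Import ``attr`` from ``module_name`` or raise an informative ImportError."""
    # If the module itself is missing, this raises the usual ImportError.
    module = importlib.import_module(module_name)
    try:
        return getattr(module, attr)
    except AttributeError:
        # The module is installed, but this version does not provide ``attr``.
        raise ImportError(hint)


dumps = require_attr('json', 'dumps', 'json.dumps is required')
print(dumps({'ok': True}))

try:
    require_attr('json', 'no_such_function', 'a newer json module is required')
except ImportError as err:
    print(err)
```

Keeping the two cases separate means a user without the dependency sees the standard "no module" message, while a user with an outdated install is told which version to upgrade to.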
6 changes: 5 additions & 1 deletion xray/core/dataset.py
@@ -908,7 +908,11 @@ def chunk(self, chunks=None, name_prefix='xray-', token=None, lock=False):
         -------
         chunked : xray.Dataset
         """
-        from dask.base import tokenize
+        try:
+            from dask.base import tokenize
+        except ImportError:
+            import dask  # raise the usual error if dask is entirely missing
+            raise ImportError('xray requires dask version 0.6 or newer')
 
         if isinstance(chunks, Number):
             chunks = dict.fromkeys(self.dims, chunks)
1 change: 1 addition & 0 deletions xray/core/pycompat.py
@@ -1,5 +1,6 @@
 import numpy as np
 import sys
+from distutils.version import LooseVersion
 
 PY3 = sys.version_info[0] >= 3
