From 4e653b6d1d855fe87fba8dab49e4bcc67c708d0e Mon Sep 17 00:00:00 2001 From: christopher Date: Thu, 10 May 2018 13:42:09 -0400 Subject: [PATCH 1/3] add docs about crossing dask and stream nodes --- docs/source/dask.rst | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+) diff --git a/docs/source/dask.rst b/docs/source/dask.rst index 0a1dbfc6..442631b0 100644 --- a/docs/source/dask.rst +++ b/docs/source/dask.rst @@ -117,3 +117,24 @@ did. ``source.emit``. .. _Dask: https://dask.pydata.org/en/latest/ + + +Gotchas ++++++++ + +An important gotcha with ``DaskStream`` is that if a ``Stream`` node is +downstream of a ``DaskStream`` node without a ``gather`` between then the +``Stream`` node will receive the future not the data itself. + +For example + +.. code-block:: python + + source = Stream() + source2 = Stream() + a = source.scatter().map(inc) + b = source2.combine_latest(a) + +In this case b is now a ``Stream`` node and does not have access to the actual +data on the dask cluster. +Any operations done downstream of b would be performed on the future. From 8d7fcb9a61be2c6b04c83b99de3f38ef471578b3 Mon Sep 17 00:00:00 2001 From: christopher Date: Thu, 10 May 2018 14:08:06 -0400 Subject: [PATCH 2/3] code case b --- docs/source/dask.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/source/dask.rst b/docs/source/dask.rst index 442631b0..fddac18c 100644 --- a/docs/source/dask.rst +++ b/docs/source/dask.rst @@ -135,6 +135,6 @@ For example a = source.scatter().map(inc) b = source2.combine_latest(a) -In this case b is now a ``Stream`` node and does not have access to the actual +In this case ``b`` is now a ``Stream`` node and does not have access to the actual data on the dask cluster. -Any operations done downstream of b would be performed on the future. +Any operations done downstream of ``b`` would be performed on the future. From 328177955152abfca555dd9d1e881f301765ccaa Mon Sep 17 00:00:00 2001 From: "Christopher J. Wright" Date: Tue, 26 Mar 2019 10:06:30 -0400 Subject: [PATCH 3/3] changes from @martindurant --- docs/source/dask.rst | 23 +++++++++++------------ 1 file changed, 11 insertions(+), 12 deletions(-) diff --git a/docs/source/dask.rst b/docs/source/dask.rst index fddac18c..29819bc5 100644 --- a/docs/source/dask.rst +++ b/docs/source/dask.rst @@ -122,19 +122,18 @@ did. Gotchas +++++++ -An important gotcha with ``DaskStream`` is that if a ``Stream`` node is -downstream of a ``DaskStream`` node without a ``gather`` between then the -``Stream`` node will receive the future not the data itself. -For example +An important gotcha with ``DaskStream`` is that it is a subclass ``Stream``, and so can be used as an input +to any function expecting a ``Stream``. If there is no intervening ``.gather()``, then the downstream node will +receive Dask futures instead of the data they represent:: -.. code-block:: python + source = Stream() + source2 = Stream() + a = source.scatter().map(inc) + b = source2.combine_latest(a) - source = Stream() - source2 = Stream() - a = source.scatter().map(inc) - b = source2.combine_latest(a) +In this case, the combine operation will get real values from ``source2``, and Dask futures. +Downstream nodes would be free to operate on the futures, but more likely, the line should be:: + + b = source2.combine_latest(a.gather()) -In this case ``b`` is now a ``Stream`` node and does not have access to the actual -data on the dask cluster. -Any operations done downstream of ``b`` would be performed on the future.