<img src="images/dask_horizontal.svg"
     width="300px"
     alt="Dask logo">
     
# Manipulating the Task Graph

Sometimes the task graph that Dask constructs is sub-optimal. For instance, if you aggregate data and then use the aggregated result in the next operation, _and_ the data requires a lot of memory, it can make sense to manipulate the task graph. Read more about [manipulating task graphs](https://docs.dask.org/en/latest/graph_manipulation.html).

Here is a naive approach to normalizing by the mean temperature:

In [None]:
import dask.dataframe as dd

ddf = dd.read_csv("data/*", parse_dates=["timestamp"]).set_index("timestamp")

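# Lazy scalar: the mean over the whole temperature column.
mean_temperature = ddf.temperature.mean()
# Subtract the mean, then take the monthly min and max.
output = (ddf.temperature - mean_temperature).resample("1M").agg(["min", "max"])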

In [None]:
output.visualize()

If we instead use `bind`, the task graph is restructured so that the `mean_temperature` calculation runs all the way through first, and only then does the full computation start.

**NOTE:** An alternate approach would be to call `mean_temperature.compute()` and pass the evaluated value into the final operation. The important difference is that using `bind` keeps the operation fully lazy (all part of one task graph).

In [None]:
from dask.graph_manipulation import bind

# Clone the temperature tasks so they only start after mean_temperature is done.
temperature_b = bind(ddf.temperature, mean_temperature)
output_b = (temperature_b - mean_temperature).resample("1M").agg(["min", "max"])

In [None]:
output_b.visualize()
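To check the ordering that `bind` imposes without reading the graph picture, here is a minimal sketch using `dask.delayed` instead of a dataframe. The `record` helper, the `order` list, and the numeric values are hypothetical and exist only for illustration. The synchronous scheduler is used so the tasks run in this process and can append to the list.

```python
import dask
from dask.graph_manipulation import bind

order = []  # filled in as tasks actually execute


@dask.delayed
def record(label, value):
    # Hypothetical helper: note which task ran, then pass the value through.
    order.append(label)
    return value


mean = record("mean", 10.0)  # stands in for the aggregation
data = record("data", 25.0)  # stands in for the memory-heavy input

# Clone `data` so that the clone cannot start until `mean` has finished.
data_b = bind(data, mean)

# Still one lazy graph; nothing has run before this compute call.
result = (data_b - mean).compute(scheduler="synchronous")
```

After the call, `order` should list the mean task ahead of the data task, and the result is the same as it would be without `bind`; only the scheduling constraint changes.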