<img src="images/dask_horizontal.svg"
     width="45%"
     alt="Dask logo">
     
# Manipulating the Task Graph

Sometimes the task graph that Dask constructs is sub-optimal. For instance, if you aggregate some data and then use the aggregated result in the next operation on that same data, the loaded data generally has to stay in memory until the aggregation has finished. When that data requires a lot of memory, it can make sense to manipulate the task graph so that the data is loaded a second time instead of being held. Read more about [manipulating task graphs](https://docs.dask.org/en/latest/graph_manipulation.html).

**NOTE:** For this example we create a synthetic dataset and store it on disk. This cell only sets up the data, so you can skip over the details.

In [None]:
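import numpy as np
import dask
import dask.array as da

# Ten years of hourly records, one partition per year
ddf = dask.datasets.timeseries(start="2010-01-01", end="2020-01-01", freq="1H", partition_freq="1Y")
# Sinusoidal annual cycle; compute_chunk_sizes() resolves the unknown chunk lengths
annual_cycle = np.sin(2 * np.pi * (ddf.index.dayofyear.values / 365.25 - 0.28)).compute_chunk_sizes()
# Synthetic temperature: mean of 10, seasonal swing of 15, plus Gaussian noise
temperature_values = 10 + 15 * annual_cycle + 3 * da.random.normal(size=annual_cycle.size)
ddf["temperature"] = temperature_values

# Write one CSV file per partition into the "data" directory
ddf.to_csv("data")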

In [None]:
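import dask.dataframe as dd

ddf = dd.read_csv("data/*", parse_dates=["timestamp"]).set_index("timestamp")

# The mean is an aggregation over the whole column ...
mean_temperature = ddf.temperature.mean()
# ... and the subtraction needs both that mean and every partition of the column
output = (ddf.temperature - mean_temperature).resample("1M").agg(["min", "max"])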

In [None]:
output.visualize()

In [None]:
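from dask.graph_manipulation import bind

# Clone the temperature column so that the clone's tasks only start once
# mean_temperature is available; the data is read again rather than held in memory
temperature_b = bind(ddf.temperature, mean_temperature)
output_b = (temperature_b - mean_temperature).resample("1M").agg(["min", "max"])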

In [None]:
output_b.visualize()
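
The effect of `bind` can be hard to read off a large graph, so here is a minimal sketch of the same pattern using `dask.delayed`. The functions `load`, `compute_mean` and `subtract`, and the `calls` list, are hypothetical stand-ins introduced only for illustration; they are not part of the example above. The sketch records the order in which tasks run, with and without `bind`, on the single-threaded scheduler.

```python
import dask
from dask.graph_manipulation import bind

calls = []  # records the order in which tasks actually run


@dask.delayed
def load():
    # Hypothetical stand-in for reading a large partition from disk
    calls.append("load")
    return [1.0, 2.0, 3.0]


@dask.delayed
def compute_mean(values):
    calls.append("mean")
    return sum(values) / len(values)


@dask.delayed
def subtract(values, m):
    calls.append("subtract")
    return [v - m for v in values]


data = load()
m = compute_mean(data)

# Without bind: `load` runs once and its result is kept until `subtract` runs.
plain = subtract(data, m).compute(scheduler="synchronous")
calls_plain = list(calls)

# With bind: a clone of `load` is made to wait for `m`, so the data is
# loaded a second time instead of being kept while the mean is computed.
calls.clear()
data_b = bind(data, m)
bound = subtract(data_b, m).compute(scheduler="synchronous")
calls_bound = list(calls)

print(calls_plain)
print(calls_bound)
```

Both variants return the same values. The difference is that the bound version contains a second `load` call that runs after `mean`, which trades extra loading work for lower peak memory use.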