<img src="images/dask_horizontal.svg"
     width="300px"
     alt="Dask logo">
     
# Delayed

What if your data isn't an array or dataframe? Instead of applying a function to each block of a collection, you can decorate ordinary functions with `@dask.delayed` and _have the functions themselves be lazy_: calling them records a task rather than running it.
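
As a minimal sketch of that laziness (the functions `double` and `add` and the `calls` list are made up for illustration), the cell below records which functions have actually run, before and after `compute()`:

```python
import dask

calls = []  # records which functions have actually run


@dask.delayed
def double(x):
    calls.append("double")
    return 2 * x


@dask.delayed
def add(a, b):
    calls.append("add")
    return a + b


# Calling the decorated functions only builds a task graph
total = add(double(1), double(2))
ran_before_compute = list(calls)  # snapshot taken before any work is done

# compute() executes the graph and returns the final value
result = total.compute()
```

`ran_before_compute` stays empty, while `calls` fills up only once `compute()` is called.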

**NOTE:** For this example we will create a fake dataset and store it on disk. You can ignore this bit.

In [None]:
import numpy as np
import dask
import dask.array as da

ddf = dask.datasets.timeseries(start="2010-01-01", end="2020-01-01", freq="1H", partition_freq="1Y")
annual_cycle = np.sin(2 * np.pi * (ddf.index.dayofyear.values / 365.25 - 0.28)).compute_chunk_sizes()
temperature_values = 10 + 15 * annual_cycle + 3 * da.random.normal(size=annual_cycle.size)
ddf["temperature"] = temperature_values

ddf.to_csv("data")

## Delayed version of ETL

This example matches the one in [Not Delayed](./5.1-not-delayed.ipynb), but this version uses Dask.

In [None]:
!rm -rf transformed_data_lazy

import os
import time
import random
import pandas as pd
import dask

os.mkdir("transformed_data_lazy")  

@dask.delayed
def read_a_file(filename):
    time.sleep(random.random())
    df = pd.read_csv(f"data/{filename}", parse_dates=["timestamp"], index_col="timestamp")
    return df

@dask.delayed
def do_a_transformation(df):
    time.sleep(random.random())
    # Return a new frame rather than mutating the input in place
    return df.assign(temperature_F=df["temperature"] * 9 / 5 + 32)

@dask.delayed
def write_it_back_out(df, filename):
    time.sleep(random.random())
    path = f"transformed_data_lazy/{filename}"
    df.to_csv(path)
    return path

filenames = os.listdir("data")

outputs = []
for filename in filenames:
    df = read_a_file(filename)
    df = do_a_transformation(df)
    path = write_it_back_out(df, filename)
    outputs.append(path)

dask.compute(outputs)
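
One detail worth knowing about `dask.compute`: it returns a tuple with one entry per argument, so passing a single list gives back a one-element tuple containing a list of results. A small sketch, using a hypothetical `make_path` function in place of the ETL steps:

```python
import dask


@dask.delayed
def make_path(name):
    # Hypothetical stand-in for a step that returns an output path
    return f"out/{name}"


outputs = [make_path(name) for name in ["a.csv", "b.csv"]]

# One argument in, one-element tuple out: unpack it to get the list
(paths,) = dask.compute(outputs)

# Unpacking the list passes each Delayed as its own argument instead
separate = dask.compute(*outputs)
```

Both forms run everything in a single pass over the graph, which is generally preferable to calling `.compute()` on each item in a loop.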

In [None]:
dask.visualize(outputs)

## Delayed objects

Dask collections can also be converted to delayed objects. Here `to_delayed()` converts a `dask.array` into a NumPy array of `Delayed` objects, one per block.

In [None]:
import dask.array as da

arr = da.random.random(size=(1_000, 1_000), chunks=(250, 500))

arr_delayed = arr.to_delayed()
arr_delayed
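
To make the layout concrete, here is a small sketch with an array of ones (chosen only so the sum is predictable). It also shows the reverse direction with `da.from_delayed`, where the shape and dtype have to be supplied by hand because a `Delayed` object does not carry them:

```python
import dask.array as da

arr = da.ones((1_000, 1_000), chunks=(250, 500))

# One Delayed per chunk, arranged in the same grid as the chunks
blocks = arr.to_delayed()
grid_shape = blocks.shape

# Computing a single Delayed gives back a plain NumPy array
one_block = blocks[0, 1].compute()

# Going back to a dask array requires stating shape and dtype explicitly
rebuilt = da.from_delayed(blocks[0, 1], shape=(250, 500), dtype=arr.dtype)
total = rebuilt.sum().compute()
```

With 250 x 500 chunks on a 1,000 x 1,000 array, the grid has 4 rows and 2 columns of delayed blocks.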

Delayed objects can be used like **blocks**, but they carry no metadata (such as shape or dtype) about what they represent, so there are fewer guard rails: mistakes tend to surface only when you call `compute()`.

In [None]:
arr_delayed[0, 1].sum().compute()

In [None]:
arr.blocks[0, 1].sum().compute()

In [None]:
arr_delayed[0, 1] + "a"

In [None]:
arr.blocks[0, 1] + "a"
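
A small sketch of what "fewer guard rails" means in practice, using a plain Python list as a hypothetical stand-in for a block. The delayed expression is accepted when it is built and only fails once it is computed. A `dask.array` block, by contrast, knows its dtype and can usually reject such an operation earlier.

```python
import dask

# Hypothetical stand-in for one block of data
block = dask.delayed([1.0, 2.0, 3.0])

# No error here: the addition is only recorded in the task graph
bad = block + "a"

# The problem shows up when the graph actually runs
try:
    bad.compute()
    failed_at_compute = False
except TypeError:
    failed_at_compute = True
```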