# Welcome to the Zarr and Dask for large-scale imaging workshop
## Fernando Cervantes
### Systems Analyst in JAX's Research IT
### email: fernando.cervantes@jax.org

## Outcomes for today's session:
- Learn to use the Dask library with Zarr image data
- Implement and apply image analysis pipelines with Dask
- Save image analysis outputs as Zarr files


---
# Overview of the Dask package

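Dask is lazy: operations on Dask arrays only build a task graph, and nothing is computed until you explicitly ask for the result (for example with `.compute()`).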

![image](https://docs.dask.org/en/stable/_images/dask-array.svg)
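
A minimal sketch of this lazy behaviour, using a small synthetic array (the names `x`, `y` and `result` are only illustrative). Building the expression returns another Dask array, and the arithmetic runs only when `.compute()` is called:

```python
import dask.array as da
import numpy as np

# A 4x4 array of ones split into four 2x2 chunks
x = da.ones((4, 4), chunks=(2, 2))

# This only builds a task graph; no arithmetic happens yet
y = (x + 1).sum()
print(type(y))

# The graph is executed only when the result is requested
result = y.compute()
print(result)  # 32.0
```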

In [None]:
!pip show dask

# 1. Manipulate Dask arrays

## 1.1 Create Dask arrays

In [None]:
import dask
import dask.array as da
import numpy as np

In [None]:
d1 = da.zeros((10, 10), chunks=(5, 5), dtype=np.int16)

In [None]:
d1

In [None]:
d1[:5, :5] = 1

In [None]:
d1
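
Besides constructors such as `da.zeros`, a Dask array can wrap an existing NumPy array with `da.from_array`. The sketch below uses a synthetic array (`np_arr` and `d` are illustrative names) and shows how to inspect the chunk layout, and that item assignment is also deferred until the array is computed:

```python
import dask.array as da
import numpy as np

# Wrap an in-memory NumPy array and split it into 5x5 chunks
np_arr = np.arange(100, dtype=np.int16).reshape(10, 10)
d = da.from_array(np_arr, chunks=(5, 5))

print(d.chunks)     # ((5, 5), (5, 5))
print(d.numblocks)  # (2, 2)

# Item assignment is recorded in the task graph, not applied immediately
d[:5, :5] = 1

# Only this call actually evaluates the assignment
top_left = d[:5, :5].compute()
print(top_left)
```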

---
## 1.2 Visualize Dask graphs

In [None]:
d1.visualize()
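
`visualize()` relies on Graphviz being installed. When it is not available, the task graph can still be inspected programmatically. The following is a sketch under that assumption, with illustrative variable names; the exact number of tasks and the layer names depend on the Dask version:

```python
import dask.array as da
import numpy as np

x = da.zeros((10, 10), chunks=(5, 5), dtype=np.int16)
y = x + 1

# The low-level graph maps task keys to the work needed to produce them
graph = y.__dask_graph__()
print("number of tasks:", len(graph))

# The high-level graph groups tasks into named layers, one per operation
for layer_name in y.dask.layers:
    print(layer_name)
```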

---
## 1.3 Execute the computation graph

In [None]:
d1[:].compute()

In [None]:
d1 = da.zeros((10, 10), chunks=(5, 5))

In [None]:
d1[3:8, :5] = 2

In [None]:
d1.visualize()

In [None]:
d1.compute()

In [None]:
d1 = d1 + 1

In [None]:
d2 = da.ones((10, 10), chunks=(3, 3))

In [None]:
d3 = d1 + d2

In [None]:
d3

In [None]:
d3.visualize()

In [None]:
d3.chunks
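
The cells above add two arrays whose chunks do not line up. As a hedged illustration with synthetic arrays (`a`, `b` and `c` are illustrative names), Dask reconciles the two chunk layouts automatically, which usually produces more and smaller chunks than either input:

```python
import dask.array as da
import numpy as np

a = da.zeros((10, 10), chunks=(5, 5))
b = da.ones((10, 10), chunks=(3, 3))

# Dask finds a chunk layout compatible with both operands
c = a + b

print("a:", a.chunks)
print("b:", b.chunks)
print("c:", c.chunks)
print("partitions:", a.npartitions, b.npartitions, c.npartitions)

# The values are unaffected by how the data is chunked
values = c.compute()
```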

---
## 1.4 Rechunk Dask arrays

In [None]:
d3 = d3.rechunk((5, 5))

In [None]:
d3.visualize()

In [None]:
d3

In [None]:
d3 = d1 + d2.rechunk(d1.chunks)

In [None]:
d3

In [None]:
d3.visualize()
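
To make the effect of aligning chunks before an operation concrete, here is a small sketch with synthetic arrays (illustrative names only). Rechunking one operand to match the other keeps the result on the same regular chunk grid:

```python
import dask.array as da
import numpy as np

a = da.zeros((10, 10), chunks=(5, 5))
b = da.ones((10, 10), chunks=(3, 3))

# Align b to a's chunk layout before adding
aligned = a + b.rechunk(a.chunks)

print(aligned.chunks)  # ((5, 5), (5, 5))

# Rechunking changes the layout only, never the values
same_values = np.array_equal(b.compute(), b.rechunk((5, 5)).compute())
print(same_values)
```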

In [None]:
d3_sum = np.sum(d3)

In [None]:
d3_sum

In [None]:
d3_sum.compute()

In [None]:
d3_cos = np.cos(d3)

In [None]:
d3_cos

In [None]:
d3_cos.compute()
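
The cells above call NumPy functions directly on a Dask array. A brief sketch with a synthetic array (illustrative names) shows that, with a reasonably recent NumPy, these calls are dispatched to Dask and return lazy Dask arrays rather than computed values:

```python
import dask.array as da
import numpy as np

d = da.full((4, 4), 2.0, chunks=(2, 2))

# NumPy functions dispatch to their Dask equivalents and stay lazy
total = np.sum(d)
cosine = np.cos(d)
print(type(total), type(cosine))

# Values are produced only on compute()
print(total.compute())  # 32.0
cosine_values = cosine.compute()
```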

---
## 1.5 Persist vs Compute

In [None]:
d3 = d1 + d2.rechunk((5, 5))

In [None]:
d3.visualize()

In [None]:
d3 = d3.persist()

In [None]:
d3.visualize()

In [None]:
d3 = d3 + 1

In [None]:
d3.visualize()
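
The difference between the two calls can be sketched as follows, using synthetic arrays and the default local scheduler (variable names are illustrative). `persist()` returns a Dask array whose chunks have already been evaluated and are held in memory, while `compute()` returns a single NumPy array:

```python
import dask.array as da
import numpy as np

a = da.zeros((10, 10), chunks=(5, 5))
b = da.ones((10, 10), chunks=(3, 3))
lazy = a + b.rechunk((5, 5))

# persist(): evaluates the chunks but keeps the Dask array interface
persisted = lazy.persist()

# compute(): evaluates everything and gathers it into one NumPy array
computed = lazy.compute()

print(type(persisted).__name__, type(computed).__name__)

# The persisted graph only needs to reference the finished chunks
print(len(lazy.__dask_graph__()), len(persisted.__dask_graph__()))
```

Further operations on `persisted` start from the in-memory chunks instead of repeating the earlier steps.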

---
# 2. Open Zarr files with Dask

In [None]:
import zarr
import tifffile
import dask
import dask.array as da
import numpy as np
import matplotlib.pyplot as plt

In [None]:
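# Open the slide as a Zarr store; pixel data is not loaded yet
z_grp = tifffile.imread("CMU-1.svs", aszarr=True)
# Wrap the full-resolution level (component "0") as a lazy Dask array
da_arr = da.from_zarr(z_grp, component="0")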

In [None]:
da_arr

In [None]:
da_arr = da_arr.rechunk((3, 512, 512))

In [None]:
da_arr

In [None]:
da_arr = np.moveaxis(da_arr, 0, -1)

In [None]:
da_arr
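
The rechunk and axis reordering above can be checked on a small synthetic array. In this sketch the array shape and the initial chunking are made up for illustration, and `da.moveaxis` is used as the Dask counterpart of `np.moveaxis`:

```python
import dask.array as da
import numpy as np

# Channel-first image stored with one channel per chunk
chw = da.zeros((3, 1024, 1024), chunks=(1, 256, 256), dtype=np.uint8)

# Keep all three channels of a 512x512 tile in the same chunk
chw = chw.rechunk((3, 512, 512))

# Move the channel axis to the end, as most image libraries expect
hwc = da.moveaxis(chw, 0, -1)

print(hwc.shape)   # (1024, 1024, 3)
print(hwc.chunks)  # ((512, 512), (512, 512), (3,))
```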

---
# 3. Perform image analysis on Dask arrays

In [None]:
import skimage

In [None]:
da_sel = da_arr[10000:10000 + 2048, 10000:10000 + 2048]

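ℹ Dask arrays can be passed directly to `skimage` functions without calling `.compute()` first. Check the type of the result, since some functions may return an in-memory NumPy array rather than a lazy Dask array.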

In [None]:
arr_gray = skimage.color.rgb2gray(da_sel)

In [None]:
type(arr_gray)

In [None]:
arr_nuclei = arr_gray < 0.25

In [None]:
plt.imshow(arr_nuclei, cmap="gray")
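
When a pipeline should stay lazy from start to finish, one option is to apply a NumPy-level function to each chunk with `map_blocks`. The sketch below is not the approach used in the cells above: `to_gray` is a hypothetical helper that approximates a grayscale conversion with fixed luminance weights, the input is a synthetic image, and the 0.25 threshold is reused from the example above:

```python
import dask.array as da
import numpy as np

# Luminance weights for the R, G and B channels
WEIGHTS = np.array([0.2125, 0.7154, 0.0721])


def to_gray(block):
    """Hypothetical helper: convert one uint8 RGB chunk to grayscale in [0, 1]."""
    return (block.astype(np.float64) / 255.0) @ WEIGHTS


# Synthetic channel-last RGB image; each chunk holds all three channels
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
rgb = da.from_array(image, chunks=(32, 32, 3))

# Apply the helper per chunk; the channel axis is dropped from the output
gray = rgb.map_blocks(
    to_gray,
    drop_axis=2,
    dtype=np.float64,
    meta=np.array((), dtype=np.float64),
)

# Thresholding is also lazy
mask = gray < 0.25
print(type(mask))

# The whole pipeline runs here, one chunk at a time
mask_np = mask.compute()
print(mask_np.shape, mask_np.dtype)
```

Because each chunk contains the full channel axis, the helper never needs data from a neighbouring chunk, so the per-chunk results can be assembled without any communication between tasks.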