# dask-image: A library for distributed image processing

John Kirkham ([@jakirkham]( https://github.com/jakirkham ))

# Typical image processing use cases

* Photos from commodity cameras and cellphones
* High quality images
* Images are in color
* (Several) images fit comfortably in memory
* Generic images of recognizable scenes
* Lots of labeled data
* Various successful algorithms

# Large image processing use cases

* Specialized instruments: microscopes, telescopes, satellites, medical instruments, LIDAR, etc.
* Quality is variable (often technical limits are explored)
* Anywhere from monochrome to hyperspectral
* Small pieces fit in memory
* Images may only make sense to domain specialists
* Little to no labeled data
* Complex pipelines often required to analyze the data

# Large image processing use cases


[![AOLLSM and ExLLSM]( http://img.youtube.com/vi/ma4fbBLKUEE/0.jpg )]( https://www.youtube.com/watch?v=ma4fbBLKUEE )

# Observations

* Working with large image data is hard, and different from typical image processing
* Being large is part of the problem
* Limited domain knowledge is also a challenge

# Common workflows

* Batch Processing
* Large field of view

# Common workflows - Batch Processing

```python
for each_fn in myfiles:
    a_chunk = load(each_fn)
    a_cleaned = cleanup(a_chunk)
    a_mask = threshold(a_cleaned)
    a_labeled = label(a_mask)
    save(a_labeled)
```

# Common workflows - Large image

```python
# Repeated for each op
for each_slice in regions:
    larger_slice, cropped_slice = add_overlap(each_slice, cleanup_overlap)
    a_larger = load(larger_slice)
    a_large_cleaned = cleanup(a_larger)
    a = a_large_cleaned[cropped_slice]
    save(a)
```
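
The `add_overlap` helper in the loop above is not defined in these slides. Below is a minimal sketch of what such a helper might do, assuming each region is a tuple of slices with explicit bounds and the overlap is a single integer depth. The `shape` argument is an addition made here so the enlarged region can be clipped at the image border.

```python
def add_overlap(region, depth, shape):
    """Hypothetical helper: grow `region` by `depth` pixels on every side.

    Returns the enlarged region (clipped to `shape`) and the slices that
    crop the original region back out of the enlarged one.
    """
    larger, cropped = [], []
    for slc, size in zip(region, shape):
        start = max(slc.start - depth, 0)
        stop = min(slc.stop + depth, size)
        larger.append(slice(start, stop))
        # Position of the original region inside the enlarged one
        offset = slc.start - start
        cropped.append(slice(offset, offset + (slc.stop - slc.start)))
    return tuple(larger), tuple(cropped)


region = (slice(0, 100), slice(100, 200))
larger_slice, cropped_slice = add_overlap(region, depth=10, shape=(300, 300))
print(larger_slice)
print(cropped_slice)
```

The first axis touches the image border, so it only grows on one side; the crop offsets account for that.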

# What are the challenges with these?

```python
for each_fn in myfiles:            # <--- Not parallel
    a_chunk = load(each_fn)
    a_cleaned = cleanup(a_chunk)
    a_mask = threshold(a_cleaned)
    a_labeled = label(a_mask)
    save(a_labeled)
```

```python
for each_fn in myfiles:            # <--- Not parallel
    a_chunk = load(each_fn)
    a_cleaned = cleanup(a_chunk)   # <--- Not inspectable
    a_mask = threshold(a_cleaned)
    a_labeled = label(a_mask)
    save(a_labeled)
```

```python
for each_fn in myfiles:            # <--- Not parallel
    a_chunk = load(each_fn)
    a_cleaned = cleanup(a_chunk)   # <--- Not inspectable
    a_mask = threshold(a_cleaned)  # <--- Not swappable
    a_labeled = label(a_mask)
    save(a_labeled)
```

```python
for each_fn in myfiles:            # <--- Not parallel
    a_chunk = load(each_fn)
    a_cleaned = cleanup(a_chunk)   # <--- Not inspectable
    a_mask = threshold(a_cleaned)  # <--- Not swappable
    a_labeled = label(a_mask)
    save(a_labeled)                # <--- Not interactive
```

```python
# Repeated for each op             # <--- Higher overhead for complex ops
for each_slice in regions:
    larger_slice, cropped_slice = add_overlap(each_slice, cleanup_overlap)
    a_larger = load(larger_slice)
    a_large_cleaned = cleanup(a_larger)
    a = a_large_cleaned[cropped_slice]
    save(a)
```

# What are the challenges with these?

* Many!
* Fixing each one increases the complexity
* Challenging to maintain
* Still hard to introduce to new users

# How can we improve this workflow?

# Loading image data

```python
import dask.array as da
from dask_image.imread import imread

a = da.block([
    [imread("images/fn00.tiff"), imread("images/fn01.tiff")],
    [imread("images/fn10.tiff"), imread("images/fn11.tiff")],
])
```

<br>

Read more here: https://blog.dask.org/2019/06/20/load-image-data
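
To see how `da.block` arranges the tiles without needing any files on disk, here is a small sketch that swaps the `imread` calls for synthetic arrays. It assumes each tile comes back with a `(frames, rows, cols)` shape; the tile size and fill values are made up for illustration.

```python
import dask.array as da

# Stand-ins for imread(...) results; the (frames, rows, cols) shape is an assumption
tile_shape = (1, 100, 200)
tiles = [
    [da.full(tile_shape, 0), da.full(tile_shape, 1)],
    [da.full(tile_shape, 2), da.full(tile_shape, 3)],
]

# Inner lists are joined along the last axis, outer lists along the second to last
a = da.block(tiles)
print(a.shape)
print(a.chunks)

# Only the requested pixel is computed; the rest of the mosaic stays lazy
corner = a[0, -1, -1].compute()
```

Each tile stays its own chunk, so later operations can work on one tile at a time.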

# Batch Processing (Revisited)

```python
a_cleaned = a.map_blocks(cleanup)
a_mask = a_cleaned.map_blocks(threshold)
a_labeled = a_mask.map_blocks(label)
```
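
A runnable sketch of the same pattern on synthetic data. The `cleanup` and `threshold` bodies here are hypothetical placeholders (median background subtraction and a fixed cutoff), not the functions used in the talk, and the labeling step is left out.

```python
import numpy as np
import dask.array as da


def cleanup(block):
    """Hypothetical cleanup: subtract the block's median as background."""
    return block - np.median(block)


def threshold(block):
    """Hypothetical threshold: keep pixels above the background level."""
    return block > 0


rng = np.random.default_rng(0)
stack = rng.random((4, 128, 128))               # four synthetic "files"
a = da.from_array(stack, chunks=(1, 128, 128))  # one chunk per image

# Both steps only build a task graph; nothing runs yet
a_cleaned = a.map_blocks(cleanup, dtype=float)
a_mask = a_cleaned.map_blocks(threshold, dtype=bool)

first_cleaned = a_cleaned[0].compute()  # inspect an intermediate for one image
mask = a_mask.compute()                 # run the whole pipeline
```

Because each step is a lazy array, intermediates can be inspected and a step can be swapped without rewriting the loop.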

# Large Image (Revisited)

```python
a_cleaned = a.map_overlap(cleanup, depth=cleanup_overlap)
a_mask = a_cleaned.map_overlap(threshold, depth=threshold_overlap)
a_labeled = a_mask.map_overlap(label, depth=label_overlap)
```
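
Why the overlap matters can be checked on a small example. The sketch below uses a hypothetical 3x3 mean filter (`box_mean`) as a stand-in for `cleanup` and compares chunked results against filtering the whole image at once. The image size, chunk size, and `boundary="nearest"` are choices made for this illustration.

```python
import numpy as np
import dask.array as da


def box_mean(block):
    """Hypothetical stand-in for cleanup: 3x3 mean filter on a 2D array."""
    rows, cols = block.shape
    padded = np.pad(block, 1, mode="edge")
    out = np.zeros((rows, cols), dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + rows, dx:dx + cols]
    return out / 9.0


rng = np.random.default_rng(0)
image = rng.random((64, 64))
a = da.from_array(image, chunks=(32, 32))

reference = box_mean(image)  # whole image filtered in one go

# Without overlap, each chunk is filtered in isolation
naive = a.map_blocks(box_mean, dtype=float).compute()

# With a one-pixel halo, each chunk sees its neighbours' edge pixels
with_overlap = a.map_overlap(
    box_mean, depth=1, boundary="nearest", dtype=float
).compute()

print(np.abs(naive - reference).max())
print(np.abs(with_overlap - reference).max())
```

The chunked result without overlap differs from the reference along chunk seams, while the overlapped version agrees with it.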

# How can we improve this workflow?

* `.map_blocks` for batch
* `.map_overlap` for large images

# Loading image data



```python
import dask.array as da
from dask_image.imread import imread

a = da.block([
    [imread("images/fn00.tiff"), imread("images/fn01.tiff")],
    [imread("images/fn10.tiff"), imread("images/fn11.tiff")],
])
```