# EMBO Practical Course "Advanced methods in bioimage analysis"

***

Homepage: https://www.embl.org/about/info/course-and-conference-office/events/bia23-01/

***

## Day 2 - Session 1: Image Data Management - 11:30 to 12:30 "GO!"

<table style="table { position: relative;  display: inline-block; } img {  position: absolute;  left: 0;  right: 0;  width: auto;  height: 100%;  object-fit: cover;  object-position: center;}">
    <tr>
        <td style="vertical-align: top">
            <h3>Continuing from `5_Cloud` in Python...</h3>
            <p>
                Software versions used for this workshop:
                <ul>
                  <li>awscli                    1.29.30</li>
                  <li>dask                      2023.8.1</li>
                  <li>fsspec                    2023.8.1</li>
                  <li>napari                    0.4.18</li>
                  <li>numpy                     1.25.2</li>
                  <li>ome-zarr                  0.8.0</li>
                  <li>openjdk                   20.0.0</li>
                  <li>tifffile                  2023.8.12</li>
                  <li>zarr                      2.16.1</li>
                  <li>vizarr                    0.4</li>
                </ul>
            </p>
        </td>
        <td>
            <center>
                <img src="images/zarr-implementations.png" width="40%"/>
            </center>
            <center>
                <small>
                    <a href="https://zarr.dev/implementations/">https://zarr.dev/implementations/</a>
                </small>
            </center>
        </td>
    </tr>
</table>

In [1]:
%%bash
##
## Setup & Sanity checks
##

YOURNAME=$(whoami)
WORKDIR=/scratch/${YOURNAME}/session1/
test -e ${WORKDIR} || {
    echo "Please run the POSIX notebook first."
    exit 1
}

In [2]:
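import getpass
# getpass.getuser() also works without a controlling terminal, where os.getlogin() can fail
YOURNAME = getpass.getuser()
%env YOURNAME=$YOURNAME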

env: YOURNAME=jamoore


In [3]:
%cd /scratch/{YOURNAME}/session1

/System/Volumes/Data/scratch/jamoore/session1


## Low-level Zarr access

As with OME-TIFF, OME-Zarr is primarily a layer of metadata on top of a container format, in this case Zarr. Just as `tifffile` can read the raw bytes of an OME-TIFF, the underlying Zarr libraries can be used to access the array data directly. A list of implementations is available at https://zarr.dev/implementations/

In [4]:
import zarr
zarr.open("a.ome.zarr/0/0")

<zarr.core.Array (1, 1, 1, 512, 512) uint8>
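Part of what makes this low-level access possible is that a Zarr (version 2) array on disk is a JSON metadata file plus one file per chunk. The sketch below builds a tiny uncompressed array by hand and reads it back with the standard library only. The `demo.zarr` name and its contents are made up for illustration, and real arrays are usually compressed and, in OME-Zarr, typically use `/` rather than `.` between chunk indices, so in practice you would still use `zarr` to decode them.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical demo data: a 2 x 4 uint8 array stored as a single chunk.
metadata = {
    "zarr_format": 2,
    "shape": [2, 4],
    "chunks": [2, 4],
    "dtype": "|u1",       # unsigned 8-bit integers
    "compressor": None,   # no compression, so the chunk is raw bytes
    "fill_value": 0,
    "order": "C",         # row-major layout inside each chunk
    "filters": None,
}

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "demo.zarr"
    root.mkdir()

    # Array metadata lives in `.zarray`; the only chunk is stored under key "0.0".
    (root / ".zarray").write_text(json.dumps(metadata))
    (root / "0.0").write_bytes(bytes(range(8)))

    # Reading it back needs nothing beyond the standard library.
    meta = json.loads((root / ".zarray").read_text())
    raw = (root / "0.0").read_bytes()

n_rows, n_cols = meta["chunks"]
rows = [list(raw[r * n_cols:(r + 1) * n_cols]) for r in range(n_rows)]

print(meta["shape"], meta["dtype"])
print(rows)
```

Looking at the `.zarray` file of `a.ome.zarr/0/0` in a text editor should show the same kind of keys, with the shape and chunk sizes reported by `zarr.open` above.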

## Parallelization

In Python, to work on the chunks in parallel, `dask` (https://www.dask.org/) is probably the place to start.

In [5]:
import dask.array as da
da.from_zarr("a.ome.zarr/0/0")

|  | Array | Chunk |
|---|---|---|
| Bytes | 256.00 kiB | 256.00 kiB |
| Shape | (1, 1, 1, 512, 512) | (1, 1, 1, 512, 512) |
| Dask graph | 1 chunks in 2 graph layers | 1 chunks in 2 graph layers |
| Data type | uint8 numpy.ndarray | uint8 numpy.ndarray |
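The idea that `dask` automates, applying the same function to every chunk independently and then combining the partial results, can be sketched with the standard library alone. This is not how `dask` is implemented; it is only an illustration using a thread pool, and the 4 x 4 "image" split into four 2 x 2 chunks is made-up data.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical 4 x 4 image holding the values 1..16, split into 2 x 2 chunks
# and keyed by chunk index (row block, column block).
chunks = {
    (0, 0): [[1, 2], [5, 6]],
    (0, 1): [[3, 4], [7, 8]],
    (1, 0): [[9, 10], [13, 14]],
    (1, 1): [[11, 12], [15, 16]],
}


def chunk_sum(chunk):
    """Sum all pixel values in one chunk."""
    return sum(sum(row) for row in chunk)


# Each chunk is processed independently, so the work can be spread over workers.
with ThreadPoolExecutor(max_workers=4) as pool:
    partial_sums = list(pool.map(chunk_sum, chunks.values()))

# Combine the per-chunk results into the final answer.
total = sum(partial_sums)
print(partial_sums, total)
```

With the workshop array there is only a single chunk, so nothing can run in parallel; the benefit appears once an array is split into many chunks.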


In [6]:
import ome_zarr
import ome_zarr.io
import ome_zarr.reader

url = ome_zarr.io.parse_url("a.ome.zarr/0")
reader = ome_zarr.reader.Reader(url)
for node in reader():
    print(node.data)

[dask.array<from-zarr, shape=(1, 1, 1, 512, 512), dtype=uint8, chunksize=(1, 1, 1, 512, 512), chunktype=numpy.ndarray>, dask.array<from-zarr, shape=(1, 1, 1, 256, 256), dtype=uint8, chunksize=(1, 1, 1, 256, 256), chunktype=numpy.ndarray>]
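The two arrays printed above are the resolution levels of one multiscale image. Which levels exist is recorded as plain JSON in the image's `.zattrs` file. The sketch below parses a hand-written sample in the style of OME-Zarr 0.4 metadata; the values are an assumption chosen to match the 512 and 256 pixel levels above, and the real `a.ome.zarr/0/.zattrs` may contain additional fields.

```python
import json

# Hand-written sample in the style of an OME-Zarr 0.4 `.zattrs` file (assumption).
zattrs_text = """
{
  "multiscales": [
    {
      "version": "0.4",
      "axes": [
        {"name": "t", "type": "time"},
        {"name": "c", "type": "channel"},
        {"name": "z", "type": "space"},
        {"name": "y", "type": "space"},
        {"name": "x", "type": "space"}
      ],
      "datasets": [
        {"path": "0",
         "coordinateTransformations": [{"type": "scale", "scale": [1, 1, 1, 1, 1]}]},
        {"path": "1",
         "coordinateTransformations": [{"type": "scale", "scale": [1, 1, 1, 2, 2]}]}
      ]
    }
  ]
}
"""

multiscale = json.loads(zattrs_text)["multiscales"][0]

# Axis order tells you how to interpret the five dimensions of each array.
axis_names = [axis["name"] for axis in multiscale["axes"]]

# Map each resolution level (a sub-array path) to its per-axis scale factors.
levels = {
    dataset["path"]: dataset["coordinateTransformations"][0]["scale"]
    for dataset in multiscale["datasets"]
}

print(axis_names)
for path, scale in levels.items():
    print(path, scale)
```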


## Take homes

<br/>
<big><big>
    <ol>
        <li>
The simplicity & transparency of Zarr files make them ideal for exploration & the cloud.
        </li>
         <br/>
        <li>
The primary downside is that working with many small files can introduce bottlenecks for uploading (& even deleting).
        </li>
        <br/>
        <li>
Working with S3 is very different from working with a file system: fewer (GUI) tools exist, and each S3 implementation may behave slightly differently.
        </li>
        <br/>
        <li>
The potential benefits for sharing (and, in some cases, the cost savings) can be significant, especially if an ecosystem of compatible tools already works for you.
        </li>
    </ol>
</big></big>
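To get a feel for the "many small files" point, the number of chunks follows directly from the array shape and the chunk shape. The sketch below assumes a store in which every chunk is a separate file or object (as in a Zarr directory on disk or on S3) and ignores the few extra metadata files. The larger acquisition is a hypothetical example, not data from this workshop.

```python
import math


def count_chunks(shape, chunks):
    """Number of chunks needed to tile `shape` with blocks of size `chunks`."""
    return math.prod(math.ceil(s / c) for s, c in zip(shape, chunks))


# The workshop array: one chunk covers the whole 512 x 512 plane.
small = count_chunks((1, 1, 1, 512, 512), (1, 1, 1, 512, 512))

# A hypothetical larger acquisition (t, c, z, y, x) with the same chunk size.
large = count_chunks((100, 3, 50, 4096, 4096), (1, 1, 1, 512, 512))

print(small, large)
```

Under these assumptions the larger dataset produces close to a million objects for a single resolution level, which is where uploads and deletions start to become slow.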

## License
Copyright (C) 2023 German BioImaging. All Rights Reserved.
This program is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the
Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
more details. You should have received a copy of the GNU General
Public License along with this program; if not, write to the
Free Software Foundation,
Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.