# EMBO Practical Course "Advanced methods in bioimage analysis"

***

Homepage: https://www.embl.org/about/info/course-and-conference-office/events/bia23-01/

***

## Day 2 - Session 1: Image Data Management - 11:30 to 12:30 "GO!"

### Continuing from `5_Cloud` in Python!...

## Software versions used for this workshop: (TODO)

   * awscli                    1.22.87
   * dask                      2022.4.0
   * fsspec                    2022.3.0
   * napari                    0.4.15
   * numpy                     1.22.3
   * ome-zarr                  0.4.0
   * openjdk                   11.0.9.1
   * tifffile                  2022.3.25
   * zarr                      2.11.1
   * vizarr                    0.2


In [1]:
%%bash
##
## Setup & Sanity checks
##

YOURNAME=$(whoami)
WORKDIR=/scratch/${YOURNAME}/session1/
test -e ${WORKDIR} || {
    echo "Please run the POSIX notebook first."
    exit 1
}

In [2]:
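import getpass
# Same value as `whoami`; os.getlogin() can raise OSError when there is no controlling terminal.
YOURNAME = getpass.getuser()
%env YOURNAME=$YOURNAME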

env: YOURNAME=jamoore


In [3]:
%cd /scratch/{YOURNAME}/session1

/System/Volumes/Data/scratch/jamoore/session1


In [5]:
import zarr
zarr.open("a.ome.zarr/0/0")

<zarr.core.Array (1, 1, 1, 512, 512) uint8>

In [7]:
import dask.array as da
da.from_zarr("a.ome.zarr/0/0")

|            | Array                      | Chunk                      |
|------------|----------------------------|----------------------------|
| Bytes      | 256.00 kiB                 | 256.00 kiB                 |
| Shape      | (1, 1, 1, 512, 512)        | (1, 1, 1, 512, 512)        |
| Dask graph | 1 chunks in 2 graph layers | 1 chunks in 2 graph layers |
| Data type  | uint8 numpy.ndarray        | uint8 numpy.ndarray        |


In [16]:
import ome_zarr
import ome_zarr.io
import ome_zarr.reader

url = ome_zarr.io.parse_url("a.ome.zarr/0")
reader = ome_zarr.reader.Reader(url)
for node in reader():
    print(node.data)

[dask.array<from-zarr, shape=(1, 1, 1, 512, 512), dtype=uint8, chunksize=(1, 1, 1, 512, 512), chunktype=numpy.ndarray>, dask.array<from-zarr, shape=(1, 1, 1, 256, 256), dtype=uint8, chunksize=(1, 1, 1, 256, 256), chunktype=numpy.ndarray>]
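The reader returns one dask array per resolution level, and the second level printed above is half the size of the first in y and x. A minimal sketch that predicts the level shapes, assuming factor-2 downsampling of only the last two (y, x) axes as these two shapes suggest (other writers may use different factors or also downsample z). `pyramid_shapes` is a hypothetical helper, not part of `ome-zarr`.

```python
import math

def pyramid_shapes(base_shape, n_levels, factor=2):
    """Return the shape of each resolution level, downsampling only the
    last two (y, x) axes by `factor` per level and rounding up."""
    shapes = [tuple(base_shape)]
    for _ in range(n_levels - 1):
        *leading, y, x = shapes[-1]
        shapes.append((*leading, math.ceil(y / factor), math.ceil(x / factor)))
    return shapes

print(pyramid_shapes((1, 1, 1, 512, 512), n_levels=2))
# [(1, 1, 1, 512, 512), (1, 1, 1, 256, 256)]
```

Rounding up keeps odd-sized edges from being dropped, e.g. a 113-pixel axis becomes 57 at the next level rather than 56.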


## License
Copyright (C) 2023 German BioImaging. All Rights Reserved.
This program is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the
Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
more details. You should have received a copy of the GNU General
Public License along with this program; if not, write to the
Free Software Foundation,
Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.

The metadata in a Zarr fileset is stored in (hidden) files starting with ".z".

In [4]:
!find mri.ome.zarr -name ".z*"

mri.ome.zarr/.zattrs
mri.ome.zarr/.zgroup
mri.ome.zarr/s0/.zarray
mri.ome.zarr/s0/.zattrs


Each metadata file belongs either to a group (a folder) or to an array (the data). The `.zgroup` files are fairly simple:

In [5]:
# %load mri.ome.zarr/.zgroup
{
  "zarr_format": 2
}

{'zarr_format': 2}

Each `.zattrs` file contains user-supplied metadata. OME-Zarrs use these attributes to describe how an n-dimensional Zarr array should be interpreted as an image.

In [6]:
# %load mri.ome.zarr/.zattrs
{
  "multiscales": [
    {
      "axes": [
        {
          "name": "z",
          "type": "space",
          "unit": "millimeter"
        },
        {
          "name": "y",
          "type": "space",
          "unit": "millimeter"
        },
        {
          "name": "x",
          "type": "space",
          "unit": "millimeter"
        }
      ],
      "datasets": [
        {
          "path": "s0",
          "coordinateTransformations": [
            {
              "type": "scale",
              "scale": [
                7.0,
                1.0,
                1.0
              ]
            }
          ]
        }
      ],
      "name": "mri",
      "type": "Average",
      "version": "0.4"
    }
  ]
}

{'multiscales': [{'axes': [{'name': 'z',
     'type': 'space',
     'unit': 'millimeter'},
    {'name': 'y', 'type': 'space', 'unit': 'millimeter'},
    {'name': 'x', 'type': 'space', 'unit': 'millimeter'}],
   'datasets': [{'path': 's0',
     'coordinateTransformations': [{'type': 'scale',
       'scale': [7.0, 1.0, 1.0]}]}],
   'name': 'mri',
   'type': 'Average',
   'version': '0.4'}]}
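The `scale` transformation lists one factor per entry in `axes`, in the same order, so here each step along z spans 7 mm while y and x steps span 1 mm. A minimal sketch that reads this with the standard `json` module; `voxel_sizes` is a hypothetical helper that assumes a single `multiscales` entry and ignores any optional `translation` transformation.

```python
import json

# The multiscales metadata printed above, trimmed to the fields used here.
ZATTRS = """
{"multiscales": [{"axes": [
    {"name": "z", "type": "space", "unit": "millimeter"},
    {"name": "y", "type": "space", "unit": "millimeter"},
    {"name": "x", "type": "space", "unit": "millimeter"}],
  "datasets": [{"path": "s0", "coordinateTransformations": [
    {"type": "scale", "scale": [7.0, 1.0, 1.0]}]}],
  "version": "0.4"}]}
"""

def voxel_sizes(zattrs, level=0):
    """Return {axis name: (voxel size, unit)} for one resolution level."""
    ms = zattrs["multiscales"][0]
    dataset = ms["datasets"][level]
    scale = next(t["scale"] for t in dataset["coordinateTransformations"]
                 if t["type"] == "scale")
    return {ax["name"]: (s, ax.get("unit")) for ax, s in zip(ms["axes"], scale)}

sizes = voxel_sizes(json.loads(ZATTRS))
print(sizes)
# {'z': (7.0, 'millimeter'), 'y': (1.0, 'millimeter'), 'x': (1.0, 'millimeter')}
```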

The `.zattrs` for each array can be fairly simple:

In [7]:
# %load mri.ome.zarr/s0/.zattrs
{
  "_ARRAY_DIMENSIONS": [
    "z",
    "y",
    "x"
  ]
}

{'_ARRAY_DIMENSIONS': ['z', 'y', 'x']}
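`_ARRAY_DIMENSIONS` is not part of the OME-Zarr metadata; it is the attribute that xarray's Zarr backend uses to record dimension names. When a fileset carries both, the two lists should presumably name the same axes in the same order. A minimal consistency check as a sketch; `dims_match_axes` is a hypothetical helper, and the dictionaries are copied from the two `.zattrs` files shown above (axes only).

```python
# Attributes copied from mri.ome.zarr/.zattrs and mri.ome.zarr/s0/.zattrs (axes only).
group_zattrs = {"multiscales": [{"axes": [
    {"name": "z", "type": "space", "unit": "millimeter"},
    {"name": "y", "type": "space", "unit": "millimeter"},
    {"name": "x", "type": "space", "unit": "millimeter"}]}]}
array_zattrs = {"_ARRAY_DIMENSIONS": ["z", "y", "x"]}

def dims_match_axes(array_attrs, group_attrs):
    """True if _ARRAY_DIMENSIONS lists the multiscales axis names in the same order."""
    axes = [ax["name"] for ax in group_attrs["multiscales"][0]["axes"]]
    return array_attrs.get("_ARRAY_DIMENSIONS") == axes

print(dims_match_axes(array_zattrs, group_zattrs))  # True
```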

The `.zarray` files specify storage details such as the array and chunk shapes, the data type, and the compression:

In [8]:
# %load mri.ome.zarr/s0/.zarray
{
  "shape": [
    27,
    226,
    186
  ],
  "chunks": [
    16,
    128,
    128
  ],
  "fill_value": "0",
  "dtype": "|u1",
  "filters": [],
  "dimension_separator": "/",
  "zarr_format": 2,
  "compressor": {
    "id": "gzip",
    "level": -1
  },
  "order": "C"
}

{'shape': [27, 226, 186],
 'chunks': [16, 128, 128],
 'fill_value': '0',
 'dtype': '|u1',
 'filters': [],
 'dimension_separator': '/',
 'zarr_format': 2,
 'compressor': {'id': 'gzip', 'level': -1},
 'order': 'C'}
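The `shape` and `chunks` fields are enough to predict how many chunk files the array has. A worked calculation using only the standard library, with the numbers copied from the `.zarray` above:

```python
import math

shape = [27, 226, 186]   # from .zarray "shape"
chunks = [16, 128, 128]  # from .zarray "chunks"
itemsize = 1             # "|u1" = unsigned 8-bit integer, 1 byte per voxel

# Chunks per axis: a partial chunk at the edge still counts as a whole chunk.
grid = [math.ceil(s / c) for s, c in zip(shape, chunks)]
n_chunks = math.prod(grid)
# Zarr v2 stores edge chunks at the full chunk shape (padded with fill_value),
# so every chunk holds this many bytes before compression.
chunk_bytes = math.prod(chunks) * itemsize

print(grid, n_chunks, chunk_bytes)  # [2, 2, 2] 8 262144
```

The result, 8 chunks, agrees with the 8 chunk files in the `tree` listing that follows.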

All the other files in the tree are **"chunks"**, pieces of an array that have been written to separate files:

In [9]:
!tree mri.ome.zarr

[01;34mmri.ome.zarr[00m
└── [01;34ms0[00m
    ├── [01;34m0[00m
    │   ├── [01;34m0[00m
    │   │   ├── 0
    │   │   └── 1
    │   └── [01;34m1[00m
    │       ├── 0
    │       └── 1
    └── [01;34m1[00m
        ├── [01;34m0[00m
        │   ├── 0
        │   └── 1
        └── [01;34m1[00m
            ├── 0
            └── 1

7 directories, 8 files


The levels of this hierarchy can be interpreted as:
```
mri.ome.zarr
└── resolution-level
    └── z-chunk-index
        └── y-chunk-index
            └── x-chunk-index
```
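Because `.zarray` sets `"dimension_separator": "/"`, each chunk index becomes one directory level; with the Zarr v2 default of `"."` the same chunk would be a single file such as `1.1.0`. A minimal sketch that finds the chunk file holding a given voxel; `chunk_key` is a hypothetical helper, and the voxel `(20, 130, 10)` is an arbitrary example.

```python
def chunk_key(index, chunks, dataset_path="s0", separator="/"):
    """Return the storage key of the chunk that holds voxel `index` (z, y, x)."""
    # Integer division gives the chunk's position in the chunk grid.
    chunk_coords = [i // c for i, c in zip(index, chunks)]
    return dataset_path + "/" + separator.join(str(c) for c in chunk_coords)

print(chunk_key((20, 130, 10), chunks=(16, 128, 128)))                 # s0/1/1/0
print(chunk_key((20, 130, 10), chunks=(16, 128, 128), separator="."))  # s0/1.1.0
```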

In [31]:
!ls -ltrah mri.ome.zarr/s0/0/0/0

-rw-r--r--  1 jamoore  wheel   148K Apr  5 22:39 mri.ome.zarr/s0/0/0/0
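With `"compressor": {"id": "gzip"}`, no filters and `"order": "C"`, a chunk file is, under the assumption that the writer used the standard gzip container, just a gzip stream of the chunk's raw voxels in row-major order. A full chunk here is 16 × 128 × 128 = 262,144 bytes uncompressed, so the 148K file above is well below that. A minimal sketch using only the standard library; `read_chunk` and `voxel` are hypothetical helpers, and since the real file is not bundled, the demo round-trips a synthetic chunk.

```python
import gzip
import os
import tempfile

CHUNK_SHAPE = (16, 128, 128)  # from .zarray "chunks"

def read_chunk(path, shape=CHUNK_SHAPE):
    """Decompress a gzip-encoded chunk of uint8 voxels and check its length."""
    with open(path, "rb") as fh:
        raw = gzip.decompress(fh.read())
    expected = shape[0] * shape[1] * shape[2]  # 1 byte per "|u1" voxel
    if len(raw) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(raw)}")
    return raw

def voxel(raw, z, y, x, shape=CHUNK_SHAPE):
    """Look up one voxel in C (row-major) order, where x varies fastest."""
    _, ny, nx = shape
    return raw[(z * ny + y) * nx + x]

# Round-trip a synthetic chunk whose voxel value is (z + y + x) % 256.
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "0")
    data = bytes((z + y + x) % 256 for z in range(16) for y in range(128) for x in range(128))
    with open(path, "wb") as fh:
        fh.write(gzip.compress(data))
    raw = read_chunk(path)
    print(len(raw), voxel(raw, 3, 5, 7))  # 262144 15
```

In practice `zarr.open(...)` does all of this for you; the sketch only shows that the chunk format itself involves nothing beyond compression and a fixed memory layout.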


In [32]:
!bioformats2raw

[31m[1mMissing required parameters: '<inputPath>', '<outputLocation>'[21m[39m[0m
Usage: [1m<main class>[21m[0m [[33m-p[39m[0m] [[33m--no-hcs[39m[0m] [[33m--[no-]nested[39m[0m] [[33m--no-ome-meta-export[39m[0m]
                    [[33m--no-root-group[39m[0m] [[33m--overwrite[39m[0m]
                    [[33m--use-existing-resolutions[39m[0m] [[33m--version[39m[0m] [[33m--debug[39m[0m
                    [=[3m<logLevel>[23m[0m]] [[33m--extra-readers[39m[0m[=[3m<extraReaders>[23m[0m[,
                    [3m<extraReaders>[23m[0m...]]]... [[33m--options[39m[0m[=[3m<readerOptions>[23m[0m[,
                    [3m<readerOptions>[23m[0m...]]]... [[33m-s[39m[0m[=[3m<seriesList>[23m[0m[,
                    [3m<seriesList>[23m[0m...]]]...
                    [[33m--additional-scale-format-string-args[39m[0m=[3m<additionalScaleForma[23m[0m
[3m                    tStringArgsCsv>[23m[0m] [[33m-c[39m[0m=[3m<compressionTy

In [33]:
import os, shutil
if os.path.exists("/tmp/trans_norm_out"):
    shutil.rmtree("/tmp/trans_norm_out")

In [34]:
%%time
!bioformats2raw --debug=OFF --progress 1885619/trans_norm.tif /tmp/trans_norm_out

[0/0]   0% │                                 │   0/571 (0:00:00 / ?)
571/571 (0:00:01 / 0:00:00)
CPU times: user 278 ms, sys: 122 ms, total: 401 ms
Wall time: 10.4 s


In [35]:
!ls /tmp/trans_norm_out

[1m[36m0[m[m   [1m[36mOME[m[m


In [15]:
!find /tmp/trans_norm_out -name ".z*"

/tmp/trans_norm_out/.zattrs
/tmp/trans_norm_out/.zgroup
/tmp/trans_norm_out/0/.zattrs
/tmp/trans_norm_out/0/.zgroup
/tmp/trans_norm_out/0/0/.zarray


In [36]:
!ome_zarr -q info /tmp/trans_norm_out/0

/private/tmp/trans_norm_out/0 [zgroup]
 - metadata
   - Multiscales
 - data
   - (1, 1, 571, 30, 30)
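`ome_zarr info` reports a 5D shape without axis names. Assuming the array uses the t, c, z, y, x order that OME-NGFF 0.4 requires for a 5D image (you can confirm by reading `/tmp/trans_norm_out/0/.zattrs`), the TIFF became a single time point and channel with 571 planes of 30 × 30 pixels; 571 is also the total shown by the progress bar above. A minimal sketch; `describe` is a hypothetical helper.

```python
# Assumed 5D axis order (OME-NGFF 0.4: time, then channel, then space axes).
AXES = ("t", "c", "z", "y", "x")

def describe(shape, axes=AXES):
    """Pair each axis name with its length, skipping singleton axes."""
    return {name: n for name, n in zip(axes, shape) if n > 1}

print(describe((1, 1, 571, 30, 30)))  # {'z': 571, 'y': 30, 'x': 30}
```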


## Take homes

<br/>
<big><big>
    <ol>
        <li>
The simplicity & transparency of Zarr files make them ideal for exploration & the cloud.
        </li>
         <br/>
        <li>
The primary downside is that working with many small files can introduce bottlenecks for uploading (& even deleting).
        </li>
        <br/>
        <li>
Working with S3 is very different from working with a file system: fewer (GUI) tools exist, and each S3 implementation may behave slightly differently.
        </li>
        <br/>
        <li>
The benefits in sharing potential (and in some cases cost-savings) can be significant, especially if there's an enabled ecosystem that works for you.
        </li>
    </ol>
</big></big>
