# Reading OME-Zarr files

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ome/EMBL-EBI-imaging-course-04-2025/blob/main/Reading_zarr_images.ipynb)


## Learning Objectives

* Access OME-Zarr files over https
* Learn how to access a local OME-Zarr file in Python
* Learn how to access a remote OME-Zarr file in Python

There are several ways to access data. For the purpose of the topics covered in this workshop, we will access files over ``https`` and use [dask](https://dask.org/).

Some software packages need all the 2D planes in memory in order to work, while others can work on one plane at a time. We will now show two mechanisms to access the data, depending on the needs, both using ``dask.array.from_zarr``.
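
As a rough illustration of the two mechanisms, the sketch below uses a small synthetic Dask array in place of a real OME-Zarr image. The array shape and chunking are assumptions chosen for the example; with real data the array would come from ``dask.array.from_zarr``.

```python
import dask.array as da
import numpy

# Synthetic stand-in for a 5D image in (T, C, Z, Y, X) order.
# Creating the Dask array does not compute anything yet.
lazy = da.zeros((1, 2, 4, 16, 16), chunks=(1, 1, 1, 16, 16), dtype="uint8")

# Mechanism 1: materialise the whole 5D array in memory as a NumPy array
in_memory = numpy.asarray(lazy)

# Mechanism 2: keep the array lazy and compute a single 2D plane on demand
plane = lazy[0, 1, 2, :, :].compute()

print(type(in_memory).__name__, in_memory.shape)
print(type(plane).__name__, plane.shape)
```

The first mechanism costs memory proportional to the full image, the second only to the planes that are actually requested.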

### Install dependencies if required
If using Google Colab, **do not** use the ``Runtime>Run all`` entry.

In [None]:
%pip install zarr

## How to access local OME-Zarr file using Python
As with OME-TIFF, OME-Zarr is primarily metadata within a container format, in this case Zarr.
You can use the existing low-level underlying libraries to access the raw bytes. It is nonetheless useful to be able to work on the chunks in parallel.
[Dask](https://www.dask.org/) is a useful library to start with.

In [9]:
import dask
import dask.array as da
from dask.diagnostics import ProgressBar
import numpy

## How to access OME-Zarr file on S3

To view the data in S3, several options are possible. 
For the purpose of this workshop, we will view the data over ``https``.

## Read the OME-Zarr file stored in S3 using Python
The ``B4_C3.tif`` image has been converted into OME-Zarr and is available on S3.
In the example below, we use remote data. The same logic applies when reading local Zarr files.

In [14]:
image_id = 6001247

In [15]:
ENPOINT_URL = 'https://uk1s3.embassy.ebi.ac.uk/'

In [None]:
# Helper method to display the data
import matplotlib.pyplot as plt
%matplotlib inline
from ipywidgets import *

def update(z=0):
    fig = plt.figure(figsize=(10, 10))
    plt.subplot(121)
    c = 1
    t = 0
    plt.imshow(data[t, c, z, :, :])
    fig.canvas.flush_events()

### Option 1: Load the binary
In this case, we load the whole 5D image into memory. This might be required when the software used to analyse the data needs access to the full 5D image. Use this approach only in that situation.

In [20]:
def load_binary_from_s3_with_data(name, resolution='0'):
    root = '%s/%s/' % (name, resolution)
    with ProgressBar():
        return numpy.asarray(da.from_zarr(ENPOINT_URL + root))

In [None]:
%%time 
name = 'idr/zarr/v0.1/%s.zarr' % image_id
data = load_binary_from_s3_with_data(name)
print(data.shape)
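
To make explicit which URL is requested, the sketch below rebuilds the address the same way the helper above does, without contacting the server. The function name ``zarr_url`` is hypothetical and introduced only for this illustration.

```python
# Same value as ENPOINT_URL in the notebook
endpoint = 'https://uk1s3.embassy.ebi.ac.uk/'
image_id = 6001247

def zarr_url(name, resolution='0'):
    """Hypothetical helper: build the URL of one resolution level."""
    return endpoint + '%s/%s/' % (name, resolution)

name = 'idr/zarr/v0.1/%s.zarr' % image_id
url = zarr_url(name)
print(url)  # https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.1/6001247.zarr/0/
```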

When the slider is moved, the plane is read **from memory** since the whole image has already been downloaded.

In [32]:
interact(update, z= widgets.IntSlider(value=0, min=0, max=data.shape[2]-1, step=1, description="Select Z", continuous_update=False))

### Option 2: Lazy Loading

The method below returns a Dask array **without** loading any binary data, i.e. it uses **lazy loading**. The dimension order of the returned array is ``(TCZYX)``.

In [None]:
def load_binary_from_s3(name, resolution='0'):
    root = '%s/%s/' % (name, resolution)
    return da.from_zarr(ENPOINT_URL + root)

In [None]:
%%time 
name = 'idr/zarr/v0.1/%s.zarr' % image_id
data = load_binary_from_s3(name)
print(data.shape)

The main point to keep in mind is that the binary data **is not** loaded until it is used, i.e. it is **lazily loaded**.
A plane is loaded only when the slider is moved to it.

In [None]:
interact(update, z= widgets.IntSlider(value=0, min=0, max=data.shape[2]-1, step=1, description="Select Z", continuous_update=False))
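
The effect of lazy loading can be observed without any remote data. In the sketch below, a hypothetical ``read_plane`` function stands in for the download of one plane and records each call, so we can check that only the requested plane is read.

```python
import dask
import dask.array as da
import numpy

loaded = []  # records which z-planes were actually "read"

def read_plane(z):
    """Hypothetical reader standing in for the download of one plane."""
    loaded.append(z)
    return numpy.full((8, 8), z, dtype="uint8")

# Build a lazy (Z, Y, X) stack: one delayed read per plane
planes = [
    da.from_delayed(dask.delayed(read_plane)(z), shape=(8, 8), dtype="uint8")
    for z in range(5)
]
stack = da.stack(planes)

print(loaded)               # nothing has been read at this point
plane = stack[3].compute()  # triggers the read of plane 3 only
print(loaded)
```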

### Load the labels
Zarr images can be seen as a "container": alongside the acquired binary data, it is possible to store the output of an analysis, for example ``masks``.
Labels, if any, are stored under the ``labels`` folder.


In [None]:
%%time
name = 'idr/zarr/v0.1/%s.zarr/labels' % image_id
labels = load_binary_from_s3(name)

In [None]:
print(labels.shape)

The labels were saved for one channel only, which is why the size of the ``C`` dimension differs between the image and the labels.

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline
c = 1
t = 0
z = 100
plt.imshow(data[t, c, z, :, :])
plt.imshow(labels[t, 0, z, :, :], cmap='jet', alpha=0.5) 
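
The indexing used for the overlay can be checked on small synthetic arrays. The shapes below are assumptions made for the example, not the shapes of the real image. Masking the background label ``0`` with ``numpy.ma.masked_equal`` is one possible way, not used in the cell above, to keep unlabelled pixels from tinting the overlay.

```python
import numpy

# Synthetic stand-ins: an image with 2 channels and labels with a single channel
image = numpy.zeros((1, 2, 4, 8, 8), dtype="uint16")
label_stack = numpy.zeros((1, 1, 4, 8, 8), dtype="int32")
label_stack[0, 0, 2, 2:5, 2:5] = 1  # one labelled object on z=2

t, c, z = 0, 1, 2
image_plane = image[t, c, z, :, :]
label_plane = label_stack[t, 0, z, :, :]  # labels only have channel index 0

# Every dimension except C has the same size
same_size = [i == l for i, l in zip(image.shape, label_stack.shape)]
print(same_size)

# Hide the background (label 0) so only labelled pixels would be drawn
masked = numpy.ma.masked_equal(label_plane, 0)
print(int(masked.count()))  # number of labelled pixels in this plane
```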

## Analyze an image in parallel
The Lightsheet image below is taken from the paper "In Toto Imaging and Reconstruction of Post-Implantation Mouse Development at the Single-Cell Level" published October 2018 in Cell: [https://doi.org/10.1016/j.cell.2018.09.031](https://doi.org/10.1016/j.cell.2018.09.031). The images can be viewed online in the [Image Data Resource](https://idr.openmicroscopy.org/).

The original image ID is used to identify the file on the S3 store.

In [None]:
image_id = 4007801

Load the dask array

In [None]:
name = 'idr/zarr/v0.1/%s.zarr' % image_id
data = load_binary_from_s3(name, resolution='4')

In [None]:
# Check the dimension of the array
print(data.shape)

Segment the image
This could be replaced by the analysis you wish to perform on a plane.

In [None]:
from skimage.filters import threshold_otsu
from skimage.morphology import closing
from skimage.filters import gaussian
from skimage.measure import label

def analyze(t, c, z):
    plane = data[t, c, z, :, :].compute()  # load the plane: Dask array -> NumPy array
    smoothed_image = gaussian(plane, sigma=1)
    # gaussian() rescales integer input to floats by default, so the threshold
    # is computed on the smoothed image to compare values in the same range
    black_white_plane = closing(smoothed_image > threshold_otsu(smoothed_image))
    label_image = label(black_white_plane)
    name = "t:%s, c: %s, z:%s" % (t, c, z)
    return label_image, name
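
To try the segmentation steps without downloading anything, the sketch below applies them to a synthetic plane containing two bright squares. The plane and its values are assumptions made for the example. Here the Otsu threshold is computed on the smoothed plane so that both sides of the comparison share the same value range.

```python
import numpy
from skimage.filters import gaussian, threshold_otsu
from skimage.measure import label
from skimage.morphology import closing

# Synthetic plane: dark background with two well-separated bright squares
plane = numpy.zeros((64, 64), dtype="uint16")
plane[8:24, 8:24] = 1000
plane[40:56, 40:56] = 1000

smoothed = gaussian(plane, sigma=1)                    # smooth the plane
binary = closing(smoothed > threshold_otsu(smoothed))  # threshold, then close small gaps
label_image = label(binary)                            # one integer label per connected object

print(label_image.shape, int(label_image.max()))
```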

We use ``dask.delayed`` on our function so it records what we want to compute as a task into a graph that will run later on parallel hardware.
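
As a minimal sketch of this pattern, the toy function ``square`` below (hypothetical, unrelated to the image analysis) is wrapped with ``dask.delayed``. Nothing runs until ``dask.compute`` is called.

```python
import dask

def square(x):
    """Toy function standing in for a per-plane analysis."""
    return x * x

# Record four calls as tasks; no call is executed at this stage
lazy_values = [dask.delayed(square)(i) for i in range(4)]

# Execute the recorded tasks; the results come back as a tuple
values = dask.compute(*lazy_values)
print(values)  # (0, 1, 4, 9)
```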

Due to the size of the image, in this notebook we only analyze a few planes around the middle z-section and the middle timepoint, for the first channel.

First prepare the graph. At this stage, we **do not** load the binary data.

In [None]:
lazy_results = []
middle_z = data.shape[2] // 2
middle_t = data.shape[0] // 2
range_t = 2
range_z = 2
range_c = 1
for t in range(middle_t - range_t, middle_t + range_t):
    for z in range(middle_z - range_z, middle_z + range_z):
        for c in range(range_c):
            lazy_result = dask.delayed(analyze)(t, c, z)
            lazy_results.append(lazy_result)
print(lazy_results)
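
The loop bounds above determine how many planes are analyzed. The sketch below reproduces the same loops on a hypothetical image shape (an assumption, not the shape of the real image) to list the ``(t, c, z)`` indices that would be selected.

```python
# Hypothetical (T, C, Z, Y, X) shape used only to illustrate the loop bounds
shape = (10, 2, 20, 64, 64)

middle_t = shape[0] // 2
middle_z = shape[2] // 2
range_t = 2
range_z = 2
range_c = 1

selected = [
    (t, c, z)
    for t in range(middle_t - range_t, middle_t + range_t)
    for z in range(middle_z - range_z, middle_z + range_z)
    for c in range(range_c)
]

print(len(selected))              # 4 timepoints * 4 z-sections * 1 channel = 16
print(selected[0], selected[-1])  # (3, 0, 8) (6, 0, 11)
```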

Run the analysis in parallel

The ``lazy_results`` list contains information about ``(2*range_t) * (2*range_z) * range_c`` calls to our function ``analyze``, i.e. 16 calls with the values above. We call ``dask.compute`` when we want the results. The binary data is loaded from the S3 store during the ``compute`` phase and the analysis is performed.

In [None]:
%time results = dask.compute(*lazy_results)

Display the segmented planes. Use the slider to select the plane.

In [None]:
from ipywidgets import *

def display(i=0):
    r, name = results[i]
    fig = plt.figure(figsize=(10, 10))
    plt.subplot(121)
    plt.imshow(r)
    plt.title(name)
    fig.canvas.flush_events()

interact(display, i= widgets.IntSlider(value=0, min=0, max=len(results)-1, step=1, description="Select Plane", continuous_update=False))

### License (BSD 2-Clause)
Copyright (C) 2025 University of Dundee. All Rights Reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.