# EMBO Practical Course <br/>"Advanced methods in bioimage analysis"

***

Homepage: https://www.embl.org/about/info/course-and-conference-office/events/bia23-01/

***

## Day 2 - Session 1: Image Data Management - 10:15 to 11:15 "Get set"

<table style="table { position: relative;  display: inline-block; } img {  position: absolute;  left: 0;  right: 0;  width: auto;  height: 100%;  object-fit: cover;  object-position: center;}">
    <tr>
        <td style="vertical-align: top">
            <h3>Introduction</h3>
            <p>
                In this notebook, we'll look at issues arising from having many files
                (and many file formats!) on your file servers. A quick intro into working
                with POSIX file systems will be followed by tools for converting data into
                a common file-format: OME-TIFF.
            </p>
            <p>
                The two main goals are: (1) make sure you have the basics you might need
                for the rest of the course and (2) encourage you to explore these tools
                further on your own to speed up your day-to-day activities.
            </p>
            <p>
                Outline:
                <ol start="3">
                    <li>Bash & Scripting
                        <ol type="a">
                            <li>Bash basics</li>
                            <li>Mixing Bash and Jupyter ⚠</li>
                            <li>Our data directory</li>
                            <li>Extras: time permitting
                                <ul>
                                    <li>Permissions</li>
                                    <li>Simple scripting</li>
                                    <li>shasum</li>
                                    <li>h5tools</li>
                                </ul>
                            </li>
                        </ol>
                    </li>
                    <li>bftools
                        <ol type="a">
                            <li>showinf, bfconvert, etc.</li>
                            <li>Working with OME-TIFF</li>
                            <li>aicsimageio (time-permitting)</li>
                        </ol>
                    </li>
                </ol>
            </p>
        </td>
        <td>
            <center>
                <img src="images/falk/clara-shares-300dpi.png"/>
                <small>
                    <a href="https://github.com/zarr-developers/zarr-illustrations-falk-2022#clara-shares">"Clara shares"</a>
                    by Henning Falk, ©2022 NumFOCUS, is used under a CC BY 4.0 license. Modifications to this photo include cropping.
                </small>
            </center>
        </td>
    </tr>
</table>

## 3a. Bash basics

If you are managing files on a server, it will often be easier to use the command-line once you learn it. And if you are using a command-line, it's likely going to be Bash.

There are a large number of useful commands. Often, the shorter the name the more useful it is. Let's take `ls` for "list".

To learn more about a tool, you can use the `man` tool.

In [1]:
!man ls | head

LS(1)                       General Commands Manual                      LS(1)

NAME
     ls – list directory contents

SYNOPSIS
     ls [-@ABCFGHILOPRSTUWabcdefghiklmnopqrstuvwxy1%,] [--color=when]
        [-D format] [file ...]

DESCRIPTION
cat: stdout: Broken pipe


<hr/>

I bring up `ls` first because it can be used to find other commands:

In [2]:
!ls -ltrad /bin /usr/bin/ /usr/local/bin

drwxr-xr-x   936 root     wheel  29952 Jul 11 10:56 /usr/bin/
drwxr-xr-x@   39 root     wheel   1248 Jul 11 10:56 /bin
drwxr-xr-x  2086 jamoore  admin  66752 Aug 20 18:35 /usr/local/bin


These are the directories where you will find many of them. For example, the command `bash` is one of them:

In [3]:
!which bash

/bin/bash


`which` is a command that tells you where a command lives. `/bin/bash` is the command that gets run when you log in. It's your "shell".
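The same lookup is available from Python, which can be handy inside a notebook. This is a minimal sketch using the standard library's `shutil.which`; the helper name `locate` is only an illustration and not part of the course material.

```python
import shutil


def locate(command):
    """Return the full path of `command` on the PATH, or None if it is not found."""
    return shutil.which(command)


# Look up a few commands; the last one is deliberately made up.
for name in ["bash", "ls", "no-such-command"]:
    print(name, "->", locate(name))
```

A result of `None` means the command is not on your `PATH`, which is often the first thing to check when a tool "doesn't exist".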

In [4]:
!echo $SHELL

/bin/bash


So it's the context that everything you are doing is happening in:

In [5]:
!pstree -p $$

-+= 00001 root /sbin/launchd
 \-+= 00560 jamoore /Applications/iTerm.app/Contents/MacOS/iTerm2
   \-+- 02144 jamoore /Users/jamoore/Library/Application Support/iTerm2/iTermSe
     \-+= 12058 root login -fp jamoore
       \-+= 12059 jamoore -bash
         \-+= 12310 jamoore /Users/jamoore/micromamba/envs/embo/bin/python3.9 /
           \-+= 18104 jamoore /Users/jamoore/micromamba/envs/embo/bin/python -m
             \-+= 18150 jamoore pstree -p 18150
               \--- 18151 root ps -axwwo user,pid,ppid,pgid,command


## 3b. Mixing Bash and Jupyter

Each Jupyter notebook runs within a Python interpreter, but that Python interpreter runs within a Bash shell. Each line starting with `!` starts a **new** Bash shell that exits as soon as its command finishes. A variable set in one `!` line is therefore gone by the time the next one runs.

This matters because you need to keep track of the state of the different shells and interpreters; otherwise the results can be confusing. (⚠ Jupyter notebooks can already be confusing because you can run cells in a different order from the one in which they appear.)

For example, we will use the `whoami` command to set a variable for creating directories for everyone:

In [6]:
!whoami

jamoore


You can try to set this variable in Bash like this:

In [7]:
!YOURNAME=$(whoami)

But when you try to use it, it will be empty:

In [8]:
!echo $YOURNAME




What's happened is that the `!` here created a new Bash shell, ran your command, and then that shell exited. Changes made have disappeared.

Instead, we need to use Python to set the variable in the environment of the notebook's own process, which every new shell started with `!` inherits:

In [9]:
import os
YOURNAME = os.getlogin()
%env YOURNAME=$YOURNAME

env: YOURNAME=jamoore


In [10]:
!echo $YOURNAME

jamoore


Now we have the variable we want set.
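The difference between the two attempts can be reproduced outside of Jupyter. The sketch below assumes that `!` behaves roughly like starting a fresh shell through `subprocess`; the variable name `DEMO_NAME` and the helper `run_in_new_shell` are hypothetical and used only for this illustration.

```python
import os
import subprocess


def run_in_new_shell(command):
    """Run `command` in a fresh shell, roughly as `!` does, and return its output."""
    result = subprocess.run(
        ["sh", "-c", command], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


# DEMO_NAME is a hypothetical variable; make sure it starts out unset.
os.environ.pop("DEMO_NAME", None)

# Attempt 1: set the variable inside a shell that exits immediately.
run_in_new_shell("DEMO_NAME=clara")
lost = run_in_new_shell('echo "$DEMO_NAME"')  # a second shell never saw it

# Attempt 2: set it in the Python process; child shells inherit the environment.
os.environ["DEMO_NAME"] = "clara"
kept = run_in_new_shell('echo "$DEMO_NAME"')

print(repr(lost), repr(kept))
```

The first value is empty and the second is not, which mirrors cells 8 and 10 above.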

## 3c. Our data directory

Now with this out of the way, we can start to work with the data directory.

`pwd` prints the directory that the current process is in:

In [11]:
pwd

'/opt/EMBO-Workshop-2023'

We want to move to another directory, but again we can't just use `cd MYDIR` because that will only change the subshell. Instead we use Jupyter magic (commands starting with `%`):

In [12]:
%cd /scratch/bioimagecourse2023/session1

/System/Volumes/Data/scratch/bioimagecourse2023/session1


Now we are in our working directory. Feel free to look around using `ls`. For example the flags `-ltra` mean: "show me a long listing of the files in reverse order by time and even show me the weird files starting with `.`".

In [13]:
!ls -ltra

total 248
-rw-r--r--   1 jamoore  wheel    529 Aug 23 18:13 2a.sh
-rw-r--r--   1 jamoore  wheel  42000 Aug 23 18:13 .2a.sh.un~
-rw-r--r--   1 jamoore  wheel     69 Aug 23 18:16 notes
-rw-r--r--   1 jamoore  wheel   3123 Aug 23 18:16 .notes.un~
drwxr-xr-x   5 jamoore  wheel    160 Aug 23 18:31 data
-rw-r--r--   1 jamoore  wheel    469 Aug 23 18:50 2b.sh
-rw-r--r--   1 jamoore  wheel  27560 Aug 23 18:50 .2b.sh.un~
-rw-r--r--   1 jamoore  wheel    638 Aug 23 18:58 3a.sh
-rw-r--r--   1 jamoore  wheel  31925 Aug 23 18:58 .3a.sh.un~
drwxr-xr-x  11 jamoore  wheel    352 Aug 23 18:58 .
drwxr-xr-x   3 jamoore  wheel     96 Aug 24 16:39 ..


The "-S" flag means sort by size instead of time:

In [14]:
ls -lSr data/

total 0
-rw-r--r--  1 jamoore  wheel   0 Aug 23 18:01 a.fake
lrwxr-xr-x  1 jamoore  wheel  33 Aug 23 18:12 data@ -> /scratch/josh_openmicroscopy/data
drwxr-xr-x  3 jamoore  wheel  96 Aug 23 18:31 cellprofiler/
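If you prefer to stay in Python, the two sort orders shown above can be approximated with `os.scandir`. This is a rough sketch, not a replacement for `ls`: the function name `list_sorted` and its `by` parameter are made up for this example, and the special entries `.` and `..` are not listed.

```python
import os


def list_sorted(directory, by="time"):
    """Return entry names ordered like `ls -tr` (by="time") or `ls -Sr` (by="size").

    Oldest or smallest entries come first, as with the reversing `-r` flag.
    """
    def sort_key(entry):
        info = entry.stat(follow_symlinks=False)
        return info.st_mtime if by == "time" else info.st_size

    with os.scandir(directory) as entries:
        return [entry.name for entry in sorted(entries, key=sort_key)]


print(list_sorted(".", by="time"))
print(list_sorted(".", by="size"))
```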


Another important tool for understanding the size of your data is `du`, for "disk usage". The `-sh` flags say "show me only a (s)ummary of the data and put it in (h)uman-readable form".

In [15]:
! du -sh data/

8.7M	data/
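To see roughly what `du -sh` is adding up, here is a small Python sketch. It sums the apparent file sizes, whereas `du` reports the disk blocks actually used, so the two numbers can differ slightly. The names `directory_size` and `human_readable` are illustrative only.

```python
import os


def directory_size(root):
    """Sum the sizes in bytes of all regular files below `root` (symlinks are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def human_readable(num_bytes):
    """Format a byte count with binary units, similar to the `-h` flag."""
    size = float(num_bytes)
    for unit in ["B", "K", "M", "G", "T"]:
        if size < 1024 or unit == "T":
            return f"{size:.1f}{unit}"
        size /= 1024


print(human_readable(directory_size(".")))
```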


Now we want to make a directory for you to do all of your work in. The "-p" flag means "create all parents" (but also: "don't fail if it already exists", which is useful for Jupyter notebooks!)

In [16]:
!mkdir -p /scratch/${YOURNAME}/session1

In [17]:
%cd /scratch/{YOURNAME}/session1

/System/Volumes/Data/scratch/jamoore/session1


In [None]:
!test -e data || ln -s /scratch/josh_openmicroscopy/data data
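Both steps above are written so that the cell can be re-run safely. The same idea is sketched here in Python with `pathlib`; the function `prepare_workdir` is a hypothetical helper, and the demonstration uses a temporary directory instead of `/scratch` so that it runs anywhere.

```python
import tempfile
from pathlib import Path


def prepare_workdir(workdir, shared_data):
    """Create `workdir` (like `mkdir -p`) and link `data` into it if it is missing."""
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    link = workdir / "data"
    if not link.exists() and not link.is_symlink():
        link.symlink_to(shared_data)  # like `ln -s shared_data data`
    return link


# Demonstration in a throwaway directory; running it twice is harmless.
with tempfile.TemporaryDirectory() as tmp:
    shared = Path(tmp) / "shared"
    shared.mkdir()
    for _ in range(2):
        link = prepare_workdir(Path(tmp) / "someone" / "session1", shared)
    print(link.name, "is a symlink:", link.is_symlink())
```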

## TODO
https://downloads.openmicroscopy.org/images/HCS/Operetta/59548/
https://downloads.openmicroscopy.org/presentations/2013/fs-workshop-paris/#/7/1

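`find` is a very powerful tool. It walks a directory tree and prints every path that passes the tests you give it, such as `-name` or `-type`.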

In [None]:
!find . | tail -n 10


In [25]:
!find . -name "*.tiff" | wc

       2       2      31


In [27]:
!find . -type f -exec file {} \;

./a.ome.zarr/.zattrs: JSON data
./a.ome.zarr/.zgroup: JSON data
./a.ome.zarr/0/.zattrs: JSON data
./a.ome.zarr/0/.zgroup: JSON data
./a.ome.zarr/0/0/.zarray: JSON data
./a.ome.zarr/0/0/0/0/0/0/0: Targa image data - Color (1-1024) 1604 x 65536 x 24 +2 ""
./a.ome.zarr/0/1/.zarray: JSON data
./a.ome.zarr/0/1/0/0/0/0/0: data
./a.ome.zarr/OME/METADATA.ome.xml: XML 1.0 document text, ASCII text, with very long lines (678), with no line terminators
./a.ome.zarr/OME/.zattrs: JSON data
./a.ome.zarr/OME/.zgroup: JSON data
./pos002.ome.tiff: TIFF image data, little-endian, direntries=17, height=1006, bps=8, compression=none, PhotometricIntepretation=RGB Palette, width=1000
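A name-based search like the `find . -name "*.tiff"` call above can also be written with `pathlib`. This is only a sketch of the idea; `find_files` is an illustrative name, and unlike `find` it returns the matches as a sorted list instead of printing them.

```python
from pathlib import Path


def find_files(root, pattern):
    """Return all regular files below `root` whose name matches `pattern`, sorted."""
    return sorted(path for path in Path(root).rglob(pattern) if path.is_file())


tiffs = find_files(".", "*.tiff")
print(len(tiffs), "TIFF files found")
```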


## Exercise: use the `ln` tool to symlink data into your own folder.

In [29]:
## Do something here or on the command-line

In [None]:
## Your work here

## 3d. Extras (time permitting)

### Permissions

In the directory listings above, e.g.:

```
drwxr-xr-x   936 root     wheel  29952 Jul 11 10:56 /usr/bin/
drwxr-xr-x@   39 root     wheel   1248 Jul 11 10:56 /bin
drwxr-xr-x  2086 jamoore  admin  66752 Aug 20 18:35 /usr/local/bin
```

the info here is **critical**:

```
PERMISSIONS ---- USER     GROUP   SIZE MODIFIED     NAME
```

Figuring out your user and group is pretty easy. `whoami` from above is your user. `id` can tell you your groups:

In [30]:
!id

uid=501(jamoore) gid=20(staff) groups=20(staff),12(everyone),61(localaccounts),79(_appserverusr),80(admin),81(_appserveradm),98(_lpadmin),399(com.apple.access_ssh),33(_appstore),100(_lpoperator),204(_developer),250(_analyticsusers),395(com.apple.access_ftp),398(com.apple.access_screensharing),400(com.apple.access_remote_ae)


With your user and groups in hand, you can work out whether or not you can read or edit a file. From https://www.grymoire.com/Unix/Permissions.html :

```
Three sections after the "d" for directory marker:
+------------+------+-------+
| Permission | Octal| Field |
+------------+------+-------+
| rwx------  | 700  | User  |
| ---rwx---  | 070  | Group |
| ------rwx  | 007  | Other |
+------------+------+-------+

Each section can be in one of 8 states:
+-----+---+--------------------------+
| rwx | 7 | Read, write and execute  |
| rw- | 6 | Read, write              |
| r-x | 5 | Read, and execute        |
| r-- | 4 | Read,                    |
| -wx | 3 | Write and execute        |
| -w- | 2 | Write                    |
| --x | 1 | Execute                  |
| --- | 0 | no permissions           |
+-----+---+--------------------------+

Examples:
+------------------------+-----------+
| chmod u=rwx,g=rwx,o=rx | chmod 775 | For world executables files
| chmod u=rwx,g=rx,o=    | chmod 750 | For executables by group only
| chmod u=rw,g=r,o=r     | chmod 644 | For world readable files
| chmod u=rw,g=r,o=      | chmod 640 | For group readable files
| chmod u=rw,go=         | chmod 600 | For private readable files
| chmod u=rwx,go=        | chmod 700 | For private executables
+------------------------+-----------+
```
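The mapping between the symbolic and octal forms in the tables above is simple arithmetic: read is worth 4, write 2, and execute 1, summed per section. The sketch below works through it; `symbolic_to_octal` is a made-up helper, while `stat.filemode` is the standard-library function for the opposite direction.

```python
import stat


def symbolic_to_octal(permissions):
    """Convert nine characters such as "rwxr-xr-x" into octal digits such as "755"."""
    if len(permissions) != 9:
        raise ValueError("expected nine characters, e.g. 'rwxr-xr-x'")
    digits = []
    for start in range(0, 9, 3):  # user, group, other
        value = 0
        for char, weight in zip(permissions[start:start + 3], (4, 2, 1)):
            if char != "-":
                value += weight
        digits.append(str(value))
    return "".join(digits)


print(symbolic_to_octal("rwxr-xr-x"))  # the directories listed above
print(stat.filemode(0o100644))         # a regular file with mode 644
```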

### Simple scripting

For loops can be fairly useful when working with many files:

In [31]:
%%bash
# Note: looping over $(find ...) splits on whitespace, so it breaks on file names with spaces.
for x in $(find . -name "*.fake");
do
    sha1sum "${x}"
done

But this is obviously just a starting point. Perhaps check out a [Carpentry lesson](https://software-carpentry.org/lessons/) like [The Unix Shell](https://swcarpentry.github.io/shell-novice/) for more information. I can highly recommend them.

### shasum

In [23]:
%%bash
# Print the SHA-1 checksum of every regular file below the current directory.
for x in $(find . -type f);
do
    sha1sum "${x}"
done

1a45bdb742869f84fd71ae0fd67f79f1b26923fe  ./a.ome.zarr/.zattrs
63de336a45370c236af207996ffd1bca2d7ae2f4  ./a.ome.zarr/.zgroup
74075dab6c0712c1d9ae80053ec1410e7f3099cf  ./a.ome.zarr/0/.zattrs
63de336a45370c236af207996ffd1bca2d7ae2f4  ./a.ome.zarr/0/.zgroup
965f764929f2da5e2691cb8049035975c2ccb5a0  ./a.ome.zarr/0/0/.zarray
7c2485fb91905ebd879e745d2e80f097584de271  ./a.ome.zarr/0/0/0/0/0/0/0
e6f199c7ed96964069fce5faa5e1cca8513a2320  ./a.ome.zarr/0/1/.zarray
dbc22354672642df9dcc2dcb5fffcced6a18b83a  ./a.ome.zarr/0/1/0/0/0/0/0
eda28681265146f6d153f53baaee09c030453123  ./a.ome.zarr/OME/METADATA.ome.xml
156e48269827cb4611d5a3899d862c60c8f483f4  ./a.ome.zarr/OME/.zattrs
63de336a45370c236af207996ffd1bca2d7ae2f4  ./a.ome.zarr/OME/.zgroup
722c2d9b3e9a7596389f03ffd5994be8be9c438f  ./pos002.ome.tiff
6b7350beeed3e54806cf46ff110e1a8c77c0fe80  ./a.ome.tiff
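A checksum is a fingerprint of a file's contents: in the listing above, the three `.zgroup` files share the same value, which tells you their contents are identical. The following sketch computes the same kind of SHA-1 digest with Python's `hashlib`; the helper name `sha1_of_file` and the temporary demo file are assumptions for illustration.

```python
import hashlib
import os
import tempfile


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal SHA-1 digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demonstration on a small throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "demo.txt")
    with open(demo, "wb") as handle:
        handle.write(b"abc")
    print(sha1_of_file(demo), demo)
```

Reading in chunks keeps memory use constant, which matters once the files are multi-gigabyte images.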


### h5tools

In [None]:
!h5ls -h

<hr/>

## Half-time

<hr/>

In [28]:
%%bash

##
## Setup & Sanity checks
##

YOURNAME=$(whoami)
WORKDIR=/scratch/${YOURNAME}/session1
test -e ${WORKDIR} || {
    echo Please run the steps above first.
    exit 1
}

In [32]:
%cd /scratch/{YOURNAME}/session1

/System/Volumes/Data/scratch/jamoore/session1


## 4a. bftools: showinf, bfconvert, etc.

bftools is a package of command-line tools for working with bioimaging files. It can be downloaded from https://downloads.openmicroscopy.org/bio-formats/latest/artifacts/ or installed via [conda](https://anaconda.org/ome/bftools).

One of the tools, `formatlist`, simply lists all the supported file formats with their associated file extensions:

In [33]:
!formatlist


File pattern: can read (pattern)
Zip: can read (zip)
Animated PNG: can read, can write, can write multiple (png)
JPEG: can read, can write (jpg, jpeg, jpe)
SlideBook 7 SLD (native): can read (sldy)
Portable Any Map: can read (pbm, pgm, ppm)
Flexible Image Transport System: can read (fits, fts)
PCX: can read (pcx)
Graphics Interchange Format: can read (gif)
Windows Bitmap: can read (bmp)
IPLab: can read (ipl)
IVision: can read (ipm)
RCPNL: can read (rcpnl)
Deltavision: can read (dv, r3d, r3d_d3d, dv.log, r3d.log)
Medical Research Council: can read (mrc, st, ali, map, rec, mrcs)
Gatan Digital Micrograph: can read (dm3, dm4)
Gatan DM2: can read (dm2)
Bitplane Imaris: can read (ims)
Openlab RAW: can read (raw)
OME-XML: can read, can write, can write multiple (ome, ome.xml)
Leica Image File Format: can read (lif)
Audio Video Interleave: can read, can write, can write multiple (avi)
PICT: can read (pict, pct)
SPCImage Data: can read (sdt)
SPC FIFO Data: can read (spc, set)
Encapsulated Post

We'll look at a few of these types.
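Because each line of the `formatlist` output has the shape `Name: capabilities (extensions)`, it is easy to filter. The sketch below parses a few lines copied from the output above and keeps only the formats that can be written; `parse_format_line` is a hypothetical helper, and the parsing assumes every line follows that shape.

```python
SAMPLE = """\
Animated PNG: can read, can write, can write multiple (png)
JPEG: can read, can write (jpg, jpeg, jpe)
Leica Image File Format: can read (lif)
OME-XML: can read, can write, can write multiple (ome, ome.xml)
"""


def parse_format_line(line):
    """Split one `formatlist` line into (name, capabilities, extensions)."""
    name, _, rest = line.partition(": ")
    capabilities, _, extensions = rest.rpartition(" (")
    return name, capabilities.split(", "), extensions.rstrip(")").split(", ")


formats = [parse_format_line(line) for line in SAMPLE.splitlines() if line]
writable = [name for name, capabilities, _ in formats if "can write" in capabilities]
print(writable)
```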

In [34]:
!showinf -nopix data/a.fake

Initializing reader
FakeReader initializing data/a.fake
Initialization took 0.082s

Reading core metadata
filename = data/a.fake
Used files = [/System/Volumes/Data/scratch/jamoore/session1/data/a.fake]
Series count = 1
Series #0 :
	Image count = 1
	RGB = false (1) 
	Interleaved = false
	Indexed = false (true color)
	Width = 512
	Height = 512
	SizeZ = 1
	SizeT = 1
	SizeC = 1
	Tile size = 512 x 512
	Thumbnail size = 128 x 128
	Endianness = intel (little)
	Dimension order = XYZCT (certain)
	Pixel type = uint8
	Valid bits per pixel = 8
	Metadata complete = true
	Thumbnail series = false
	-----
	Plane #0 <=> Z 0, C 0, T 0


Reading global metadata

Reading metadata


In [36]:
!showinf -nopix -omexml-only data/a.fake

<?xml version="1.0" encoding="UTF-8"?>
<OME xmlns="http://www.openmicroscopy.org/Schemas/OME/2016-06" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openmicroscopy.org/Schemas/OME/2016-06 http://www.openmicroscopy.org/Schemas/OME/2016-06/ome.xsd">
   <Image ID="Image:0" Name="data/a">
      <Pixels BigEndian="false" DimensionOrder="XYZCT" ID="Pixels:0" Interleaved="false" SignificantBits="8" SizeC="1" SizeT="1" SizeX="512" SizeY="512" SizeZ="1" Type="uint8">
         <Channel ID="Channel:0:0" SamplesPerPixel="1">
            <LightPath/>
         </Channel>
         <MetadataOnly/>
      </Pixels>
   </Image>
</OME>
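OME-XML is plain XML, so the key image properties can be pulled out with the standard library alone. This sketch parses a shortened copy of the output above (the XML declaration, schema location, and `LightPath` element are dropped for brevity); `pixel_summary` is an illustrative name, not part of any OME package.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the xmlns attribute in the output above.
OME_NS = {"ome": "http://www.openmicroscopy.org/Schemas/OME/2016-06"}

OME_XML = """<OME xmlns="http://www.openmicroscopy.org/Schemas/OME/2016-06">
   <Image ID="Image:0" Name="data/a">
      <Pixels BigEndian="false" DimensionOrder="XYZCT" ID="Pixels:0" Interleaved="false" SignificantBits="8" SizeC="1" SizeT="1" SizeX="512" SizeY="512" SizeZ="1" Type="uint8">
         <Channel ID="Channel:0:0" SamplesPerPixel="1"/>
         <MetadataOnly/>
      </Pixels>
   </Image>
</OME>"""


def pixel_summary(xml_text):
    """Return the pixel type, dimension order, and sizes of the first image."""
    root = ET.fromstring(xml_text)
    pixels = root.find("ome:Image/ome:Pixels", OME_NS)
    sizes = {axis: int(pixels.get("Size" + axis)) for axis in "XYZCT"}
    return {
        "type": pixels.get("Type"),
        "order": pixels.get("DimensionOrder"),
        "sizes": sizes,
    }


print(pixel_summary(OME_XML))
```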



In [38]:
! test -e a.ome.tiff && rm -rf a.ome.tiff

In [39]:
! bfconvert data/a.fake a.ome.tiff

data/a.fake
FakeReader initializing data/a.fake
[Simulated data] -> a.ome.tiff [OME-TIFF]
	Converted 1/1 planes (100%)
[done]
0.993s elapsed (17.0+149.0ms per plane, 786ms overhead)


In [40]:
! showinf cellprofiler/fruit-fly-cells/POS218.pattern/POS218.pattern

Initializing reader
Exception in thread "main" java.io.FileNotFoundException: cellprofiler/fruit-fly-cells/POS218.pattern/POS218.pattern (No such file or directory)
	at java.base/java.io.RandomAccessFile.open0(Native Method)
	at java.base/java.io.RandomAccessFile.open(RandomAccessFile.java:345)
	at java.base/java.io.RandomAccessFile.<init>(RandomAccessFile.java:259)
	at java.base/java.io.RandomAccessFile.<init>(RandomAccessFile.java:214)
	at loci.common.NIOFileHandle.<init>(NIOFileHandle.java:130)
	at loci.common.NIOFileHandle.<init>(NIOFileHandle.java:151)
	at loci.common.NIOFileHandle.<init>(NIOFileHandle.java:165)
	at loci.common.Location.getHandle(Location.java:522)
	at loci.common.Location.getHandle(Location.java:462)
	at loci.common.Location.getHandle(Location.java:443)
	at loci.common.Location.getHandle(Location.java:426)
	at loci.common.Location.checkValidId(Location.java:551)
	at loci.formats.ImageReader.getReader(ImageReader.java:182)
	at loci.formats.ImageReader.setId(ImageR

In [48]:
from tifffile import imread, tiffcomment

data = imread("a.ome.tiff")
metadata = tiffcomment("a.ome.tiff")

In [55]:
from ome_types import from_xml, from_tiff

obj = from_xml(metadata)
print(obj.images[0])

id='Image:0' name='data/a' pixels={'channels': [{'annotation_refs': [], 'light_path': {'excitation_filters': [], 'emission_filters': [], 'annotation_refs': []}, 'id': 'Channel:0:0', 'samples_per_pixel': 1, 'color': Color('white', rgb=(255, 255, 255))}], 'bin_data_blocks': [], 'tiff_data_blocks': [{'uuid': {'value': 'urn:uuid:0cc95cd0-950f-40bb-bc97-77a3fe7f7b64', 'file_name': 'a.ome.tiff'}, 'plane_count': 1}], 'planes': [], 'id': 'Pixels:0', 'dimension_order': <Pixels_DimensionOrder.XYZCT: 'XYZCT'>, 'type': <PixelType.UINT8: 'uint8'>, 'significant_bits': 8, 'interleaved': False, 'big_endian': False, 'size_x': 512, 'size_y': 512, 'size_z': 1, 'size_c': 1, 'size_t': 1}


In [56]:
obj = from_tiff("a.ome.tiff")
print(obj.images[0])

id='Image:0' name='data/a' pixels={'channels': [{'annotation_refs': [], 'light_path': {'excitation_filters': [], 'emission_filters': [], 'annotation_refs': []}, 'id': 'Channel:0:0', 'samples_per_pixel': 1, 'color': Color('white', rgb=(255, 255, 255))}], 'bin_data_blocks': [], 'tiff_data_blocks': [{'uuid': {'value': 'urn:uuid:0cc95cd0-950f-40bb-bc97-77a3fe7f7b64', 'file_name': 'a.ome.tiff'}, 'plane_count': 1}], 'planes': [], 'id': 'Pixels:0', 'dimension_order': <Pixels_DimensionOrder.XYZCT: 'XYZCT'>, 'type': <PixelType.UINT8: 'uint8'>, 'significant_bits': 8, 'interleaved': False, 'big_endian': False, 'size_x': 512, 'size_y': 512, 'size_z': 1, 'size_c': 1, 'size_t': 1}


In [59]:
from aicsimageio import AICSImage

img = AICSImage("a.ome.tiff") 

In [61]:
img.dask_data

|            | Array                      | Chunk                      |
|------------|----------------------------|----------------------------|
| Bytes      | 256.00 kiB                 | 256.00 kiB                 |
| Shape      | (1, 1, 1, 512, 512)        | (1, 1, 1, 512, 512)        |
| Dask graph | 1 chunks in 6 graph layers | 1 chunks in 6 graph layers |
| Data type  | uint8 numpy.ndarray        | uint8 numpy.ndarray        |
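The "Bytes" value in the array summary above follows directly from the shape and the data type: multiply all dimensions together and then by the size of one pixel. A quick check, assuming one byte per `uint8` value:

```python
import math

shape = (1, 1, 1, 512, 512)  # shape reported for the array above
bytes_per_pixel = 1          # uint8 uses one byte per value

total_bytes = math.prod(shape) * bytes_per_pixel
total_kib = total_bytes / 1024
print(total_bytes, "bytes =", total_kib, "KiB")
```

The same arithmetic is a useful sanity check before loading a large image into memory.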


In [63]:
! showinf -nopix 'a&series=2.fake'

Initializing reader
FakeReader initializing a&series=2.fake
Initialization took 0.071s

Reading core metadata
filename = a&series=2.fake
Used files = [/System/Volumes/Data/scratch/jamoore/session1/a&series=2.fake]
Series count = 2
Series #0 :
	Image count = 1
	RGB = false (1) 
	Interleaved = false
	Indexed = false (true color)
	Width = 512
	Height = 512
	SizeZ = 1
	SizeT = 1
	SizeC = 1
	Tile size = 512 x 512
	Thumbnail size = 128 x 128
	Endianness = intel (little)
	Dimension order = XYZCT (certain)
	Pixel type = uint8
	Valid bits per pixel = 8
	Metadata complete = true
	Thumbnail series = false
	-----
	Plane #0 <=> Z 0, C 0, T 0

Series #1 :
	Image count = 1
	RGB = false (1) 
	Interleaved = false
	Indexed = false (true color)
	Width = 512
	Height = 512
	SizeZ = 1
	SizeT = 1
	SizeC = 1
	Tile size = 512 x 512
	Thumbnail size = 128 x 128
	Endianness = intel (little)
	Dimension order = XYZCT (certain)
	Pixel type = uint8
	Valid bits per pixel = 8
	Metadata complete = true
	Thumbnail series

## License
Copyright (C) 2023 German BioImaging. All Rights Reserved.
This program is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the
Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
more details. You should have received a copy of the GNU General
Public License along with this program; if not, write to the
Free Software Foundation,
Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.