# EMBO Practical Course <br/>"Advanced methods in bioimage analysis"

***

Homepage: https://www.embl.org/about/info/course-and-conference-office/events/bia23-01/

***

## Day 2 - Session 1: Image Data Management - 10:15 to 11:15 "Get set"

<table style="table { position: relative;  display: inline-block; } img {  position: absolute;  left: 0;  right: 0;  width: auto;  height: 100%;  object-fit: cover;  object-position: center;}">
    <tr>
        <td style="vertical-align: top">
            <h3>Introduction</h3>
            <p>
                In this notebook, we'll look at issues arising from having many files
                (and many file formats!) on your file servers. A quick intro into working
                with POSIX file systems will be followed by tools for converting data into
                a common file-format: OME-TIFF.
            </p>
            <p>
                The two main goals are: (1) make sure you have the basics you might need
                for the rest of the course and (2) encourage you to explore these tools
                further on your own to speed up your day-to-day activities.
            </p>
            <p>
                Outline:
                <ol start="3">
                    <li>Bash & Scripting
                        <ol type="a">
                            <li>Bash basics</li>
                            <li>Mixing Bash and Jupyter ⚠</li>
                            <li>Our data directory</li>
                            <li>Permissions</li>
                        </ol>
                    </li>
                        <li>bftools
                        <ol type="a">
                            <li>showinf</li>
                            <li>bfconvert</li>
                            <li>OME-TIFF</li>
                        </ol>
                    </li>
                </ol>
            </p>
        </td>
        <td>
            <center>
                <img src="images/falk/clara-shares-300dpi.png"/>
            </center>
        </td>
    </tr>
</table>

## 3a. Bash basics

If you are managing files on a server, the command line is often the easiest tool once you have learned it. And if you are using a command line, it is most likely Bash.

There are a large number of useful commands. Often, the shorter the name, the more useful the command. Let's take `ls`, short for "list".

To learn more about a tool, read its manual page with `man`.

In [57]:
!man ls | head

LS(1)                       General Commands Manual                      LS(1)

NAME
     ls – list directory contents

SYNOPSIS
     ls [-@ABCFGHILOPRSTUWabcdefghiklmnopqrstuvwxy1%,] [--color=when]
        [-D format] [file ...]

DESCRIPTION
cat: stdout: Broken pipe


<hr/>

I bring up `ls` first because it can be used to find other commands:

In [3]:
!ls -ltrad /bin /usr/bin/ /usr/local/bin

drwxr-xr-x   936 root     wheel  29952 Jul 11 10:56 /usr/bin/
drwxr-xr-x@   39 root     wheel   1248 Jul 11 10:56 /bin
drwxr-xr-x  2086 jamoore  admin  66752 Aug 20 18:35 /usr/local/bin


These are the directories where you will find many of them. For example, the command `bash` is one of them:

In [2]:
!which bash

/bin/bash
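
The same lookup is available from Python, which can be handy later in a notebook. A minimal sketch using the standard library's `shutil.which`; the helper name `find_command` is made up for illustration:

```python
import shutil


def find_command(name):
    """Return the full path of `name` as found on PATH, or None if absent."""
    return shutil.which(name)


for tool in ("bash", "ls", "showinf"):
    # showinf only resolves if bftools is installed and on PATH
    print(tool, "->", find_command(tool))
```

A result of `None` usually means the tool is either not installed or its directory is missing from `PATH`.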


This is the command that gets run when you log in:

In [1]:
!echo $SHELL

/bin/bash


So Bash is the context in which everything you do happens:

In [17]:
!pstree -p $$

-+= 00001 root /sbin/launchd
 \-+= 00560 jamoore /Applications/iTerm.app/Contents/MacOS/iTerm2
   \-+- 02144 jamoore /Users/jamoore/Library/Application Support/iTerm2/iTermSe
     \-+= 12058 root login -fp jamoore
       \-+= 12059 jamoore -bash
         \-+= 12310 jamoore /Users/jamoore/micromamba/envs/embo/bin/python3.9 /
           \-+= 12584 jamoore /Users/jamoore/micromamba/envs/embo/bin/python -m
             \-+= 13306 jamoore pstree -p 13306
               \--- 13307 root ps -axwwo user,pid,ppid,pgid,command


## 3b. Mixing Bash & Jupyter

Each Jupyter notebook runs within a Python interpreter, but that Python interpreter runs within a Bash shell. Each line starting with "!" starts a **new** shell that exits as soon as its command finishes. Therefore a variable set in one "!" command is already gone when the next one runs.

This matters because you need to keep track of the state of the different shells and interpreters; if you don't, the results can be confusing.

For example, we will use the `whoami` command to set a variable for creating directories for everyone:

In [6]:
!whoami

jamoore


You can try to set this variable in Bash like this:

In [18]:
!YOURNAME=$(whoami)

But when you try to use it, it will be empty:

In [19]:
!echo $YOURNAME




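Instead, we need to go through Python and the `%env` magic, which set the variable in the notebook's own environment so that every later "!" command inherits it: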

In [20]:
import os
YOURNAME = os.getlogin()
%env YOURNAME=$YOURNAME

env: YOURNAME=jamoore


In [21]:
!echo $YOURNAME

jamoore


Now we have the variable we want set.
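
The behaviour above can be reproduced in plain Python, without any Jupyter magic. This is only a sketch: the helper `run_in_new_shell` and the variable name `DEMO_COURSE_NAME` are invented for the demonstration, and a POSIX shell is assumed.

```python
import os
import subprocess


def run_in_new_shell(command):
    """Run `command` in a fresh shell and return its stripped standard output."""
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    return result.stdout.strip()


# Make sure the demo variable is not already set in this process
os.environ.pop("DEMO_COURSE_NAME", None)

# The assignment lives and dies inside the first shell ...
run_in_new_shell("DEMO_COURSE_NAME=alice")
before = run_in_new_shell('echo "$DEMO_COURSE_NAME"')

# ... whereas the parent's environment is inherited by every new shell
os.environ["DEMO_COURSE_NAME"] = "alice"
after = run_in_new_shell('echo "$DEMO_COURSE_NAME"')

print(repr(before), repr(after))
```

The first value is empty and the second is `alice`, which mirrors the difference between `!YOURNAME=$(whoami)` and `%env`.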

## 3c. Our data directory

`pwd` prints the directory that the current process is in:

In [22]:
pwd

'/opt/EMBO-Workshop-2023'

We want to move to another directory, but again we can't just use `!cd MYDIR` because that only changes the directory of a short-lived subshell. Instead we use Jupyter magic (commands starting with `%`):

In [24]:
%cd /scratch/bioimagecourse2023/session1

/System/Volumes/Data/scratch/bioimagecourse2023/session1


Now we are in our working directory. Feel free to look around using `ls`. For example, the flags `-ltra` mean: "show me a (l)ong listing, sorted by (t)ime, in (r)everse order so that the newest entries come last, and include (a)ll files, even the weird ones starting with `.`".

In [25]:
!ls -ltra

total 248
-rw-r--r--   1 jamoore  wheel    529 Aug 23 18:13 2a.sh
-rw-r--r--   1 jamoore  wheel  42000 Aug 23 18:13 .2a.sh.un~
-rw-r--r--   1 jamoore  wheel     69 Aug 23 18:16 notes
-rw-r--r--   1 jamoore  wheel   3123 Aug 23 18:16 .notes.un~
drwxr-xr-x   5 jamoore  wheel    160 Aug 23 18:31 data
-rw-r--r--   1 jamoore  wheel    469 Aug 23 18:50 2b.sh
-rw-r--r--   1 jamoore  wheel  27560 Aug 23 18:50 .2b.sh.un~
-rw-r--r--   1 jamoore  wheel    638 Aug 23 18:58 3a.sh
-rw-r--r--   1 jamoore  wheel  31925 Aug 23 18:58 .3a.sh.un~
drwxr-xr-x  11 jamoore  wheel    352 Aug 23 18:58 .
drwxr-xr-x   3 jamoore  wheel     96 Aug 24 16:39 ..
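
If you need the same ordering inside Python, for instance to process the oldest files first, something like the following could work. It is a sketch rather than a replacement for `ls`; the function name `list_by_mtime` is hypothetical and, unlike `ls -a`, `os.scandir` does not report `.` and `..`.

```python
import os
import tempfile


def list_by_mtime(directory):
    """Return entry names sorted oldest first, hidden files included."""
    with os.scandir(directory) as entries:
        # follow_symlinks=False looks at the link itself, as plain `ls -l` does
        ordered = sorted(
            entries, key=lambda entry: entry.stat(follow_symlinks=False).st_mtime
        )
        return [entry.name for entry in ordered]


# Demonstration on a throwaway directory with explicit timestamps
with tempfile.TemporaryDirectory() as demo:
    for name, mtime in [("notes", 200), (".hidden", 100), ("2a.sh", 300)]:
        path = os.path.join(demo, name)
        open(path, "w").close()
        os.utime(path, (mtime, mtime))
    print(list_by_mtime(demo))  # ['.hidden', 'notes', '2a.sh']
```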


The "-S" flag means sort by size instead of time; combined with "-r", the smallest entries come first:

In [27]:
ls -lSr data/

total 0
-rw-r--r--  1 jamoore  wheel   0 Aug 23 18:01 a.fake
lrwxr-xr-x  1 jamoore  wheel  33 Aug 23 18:12 data@ -> /scratch/josh_openmicroscopy/data
drwxr-xr-x  3 jamoore  wheel  96 Aug 23 18:31 cellprofiler/


Another important tool for understanding the size of your data is `du`, for "disk usage". The `-sh` flags say "show me only a (s)ummary of the data and put it in (h)uman-readable form":

In [26]:
! du -sh data/

8.7M	data/
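
A rough Python counterpart is sketched below. Note the assumption: it adds up apparent file sizes, while `du` reports the disk blocks actually used, so the two numbers can differ. The names `directory_size` and `human_readable` are illustrative only.

```python
import os
import tempfile


def directory_size(root):
    """Sum the apparent sizes (in bytes) of all non-directory entries below root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # lstat does not follow symlinks, so linked data is not counted
            total += os.lstat(os.path.join(dirpath, name)).st_size
    return total


def human_readable(num_bytes):
    """Format a byte count with binary units, e.g. 1536 -> '1.5K'."""
    size = float(num_bytes)
    for unit in ("B", "K", "M", "G", "T"):
        if size < 1024 or unit == "T":
            return f"{size:.1f}{unit}"
        size /= 1024


with tempfile.TemporaryDirectory() as demo:
    with open(os.path.join(demo, "a.bin"), "wb") as handle:
        handle.write(b"\0" * 1536)
    print(human_readable(directory_size(demo)))  # 1.5K
```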


## 3d. Permissions

In [29]:
##
## Permissions
##

!mkdir -p /scratch/${YOURNAME}/session1

In [25]:
!id

uid=501(jamoore) gid=20(staff) groups=20(staff),12(everyone),61(localaccounts),79(_appserverusr),80(admin),81(_appserveradm),98(_lpadmin),399(com.apple.access_ssh),33(_appstore),100(_lpoperator),204(_developer),250(_analyticsusers),395(com.apple.access_ftp),398(com.apple.access_screensharing),400(com.apple.access_remote_ae)
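
The mode strings in the `ls -l` listings earlier (such as `-rw-r--r--` and `drwxr-xr-x`), together with the user and groups reported by `id`, decide who may read, write, or enter a file or directory. As a small illustration, Python's `stat.filemode` turns the numeric mode into that familiar string; the helper `describe_mode` is a made-up name.

```python
import os
import stat


def describe_mode(path):
    """Return the ls-style mode string of `path` without following symlinks."""
    return stat.filemode(os.lstat(path).st_mode)


# File type bits plus permission bits, written in octal
print(stat.filemode(0o100644))  # -rw-r--r--  regular file, owner may write
print(stat.filemode(0o040755))  # drwxr-xr-x  directory, everyone may enter
print(stat.filemode(0o120755))  # lrwxr-xr-x  symbolic link
```

The first character is the entry type; the three groups of `rwx` that follow apply to the owner, the group, and everyone else, in that order.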


In [32]:
%cd /scratch/{YOURNAME}/session1

/System/Volumes/Data/scratch/jamoore/session1


In [33]:
!test -e session1 || ln -s /scratch/josh_openmicroscopy/data session1
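
The "create the link only if nothing is there yet" pattern from the cell above can also be written in Python. The sketch below uses a hypothetical helper `ensure_symlink`. One deliberate difference: `test -e` follows links and so treats a broken link as missing, whereas `os.path.lexists` also detects a dangling link and leaves it alone.

```python
import os
import tempfile


def ensure_symlink(target, link_name):
    """Create link_name -> target unless something already exists at link_name.

    Returns True if a link was created, False if the name was already taken.
    """
    if os.path.lexists(link_name):
        return False
    os.symlink(target, link_name)
    return True


with tempfile.TemporaryDirectory() as demo:
    link = os.path.join(demo, "session1")
    print(ensure_symlink("/scratch/josh_openmicroscopy/data", link))  # True
    print(ensure_symlink("/scratch/josh_openmicroscopy/data", link))  # False
    print(os.readlink(link))
```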

In [34]:
##
##
##

%cd session1



/System/Volumes/Data/scratch/josh_openmicroscopy/data


In [35]:
!find . | tail -n 10


./cellprofiler/fruit-fly-cells/POS076.pattern/01_POS076_Z00_T00_D.TIF
./cellprofiler/fruit-fly-cells/COPYING
./cellprofiler/fruit-fly-cells/readme.txt
./cellprofiler/fruit-fly-cells/POS002.pattern
./cellprofiler/fruit-fly-cells/POS002.pattern/POS002.pattern
./cellprofiler/fruit-fly-cells/POS002.pattern/01_POS002_Z00_T00_D.TIF
./cellprofiler/fruit-fly-cells/POS002.pattern/01_POS002_Z00_T00_R.TIF
./cellprofiler/fruit-fly-cells/POS002.pattern/01_POS002_Z00_T00_F.TIF
./cellprofiler/fruit-fly-cells/.bioformats
./data


In [36]:
!find . -exec file {} \;


.: directory
./a.fake: empty
./cellprofiler: directory
./cellprofiler/fruit-fly-cells: directory
./cellprofiler/fruit-fly-cells/POS218.pattern: directory
./cellprofiler/fruit-fly-cells/POS218.pattern/POS218.pattern: ASCII text, with no line terminators
./cellprofiler/fruit-fly-cells/POS218.pattern/01_POS218_Z00_T00_R.TIF: TIFF image data, little-endian, direntries=16
./cellprofiler/fruit-fly-cells/POS218.pattern/01_POS218_Z00_T00_D.TIF: TIFF image data, little-endian, direntries=16
./cellprofiler/fruit-fly-cells/POS218.pattern/01_POS218_Z00_T00_F.TIF: TIFF image data, little-endian, direntries=16
./cellprofiler/fruit-fly-cells/.DS_Store: Apple Desktop Services Store
./cellprofiler/fruit-fly-cells/POS076.pattern: directory
./cellprofiler/fruit-fly-cells/POS076.pattern/.DS_Store: Apple Desktop Services Store
./cellprofiler/fruit-fly-cells/POS076.pattern/POS076.pattern: ASCII text, with no line terminators
./cellprofiler/fruit-fly-cells/POS076.pattern/01_POS076_Z00_T00_F.TIF: TIFF image d
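
`file` recognises formats by looking at the first bytes of each file rather than at its extension. For classic TIFF those bytes are `II*\0` (little-endian) or `MM\0*` (big-endian). The following sketch walks a directory tree and applies just that one check; `tiff_byte_order` and `find_tiffs` are illustrative names, and BigTIFF or any other format is deliberately ignored.

```python
import os

# First four bytes of a classic TIFF file, keyed to the byte order they signal
TIFF_MAGIC = {b"II*\x00": "little-endian", b"MM\x00*": "big-endian"}


def tiff_byte_order(path):
    """Return the byte order if `path` starts with a classic TIFF header, else None."""
    with open(path, "rb") as handle:
        return TIFF_MAGIC.get(handle.read(4))


def find_tiffs(root):
    """Yield (path, byte order) for every TIFF-looking regular file below root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):
                continue  # skip broken links and special files
            order = tiff_byte_order(path)
            if order is not None:
                yield path, order
```

Calling `list(find_tiffs("."))` in the data directory should list the `.TIF` files regardless of how they are named.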

In [42]:
%%bash

##
## Extras
##

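# Print a SHA-1 checksum for every *.fake file.
# Reading NUL-separated names keeps paths containing spaces intact.
find . -name "*.fake" -print0 | while IFS= read -r -d '' x
do
    sha1sum "${x}"
done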

da39a3ee5e6b4b0d3255bfef95601890afd80709  ./a.fake
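
The checksum above is simply the SHA-1 of zero bytes, because `a.fake` is an empty file. A short sketch with Python's `hashlib` shows this; the helper name `sha1_of_file` and the chunk size are arbitrary choices.

```python
import hashlib


def sha1_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA-1 of a file, reading it in chunks to limit memory use."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


print(hashlib.sha1(b"").hexdigest())
# da39a3ee5e6b4b0d3255bfef95601890afd80709
```

Comparing checksums before and after copying or converting data is a cheap way to confirm that nothing changed by accident.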


For more practice, I can highly recommend the [Software Carpentry lessons](https://software-carpentry.org/lessons/), e.g. [The Unix Shell](https://swcarpentry.github.io/shell-novice/).



In [53]:
!h5ls -h

usage: h5ls [OPTIONS] file[/OBJECT] [file[/[OBJECT]...]
  OPTIONS
   -h, -?, --help  Print a usage message and exit
   -a, --address   Print raw data address.  If dataset is contiguous, address
                   is offset in file of beginning of raw data. If chunked,
                   returned list of addresses indicates offset of each chunk.
                   Must be used with -v, --verbose option.
                   Provides no information for non-dataset objects.
   -d, --data      Print the values of datasets
   --enable-error-stack
                   Prints messages from the HDF5 error stack as they occur.
   --follow-symlinks
                   Follow symbolic links (soft links and external links)
                   to display target object information.
                   Without this option, h5ls identifies a symbolic link
                   as a soft link or external link and prints the value
                   assigned to the symbolic link; it does not provide any
         

<hr/>

## Half-time

<hr/>

## 4. bftools: showinf, bfconvert, OME-TIFF, oh my

In [4]:
%%bash

##
## Setup & Sanity checks
##

YOURNAME=$(whoami)
WORKDIR=/scratch/${YOURNAME}/session1
test -e ${WORKDIR} || {
    echo Please run the steps above first.
    exit 1
}

In [7]:
%cd /scratch/{YOURNAME}/session1

/System/Volumes/Data/scratch/jamoore/session1



##
## bftools & OME-TIFF
##



In [9]:
!formatlist


File pattern: can read (pattern)
Zip: can read (zip)
Animated PNG: can read, can write, can write multiple (png)
JPEG: can read, can write (jpg, jpeg, jpe)
SlideBook 7 SLD (native): can read (sldy)
Portable Any Map: can read (pbm, pgm, ppm)
Flexible Image Transport System: can read (fits, fts)
PCX: can read (pcx)
Graphics Interchange Format: can read (gif)
Windows Bitmap: can read (bmp)
IPLab: can read (ipl)
IVision: can read (ipm)
RCPNL: can read (rcpnl)
Deltavision: can read (dv, r3d, r3d_d3d, dv.log, r3d.log)
Medical Research Council: can read (mrc, st, ali, map, rec, mrcs)
Gatan Digital Micrograph: can read (dm3, dm4)
Gatan DM2: can read (dm2)
Bitplane Imaris: can read (ims)
Openlab RAW: can read (raw)
OME-XML: can read, can write, can write multiple (ome, ome.xml)
Leica Image File Format: can read (lif)
Audio Video Interleave: can read, can write, can write multiple (avi)
PICT: can read (pict, pct)
SPCImage Data: can read (sdt)
SPC FIFO Data: can read (spc, set)
Encapsulated Post
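
Each line of the `formatlist` output follows the pattern "name: capabilities (extensions)", so it is easy to filter, for example to see which formats can be written. The parser below is a sketch based only on the lines shown above; `parse_formatlist` is a hypothetical helper, and the sample text is copied from that output.

```python
import re

# name: can read[, can write[, can write multiple]] (ext, ext, ...)
LINE = re.compile(r"^(?P<name>.+?): (?P<caps>can [^(]+?) \((?P<exts>[^)]*)\)$")


def parse_formatlist(text):
    """Turn formatlist output into a list of dicts; unparseable lines are skipped."""
    formats = []
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match is None:
            continue  # blank or truncated line
        formats.append(
            {
                "name": match["name"],
                "can_write": "can write" in match["caps"],
                "extensions": [ext.strip() for ext in match["exts"].split(",")],
            }
        )
    return formats


SAMPLE = """\
JPEG: can read, can write (jpg, jpeg, jpe)
SlideBook 7 SLD (native): can read (sldy)
OME-XML: can read, can write, can write multiple (ome, ome.xml)
Leica Image File Format: can read (lif)
"""

writable = [fmt["name"] for fmt in parse_formatlist(SAMPLE) if fmt["can_write"]]
print(writable)  # ['JPEG', 'OME-XML']
```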

In [10]:
!showinf -nopix data/a.fake


Initializing reader
FakeReader initializing data/a.fake
Initialization took 0.079s

Reading core metadata
filename = data/a.fake
Used files = [/System/Volumes/Data/scratch/jamoore/session1/data/a.fake]
Series count = 1
Series #0 :
	Image count = 1
	RGB = false (1) 
	Interleaved = false
	Indexed = false (true color)
	Width = 512
	Height = 512
	SizeZ = 1
	SizeT = 1
	SizeC = 1
	Tile size = 512 x 512
	Thumbnail size = 128 x 128
	Endianness = intel (little)
	Dimension order = XYZCT (certain)
	Pixel type = uint8
	Valid bits per pixel = 8
	Metadata complete = true
	Thumbnail series = false
	-----
	Plane #0 <=> Z 0, C 0, T 0


Reading global metadata

Reading metadata
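
The core metadata printed by `showinf` is a list of "key = value" lines, which makes it easy to pull single values into Python. The sketch below parses a shortened copy of the output above; `parse_core_metadata` is an invented helper, it keeps only the first occurrence of each key (so effectively the first series), and scraping text like this is fragile compared with reading the OME-XML.

```python
def parse_core_metadata(text):
    """Collect 'key = value' lines from showinf output into a dict of strings."""
    metadata = {}
    for line in text.splitlines():
        key, separator, value = line.strip().partition(" = ")
        if separator:
            # setdefault keeps the first value seen for a repeated key
            metadata.setdefault(key, value.strip())
    return metadata


SAMPLE = """\
Series count = 1
Series #0 :
    Image count = 1
    Width = 512
    Height = 512
    Dimension order = XYZCT (certain)
    Pixel type = uint8
    -----
    Plane #0 <=> Z 0, C 0, T 0
"""

meta = parse_core_metadata(SAMPLE)
print(meta["Width"], meta["Height"], meta["Pixel type"])  # 512 512 uint8
```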


In [None]:
!showinf -nopix -omexml-only data/a.fake

In [None]:
!test -e a.ome.tiff && rm -rf a.ome.tiff

In [None]:
!bfconvert data/a.fake a.ome.tiff
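
When many files need converting, the "remove the old output, then convert" steps can be wrapped in a small Python function. This is a sketch under assumptions: `convert_to_ome_tiff` is a made-up helper, it uses only the plain `bfconvert INPUT OUTPUT` form shown above, and by default it performs a dry run that merely returns the command.

```python
import os
import shutil
import subprocess


def convert_to_ome_tiff(source, target, dry_run=True):
    """Remove a stale target, then run `bfconvert source target`.

    With dry_run=True nothing is deleted or executed; the command is returned.
    """
    command = ["bfconvert", source, target]
    if dry_run:
        return command
    if shutil.which("bfconvert") is None:
        raise FileNotFoundError("bfconvert not found on PATH; is bftools installed?")
    if os.path.lexists(target):
        os.remove(target)  # start from a clean output file
    subprocess.run(command, check=True)
    return command


print(convert_to_ome_tiff("data/a.fake", "a.ome.tiff"))
# ['bfconvert', 'data/a.fake', 'a.ome.tiff']
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.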

In [None]:
!showinf cellprofiler/fruit-fly-cells/POS218.pattern/POS218.pattern

In [None]:
echo series

In [None]:
tifffile