# Accessing cloud-hosted image data (practical)


<p><b>Image Data:</b><br/>
   data management, standard image file format<br/>
    for sharing big image data in the cloud, and image data repositories</p>
<p><i>
Day 3: Friday, 29 January 2021 – Standard image file format for sharing big image data in the cloud
</i></p>



## Outline
1. Notebook reminders
2. Choosing our data & downloading from S3
3. Converting your data to OME-NGFF
4. Publishing your data with S3

***

## 1. Notebook reminders
This notebook is somewhat unusual in that it uses a lot of command-line tools. Each line beginning with an exclamation mark (`!`) is run as a shell command, so you can perform the same actions in a terminal on your own system (without the leading `!`) _without_ a Jupyter notebook. For that, you will need to install the required tools, such as `bioformats2raw`, or run everything via `repo2docker`. See the instructions at https://github.com/joshmoore/NGFF-GBI-2021-Workshop/blob/main/README.md


In [None]:
!conda info --envs

In [None]:
!cat binder/environment.yml

In [None]:
!pwd

## 2. Choosing our data & downloading from S3
We're going to start off by looking at some images you will likely have seen during the OMERO or IDR sessions:

<table>
    <tr>
        <td>
            <img alt="idr0062 thumbnails" src="images/training-1.png" style="height:150px"/>
        </td>
        <td>
            <img alt="idr0062 thumbnails" src="images/training-2.png" style="height:150px"/>
        </td>
    </tr>
</table>
    
These images were used in the ilastik plugin guide presented by Petr: https://omero-guides.readthedocs.io/en/latest/ilastik/docs/ilastik_fiji.html


The original dataset can be found in IDR study idr0062: https://workshop.openmicroscopy.org/webclient/?show=dataset-6179

Our goal is to share these *without* using an OMERO server.


## 2.1 Minio client

There are many types of cloud storage, and many tools for accessing them. Here we focus on a single one: `mc`.

`mc` is provided by the MinIO project and is described as "a modern alternative to UNIX commands like ls, cat, cp, mirror, diff, find etc." The quickstart guide can be found at https://docs.minio.io/docs/minio-client-quickstart-guide.html. For our purposes, we'll focus on how to use it to upload and manage data in S3.

## 2.2 Connecting

The MinIO project provides a public sandbox server where you can safely learn about S3: https://play.minio.io:9000/minio/. The `mc config host list play` command below shows its access information:

 * "AccessKey" is basically a user name.
 * "SecretKey" is basically a password. 
 * The URL is our "endpoint", which differentiates it from the S3 servers provided by Amazon.

You can log in to the web page and explore what the many other users have uploaded at https://play.minio.io:9000/minio/

The other two important concepts are:
 * "buckets", which are roughly like a shared namespace with permissions, and
 * "keys", which we will get to shortly.

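The `mc` paths used throughout this notebook, such as `gbi/bioim/idr0062-tiffs/B1_C1.tif`, follow the pattern ALIAS/BUCKET/KEY, where the alias is the name registered with `mc config host add`. The minimal Python sketch below splits such a path into its parts; `split_mc_path` is a hypothetical helper, not part of `mc`. Note that everything after the bucket, slashes included, is the key.

```python
from typing import NamedTuple


class McPath(NamedTuple):
    alias: str   # name registered with `mc config host add`, e.g. "gbi"
    bucket: str  # e.g. "bioim"
    key: str     # the rest of the path, slashes included


def split_mc_path(path: str) -> McPath:
    """Split an ALIAS/BUCKET/KEY path (hypothetical helper, not part of mc)."""
    alias, _, rest = path.partition("/")
    bucket, _, key = rest.partition("/")
    if not alias or not bucket:
        raise ValueError(f"expected ALIAS/BUCKET[/KEY], got {path!r}")
    return McPath(alias, bucket, key)


print(split_mc_path("gbi/bioim/idr0062-tiffs/B1_C1.tif"))
# McPath(alias='gbi', bucket='bioim', key='idr0062-tiffs/B1_C1.tif')
```

The same split applies to `play/i2k2020/gif.zarr/.zattrs`, whose key is `gif.zarr/.zattrs`.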
In [3]:
!mc config host list play

[m[36;1mplay
[0m[33m  URL       : https://play.min.io
[0m[36m  AccessKey : Q3AM3UQ867SPQQA43P2F
[0m[36m  SecretKey : zuf+tfteSlswRu7BJ86wekitnifILbZam1KYY3TG
[0m[34m  API       : S3v4
[0m[36m  Path      : auto
[0m
[0m

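EMBL has also kindly provided a bucket for this session, which we need to connect to. Replace `PLEASE_FIX_ME` with the secret key for the `bioim-user` account; with the placeholder left in, `mc` rejects the credentials as shown below: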

In [16]:
!mc config host add gbi https://s3.embl.de bioim-user PLEASE_FIX_ME

mc: <ERROR> Unable to initialize new alias from the provided credentials. The request signature we calculated does not match the signature you provided. Check your key and signing method.

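Rather than typing the secret key into a notebook cell, where it ends up in the saved notebook, you could read it from an environment variable. The minimal Python sketch below assumes a hypothetical variable named `GBI_SECRET_KEY`. The helper `mc_host_add_args` (also hypothetical) only builds the argument list for the same `mc config host add` command used above. It does not run it.

```python
import os
from typing import List, Mapping, Optional


def mc_host_add_args(alias: str, endpoint: str, access_key: str,
                     secret_env_var: str = "GBI_SECRET_KEY",
                     env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Build the argv for `mc config host add`, taking the secret key from an
    environment variable (GBI_SECRET_KEY is a hypothetical name for this sketch)."""
    env = os.environ if env is None else env
    secret = env.get(secret_env_var)
    if not secret:
        raise RuntimeError(f"set {secret_env_var} before configuring {alias!r}")
    return ["mc", "config", "host", "add", alias, endpoint, access_key, secret]
```

You could then run `subprocess.run(mc_host_add_args("gbi", "https://s3.embl.de", "bioim-user"), check=True)`. Passing a list avoids shell quoting problems, although the key is still briefly visible in the process list while `mc` runs. Recent `mc` releases also offer `mc alias set` for the same purpose.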
In [5]:
!mc ls gbi/bioim

[m[32m[2021-01-12 11:43:25 CET][0m[33m    13B[0m[1m README.md[0m
[0m[m[32m[2021-01-21 18:26:48 CET][0m[33m  32KiB[0m[1m s3-browser.html[0m
[0m[m[32m[2021-01-25 16:36:45 CET][0m[33m     0B[0m[36;1m idr0062-tiffs/[0m
[0m[m[32m[2021-01-25 16:36:45 CET][0m[33m     0B[0m[36;1m idr0062-zarrs/[0m
[0m

In [7]:
!mc ls gbi/bioim/idr0062-tiffs/

[m[32m[2021-01-21 18:07:25 CET][0m[33m  68MiB[0m[1m B1_C1.tif[0m
[0m[m[32m[2021-01-21 18:07:25 CET][0m[33m  67MiB[0m[1m B1_C1_Manual.tif[0m
[0m[m[32m[2021-01-21 18:07:25 CET][0m[33m  66MiB[0m[1m B1_C2.tif[0m
[0m[m[32m[2021-01-21 18:07:25 CET][0m[33m  66MiB[0m[1m B1_C2_Manual.tif[0m
[0m[m[32m[2021-01-21 18:07:25 CET][0m[33m 120MiB[0m[1m B2_C1.tif[0m
[0m[m[32m[2021-01-21 18:07:25 CET][0m[33m 120MiB[0m[1m B2_C1_Manual.tif[0m
[0m[m[32m[2021-01-21 18:07:25 CET][0m[33m  42MiB[0m[1m B2_C2.tif[0m
[0m[m[32m[2021-01-21 18:07:25 CET][0m[33m  42MiB[0m[1m B2_C2_Manual.tif[0m
[0m[m[32m[2021-01-21 18:07:29 CET][0m[33m  84MiB[0m[1m B3.tif[0m
[0m[m[32m[2021-01-21 18:07:29 CET][0m[33m  83MiB[0m[1m B3_Manual.tif[0m
[0m[m[32m[2021-01-21 18:07:29 CET][0m[33m  49MiB[0m[1m B4_C1.tif[0m
[0m[m[32m[2021-01-21 18:07:29 CET][0m[33m  49MiB[0m[1m B4_C1_Manual.tif[0m
[0m[m[32m[2021-01-21 18:07:29 CET][0m[3

In [8]:
!mc ls gbi/bioim/idr0062-zarrs/

[m[32m[2021-01-25 16:38:24 CET][0m[33m     0B[0m[36;1m 6001237.zarr/[0m
[0m[m[32m[2021-01-25 16:38:24 CET][0m[33m     0B[0m[36;1m 6001238.zarr/[0m
[0m[m[32m[2021-01-25 16:38:24 CET][0m[33m     0B[0m[36;1m 6001239.zarr/[0m
[0m[m[32m[2021-01-25 16:38:24 CET][0m[33m     0B[0m[36;1m 6001240.zarr/[0m
[0m[m[32m[2021-01-25 16:38:24 CET][0m[33m     0B[0m[36;1m 6001241.zarr/[0m
[0m[m[32m[2021-01-25 16:38:24 CET][0m[33m     0B[0m[36;1m 6001242.zarr/[0m
[0m[m[32m[2021-01-25 16:38:24 CET][0m[33m     0B[0m[36;1m 6001243.zarr/[0m
[0m[m[32m[2021-01-25 16:38:24 CET][0m[33m     0B[0m[36;1m 6001244.zarr/[0m
[0m[m[32m[2021-01-25 16:38:24 CET][0m[33m     0B[0m[36;1m 6001245.zarr/[0m
[0m[m[32m[2021-01-25 16:38:24 CET][0m[33m     0B[0m[36;1m 6001246.zarr/[0m
[0m[m[32m[2021-01-25 16:38:24 CET][0m[33m     0B[0m[36;1m 6001247.zarr/[0m
[0m[m[32m[2021-01-25 16:38:24 CET][0m[33m     0B[0m[36;1m 6001248.zarr/[

## 2.3 Your first download

Now that you can list the contents of the bucket and see that the file is there, you can copy it to your local machine with `mc cp`. Note that since other people are working on the same bucket, the listing may look slightly different each time you run `mc ls`.

In [10]:
!mc cp gbi/bioim/idr0062-tiffs/B1_C1.tif /tmp/

...B1_C1.tif:  67.55 MiB / 67.55 MiB  ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓  994.88 KiB/s 1m9s

In [15]:
!ls -ltrah /tmp/B1_C1.tif

-rw-r--r--  1 jamoore  wheel    68M Jan 25 16:42 /tmp/B1_C1.tif

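If a bucket permits anonymous reads, which is not established here for the EMBL bucket, an object can also be fetched over plain HTTPS. The path-style URL is the endpoint, then the bucket, then the key, as in the `https://play.minio.io:9000/i2k2020/...` links later in this notebook. The minimal sketch below uses only the Python standard library; `object_url` and `download` are hypothetical helpers.

```python
import shutil
import urllib.request
from urllib.parse import quote


def object_url(endpoint: str, bucket: str, key: str) -> str:
    """Path-style URL <endpoint>/<bucket>/<key>; the key is percent-encoded
    but its slashes are kept, since they are simply part of the key."""
    return f"{endpoint.rstrip('/')}/{bucket}/{quote(key)}"


def download(url: str, dest: str) -> None:
    """Stream an object to a local file (only works for anonymous-readable objects)."""
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
```

For example, `download(object_url("https://s3.embl.de", "bioim", "idr0062-tiffs/B1_C1.tif"), "/tmp/B1_C1.tif")` would mirror the `mc cp` above, assuming anonymous access is allowed. Otherwise the request fails with an HTTP 403 error and `mc` with credentials is the way to go.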

## 3. Converting your data to OME-NGFF

## 3.1 Conversion tools

https://forum.image.sc/t/converting-whole-slide-images-to-ome-tiff-a-new-workflow/32110/4

<img src="blog-2019-12-converting-whole-slide-images.jpg" style="height:300px" />



In [None]:
%%time
!bioformats2raw --help

## Required OME-Zarr options
Two of the options are currently necessary to produce OME-Zarr data:
```
      --file_type=<fileType>
                     Tile file extension: n5, zarr (default: n5) [Can break
                       compatibility with raw2ometiff]
```
and
```
     --dimension-order=<dimensionOrder>
                     Override the input file dimension order in the output file
                       [Can break compatibility with raw2ometiff] (XYZCT,
                       XYZTC, XYCTZ, XYCZT, XYTCZ, XYTZC)
```

`--file_type=zarr` produces Zarr output rather than N5 as the intermediate format. If we additionally pass the `--dimension-order` argument, the intermediate result can be used directly by the ome-zarr library.

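The choice of `--dimension-order=XYZCT` may look backwards. The sketch below spells out the reasoning, based on our reading of the two conventions rather than on bioformats2raw's source. Bio-Formats names a dimension order from the fastest-varying axis (X) to the slowest. A row-major (C-ordered) Zarr array lists its slowest axis first. So XYZCT corresponds to an array with axes (t, c, z, y, x), the 5D layout that the early OME-NGFF versions expected. `array_axes` is a hypothetical helper.

```python
def array_axes(dimension_order: str) -> str:
    """Axis order of a row-major array written with a Bio-Formats dimension order.

    Bio-Formats lists the fastest-varying axis first (X in XYZCT); a row-major
    array lists it last, so the string is simply reversed.
    """
    order = dimension_order.upper()
    if sorted(order) != sorted("XYZCT"):
        raise ValueError(f"not a 5D Bio-Formats dimension order: {dimension_order!r}")
    return order[::-1]


for order in ("XYZCT", "XYCZT"):
    print(order, "->", array_axes(order))
# XYZCT -> TCZYX
# XYCZT -> TZCYX
```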
In [None]:
%%time
!bioformats2raw i2k2020.gif $PWD --file_type=zarr --dimension-order=XYZCT --scale-format-string=gif.zarr/%d

In [None]:
!ls *.ome.xml data.zarr

## 4. Publishing your data with S3

You can then copy the generated output to S3:

In [None]:
!mc cp --recursive data.zarr/gif.zarr/ play/i2k2020/gif.zarr/

You can then view your image in vizarr: http://hms-dbmi.github.io/vizarr?source=https://play.minio.io:9000/i2k2020/gif.zarr

In [None]:
!mc cat play/i2k2020/gif.zarr/.zattrs

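The `.zattrs` file printed above is JSON. For an OME-NGFF image, its `multiscales` entry lists the resolution levels as `datasets`, each with a `path`, ordered from highest to lowest resolution. The minimal sketch below reads those paths. `EXAMPLE_ZATTRS` is a hypothetical, abbreviated document for illustration, not the actual output of the cell above, and `dataset_paths` is a hypothetical helper.

```python
import json
from typing import List

# Hypothetical, abbreviated .zattrs content; the real file may contain more fields.
EXAMPLE_ZATTRS = """
{
  "multiscales": [
    {
      "version": "0.1",
      "datasets": [{"path": "0"}, {"path": "1"}, {"path": "2"}]
    }
  ]
}
"""


def dataset_paths(zattrs_text: str) -> List[str]:
    """Return the resolution-level paths of the first multiscales entry."""
    attrs = json.loads(zattrs_text)
    multiscales = attrs.get("multiscales")
    if not multiscales:
        raise ValueError("no 'multiscales' entry: not an OME-NGFF image group")
    return [dataset["path"] for dataset in multiscales[0]["datasets"]]


print(dataset_paths(EXAMPLE_ZATTRS))  # ['0', '1', '2']
```

To apply it to the uploaded image, pass in the text printed by `mc cat`, for example `subprocess.run(["mc", "cat", "play/i2k2020/gif.zarr/.zattrs"], capture_output=True, text=True, check=True).stdout`.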
Once you have uploaded a file under the key `hello.txt` (done with `mc cp` in the "Content type" section below), visit https://play.minio.io:9000/i2k2020/hello.txt to see it. It's no longer a file: it's now in **object storage**.

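This is also why Zarr suits object storage. Each chunk of an array is stored as its own object, under a key derived from the chunk's position in the chunk grid. The minimal sketch below assumes Zarr v2's default `.` separator between chunk indices; some writers use `/` instead. The array layout, chunk size, and `gif.zarr/0` path are hypothetical, and `chunk_key` is a hypothetical helper.

```python
from typing import Sequence


def chunk_key(index: Sequence[int], chunks: Sequence[int], separator: str = ".") -> str:
    """Key of the Zarr (v2-style) chunk holding the element at `index`.

    Each chunk is one object, stored under <array path>/<chunk key>.
    """
    if len(index) != len(chunks):
        raise ValueError("index and chunk shape must have the same length")
    return separator.join(str(i // c) for i, c in zip(index, chunks))


# Hypothetical 5D (t, c, z, y, x) array chunked 1024 x 1024 in y and x.
key = chunk_key((0, 0, 0, 1500, 300), (1, 1, 1, 1024, 1024))
print("gif.zarr/0/" + key)  # gif.zarr/0/0.0.0.1.0
```

Reading one tile of a large image therefore means fetching one small object by key, rather than seeking inside one large file.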
## Content type (i.e. metadata)

We can also upload HTML:

In [None]:
!mc cp hello.html play/i2k2020/hello.html

which will render in the browser at https://play.minio.io:9000/i2k2020/hello.html. However, if the name doesn't match the contents, the browser may download the file rather than display it:

In [None]:
!mc cp hello.json play/i2k2020/hello.data

Here, https://play.minio.io:9000/i2k2020/hello.data may be downloaded rather than shown. To control this, you can set the content type explicitly when uploading, for example for a plain-text file:

In [None]:
!mc cp --attr=Content-Type=text/plain hello.txt play/i2k2020/hello.txt

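The `--attr=Content-Type=...` value set by hand above can also be derived from the file name. The minimal sketch below uses Python's standard `mimetypes` module; `content_type_for` and `mc_cp_args` are hypothetical helpers. Extensions missing from the built-in table fall back to `application/octet-stream`, the generic binary type that browsers usually download rather than display.

```python
import mimetypes
from typing import List

# A private MimeTypes instance starts from Python's built-in table rather than
# the system's mime.types files, so results do not depend on the machine.
_MIME = mimetypes.MimeTypes()


def content_type_for(filename: str) -> str:
    """Guess a Content-Type from the extension, defaulting to generic binary."""
    guessed, _encoding = _MIME.guess_type(filename)
    return guessed or "application/octet-stream"


def mc_cp_args(local_path: str, target: str) -> List[str]:
    """Build an `mc cp` command that sets Content-Type from the local file name."""
    return ["mc", "cp", f"--attr=Content-Type={content_type_for(local_path)}",
            local_path, target]


print(mc_cp_args("hello.txt", "play/i2k2020/hello.txt"))
```

Guessing from the *local* name, as here, keeps the right type even when the object is uploaded under a key with a misleading extension.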
## Naming (i.e. keys)

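Another important difference from filesystems: although `hello.txt` may look like a file in a directory, you should think of the entire string after the bucket name as a single "key". Object stores such as S3 have no rename operation, so to "rename" an object you delete it and upload it again under a new key: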

In [None]:
!mc rm play/i2k2020/hello.txt

In [None]:
!mc cp hello.txt play/i2k2020/josh.txt

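The "directories" that `mc ls` shows are derived from the keys themselves. The minimal Python sketch below emulates an S3-style listing with a prefix and a `/` delimiter over a flat list of hypothetical keys. It mimics the behaviour of a listing, not how any server implements it, and `list_with_delimiter` is a hypothetical helper.

```python
from typing import Iterable, List, Tuple


def list_with_delimiter(keys: Iterable[str], prefix: str = "",
                        delimiter: str = "/") -> Tuple[List[str], List[str]]:
    """Emulate an S3-style listing over a flat set of keys.

    Keys under `prefix` with no further `delimiter` are returned as objects;
    the rest are collapsed into "common prefixes", i.e. apparent directories.
    """
    objects: List[str] = []
    prefixes = set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        head, sep, _ = key[len(prefix):].partition(delimiter)
        if sep:
            prefixes.add(prefix + head + delimiter)
        else:
            objects.append(key)
    return sorted(objects), sorted(prefixes)


# Hypothetical keys in the i2k2020 bucket.
bucket_keys = ["josh.txt", "gif.zarr/.zattrs", "gif.zarr/0/.zarray", "gif.zarr/0/0.0.0.0.0"]
print(list_with_delimiter(bucket_keys))               # (['josh.txt'], ['gif.zarr/'])
print(list_with_delimiter(bucket_keys, "gif.zarr/"))  # (['gif.zarr/.zattrs'], ['gif.zarr/0/'])
```

One consequence is that once the last key under `gif.zarr/` is deleted, the "directory" typically disappears too, because nothing stores it separately.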
![object storage comparison](https://mk0openioo80ctbhsnje.kinstacdn.com/wp-content/uploads/2019/09/oio-block-files-object-storage-compared.png)

*from https://www.openio.io/blog/block-file-object-storage-evolution-computer-storage-systems*

The two basic commands are `bioformats2raw` and `raw2ometiff`. Together they provide a pipeline for scalably converting large images into OME-TIFF. The primary caveat is that the conversion requires roughly **twice** the storage, because the intermediate Zarr or N5 output is written to disk before the final OME-TIFF is produced.

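As a rough planning aid, the arithmetic below reads "twice the storage" as free space of about twice the input size. That reading is an assumption: compression and the extra pyramid levels change the real numbers, so treat the result as an order-of-magnitude guide. The sizes are those of the idr0062 TIFFs visible in the truncated `mc ls` listing earlier in this notebook, and `conversion_budget_mib` is a hypothetical helper.

```python
from typing import Iterable

# Sizes (MiB) of the idr0062 TIFFs visible in the truncated listing.
VISIBLE_TIFF_SIZES_MIB = [68, 67, 66, 66, 120, 120, 42, 42, 84, 83, 49, 49]


def conversion_budget_mib(input_sizes_mib: Iterable[float], factor: float = 2.0) -> float:
    """Rough free-space estimate for a bioformats2raw -> raw2ometiff run."""
    return factor * sum(input_sizes_mib)


print(sum(VISIBLE_TIFF_SIZES_MIB))                    # 856
print(conversion_budget_mib(VISIBLE_TIFF_SIZES_MIB))  # 1712.0
```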
### License
Copyright (C) 2019-2020 University of Dundee. All Rights Reserved.
This program is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the
Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
more details. You should have received a copy of the GNU General
Public License along with this program; if not, write to the
Free Software Foundation,
Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.

# ITEMS TO PROCESS:

* https://en.wikipedia.org/wiki/Comparison_of_web_browsers#Image_format_support

In [None]:
!pip install omero-cli-zarr

In [None]:
!omero zarr export Image:6001240