# Create MoBIE HTM Project

Create a MoBIE project for high-throughput microscopy (HTM) data. The test data for this example is available here: https://owncloud.gwdg.de/index.php/s/eu8JMlUFZ82ccHT. It contains 3 wells of a plate from an immunofluorescence-based SARS-CoV-2 antibody assay described in https://onlinelibrary.wiley.com/doi/full/10.1002/bies.202000257.

In [None]:
# general imports
import os
import string
from glob import glob

import mobie
import mobie.htm as htm
import pandas as pd

# the location of the data
# adapt these paths to your system and the input data you are using

# location of the input data. 
# the example data used in this notebook is available via this link:
# https://oc.embl.de/index.php/s/IV1709ZlcUB1k99
example_input_folder = "/home/pape/Work/data/mobie/htm-test-data"

# the location of the mobie project that will be created
# we recommend that the mobie project folders have the structure <PROJECT_ROOT_FOLDER>/data
# the folder 'data' will contain the sub-folders for individual datasets
mobie_project_folder = "/home/pape/Work/data/mobie/mobie_htm_project/data"

# name of the dataset that will be created.
# one project can contain multiple datasets
dataset_name = "example-dataset"
dataset_folder = os.path.join(mobie_project_folder, dataset_name)

# the platform and number of jobs used for computation.
# choose 'local' to run computations on your machine.
# for large data, it is also possible to run computation on a cluster;
# for this purpose 'slurm' (for slurm cluster) and 'lsf' (for lsf cluster) are currently supported
target = "local"
max_jobs = 4
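Before starting the conversion, it can be worth sanity-checking these settings. The snippet below is a minimal sketch: the helper `check_settings` is hypothetical and not part of `mobie`, and it only encodes the conventions stated in the comments above (the `local`, `slurm` and `lsf` targets, and the recommended `<PROJECT_ROOT_FOLDER>/data` layout).

```python
import os
import warnings

# computation targets named in the comments above (an assumption of this sketch)
SUPPORTED_TARGETS = {"local", "slurm", "lsf"}


def check_settings(mobie_project_folder, dataset_name, target, max_jobs):
    """Hypothetical helper: sanity-check the settings and return the dataset folder."""
    if target not in SUPPORTED_TARGETS:
        raise ValueError(f"unsupported target {target!r}, expected one of {sorted(SUPPORTED_TARGETS)}")
    if not isinstance(max_jobs, int) or max_jobs < 1:
        raise ValueError(f"max_jobs must be a positive integer, got {max_jobs!r}")
    # recommended (not required) layout: <PROJECT_ROOT_FOLDER>/data/<dataset_name>
    if os.path.basename(os.path.normpath(mobie_project_folder)) != "data":
        warnings.warn("the project folder does not follow the recommended '<PROJECT_ROOT_FOLDER>/data' layout")
    return os.path.join(mobie_project_folder, dataset_name)


print(check_settings("/tmp/mobie_htm_project/data", "example-dataset", "local", 4))
```

In the notebook you would pass `mobie_project_folder`, `dataset_name`, `target` and `max_jobs`; the returned path should then equal `dataset_folder`.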

## Adding image data

First, we add all the image data for the 3 wells. Here, we have 3 channels:
- `serum`: showing the measured immunofluorescence of the human serum
- `marker`: showing a marker channel for viral RNA
- `nuclei`: showing the nuclei stained with DAPI

The function `htm.add_images` adds a source to the dataset metadata for each of the `input_files` that are passed.
It **will not** add corresponding views to show the individual images. Instead, we add a grid view below that recreates the plate layout and in which all image (and segmentation) sources can be toggled on and off.
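The image source names are derived from the input file names: each source is called `<channel>-<file name without extension>`, as in the loop in the next cell. A minimal sketch of this naming in isolation; the file paths are hypothetical, assuming one file per site named `<well>_<position>.h5`.

```python
import os


def image_source_names(input_files, channel_name):
    """Build source names '<channel>-<file stem>' for a list of input files."""
    stems = [os.path.splitext(os.path.basename(path))[0] for path in input_files]
    return [f"{channel_name}-{stem}" for stem in stems]


# hypothetical input files, one per site; sorting keeps the order deterministic
files = sorted(["/data/htm-test-data/C01_2.h5", "/data/htm-test-data/C01_1.h5"])
print(image_source_names(files, "serum"))  # ['serum-C01_1', 'serum-C01_2']
```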

In [None]:
# the individual images are stored as HDF5 (.h5) files in the folder with the example data.
# each HDF5 file contains multiple datasets, each corresponding to a different image channel (or segmentation)
input_files = glob(os.path.join(example_input_folder, "*.h5"))
input_files.sort()

# the resolution in micrometers (per axis) for this data, as well as the downscaling factors and chunks used in the data conversion
resolution = [0.65, 0.65]
scale_factors = 4 * [[2, 2]]
chunks = [512, 512]

# the 3 image channels (each stored as a dataset in the HDF5 file of the corresponding site)
channels = ["serum", "marker", "nuclei"]
for channel_name in channels:
    # image_names determines the names for the corresponding image sources in MoBIE
    image_names = [os.path.splitext(os.path.basename(im))[0] for im in input_files]
    image_names = [f"{channel_name}-{name}" for name in image_names]

    htm.add_images(input_files, mobie_project_folder, dataset_name,
                   image_names, resolution, scale_factors, chunks, key=channel_name,
                   target=target, max_jobs=max_jobs, file_format="ome.zarr")
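`scale_factors = 4 * [[2, 2]]` requests four downscaled levels in addition to the full-resolution data. Assuming each factor is relative to the previous level (check the `mobie` documentation for the exact convention), the resolution and approximate shape of every level can be computed as below; the 2048 x 2048 image shape is a made-up example, and rounding the shape up is an assumption.

```python
import math


def pyramid_levels(shape, resolution, scale_factors):
    """List (shape, resolution) per level, assuming each factor is relative to the previous level."""
    levels = [(list(shape), list(resolution))]
    for factor in scale_factors:
        prev_shape, prev_res = levels[-1]
        # rounding up is an assumption; implementations may round differently
        new_shape = [math.ceil(s / f) for s, f in zip(prev_shape, factor)]
        new_res = [r * f for r, f in zip(prev_res, factor)]
        levels.append((new_shape, new_res))
    return levels


# hypothetical 2048 x 2048 image with the settings used above
for level, (shape, res) in enumerate(pyramid_levels([2048, 2048], [0.65, 0.65], 4 * [[2, 2]])):
    print(level, shape, [round(r, 2) for r in res])
```

With `chunks = [512, 512]`, the coarser levels of such an image would fit into a single chunk.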

## Add segmentation data

Next, we add the segmentation data. Here, we have 2 segmentations per site:
- `cells`: the segmentation of individual cells
- `nuclei`: the segmentation of individual nuclei

`htm.add_segmentations` works very similarly to `htm.add_images`.

In [None]:
segmentation_names = ["cells", "nuclei"]
for seg_name in segmentation_names:
    image_names = [os.path.splitext(os.path.basename(im))[0] for im in input_files]
    image_names = [f"segmentation-{seg_name}-{name}" for name in image_names]
    
    htm.add_segmentations(input_files, mobie_project_folder, dataset_name,
                          image_names, resolution, scale_factors, chunks, key=f"segmentation/{seg_name}",
                          target=target, max_jobs=max_jobs, file_format="ome.zarr")
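The segmentation sources follow the same naming pattern, with the prefix `segmentation-<name>`. Because the site name is recovered below by stripping the prefix and the `-` separator from each source name, all five source names of a site should end in the same site name. A small check of this for a hypothetical site `C01_1`:

```python
# the five source prefixes used in this notebook
prefixes = ["nuclei", "serum", "marker", "segmentation-cells", "segmentation-nuclei"]
site_name = "C01_1"  # hypothetical site name (= file name without extension)

# source names as built in the image and segmentation loops: f"{prefix}-{site_name}"
source_names = [f"{prefix}-{site_name}" for prefix in prefixes]

for prefix, source_name in zip(prefixes, source_names):
    # strip the prefix plus the '-' separator
    recovered = source_name[len(prefix) + 1:]
    assert recovered == site_name, (source_name, recovered)
    print(source_name, "->", recovered)
```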

## Add views to create plate layout

Finally, we create the view with the plate layout and data, using MoBIE `grid` transformations and `regionDisplays`.
In addition to the layout, we can also add tables associated with wells or with individual sites (= image positions). Here, we use the example table for our test data from: https://owncloud.gwdg.de/index.php/s/m1ILROJc7Chnu9h

In [None]:
# first, we need to define functions that translate source names to site names and site names to well names,
# and that return the 2d grid position for a given well


# extract the site name (= well name and position in the well) from the source name.
# here, the site name follows the source prefix and a '-' separator, i.e.
# source_name = f"{prefix}-{site_name}"
def to_site_name(source_name, prefix):
    return source_name[(len(prefix) + 1):]


# extract the well name from the site name.
# here, the site name consists of the well name and the position in the well, i.e.
# site_name = f"{well_name}_{position_in_well}"
def to_well_name(site_name):
    return site_name.split("_")[0]


# map the well name to its position in the 2d grid, returned as [column, row]
# here, the wells are called C01, C02, etc. (row letter followed by column number)
def to_position(well_name):
    r, c = well_name[0], well_name[1:]
    r = string.ascii_uppercase.index(r)
    c = int(c) - 1
    return [c, r]
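A quick check of the three functions on a hypothetical source name. Note that `to_position` returns `[column, row]`, i.e. x before y, so well `C01` (third row, first column) ends up at `[0, 2]`. The functions are repeated here with the same behavior so that the snippet runs on its own.

```python
import string


def to_site_name(source_name, prefix):
    return source_name[(len(prefix) + 1):]


def to_well_name(site_name):
    return site_name.split("_")[0]


def to_position(well_name):
    # column number -> x index, row letter -> y index
    r, c = well_name[0], well_name[1:]
    return [int(c) - 1, string.ascii_uppercase.index(r)]


site = to_site_name("segmentation-cells-C01_1", "segmentation-cells")
well = to_well_name(site)
print(site, well, to_position(well))  # C01_1 C01 [0, 2]
print(to_position("A01"), to_position("H12"))  # [0, 0] [11, 7]
```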

In [None]:
# all our source prefixes (= image channel / segmentation names)
# and the corresponding source types
source_prefixes = ["nuclei", "serum", "marker", "segmentation-cells", "segmentation-nuclei"]
source_types = ["image", "image", "image", "segmentation", "segmentation"]
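The two lists are matched by position, so they must have the same length and order. Note that `nuclei` is both an image prefix and part of `segmentation-nuclei`; matching on `prefix + "-"` at the start of a name, and preferring the longest match, keeps the two apart. `group_by_prefix` is a hypothetical helper illustrating this; it is not part of `mobie.htm`.

```python
def group_by_prefix(source_names, source_prefixes, source_types):
    """Hypothetical helper: map each prefix to (source type, matching source names)."""
    assert len(source_prefixes) == len(source_types), "prefixes and types must be paired"
    groups = {prefix: (stype, []) for prefix, stype in zip(source_prefixes, source_types)}
    for name in source_names:
        matches = [p for p in source_prefixes if name.startswith(p + "-")]
        if not matches:
            raise ValueError(f"no prefix matches source {name!r}")
        # the longest matching prefix wins
        groups[max(matches, key=len)][1].append(name)
    return groups


prefixes = ["nuclei", "serum", "marker", "segmentation-cells", "segmentation-nuclei"]
types = ["image", "image", "image", "segmentation", "segmentation"]
names = ["nuclei-C01_1", "segmentation-nuclei-C01_1", "serum-C01_1"]
print(group_by_prefix(names, prefixes, types))
```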

In [None]:
# compute the contrast limits for the image channels
# (this is not strictly necessary, but usually very beneficial for htm data to obtain a reasonable visualization of the data)
clims_nuclei = htm.compute_contrast_limits("nuclei", dataset_folder, lower_percentile=4, upper_percentile=96, n_threads=max_jobs)
clims_serum = htm.compute_contrast_limits("serum", dataset_folder, lower_percentile=4, upper_percentile=96, n_threads=max_jobs)
clims_marker = htm.compute_contrast_limits("marker", dataset_folder, lower_percentile=4, upper_percentile=96, n_threads=max_jobs)

# specify the settings for all the sources (in the same order as source_prefixes)
source_settings = [
    # nucleus channel: color blue
    {"color": "blue", "contrastLimits": clims_nuclei, "visible": True},
    # serum channel: color green
    {"color": "green", "contrastLimits": clims_serum, "visible": False},
    # marker channel: color red
    {"color": "red", "contrastLimits": clims_marker, "visible": False},
    # the settings for the 2 segmentations
    {"lut": "glasbey", "visible": False, "showTable": False},
    {"lut": "glasbey", "visible": False, "showTable": False},
]  
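The contrast limits above are based on intensity percentiles (here the 4th and 96th), so a few very dark or very bright pixels do not stretch the display range. The sketch below shows the idea for a plain list of intensities using linear interpolation between ranks; it is only an illustration, and `htm.compute_contrast_limits` may compute the percentiles differently (for example per image or on a downscaled level).

```python
def percentile(values, q):
    """Percentile q (0-100) with linear interpolation between sorted values."""
    data = sorted(values)
    if not data:
        raise ValueError("percentile of empty data")
    pos = (len(data) - 1) * q / 100
    lower = int(pos)
    upper = min(lower + 1, len(data) - 1)
    frac = pos - lower
    return data[lower] + (data[upper] - data[lower]) * frac


def contrast_limits(values, lower_percentile=4, upper_percentile=96):
    return [percentile(values, lower_percentile), percentile(values, upper_percentile)]


# hypothetical intensities: values 0..100 plus two bright outliers
intensities = list(range(101)) + [5000, 6000]
print(contrast_limits(intensities))
```

The outliers at 5000 and 6000 barely move the upper limit, which stays below 100.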

In [None]:
# create table for the sites (individual images)

site_table_path = os.path.join(example_input_folder, "site-table.tsv")
site_table = pd.read_csv(site_table_path, sep="\t")

# we need to rename the site name from its representation in the table (C01-0001) to our representation (C01_1)
def rename_site(site_name):
    well, image_id = site_name.split("-")
    image_id = int(image_id)
    return f"{well}_{image_id}"

site_table["sites"] = site_table["sites"].apply(rename_site)

# the first column in tables for a MoBIE region display (which is used internally by the grid view)
# has to be called "region_id"
site_table = site_table.rename(columns={"sites": "region_id"})
print(site_table)
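The same table preparation can also be done without pandas. A minimal sketch with the standard `csv` module, assuming (as in the cell above) a tab-separated table with a `sites` column holding values like `C01-0001`; the `n_cells` column and all values are made up for the example.

```python
import csv
import io


def rename_site(site_name):
    # 'C01-0001' -> 'C01_1'
    well, image_id = site_name.split("-")
    return f"{well}_{int(image_id)}"


def prepare_site_table(tsv_text):
    """Read a tab-separated site table and replace 'sites' by a leading 'region_id' column."""
    rows = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    prepared = []
    for row in rows:
        region_id = rename_site(row.pop("sites"))
        prepared.append({"region_id": region_id, **row})
    return prepared


# hypothetical table content; 'n_cells' is a made-up column
example = "sites\tn_cells\nC01-0001\t120\nC01-0002\t98\n"
for row in prepare_site_table(example):
    print(row)
```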

In [None]:
# we can also create a table for the wells; the procedure here is similar to the site table
well_table_path = os.path.join(example_input_folder, "well-table.tsv")
well_table = pd.read_csv(well_table_path, sep="\t")
well_table = well_table.rename(columns={"wells": "region_id"})
print(well_table)
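Since sites are linked to wells through `to_well_name`, it can be useful to check that every well referenced by the site table has a row in the well table, and vice versa. A hypothetical check on plain lists of `region_id` values (with the tables above these would be `site_table["region_id"]` and `well_table["region_id"]`); the ids used here are made up.

```python
def check_tables_consistent(site_ids, well_ids, to_well_name):
    """Hypothetical check: return (wells missing from the well table, wells without sites)."""
    wells_from_sites = {to_well_name(site) for site in site_ids}
    well_set = set(well_ids)
    missing_in_well_table = sorted(wells_from_sites - well_set)
    wells_without_sites = sorted(well_set - wells_from_sites)
    return missing_in_well_table, wells_without_sites


site_ids = ["C01_1", "C01_2", "C02_1", "C03_1"]
well_ids = ["C01", "C02", "C04"]
print(check_tables_consistent(site_ids, well_ids, lambda site: site.split("_")[0]))
# (['C03'], ['C04'])
```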

In [None]:
# create the plate grid view
dataset_folder = os.path.join(mobie_project_folder, dataset_name)
htm.add_plate_grid_view(dataset_folder, view_name="default",
                        source_prefixes=source_prefixes, source_types=source_types, source_settings=source_settings,
                        source_name_to_site_name=to_site_name, site_name_to_well_name=to_well_name,
                        well_to_position=to_position, site_table=site_table, well_table=well_table,
                        sites_visible=False, menu_name="bookmark")
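Conceptually, the plate view needs to know which sites belong to which well and where each well sits on the plate; this is the information provided by `source_name_to_site_name`, `site_name_to_well_name` and `well_to_position`. The sketch below reproduces that bookkeeping for a few hypothetical source names of one prefix. It only illustrates the mapping and is not how `htm.add_plate_grid_view` is implemented internally.

```python
import string
from collections import defaultdict


def to_well_name(site_name):
    return site_name.split("_")[0]


def to_position(well_name):
    r, c = well_name[0], well_name[1:]
    return [int(c) - 1, string.ascii_uppercase.index(r)]


def plate_layout(source_names, prefix):
    """Group the sites of one source prefix by well and attach each well's [column, row] position."""
    sites_per_well = defaultdict(list)
    for name in source_names:
        site = name[len(prefix) + 1:]
        sites_per_well[to_well_name(site)].append(site)
    return {well: (to_position(well), sorted(sites)) for well, sites in sites_per_well.items()}


names = ["nuclei-C01_1", "nuclei-C01_2", "nuclei-C02_1", "nuclei-D05_1"]
for well, (position, sites) in plate_layout(names, "nuclei").items():
    print(well, position, sites)
```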

## Validation and remote metadata

In [None]:
# validate that the project was created correctly
mobie.validation.validate_project(mobie_project_folder)

In [None]:
# create the metadata for accessing the project remotely

# needs to be adapted to your S3 storage setup
bucket_name = "i2k-2020/mobie_htm_project/data"
service_endpoint = "https://s3.embl.de"

mobie.metadata.add_remote_project_metadata(
    mobie_project_folder, bucket_name, service_endpoint
)

Now you can upload the project to S3 for remote access and to share it with collaborators.
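S3 bucket names cannot contain `/`, so a value like `i2k-2020/mobie_htm_project/data` can be read as the bucket `i2k-2020` followed by an object key prefix. Assuming the local project folder is uploaded so that its layout below that prefix mirrors the local one, the hypothetical helper `s3_upload_plan` lists where each local file would end up; the actual upload would be done with a tool of your choice, such as the AWS CLI or the MinIO client. The demo runs on a throwaway folder with made-up files.

```python
import os
import tempfile


def s3_upload_plan(local_project_folder, bucket_name):
    """Hypothetical helper: map local files to (local path, bucket, key), mirroring the folder layout."""
    bucket, _, prefix = bucket_name.partition("/")
    plan = []
    for root, _dirs, files in os.walk(local_project_folder):
        for fname in sorted(files):
            local_path = os.path.join(root, fname)
            rel = os.path.relpath(local_path, local_project_folder).replace(os.sep, "/")
            key = f"{prefix}/{rel}" if prefix else rel
            plan.append((local_path, bucket, key))
    return plan


# demo on a temporary folder; in the notebook, pass mobie_project_folder instead
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "example-dataset"))
    open(os.path.join(tmp, "project.json"), "w").close()
    open(os.path.join(tmp, "example-dataset", "dataset.json"), "w").close()
    demo_plan = s3_upload_plan(tmp, "i2k-2020/mobie_htm_project/data")
    for local_path, bucket, key in demo_plan:
        print(f"{local_path} -> s3://{bucket}/{key}")
```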