# Create a tensorstore and view thumbnails

In this colab, we will use image_grid to lay out the example images into a tensorstore and then use this to draw patch thumbnails from the tensorstore.

The image_grid module is built on top of tensorstore as a straightforward way to lay out the grid of images in the tensorstore. In our case, we have a set of microscope images from 384-well plates. We want our 2D "map" to lay out the images so that, if you zoom all the way out, you see a grid of plates, and as you zoom in, each well and site is where you'd expect. In this analogy, you can think of the different stains (colors) as layers in the map.

The image_grid module is responsible for laying out the hierarchy of the axes, while tensorstore is the data storage library itself. In other words, image_grid is a wrapper on top of tensorstore that makes this gridding easier, particularly with plates/wells/sites in mind.
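To make the nesting concrete, here is a minimal sketch of how a plate/well/site hierarchy can be flattened into a single 2D pixel position. It is not image_grid's implementation: the per-site image size (`IMAGE_H`, `IMAGE_W`), the 2 x 2 site layout, and the helper `global_origin` are hypothetical; only the 16 x 24 well layout follows from using 384-well plates.

```python
# Hypothetical sketch: flatten a plate/well/site hierarchy into 2D pixel offsets.
# Image size and site layout are illustrative assumptions, not image_grid values.
IMAGE_H, IMAGE_W = 1000, 1000    # pixels per site image (assumed)
SITE_ROWS, SITE_COLS = 2, 2      # sites per well (assumed)
WELL_ROWS, WELL_COLS = 16, 24    # a 384-well plate has 16 rows (A-P) x 24 columns


def global_origin(plate_row, plate_col, well_row, well_col, site_row, site_col):
  """Returns the (y, x) pixel of the top-left corner of one site image.

  Each level is "wrapped" inside the one above it: a site is placed within
  its well, a well within its plate, and a plate within the full map.
  All indices are zero-based.
  """
  well_h, well_w = SITE_ROWS * IMAGE_H, SITE_COLS * IMAGE_W
  plate_h, plate_w = WELL_ROWS * well_h, WELL_COLS * well_w
  y = plate_row * plate_h + well_row * well_h + site_row * IMAGE_H
  x = plate_col * plate_w + well_col * well_w + site_col * IMAGE_W
  return y, x


# Second plate column, well B03 (row 1, col 2), bottom-right site.
print(global_origin(0, 1, 1, 2, 1, 1))  # (3000, 53000)
```

The outermost axis comes first in each coordinate sum, which mirrors the order of the `x_axis_wrap` and `y_axis_wrap` lists passed to the pipeline later in this notebook.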

In [None]:
!git clone https://github.com/google/cell_img
!pip install --quiet -e cell_img

In [None]:
#@title Restart your colab kernel after the installs above, then run this

import cell_img
from cell_img.common import io_lib
from cell_img.image_grid import ts_write_main_lib
from cell_img.image_grid import ts_write_main
from cell_img.malaria_liver import metadata_lib

import fsspec
import os
import pandas as pd

In [None]:
# Will not be needed for the public Google Research Cloud Bucket
from google.colab import auth
auth.authenticate_user()

In [None]:
DATA_ROOT = 'gs://path/to/your/data'

In [None]:
# Cloud settings used by the Dataflow job below; replace with your own values.
GCS_PROJECT = 'your_project'
GCS_BUCKET = 'your_bucket'
GCS_REGION = 'your_region'

# Inputs

In [None]:
# Path for where to create the tensorstore instance
TENSORSTORE_PATH = os.path.join(DATA_ROOT, 'tensorstore')

# Path for where to find the CSVs with the information needed to create the
# tensorstore: image paths and where, within the image_grid, to place each image.
METADATA_ROOT = os.path.join(DATA_ROOT, 'tensorstore/metadata')
TS_CSV_FOLDER = 'tensorstore'

In [None]:
# Get the list of CSVs with images to put into tensorstore
image_csv_list = ['gs://' + x.path for x in fsspec.open_files(
    os.path.join(METADATA_ROOT, TS_CSV_FOLDER, '*.csv'))]

In [None]:
# Validation 1: Which files are we using?
print('The following files will be used:\n   %s' % '\n   '.join(image_csv_list))

In [None]:
# Validation 2: Look at one CSV to see the columns
df = pd.read_csv(image_csv_list[0], dtype=str)

df.sample(5)

# Each row is one image to be added to the image_grid.
# The image_grid can be thought of as a large map. We set it up to put each
# plate/well/site into a large 2D grid, then use channel as the third dimension,
# like layers in a map.

# plate_uid, well, site and channel help users understand the image location.
# channel is the stain (color) of the image, used as the third dimension
#   in the image_grid.
# image_path indicates where the file containing the image is found.
# plate_row, plate_col, well_row, well_col, site_row and site_col together
#   identify where in the large map this particular image should be placed.
#   This is done by using them as the x_axis_wrap and y_axis_wrap arguments of
#   ts_write_main_lib.run_write_to_ts_pipeline below.

In [None]:
# Validation 3: Check that the image dimensions match across CSVs
# (the first image listed in each CSV is spot-checked).
first_csv_shape = None

for ic in image_csv_list:
  image_df = io_lib.read_csv(ic, dtype=str)
  img_path = image_df.image_path.iloc[0]
  img_arr = io_lib.read_image(img_path)
  print(img_arr.shape, img_path)
  if first_csv_shape is None:
    first_csv_shape = img_arr.shape
  elif img_arr.shape != first_csv_shape:
    # Three placeholders, so three values: expected shape, CSV, actual shape.
    raise ValueError(
        'The first csv had shape %s,\n but a test image from %s\n has shape %s instead.' % (
            first_csv_shape, ic, img_arr.shape))

print('Image shape test successful!')

# Run a Cloud Dataflow job to create the tensorstore

In [None]:
options = ts_write_main.get_pipeline_options(GCS_PROJECT, GCS_BUCKET, GCS_REGION)

pipeline_result = ts_write_main_lib.run_write_to_ts_pipeline(
    tensorstore_path=TENSORSTORE_PATH,
    create_new_tensorstore=True,
    allow_expansion_of_tensorstore=False,
    image_metadata_path=image_csv_list,
    image_path_col='image_path',
    axes=['Y', 'X', 'channel'],
    x_axis_wrap=['plate_col', 'well_col', 'site_col'],
    y_axis_wrap=['plate_row', 'well_row', 'site_row'],
    pipeline_options=options)

url = ('https://console.cloud.google.com/dataflow/jobs/%s/%s?project=%s' %
       (pipeline_result._job.location, pipeline_result._job.id,
        pipeline_result._job.projectId))
print(url)

If you later want to expand this tensorstore with new images, the call would be:

```
pipeline_result = ts_write_main_lib.run_write_to_ts_pipeline(
    tensorstore_path=TENSORSTORE_PATH,
    create_new_tensorstore=False,
    allow_expansion_of_tensorstore=False,
    image_metadata_path=image_csv_list,
    image_path_col='image_path',
    axes=None,
    x_axis_wrap=False,
    y_axis_wrap=False,
    pipeline_options=options)
```

A few notes on expanding, as opposed to creating:
* "allow_expansion_of_tensorstore" controls whether the 2D "map" is allowed to grow. If you know your new images fall within the boundaries of the existing map, leave this set to False so that a typo cannot accidentally expand the map. Often, however, you will be adding new rows or columns that make the map bigger; in that case, set this value to True.
* Note that in this call the axes, x_axis_wrap, and y_axis_wrap values are not specified (they are left as None/False). When adding to an existing image_grid, the axes already defined by the creation call are used.
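For intuition, here is a minimal sketch of the check that "allow_expansion_of_tensorstore" guards: compare the axis values already in the map with those in the new metadata, and refuse to grow the map unless expansion is allowed. The helpers `find_values_to_add` and `check_expansion` are hypothetical and only modeled on the behavior described in this notebook; they are not cell_img functions.

```python
# Hypothetical sketch; these helpers are illustrative and not part of cell_img.
def find_values_to_add(existing_axis_values, new_axis_values):
  """Returns {axis: [values]} for values not yet present on each axis."""
  to_add = {}
  for axis, new_values in new_axis_values.items():
    known = set(existing_axis_values.get(axis, []))
    missing = sorted(set(new_values) - known)
    if missing:
      to_add[axis] = missing
  return to_add


def check_expansion(existing_axis_values, new_axis_values, allow_expansion):
  """Raises unless the new values fit the map or expansion is allowed."""
  to_add = find_values_to_add(existing_axis_values, new_axis_values)
  if to_add and not allow_expansion:
    raise ValueError(
        'New images need a bigger map; set allow_expansion to True. '
        'axis_to_values_to_add=%s' % to_add)
  return to_add


existing = {'plate_col': ['10', '12'], 'plate_row': ['1']}
new = {'plate_col': ['11', '12'], 'plate_row': ['1']}
print(check_expansion(existing, new, allow_expansion=True))  # {'plate_col': ['11']}
```

With `allow_expansion=False`, the same inputs raise a `ValueError`, which is the safety net that catches typos in new metadata.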

# Visualizing the image_grid

[Neuroglancer](https://github.com/google/neuroglancer) is a great way to visualize your whole grid of images, with a user interface somewhat like Google Maps, that lets you zoom and turn layers on and off. The tensorstore set up by image_grid is designed to be used by Neuroglancer.

You can also use the tensorstore to grab patch images given the location of the patch (e.g. plate, well, site, x, y). This example data has the metadata files set up to be used by the cell_img thumbnail code.
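Conceptually, drawing a thumbnail means cropping a square window around a point and mapping stains to display colors. The sketch below shows that idea with NumPy on a synthetic image; it is not metadata_lib's implementation, and `crop_patch`, `to_rgb`, the stored channel order `['w1', 'w2', 'w3']`, and the per-channel max scaling are assumptions for illustration.

```python
import numpy as np

# Hypothetical sketch, not metadata_lib's implementation: crop a square patch
# centered on (center_row, center_col) from one site image stored as
# (Y, X, channel), then reorder its channels for RGB display.


def crop_patch(site_image, center_row, center_col, patch_size):
  """Returns a (patch_size, patch_size, C) crop, zero-padded past the edges."""
  half = patch_size // 2
  padded = np.pad(site_image, ((half, half), (half, half), (0, 0)))
  # Padding shifts pixel (r, c) to (r + half, c + half), so a window that
  # starts at (center_row, center_col) in padded coordinates is centered on it.
  return padded[center_row:center_row + patch_size,
                center_col:center_col + patch_size, :]


def to_rgb(patch, channel_names, channel_to_rgb):
  """Puts the channels named in channel_to_rgb into R, G, B order.

  Each output channel is divided by its own maximum so values lie in [0, 1].
  """
  indices = [channel_names.index(name) for name in channel_to_rgb]
  rgb = patch[..., indices].astype(np.float32)
  channel_max = rgb.max(axis=(0, 1), keepdims=True)
  return rgb / np.maximum(channel_max, 1e-8)


# Usage with a synthetic 3-channel site image; the stored channel order
# ['w1', 'w2', 'w3'] is an assumption.
channel_names = ['w1', 'w2', 'w3']
rng = np.random.default_rng(0)
site = rng.integers(0, 4096, size=(100, 120, 3), dtype=np.uint16)
patch = crop_patch(site, center_row=5, center_col=60, patch_size=50)
rgb = to_rgb(patch, channel_names, ['w3', 'w2', 'w1'])
print(patch.shape, rgb.shape)  # (50, 50, 3) (50, 50, 3)
```

Zero-padding keeps every patch the same size even when a point sits near a site border, which keeps a grid of thumbnails aligned.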

In [None]:
# set up the object to create images based on metadata

# Set up the alignment between our channel names and RGB
CHANNEL_TO_RGB = ['w3', 'w2', 'w1']
meta_ts = metadata_lib.MetadataIndex(
    TENSORSTORE_PATH, CHANNEL_TO_RGB, METADATA_ROOT)

In [None]:
# Load the example patch csv for testing
# read the CSV with dtype str to preserve string formatting on plates and sites.
example_df = pd.read_csv(
    os.path.join(DATA_ROOT, 'emb_data/example_patches.csv'),
    dtype=str)
# convert the center_row and center_col to ints
example_df['center_row'] = example_df['center_row'].astype(int)
example_df['center_col'] = example_df['center_col'].astype(int)

In [None]:
# Look at the examples - what are the columns, what does the data look like?
example_df.sample(3)

In [None]:
# Look at the sample, how many parasites of each life cycle stage are available?
example_df.stage_result.value_counts()

In [None]:
# Look at the examples, how many from each plate do we have?
example_df.plate.value_counts()

In [None]:
# First, let's show some hypnozoite examples
# This function takes in a dataframe with batch/plate/well/site and columns
# for the x/y within the site (center_col and center_row in this case),
# grabs thumbnails for each row and displays them.
_ = meta_ts.contact_sheet_for_df(
    example_df=example_df.query('stage_result == "hypnozoite"').sample(6),
    patch_size=50, ncols=3, nrows=2,
    name_for_x_col='center_col', name_for_y_col='center_row')

In [None]:
# The schizonts are bigger, so let's use a larger patch size for those.
# (The blue liver nuclei in the patches below appear much smaller than the
# ones in the patches above because each patch here is a 150x150 square
# instead of 50x50.)
_ = meta_ts.contact_sheet_for_df(
    example_df=example_df.query('stage_result == "schizont"').sample(4),
    patch_size=150, ncols=2, nrows=2,
    name_for_x_col='center_col', name_for_y_col='center_row')

# Examining / Validating the image_grid

This section reads back the image_grid you created, which helps you understand its setup; these tools are also useful for debugging.

For example, an error might be:

```
ValueError: Cannot write the new images without expanding the tensorstore first.
Set flag allow_expansion_of_tensorstore to True.
axis_to_values_to_add={'plate_col': ['11']}
```

This error indicates that you are trying to expand the rectangle of the tensorstore, i.e. the 2D "map" (if you zoom all the way out, the full area of the rectangle would get bigger). In this example, the rectangle width is defined by plate_col, which is set to the last 2 digits of the plate, and the rectangle height is defined by plate_row, which is the leading digits of the plate.
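The plate_row/plate_col convention described above can be sketched with pandas string slicing. The plate IDs are made up, and your own metadata may encode plates differently. The last line also shows why this notebook reads CSVs with `dtype=str`: parsed as an integer, a plate such as '0110' loses its leading zero and gets a different row label.

```python
import pandas as pd

# Hypothetical sketch: split made-up plate IDs into plate_row (leading digits)
# and plate_col (last 2 digits), following the convention described above.
plates = pd.DataFrame({'plate': ['0110', '0111', '0212', '1203']})
plates['plate_row'] = plates['plate'].str[:-2]
plates['plate_col'] = plates['plate'].str[-2:]
print(plates)

# Why dtype=str matters: as an integer, '0110' becomes 110, so the row label
# would be '1' instead of '01'.
print(str(int('0110'))[:-2])  # 1
```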

This error message says that there is currently no column "11" in your 2D map; if you want to place this plate into the grid where it belongs, you'll need to set the expansion flag to True. Note that this new plate_col will be added to the far right of the existing map. So even if plate_col 10 and 12 already exist before this new plate is added, plate_col 11 cannot go between them: the map can only expand down and to the right, and existing data cannot be shifted.
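The append-only rule can be illustrated with a small sketch: existing labels keep their positions and new labels go to the end of the axis, regardless of sort order. `expand_axis` is a hypothetical helper written to match the behavior described here, not the image_grid code.

```python
# Hypothetical sketch of append-only axis expansion: existing labels keep their
# positions, and new labels are appended to the end (right/bottom) of the axis,
# even if their sorted order would place them in between.
def expand_axis(existing_labels, new_labels):
  """Returns the expanded label list and a {label: position} index."""
  expanded = list(existing_labels)
  for label in new_labels:
    if label not in expanded:
      expanded.append(label)
  return expanded, {label: i for i, label in enumerate(expanded)}


labels, position = expand_axis(['10', '12'], ['11'])
print(labels)          # ['10', '12', '11']
print(position['11'])  # 2
```

Because positions of existing labels never change, pixels already written to the tensorstore stay valid after an expansion.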

Inspecting existing_dataset.spec below shows the values that currently exist in the tensorstore, so you can check whether your data should expand the store.

In [None]:
from cell_img.image_grid import downsample_lib
from cell_img.image_grid import ts_metadata_lib
from cell_img.image_grid import ts_write_lib

In [None]:
# Repeating the code in _maybe_expand_tensorstore in colab
tensorstore_path_s0 = downsample_lib.join_downsample_level_to_path(TENSORSTORE_PATH, 0)
spec_without_metadata = ts_write_lib.create_spec_from_path(tensorstore_path_s0)
existing_dataset = ts_write_lib.open_existing_tensorstore(spec_without_metadata)

In [None]:
existing_dataset.spec