In [1]:
%matplotlib inline
import os
import tqdm
import tempfile
import numpy as np
import urllib.request
import os.path as osp
import matplotlib.pyplot as plt
from cytokit import io as ck_io
from cytokit import config as ck_config
from cytokit.function import data as ck_data
from cytokit.image import proc as ck_img_proc

In [10]:
# Raw image directory (will be read-only after initial download)
raw_dir  = "/home/erika/Documents/Projects/CODEX/Data"

# Output directory (will contain all processed images, csvs, and fcs files)
out_dir = osp.join(raw_dir, 'output')
os.makedirs(out_dir, exist_ok=True)
out_dir


/home/erika/Documents/Projects/CODEX/Data/output


## Configuration

All Cytokit experiment configurations are represented as YAML documents. A typical use case is to define one "template" or "base" configuration per project that contains all information likely to be shared across replicates or across varying processing parameterizations.

It would be possible to create a separate configuration file by hand for each replicate or parameterization, but a primary purpose of Cytokit is to make this kind of batch processing more manageable. To this end, the ```cytokit config editor``` command can be used to make small changes to a template configuration, as in the example below.

See [experiment_pha.yaml](https://github.com/hammerlab/cytokit/blob/ba73d8b7d9dd4b3286df8bd8afe22826bd7e44f9/pub/config/cellular-marker/experiment_pha.yaml) for more details on the base configuration being modified here.  Note that nearly all the changes made to the configuration below define a subset of the original experiment. In particular, ```index_symlinks``` provides a way to remap any of the indexes required in image paths (see the section above) to a new range.  This is used below to select 7 of 25 z-planes (from index 14 to index 20), but it could equally be used to define tile or channel subsets.
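As a rough illustration of what "remapping indexes to a new range" means, the sketch below builds a mapping from the selected original z-plane indexes to a new contiguous range. The function name `remap_indexes`, the direction of the mapping (original index to new index), and the 1-based numbering are all assumptions made for this example; this is not the documented ```index_symlinks``` format.

```python
def remap_indexes(selected):
    """Map each selected original index to a new contiguous index starting at 1.

    Hypothetical helper for illustration only; not part of Cytokit.
    """
    return {original: new for new, original in enumerate(selected, start=1)}


# Select z-planes 14 through 20 (inclusive) out of the original 25
z_map = remap_indexes(range(14, 21))
print(len(z_map), "planes selected")
print(z_map)
```

The same helper would apply to tile or channel indexes, since it only depends on which original indexes are kept.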

In [16]:
%%bash -s "$out_dir"

# The config editor requires a base configuration as well as the target output location in which the resulting config should be stored.
# The convention used (in our lab at least) is to name "variants" of an experiment with a tag like "v00" or "v01" where each represents
# a different processing configuration and both the config file as well as all associated data will be stored in a directory named by that tag.
#cytokit config editor --base-config-path=$HOME/github/cytokit/pub/config/cellular-marker/experiment_pha.yaml --output-dir=$1 \
cytokit config editor --base-config-path=$HOME/Documents/Projects/CODEX/Data/Tonsil_betaTEST_sfter2.yaml --output-dir=$1 \
set name 'Tonsil_betaTEST' \
show acquisition.num_cycles \
save_variant v00/config \
exit

Configuration values present under property "acquisition.num_cycles":
4


2020-11-03 16:47:46,617:INFO:32105:cytokit.cli.config: Configuration saved to path "/home/erika/Documents/Projects/CODEX/Data/output/v00/config/experiment.yaml"
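
The ```set``` and ```show``` commands above address configuration values by dotted property path (e.g. ```acquisition.num_cycles```). A minimal sketch of that idea on a plain nested dictionary is shown here. The helper names ```get_property``` and ```set_property``` and the name ```base-experiment``` are hypothetical, and this is not how Cytokit implements the editor; it only illustrates deriving a variant from a base configuration without modifying the base.

```python
import copy


def get_property(config, path):
    """Return the value stored under a dotted path such as 'acquisition.num_cycles'."""
    node = config
    for key in path.split("."):
        node = node[key]
    return node


def set_property(config, path, value):
    """Return a copy of config with the dotted path set to value (base is left untouched)."""
    result = copy.deepcopy(config)
    *parents, leaf = path.split(".")
    node = result
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value
    return result


# Stand-in for a base configuration loaded from YAML (values are illustrative)
base = {"name": "base-experiment", "acquisition": {"num_cycles": 4}}

variant = set_property(base, "name", "Tonsil_betaTEST")
print(get_property(variant, "acquisition.num_cycles"))
print(base["name"], "->", variant["name"])
```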


In [17]:
# Show part of the resulting config, primarily with the fields modified specifically for this example
!head -n 26 $out_dir/v00/config/experiment.yaml

acquisition:
  axial_resolution: 1500.0
  channel_names:
  - DAPI
  - blank
  - blank
  - blank
  - DAPI
  - cy3-ki67
  - cy5-CD107a
  - cy7-CD20
  - DAPI
  - cy3-cd8
  - cy5-CD45a
  - cy7-PanCK
  - DAPI
  - blank
  - blank
  - blank
  emission_wavelengths:
  - 465
  - 561
  - 673
  - 773
  lateral_resolution: 325.0
  magnification: 20
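
The 16 channel names above line up with the 4 cycles reported by ```show acquisition.num_cycles``` and the 4 emission wavelengths. The sketch below checks that relationship and groups the names by cycle. It assumes the list is ordered cycle by cycle (suggested by ```DAPI``` appearing in every fourth position) and that each cycle's channels follow the order of ```emission_wavelengths```; neither is confirmed by the output shown here.

```python
# Values copied from the configuration output above
channel_names = [
    "DAPI", "blank", "blank", "blank",
    "DAPI", "cy3-ki67", "cy5-CD107a", "cy7-CD20",
    "DAPI", "cy3-cd8", "cy5-CD45a", "cy7-PanCK",
    "DAPI", "blank", "blank", "blank",
]
emission_wavelengths = [465, 561, 673, 773]
num_cycles = 4

# Assumption: one channel per emission wavelength in every cycle
channels_per_cycle = len(emission_wavelengths)
assert len(channel_names) == num_cycles * channels_per_cycle

# Split the flat list into one group of names per cycle
by_cycle = [
    channel_names[i * channels_per_cycle:(i + 1) * channels_per_cycle]
    for i in range(num_cycles)
]

for cycle, names in enumerate(by_cycle, start=1):
    print(cycle, dict(zip(emission_wavelengths, names)))
```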


---

## Processing

Processing an experiment generally involves 3 stages, each of which needs to be run and has its own CLI:

- **processor** - This is the main pipeline responsible for all image pre-processing and segmentation
- **operator** - This application is used to build image extractions or montages for relevant subsets of the ```processor``` outputs (multiplexed imaging experiments are often far too large to visualize/analyze all at once with external applications like ImageJ)
- **analysis** - This application is used to run template notebooks or numeric data aggregations 

While it is possible to parameterize these CLI commands individually (which is useful for ad-hoc operations), a more common use case is to define what they do, and how they do it, in the configuration file and then simply execute the "run_all" command.  This keeps common operations in a read-only location, which helps with reproducibility and documentation.

For example, below is the usage for the ```cytokit processor run``` command, which takes many possible arguments:

In [18]:
!cytokit processor --config-path=$out_dir/v00/config --data-dir=$raw_dir run -- --help

NAME
    cytokit processor run --config-path=/home/erika/Documents/Projects/CODEX/Data/output/v00/config run - Run processing and cytometry pipeline

SYNOPSIS
    cytokit processor run --config-path=/home/erika/Documents/Projects/CODEX/Data/output/v00/config run OUTPUT_DIR <flags>

DESCRIPTION
    This application can execute the following operations on either raw or already processed data:
        - Drift compensation
        - Deconvolution
        - Selection of best focal planes within z-stacks
        - Cropping of tile overlap
        - Cell segmentation and quantification
        - Illumination correction
        - Spectral Unmixing

    Nothing beyond an input data directory and an output directory are required (see arguments
    below), but GPU information should be provided via the `gpus` argument to ensure that
    all present devices are utilized.  Otherwise, all arguments have reasonable defaults that
    should only need 
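
The idea of storing the arguments for each stage in the configuration, rather than passing them on the command line, can be sketched as follows. Everything here is hypothetical: the stage functions are stubs, the argument names other than ```gpus``` are invented for illustration, and the dictionary layout is not Cytokit's actual configuration schema or its "run_all" implementation.

```python
# Stub stage runners; each just reports the arguments it received
def run_processor(**args):
    return f"processor({sorted(args.items())})"


def run_operator(**args):
    return f"operator({sorted(args.items())})"


def run_analysis(**args):
    return f"analysis({sorted(args.items())})"


# Stages in the order they are described above
STAGES = {
    "processor": run_processor,
    "operator": run_operator,
    "analysis": run_analysis,
}


def run_all(config):
    """Run every stage that has an entry in the config, using the stored arguments."""
    return [STAGES[name](**config[name]) for name in STAGES if name in config]


# Hypothetical per-stage arguments kept alongside the experiment definition
config = {
    "processor": {"gpus": [0]},
    "operator": {"extract_name": "best_z"},
    "analysis": {"aggregate": True},
}

for line in run_all(config):
    print(line)
```

Because the arguments live in the configuration, rerunning the same variant reproduces the same invocations without retyping any flags.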

In [19]:
# Show configuration fields with processor arguments
!grep -A 6 processor $out_dir/v00/config/experiment.yaml