hubmapconsortium/codex-pipeline

codex-pipeline

A CWL pipeline for processing CODEX image data, using Cytokit.

Pipeline steps

  • Collect required parameters from metadata files.
  • Perform illumination correction with Fiji plugin BaSiC
  • Find sharpest z-plane for each channel, using variation of Laplacian
  • Perform stitching of tiles using Fiji plugin BigStitcher
  • Create Cytokit YAML config file containing parameters from input metadata
  • Run Cytokit's processor command to perform tile pre-processing, and nucleus and cell segmentation.
  • Run Cytokit's operator command to extract all antigen fluorescence images (discarding blanks and empty channels).
  • Generate OME-TIFF versions of TIFFs created by Cytokit.
  • Stitch tiles with segmentation masks
  • Perform downstream analysis using SPRM.
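The sharpest z-plane step can be illustrated with a minimal sketch. It assumes the focus measure is the variance of a 4-neighbour discrete Laplacian, which is a common choice; the pipeline's actual implementation may differ. The function names `laplacian_variance` and `sharpest_plane_index` are hypothetical and are not part of this repository.

```python
import numpy as np


def laplacian_variance(plane: np.ndarray) -> float:
    """Focus score: variance of a 4-neighbour Laplacian over interior pixels."""
    img = plane.astype(np.float64)
    # Sum of the four neighbours minus four times the centre pixel.
    lap = (
        img[:-2, 1:-1] + img[2:, 1:-1]
        + img[1:-1, :-2] + img[1:-1, 2:]
        - 4.0 * img[1:-1, 1:-1]
    )
    return float(lap.var())


def sharpest_plane_index(stack: np.ndarray) -> int:
    """Return the 0-based index of the best-focused plane in a (z, y, x) stack."""
    scores = [laplacian_variance(plane) for plane in stack]
    return int(np.argmax(scores))
```

A blurred or featureless plane has a nearly constant Laplacian and therefore a low score, while a plane with sharp edges scores high. Note that the returned index is 0-based, whereas z-plane ids in the input file names start from 1.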

Requirements

Please use the HuBMAP Consortium fork of cwltool to run the pipeline with GPU support in Docker and Singularity containers.
For the list of Python packages, see environment.yml.

How to run

cwltool pipeline.cwl subm.yaml

If you use Singularity containers, add the --singularity flag. An example submission file, subm.yaml, is provided in the repo.

Expected input directory and file structure

codex_dataset/
└── src_data OR raw
    ├── channelnames.txt
    ├── channelnames_report.csv
    ├── experiment.json
    ├── exposure_times.txt
    ├── segmentation.json
    ├── Cyc1_reg1 OR Cyc001_reg001  
    │     ├── 1_00001_Z001_CH1.tif
    │     ├── 1_00001_Z001_CH2.tif
    │     │              ...
    │     └── 1_0000N_Z00N_CHN.tif
    └── Cyc1_reg2 OR Cyc001_reg002  
          ├── 2_00001_Z001_CH1.tif
          ├── 2_00001_Z001_CH2.tif
          │             ...
          └── 2_0000N_Z00N_CHN.tif

Images must be separated into directories by cycle and region, with directory names following the pattern Cyc{cycle:d}_reg{region:d}. File names must contain the region, tile, z-plane, and channel ids, each starting from 1, and follow the pattern {region:d}_{tile:05d}_Z{zplane:03d}_CH{channel:d}.tif.
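The naming patterns above can be checked with a short script before submitting a dataset. This is only a sketch: the regular expressions are one reading of the documented patterns, and the names `TileId`, `parse_dir_name`, and `parse_tile_name` are hypothetical helpers, not part of the pipeline.

```python
import re
from typing import NamedTuple, Tuple

# Cyc{cycle:d}_reg{region:d}, e.g. Cyc1_reg1 or Cyc001_reg001
DIR_PATTERN = re.compile(r"^Cyc(?P<cycle>\d+)_reg(?P<region>\d+)$")
# {region:d}_{tile:05d}_Z{zplane:03d}_CH{channel:d}.tif
FILE_PATTERN = re.compile(
    r"^(?P<region>\d+)_(?P<tile>\d{5,})_Z(?P<zplane>\d{3,})_CH(?P<channel>\d+)\.tif$"
)


class TileId(NamedTuple):
    region: int
    tile: int
    zplane: int
    channel: int


def parse_dir_name(name: str) -> Tuple[int, int]:
    """Return (cycle, region) parsed from a cycle/region directory name."""
    match = DIR_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unexpected directory name: {name!r}")
    return int(match["cycle"]), int(match["region"])


def parse_tile_name(name: str) -> TileId:
    """Return the ids encoded in an image file name; all ids must be >= 1."""
    match = FILE_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name!r}")
    ids = TileId(*(int(match[field]) for field in TileId._fields))
    if min(ids) < 1:
        raise ValueError(f"ids must start from 1: {name!r}")
    return ids
```

For example, `parse_tile_name("1_00001_Z001_CH2.tif")` yields region 1, tile 1, z-plane 1, channel 2, and `parse_dir_name("Cyc001_reg002")` yields cycle 1, region 2.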

Necessary metadata files that must be present in the input directory:

  • experiment.json - acquisition parameters and data structure;
  • segmentation.json - which channel from which cycle to use for segmentation;
  • channelnames.txt - list of channel names, one per row;
  • channelnames_report.csv - which channels to use, and which to exclude;
  • exposure_times.txt - not used at the moment, but will be useful for background subtraction.

Examples of these files are present in the directory metadata_examples. Note: all fields related to regions, cycles, channels, z-planes and tiles start from 1, and xyResolution and zPitch are measured in nm.
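A quick pre-flight check for the required metadata files might look like the sketch below. It assumes the layout shown above, with the data in either src_data or raw under the dataset directory. The helper names `find_data_dir` and `missing_metadata` are hypothetical, and the check only tests that the files exist, not that their contents are valid.

```python
from pathlib import Path
from typing import List

REQUIRED_METADATA = (
    "experiment.json",
    "segmentation.json",
    "channelnames.txt",
    "channelnames_report.csv",
    "exposure_times.txt",
)


def find_data_dir(dataset_dir: Path) -> Path:
    """Return the src_data or raw subdirectory, whichever exists first."""
    for name in ("src_data", "raw"):
        candidate = dataset_dir / name
        if candidate.is_dir():
            return candidate
    raise FileNotFoundError(f"no src_data or raw directory in {dataset_dir}")


def missing_metadata(data_dir: Path) -> List[str]:
    """List the required metadata files that are absent from data_dir."""
    return [name for name in REQUIRED_METADATA if not (data_dir / name).is_file()]
```

An empty list from `missing_metadata(find_data_dir(Path("codex_dataset")))` means all five files are in place.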

Output file structure

pipeline_output/
├── expr
│   ├── reg001_expr.ome.tiff
│   └── reg002_expr.ome.tiff
└── mask
    ├── reg001_mask.ome.tiff
    └── reg002_mask.ome.tiff

The expr directory contains processed images, and the mask directory contains segmentation masks. The output of SPRM is structured differently; see https://github.com/hubmapconsortium/sprm .

Development

Code in this repository is formatted with black and isort, and this is checked via Travis CI.

A pre-commit hook configuration is provided, which runs black and isort before committing. Run pre-commit install in each clone of this repository which you will use for development (after pip install pre-commit into an appropriate Python environment, if necessary).

Building containers

Two Dockerfiles are included in this repository. A docker_images.txt manifest is included, which is intended for use in the build_docker_containers script provided by the multi-docker-build Python package. This package can be installed with

python -m pip install multi-docker-build

Release process

The master branch is intended to be production-ready at all times, and should always reference Docker containers with the latest tag.

Publication of tagged "release" versions of the pipeline is handled with the HuBMAP pipeline release management Python package. To release a new pipeline version, ensure that the master branch contains all commits that you want to include in the release, then run

tag_release_pipeline v0.whatever

See the pipeline release management script usage notes for additional options, such as GPG signing.