Batch processing for image analysis of high-throughput screening data, developed for a COVID-19 immunofluorescence assay. See *Microscopy-based assay for semi-quantitative detection of SARS-CoV-2 specific antibodies in human sera* for a full description of the assay.
- Install the conda environment via `conda env create -f environment-gpu.yaml`, or `conda env create -f environment-gpu.yaml -n custom_env_name` to use a custom environment name.
- Activate the environment and install `batchlib` using `setup.py`, e.g. by running `pip install -e .` in this directory to install in development mode.
- To check your installation, go to the `antibodies` directory and run the following example:
  `python cell_analysis_workflow.py -c configs/test_cell_analysis.conf`
  This should run through without throwing an error and create a local folder `test` containing 9 h5 files with the results per image.
The analysis workflows for antibody screening are in the folder `antibodies`.
For now, we have one stable workflow:
- `cell_analysis_workflow`: cell-instance segmentation based analysis workflow
These scripts use `configargparse` to read options from a config file and allow overriding options
from the command line. The default configurations to run on ialgpu03 are in `antibodies/configs`.
Workflows operate on a single folder containing the data to be processed for a given experiment (usually all images from one plate). They consist of individual jobs that apply an operation to all files that match a given pattern. Jobs can be rerun on a folder when more data has been added and will only process the new files.
There are some more advanced execution modes; they can be activated by passing the corresponding flag to `run_workflow` or a job's `__call__` method:
- `force_recompute`: run the computation for ALL files, including those already processed.
- `ignore_invalid_inputs`: don't throw an error if there are invalid input files; continue computing on the valid ones.
- `ignore_failed_outputs`: don't throw an error if the computation fails for some files; continue with the next job.
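The incremental rerun behavior can be sketched generically (a simplified illustration, not batchlib's actual implementation; the function name and the output-matching rule are assumptions):

```python
from pathlib import Path

def files_to_process(input_dir, output_dir, pattern="*.h5", force_recompute=False):
    """Return the inputs that still need processing.

    Hypothetical sketch of the incremental behavior described above: an
    input is skipped when an output with the same name already exists,
    unless force_recompute is set.
    """
    inputs = sorted(Path(input_dir).glob(pattern))
    if force_recompute:
        return inputs  # reprocess everything
    done = {p.name for p in Path(output_dir).glob(pattern)}
    return [p for p in inputs if p.name not in done]
```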
- Can I see the progress of my job? All files related to running workflows are stored in the `batchlib` subfolder of the folder where the data is being processed. There is a `.status` file for the workflow and for each individual job that keeps track of the progress. There is also a `.log` file that contains everything that was logged.
- I have failed inputs or outputs. Look into the `.status` file of the job; it contains the paths of the files for which it failed. Fix the issues and rerun the job.
- I try to run a workflow but it does not start. There is a `.lock` file in the `batchlib` folder that prevents multiple workflows from running on the same folder at the same time. It might not be deleted properly if a job gets killed or segfaults. Just delete it and rerun.
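The lock mechanism can be sketched like this (a simplified illustration, not batchlib's actual locking code; the function name is made up, only the `.lock` location follows the layout described above):

```python
import os
from pathlib import Path

def acquire_lock(data_folder):
    """Create batchlib/.lock, refusing to start if another workflow holds it.

    Sketch only: the real batchlib implementation may differ.
    """
    lock = Path(data_folder) / "batchlib" / ".lock"
    lock.parent.mkdir(exist_ok=True)
    try:
        # O_EXCL makes creation atomic: it fails if the lock already exists
        fd = os.open(lock, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        os.close(fd)
    except FileExistsError:
        raise RuntimeError(
            f"another workflow is running on {data_folder}; "
            f"if it crashed, delete {lock} and rerun"
        )
    return lock
```

After a crash, removing the stale `.lock` by hand (as described above) is all that is needed to rerun.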
The intermediate image data associated with one raw image is stored in an h5 container with the same name as the image.
All images are stored with one group per image channel. The group layout for a channel called data looks like this:
/data
/s0
/s1
/s2
...
The sub-datasets `sI` store the channel's image data as a multi-scale image pyramid.
In addition, the group `data` contains metadata to display the image in the plateViewer Fiji plugin.
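The pyramid scheme can be illustrated with plain numpy (a sketch of the layout only: the actual files are written by batchlib, and the 2x mean-pooling downsampling used here is an assumption):

```python
import numpy as np

def make_pyramid(image, n_scales=3):
    """Build a multi-scale pyramid mirroring the /data/s0, /s1, ... layout.

    Sketch only: each level halves the resolution by 2x2 mean pooling;
    the downsampling actually used by batchlib may differ.
    """
    pyramid = {"/data/s0": image}
    current = image
    for scale in range(1, n_scales):
        # crop to even dimensions, then average each 2x2 block
        h, w = current.shape[0] // 2 * 2, current.shape[1] // 2 * 2
        current = current[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
        pyramid[f"/data/s{scale}"] = current
    return pyramid
```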
Tables are stored in the group /tables with the following layout:
/tables
/table-name (can be nested)
/cells (contains the table values as 2d dataset of strings)
/columns (contains the column names as 1d dataset of strings)
/visible (should the columns be shown in the plate-viewer? 1d dataset of bools)
Three different kinds of tables are supported by the plate viewer:
- cell tables: contain object-level information for the cell segmentation. They must be stored in `/tables/<NAME OF SEGMENTATION>/<NAME>`, and the first column must contain the corresponding object ids and be called `label_id`.
- image tables: contain image-level information. They must be stored in a separate file, `<PLATE NAME>_tables.hdf5` (note that we don't use the `.h5` extension to avoid matching this file). In this file, the table must be stored in `/tables/images/<NAME>` (the plate viewer will load the table called `default` on start, but other tables can be selected). The first column must contain the image file name and be called `image_name`; the second column must contain the site name, `<WELL NAME-ID IN WELL>`, and be called `site name`.
- well tables: contain well-level information. They must be stored in the same file as the image tables, in `/tables/wells/<NAME>`. The first column must contain the well name and be called `well_name`.
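Assembling the three datasets of a cell table along the layout above might look like this (a sketch; writing them into the h5 container is handled by batchlib, and all column names other than the required `label_id` are hypothetical examples):

```python
import numpy as np

def make_cell_table(label_ids, sizes, intensities):
    """Assemble cells/columns/visible datasets for a cell table.

    Sketch only: 'size' and 'mean_intensity' are made-up example columns;
    only 'label_id' in the first position is required by the layout.
    """
    columns = np.array(["label_id", "size", "mean_intensity"], dtype="S")
    # the table values are stored as a 2d dataset of strings
    cells = np.array(
        [[str(i), str(s), str(v)] for i, s, v in zip(label_ids, sizes, intensities)],
        dtype="S",
    )
    visible = np.array([True, True, True])  # show all columns in the plate viewer
    return {"cells": cells, "columns": columns, "visible": visible}
```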
- Inherit from `batchlib.BatchJob` or `batchlib.BatchJobOnContainer`.
- Implement `self.run` with the function signature `run(self, input_files, output_files, **kwargs)`.
- Constraints:
  - Output should be a single file per input file. If you need multiple files per input, create a sub-directory and store them in there.
  - For image data, the intermediate formats are either `h5` or `n5`. Use the methods `read_image` / `write_image` to read / write data in the batchlib data model.
  - Use `batchlib.io.open_file` in your job instead of `h5py.File` to support both `h5` and `n5`.
  - Jobs should always be runnable with cpu only and should default to running on the cpu. Gpu support should be activated via a kwarg in the `run` method.
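A new job might look roughly like this (a sketch under loud assumptions: the stub base class below only stands in for `batchlib.BatchJob`, whose real interface may differ, and the thresholding operation and plain-text i/o are illustrative only):

```python
class BatchJob:
    """Stand-in stub for batchlib.BatchJob, included so the sketch runs on
    its own; the real base class handles file matching, status tracking
    and incremental reruns."""
    def __call__(self, input_files, output_files, **kwargs):
        self.run(input_files, output_files, **kwargs)

class ThresholdJob(BatchJob):
    """Example job: threshold each input and write one output per input."""
    def run(self, input_files, output_files, threshold=0.5, on_gpu=False, **kwargs):
        # default to cpu; gpu support would be switched on via the kwarg
        assert not on_gpu, "gpu execution not implemented in this sketch"
        for in_path, out_path in zip(input_files, output_files):
            with open(in_path) as f_in, open(out_path, "w") as f_out:
                # a real job would use batchlib.io.open_file and
                # read_image / write_image instead of plain text files
                values = [float(x) for x in f_in.read().split()]
                f_out.write(" ".join(str(int(v > threshold)) for v in values))
```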
The global log level can be passed via the environment variable `LOGLEVEL` during workflow execution, e.g.
`LOGLEVEL=DEBUG python cell_analysis_workflow.py -c configs/test_cell_analysis.conf`
The workflow logger (named `Workflow`) is where all of the file/console handlers are registered, so make sure any new
logger is a child of the `Workflow` logger, e.g. `Workflow.MyJob`. All log events in the child loggers will automatically
be propagated to the parent `Workflow` logger. As an example:
from batchlib.util.logger import add_file_handler, get_logger
root_logger = get_logger('Workflow') # get root logger
add_file_handler(root_logger, 'work_dir', 'workflow_name') # add file handler to a given workflow
# in the job class or somewhere else
logger = get_logger('Workflow.Job1') # this is the child logger of the 'Workflow' logger
logger.info('some message') # the message will be propagated to all the handlers registered in the parent logger, i.e. STDOUT and FILE