Cytometry analysis pipeline for large and complex datasets (CAPX) (beta)

Cytometry analysis pipeline for large and complex datasets (CAPX).

The Cytometry Analysis Pipeline for large and compleX datasets (CAPX) is a set of scripts that brings together existing clustering and data visualisation tools into a single pipeline. This is used as an analysis pipeline at the Sydney Cytometry facility for high-dimensional flow and mass cytometry data.

How to download

Go to 'releases' above and download the source code for the latest version.


If you use these scripts in your work, please cite this GitHub repository using the information below. You can cite the specific version that you used in your work (most recent version = v2.5).

Ashhurst, T. M. (2018). Cytometry Analysis Pipeline for large and compleX datasets v2.5. GitHub repository. DOI: TBC, repository:


Data (including datasets of tens of millions of cells) is clustered using FlowSOM, subsampled (with differential downsampling options), and visualised using tSNE via the Rtsne package. The script then uses the code from 'tSNEplots' to generate coloured tSNE plot images for each marker and each sample. Other scripts can be run on the output data files to give an identity to each cluster (ClusterPlots, SumTables, HeatMap_MFI).
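To illustrate the subsampling step described above, here is a minimal sketch. It is not CAPX's own code (the pipeline's scripts are separate), and it assumes that 'differential downsampling' means giving each sample its own target cell count. The function name `downsample_per_sample`, the `targets` dictionary, and the toy data are all hypothetical.

```python
import random


def downsample_per_sample(cells_by_sample, targets, seed=42):
    """Randomly keep at most targets[sample] cells from each sample.

    Hypothetical helper: a sample with fewer cells than its target
    is kept whole rather than raising an error.
    """
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    subsampled = {}
    for sample, cells in cells_by_sample.items():
        n_keep = min(targets[sample], len(cells))
        subsampled[sample] = rng.sample(cells, n_keep)  # without replacement
    return subsampled


# Toy data: each "cell" is a tuple of marker intensities.
cells_by_sample = {
    "sample_A": [(i, i * 2) for i in range(1000)],
    "sample_B": [(i, i + 1) for i in range(50)],
}
# Assumed per-sample targets (the "differential" part).
targets = {"sample_A": 100, "sample_B": 200}

subsampled = downsample_per_sample(cells_by_sample, targets)
sizes = {sample: len(cells) for sample, cells in subsampled.items()}
print(sizes)  # {'sample_A': 100, 'sample_B': 50}
```

Downsampling after clustering, as the pipeline does, means every cell still gets a cluster label while only the subsample needs to go through the slower tSNE step.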

Version history

v1.0-beta - pre-release of v1.0. The scripts are fully functioning, but two bugs are present: a) reading .fcs files is not currently compatible with rbindlist, so only reading .csv files will work; b) some users report that when output .fcs files are loaded in FlowJo, the axes can no longer be manipulated.

Extra scripts for additional functions

tSNEplots can be used to automatically create a coloured tSNE plot for every marker and sample (and group).

AutoGraph can be used to automatically plot dot plots to compare measurements (cells per tissue, median fluorescence intensity (MFI), etc.) of each cluster/population between groups.

The following scripts provide helpful functionality and can be found in CytoTools:

SumTables - generates a table summarising the analysed dataset (samples vs clusters): the number of cells per cluster per sample, the MFI of each marker on each cluster in each sample, etc.

HeatMaps - generates a heatmap showing the number of cells in each cluster in each sample, or the MFI of each marker on each cluster, per sample. Includes 'fold-change' visualisation options.

ClusterPlots - automated generation of coloured tSNE plots showing clusters.

FlexiPlots - a script with adjustable parameters for visualising plots.
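The kind of summary that SumTables and HeatMaps describe can be sketched as follows. This is an illustration only, not the scripts' actual implementation: the record layout, the `summarise` and `log2_fold_change` helpers, and the choice of a log2 ratio against a control sample as the 'fold-change' are all assumptions.

```python
import math
from collections import defaultdict
from statistics import median

# Hypothetical records: (sample, cluster, intensity of one marker).
cells = [
    ("ctrl_1", 1, 10.0), ("ctrl_1", 1, 20.0), ("ctrl_1", 2, 5.0),
    ("treat_1", 1, 30.0), ("treat_1", 1, 50.0), ("treat_1", 1, 40.0),
    ("treat_1", 1, 60.0), ("treat_1", 2, 7.0),
]


def summarise(cells):
    """Return cell counts and median intensity (MFI) per (sample, cluster)."""
    grouped = defaultdict(list)
    for sample, cluster, intensity in cells:
        grouped[(sample, cluster)].append(intensity)
    counts = {key: len(values) for key, values in grouped.items()}
    mfi = {key: median(values) for key, values in grouped.items()}
    return counts, mfi


def log2_fold_change(value, reference):
    """Assumed fold-change definition: log2 of value over a reference."""
    return math.log2(value / reference)


counts, mfi = summarise(cells)
print(counts[("treat_1", 1)], mfi[("treat_1", 1)])  # 4 45.0

# Cluster 1 has twice as many cells in the treated sample as in the control.
fc = log2_fold_change(counts[("treat_1", 1)], counts[("ctrl_1", 1)])
print(fc)  # 1.0
```

A log scale keeps increases and decreases symmetric around zero, which suits a diverging heatmap colour scale.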


Packages used














Specific references

Sofie Van Gassen, Britt Callebaut and Yvan Saeys (2017). FlowSOM: Using self-organizing maps for visualization and interpretation of cytometry data.

L.J.P. van der Maaten and G.E. Hinton. Visualizing High-Dimensional Data Using t-SNE. Journal of Machine Learning Research 9(Nov):2579-2605, 2008.

L.J.P. van der Maaten. Accelerating t-SNE using Tree-Based Algorithms. Journal of Machine Learning Research 15(Oct):3221-3245, 2014.

Jesse H. Krijthe (2015). Rtsne: T-Distributed Stochastic Neighbor Embedding using a Barnes-Hut Implementation, URL: