<tool id="naivestates" name="naivestates" version="@VERSION@.0" profile="17.09">
<description> Inference of cell states using Naive Bayes</description>
<macros>
<import>macros.xml</import>
</macros>
<expand macro="requirements"/>
@VERSION_CMD@
<command detect_errors="exit_code"><![CDATA[
@CMD_BEGIN@
-i '$counts'
#if $markers
-m '$markers'
#end if
--mct '$mct'
-p $plots
#if $id
--id '$id'
#end if
--log $log
#if $sfx
--sfx '$sfx'
#end if
#if $umap
--umap
#end if
--comb $comb
-o .
&&
mv *-states.csv states.csv;
#if $plots != "off"
mv plots/*-probs.${plots} plots/probs.${plots};
mv plots/*-summary.${plots} plots/summary.${plots};
mv plots/*-allfits.${plots} plots/allfits.${plots};
#end if
]]></command>
<inputs>
<param name="counts" type="data" format="csv" label="Quantified Cell Matrix"/>
<param name="markers" type="data" format="txt" optional="true" label="Markers to model" help="Text file with one marker name per line"/>
<param name="mct" type="data" format="csv" label="Marker-State Association Map" help="CSV file with two columns, Marker and State"/>
<param name="plots" type="select" label="Generate plots showing the fit">
<option selected="true" value="png">png</option>
<option value="pdf">pdf</option>
<option value="off">off</option>
</param>
<param name="id" type="text" value="" label="Column name containing cell IDs" help="Leave empty to use the tool default (CellID)"/>
<param name="log" type="select" label="Log Transform" help="Whether to apply a log transform">
<option selected="true" value="auto">auto</option>
<option value="yes">yes</option>
<option value="no">no</option>
</param>
<param name="sfx" type="text" value="_cellMask" optional="true" label="Common suffix" help="Common suffix on marker columns (e.g., _cellMask)"/>
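<param name="comb" type="select" label="Probability combination method" help="How per-marker probabilities are combined: gmean (geometric mean) or hmean (harmonic mean)">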
<option selected="true" value="gmean">gmean</option>
<option value="hmean">hmean</option>
</param>
<param name="umap" type="boolean" checked="true" label="Generate UMAP plots"/>
</inputs>
<outputs>
<data format="csv" name="states" from_work_dir="states.csv" label="${tool.name} on ${on_string}: States CSV"/>
<data format="png" name="probs-png" from_work_dir="plots/probs.png" label="${tool.name} on ${on_string}: Probabilities">
<filter>plots == 'png'</filter>
</data>
<data format="png" name="summary-png" from_work_dir="plots/summary.png" label="${tool.name} on ${on_string}: Summary">
<filter>plots == 'png'</filter>
</data>
<data format="png" name="allfits-png" from_work_dir="plots/allfits.png" label="${tool.name} on ${on_string}: AllFits">
<filter>plots == 'png'</filter>
</data>
<data format="pdf" name="probs-pdf" from_work_dir="plots/probs.pdf" label="${tool.name} on ${on_string}: Probabilities">
<filter>plots == 'pdf'</filter>
</data>
<data format="pdf" name="summary-pdf" from_work_dir="plots/summary.pdf" label="${tool.name} on ${on_string}: Summary">
<filter>plots == 'pdf'</filter>
</data>
<data format="pdf" name="allfits-pdf" from_work_dir="plots/allfits.pdf" label="${tool.name} on ${on_string}: AllFits">
<filter>plots == 'pdf'</filter>
</data>
</outputs>
<help><![CDATA[
naivestates - Inference of cell states using Naive Bayes
This work is supported by the NIH Grant 1U54CA225088: Systems Pharmacology of Therapeutic and Adverse Responses to Immune Checkpoint and Small Molecule Drugs and by the NCI grant 1U2CCA233262: Pre-cancer atlases of cutaneous and hematologic origin (PATCH Center).
Introduction
naivestates is a label-free, cluster-free tool for inferring cell types from quantified marker expression data, based on known marker <-> cell type associations. The tool is designed to be run as a Docker container, but can also be installed in a Conda environment or as an R package. naivestates expects per-cell marker expression data as input, provided in .csv format. One of the columns must contain cell IDs. An example input file might look as follows:
CellID,KERATIN,FOXP3,SMA
1,64.18060200668896,193.00334448160535,303.5016722408027
2,54.850202429149796,151.19433198380565,176.3846153846154
3,63.94712643678161,210.43218390804597,483.9448275862069
4,142.01320132013203,227.85808580858085,420.76897689768975
5,56.66379310344828,197.01896551724138,343.7810344827586
6,69.97454545454545,187.59636363636363,267.9709090909091
7,67.57754010695187,185.63368983957218,351.7914438502674
8,64.012,190.02,349.348
9,56.9622641509434,159.79245283018867,236.43867924528303
...
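Before running the tool, it can help to confirm that the header row actually contains the cell ID column. The shell sketch below is a convenience check, not part of naivestates; it assumes a comma-delimited file with a header row, unquoted column names, and the default ID column name CellID (see --id below). The file names example-input.csv, columns.txt and candidate-markers.txt are placeholders.

```shell
#!/bin/sh
# Sketch: check that a naivestates input CSV contains the cell ID column.
# Assumptions: comma-delimited file with a header row, unquoted column names,
# and the default ID column name CellID (set ID_COL if yours differs).
ID_COL="CellID"
INPUT="example-input.csv"   # placeholder file name

# Recreate the first rows of the example input shown above.
cat > "$INPUT" <<'EOF'
CellID,KERATIN,FOXP3,SMA
1,64.18060200668896,193.00334448160535,303.5016722408027
2,54.850202429149796,151.19433198380565,176.3846153846154
EOF

# Split the header into one column name per line (dropping any CR from CRLF files).
head -n 1 "$INPUT" | tr -d '\r' | tr ',' '\n' > columns.txt

if grep -qx "$ID_COL" columns.txt; then
  echo "Found ID column: $ID_COL"
else
  echo "Missing ID column: $ID_COL" >&2
  exit 1
fi

# The remaining columns are the candidate markers.
grep -vx "$ID_COL" columns.txt > candidate-markers.txt
cat candidate-markers.txt
```

Point INPUT at your own file to run the same check on real data.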
Installation
Download the container image
Pull the latest version with
docker pull labsyspharm/naivestates
Alternatively, you can pull a specific version, which is recommended to ensure reproducibility of your analyses. For example, v1.2.0 can be pulled with
docker pull labsyspharm/naivestates:1.2.0
Examine the tool usage instructions
docker run --rm labsyspharm/naivestates:1.2.0 /app/main.R -h
replacing 1.2.0 with the version you are working with. Omit :1.2.0 entirely if you pulled the latest version above. The flag --rm tells Docker to delete the container instance after it finishes displaying the help message.
Basic usage
At minimum, the tool requires an input file and the list of marker names:
docker run --rm -v /path/to/data/folder:/data labsyspharm/naivestates:1.2.0 \
/app/main.R -i /data/myfile.csv -m aSMA,CD45,panCK
where we can make a distinction between Docker-level arguments:
--rm once again cleans up the container instance after it finishes running the code
-v /path/to/data/folder:/data maps the local folder containing your data to /data inside the container
:1.2.0 specifies the container version that we pulled above
and tool-level arguments:
-i /data/myfile.csv specifies which data file to process
-m aSMA,CD45,panCK specifies the markers of interest (NOTE: comma-delimited, no spaces)
If there are many markers, place their names in a standalone file, markers.txt, with one marker per line. Ensure that the file is in /path/to/data/folder/ and modify the Docker call to use the new file:
docker run --rm -v /path/to/data/folder:/data labsyspharm/naivestates:1.2.0 \
/app/main.R -i /data/myfile.csv -m /data/markers.txt
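If the marker list already exists in the comma-delimited form used with -m, it can be converted into markers.txt with standard shell tools. In this sketch DATA_DIR is a placeholder for /path/to/data/folder (it defaults to the current directory), and spaces are stripped as a precaution since the -m form must not contain any.

```shell
#!/bin/sh
# Sketch: convert a comma-delimited marker list (the -m form) into
# markers.txt with one marker per line.
# DATA_DIR is a placeholder for /path/to/data/folder; it defaults to ".".
DATA_DIR="${DATA_DIR:-.}"
MARKERS="aSMA,CD45,panCK"

# Remove any spaces, then put each comma-separated name on its own line.
printf '%s\n' "$MARKERS" | tr -d ' ' | tr ',' '\n' > "$DATA_DIR/markers.txt"
cat "$DATA_DIR/markers.txt"

# Going the other way: join the file back into the comma-delimited -m form.
paste -s -d ',' "$DATA_DIR/markers.txt"
```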
Additional parameters
The following parameters are optional, but may be useful in certain scenarios:
--plots <off|pdf|png> - (default: off) Produces QC plots of individual marker fits and summary UMAP plots in .png or .pdf format.
--id - (default: CellID) Name of the column that contains cell IDs
--log <yes|no|auto> - (default: auto) Whether a log10 transformation should be applied prior to fitting the data. With auto, the tool applies the transformation if it detects large values. Use --log no to force the use of the original, non-transformed values instead.
-o - (default: /data) Alternative output directory. (Note that any file written to a directory that wasn't mapped with docker -v will not persist when the container is destroyed.)
--mct - The tool has a basic marker -> cell type (mct) mapping in typemap.csv. More sophisticated mct mappings can be defined by creating a custom-map.csv file with two columns: Marker and State. Ensure that custom-map.csv is in /path/to/data/folder and point the tool at it with --mct (e.g., /app/main.R -i /data/myfile.csv --mct /data/custom-map.csv -m aSMA,CD45,panCK)
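To see what the --log transformation does to values like those in the example input, the awk sketch below applies log10 to every column except the first (assumed to be CellID), using rows 1 and 4 of the example. It only illustrates the arithmetic: it is not the tool's implementation, it does not reproduce the tool's auto detection rule, and it applies a plain log10 with no offset, so all marker values are assumed to be strictly positive.

```shell
#!/bin/sh
# Sketch: apply log10 to every marker column of a comma-delimited file with a
# header row. Illustration only, not naivestates' own code; plain log10 with
# no offset, so values must be strictly positive.
cat > sample.csv <<'EOF'
CellID,KERATIN,FOXP3,SMA
1,64.18060200668896,193.00334448160535,303.5016722408027
4,142.01320132013203,227.85808580858085,420.76897689768975
EOF

awk -F',' -v OFS=',' '
  NR == 1 { print; next }                          # keep the header unchanged
  {
    for (i = 2; i <= NF; i++)                      # skip column 1 (CellID)
      $i = sprintf("%.4f", log($i) / log(10))      # awk log() is the natural log
    print
  }
' sample.csv > sample-log10.csv

cat sample-log10.csv
```

Values in the hundreds land between 2 and 3 on the log10 scale, which is the compression the transformation is meant to provide before fitting.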
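A custom map for --mct is a plain CSV with the two columns Marker and State. The sketch below writes one for the three markers used in the examples. The state names Stromal, Immune and Tumor are hypothetical placeholders, not values taken from typemap.csv, and DATA_DIR stands in for /path/to/data/folder.

```shell
#!/bin/sh
# Sketch: write a custom marker -> state map with the two required columns,
# Marker and State. The state names are hypothetical placeholders; replace
# them with the associations relevant to your marker panel.
DATA_DIR="${DATA_DIR:-.}"   # placeholder for /path/to/data/folder

cat > "$DATA_DIR/custom-map.csv" <<'EOF'
Marker,State
aSMA,Stromal
CD45,Immune
panCK,Tumor
EOF

# Sanity check: the header must name exactly the two expected columns.
if [ "$(head -n 1 "$DATA_DIR/custom-map.csv")" != "Marker,State" ]; then
  echo "custom-map.csv must start with the header Marker,State" >&2
  exit 1
fi

# The file is then passed to the tool with --mct, e.g.:
#   docker run --rm -v /path/to/data/folder:/data labsyspharm/naivestates:1.2.0 \
#     /app/main.R -i /data/myfile.csv --mct /data/custom-map.csv -m aSMA,CD45,panCK
```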
Alternative execution environments
Running in a Conda environment
If you are working in a computational environment that doesn't support Docker, the repository provides a Conda-based alternative. Ensure that conda is installed on your system, then 1) clone this repository, 2) instantiate the conda environment and 3) install the tool.
git clone https://github.com/labsyspharm/naivestates.git
cd naivestates
conda env create -f conda.yml
conda activate naivestates
R -s -e "devtools::install_github('labsyspharm/naivestates')"
The tool can now be used as above by running main.R:
./main.R -h
./main.R -i /path/to/datafile.csv -m aSMA,CD45,panCK
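If you are unsure which of the two routes your environment supports, a quick check like the following can tell you. This is a hypothetical helper, not part of the repository; it only looks for the docker and conda commands on your PATH and does not verify that the Docker daemon is running or that the naivestates conda environment has been created.

```shell
#!/bin/sh
# Hypothetical helper (not part of naivestates): report which documented
# execution route is available on this machine.
# Only checks that the commands exist on PATH; it does not test whether the
# Docker daemon is reachable or whether the naivestates conda env exists.
if command -v docker >/dev/null 2>&1; then
  RUNNER="docker"   # run via: docker run --rm ... labsyspharm/naivestates:<version> /app/main.R
elif command -v conda >/dev/null 2>&1; then
  RUNNER="conda"    # run via: conda activate naivestates, then ./main.R
else
  RUNNER="none"     # neither found; see the installation notes above
fi
echo "Execution route: $RUNNER"
```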
Running as an R package
The tool can also be installed as an R package directly from GitHub:
if( !require(devtools) ) install.packages("devtools")
devtools::install_github( "labsyspharm/naivestates" )
Example usage:
library( tidyverse )
library( naivestates )
# Load the original data
X <- read_csv( "datafile.csv" )
# Fit models to channels aSMA, CD45 and panCK
# Specify that cell IDs are in column CellID
GMM <- GMMfit( X, CellID, aSMA, CD45, panCK )
# Plot a fit to one of the markers
plotFit( GMM, "CD45" )
# Write out the results to results.csv
GMMreshape(GMM) %>% write_csv( "results.csv" )
OHSU Wrapper Repo: https://github.com/ohsu-comp-bio/naivestates
]]></help>
<expand macro="citations" />
</tool>