---
title: "CALANGO Parameters"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{CALANGO Parameters}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
<img src="https://github.com/fcampelo/CALANGO/raw/master/inst/images/CALANGO_LOGO.svg" height="150" alt="CALANGO logo. Drawn by Brazilian artist Berze - https://www.facebook.com/berzearte">
This document lists the input parameters accepted in the CALANGO
definition files (or, alternatively, in the `defs` list).
***
# General parameters
### annotation.files.dir
**Type**: character string
**Description**: path to the directory where annotation files are located
**Required**: YES
**Default**: none
### output.dir
**Type**: character string
**Description**: path to the output directory where results should be saved
**Required**: YES
**Default**: none
### dataset.info
**Type**: character string
**Description**: path to a file containing the genome
metadata. It should contain _at least_, for each genome:
(1) path for annotation data; (2) phenotype data (numeric);
(3) normalization data (numeric).
It must be a _tab-separated value_ file with no column headers.
**Required**: YES
**Default**: none
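As a minimal sketch of the layout described above, the base R snippet below writes a small headerless tab-separated file and reads it back. The file contents, genome paths and column order are made up for illustration and are not taken from any CALANGO data set.

```r
# Hypothetical metadata: annotation path, phenotype, normalization factor.
# Column order and values are illustrative assumptions only.
meta <- data.frame(
  annotation = c("genomes/sp1.txt", "genomes/sp2.txt", "genomes/sp3.txt"),
  phenotype  = c(1.5, 2.0, 3.25),
  norm       = c(1200, 1500, 1800)
)

# Write as tab-separated values with no column headers.
info_file <- tempfile(fileext = ".tsv")
write.table(meta, info_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

# Read it back: header = FALSE, so columns are addressed by index.
info <- read.table(info_file, sep = "\t", header = FALSE,
                   stringsAsFactors = FALSE)

x_column <- 2            # index that would be given as x.column
denominator_column <- 3  # index that would be given as denominator.column

# Phenotype and normalization columns should both be numeric.
is.numeric(info[[x_column]])
is.numeric(info[[denominator_column]])
```

Because the file has no header, the column-index parameters described below (`x.column`, `short.name.column`, `group.column`, `denominator.column`) refer to positions in this file.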
### x.column
**Type**: integer/numeric
**Description**: index of the column from the file specified in `dataset.info` containing the phenotype data, which will be used to sort the genomes and
find annotation terms associated with that phenotype.
**Required**: YES
**Default**: none
### short.name.column
**Type**: integer/numeric
**Description**: index of the column from the file specified in `dataset.info` containing the short names for species/lineages to be used when plotting data.
**Required**: YES
**Default**: none
### group.column
**Type**: integer/numeric
**Description**: index of the column from the file specified in `dataset.info` containing the group to be used for coloring the heatmaps.
**Required**: YES
**Default**: none
### ontology
**Type**: character string.
**Description**: which dictionary data type to use? Accepts _"GO"_ or
_"other"_
**Required**: YES
**Default**: none
### dict.path
**Type**: character string
**Description**: path to dictionary file (a two-column _tab-separated value_
file containing annotation IDs and their descriptions). Not needed if
`ontology = "GO"`.
**Required**: NO
**Default**: none
### column
**Type**: character string
**Description**: the _name_ of the column in the annotation file that should be used.
**Required**: YES
**Default**: none
### denominator.column
**Type**: integer/numeric
**Description**: index of the column from the file specified in `dataset.info` containing the normalization data.
**Required**: NO
**Default**: none
### tree.path
**Type**: character string
**Description**: path to the tree file.
**Required**: YES
**Default**: none
### tree.type
**Type**: character string
**Description**: tree file type. Accepts _"nexus"_ or _"newick"_.
Case-sensitive.
**Required**: YES
**Default**: none
### type
**Type**: character string
**Description**: type of analysis to perform. Currently accepts only
_"correlation"_
**Required**: YES
**Default**: none
### MHT.method
**Type**: character string
**Description**: type of multiple hypothesis testing correction to apply.
Accepts all methods listed in `stats::p.adjust.methods`.
**Required**: NO
**Default**: _"BH"_
### cores
**Type**: integer/numeric
**Description**: Number of cores to use. Must be a positive integer.
**Required**: NO
**Default**: 1
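Putting the general parameters together, a `defs` list might look like the sketch below. Every path, the column indices and the annotation column name (`"Pfam"`) are placeholders chosen for illustration; only the parameter names and the accepted values come from the descriptions above.

```r
# Illustrative defs list; all paths and indices are hypothetical.
defs <- list(
  annotation.files.dir = "data/annotation",         # placeholder path
  output.dir           = "results",                 # placeholder path
  dataset.info         = "data/metadata.tsv",       # placeholder path
  x.column             = 2,
  short.name.column    = 4,
  group.column         = 5,
  ontology             = "other",
  dict.path            = "data/dictionary.tsv",     # needed because ontology is not "GO"
  column               = "Pfam",                    # placeholder column name
  denominator.column   = 3,
  tree.path            = "data/tree.nwk",           # placeholder path
  tree.type            = "newick",
  type                 = "correlation",
  MHT.method           = "BH",
  cores                = 1
)

# Light sanity checks mirroring the constraints listed above.
stopifnot(defs$ontology %in% c("GO", "other"))
stopifnot(defs$tree.type %in% c("nexus", "newick"))
stopifnot(defs$MHT.method %in% stats::p.adjust.methods)
stopifnot(defs$cores >= 1, defs$cores == round(defs$cores))
```

The checks at the end are not part of CALANGO; they only restate the documented constraints so that typos are caught before an analysis is started.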
***
# Cutoff values
Cutoffs are used to regulate how much graphical output is produced by CALANGO. The _tab-separated value_ files that are generated at the end of the analysis
(and saved in `output.dir`) will always contain all results, unfiltered.
**q-value cutoffs** are used for correlation and phylogeny-aware linear models. Only entries with q-values _smaller_ than these cutoffs will be shown.
### spearman.qvalue.cutoff
**Type**: numeric between 0 and 1
**Required**: NO
**Default**: 1
### pearson.qvalue.cutoff
**Type**: numeric between 0 and 1
**Required**: NO
**Default**: 1
### kendall.qvalue.cutoff
**Type**: numeric between 0 and 1
**Required**: NO
**Default**: 1
### linear_model.qvalue.cutoff
**Type**: numeric between 0 and 1
**Required**: NO
**Default**: 1
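The sketch below illustrates how a q-value cutoff of this kind behaves. It assumes that q-values are obtained by adjusting p-values with `stats::p.adjust()` using the method given in `MHT.method`; the p-values and the cutoff are invented, and the exact internal computation in CALANGO may differ.

```r
# Invented p-values for five annotation terms.
p <- c(termA = 0.001, termB = 0.02, termC = 0.04, termD = 0.30, termE = 0.80)

# Adjust for multiple testing (assumed equivalent of MHT.method = "BH").
q <- p.adjust(p, method = "BH")

# Keep only entries with q-values smaller than the cutoff.
qvalue_cutoff <- 0.1   # illustrative value for e.g. spearman.qvalue.cutoff
shown <- q < qvalue_cutoff
names(q)[shown]
```

With the default cutoff of 1, the comparison would keep every entry whose q-value is below 1, so the graphical output is essentially unfiltered.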
***
**correlation cutoffs** are used to establish thresholds of positive/negative correlation values for the graphical output. **Important**: these parameters are a bit counter-intuitive. Please check the examples below for clarity.
### spearman.cor.lower.cutoff / spearman.cor.upper.cutoff
**Type**: numeric values between -1 and 1
**Description**: Thresholds for Spearman correlation values. The selection criterion is:
(Spearman correlation < lower.cutoff) OR (Spearman correlation > upper.cutoff)
**Required**: NO
**Defaults**: `spearman.cor.upper.cutoff = -1`;
`spearman.cor.lower.cutoff = 1` (i.e., no filtering)
**Example 1**: If you set `spearman.cor.upper.cutoff = 0.8` and
`spearman.cor.lower.cutoff = -0.8`, only pairs with Spearman correlation values smaller than `-0.8` OR greater than `0.8` will be shown.
**Example 2**: If you set `spearman.cor.upper.cutoff = 0` and
`spearman.cor.lower.cutoff = -1`, pairs with Spearman correlation values smaller than `-1` OR greater than `0` will be shown. Since the Spearman correlation cannot be smaller than `-1`, this means that only positively correlated pairs will be shown.
**Example 3**: If you set any values such that `spearman.cor.upper.cutoff < spearman.cor.lower.cutoff`, all pairs are shown (no filtering is performed).
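The three examples can be reproduced with a few lines of base R. The helper `select_by_cor()` and the correlation values are hypothetical and only restate the selection criterion given above; this is not CALANGO's internal code.

```r
# Hypothetical helper: TRUE where a correlation value would be shown.
select_by_cor <- function(r, lower, upper) {
  (r < lower) | (r > upper)
}

# Invented correlation values.
r <- c(-0.95, -0.5, 0, 0.4, 0.9)

# Example 1: only strong negative or strong positive correlations.
ex1 <- select_by_cor(r, lower = -0.8, upper = 0.8)

# Example 2: only positive correlations (nothing can be below -1).
ex2 <- select_by_cor(r, lower = -1, upper = 0)

# Example 3 / defaults: upper < lower, so every value passes.
ex3 <- select_by_cor(r, lower = 1, upper = -1)

r[ex1]
r[ex2]
r[ex3]
```

The same criterion applies to the Pearson and Kendall cutoffs described next.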
### pearson.cor.lower.cutoff / pearson.cor.upper.cutoff
**Type**: numeric values between -1 and 1
**Description**: Thresholds for Pearson correlation values. The selection criterion is:
(Pearson correlation < lower.cutoff) OR (Pearson correlation > upper.cutoff)
**Required**: NO
**Defaults**: `pearson.cor.upper.cutoff = -1`;
`pearson.cor.lower.cutoff = 1` (i.e., no filtering)
### kendall.cor.lower.cutoff / kendall.cor.upper.cutoff
**Type**: numeric values between -1 and 1
**Description**: Thresholds for Kendall correlation values. The selection criterion is:
(Kendall correlation < lower.cutoff) OR (Kendall correlation > upper.cutoff)
**Required**: NO
**Defaults**: `kendall.cor.upper.cutoff = -1`;
`kendall.cor.lower.cutoff = 1` (i.e., no filtering)
**standard deviation and coefficient of variation cutoffs** (only values greater than cutoff will be shown)
### sd.cutoff
**Type**: non-negative numeric value
**Required**: NO
**Default**: 0
### cv.cutoff
**Type**: non-negative numeric value
**Required**: NO
**Default**: 0
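As a rough illustration of these two cutoffs, the sketch below computes the standard deviation and the coefficient of variation (standard deviation divided by the mean) for each term in a small invented matrix. The data, the cutoff values and the choice to evaluate each cutoff separately are assumptions made for the example.

```r
# Invented values: rows are annotation terms, columns are genomes.
vals <- rbind(
  termA = c(2, 4, 6),
  termB = c(5, 5, 5),
  termC = c(1, 1, 10)
)

# Per-term standard deviation and coefficient of variation.
term_sd <- apply(vals, 1, sd)
term_cv <- term_sd / rowMeans(vals)

# Illustrative cutoffs; only values greater than the cutoff pass.
sd_cutoff <- 1
cv_cutoff <- 0.6
pass_sd <- term_sd > sd_cutoff
pass_cv <- term_cv > cv_cutoff

names(term_sd)[pass_sd]
names(term_cv)[pass_cv]
```

With the default cutoffs of 0, any term with non-zero variation passes both comparisons.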
**sum of annotation terms cutoff** (only values greater than cutoff will be shown)
### annotation_size.cutoff
**Type**: non-negative integer/numeric value
**Required**: NO
**Default**: 0
**prevalence and heterogeneity cutoffs** (only values greater than cutoff will be shown). **Prevalence** is defined as the proportion of lineages where the annotation term was observed at least once. **Heterogeneity** is defined as the proportion of lineages where the annotation term count is different from the median.
### prevalence.cutoff
**Type**: numeric value between 0 and 1
**Required**: NO
**Default**: 0
### heterogeneity.cutoff
**Type**: numeric value between 0 and 1
**Required**: NO
**Default**: 0
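The two definitions above can be checked by hand on a small example. The counts below are invented, and the values are expressed on a 0 to 1 scale to match the type of the cutoffs; this is a sketch of the definitions, not CALANGO's internal code.

```r
# Invented counts of one annotation term across six lineages.
counts <- c(0, 3, 3, 5, 0, 3)

# Prevalence: share of lineages where the term occurs at least once.
prevalence <- mean(counts > 0)

# Heterogeneity: share of lineages whose count differs from the median.
heterogeneity <- mean(counts != median(counts))

prevalence      # 4 of 6 lineages
heterogeneity   # 3 of 6 lineages
```

A term like this one would be shown with `prevalence.cutoff = 0.5` but hidden with `heterogeneity.cutoff = 0.5`, since only values strictly greater than the cutoff pass.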
***
# Advanced configurations
### raw_data_sd_filter
**Type**: character string. Accepts _"TRUE"_ or _"FALSE"_
**Description**: If _"TRUE"_, all annotation terms whose raw values (before normalization) have a standard deviation of zero are removed. This filter removes the (quite common) bias that arises when the QPAL (phenotype) and the normalizing factors are strongly associated by chance.
**Required**: YES
**Default**: "TRUE"
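A minimal sketch of what such a filter does is shown below, using an invented matrix of raw counts. The object names and data are hypothetical; the snippet only illustrates dropping terms whose raw values never vary.

```r
# Invented raw counts: rows are annotation terms, columns are genomes.
raw <- rbind(
  termA = c(3, 3, 3, 3),   # constant across genomes
  termB = c(0, 2, 5, 1),
  termC = c(7, 7, 8, 7)
)

# Standard deviation of the raw values for each term.
raw_sd <- apply(raw, 1, sd)

# Keep only terms whose raw values vary (standard deviation above zero).
filtered <- raw[raw_sd > 0, , drop = FALSE]
rownames(filtered)
```

A constant term such as `termA` carries no information of its own, but after division by a normalizing factor it would simply mirror that factor, which is the source of the spurious association the filter is meant to avoid.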