---
title: "tugHall version 1.1: USER-GUIDE-tugHall"
#author: "Iurii Nagornov and Mamoru Kato"
## date: "`r Sys.Date()`"
bibliography: tugHall/Code/CanSim.bib
output:
  rmarkdown::html_vignette:
    citation_package: natbib
vignette: >
  %\VignetteIndexEntry{Vignette Title}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r setup, include = FALSE}
knitr::opts_chunk$set(
  fig.path = 'tugHall/Figures/',
  echo = FALSE,
  warning = FALSE,
  message = FALSE,
  collapse = TRUE,
  comment = "#>"
)
Sys.setenv("TimeZone" = "Japan")
```
## Requirements for tugHall simulation:
- R version **3.3** or later
- libraries: **stringr**

Note that the program has two different procedures in general: the first is the simulation and the second is the analysis of the simulation results.
Please note that the requirements for these two procedures are **different**.
This User-Guide pertains to the **simulation procedure** alone.
<br />
# Table of Contents
1. [Quick start guide](#quick)
2. [Structure of directories](#directories)
3. [Inputs](#inputs)
4. [Outputs](#outputs)
5. [How to run](#run)
<a name="quick"></a>
# 1. Quick start guide
The simplest way to run tugHall:
- Save the **/tugHall/** directory to the working folder;
- Run **tugHall.R**.
The code has its initial input parameters and input files in the **/Input/** folder. After the simulation, the user can see the results in the dialogue box (please see **User-Guide-Analysis** for details); the results are saved to the **/Output/** and **/Figures/** folders. Note that the analysis procedure requires additional libraries and a newer version of R (3.6.0).
<a name="directories"></a>
# 2. Structure of directories
### Root directory:
**User-Guide-tugHall.Rmd** - user guide for simulation in the Rmd format.
**User-Guide-tugHall.html** - user guide for simulation in the html format.
**User-Guide-tugHall.pdf** - user guide for simulation in the pdf format.
**User-Guide-Analysis.Rmd** - user guide for analysis and report generation in the Rmd format.
**User-Guide-Analysis.html** - user guide for analysis and report generation in the html format.
**User-Guide-Analysis.pdf** - user guide for analysis and report generation in the pdf format.
dir **/tugHall/** - directory that contains the program.
<br />
### **/tugHall/** directory:
**tugHall.R** - program to run a simulation and define the parameters.
dir **/Code/** - folder with the code and the function library.
dir **/Input/** - folder with the input files.
dir **/Output/** - folder with the output files.
dir **/Figures/** - folder with the plot figures.
<br />
### **/Code/** directory:
**CanSim.bib, pic_lic.jpg** - files necessary for the user guide.
**tugHall_functions.R** - file that contains the functions for the simulation (the core of the program).
**Analysis.R** - file to analyze the results of a simulation and plot figures.
**Functions.R** - file with the functions for the analysis of results.
<br />
### **/Input/** directory:
**cellinit.txt** - file with a list of initial cells with/without destroyed genes.
**gene_cds2.txt** - file with hallmark variables and weights.
<br />
### **/Output/** directory:
**cellout.txt** - file with simulation output.
**geneout.txt** - file with information about hallmark variables and the weights.
**log.txt** - file with information about all parameters.
**Weights.txt** - file with information about weights between hallmarks and genes.
**Order_of_dysfunction.txt** - see **USER-GUIDE-Analysis**.
**VAF.txt** - see **USER-GUIDE-Analysis**.
<br />
### **/Figures/** directory
The **/Figures/** directory contains figures in \*.jpg format, which appear after the analysis of the simulation results. See **USER-GUIDE-Analysis**.
<a name="inputs"></a>
# 3. Inputs
## Input of hallmark variables and gene weights
The file **tugHall/Input/gene_cds2.txt** defines the hallmark variables and weights (only the first 10 lines are presented here):
```{r, echo=FALSE, results='asis'}
x <- read.csv(file = "tugHall/Input/gene_cds2.txt",header = FALSE, sep = "\t", nrows = 10)
knitr::kable(x, col.names = c("Genes","length CDS","Hallmark","Suppressor or Oncogene","Weights"), align = "c", caption = "**Table 1. Input file for genes.** Example of input file for hallmarks and weights in the file _**tugHall/Input/gene_cds2.txt**_.")
```
1. **Genes** - name of gene, e.g., TP53, KRAS. The names must be typed carefully. The program detects all the unique gene names.
2. **length CDS** - length of CDS for each gene, e.g., 2724, 10804.
3. **Hallmark** - hallmark name, e.g., "apoptosis". Available names:
- apoptosis
- immortalization
- growth
- anti-growth
- angiogenesis
- invasion
Note that "growth" and "anti-growth" are related to the single hallmark "growth/anti-growth".
Note that "invasion" is related to "invasion/metastasis" hallmark.
4. **Suppressor or Oncogene** - distinction between oncogene and suppressor:
- o: oncogene
- s: suppressor
- ?: unknown (will be randomly assigned)
5. **Weights** - hallmark weights for genes, e.g., 0.333 and 0.5. For each hallmark, the program checks the sum of all the weights; if it is not equal to 1, the program normalizes the weights so that they sum to 1. Note that if a gene belongs to more than one hallmark, it must be listed on a separate line for each hallmark.
---
After that, the program defines all the weights, and all the **unknown weights** are set equal to 0. The program then normalizes the weights so that they sum to 1 in each column. The **tugHall/Output/Weights.txt** file stores these final input weights for the simulation. Only the first 10 lines are presented here:
```{r, echo=FALSE, results='asis'}
x <- read.csv(file = "tugHall/Output/Weights.txt", header = TRUE, sep = "\t", nrows = 10)
knitr::kable(x, col.names = c("Genes", "Apoptosis, $H_a$", "Angiogenesis, $H_b$", "Growth / Anti-growth, $H_d$", "Immortalization, $H_i$",
"Invasion / Metastasis, $H_{im}$"), align = "c", caption = "**Table 2. Weights for hallmarks.** Example of weights for hallmarks and genes from _**tugHall/Output/Weights.txt**_ file. Unknown values equal 0.")
```
1. **Genes** - name of genes.
2. **Apoptosis, $H_a$** - weights of hallmark "Apoptosis".
3. **Angiogenesis, $H_b$** - weights of hallmark "Angiogenesis".
4. **Growth / Anti-growth, $H_d$** - weights of hallmark "Growth / Anti-growth".
5. **Immortalization, $H_i$** - weights of hallmark "Immortalization".
6. **Invasion / Metastasis, $H_{im}$** - weights of hallmark "Invasion / Metastasis".
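
The normalization described above can be sketched in a few lines of R. This is only an illustration under stated assumptions, not the code in **tugHall_functions.R**: the data frame `genes` and its values are hypothetical, and `normalize_weights()` is a name introduced here. Unknown gene/hallmark pairs are set to 0 and each hallmark column is rescaled to sum to 1.

```r
# Hypothetical toy input with the same kind of columns as gene_cds2.txt
# (gene, hallmark, weight); the values are illustrative only.
genes <- data.frame(
  gene     = c("APC", "KRAS", "TP53", "APC"),
  hallmark = c("apoptosis", "apoptosis", "apoptosis", "growth"),
  weight   = c(0.2, 0.2, 0.1, 0.5),
  stringsAsFactors = FALSE
)

# normalize_weights() is an illustrative helper, not part of tugHall.
normalize_weights <- function(genes) {
  # Gene-by-hallmark matrix; pairs absent from the input come out as NA
  w <- tapply(genes$weight, list(genes$gene, genes$hallmark), sum)
  w[is.na(w)] <- 0                # unknown weights are set to 0
  sweep(w, 2, colSums(w), "/")    # rescale each hallmark column to sum to 1
}

w <- normalize_weights(genes)
print(round(w, 3))
```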
---
## Input the probabilities
The probabilities used in the model are set in the parameter section of the code **"tugHall.R"**:
| Probability variable and value | Description |
|:---|:---|
| **E0 <- 2E-4** | Parameter $E0$ in the division probability |
| **F0 <- 1E0** | Parameter $F0$ in the division probability |
| **m <- 1E-6** | Mutation probability $m'$ |
| **uo <- 0.5** | Oncogene mutation probability $u_o$ |
| **us <- 0.5** | Suppressor mutation probability $u_s$ |
| **s <- 10** | Parameter in the sigmoid function $s$ |
| **k <- 0.1** | Environmental death probability $k'$ |
| <img width=250/> | <img width=270/> |
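
As a rough sketch, the settings in the table can be collected in a list and checked before a run. The list `params` and the helper `check_probabilities()` are hypothetical names used for illustration and are not part of tugHall. The check assumes that `m`, `uo`, `us` and `k` are probabilities and must therefore lie in [0, 1]; `E0`, `F0` and `s` are not checked.

```r
# Values copied from the table above; the list itself is an illustrative
# container and does not appear in tugHall.R.
params <- list(E0 = 2E-4, F0 = 1E0, m = 1E-6, uo = 0.5, us = 0.5, s = 10, k = 0.1)

# check_probabilities() is a hypothetical helper.
# ASSUMPTION: m, uo, us and k are probabilities in [0, 1].
check_probabilities <- function(p, prob_names = c("m", "uo", "us", "k")) {
  vals <- unlist(p[prob_names])
  bad  <- prob_names[vals < 0 | vals > 1]
  if (length(bad) > 0) {
    stop("Values outside [0, 1]: ", paste(bad, collapse = ", "))
  }
  invisible(TRUE)
}

check_probabilities(params)
```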
---
## Filename input
In the code **"tugHall.R"**, the user can also define the names of the input and output files and additional simulation parameters:
| Variables and file names | Description |
|:---|:---|
| **genefile <- 'gene_cds2.txt'** | File with information about weights |
| **cellfile <- 'cellinit.txt'** | Initial Cells |
| **geneoutfile <- 'geneout.txt'** | Gene Out file with hallmarks |
| **celloutfile <- 'cellout.txt'** | Output information of simulation |
| **logoutfile <- 'log.txt'** | Log file to save the input information of simulation |
| **censore_n <- 30000** | Max cell number where the program forcibly stops |
| **censore_t <- 200** | Max time where the program forcibly stops |
| <img width=200/> | <img width=350/> |
---
## Input of the initial cells
The initial states of the cells are defined in the file **"tugHall/Input/cellinit.txt"**:
| Cell ID | List of mutated genes |
|:---|:---|
| 1 | "" |
| 2 | "APC" |
| 3 | "APC, KRAS" |
| 4 | "KRAS" |
| 5 | "TP53, KRAS" |
| ... | ... |
| 1000 | "" |
| <img width=50/> | <img width=150/> |
1. **Cell ID** - ID of cell, e.g., 1, 324.
2. **List of mutated genes** - list of mutated genes for each cell, e.g., "", "KRAS, APC". The values are comma separated. Empty double quotes ("") indicate a cell without mutations.
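
A file of initial cells with the two columns shown in the table could be generated along the following lines. This is a sketch, not the procedure used in **tugHall.R**: `make_cellinit()` is a hypothetical helper, and the on-disk layout (tab-separated, no header, gene list in double quotes) is an assumption that should be checked against the **cellinit.txt** shipped in **/Input/**. The example writes to a temporary file so that no real input is overwritten.

```r
# make_cellinit() is a hypothetical helper, not part of tugHall.
# 'mutated' maps a cell ID (as a name) to a character vector of mutated genes.
make_cellinit <- function(n_cells, mutated = list()) {
  genes <- rep("", n_cells)                 # "" marks a cell without mutations
  for (id in names(mutated)) {
    genes[as.integer(id)] <- paste(mutated[[id]], collapse = ", ")
  }
  data.frame(id = seq_len(n_cells), genes = genes, stringsAsFactors = FALSE)
}

cells <- make_cellinit(1000, list("2" = "APC", "3" = c("APC", "KRAS"), "4" = "KRAS"))

# ASSUMPTION: tab-separated, no header, second column quoted.
out <- tempfile(fileext = ".txt")
write.table(cells, file = out, sep = "\t", quote = 2,
            row.names = FALSE, col.names = FALSE)
print(readLines(out, n = 4))
```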
---
<a name="outputs"></a>
# 4. Outputs
After the simulation, the output data consist of several files. The "log.txt" and "geneout.txt" files contain the input information about variables and gene names. "Weights.txt" has information about the weights of genes for hallmarks (please refer to the section ["Inputs"](#inputs)). "cellout.txt" has information about the dynamics of cell evolution and all variables.
## "log.txt" file
The file **"log.txt"** contains information about probabilities and file names. These variables are explained in the ["Inputs"](#inputs) section.
```{r, echo=FALSE, results='asis'}
x <- read.csv(file = "tugHall/Output/log.txt",header = FALSE, sep = "\t", nrows = 20, col.names = c("Variable","Value"))
x[is.na(x)] <- ""
knitr::kable(x, align = "c", caption = "**Table 3. log.txt file.** Example of log.txt file.")
```
## "geneout.txt" file
The file **"geneout.txt"** contains input information about the weights that connect the hallmarks and genes, which are defined by the user. These variables are also explained in the ["Inputs"](#inputs) section.
```{r, echo=FALSE, results='asis'}
x <- read.csv(file = "tugHall/Output/geneout.txt",header = FALSE, sep = "\t", nrows = 10, col.names = c("Gene_name","Hallmark_name", "Weight", "Suppressor_or_oncogene"))
x[is.na(x)] <- ""
knitr::kable(x, align = "c", caption = "**Table 4. geneout.txt file.** Given below is an example of the geneout.txt file.")
```
## "cellout.txt" file
The file **"cellout.txt"** contains the results of the simulation and includes the evolution data: all the output data for each cell at each timestep (only the first 10 lines are presented):
```{r, echo=FALSE, results='asis'}
x <- read.csv(file = "tugHall/Output/cellout.txt",header = TRUE, sep = "\t", nrows = 10)
x[is.na(x)] <- ""
knitr::kable(x, align = "c", caption = "**Table 5. Output data.** Example of output data for all cells. The column names correspond to the descriptions in Tables 1 and 2 and to the figures in *USER-GUIDE-Analysis*.")
```
1. **Time** - the time step, e.g., 1, 50.
2. **AvgOrIndx** - "avg" or "index": "avg" is for a line with averaged values across different (index) lines at the same time step; "index" shows the cell's index at the current time step, e.g., avg, 4, 7.
3. **ID** - the unique ID of cell, e.g., 1, 50.
4. **ParentID.Birthday** - the first number is the parent ID, the second number is the birthday time step, e.g., 0:0, 45:5.
5. **c** - the counter of cell divisions for the cell.
6. **d** - the probability of division for the cell, e.g., 0.1, 0.8.
7. **i** - the probability of immortalization for the cell, e.g., 0.1, 0.8.
8. **im** - the probability of invasion/metastasis for the cell, e.g., 0.1, 0.8.
9. **a** - the probability of apoptosis for the cell, e.g., 0.1, 0.8.
10. **k** - the probability of death due to the environment, e.g., 0.1, 0.8.
11. **E** - the E coefficient for the function of the division probability, e.g., 10^4, 10^5.
12. **N** - the number of primary tumor cells at this time step, e.g., 134, 5432.
13. **Nmax** - the theoretically maximal number of primary tumor cells, e.g., 10000, 5000.
14. **M** - the number of metastasis cells at this time step, e.g., 16, 15439.
15. **Ha** - the value of the hallmark "Apoptosis" for the cell, e.g., 0.1, 0.4444.
16. **Him** - the value of the hallmark "Invasion / Metastasis" for the cell, e.g., 0.1, 0.4444.
17. **Hi** - the value of the hallmark "Immortalization" for the cell, e.g., 0.1, 0.4444.
18. **Hd** - the value of the hallmark "Growth / Anti-growth" for the cell, e.g., 0.1, 0.4444.
19. **Hb** - the value of the hallmark "Angiogenesis" for the cell, e.g., 0.1, 0.4444.
20. **type** - the type of the cell: "0" is primary tumor cell, "1" is the metastatic cell, e.g., 0, 1.
21. **mut_den** - the density of mutations (tumor mutation burden) for the cell, e.g., 0, 0.32.
Columns 22 to 25 have names of the form **PosDriver. _gene name_**, where **_gene name_** is one of the user-defined genes.
The number of these columns equals the number of genes.
These columns show the position(s) of **driver** mutation(s) in a gene: the first number is the mutational site on the gene and the second number is the time step of the mutation, e.g., 3493:4, 4531:34.
22. **PosDriver.(Gene_1="APC")** - for the first gene.
23. **PosDriver.(Gene_2="KRAS")** - for the second gene.
24. **PosDriver.(Gene_...)** - ...
25. **PosDriver.(Gene_last="PIK3CA")** - for the last gene.
Columns 26 to 29 have names of the form **PosPassngr. _gene name_**, where **_gene name_** is one of the user-defined genes.
The number of these columns equals the number of genes.
These columns show the position(s) of **passenger** mutation(s) in a gene: the first number is the mutational site on the gene and the second number is the time step of the mutation, e.g., 8952:43, 531:4.
26. **PosPassngr.(Gene_1="APC")** - for the first gene.
27. **PosPassngr.(Gene_2="KRAS")** - for the second gene.
28. **PosPassngr.(Gene_...)** - ...
29. **PosPassngr.(Gene_last="PIK3CA")** - for the last gene.
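
Entries in the **PosDriver** and **PosPassngr** columns can be split into mutational site and time step as sketched below. `parse_positions()` is a hypothetical helper, not a tugHall function, and it assumes that several mutations in one field are separated by commas, which should be checked against the actual **cellout.txt**.

```r
# parse_positions() is a hypothetical helper, not part of tugHall.
# ASSUMPTION: several "site:time" entries in one field are comma separated.
parse_positions <- function(field) {
  if (is.na(field) || !nzchar(trimws(field))) {
    # No mutation recorded for this gene in this cell
    return(data.frame(site = integer(0), time = integer(0)))
  }
  entries <- trimws(strsplit(field, ",", fixed = TRUE)[[1]])
  pairs   <- strsplit(entries, ":", fixed = TRUE)
  data.frame(
    site = as.integer(vapply(pairs, `[`, character(1), 1)),  # mutational site
    time = as.integer(vapply(pairs, `[`, character(1), 2))   # time step
  )
}

pos <- parse_positions("3493:4, 4531:34")
print(pos)
```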
<br />
30. **Clone.number** - the clone number is calculated from the binary code of driver mutations. If a gene is mutated, then its binary code value is 1, and if not, it is 0. For example, if the simulation has only 4 genes, "Clone.number" can take binary values from 0000 to 1111, which correspond to decimal numbers from 0 to 15, e.g., 15, 4.
31. **Passengers.Clone.number** - same as for "Clone.number", but for passenger mutations, e.g., 15, 4.
32. **Mix.Clone.number** - same as for "Clone.number", but for passenger and driver mutations together. In this case, the binary number is twice as long as in the driver case, e.g., 35, 16.
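
The binary encoding behind "Clone.number" can be illustrated as follows. `clone_number()` is a hypothetical helper written for this guide, and the bit order (first gene as the most significant bit) is an assumption; tugHall may order the genes differently.

```r
# clone_number() is a hypothetical helper, not part of tugHall.
# 'mutated' is a logical vector with one entry per gene (TRUE = mutated).
# ASSUMPTION: the first gene is the most significant bit.
clone_number <- function(mutated) {
  bits <- as.integer(mutated)
  sum(bits * 2^rev(seq_along(bits) - 1))
}

all_mutated <- clone_number(c(TRUE, TRUE, TRUE, TRUE))     # binary 1111
one_mutated <- clone_number(c(FALSE, TRUE, FALSE, FALSE))  # binary 0100
print(c(all_mutated, one_mutated))
```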
---
<a name="run"></a>
# 5. How to run
To run the simulation, please follow this procedure:
1. Copy **/tugHall/** directory into the working directory.
2. Change (`cd`) to the **/tugHall/** directory.
3. Run the **tugHall.R** file, using the command line like
`R --vanilla < tugHall.R`
or using the line by line procedure in **R Studio**. In this case we have:
- load the library with **`library(stringr)`** and run **`source(file = "Code/tugHall_functions.R")`**;
- create the Output and Figures directories, if needed;
- define the simulation parameters;
- make the input file for initial cells, if needed;
- run the *model()* function to simulate;
- run **`source("Code/Analysis.R")`** in order to analyze the results and plot the figures in the dialogue box (see **User-Guide-Analysis**).
4. To obtain analysis reports of the simulation, please refer to **User-Guide-Analysis.Rmd**.
In **User-Guide-Analysis.Rmd**, commands are embedded to include files under **Output/** and **Figures/**. So, after analysis with tugHall, you can generate analysis reports automatically from **User-Guide-Analysis.Rmd**. For more details, please refer to "Writing reproducible reports in R" (https://nicercode.github.io/guides/reports/).
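
The step "create the Output and Figures directories, if needed" from the line-by-line procedure can be sketched as below. `ensure_dirs()` is a hypothetical helper, not part of tugHall, and the example runs in a temporary directory so that it does not touch a real installation.

```r
# ensure_dirs() is a hypothetical helper, not part of tugHall.
# It creates the listed sub-directories under 'root' only if they are missing.
ensure_dirs <- function(root = ".", dirs = c("Output", "Figures")) {
  paths <- file.path(root, dirs)
  for (p in paths) {
    if (!dir.exists(p)) dir.create(p, recursive = TRUE)
  }
  invisible(paths)
}

# Demonstration in a temporary location
root    <- file.path(tempdir(), "tugHall_demo")
created <- ensure_dirs(root)
print(all(dir.exists(created)))
```

Inside the **/tugHall/** directory the same helper would be called as `ensure_dirs(".")` before the simulation is run.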