-
Notifications
You must be signed in to change notification settings - Fork 15
/
infercna_tutorial.Rmd
259 lines (187 loc) · 7.59 KB
/
infercna_tutorial.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
---
title: "Tutorial with a single-cell dataset"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Tutorial}
%\VignetteEngine{knitr::rmarkdown}
\usepackage[utf8]{inputenc}
---
```{r setup, include = FALSE}
knitr::opts_chunk$set(collapse = T, comment = "#>")
options(tibble.print_min = 4L, tibble.print_max = 4L)
knitr::opts_knit$set(warning = FALSE, error = TRUE)
library(infercna)
library(magrittr)
set.seed(1014)
```
## Prepare your data
### Set your genome with `useGenome()`
```{r}
useGenome('hg19')
retrieveGenome()
```
### Load example data with `useData()`
``` {r}
m = useData(mgh125)
dim(m)
range(m)
lengths(refCells)
```
## Infer CNAs and plot a heatmap
### Infer cells' CNA profiles with `infercna()`
``` {r}
args(infercna)
cna = infercna(m = m, refCells = refCells, n = 5000, noise = 0.1, isLog = TRUE, verbose = FALSE)
cnaM = cna[, !colnames(cna) %in% unlist(refCells)]
```
### Plot heatmap with `cnaPlot()`
``` {r}
args(cnaPlot)
obj = cnaPlot(cna = cnaM,
order.cells = TRUE,
subtitle = 'Copy-Number Aberrations in a patient with Glioblastoma')
names(obj)
obj$data
obj$p
```
### Ordering cells in `cnaPlot()`
In cases where you can see the subclones in the heatmap but don't yet have the subclone assignments to direct the cell clustering, you can instead give a set of chromosome and chromosome arms that you would like the clustering to use. In the example above, we probably want the cells belonging to the presumed subclones (chromosome 2, chromosome arm 4p, etc) to be grouped together. We can specify this using the `order.with` argument in `infercna::cnaPlot`:
```{r}
obj = cnaPlot(cna = cnaM,
orderCells = TRUE,
order.with = c(2, "4p", 7, 10, 12, 13, 15),
subtitle = 'Copy-Number Aberrations in a patient with Glioblastoma')
obj$p
```
## Find malignant cells
In the above section, we filtered out the `infercna::refCells` that we know are normal prior to looking at the CNA heatmap. In reality, our reference normal cells will only comprise a subset of all normal cells in the data, and we will need to identify the remaining ones to exclude them from downstream analyses concerning the malignant cells *only*.
infercna defines two parameters, `infercna::cnaCor` and `infercna::cnaSignal` that quantify the *extent* of copy-number aberrations in individual cells and thus help to separate the malignant and non-malignant subsets.
### Compute `cnaSignal()` and `cnaCor()`
#### `cnaSignal()`
CNA signal reflects the overall extent of CNAs. It's defined as the mean of the squares of CNA values across the genome and should therefore accentuate genome-wide differences in CNA profiles between malignant and non-malignant cells.
```{r}
args(cnaSignal)
```
#### `cnaCor()`
CNA corrleation refers to the correlation between the CNA profile of each cell and the average CNA profile of all cells. Best results are seen when correlating cells to the **average CNA profile of cells from the corresponding tumour** and, if possible, **excluding from the average those already classified as non-malignant**.
```{r}
args(cnaCor)
```
#### Visualise CNA Signal and Correlation with `cnaScatterPlot()`
Plotting `cnaCor` against `cnaSignal` for all cells is a good first approximation to the malignant and non-malignant groups. `infercna::cnaScatterPlot` simply integrates these two function calls and their respective parameters into one function, and plots the results.
```{r}
args(cnaScatterPlot)
```
Let's quickly go over the note-worthy arguments:
* `<threshold>` : of which genes to include in calculation; it determines what fraction of genes with top CNA signal to keep.
+ in `cnaSignal()`
+ in `cnaCor()`
* `<cor.threshold>` : inherits from `<threshold>`; can be supplied if the threshold intended is specific to `infercna::cnaCor` calculation.
+ in `cnaCor()`
* `<signal.threshold>` : inherits from `<threshold>`; can be supplied if the threshold intended is specific to `infercna::cnaSignal` calculation.
+ in `cnaSignal()`
* `<samples>` : used to determine tumour-specific CNA profiles; can be one of:
+ names of tumour samples (unique)
+ vector of tumour samples of length equal to the number of columns and ordered correspondingly.
* `<excludeFromAvg>` : names of the (normal) cells to be excluded from the average CNA profile(s).
```{r}
cnaScatterPlot(cna = cna,
signal.threshold = NULL,
main = 'Default')
cnaScatterPlot(cna = cna,
signal.threshold = 0.9,
main = 'threshold: 0.9')
```
```{r}
cnaScatterPlot(cna = cna,
signal.threshold = 0.9,
main = 'signal threshold = 0.9')
cnaScatterPlot(cna = cna,
threshold = 0.9,
main = 'signal and cor threshold: 0.9')
cnaScatterPlot(cna = cna,
threshold = 0.9,
samples = 'MGH125',
excludeFromAvg = unlist(refCells),
main = "threshold: 0.9, samples:'MGH125', excludeFromAvg",
group = unlist(refCells))
```
#### `threshold` finds top genes with `cnaHotspotGenes()`
When the `<threshold>` argument is set (to a value between 0 and 1) as above, `infercna::cnaHotspotGenes()` is called, and does two things:
1. Compute CNA signal values for each *gene*
2. Return genes with values of CNA signal in the top nth quantile
+ *where n is the value in threshold*
```{r}
args(cnaHotspotGenes)
# Number of genes in top 50%
length(cnaHotspotGenes(cna = cnaM, threshold = .5))
# Number of genes in top 10%
length(cnaHotspotGenes(cna = cnaM, threshold = .9))
```
### Find malignant cells with `findMalignant()`
`infercna::findMalignant()` provides a first attempt at identifying malignant (and non-malignant) cells in the data. It fits bimodal distributions to the cells' `cnaSignal()` and `cnaCor()` values, respectively. If two modes are found for each parameter, these are cross-checked and -- if compatible --, joined and returned. If any of these steps fail, the return value is `FALSE`.
```{r}
args(findMalignant)
```
```{r, warning = FALSE, results = 'hide', message = FALSE}
Modes = findMalignant(cna, signal.threshold = .9, samples = 'MGH125')
```
```{r}
lengths(Modes)
scrabble::jaccard(Modes, refCells)
names(Modes) = c('nonmalignant', 'malignant')
```
## Find genetic subclones
```{r}
args(findClones)
cnaCancer = cna[, Modes$malignant]
L = Map(filterGenes, value = c(2, '4p', 15), attribute = c('chr', 'arm', 'chr'), MoreArgs = list(cnaCancer))
```
```{r, warning = FALSE, message = FALSE, results = 'hide'}
out = sapply(L, fitBimodal, assign = T)
```
```{r}
lengths(out)
```
```{r, warning = FALSE, message = FALSE, results = 'hide'}
out[[3]] = fitBimodal(L[[3]], assign = T, bySampling = T, nsamp = 300)
```
```{r}
lengths(out)
sapply(out, lengths)
expandToClones(out, greaterThan = 3) %>% lengths %>% as.matrix
```
## Additional featuers
* `splitGenes()`
* `filterGenes()`
* `orderGenes()`
* `genesOn()`
* `modality()`
* `fitBimodal()`
### Split, Filter and Order vectors or matrices by Chromosome (Arms)
* `splitGenes()`
```{r}
genesByChr = splitGenes(rownames(m), by = 'chr')
lengths(genesByChr)
datByChr = splitGenes(m, by = 'arm')
names(datByChr)
```
* `filterGenes()`
```{r}
filterGenes(m, '4p', 'arm') %>% nrow
filterGenes(m, '4p', 'arm', out = TRUE) %>% nrow
filterGenes(m, 2, 'chr') %>% nrow
filterGenes(m, c(2, '4p'), c('chr', 'arm')) %>% nrow
```
* `orderGenes()`
```{r}
# you can provide a matrix or ch
all(orderGenes(rownames(m)) == rownames(orderGenes(m)))
```
### Find genes with `genesOn()`
```{r}
chr7 = genesOn(7, 'chr')
# First 5 genes on chromosome 7:
head(chr7, 5)
'EGFR' %in% chr7
```