-
Notifications
You must be signed in to change notification settings - Fork 1
/
rnaseq.Rmd
86 lines (60 loc) · 2.29 KB
/
rnaseq.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
---
title: "Analyzing RNA-seq data"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Analyzing RNA-seq data}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = '#>',
message = FALSE,
fig.align = 'center',
fig.retina = 2,
eval = FALSE)
foreach::registerDoSEQ()
```
## Introduction
Here we show two options for using `limorhyde2` to analyze RNA-seq data: [limma-voom](https://bioconductor.org/packages/release/bioc/html/limma.html) and [DESeq2](https://bioconductor.org/packages/release/bioc/html/DESeq2.html). The two approaches give very similar results.
This vignette assumes you are starting with the output of [tximport](https://bioconductor.org/packages/release/bioc/html/tximport.html).
## Load the data
You will need two objects:
1. `txi`, a list from `tximport`
2. `metadata`, a `data.frame` having one row per sample
The rows in `metadata` must correspond to the columns of the elements of `txi`.
```{r}
library('limorhyde2')
# txi = ?
# metadata = ?
```
## Filter out lowly expressed genes
There are many reasonable strategies to do this, here is one.
```{r}
keep = rowSums(edgeR::cpm(txi$counts) >= 0.5) / ncol(txi$counts) >= 0.75
txiKeep = txi
for (name in c('counts', 'length')) {
txiKeep[[name]] = txi[[name]][keep, ]}
```
## Fill in counts for sample-gene pairs having zero counts
This avoids unrealistically low log2 CPM values and thus artificially inflated effect size estimates.
```{r}
for (i in seq_len(nrow(txiKeep$counts))) {
idx = txiKeep$counts[i, ] > 0
txiKeep$counts[i, !idx] = min(txiKeep$counts[i, idx])}
```
## Option 1: limma-voom
```{r}
y = edgeR::DGEList(txiKeep$counts)
y = edgeR::calcNormFactors(y)
fit = getModelFit(y, metadata, ..., method = 'voom') # replace '...' as appropriate for your data
```
## Option 2: DESeq2
The second and third arguments to `DESeqDataSetFromTxImport()` are required, but will not be used by `limorhyde2`.
```{r}
y = DESeq2::DESeqDataSetFromTximport(txiKeep, metadata, ~1)
fit = getModelFit(y, metadata, ..., method = 'deseq2') # replace '...' as appropriate for your data
```
## Continue using `limorhyde2`
Regardless of which option you choose, the next steps are the same: `getPosteriorFit()`, `getRhythmStats()`, etc.