-
Notifications
You must be signed in to change notification settings - Fork 14
/
dmrseq.Rd
172 lines (151 loc) · 7.68 KB
/
dmrseq.Rd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/dmrseq.R
\name{dmrseq}
\alias{dmrseq}
\title{Main function for detecting and evaluating significance of DMRs.}
\usage{
dmrseq(bs, testCovariate, adjustCovariate = NULL, cutoff = 0.1,
minNumRegion = 5, smooth = TRUE, bpSpan = 1000, minInSpan = 30,
maxGapSmooth = 2500, maxGap = 1000, verbose = TRUE,
maxPerms = 10, matchCovariate = NULL, BPPARAM = bpparam(),
stat = "stat", block = FALSE, blockSize = 5000, chrsPerChunk = 1)
}
\arguments{
\item{bs}{bsseq object containing the methylation values as well as the
phenotype matrix that contains sample level covariates}
\item{testCovariate}{Character value indicating which variable
(column name) in \code{pData(bs)} to test
for association of methylation levels.
Can alternatively specify an integer value indicating
which of column of
\code{pData(bs)} to use. This is used to construct the
design matrix for the test statistic calculation. To run using a
continuous or categorial covariate with more than two groups, simply pass in
the name of a column in `pData` that contains this covariate. A continuous
covariate is assmued if the data type in the `testCovariate` slot is
continuous, with the exception of if there are only two unique values
(then a two group comparison is carried out).}
\item{adjustCovariate}{an (optional) character value or vector
indicating which variables (column names) in \code{pData(bs)}
will be adjusted for when
testing for the association of methylation value with the
\code{testCovariate}.
Can alternatively specify an
integer value or vector indicating
which of the columns of \code{pData(bs)} to adjust for.
If not NULL (default), then this is also used to
construct the design matrix for the test statistic calculation.}
\item{cutoff}{scalar value that represents the absolute value (or a vector
of two numbers representing a lower and upper bound) for the cutoff of
the single CpG coefficient that is used to discover
candidate regions. Default value is 0.10.}
\item{minNumRegion}{positive integer that represents the minimum number of
CpGs to consider for a candidate region. Default value is 5.
Minimum value is 3.}
\item{smooth}{logical value that indicates whether or not to smooth the
CpG level signal when discovering candidate regions.
Defaults to TRUE.}
\item{bpSpan}{a positive integer that represents the length in basepairs
of the smoothing span window if \code{smooth} is TRUE. Default value is
1000.}
\item{minInSpan}{positive integer that represents the minimum number of
CpGs in a smoothing span window if \code{smooth} is TRUE.
Default value is 30.}
\item{maxGapSmooth}{integer value representing maximum number of basepairs
in between neighboring CpGs to be included in the same
cluster when performing smoothing (should generally be larger than
\code{maxGap})}
\item{maxGap}{integer value representing maximum number of basepairs in
between neighboring CpGs to be included in the same DMR.}
\item{verbose}{logical value that indicates whether progress messages
should be printed to stdout. Defaults value is TRUE.}
\item{maxPerms}{a positive integer that represents the maximum number
of permutations that will be used to generate the global null
distribution of test statistics. Default value is 10.}
\item{matchCovariate}{An (optional) character value
indicating which variable (column name) of \code{pData(bs)}
will be blocked for when
constructing the permutations in order to
test for the association of methylation value with the
\code{testCovariate}, only to be used when \code{testCovariate}
is a two-group factor and the number of permutations possible is less
than 500000.
Alternatively, you can specify an integer value indicating
which column of \code{pData(bs)} to block for.
Blocking means that only permutations with balanced
composition of \code{testCovariate} values will be used (for example if
you have samples from different gender and this is not your covariate of
interest,
it is recommended to use gender as a matching covariate to avoid one
of the permutations testing entirely males versus females; this violates
the null hypothesis and will decrease power).
If not NULL (default), then no blocking is performed.}
\item{BPPARAM}{a \code{BiocParallelParam} object to specify the parallel
backend. The default
option is \code{BiocParallel::bpparam()} which will automatically creates
a cluster appropriate for the operating system.}
\item{stat}{a character vector indicating the name of the column of the
output to use as the region-level test statistic. Default value is 'stat'
which is the region level-statistic designed to be comparable across the
genome.
It is not recommended to change this argument, but it can be done for
experimental purposes. Possible values are: 'L' - the number of loci
in the region, 'area' - the sum of the smoothed loci statistics,
'beta' - the effect size of the region, 'stat' - the test statistic for
the region, or 'avg' - the average smoothed loci statistic.}
\item{block}{logical indicating whether to search for large-scale (low
resolution) blocks of differential methylation (default is FALSE, which
means that local DMRs are desired). If TRUE, the parameters for
\code{bpSpan}, \code{minInSpan}, and \code{maxGapSmooth} should be adjusted
(increased) accordingly. This setting will also merge
candidate regions that (1) are in the same direction and (2) are less than
1kb apart with no covered CpGs separating them. The region-level model used
is also slightly modified - instead of a loci-specific intercept for each
CpG in theregion, the intercept term is modeled as a natural spline with
one interior knot per each 10kb of length (up to 10 interior knots).}
\item{blockSize}{numeric value indicating the minimum number of basepairs
to be considered a block (only used if \code{block}=TRUE). Default is
5000 basepairs.}
\item{chrsPerChunk}{a positive integer value indicating the number of
chromosomes per chunk. The default is 1, meaning that the data will be
looped through one chromosome at a time. When pairing up multiple
chromosomes per chunk, sizes (in terms of numbers of CpGs) will be taken
into consideration to balance the sizes of each chunk.}
}
\value{
a \code{GRanges} object that contains the results of the inference.
The object contains one row for each candidate region, sorted by q-value
and then chromosome. The standard
\code{GRanges} chr, start, and end are included, along with at least
7 metadata
columns, in the following order:
1. L = the number of CpGs contained in the region,
2. area = the sum of the smoothed beta values
3. beta = the coefficient value for the condition difference (there
will be more than one column here if a multi-group comparison
was performed),
4. stat = the test statistic for the condition difference,
5. pval = the permutation p-value for the significance of the test
statistic, and
6. qval = the q-value for the test statistic (adjustment
for multiple comparisons to control false discovery rate).
7. index = an \code{IRanges} containing the indices of the region's
first CpG to last CpG.
}
\description{
Performs a two-step approach that (1) detects candidate regions, and
(2) scores candidate regions with an exchangeable (across the genome)
statistic and evaluates statistical significance using a
permuation test on the pooled null distribution of scores.
}
\examples{
# load example data
data(BS.chr21)
# the covariate of interest is the 'CellType' column of pData(BS.chr21)
testCovariate <- 'CellType'
# run dmrseq on a subset of the chromosome (10K CpGs)
regions <- dmrseq(bs=BS.chr21[240001:250000,],
cutoff = 0.05,
testCovariate=testCovariate)
}
\keyword{inference}