man/dmrseq.Rd

% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/dmrseq.R
\name{dmrseq}
\alias{dmrseq}
\title{Main function for detecting and evaluating significance of DMRs.}
\usage{
dmrseq(bs, testCovariate, adjustCovariate = NULL, cutoff = 0.1,
  minNumRegion = 5, smooth = TRUE, bpSpan = 1000, minInSpan = 30,
  maxGapSmooth = 2500, maxGap = 1000, verbose = TRUE,
  maxPerms = 10, matchCovariate = NULL, BPPARAM = bpparam(),
  stat = "stat", block = FALSE, blockSize = 5000, chrsPerChunk = 1)
}
\arguments{
\item{bs}{bsseq object containing the methylation values as well as the 
phenotype matrix that contains sample level covariates}

\item{testCovariate}{Character value indicating which variable
(column name) in \code{pData(bs)} to test
for association of methylation levels. 
Can alternatively specify an integer value indicating
which of column of
\code{pData(bs)} to use. This is used to construct the 
design matrix for the test statistic calculation. To run using a 
continuous or categorial covariate with more than two groups, simply pass in
the name of a column in `pData` that contains this covariate. A continuous
covariate is assmued if the data type in the `testCovariate` slot is 
continuous, with the exception of if there are only two unique values 
(then a two group comparison is carried out).}

\item{adjustCovariate}{an (optional) character value or vector 
indicating which variables (column names) in \code{pData(bs)} 
will be adjusted for when 
 testing for the association of methylation value with the 
 \code{testCovariate}. 
Can alternatively specify an
integer value or vector indicating
 which of the columns of \code{pData(bs)} to adjust for.
 If not NULL (default), then this is also used to 
 construct the design matrix for the test statistic calculation.}

\item{cutoff}{scalar value that represents the absolute value (or a vector 
of two numbers representing a lower and upper bound) for the cutoff of 
the single CpG coefficient that is used to discover 
candidate regions. Default value is 0.10.}

\item{minNumRegion}{positive integer that represents the minimum number of
CpGs to consider for a candidate region. Default value is 5. 
Minimum value is 3.}

\item{smooth}{logical value that indicates whether or not to smooth the 
CpG level signal when discovering candidate regions.
Defaults to TRUE.}

\item{bpSpan}{a positive integer that represents the length in basepairs
of the smoothing span window if \code{smooth} is TRUE.  Default value is 
1000.}

\item{minInSpan}{positive integer that represents the minimum number of
CpGs in a smoothing span window if \code{smooth} is TRUE.  
Default value is 30.}

\item{maxGapSmooth}{integer value representing maximum number of basepairs
in between neighboring CpGs to be included in the same 
cluster when performing smoothing (should generally be larger than
\code{maxGap})}

\item{maxGap}{integer value representing maximum number of basepairs in 
between neighboring CpGs to be included in the same DMR.}

\item{verbose}{logical value that indicates whether progress messages
should be printed to stdout. Defaults value is TRUE.}

\item{maxPerms}{a positive integer that represents the maximum number 
of permutations that will be used to generate the global null 
distribution of test statistics.  Default value is 10.}

\item{matchCovariate}{An (optional) character value 
indicating which variable (column name) of \code{pData(bs)} 
will be blocked for when 
constructing the permutations in order to
test for the association of methylation value with the 
\code{testCovariate}, only to be used when \code{testCovariate}
is a two-group factor and the number of permutations possible is less
than 500000.
Alternatively, you can specify an integer value indicating
which column of \code{pData(bs)} to block for.
Blocking means that only permutations with balanced
composition of \code{testCovariate} values will be used (for example if
you have samples from different gender and this is not your covariate of
interest, 
it is recommended to use gender as a matching covariate to avoid one 
of the permutations testing entirely males versus females; this violates
the null hypothesis and will decrease power).
If not NULL (default), then no blocking is performed.}

\item{BPPARAM}{a \code{BiocParallelParam} object to specify the parallel 
backend. The default 
option is \code{BiocParallel::bpparam()} which will automatically creates
a cluster appropriate for the operating system.}

\item{stat}{a character vector indicating the name of the column of the 
output to use as the region-level test statistic. Default value is 'stat'
which is the region level-statistic designed to be comparable across the
genome.
It is not recommended to change this argument, but it can be done for
experimental purposes. Possible values are: 'L' - the number of loci
in the region, 'area' - the sum of the smoothed loci statistics,
'beta' - the effect size of the region, 'stat' - the test statistic for
the region, or 'avg' - the average smoothed loci statistic.}

\item{block}{logical indicating whether to search for large-scale (low
resolution) blocks of differential methylation (default is FALSE, which
means that local DMRs are desired). If TRUE, the parameters for 
\code{bpSpan}, \code{minInSpan}, and \code{maxGapSmooth} should be adjusted
(increased) accordingly. This setting will also merge
candidate regions that (1) are in the same direction and (2) are less than 
1kb apart with no covered CpGs separating them. The region-level model used 
is also slightly modified - instead of a loci-specific intercept for each 
CpG in theregion, the intercept term is modeled as a natural spline with 
one interior knot per each 10kb of length (up to 10 interior knots).}

\item{blockSize}{numeric value indicating the minimum number of basepairs 
to be considered a block (only used if \code{block}=TRUE). Default is 
5000 basepairs.}

\item{chrsPerChunk}{a positive integer value indicating the number of 
chromosomes per chunk. The default is 1, meaning that the data will be 
looped through one chromosome at a time. When pairing up multiple 
chromosomes per chunk, sizes (in terms of numbers of CpGs) will be taken
into consideration to balance the sizes of each chunk.}
}
\value{
a \code{GRanges} object that contains the results of the inference. 
   The object contains one row for each candidate region, sorted by q-value
   and then chromosome. The standard 
   \code{GRanges} chr, start, and end are included, along with at least
   7 metadata
   columns, in the following order: 
   1. L = the number of CpGs contained in the region,
   2. area = the sum of the smoothed beta values
   3. beta = the coefficient value for the condition difference (there 
      will be more than one column here if a multi-group comparison
      was performed),
   4. stat = the test statistic for the condition difference,
   5. pval = the permutation p-value for the significance of the test
   statistic, and 
   6. qval = the q-value for the test statistic (adjustment
   for multiple comparisons to control false discovery rate).
   7. index = an \code{IRanges} containing the indices of the region's 
      first CpG to last CpG.
}
\description{
Performs a two-step approach that (1) detects candidate regions, and
(2) scores candidate regions with an exchangeable (across the genome)
statistic and evaluates statistical significance using a 
permuation test on the pooled null distribution of scores.
}
\examples{

# load example data 
data(BS.chr21)

# the covariate of interest is the 'CellType' column of pData(BS.chr21)
testCovariate <- 'CellType'

# run dmrseq on a subset of the chromosome (10K CpGs)
regions <- dmrseq(bs=BS.chr21[240001:250000,],
                 cutoff = 0.05,
                 testCovariate=testCovariate)

}
\keyword{inference}