# esterpantaleo/multiseq

Switch branches/tags
Nothing to show
Fetching contributors…
Cannot retrieve contributors at this time
174 lines (127 sloc) 8.04 KB

# An Introduction to the multiseq package

``````library(knitr)
opts_chunk\$set(out.extra='style="display:block; margin: auto"', fig.align="center")
``````

The multiseq package is an R package for multiscale sequence analysis and is ongoing work in the Stephens lab at the University of Chicago. Multiseq has two main modes of operation: smoothing and effect estimation....

In this vignette you will learn how to install the package, run multiseq in its two different modes of operation, and visualize its output (and input data) either in R or in the UCSC Genome Browser as a Track Hub. In the Genome Browser you will be able to compare your track with any of the available tracks (already uploaded) and to fine-tune the display (e.g. zoom and scroll the tracks display, highlight a region, change the order of the displayed tracks, etc).

``````
``````

## Running `multiseq`

In this section you will learn how to use `multiseq` to smooth a signal and to estimate an effect given a covariate `g`.

First load the package then load the example data included in the package.

``````    library(multiseq)
#load example data - R object
data(example1, package="multiseq")
x          <- dat\$x
g          <- dat\$g
``````

Then select a subset of the signal and apply `multiseq` to perform smoothing.

``````    #smoothing
par(mfrow=c(2,1), oma = c(5,4,0,0) + 0.1, mar = c(0,0,1,1) + 0.1)
for (i in which(g==1)) plot(log(x[i,]), type="l", xlab="Position", ylab="log(x)")
unique(sort(log(x[1,])))
#plot baseline mean +/- 2 posterior standard deviations
invisible(dev.off())
plot(res0, fra=2, what="baseline")
``````

Now, given a covariate `g`, you can estimate an effect:

``````    #estimating an effect
#plot estimated effect mean +/- 2 posterior standard deviations
plot(res, fra=2)

#print intervals where `multiseq` found a strong effect (zero is outside of +/- fra posterior standard deviations
res\$intervals <- get.effect.intervals(res, fra=2)
res\$intervals
``````

Function `get.effect.intervals` outputs intervals where `multiseq` found strong effect (zero is outside of +/- a specified multiple of the posterior standard deviations). Output interval is in bed format (start is 0-based, end is 1-based).

## Running `multiseq` on sequencing data

Special tools are required to handle next generation sequencing data because of their high throughput nature. Multiseq's function `get.counts` uses `samtools`, the UCSC tools, or the R package rhdf5 to read data in bam, bigWig, or hdf5 format.

To run on sequencing data, multiseq requires a samplesheet with the following format:

``````SampleID Type Replicate Peaks ReadDepth bigWigPath
A1 Control 1 peakA1.bb 16335812 A1.bw
A2 Control 2 peakA2.bb 18197248 A2.bw
B1 Test 1 peakB1.bb 24225586 B1.bw
B2 Test 2 peakB2.bb 12378544 B2.bw
``````

The following fields are required: "SampleID" containing sample IDs, "ReadDepth" specifying sequencing depth for each sample, "bigWigPath" and/or "h5FullPath" and/or "bamReads" specifying the path to count data files in bam, bigWig, or hdf5 format, respectively. Field "Peaks" is not required but can be used to specify the path to a bed or bigBed file (e.g., the path to a bigBed file with ChipSeq peaks). If "Peaks" is specified then also field "Type" is required.

You can load data using the samplesheet as follows:

``````    setwd(file.path(path.package("multiseq"),"extdata","sim"));
samplesheet <- file.path(path.package("multiseq"),"extdata","sim","samplesheet.sim.txt")
g <- factor(samples\$Type)
g <- match(g, levels(g))-1
if (noExecutable("wigToBigWig")){
data(example2, package="multiseq")
}else{
region      <- "chr1:154206209-154214400"
x           <- get.counts(samples, region)
}
``````

To smooth a subset of the data:

``````    par(mfrow=c(2,1), oma = c(5,4,0,0) + 0.1, mar = c(0,0,1,1) + 0.1)
for (i in which(g==0)) plot(log(x[i,]), type="l", xlab="Position", ylab="log(x)")
#plot estimated log baseline +- 2 s.d.
invisible(dev.off())
res0\$region <- region
plot(res0, fra=2, what="baseline")
``````

To estimate an effect given the covariate `g`:

``````    res <- multiseq(x=x, g=g, minobs=1, lm.approx=FALSE, read.depth=samples\$ReadDepth)
#plot estimated effect and s.d.
par(mfrow=c(2,1), oma = c(5,4,0,0) + 0.1, mar = c(0,0,1,1) + 0.1)
res\$region <- region
plot(res, fra=2, is.xaxis=FALSE)
transcripts   <- get.transcripts(file.path(path.package("multiseq"),"extdata","sim","hg19.ensGene.part.gp"), region)
plot(transcripts, region)
``````

### Visualizing input data using `samplesheetToTrackHub`

With function `samplesheetToTrackHub` you can create a Track Hub and visualize your input data in the UCSC Genome Browser. Before you use this function, follow installation instructions in the "Optional steps" paragraph above.

``````    setwd(file.path(path.package("multiseq"),"extdata","sim"))
hub_name <- "testMultiseq/sim"
samplesheetToTrackHub(samplesheet, hub_name, chr="chr1")
``````

Function `samplesheetToTrackHub` will create a Track Hub in folder /some/path/testMultiseq/sim/ and will print the following message:

``````go to http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&hubUrl=https:some/address/testMultiseq/sim/hub.txt
center the genome browser on the region of interest
and make track visible
``````

If the read tracks or the bed files are large, make sure enough memory is available to run `simulationToTrackHub`.

This is a screenshot of the Track Hub in the UCSC Genome Browser:

## Visualizing output data using `multiseqToTrackHub`

After running `multiseq`, you can use function `multiseqToTrackHub` to create a Track Hub that can be visualized in the UCSC Genome Browser to display

• the effect +- a specified multiple (fra) of the posterior standard deviation
• the intervals where multiseq found strong effect (zero is outside of +/- a multiple of the posterior standard deviation ' e.g. 2 or 3).
``````    res\$region    <- region
res\$intervals <- get.effect.intervals(res, fra=2)
multiseqToTrackHub(res, fra=2, hub_name="testMultiseq/multiseq_sim")
``````

Function `multiseqToTrackHub` will create a Track Hub named multiseq_sim in the https:some/address/testMultiseq/ folder and will print the following message:

``````go to http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&hubUrl=https:some/address/testMultiseq/multiseq_sim/hub.txt
center your genome browser around chr1:154206209-154214400
and make track visible
``````

By default `multiseqToTrackHub` uses the human genome hg19 but default settings can be changed by specifying additional arguments (see help). This is a screenshot of the Track Hub in the UCSC Genome Browser:

## Note

If the Genome Browser doesn't show the Track Hub try going to http://genome.ucsc.edu/cgi-bin/hgHubConnect/MyHubs and checking the Track Hub.