This section should give you an overview of how to do many common tasks. We're using screenshots from Galaxy here. If you're using the command-line version you can easily follow the given examples since the vast majority of parameters is either indicated in Galaxy, too. Otherwise, just type the program name and the help option (e.g. /deepTools/bin/bamCoverage --help
), which will show you all the parameters and options available. Alternatively, you can follow the respective link to the tool documentation here on readthedocs.
Note
Do let us know if you spot things that are missing or should be explained better! Just send an email to deeptools@googlegroups.com.
All protocols assume that you have uploaded your files into a Galaxy instance with a deepTools installation, e.g., deepTools Galaxy. If you need help to get started with Galaxy in general, e.g. to upload your data, see help_galaxy_intro
and help_galaxy_dataup
.
Tip
If you would like to try out the protocols with sample data, go to deepTools Galaxy --> "Shared Data" --> "Data Libraries" --> "deepTools Test Files". Simply select BED/BAM/bigWig files and click, "to History". You can also download the test data sets to your computer by clicking "Download" at the top.
How to do...?
- tool:
tools/bamCoverage
- input: your
BAM
file with aligned reads
Of course, you could also look at your BAM file in the genome browser. However, generating a bigWig file of read coverages will not drastically reduce the size of the file, it also allows you to normalize the coverage to 1x sequencing depth, which makes a visual comparison of multiple files more feasible.
Typically, you're going to be interested in the correlation of the read coverages for different replicates and different samples. What you want to see is that replicates should correlate better than non-replicates. The ENCODE consortium recommends that for messenger RNA, (...) biological replicates [should] display 0.9 correlation for transcripts/features. For more information about correlation calculations, see the background description for tools/plotCorrelation
.
- tools:
tools/multiBamSummary
followed bytools/plotCorrelation
- input: BAM files
- you can compare as many samples as you want, though the more you use the longer the computation will take
Tip
If you would like to do a similar analysis based on bigWig files, use the tool multiBigwigSummary
instead.
- input:
BAM
file - use the tool
tools/computeGCBias
on that BAM file (default settings, just make sure your reference genome and genome size are matching)
- have a look at the image that is produced and compare it to the examples
here <computeGCBias_example_image>
if your sample shows an almost linear increase in exp/obs coverage (on the log scale of the lower plot), then you should consider correcting the GC bias - if you think that the biological interpretation of this data would otherwise be compromised (e.g. by comparing it to another sample that does not have an inherent GC bias)
- the GC bias can be corrected with the tool
tools/correctGCBias
using the second output of the computeGCbias tool that you had to run anyway
- the GC bias can be corrected with the tool
Warning
correctGCbias
will add reads to otherwise depleted regions (typically GC-poor regions), that means that you should not remove duplicates in any downstream analyses based on the GC-corrected BAM file. We therefore recommend removing duplicates before doing the correction so that only those duplicate reads are kept that were produced by the GC correction procedure.
- input: you need two BAM files, one for the input and one for the ChIP-seq experiment
- tool:
tools/bamCompare
with ChIP = treatment, input = control sample
- tool:
tools/plotFingerprint
- input: as many BAM files of ChIP-seq samples as you'd like to compare (it is helpful to include the input control to see what a hopefully non-enriched sample looks like)
Tip
For more details on the interpretation of the plot, see tools/plotFingerprint
or select the tool within the deepTools Galaxy and scroll down for more information.
How do I get a (clustered) heatmap of sequencing-depth-normalized read coverages around the transcription start site of all genes?
- tools:
tools/computeMatrix
, thentools/plotHeatmap
- inputs:
- 1
bigWig
file of normalized read coverages (e.g. the output oftools/bamCoverage
ortools/bamCompare
) - 1
BED
or INTERVAL file of genes, e.g. obtained through Galaxy via "Get Data" --> "UCSC main table browser" --> group: "Genes and Gene Predictions" --> (e.g.) "RefSeqGenes" --> send to Galaxy (see screenshots below)
- 1
- use
tools/computeMatrix
with the bigWig file and the BED file - indicate
reference-point
(and whatever other option you would like to tune, see screenshot below)
- use the output from
tools/computeMatrix
withtools/plotHeatmap
- if you would like to cluster the signals, choose
k-means clustering
(last option of "advanced options") with a reasonable number of clusters (usually between 2 to 7)
- if you would like to cluster the signals, choose
- use the output from
How can I compare the average signal for X-specific and autosomal genes for 2 or more different sequencing experiments?
Make sure you're familiar with computeMatrix and plotProfile before using this protocol.
- tools:
- Filter data on any column using simple expressions
- computeMatrix
- plotProfile
- (plotting the summary plots for multiple samples)
- inputs:
- several bigWig files (one for each sequencing experiment you would like to compare)
- two BED files, one with X-chromosomal and one with autosomal genes
- download a full list of genes via "Get Data" --> "UCSC main table browser" --> group:"Genes and Gene Predictions" --> tracks: (e.g.) "RefSeqGenes" --> send to Galaxy
filter the list twice using the tool "Filter data on any column using simple expressions"
- first use the expression: c1=="chrX" to filter the list of all genes --> this will generate a list of X-linked genes
- then re-run the filtering, now with c1!="chrX", which will generate a list of genes that do not belong to chromosome X (!= indicates "not matching")
use
tools/computeMatrix
for all of the signal files (bigWig format) at once- supply both filtered BED files (click on "Add new regions to plot" once) and label them
- indicate the corresponding signal files
now use
tools/plotProfile
on the resulting file- important: display the "advanced output options" and select "save the data underlying the average profile" --> this will generate a table in addition to the summary plot images