Frequently Asked Questions
Use the 7th column in the reported narrowPeak file. According to the narrowPeak specification, this column should hold the "signal" value, but in JAMM's output it is actually the peak score.
"filtered.peaks.narrowPeak" is a subset of "all.peaks.narrowPeak". The last two lines of JAMM's terminal output report the thresholds used to produce it, for example:
Minimum Peak Width Used to Produce Filtered List: 45
Minimum Peak Score Used to Produce Filtered List: 241.142054066168
So JAMM estimates a minimum peak width and a minimum score (column 7 in the narrowPeak file), then throws out all peaks that have a score or width lower than those numbers to give you the filtered list. Note though that this filtered list is NOT the "final" list of "highly-confident" peaks (see the next question).
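The filtering rule described above can be sketched in a few lines of shell. This is only an illustration and not JAMM's own code: the function name filter_peaks is hypothetical, and it assumes that peak width means end minus start (columns 3 and 2) and that peaks exactly at a threshold are kept.

```shell
# Hypothetical helper, not part of JAMM: keep only peaks whose width and
# score (column 7) are both at or above the reported minimums.
# Assumption: width = end - start, i.e. column 3 minus column 2.
filter_peaks() {
    local peaks="$1" min_width="$2" min_score="$3"
    awk -v w="$min_width" -v s="$min_score" \
        '($3 - $2) >= w && $7 >= s' "$peaks"
}
```

With the example values above, it could be called as filter_peaks all.peaks.narrowPeak 45 241.142054066168.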
JAMM is written to output a large number of peaks (even in the filtered list). However, the peaks are ranked (by the 7th column), and you can use this ranking to do reproducibility/consistency analysis and decide in a sound way how many peaks you should take (for example, with the IDR method).
JAMM is designed this way because the idea behind it is different from the usual: JAMM does not try to develop a statistical test to threshold the peaks. JAMM gives back as many peaks as it can find, and you can decide on the proper method to threshold your list if needed (my personal favourite is the IDR pipeline). In addition, there are cases when the downstream analysis program requires a ranked peak list in which the peaks towards the bottom are not confident (for example, this strategy is helpful for cERMIT, a motif finder).
Finally, since JAMMv1.0.6revX, users have the option to apply a signal-to-background fold enrichment cutoff to select only the "top ranking" peaks (see the documentation for the "-e" parameter). However, as mentioned above, it is probably better to use something more theoretically sound like IDR.
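Once you have decided how many peaks to keep, taking them from the ranked list is simple. A minimal sketch, assuming a tab-separated narrowPeak file; the function name top_peaks is hypothetical, and the number of peaks to keep is something you choose yourself (for example, from an IDR analysis):

```shell
# Hypothetical helper, not part of JAMM: print the n highest-scoring peaks.
# Column 7 holds the JAMM peak score; -g sorts general numeric values
# (including scientific notation) and r puts the largest first.
top_peaks() {
    local peaks="$1" n="$2"
    sort -t "$(printf '\t')" -k7,7gr "$peaks" | head -n "$n"
}
```

For example, top_peaks filtered.peaks.narrowPeak 20000 would print the 20000 best-scoring peaks, where 20000 is just a placeholder.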
If you have a paired-end BAM file, first name-sort it using samtools sort -n. Then convert the sorted BAM file to a BEDPE file using bedtools bamtobed -bedpe. Make sure the resulting file name ends in .bed, because JAMM will only recognize files that end in .bed
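The two steps above can be wrapped in a small function. This is a sketch under assumptions: the function name bam_to_bedpe and the ".namesorted.bam" intermediate file name are hypothetical, and it assumes a samtools version that accepts the -o output option (1.x or later).

```shell
# Hypothetical helper, not part of JAMM: name-sort a paired-end BAM file
# and convert it to BEDPE, refusing output names that do not end in .bed.
bam_to_bedpe() {
    local bam="$1" out="$2"
    case "$out" in
        *.bed) ;;
        *) echo "output name must end in .bed: $out" >&2; return 1 ;;
    esac
    # Intermediate file name is an arbitrary choice for this sketch.
    local sorted="${bam%.bam}.namesorted.bam"
    samtools sort -n -o "$sorted" "$bam" || return 1
    bedtools bamtobed -bedpe -i "$sorted" > "$out"
}
```

It could be called as bam_to_bedpe sample.bam sample.bed, where both file names are placeholders.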
JAMM writes temporary information to disk, by default in the /tmp directory. In some shared environments, this becomes problematic because the /tmp directory is allocated very little space. You can switch the temporary directory that JAMM uses very easily: go to the JAMM.sh file and edit line 79 from
wdir=$(mktemp -d)
to
wdir="/path/to/your/preferred/directory-$ran"
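If you would rather keep a mktemp-style unique directory while moving it off /tmp, something like the following could be adapted. It is a sketch only and not JAMM's own code: the function name make_workdir and the "jamm-" prefix are hypothetical.

```shell
# Hypothetical helper, not part of JAMM.sh: create a unique working
# directory under a base directory of your choice and print its path.
make_workdir() {
    local base="$1"
    mkdir -p "$base" || return 1
    # The trailing XXXXXX is replaced by mktemp with random characters.
    mktemp -d "$base/jamm-XXXXXX"
}
```

For example, wdir=$(make_workdir /path/to/scratch) gives a fresh directory for each run, where /path/to/scratch is a placeholder.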
No. JAMM is not a deterministic program per se, but for all practical purposes, it is deterministic. Since version 1.0.7rev1, all randomization/sampling steps are done using the same seed. The seed used is reported in JAMM's output, and users can change or randomize the seed (see Secondary JAMM parameters in the documentation).
The main sources for variability are:
- Background sampling when there are no control files (for example, no ChIP-Seq input): When JAMM is not given any control files, it samples a background distribution that is estimated from the average normalized-extended read counts over the entire chromosome. Whenever the seed differs between runs, the sampling will be slightly different. This will likely result in a different number of peaks but should NOT affect the confident peaks towards the top of the ranked peak list (in other words, this should not affect the number of peaks you should select at the end, see the above question).
When control files are available, this is no longer an issue.
- Clustering model initialization: JAMM uses a mixture model clustering EM algorithm to refine the peaks in enriched windows. Any EM algorithm needs to be initialized. Since v1.0.5, JAMM initializes the model for each chromosome using at most 20 randomly chosen windows from the top quarter of windows (in terms of number of reads) found in the chromosome. The reasoning behind this is that the model is then not skewed towards regions with extremely high read counts and can find "weaker" peaks.
In JAMMv1.0.4rev1 (the one used in the publication), JAMM initializes the model using the top 10^-3 windows (all of them, no random selection!) in the chromosome.
Starting with JAMMv1.0.7rev1, both approaches are available for the user to choose from (see the documentation for the "-i" parameter).
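To make the interaction between random selection and a fixed seed concrete, here is a toy sketch of "at most 20 random windows from the top quarter by read count". It is not JAMM's implementation: the function name pick_init_windows, the two-column input (window name, read count), and rounding the top quarter up are all assumptions made for illustration.

```shell
# Toy illustration, not JAMM's code: pick at most 20 windows at random
# from the top quarter (by read count, column 2), using a fixed seed.
pick_init_windows() {
    local counts="$1" seed="$2"
    sort -k2,2nr "$counts" |
    awk -v seed="$seed" '
        { win[NR] = $0 }
        END {
            srand(seed)
            top = int((NR + 3) / 4)      # size of the top quarter, rounded up
            k = (top < 20) ? top : 20    # never pick more than 20 windows
            # Partial Fisher-Yates shuffle restricted to the top quarter.
            for (i = 1; i <= k; i++) {
                j = i + int(rand() * (top - i + 1))
                tmp = win[i]; win[i] = win[j]; win[j] = tmp
                print win[i]
            }
        }'
}
```

Running it twice with the same seed selects the same windows, while a different seed will usually select a different subset; this is the sense in which a fixed seed makes the random steps repeatable.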
This is NOT an error!
JAMM tried to estimate the fragment length, but the calculation for this particular chromosome gave either a very small number (<60) or a very large one (>500), so JAMM decided not to use that number. At the end of the fragment length calculation, JAMM will tell you what fragment length it decided to use. If all chromosomes fail, it uses 100 by default.
Summary: this is not an error and it's very normal and expected.
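The acceptance rule can be illustrated with a short sketch. It is not JAMM's code: the function name check_fragment_lengths and the two-column input (chromosome, estimated fragment length) are hypothetical, and how JAMM combines the accepted estimates into the final value is not shown here.

```shell
# Toy illustration, not JAMM's code: mark each per-chromosome estimate as
# kept (60 to 500) or discarded, and report the default of 100 when no
# chromosome gives a usable estimate.
check_fragment_lengths() {
    awk '
        $2 >= 60 && $2 <= 500 { print $1, $2, "kept"; kept++; next }
        { print $1, $2, "discarded" }
        END { if (!kept) print "all", 100, "default" }
    ' "$1"
}
```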