Weird-looking plots #24

Closed
DarwinAwardWinner opened this issue Oct 4, 2016 · 5 comments

Comments

@DarwinAwardWinner

When I run idr on peak lists produced by MACS on my data, I get some bizarre-looking plots:

idrValues.png

The plots look like fractal versions of the typical IDR plots shown here: http://ccg.vital-it.ch/var/sib_april15/cases/landt12/idr.html
Typical IDR plots

Do you have any idea what might be causing this? I'm invoking the script as:

idr --samples sample1.narrowPeak sample2.narrowPeak \
    --peak-list oraclepeaks.narrowPeak --input-file-type narrowPeak \
    --output-file idrValues.txt --output-file-type narrowPeak \
    --log-output-file idr.log --plot --random-seed 1986

I can share my peak files if you want them.

@DarwinAwardWinner

Here's a link to example files that produce plots like the above: https://www.dropbox.com/sh/k2193eqe1j8qun9/AAASAJG9BkzXHXPDHKdlLVhha?dl=0

@DarwinAwardWinner

DarwinAwardWinner commented Oct 5, 2016

Ok, I think I know what the problem is. The peak caller I'm using (MACS2) is returning lots of identical enrichment scores, which means that peaks with those scores are essentially sorted randomly, throwing off the IDR algorithm. The patterns of identical scores exactly match the stair-step patterns seen in the top plots.

@DarwinAwardWinner

DarwinAwardWinner commented Oct 5, 2016

Here's a look at an example plot for one sample's peak call scores vs rank: qqplot-score-vs-rank

@DarwinAwardWinner

It turns out that the answer was to rank by the -log10(p-value) column instead of score or signal value, since this column seems to have the greatest number of unique values for MACS2. In contrast, for Epic, the column with the most unique values is score. So the lesson is to look at your peak caller's output and figure out which potential ranking column has the fewest duplicates.
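
One way to run that check is to count the distinct values in each candidate column of a peak file. The sketch below is only an illustration and is not part of idr: the function names `count_unique_values` and `best_rank_column` are made up here, and the column positions assume the standard ten-column narrowPeak layout (score in column 5, signalValue in 7, -log10 p-value in 8, -log10 q-value in 9).

```python
# Candidate ranking columns in a narrowPeak file (0-based indices).
# Assumes the standard ten-column, tab-separated narrowPeak layout.
CANDIDATE_COLUMNS = {"score": 4, "signalValue": 6, "pValue": 7, "qValue": 8}


def count_unique_values(lines):
    """Return {column name: number of distinct numeric values}."""
    seen = {name: set() for name in CANDIDATE_COLUMNS}
    for line in lines:
        # Skip blank lines and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        for name, index in CANDIDATE_COLUMNS.items():
            # Compare as floats so that "5" and "5.0" count as one value.
            seen[name].add(float(fields[index]))
    return {name: len(values) for name, values in seen.items()}


def best_rank_column(lines):
    """Return the name of the column with the most distinct values."""
    counts = count_unique_values(lines)
    return max(counts, key=counts.get)
```

For a real file, something like `best_rank_column(open("sample1.narrowPeak"))` would report the column with the fewest ties. As far as I know, idr's `--rank` option is how the ranking column is selected (for example `--rank p.value`), but check `idr --help` for your version.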

@DarwinAwardWinner

Also, I think the above plots look weird partially because all the red points have black outlines. So in areas of high point density, the red points look black because all you see are the black outlines.

(Also also: MACS2 outputs up to millions of peaks if you let it, so one should filter to only the best 150k or so, or else idr will take forever to run.)
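
The filtering step could look roughly like the following. This is a sketch under assumptions, not something idr provides: `top_peaks` is a hypothetical helper, it ranks by the -log10(p-value) column of a standard narrowPeak file, and the default of 150,000 is just the rough cutoff mentioned above.

```python
import heapq

# 0-based index of the -log10(p-value) column in a narrowPeak file.
P_VALUE_COLUMN = 7


def top_peaks(lines, n=150_000, column=P_VALUE_COLUMN):
    """Return the n peak lines with the largest value in `column`, best first."""
    # Drop blank lines and comment lines before ranking.
    records = (
        line for line in lines
        if line.strip() and not line.startswith("#")
    )
    # nlargest keeps only n records in memory at a time, which helps
    # when the peak caller has emitted millions of peaks.
    return heapq.nlargest(
        n, records, key=lambda line: float(line.split("\t")[column])
    )
```

The returned lines can be written straight back out, for example with `out.writelines(top_peaks(open("sample1.narrowPeak")))`, and the filtered file passed to idr in place of the original.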
