narrowPeak score (5th) column value above 1000 #123

Closed
alexbarrera opened this issue Mar 30, 2016 · 9 comments

@alexbarrera
Contributor

According to the UCSC ENCODE narrowPeak format, the fifth column (score) should be a value between 0 and 1000. Occasionally, MACS2 will produce peaks with a score value above 1000 (in our analysis, on average, this happens for less than 0.4% of the reported peaks). Still, some of the values go well beyond 1000, so I wonder how the score is computed in MACS2 and what score range is used.

I found this to be an issue when submitting data to the ENCODE project, which would not accept narrowPeak files containing peaks with scores above 1000.
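For context, a minimal R sketch for measuring how often this happens (the file name is hypothetical):

# Load a narrowPeak file (hypothetical name); read.table names the columns
# V1..V10, so V5 is the score column.
peaks <- read.table("example_peaks.narrowPeak", sep = "\t")
# Fraction of peaks whose score exceeds the 0-1000 range allowed by the spec.
mean(peaks$V5 > 1000)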

@alexbarrera
Contributor Author

Answering my own old question in case someone else lands here.

Today I observed (by chance) that the values of the 5th column in narrowPeak files are equal to the integer part of the 9th column (-log10(qvalue)) multiplied by 10: int(-10 * log10(qvalue)).

I have created PR #209 to improve the README description of the column, but it might be worth rescaling the value to the [0, 1000] range.
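A minimal R sketch of that check, plus one way to force scores back into range (the clamping step is my own illustration, not something MACS2 does, and the file name is hypothetical):

peaks <- read.table("example_peaks.narrowPeak", sep = "\t")
# V9 holds -log10(qvalue); the score in V5 should equal its integer part times 10.
all(peaks$V5 == as.integer(peaks$V9 * 10))
# Clamp scores at 1000 so the file passes narrowPeak validation.
peaks$V5 <- pmin(peaks$V5, 1000L)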

@mortunco

Thank you very much!

@amitjavilaventura

I tried what you said, but it did not work on my data, so maybe column 9 (-log10(qvalue)) is not proportional to column 5 (score).

head(as.integer(m_WTplusWNT_1b_narrow4$V9 * 10) == m_WTplusWNT_1b_narrow4$V5)

[1] FALSE FALSE FALSE FALSE FALSE FALSE

@alexbarrera
Contributor Author

@amitjavilaventura are you sure you are correctly loading your narrowPeak file in R? If I do the same, I can't reproduce the error and I do get the expected values:

narrowPeak <- read.table('A549.BCL3.dex.00h.rep1.dedup.masked.sorted_peaks.narrowPeak', sep='\t')
head(as.integer(narrowPeak$V9*10) == narrowPeak$V5)
[1] TRUE TRUE TRUE TRUE TRUE TRUE

If you can't spot the error, can you perhaps post a few lines from your narrowPeak file?

@amitjavilaventura

I have tried it once more, and I cannot find the error.
I load the file:
> m_APC_1b_narrow4 <- read.table("myfile.narrowPeak", header = FALSE, sep = "\t")

The maximum values of columns 9 and 5, respectively, are the following:

> max(m_APC_1b_narrow4$V9)
[1] 2417.319
> max(m_APC_1b_narrow4$V5)
[1] 24267

This doesn't make sense, because both maximum values should be proportional.

Nevertheless, I tried the same with the -log10(p-value) instead of the -log10(q-value), and it worked, at least for the maximum values.

> max(m_APC_1b_narrow4$V8)
[1] 2426.74
> max(m_APC_1b_narrow4$V5)
[1] 24267

I suppose that whether the relationship you quoted holds depends on what you use to "filter" the peaks during peak calling: a threshold on the p-value or a threshold on the q-value.

> head(as.integer(m_APC_1b_narrow4$V8 * 10) == m_APC_1b_narrow4$V5)
[1] TRUE TRUE TRUE TRUE TRUE TRUE

I don't really need the information in the 5th column right now; however, it was quite annoying to discover an "error" in it, because in theory this value cannot exceed 1000.

Thank you for your help.

@alexbarrera
Contributor Author

alexbarrera commented Aug 1, 2019

Oh, I see. Right, that makes sense. If you choose to use p-values (default) instead of q-values to apply any significance thresholds (or to filter, like you said), the 5th column values are computed using p-values. I always filter on q-values, which is why I didn't notice the issue, but since the default is to filter on p-values, I agree this should be reflected in the README.md as well. Good catch!
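For anyone hitting the same question, a minimal R sketch for checking which metric the 5th column was derived from (the file name is hypothetical; in the narrowPeak layout, V8 is -log10(pvalue) and V9 is -log10(qvalue)):

peaks <- read.table("example_peaks.narrowPeak", sep = "\t")
# Whichever comparison is TRUE for (nearly) all rows indicates the metric
# the scores were computed from.
mean(peaks$V5 == as.integer(peaks$V8 * 10))  # scores from -log10(pvalue)?
mean(peaks$V5 == as.integer(peaks$V9 * 10))  # scores from -log10(qvalue)?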

@amitjavilaventura

Thank you!

Actually, I think the default is the q-value. I am not sure, but from the MACS manual:

-q/--qvalue
The qvalue (minimum FDR) cutoff to call significant regions. Default is 0.05. For broad marks, you can try 0.05 as cutoff. Q-values are calculated from p-values using Benjamini-Hochberg procedure.
-p/--pvalue
The pvalue cutoff. If -p is specified, MACS2 will use pvalue instead of qvalue.

@amitjavilaventura

I have already opened a pull request to change it in the README.md.

@taoliu
Contributor

taoliu commented Aug 13, 2019

Thank you for your input! The changes have been integrated into the master branch and will be included in the next release.
