narrowPeak score (5th) column value above 1000 #123
Answering my own old question in case someone else lands here. Today I noticed (by chance) that the values in the 5th column of narrowPeak files equal the 9th column (-log10qvalue) multiplied by 10 and truncated to an integer, i.e. int(10 × -log10qvalue). I have created PR #209 to improve the README description of the column, but it might be worth rescaling the value to the [0-1000] range.
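For anyone who wants to check this on their own file, here is a minimal Python sketch (not part of MACS2) that parses a narrowPeak line and compares column 5 with int(10 × column 9). The helper names are hypothetical, and the ±1 tolerance is an assumption: column 9 is written rounded, so recomputing from the printed value may not reproduce the exact integer.

```python
def parse_narrowpeak_line(line: str) -> dict:
    """Split one tab-separated narrowPeak line into its ten named fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 10:
        raise ValueError(f"expected 10 columns, got {len(fields)}")
    names = ["chrom", "start", "end", "name", "score", "strand",
             "signalValue", "pValue", "qValue", "peak"]
    casts = [str, int, int, str, int, str, float, float, float, int]
    return {n: cast(v) for n, cast, v in zip(names, casts, fields)}


def expected_score(neg_log10_value: float) -> int:
    """Integer part of 10 x the -log10 value, as observed in this thread."""
    return int(10 * neg_log10_value)


def score_matches(row: dict, column: str = "qValue") -> bool:
    """True if column 5 equals int(10 x column), within +/-1.

    The tolerance is an assumption: columns 8/9 are written rounded, so
    recomputing from the printed value can be off by one.
    """
    return abs(row["score"] - expected_score(row[column])) <= 1


if __name__ == "__main__":
    # Made-up example line with a score above 1000.
    line = "chr1\t9980\t10480\tpeak_1\t1203\t.\t25.3\t123.45\t120.37\t250"
    row = parse_narrowpeak_line(line)
    print(row["score"], expected_score(row["qValue"]), score_matches(row))
```

On the made-up line, a qValue of 120.37 gives a score of 1203, which is exactly the kind of above-1000 value this issue is about.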
Thank you very much!
I tried what you suggested, but it did not work with my data. So maybe column 9 (-log10qvalue) is not proportional to column 5 (score).
@amitjavilaventura are you sure you are correctly loading your narrowPeak file in R? If I do the same, I can't reproduce the error and I do get the expected values:
If you can't spot the error, could you post a few lines from your narrowPeak file?
I have tried it once more, and I cannot find the error. The maximum values of columns 9 and 5, respectively, are:
This doesn't make sense, because the two maximum values should be proportional. However, I then tried the same with -log10(p-value) (column 8) instead of -log10(q-value), and it worked, at least for the maximum values.
I suppose that your observation depends on what you use to "filter" the peaks during peak calling: a p-value threshold or a q-value threshold.
I don't really need the information in the 5th column now, but it was quite annoying to discover that there was an "error" in it, because in theory this value cannot exceed 1000. Thank you for your help.
Oh, I see. Right, that makes sense. If you choose to use p-values (default) instead of q-values to apply significance thresholds (or to filter, as you said), the 5th column values are computed from the p-values. I always filter on q-values, which is why I didn't notice the issue, but since the default is to filter on p-values, I agree this should be reflected in the README.md as well. Good catch!
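The p-value vs. q-value explanation can be tested directly on a file. The Python sketch below (a hypothetical helper, not a MACS2 feature) checks whether column 5 tracks int(10 × column 8), the pValue, or int(10 × column 9), the qValue, across all peaks. It allows a ±1 difference, on the assumption that columns 8 and 9 are written rounded, and returns None when the file cannot decide, e.g. when both columns fit every line.

```python
from typing import Iterable, Optional


def infer_score_source(lines: Iterable[str]) -> Optional[str]:
    """Guess which column was used to fill column 5 of a narrowPeak file.

    Returns "pValue" if every score matches int(10 * column 8),
    "qValue" if every score matches int(10 * column 9), and None if
    neither fits, both fit, or there are no peaks.
    """
    p_fits = q_fits = True
    n_peaks = 0
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        fields = line.rstrip("\n").split("\t")
        score = int(fields[4])
        p_val, q_val = float(fields[7]), float(fields[8])
        # Allow +/-1 because columns 8 and 9 are printed rounded (assumption).
        p_fits = p_fits and abs(score - int(10 * p_val)) <= 1
        q_fits = q_fits and abs(score - int(10 * q_val)) <= 1
        n_peaks += 1
    if n_peaks == 0 or p_fits == q_fits:
        return None
    return "pValue" if p_fits else "qValue"
```

Something like `infer_score_source(open("peaks.narrowPeak"))` (file name hypothetical) should then say which column the scores came from; I'd expect "pValue" for a run thresholded with `-p` and "qValue" for one thresholded with `-q`.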
Thank you! Actually, I think the default is the q-value. I am not sure, but from the MACS manual:
I have already opened a request to change this in the README.md.
Thank you for your input! Changes have been integrated into
According to the UCSC ENCODE narrowPeak format, the fifth column (score) should be a value between 0 and 1000. Occasionally, MACS2 produces peaks with a score above 1000 (in our analysis, this happens on average for less than 0.4% of the reported peaks). Still, some of the values go well beyond 1000, so I wonder how the score is computed in MACS2 and what score range it uses.
I found this to be an issue when submitting data to the ENCODE project, which would not accept narrowPeak files containing peaks with scores above 1000.
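As a workaround before submission, column 5 can be forced into the 0-1000 range by hand. The Python sketch below is my own assumption of a reasonable fix, not something MACS2 or ENCODE provides: it either clamps scores at 1000 or rescales them linearly so the largest becomes 1000. Only column 5 is changed; the p-value and q-value columns keep the original statistics.

```python
from typing import Iterable, List


def cap_narrowpeak_scores(lines: Iterable[str], max_score: int = 1000,
                          rescale: bool = False) -> List[str]:
    """Return narrowPeak lines with column 5 forced into [0, max_score].

    rescale=False: clamp each score at max_score (top peaks may tie).
    rescale=True: if any score exceeds max_score, scale all scores
    linearly (integer floor) so the largest becomes max_score.
    Other columns, including the p/q statistics, are left untouched.
    """
    rows = [line.rstrip("\n").split("\t") for line in lines if line.strip()]
    top = max((int(r[4]) for r in rows), default=0)
    out = []
    for r in rows:
        score = max(int(r[4]), 0)
        if rescale and top > max_score:
            score = score * max_score // top
        r[4] = str(min(score, max_score))
        out.append("\t".join(r))
    return out
```

Clamping leaves almost every peak unchanged (fewer than 0.4% exceeded 1000 in the analysis above) but ties the strongest peaks at 1000; rescaling keeps their order at the cost of changing every score.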