Negative values in functional annotations #442
Dear developers,

I know there is a similar issue that was already closed: #401 (comment). I got negative values in the Unmapped row with SqueezeMeta 1.4.1 (coassembly mode) and found the above-mentioned issue, but the answer there was to rerun the same samples with a newer version of SqueezeMeta (1.5.1), and the issue was closed due to lack of activity.

I installed SqueezeMeta 1.5.1 and also installed all the databases de novo. I ran my new dataset in sequential mode and pooled the resulting outputs. Still, I have negative values in the 'Unmapped' row.

Is it normal to have negative values in the Unmapped row?
Can I just remove this row to perform a DESeq2 analysis?
A slice of the KO abundance table is attached as an example: combined.KO.abund.zip

And thanks again for the tool. It is very convenient to use.

Comments
No, I don't expect this to happen.

Thank you for the super fast reply!

I see. Any way you can share the project?
I will zip it and send you a link to it. My command:

SqueezeMeta.pl -m sequential -s samples.tsv -f ../../$fqs -extdb ../../mydb.list --nobins -b 8 -t 6

In case it is important: samples were quality controlled with TrimGalore and host DNA was removed with bowtie2 before the SqueezeMeta run.
I can see where the problem is. The ORF table (which we use to calculate the function abundances in SQMtools) has extra reads somewhere. The difference is not very big, but in this case you have very high mapping percentages (97% of your reads map back to the assembly), meaning that this small difference is enough to push the sum of mapped reads in the ORF table above 100%, hence the negative values in the Unmapped row. This only happens for the ORF table; the contig table has the right result. I wonder if this can be due to the presence of overlapping ORFs, which would make the same reads be counted twice... we'll look into that. Ignore this for now; results should be OK otherwise. However, we'll look into this more carefully.
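To make the arithmetic concrete, here is a minimal sketch (the variable names are illustrative; the numbers are the ones quoted further down in this thread) of how an over-counted ORF table turns into a negative Unmapped value:

```python
# Illustrative numbers from this thread: ~77 million reads, with the
# per-ORF counts exceeding the total by 10721 due to double counting.
total_reads = 77_000_000
orf_table_sum = total_reads + 10_721  # sum of per-ORF counts, slightly inflated

# The Unmapped row is derived as total reads minus reads assigned to features,
# so any over-counting pushes it below zero.
unmapped = total_reads - orf_table_sum
print(unmapped)  # -10721 -> the negative value seen in the function tables
```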
Thank you for your reply!

Leave it open, that way we don't forget to fix it!
So after looking at this a bit more carefully, the problem is indeed due to the ORF table having more counts than the total number of reads present in the sample. This stems from the way we assign reads to features. As long as a read mapping to the contig partially maps to a given feature (as defined in the gff file), that feature will get a count. Therefore, when a single read maps to two different features, the ORF table will get two counts (even if there was really only one read), as sketched in the toy example below. This will happen:
- when two predicted features overlap on a contig,
- when a read spans the junction between two closely located features (e.g. small adjacent functions), or
- when the same region has features predicted in parallel by different tools.
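Here is a minimal toy sketch of this counting behaviour (the coordinates and feature names are made up for illustration, and this is not SqueezeMeta's actual implementation): any read that partially overlaps a feature's interval increments that feature's count, so a read spanning two features is counted twice.

```python
# Hypothetical features from a gff file: (name, start, end) on one contig.
features = [("ORF_1", 100, 400), ("ORF_2", 390, 700)]  # note the 390-400 overlap

# Reads mapped to the contig, as (start, end) coordinates.
reads = [(50, 150), (380, 420), (650, 800)]

counts = {name: 0 for name, _, _ in features}
for r_start, r_end in reads:
    for name, f_start, f_end in features:
        # A partial overlap is enough for the feature to get a count.
        if r_start <= f_end and r_end >= f_start:
            counts[name] += 1

print(counts)                # {'ORF_1': 2, 'ORF_2': 2}
print(sum(counts.values()))  # 4 counts from only 3 reads -> inflated ORF table
```

The middle read (380, 420) spans both features, so it contributes two counts even though only one read exists.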
This is normally not a big deal in practice, but in your case the mapping percentage was already so high (almost 98% in the project you sent me) that it was enough to push the total counts in the ORF table above the total number of reads. Not by a large margin (10721 out of 77 million reads), but enough to generate a negative number of unmapped reads in the function tables. You can safely ignore the negative number (as long as it appears in the "Unmapped" row). We will remove that row from the function tables in the next version. We will consider whether to recommend using the number of mapped bases as a safer approach (SqueezeMeta also keeps track of that), but we think the difference will be negligible in most cases.
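If you prefer to drop the row yourself before a DESeq2 analysis, here is a minimal sketch with pandas (the file name is hypothetical, standing in for the unzipped table attached above; the "Unmapped" row label matches the function tables):

```python
import pandas as pd

# Load the exported KO abundance table (features as rows, samples as columns).
# "combined.KO.abund.tsv" is an assumed name for the unzipped attachment.
abund = pd.read_csv("combined.KO.abund.tsv", sep="\t", index_col=0)

# Drop the offending row; errors="ignore" keeps this safe if the row is absent.
abund = abund.drop(index="Unmapped", errors="ignore")

# Write a clean table of raw counts that can then be fed to DESeq2.
abund.to_csv("combined.KO.abund.noUnmapped.tsv", sep="\t")
```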
Thank you for the detailed clarification. I would be very surprised if the issue were only in the overlapping ORFs, since they are not very common and, as far as I know, mostly spread among viruses. But your explanation about closely located small functions and parallel counts from different tools solved the mystery for me. Now it is crystal clear and (most importantly for me) I do not need to rerun my analyses (I just removed several TB of raw data from the server).