checkm.out strange 0 completeness and marker genes for large bins #54
Comments
And same issue on another sample (for what it is worth, atlas only found 5 bins for this sample).
Hi George, I have some theories and it might be normal/expected behaviour from aviary, but first I just need to confirm a few things:
Cheers,
Turns out I was hitting an issue like metagenome-atlas/atlas#216. It was caused by the fact that the TMPDIR set on my cluster was not in my home directory. Setting TMPDIR to my home directory in my Slurm submission script solved the issue. George
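For anyone hitting the same thing, the workaround described above can be sketched in Python as follows. This is a minimal sketch, not part of aviary or Snakemake: the function name `env_with_home_tmpdir` and the scratch directory name `tmp` are hypothetical choices made for illustration. It creates a writable directory under the home directory and returns an environment with TMPDIR pointing at it.

```python
import os
from pathlib import Path


def env_with_home_tmpdir(base=None, name="tmp"):
    """Return a copy of the environment with TMPDIR set to base/name.

    `base` defaults to the home directory; `name` is a hypothetical
    scratch-directory name chosen for this example.
    """
    base = Path(base) if base is not None else Path.home()
    tmpdir = base / name
    tmpdir.mkdir(parents=True, exist_ok=True)  # create the directory if missing
    if not os.access(tmpdir, os.W_OK):
        raise PermissionError(f"{tmpdir} is not writable")
    env = os.environ.copy()
    env["TMPDIR"] = str(tmpdir)
    return env
```

The returned mapping can be passed as `env=` to `subprocess.run` when launching the pipeline command from a submission script.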
So the bin information produced by aviary was all as expected?
No, not at all. Most of the bins had 90+% completeness when I re-ran it - the output looks good now. The error must still have resulted in some output being written. I'm not exactly sure how. George
Okay, I'm going to re-open this then. I'll have to figure out if this is an aviary issue or not.
I'll upload an example "correct" output later if you would like. The issue seems to be related to how Snakemake sets the tmpdir resource.
I haven't been able to reproduce this; all of the checkm results I've been testing seem to be correct. I've now added a kind of verification step where the completeness and contamination scores are reviewed by CheckM2 at the final stage and merged into the [...]. Aviary does output low-completeness/high-contamination bins that would generally be ignored by other binning algorithms, in case you were noticing some of them.
I haven't had the issue since I set TMPDIR=" " before running aviary. Also, as an aside, I am not sure that the full pipeline is running to completion for me yet: I don't have 'bin_info.tsv', 'coverm_abundances.tsv' or 'checkm_minimal.tsv' in my output bins directory, only 'checkm.out' and a symlink to the bins, so when I run aviary cluster it does not work. My application is a little bit unusual, I guess, in that I'm more interested in the bins themselves than the abundances/de-replication, so it is enough for me for now. I will wait until you're done implementing checkm2 and other fixes before I hassle you some more!
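A quick way to confirm which of those files are absent is sketched below. The three file names are the ones listed in the comment above; the helper name `missing_outputs` is hypothetical, and the bins directory is whatever path your run wrote to.

```python
from pathlib import Path

# File names taken from the comment above; the directory is supplied by the caller.
EXPECTED_OUTPUTS = ("bin_info.tsv", "coverm_abundances.tsv", "checkm_minimal.tsv")


def missing_outputs(bins_dir, expected=EXPECTED_OUTPUTS):
    """Return the expected file names that are absent from bins_dir."""
    bins_dir = Path(bins_dir)
    return [name for name in expected if not (bins_dir / name).is_file()]
```

An empty list means every expected file is present; anything else names the files the run did not produce.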
Oh, that's odd. The full output should definitely be there. If you have time, it would be helpful if you could search your log files for any errors towards the end of the pipeline that might be causing that. If not, I'll see if I can also replicate that behaviour.
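The log search suggested above could be sketched like this. It is only an illustration: the `*.log` file pattern and the keyword list are assumptions, not a description of how aviary or Snakemake name or format their logs, so adjust both to match the files in your run directory.

```python
from pathlib import Path


def find_error_lines(log_dir, keywords=("error", "exception", "traceback"),
                     pattern="*.log"):
    """Yield (file name, line number, line) for log lines containing a keyword.

    The keyword match is case-insensitive; `pattern` and `keywords` are
    assumptions made for this example.
    """
    for path in sorted(Path(log_dir).rglob(pattern)):
        with path.open(errors="replace") as handle:
            for lineno, line in enumerate(handle, start=1):
                if any(k in line.lower() for k in keywords):
                    yield path.name, lineno, line.rstrip("\n")
```

Looking at the last few hits per file is usually enough to see which step failed near the end of the pipeline.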
I might go ahead and close this issue; it does not seem to be reproducible, at least with newer versions. Please reopen if it is still an issue.
Hi Rhys,
I was successfully able to run aviary very easily on a test sample (it was much easier to install, lorikeet issue notwithstanding, and runs a lot smoother for me than atlas for what it is worth).
It's awesome and I love the output from rosella so thanks for that too - the UMAPs make the bins very clear!
I am parsing the output files now and I noticed possible issues with the checkm.out output file. I have uploaded it. I have also uploaded the equivalent from atlas from the same sample. (fwiw, aviary found 3 extra small bins vs atlas, which is nice!)
Essentially, I think there must be some issue with checkm because I am getting 0 or near 0 completeness for bins that I am sure are quite complete (based off the atlas output).
Nine of the bins are over 1 MB, with eight around 2 MB, and I have also BLASTed large chunks of the contigs just to confirm that they are indeed the correct species/genera. But they all seem to have 0 completeness and 0 marker genes found in the checkm.out file, which seems very wrong to me. So I am thinking it is likely an issue with checkm.
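The sanity check described above (large bins that nonetheless report near-zero completeness) can be sketched as a small filter. This is a hedged illustration: it works on already-parsed `(bin_id, size_bp, completeness_percent)` tuples rather than reading checkm.out directly, and the function name and both thresholds are hypothetical, not CheckM or aviary defaults.

```python
def flag_suspicious_bins(bins, min_size_bp=1_000_000, max_completeness=5.0):
    """Return IDs of bins that are large yet report near-zero completeness.

    `bins` is an iterable of (bin_id, size_bp, completeness_percent) tuples;
    both thresholds are illustrative assumptions.
    """
    return [
        bin_id
        for bin_id, size_bp, completeness in bins
        if size_bp >= min_size_bp and completeness <= max_completeness
    ]
```

Any bin it returns is worth cross-checking against a second completeness estimate, such as the atlas output attached below.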
checkm.out.txt
atlas_completeness.txt
George