checkm.out strange 0 completeness and marker genes for large bins #54

Closed
gbouras13 opened this issue Jul 20, 2022 · 11 comments

Comments

@gbouras13

Hi Rhys,

I was successfully able to run aviary very easily on a test sample (it was much easier to install, lorikeet issue notwithstanding, and runs a lot smoother for me than atlas for what it is worth).

It's awesome and I love the output from rosella, so thanks for that too - the UMAPs make the bins very clear!

I am parsing the output files now and I noticed possible issues with the checkm.out output file. I have uploaded it, along with the equivalent from atlas for the same sample. (fwiw, aviary found 3 extra small bins vs atlas, which is nice!)

Essentially, I think there must be some issue with checkm because I am getting 0 or near 0 completeness for bins that I am sure are quite complete (based off the atlas output).

9 of the bins are over 1MB, with 8 around 2MB, and also I have blasted large chunks of the contigs just to confirm that they are indeed the correct species/genera. But they all seem to have 0 completeness and 0 marker genes found in the checkm.out file, which seems very wrong to me. So I am thinking it is likely an issue with checkm.

checkm.out.txt

atlas_completeness.txt

George

@gbouras13 gbouras13 changed the title check.out strange 0 completeness and marker genes for large bins checkm.out strange 0 completeness and marker genes for large bins Jul 20, 2022
@gbouras13
Author

sample_2_checkm.out.txt

And same issue on another sample (for what it is worth atlas only found 5 bins for this sample).

@rhysnewell
Owner

Hi George,

I have some theories and it might be normal/expected behaviour from aviary, but first just need to confirm a few things:

  • Does that checkm.out file contain all of the bins that aviary recovered or have you trimmed it down to only contain the strange bins?
  • Also, have you tried running checkm directly on those bins to see if the results are the same?

Cheers,
Rhys

@gbouras13
Author

Turns out I was hitting an issue like this one: metagenome-atlas/atlas#216

It was caused by the fact the TMPDIR set on my cluster was not in my home directory.

Defining TMPDIR to my home directory in my slurm submission script solved the issue.
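For anyone landing here with the same symptom, here is a minimal sketch of that workaround, written in Python rather than as a slurm script. It assumes the fix is simply "point TMPDIR at a writable directory under home before launching the pipeline". The helper names and the `aviary_tmp` directory name are made up for illustration and are not part of aviary.

```python
import os
import tempfile
from pathlib import Path


def prepare_tmpdir(base: Path, name: str = "aviary_tmp") -> Path:
    """Create base/name if needed and confirm a file can be written there.

    The directory name is a hypothetical choice, not something aviary requires.
    """
    tmpdir = base / name
    tmpdir.mkdir(parents=True, exist_ok=True)
    # Creating a throwaway file raises immediately if the directory is unusable.
    with tempfile.NamedTemporaryFile(dir=tmpdir):
        pass
    return tmpdir


def env_with_tmpdir(tmpdir: Path) -> dict:
    """Return a copy of the current environment with TMPDIR overridden."""
    env = dict(os.environ)
    env["TMPDIR"] = str(tmpdir)
    return env
```

The returned dictionary can be passed as `env=` to `subprocess.run` together with whatever aviary command you normally use, for example `env_with_tmpdir(prepare_tmpdir(Path.home()))`.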

George

@rhysnewell
Owner

So the bin information produced by aviary was all as expected?

@gbouras13
Author

No, not at all. Most of the bins had 90+% completeness when I re-ran it - the output looks good now.

The error must still have resulted in some output being written. I'm not exactly sure how.

George

@rhysnewell
Owner

Okay, I'm going to re-open this then. I'll have to figure out if this is an aviary issue or not

@rhysnewell rhysnewell reopened this Jul 21, 2022
@gbouras13
Author

I'll upload an example "correct" output later if you would like. The issue seems to be related to how Snakemake sets the tmpdir resource.

@rhysnewell
Owner

I haven't been able to reproduce this, all of the checkm results I've been testing seem to be correct. I've now added a kind of verification step where the completeness and contamination scores are reviewed by CheckM2 at the final stage and merged into the bin_info.tsv file. But yeah, haven't seen anything weird.

Aviary does output low completeness/high contamination bins that would generally be ignored by other binning algorithms in case you were noticing some of them

@gbouras13
Author

I haven't had the issue since I set TMPDIR="

" before running aviary.

Also as an aside, I am not sure that the full pipeline is running to completion for me yet, I don't have 'bin_info.tsv', 'coverm_abundances.tsv' or 'checkm_minimal.tsv' in my output bins directory, only 'checkm.out' and a symlink to the bins - so when I run aviary cluster it does not work.
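A quick way to confirm which of those files are missing is a small check like the sketch below. The three file names come from the comment above; the assumption that they all sit directly in the bins directory, and the function name itself, are mine.

```python
from pathlib import Path

# File names taken from the comment above; their exact location is an assumption.
EXPECTED_OUTPUTS = ("bin_info.tsv", "coverm_abundances.tsv", "checkm_minimal.tsv")


def missing_outputs(bins_dir: Path, expected=EXPECTED_OUTPUTS) -> list:
    """Return the expected file names that are absent or empty in bins_dir."""
    missing = []
    for name in expected:
        path = bins_dir / name
        # Treat zero-byte files as missing, since a failed step may leave one behind.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing
```

An empty list means all three files exist and are non-empty; anything else names the outputs that the pipeline did not produce.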

My application is a little bit unusual I guess, in that I'm more interested in the bins themselves than the abundances/de-replication, so it is enough for me for now - I will wait until you're done implementing checkm2 and other fixes before I hassle you some more!

@rhysnewell
Owner

Oh, that's odd. The full output should definitely be there, if you have time it would be helpful if you could search your log files for any errors towards the end of the pipeline that might be causing that. If not I'll see if I can also replicate that behaviour
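If it helps, something along these lines could do the log search. It is only a sketch: it assumes the logs are plain-text files ending in `.log` somewhere under one directory, and the keyword pattern is a guess at what a failure might look like, not a list of messages aviary is known to emit.

```python
import re
from pathlib import Path

# Hypothetical keywords; adjust to whatever the failing step actually prints.
ERROR_PATTERN = re.compile(r"error|exception|traceback", re.IGNORECASE)


def find_error_lines(log_dir: Path, tail: int = 200):
    """Yield (file name, line number, text) for suspicious lines near the end of each log."""
    for log_file in sorted(log_dir.rglob("*.log")):
        lines = log_file.read_text(errors="replace").splitlines()
        # Only inspect the last `tail` lines, where end-of-pipeline failures show up.
        start = max(0, len(lines) - tail)
        for number, text in enumerate(lines[start:], start=start + 1):
            if ERROR_PATTERN.search(text):
                yield log_file.name, number, text.strip()
```

Looping over `find_error_lines(Path("path/to/logs"))` and printing each tuple gives a short list of places to look first; the path here is a placeholder.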

@rhysnewell
Owner

I might go ahead and close this issue, it does not seem to be reproducible at least with newer versions. Please reopen if it is still an issue
