diamond looking for wrong faa file #15

Closed
SheikGeomicro opened this issue Jul 30, 2021 · 7 comments

Comments

@SheikGeomicro

Hi,

I just installed gunc (v1.0.4) via conda, and diamond keeps failing because it is looking for "merged.genecalls.faa", whereas prodigal wrote the called genes to "input_genome_bin.faa". The command I've tried is: gunc run -i input_genome_bin.fa -r path/to/database/gunc_db_progenomes2.1.dmnd -t 2.

Seems like this is hard coded in and not something we can specify? But I might be missing something.

Thanks in advance.

@fullama
Contributor

fullama commented Jul 31, 2021

Hmm.. Could you maybe post the full error message? Is it possible that no genes were called?

@SheikGeomicro
Author

SheikGeomicro commented Jul 31, 2021 via email

@fullama
Contributor

fullama commented Aug 3, 2021

So what it looks like here is that diamond just doesn't map anything to the reference db. (This merged.genecalls.faa is a temporary file that you wouldn't normally get to see.) Are there genes in the gene calls file? I'll try to make the error message clearer for the next version.
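One quick way to answer the question of whether the gene calls file contains any genes is to count its FASTA header lines. This is a minimal sketch, assuming the prodigal output is a plain-text protein FASTA file; `count_fasta_records` is a hypothetical helper and not part of gunc.

```python
def count_fasta_records(path):
    """Count sequences in a FASTA file by counting lines that start with '>'."""
    n_records = 0
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                n_records += 1
    return n_records
```

Calling `count_fasta_records("input_genome_bin.faa")` on the prodigal output from this thread should return a number greater than zero if genes were called.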

@SheikGeomicro
Author

So, to test whether my genomes were too novel relative to the database, I ran gunc on an E. coli MG1655 genome. I'm still getting the same error message that it can't find the merged.genecalls.faa file. The prodigal output is perfectly fine; it just can't find the merged.genecalls.faa file.
[Screenshot: Screen Shot 2021-08-03 at 8 58 18 AM]

Interestingly, I ran it again and specified the tmp directory. Got a different output error (see below) but the merged.genecalls.faa file was created this time.

[Screenshot: Screen Shot 2021-08-03 at 9 09 34 AM]

Finally I specified an out directory just to see if it would help. But same error as above.

I'm not sure what's happening. gunc 1.0.4 is running in its own conda environment and so should be isolated from anything interfering with it.

@fullama
Contributor

fullama commented Aug 3, 2021

That is weird. I ran that same E. coli genome and it ran fine for me. Could you be having a problem with your storage?
What happens if you run with --temp_dir /dev/shm ?

@SheikGeomicro
Author

I'm not sure what's happening... I did a complete reinstall, including the database, and the same problem is still happening. I tried giving it the protein calls from prodigal and I'm still getting an error. The only thing I can think of is that there's some Python module missing on our server, which runs CentOS? Other than that, I've hit a brick wall!

@fullama
Contributor

fullama commented Aug 5, 2021

OK, so after more investigation: the reason you got different behaviour when you specified a temp dir above (--temp_dir gunc_temp) was that the directory gunc_temp didn't exist. I will make sure to add a check for that in the next version.
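Until such a check exists, the missing directory can be avoided by creating it before launching gunc. The following is a minimal sketch, not gunc's own code: `build_gunc_command` is a hypothetical wrapper, and it only uses the flags that appear in this thread (`-i`, `-r`, `-t`, `--temp_dir`).

```python
import os


def build_gunc_command(input_fasta, db_file, temp_dir, threads=2):
    """Create temp_dir if it is missing, then return the gunc argument list."""
    # exist_ok=True makes this a no-op when the directory is already there.
    os.makedirs(temp_dir, exist_ok=True)
    return [
        "gunc", "run",
        "-i", input_fasta,
        "-r", db_file,
        "-t", str(threads),
        "--temp_dir", os.path.abspath(temp_dir),
    ]
```

The returned list can be passed to `subprocess.run`, so the temp directory is guaranteed to exist by the time gunc starts.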

So at least we have ruled that part out, and we are back to why diamond fails. How much effort do you want to put in here?

If you want to, you could replace your /home/sheikc/cssheik/anaconda3/envs/GUNC/lib/python3.9/site-packages/gunc/external_tools.py file with https://gist.github.com/fullama/d22863fcea9bbd1c66a691f8df990f5e
This will output a more detailed error message about why diamond failed.
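For readers wondering what a "more detailed error message" from an external tool can look like, the general pattern is to capture the tool's stderr and exit code and include them in the raised error. The sketch below illustrates that pattern only; it is not the content of the gist or of gunc's external_tools.py, and `run_and_report` is a hypothetical name.

```python
import subprocess


def run_and_report(cmd):
    """Run cmd; on a non-zero exit, raise an error that includes its stderr."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            f"Command {' '.join(cmd)} failed with exit code "
            f"{result.returncode}:\n{result.stderr.strip()}"
        )
    return result.stdout
```

With a wrapper like this, a failing diamond call would surface diamond's own error text instead of a later "file not found" symptom.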

I will add better debugging output to the list of things for the next release.
