Error: Couldnt open GFF file #314

Closed
ivaatanas opened this issue Mar 7, 2017 · 14 comments

Comments

@ivaatanas

Hello! I am trying to use Roary to make a core-genome alignment for around 900 isolates of Pseudomonas aeruginosa. For each isolate I have an assembled genome annotated with Prokka. I did some test runs on 400 random files and they worked fine. When I try a run on the entire data set of 900 files, I get this error: Couldnt open GFF file at /usr/local/share/perl5/Bio/Roary/ContigsToGeneIDsFromGFF.pm line 24.

This happens 8 hours into the run, and some files do get generated: accessory_binary_genes.fa, _combined_files.groups, blast_identity_frequency.Rtab, _inflated_mcl_groups, _clustered, _inflated_unsplit_mcl_groups, _clustered.clstr, _labeled_mcl_groups, clustered_proteins, _uninflated_mcl_groups, _combined_files.

accessory_binary_genes.fa is the only empty file. Do you know what the problem might be? Could it be that I am trying to run too many files? Thank you :)

Iva

@andrewjpage
Member

andrewjpage commented Mar 7, 2017 via email

@ivaatanas
Author

Dear Andrew,

Thank you very much for your fast reply! I am using Fedora, and my Awk version is 4.0.1. Sed is 4.2.1. So it looks like both of these are available on my system. Is the problem that Awk is not 3.1.8? Or might there be something else I have to change?

Thank you again!

Iva

@ivaatanas
Author

(In other words, my installation worked fine and I managed to do runs on up to 400 files. Now, when I try to run 900 files, it gives the aforementioned error.)

@andrewjpage
Member

andrewjpage commented Mar 7, 2017 via email

@ivaatanas
Author

The GFF files I am working with are stored on my computer. Regarding the available memory, here is the df -h output: devtmpfs (available 7.8 G, mounted on /dev), tmpfs (available 7.8 G, mounted on /dev/shm), tmpfs (available 7.1 G, mounted on /run), tmpfs (available 7.8 G, mounted on /sys/fs/cgroup), /dev/sdb3 (available 30 G, mounted on /), tmpfs (available 7.8 G, mounted on /tmp), /dev/sdb5 (available 103 G, mounted on /tmp), /dev/sdb1 (available 297 M, mounted on /boot). I am running Roary on files in the /home directory, where I have 103 G available.

@andrewjpage
Member

andrewjpage commented Mar 7, 2017 via email

@ivaatanas
Author

memory.txt

@ivaatanas
Author

I have 962 GFF files, which total 8.9 GB. It may also be important to point out that I was running everything on 8 threads. Thank you again, Andrew, for replying so quickly!
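
The file count and total size quoted above can be gathered with a short script. This is a minimal sketch, assuming all GFF files sit directly in one directory and use the `.gff` extension; the function name `summarize_gff` is hypothetical and not part of Roary.

```python
from pathlib import Path


def summarize_gff(directory):
    """Return (file_count, total_bytes) for the *.gff files directly inside `directory`.

    Hypothetical helper for illustration only; it is not part of Roary.
    """
    files = sorted(Path(directory).glob("*.gff"))
    total_bytes = sum(f.stat().st_size for f in files)
    return len(files), total_bytes
```

Calling `summarize_gff("path/to/gff_dir")` and dividing the second value by `1e9` gives the input size in GB, which can then be compared with the free space reported by `df -h`.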

@ivaatanas
Author

This could also help in solving the puzzle: in my previous runs I had 4 separate batches of GFF files. I ran Roary with the mafft command on each of these batches, and it worked perfectly fine (the size of the core and the other numbers in the summary statistics look OK). So I know that all my GFF files should be fine. Now I have to combine these 4 batches into one and run Roary on all 962 files together. This is where I get the error.

@andrewjpage
Member

It is most likely an issue of insufficient resources if smaller batches work fine and a combined larger batch does not. I would recommend trying to run it on a bigger machine (or VM on the Amazon cloud) with more RAM and disk space.
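
Before moving to a bigger machine, a quick pre-flight check can help rule out the simpler causes. The sketch below is an illustration under stated assumptions, not something Roary provides: `unreadable_files` and `free_space_ok` are hypothetical helper names, and the `headroom` factor of 3 is an arbitrary placeholder, since the thread does not say how much scratch space Roary needs relative to its input.

```python
import shutil


def unreadable_files(paths):
    """Return the subset of `paths` that cannot be opened for reading."""
    bad = []
    for path in paths:
        try:
            with open(path, "rb") as handle:
                handle.read(1)  # touch the file to confirm it is really readable
        except OSError:
            bad.append(path)
    return bad


def free_space_ok(directory, input_bytes, headroom=3.0):
    """Return True if free space on `directory`'s filesystem is at least
    `headroom` times `input_bytes`.

    `headroom` is an assumed, illustrative factor, not a documented
    Roary requirement.
    """
    free_bytes = shutil.disk_usage(directory).free
    return free_bytes >= headroom * input_bytes
```

If `unreadable_files` returns an empty list and `free_space_ok` passes for the working directory, the input files and disk space are less likely to be the culprit, and RAM becomes the more plausible bottleneck.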

@ivaatanas
Author

Dear Andrew,

Thank you again for the fast reply. I truly hope that this is the problem. I will try to get access to one of the servers at our department. I will get back to you and hopefully close this question if the run goes fine.

@ivaatanas
Author

Dear Andrew,

Do you think it would be possible to add Roary to Galaxy? I managed to get access to the CLIMB server and I would like to use it for running Roary on my dataset. My apologies if this question has been discussed somewhere else before.

@andrewjpage
Member

I'm afraid we don't use Galaxy, but if you want to integrate it, fire ahead. I use CLIMB as well and I find SSHing in works best for me.

@andrewjpage
Member

@ivaatanas Thanks to the great work of @Slugger70, Roary will be in Galaxy very soon.
