Error: Couldnt open GFF file #314
Hi Iva,
The point at which it fails suggests that GNU awk or GNU sed is not available
on your system. Linux systems should be fine, but other Unix systems (like OSX)
can have a different implementation installed by default; the OSX
installation instructions take this into account. If you type 'awk
--version' and 'sed --version', the output should look like:
$ awk --version
GNU Awk 3.1.8
Copyright (C) 1989, 1991-2010 Free Software Foundation.
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see http://www.gnu.org/licenses/.
$ sed --version
GNU sed version 4.2.1
Copyright (C) 2009 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE,
to the extent permitted by law.
GNU sed home page: <http://www.gnu.org/software/sed/>.
General help using GNU software: <http://www.gnu.org/gethelp/>.
E-mail bug reports to: <bug-gnu-utils@gnu.org>.
Be sure to include the word ``sed'' somewhere in the ``Subject:'' field.
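The same check can be scripted. This is only a rough sketch: the function name `is_gnu_tool` is made up for illustration, and it assumes that a GNU tool prints the word "GNU" on the first line of its `--version` output, as in the output above.

```shell
# Succeeds if the named tool identifies itself as GNU on the first line of
# its --version output. Non-GNU implementations often reject --version
# altogether, so a failed call is treated as "not GNU".
# (is_gnu_tool is a hypothetical helper, not part of Roary.)
is_gnu_tool() {
    "$1" --version 2>/dev/null | head -n 1 | grep -q 'GNU'
}

for tool in awk sed; do
    if is_gnu_tool "$tool"; then
        echo "$tool: GNU"
    else
        echo "$tool: not GNU (or not installed)"
    fi
done
```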
On 7 March 2017 at 12:49, ivaatanas wrote:
Hello! I am trying to use Roary to make a core-genome alignment for around
900 isolates of Pseudomonas aeruginosa. For each isolate I have an
assembled genome annotated with prokka. I did some test runs on 400 random
files and this works fine. When I try doing a run on the entire data set of
900 files, I get this error: Couldnt open GFF file at
/usr/local/share/perl5/Bio/Roary/ContigsToGeneIDsFromGFF.pm line 24.
This happens after 8 hours of the run and some files do get generated:
accessory_binary_genes.fa, _combined_files.groups,
blast_identity_frequency.Rtab, _inflated_mcl_groups, _clustered,
_inflated_unsplit_mcl_groups, _clustered.clstr, _labeled_mcl_groups,
clustered_proteins, _uninflated_mcl_groups, _combined_files.
The accessory_binary_genes.fa is the only empty file. Do you maybe know
what's the problem? Could it be that I have too many files I am trying to
run? Thank you :)
Iva
|
Dear Andrew, Thank you very much for your fast reply! I am using Fedora, and my Awk version is 4.0.1. Sed is 4.2.1. So it looks like both of these are available on my system. Is the problem that Awk is not 3.1.8? Or might it be something else I have to change? Thank you again! Iva |
(In other words - my installation worked fine and I managed to do runs on up to 400 files. Now when I am trying to run 900 files, it gives the aforementioned error.) |
Yes indeed, since you've run it successfully before, your installation is
working. Roary has been run on 10,000 genomes, so it's not the size of the
dataset that's the issue. Do you have enough free disk space, or are the GFF
files you are working with on a network storage system?
|
The GFF files I am working with are stored on my computer. Regarding the available disk space, I will copy in the df -h output: devtmpfs (available 7.8 G, mounted on /dev), tmpfs (available 7.8 G, mounted on /dev/shm), tmpfs (available 7.1 G, mounted on /run), tmpfs (available 7.8 G, mounted on /sys/fs/cgroup), /dev/sdb3 (available 30 G, mounted on /), tmpfs (available 7.8 G, mounted on /tmp), /dev/sdb5 (available 103 G, mounted on /tmp), /dev/sdb1 (available 297 M, mounted on /boot). I am running Roary on files in the /home directory, where I have 103 G available. |
A quick back-of-the-envelope calculation indicates your GFF files are about
13 GBytes in size. If Roary happens to be writing to anything other than the
103 GB partition then you will run into issues. Could you send me the raw output
of 'df -h'? Something looks off about the layout of your disks.
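To see which partition a given directory actually lives on and how much room it has, something along these lines may help. It is a sketch only: `avail_kib` is a made-up helper name, and the two directories checked (the current directory and the temporary directory) are guesses at where output could end up, not a statement of where Roary writes.

```shell
# Print the available space, in KiB, on the filesystem holding a directory.
# -P forces the one-line-per-filesystem POSIX format and -k reports
# 1024-byte blocks, so the fourth field of the second line is "Available".
# (avail_kib is a hypothetical helper name.)
avail_kib() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Assumed locations: the working directory and the temp directory.
for dir in . "${TMPDIR:-/tmp}"; do
    printf '%s\t%s KiB free\n' "$dir" "$(avail_kib "$dir")"
done
```

Running it from the directory where Roary is started shows whether that directory really sits on the large partition.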
|
I have 962 GFF files, which is 8.9 GB. Maybe it is also important to point out that I was running everything on 8 threads. Thank you again Andrew for replying so quickly! |
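For reference, the total size of the input files can be measured rather than estimated. The sketch below assumes all GFF files sit directly in one directory and end in `.gff`; the function name `gff_total_bytes` is invented for illustration.

```shell
# Sum the sizes, in bytes, of all *.gff files directly inside a directory.
# wc -c prints one "<bytes> <name>" line per file plus a "total" line when
# given several files; the total lines are skipped and the rest summed.
# (gff_total_bytes is a hypothetical helper name.)
gff_total_bytes() {
    find "$1" -maxdepth 1 -type f -name '*.gff' -exec wc -c {} + |
        awk '$2 != "total" { sum += $1 } END { print sum + 0 }'
}

echo "$(gff_total_bytes .) bytes of GFF input in the current directory"
```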
This could also help in solving the puzzle: In my previous runs I had 4 separate batches of gff files. I was running Roary with the mafft command for each of these batches, and it worked perfectly fine (size of core and other numbers from summary statistics look ok). So I know that all my gff files should be fine. Now I have to pull all of these 4 batches into one and run Roary on all 962 files together. This is where I get the error. |
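Since the error is about opening a GFF file, a quick sanity pass over the combined directory can rule out files that went missing or ended up empty when the batches were merged. This is a sketch under the assumption that the files end in `.gff` and live in one directory; `find_bad_gffs` is a made-up name, and passing this check does not prove the files are valid GFF.

```shell
# Print every *.gff file in a directory that is empty or cannot be read.
# Returns non-zero if at least one such file is found.
# (find_bad_gffs is a hypothetical helper name.)
find_bad_gffs() {
    bad=0
    for f in "$1"/*.gff; do
        [ -e "$f" ] || continue   # glob matched nothing, or a dangling link
        if [ ! -r "$f" ] || [ ! -s "$f" ]; then
            echo "$f"
            bad=1
        fi
    done
    return "$bad"
}

find_bad_gffs . || echo "the files listed above are empty or unreadable"
```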
It is most likely an issue of insufficient resources if smaller batches work fine and a combined larger batch does not. I would recommend trying to run it on a bigger machine (or VM on the Amazon cloud) with more RAM and disk space. |
Dear Andrew, Thank you again for the fast reply. I truly hope that this is the problem. I will try to get access to one of the servers at our department. I will get back to you and hopefully close this question if the run goes fine. |
Dear Andrew, Do you think it would be possible to add Roary to Galaxy? I managed to get access to the CLIMB server and I would like to use it for running Roary on my dataset. My apologies if this question was discussed somewhere else before. |
I'm afraid we don't use Galaxy, but if you want to integrate it, fire ahead. I use CLIMB as well and I find SSHing in works best for me. |
@ivaatanas Thanks to the great work of @Slugger70 Roary will be in Galaxy very soon. |