A limit in number of input files? #256

Closed
dianalaucw opened this issue Nov 9, 2023 · 2 comments

@dianalaucw

I plan to run panaroo on 2386 .gff files created by prokka from the assemblies of 2386 strains. Each .gff file is in a separate directory named {strain}_prokka. For instance, GU693__24742_1_80.gff is in the directory GU693__24742_1_80_prokka.

I tried to run panaroo with 16 CPUs using this command: panaroo -i /annotated_assembly/*prokka/*.gff -o /users/panaroo/ --clean-mode strict -t 16
It gave me the following error: /usr/bin/singularity: Argument list too long.

I wonder whether there is a limit on the number of input files. If so, would merging panaroo graphs work? For instance, we could split the .gff files into a manageable number of parts and then use panaroo-merge to merge them. However, the merge example in the documentation only covers two datasets (https://gtonkinhill.github.io/panaroo/#/merge/merge_graphs). Does it also work with more than two? Also, where can I find the documentation for panaroo-merge?

@nzmacalasdair
Collaborator

nzmacalasdair commented Nov 9, 2023

Hello!

Merging panaroo graphs is one solution that will work, but panaroo will also accept a single text file as input, where each line of the file is the path to a supported GFF file. Doing this should be much quicker and easier than running it separately on subsets and merging!
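
The list-file approach described above could be set up with something like the following. This is only a minimal sketch: the function name write_gff_list and the file name gff_list.txt are made up for illustration, and the root directory and glob pattern are taken from the command in the original question. The pattern is expanded inside Python, so the shell never has to build one very long argument list.

```python
from pathlib import Path


def write_gff_list(root: str, list_path: str, pattern: str = "*prokka/*.gff") -> int:
    """Write one GFF path per line to list_path and return how many were written.

    write_gff_list is a hypothetical helper, not part of panaroo.
    """
    # Path.glob expands the pattern in-process, avoiding the shell's
    # "Argument list too long" limit on command-line length.
    gff_paths = sorted(str(p.resolve()) for p in Path(root).glob(pattern))
    # One absolute path per line, as the list-file input expects.
    Path(list_path).write_text("".join(f"{p}\n" for p in gff_paths))
    return len(gff_paths)


# Example call (paths assumed from the question above):
#   write_gff_list("/annotated_assembly", "gff_list.txt")
# The list file then replaces the glob on the command line:
#   panaroo -i gff_list.txt -o /users/panaroo/ --clean-mode strict -t 16
```

It is worth checking that the returned count matches the expected number of strains (2386 here) before starting the run.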

Graph merging was designed for very large (>10^4 isolates) datasets, so it shouldn't be necessary in this case. There is no limit on the number of isolates that can be input into panaroo, but practical limitations (i.e. runtime) with very large or very diverse datasets mean that in those cases it is better to run subsets of the data and then merge.

(For completeness' sake) Yes, it is possible to merge as many datasets as you would like by providing all of the output directories with the -d argument to panaroo-merge. I'm afraid the webpage you found is currently all the documentation for panaroo-merge; we are working to improve this.
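
For the subset-and-merge route, a rough sketch of how the pieces might be assembled is shown below. The helper names chunk and merge_command are hypothetical, as are the directory names in the example. Only the -d argument is confirmed in this thread; the -o output option is assumed from the example on the linked merge page, so check panaroo-merge --help before relying on it.

```python
from typing import List


def chunk(items: List[str], size: int) -> List[List[str]]:
    """Split items into consecutive sublists of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def merge_command(run_dirs: List[str], out_dir: str) -> List[str]:
    """Build an argument list that passes every run directory after -d.

    The -o flag is an assumption taken from the linked merge example.
    """
    return ["panaroo-merge", "-d", *run_dirs, "-o", out_dir]


# Example (hypothetical directory names): three separate panaroo runs,
# one per subset of GFF files, merged in a single call.
#   merge_command(["run_1", "run_2", "run_3"], "merged")
# The resulting list can be handed to subprocess.run(cmd, check=True).
```

Building the command as a list rather than a single string means each directory is passed as its own argument, with no shell quoting to worry about.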

Let me know if you have any problems with the list data input.

@nzmacalasdair nzmacalasdair self-assigned this Nov 15, 2023
@nzmacalasdair
Collaborator

Closing this as it is hopefully resolved! Let me know if not.
