What clusters end up in gene accessory_binary_genes.fa ? #225

Closed
tseemann opened this issue Jan 20, 2016 · 4 comments

@tseemann
Contributor

The manual says:

First of all we construct a FASTA file with the binary presence and absence of genes, where 'A' means a gene is present and 'C' means it is absent. Only the first 4000 genes in the accessory genome are considered

Based on some data we have tried, it seems that singleton clusters do NOT end up in the file?

E.g. with 10 samples, mostly clonal but one carrying a plasmid, the tree comes out mostly flat, and the .fa file has no sites for the plasmid genes.

@tseemann
Contributor Author

I've found this code: https://github.com/sanger-pathogens/Roary/blob/056512409fcb0e817cf16ae554792816b80b9356/lib/Bio/Roary/AccessoryBinaryFasta.pm

It seems that, besides the 4000-gene limit, there is a 5% upper and lower bound, which I assume trims clusters whose membership is too low or too high?

Is there a way to script or parameterise this from the command-line tools?
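
To make the question concrete, here is a minimal Python sketch of how such a percentage trim could behave. It is not Roary's code: the function name `trim_clusters`, its parameters, and the rounding and boundary rules (round each 5% cutoff up, drop clusters at or beyond either cutoff) are assumptions for illustration only. Under those assumptions, with 10 samples a cluster present in a single sample is dropped, which would be consistent with the flat tree described above.

```python
import math


def trim_clusters(counts, n_samples, lower_pct=5.0, upper_pct=5.0, max_genes=4000):
    """Return names of clusters kept after trimming rare and near-core clusters.

    counts: dict mapping cluster name -> number of samples containing it.
    Assumed rule (hypothetical, not taken from Roary): cutoffs are rounded up,
    and clusters at or beyond a cutoff are dropped.
    """
    lower = math.ceil(n_samples * lower_pct / 100)  # 10 samples -> 1
    upper = n_samples - math.ceil(n_samples * upper_pct / 100)  # 10 samples -> 9
    kept = [name for name, c in counts.items() if lower < c < upper]
    return kept[:max_genes]  # cap the number of sites sent to the tree builder


# Made-up counts for a 10-sample dataset.
counts = {"plasmid_gene": 1, "phage_gene": 4, "near_core_gene": 9}
print(trim_clusters(counts, n_samples=10))  # ['phage_gene']
```

Setting `lower_pct=0` and `upper_pct=0` in this sketch keeps the singleton and near-core clusters.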

@andrewjpage
Member

Hi Torsten,

I wanted to cap the size of the file sent into FastTree since it can be memory hungry. Running a few tests, I think I may have been a bit too cautious here. My original thinking was to focus on getting the general high-level groupings in a reasonable order (hence getting rid of the top and bottom 5%). I'll remove this restriction and see how things go.

Andrew

@tseemann
Contributor Author

Thanks!
Nullarbor now produces pan-genome trees, and they seem to have more resolution.

@jacorvar

Hi @andrewjpage,

Is it already possible to generate accessory_binary_genes.fa from the gene_presence_absence.csv file with a command-line script?

Thanks
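
One way to approach this outside Roary is a short standalone script. The Python sketch below is not a Roary tool: the function name `binary_fasta_from_csv` is hypothetical, and it assumes that the first 14 columns of gene_presence_absence.csv are per-gene metadata, that every later column is one sample, and that an empty cell means the gene is absent. Check those assumptions against the header of your own file, since the layout may differ between Roary versions. It follows the 'A' = present, 'C' = absent encoding quoted from the manual and applies no percentage trimming.

```python
import csv
import io


def binary_fasta_from_csv(handle, n_meta_cols=14, max_genes=None):
    """Build a binary presence/absence FASTA string from a CSV file handle.

    Assumptions (verify against your file): the first n_meta_cols columns are
    per-gene metadata, each remaining column is a sample, and an empty cell
    means the gene is absent from that sample.
    """
    reader = csv.reader(handle)
    header = next(reader)
    samples = header[n_meta_cols:]
    seqs = {s: [] for s in samples}
    for i, row in enumerate(reader):
        if max_genes is not None and i >= max_genes:
            break
        # Pad short rows so every sample gets one character per gene.
        cells = row[n_meta_cols:] + [""] * len(samples)
        for sample, cell in zip(samples, cells):
            seqs[sample].append("A" if cell.strip() else "C")
    return "".join(f">{s}\n{''.join(seqs[s])}\n" for s in samples)


# Tiny made-up table with 2 metadata columns instead of 14.
demo = io.StringIO(
    "Gene,Annotation,s1,s2\n"
    "geneA,hypothetical,s1_001,\n"
    "geneB,hypothetical,s1_002,s2_001\n"
)
print(binary_fasta_from_csv(demo, n_meta_cols=2), end="")
```

For a real file, open it with `open(path, newline="")` and pass the handle; `max_genes=4000` would mimic the cap mentioned in the manual.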
