Input file format #3

Closed
tommathers opened this issue Feb 25, 2019 · 5 comments

@tommathers

Hi,

Can you confirm how the input genomes should be formatted? Do all genomes need to be in the same FASTA file? If all genomes are in the same file, how are genomes with multiple contigs dealt with? Will the program identify alignments within each input genome as well as between genomes?

Thanks,

Tom.

@iminkin
Collaborator

iminkin commented Feb 25, 2019

Yes, all sequences should be in the same file and should have unique IDs: alignments are reported with respect to the IDs in the FASTA file. SibeliaZ is oblivious to the fact that sequences may come from different genomes and will try to find alignments within each sequence as well as between sequences. For now, a contig is treated like a chromosome.

Thanks for your questions. I will update the README to reflect these features, as this is quite important.
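A minimal sketch of preparing that single input file, assuming one FASTA file per genome. The hypothetical helper `merge_genomes` (not part of SibeliaZ) prefixes every header with a species name, so IDs stay unique across genomes and the origin of each sequence can be recovered from the alignments later. The `species.contig` naming is an assumption, chosen to match the MAF `species.chromosome` convention.

```python
from typing import Dict, Iterable, List


def merge_genomes(genomes: Dict[str, Iterable[str]]) -> List[str]:
    """Combine per-genome FASTA lines into one FASTA, prefixing IDs by species.

    Hypothetical helper: a header ">contig1 description" from genome
    "speciesA" becomes ">speciesA.contig1". Raises ValueError on duplicate IDs.
    """
    out: List[str] = []
    seen = set()
    for species, lines in genomes.items():
        for line in lines:
            if line.startswith(">"):
                words = line[1:].split()
                if not words:
                    raise ValueError(f"empty FASTA header in {species}")
                # Keep only the first word of the header; the description is dropped.
                new_id = f"{species}.{words[0]}"
                if new_id in seen:
                    raise ValueError(f"duplicate sequence ID: {new_id}")
                seen.add(new_id)
                out.append(f">{new_id}\n")
            elif line.strip():
                # Sequence lines are copied with trailing whitespace normalised.
                out.append(line.rstrip() + "\n")
    return out
```

For example, `out.writelines(merge_genomes({"speciesA": a_lines, "speciesB": b_lines}))` writes the combined file, where `a_lines` and `b_lines` are placeholder iterables of lines (open file objects work). Species names should not themselves contain a `.`, since the prefix is later recovered by splitting at the first dot.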

@tommathers
Author

Thanks for the explanation. To generate output similar to Cactus (after conversion to HAL), would I need to parse the MAF file to remove self-alignments, for example by adding a species ID to the FASTA headers?

Cheers,

Tom.

@iminkin
Collaborator

iminkin commented Feb 28, 2019

Yes, you will have to do it manually.
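A minimal sketch of that manual step, assuming the FASTA IDs were prefixed as `species.contig` before running SibeliaZ. The hypothetical `filter_self_alignments` streams a MAF file and drops every block whose `s` rows all belong to one species. Treating a "self-alignment" as "a block confined to a single genome" is an assumption: blocks spanning two or more species are kept whole, including any extra paralogous rows.

```python
from typing import Iterable, Iterator, List


def species_of(src: str) -> str:
    """Species part of a 'species.contig' ID (text before the first '.')."""
    return src.split(".", 1)[0]


def _emit(block: List[str]) -> Iterator[str]:
    """Yield a block plus a terminating blank line if it spans >= 2 species."""
    srcs = [ln.split()[1] for ln in block if ln.split()[0] == "s"]
    if len({species_of(s) for s in srcs}) >= 2:
        yield from block
        yield "\n"


def filter_self_alignments(lines: Iterable[str]) -> Iterator[str]:
    """Stream MAF lines, skipping blocks whose rows all come from one species."""
    block: List[str] = []
    for line in lines:
        fields = line.split()
        if not fields:
            # A blank line terminates the current block.
            if block:
                yield from _emit(block)
                block = []
            else:
                yield line
        elif fields[0] == "a":
            # A new block starts; flush the previous one if it was not terminated.
            yield from _emit(block)
            block = [line]
        elif block:
            block.append(line)  # 's', 'i', 'e', 'q' lines belong to the block
        else:
            yield line  # header ("##maf ...") and comments outside blocks
    yield from _emit(block)
```

Usage with placeholder paths: `with open("in.maf") as src, open("filtered.maf", "w") as dst: dst.writelines(filter_self_alignments(src))`.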

@tommathers
Author

Hi,

Sorry if this is a daft question, but I am struggling to make the SibeliaZ MAF file compatible with downstream tools. msa_view from PHAST and the HAL tools both expect the MAF file to be organised relative to a reference species (the first sequence in each alignment block, with IDs in the format "species.chromosome"). I think this is a standard convention for MAF?

Is it possible to make SibeliaZ output alignments relative to a reference species or, failing that, do you know of any tools that can read the SibeliaZ output and set a reference species?

I am testing on 3 species with ~400 Mb genomes; one species is assembled into chromosomes and the others consist of thousands of contigs.

Thanks,

Tom.
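One possible approach, sketched under assumptions: reorder each block so that its first row comes from a chosen reference species (for the data described above, the chromosome-level assembly is a natural choice). MAF coordinates on the `-` strand are counted from the end of the reverse-complemented source sequence, so when the reference row lies on `-`, every row is reverse-complemented and its start recomputed as `srcSize - start - size`, leaving the reference on `+`. The helpers below are hypothetical. They handle only `a` and `s` lines (any `i`, `e` or `q` lines are dropped) and return `None` for blocks without a reference row. Whether the result satisfies msa_view or the HAL tools has not been verified.

```python
from typing import List, NamedTuple, Optional

# Complement table covering IUPAC codes; gaps ('-') are left unchanged.
COMPLEMENT = str.maketrans("ACGTRYKMBVDHNacgtrykmbvdhn",
                           "TGCAYRMKVBHDNtgcayrmkvbhdn")


class Row(NamedTuple):
    src: str
    start: int
    size: int
    strand: str
    src_size: int
    text: str


def parse_row(line: str) -> Row:
    _, src, start, size, strand, src_size, text = line.split()
    return Row(src, int(start), int(size), strand, int(src_size), text)


def format_row(row: Row) -> str:
    return (f"s {row.src} {row.start} {row.size} {row.strand} "
            f"{row.src_size} {row.text}\n")


def flip(row: Row) -> Row:
    """Express the same aligned segment on the opposite strand."""
    return Row(row.src,
               row.src_size - row.start - row.size,
               row.size,
               "-" if row.strand == "+" else "+",
               row.src_size,
               row.text.translate(COMPLEMENT)[::-1])


def reorder_block(block: List[str], ref_species: str) -> Optional[List[str]]:
    """Put the first reference row first, on '+'; None if no reference row.

    `block` is one MAF block: an 'a' line followed by its other lines.
    Species are taken from 'species.contig' IDs (assumed naming).
    """
    rows = [parse_row(ln) for ln in block[1:] if ln.split()[:1] == ["s"]]
    idx = next((i for i, r in enumerate(rows)
                if r.src.split(".", 1)[0] == ref_species), None)
    if idx is None:
        return None
    ordered = [rows[idx]] + rows[:idx] + rows[idx + 1:]
    if ordered[0].strand == "-":
        # Reverse-complementing every row keeps the alignment columns consistent.
        ordered = [flip(r) for r in ordered]
    return [block[0]] + [format_row(r) for r in ordered]
```

If the reference species has several rows in one block, only the first is moved to the top; the others keep their relative order.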

@iminkin
Collaborator

iminkin commented Feb 28, 2019

@tommathers,

Can you show an example of a MAF alignment organised relative to a reference, so I can adjust the output of SibeliaZ?

iminkin closed this as completed Aug 11, 2019