Alternative alignment tool #40
Hi Andrea, It is possible to use .maf as input (I was planning to retire this functionality, but it is still in the code), but the problem is that Ragout relies on properties of the Cactus alignments. In particular, alignment blocks should be non-intersecting, which is not true for most other aligners (since it is in fact a very hard problem to generate such alignments). What are your constraints? Cactus should take ~120 CPU days per mammalian genome, which translates into a few actual days on a single cluster node. There is a recent Sibelia update (https://github.com/medvedevgroup/SibeliaZ) that should handle large genomes and is capable of producing the right MAF alignment. I have not tested it yet, but you might try using it to generate the alignment and run Ragout with it. Best, |
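To see whether a MAF file from another aligner has intersecting blocks, a rough check could look like the sketch below. It assumes that "non-intersecting" means no two aligned intervals on the same sequence overlap, which is one reading of the requirement and not necessarily the exact property Ragout depends on. The function name `find_overlapping_blocks` is made up for illustration; only the standard MAF `a` and `s` line layout is relied on.

```python
from collections import defaultdict


def find_overlapping_blocks(maf_lines):
    """Return (sequence, block_a, block_b) tuples for overlapping aligned intervals.

    Assumption: "non-intersecting" is read as "no two intervals on the same
    sequence overlap". Only neighbouring intervals are compared after sorting,
    so the result is non-empty exactly when at least one overlap exists.
    """
    intervals = defaultdict(list)  # sequence name -> [(start, end, block index)]
    block = -1
    for line in maf_lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "a":       # an "a" line opens a new alignment block
            block += 1
        elif fields[0] == "s":     # s src start size strand srcSize text
            src, start, size = fields[1], int(fields[2]), int(fields[3])
            strand, src_size = fields[4], int(fields[5])
            if strand == "-":      # minus-strand starts count from the reverse complement
                start = src_size - start - size
            intervals[src].append((start, start + size, block))

    overlaps = []
    for src, spans in intervals.items():
        spans.sort()
        for (s1, e1, b1), (s2, e2, b2) in zip(spans, spans[1:]):
            if s2 < e1:            # next interval starts before the previous one ends
                overlaps.append((src, b1, b2))
    return overlaps
```

Calling it as `find_overlapping_blocks(open("alignment.maf"))` (the file name is a placeholder) and getting an empty list back would suggest the blocks do not intersect under this reading.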
Hi Mikhail, Am I right? |
You can use maf directly by adding an extra recipe parameter |
I've just tried to run Ragout on the alignments produced by SibeliaZ (MAF format) using the genomes provided in the example/E.Coli folder. I've changed the headers of the respective FASTA files so that they show the organism in the format "dh1.gi", "ms1655.seq1", etc. [16:09:31] DEBUG: >>With block size: 5000 ragout.log |
Looks like there are some caveats with using SibeliaZ. I will be testing it in the near future. |
Hi, The only thing that concerns me is this warning: Too few overlaps (18) between contigs were detected -- refine procedure will be useless Is there a way to improve this? Thank you in advance, Andrea |
Great, I'm glad that you got it working! Did it complain about the synteny block coverage?
This step is optional - it only works well with certain assemblers (like SPAdes) that output contigs with overlapping ends. We usually don't use |
hi,
then I tried to change the headers in a format like
but then this error appeared: KeyError: 'scaffold830' Am I still naming the headers the wrong way, or does anybody have an idea how to solve this problem? In case it is helpful or necessary, the command I used for Ragout: ragout -o ragout_out -s maf -t 10 recipe_file_for_ragout.txt |
Hi, It's a bit tricky: the sequence names in the MAF alignment should have the genome prefixes, but the ones in the FASTA files should not. I think in your case it will be enough to remove the prefixes from the FASTA files; also make sure that the genome names match the ones provided in the recipe. Let me know if this helps. Hopefully in the near future I will properly integrate SibeliaZ into the pipeline. |
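For anyone hitting the same KeyError, here is a minimal sketch of removing a genome prefix from FASTA headers. It assumes the headers look like `>genome.sequence` (the `dh1.gi` style mentioned earlier in this thread); the function name `strip_genome_prefix` and the example names are hypothetical and not part of Ragout.

```python
def strip_genome_prefix(fasta_lines, genome):
    """Yield FASTA lines with a leading '<genome>.' removed from each header.

    Assumption: prefixed headers have the form '>genome.sequence'.
    Headers without the prefix and all sequence lines pass through unchanged.
    """
    prefix = genome + "."
    for line in fasta_lines:
        if line.startswith(">"):
            name = line[1:]
            if name.startswith(prefix):
                name = name[len(prefix):]   # drop only the leading prefix
            yield ">" + name
        else:
            yield line
```

A possible use, with placeholder file names: `open("target_noprefix.fasta", "w").writelines(strip_genome_prefix(open("target.fasta"), "target"))`. The MAF file keeps its prefixed names; only the FASTA files are rewritten.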
Hi Mikhail, thank you so much for your reply! Ragout finished successfully after removing the prefixes from the fasta files. Best regards |
Dear Mikhail! I'm dealing with a number of green algae genomes. Ragout with Sibelia (single-threaded) has been running for days. I tried SibeliaZ and got MAF alignments. My references are from NCBI (GCF* GCA* FASTA files), and the target assembly was made with SPAdes. What do I need to do with the input files (FASTA/alignment) to use them with Ragout? |
How should I understand the following error (the debug output doesn't give a hint)?
|
Ok, it seems that you have found the The sequence headers in MAF should be formatted as Sorry for the inconvenience! SibeliaZ support is in the plans, but I didn't get a chance to work on that yet. |
Good day,
I'm currently working on the assembly of a large mammalian genome. I would like to perform reference-assisted scaffolding using Ragout. However, I cannot run Progressive Cactus on the genomes involved due to computational constraints, and the assemblies are simply too large to be processed efficiently with Sibelia.
My question is: is there a way to create alignments in a way that is suitable for Ragout2?
For example:
pairwise lastz alignments -> combined multi-species MAF file -> maf2hal -> ragout2
Thank you in advance,
Andrea