This program finds and stitches together exons from targeted assemblies, using amino acid target sequences and DNA assemblies. In addition to the assemblies, you will need:
- a text file of all your taxon names
- a fasta file containing the amino acid sequence of every gene
- the program exonerate -- https://www.ebi.ac.uk/about/vertebrate-genomics/software/exonerate-user-guide
The taxon and gene names must correspond to the assembly file names. For example, assemblies from aTRAM 2.0 are written to fasta files named with the gene name followed by the taxon name. The names in your taxon list and the gene names in the amino acid file must match the names used in those file names.
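The naming requirement can be checked before running anything. The sketch below is only an illustration, not the program's own code: `group_assemblies` is a hypothetical helper, and it assumes that each assembly file name contains a gene name followed, somewhere later in the name, by a taxon name. Longer names are tried first so that `gene10` is not mistaken for `gene1`.

```python
from collections import defaultdict


def group_assemblies(filenames, genes, taxa):
    """Group assembly file names by gene, then by taxon.

    Hypothetical helper: assumes the gene name appears in the file name
    and the taxon name appears somewhere after it.
    Returns (grouped, unmatched).
    """
    # Try longer names first so "gene10" wins over "gene1".
    genes = sorted(genes, key=len, reverse=True)
    taxa = sorted(taxa, key=len, reverse=True)

    grouped = defaultdict(dict)
    unmatched = []
    for name in filenames:
        hit = None
        for gene in genes:
            gene_pos = name.find(gene)
            if gene_pos == -1:
                continue
            for taxon in taxa:
                # The taxon must come after the end of the gene name.
                if name.find(taxon, gene_pos + len(gene)) != -1:
                    hit = (gene, taxon)
                    break
            if hit:
                break
        if hit:
            grouped[hit[0]][hit[1]] = name
        else:
            unmatched.append(name)
    return dict(grouped), unmatched
```

Any file name returned in `unmatched` points to a gene or taxon name that does not agree between the assemblies, the taxon list and the amino acid file.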
The program groups all of the taxa for each gene together. It uses exonerate and your amino acid reference file to find the exon positions in the assemblies, then stitches those exons together. Where an exon is missing, the script fills the gap with Ns in groups of three (NNN). The result is one file per gene containing all of the exons for each taxon, with Ns in place of the missing pieces. The stitched sequences for a gene are therefore roughly the same length across taxa, barring indels, and are ready for alignment.
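The padding step can be pictured with a small sketch. This is not the program's actual implementation and it does not call exonerate: `stitch_exons` is a hypothetical function, and it assumes each exon hit has already been reduced to a tuple of (start, end, DNA sequence), where start and end are 0-based, half-open positions on the amino acid reference. Every reference amino acid that no exon covers is written as three Ns.

```python
def stitch_exons(hits, ref_len):
    """Join exon hits in reference order, padding gaps with Ns.

    hits    -- list of (aa_start, aa_end, dna) tuples; coordinates are
               0-based, half-open positions on the amino acid reference
               (an assumed input format, not exonerate's raw output)
    ref_len -- length of the amino acid reference
    """
    pieces = []
    pos = 0  # next reference amino acid not yet covered
    for aa_start, aa_end, dna in sorted(hits):
        if aa_start < pos:
            # This sketch simply skips hits that overlap an earlier one.
            continue
        # Three Ns for every uncovered amino acid before this exon.
        pieces.append("N" * 3 * (aa_start - pos))
        pieces.append(dna)
        pos = aa_end
    # Pad from the last exon to the end of the reference.
    pieces.append("N" * 3 * (ref_len - pos))
    return "".join(pieces)
```

For a 10 amino acid reference with exons covering positions 0 to 2 and 6 to 8, the stitched sequence is 30 bases long, with 12 Ns between the two exons and 6 Ns at the end. A taxon with no hits at all comes out as 30 Ns, so every taxon ends up with a sequence of comparable length.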